CSCI 540: Machine Learning
Spring 2020

Project #1: Anatomy of a Model
Total 100 points

Out: 1/14/2020
Due: 1/21/2020@11:59:59pm


Goal

Students will reinforce concepts concerning the key aspects of a machine learning model,
namely, the hypothesis, the learning algorithm, the target function, and sample data. Students
will also familiarize themselves with the MATLAB computing environment.

Tasks

In class, we discussed the anatomy of a learning model as consisting of a target function, data
samples, hypothesis set, and learning algorithm.




The hypothesis set consists of a family of functions taking on a specific form. It is the job of the
learning algorithm to search through the hypothesis set for the hypothesis, h, that best
approximates the target function, f. Upon finding the final best hypothesis, g, the learning
algorithm halts.
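One way to make "best" precise, using the error measure defined in step 3 below (the number of misclassified data points), is to take g as the hypothesis with the smallest error on the N samples, where y_i is the class label of the ith point (N and y_i are notation introduced here, not from the handout):

```latex
E(h) = \sum_{i=1}^{N} \mathbb{1}\left[\, h(\vec{x}_i) \neq y_i \,\right],
\qquad
g = \operatorname*{arg\,min}_{h \in \mathcal{H}} E(h)
```

Since the search in this project runs for a fixed maximum number of iterations, the g it returns is the best hypothesis found, which need not be a global minimizer of E.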

Assume the following hypothesis set:
• H1 (a line in the plane): $h(\vec{x}) = w_2 x_2 + w_1 x_1 + w_0 x_0$


For H1 we have $x_0 = 1$ and $w_i \in \mathbb{R}$ for $i = 0, 1, 2$.
Each hypothesis in H1 is described by a parameter vector $\vec{w} = (w_0, w_1, w_2)$. By this definition,
the kth hypothesis in H1 is described by a point in 3-dimensional parameter space. For example,
the parameter vector (1, 2, 3) corresponds to $h(\vec{x}) = 3x_2 + 2x_1 + 1x_0$. Your hypotheses will be
used to perform classification of a two-class problem using the classifier we discussed in class,
namely $h(\vec{x}) = \operatorname{sign}(\vec{w}^{T}\vec{x})$, which returns +1 or -1 as a prediction.
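As a rough illustration (in Python, whereas the submission itself must be MATLAB), a hypothesis in H1 can be stored as its parameter vector (w0, w1, w2), and the classifier takes the sign of its dot product with the augmented input (1, x1, x2). The function name `predict` and the choice to map a score of exactly 0 to +1 are assumptions of this sketch.

```python
def predict(w, x1, x2):
    """Classify one point with h(x) = sign(w0*x0 + w1*x1 + w2*x2), using x0 = 1.

    w is a parameter vector (w0, w1, w2). A score of exactly 0 is mapped to +1;
    that tie-breaking rule is an assumption of this sketch.
    """
    w0, w1, w2 = w
    score = w0 * 1 + w1 * x1 + w2 * x2  # x0 = 1 is the dummy variable
    return 1 if score >= 0 else -1


# The example parameter vector (1, 2, 3) gives h(x) = 3*x2 + 2*x1 + 1*x0.
w_example = (1, 2, 3)
print(predict(w_example, 1.0, 1.0))    # score = 1 + 2 + 3 = 6, prints 1
print(predict(w_example, -1.0, -1.0))  # score = 1 - 2 - 3 = -4, prints -1
```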

You are given a data set generator that creates synthetic data and writes it out to a
comma-separated values (CSV) data file. The data consist of two-dimensional features for a
two-class problem, namely the positive class (+1) and the negative class (-1). The data generator
is posted on the course Blackboard site to show you how the data is generated. For your
assignment, do not change the parameters of the data generator and do not change the
number of data points. A data file, created by the data generator MATLAB code, has been
provided. The data generator is provided to you for pedagogical reasons. Note that, as
discussed in lecture, in general you have no knowledge of the target function and no
control over the data samples.

Each data point consists of two features and a class label. In the CSV file, the first column
contains the first feature, the second column contains the second feature, and the third column
contains the class label (+1 or -1). Each datum from the CSV file is represented as a vector
$\vec{x}_i = (x_{i,1}, x_{i,2})$; that is, the first feature of the ith vector is $x_{i,1}$ and the second
feature of the ith vector is $x_{i,2}$. Remember that the model is constructed using the dummy
variable $x_0 = 1$, so each input is effectively augmented to $(1, x_{i,1}, x_{i,2})$.
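A minimal Python sketch of reading the CSV layout described above into augmented inputs and labels. It assumes the file has no header row and that labels are written as numbers such as 1, -1, or +1; the function name `load_augmented` is hypothetical, and in MATLAB the same step would be a file read followed by prepending a column of ones.

```python
import csv
import io


def load_augmented(csv_file):
    """Read rows of the form x1,x2,label and return (inputs, labels).

    Each input is the augmented vector (x0, x1, x2) with the dummy x0 = 1.
    Assumes no header row and numeric labels (+1 / -1).
    """
    inputs, labels = [], []
    for row in csv.reader(csv_file):
        if not row:  # skip blank lines
            continue
        x1, x2 = float(row[0]), float(row[1])
        label = int(float(row[2]))  # tolerates "1", "1.0", "-1", "+1"
        inputs.append((1.0, x1, x2))
        labels.append(label)
    return inputs, labels


# Two hypothetical rows, read from memory instead of the provided data file.
X, y = load_augmented(io.StringIO("0.5,1.2,1\n-0.3,-0.8,-1\n"))
print(X)  # [(1.0, 0.5, 1.2), (1.0, -0.3, -0.8)]
print(y)  # [1, -1]
```

For the real data, pass an open file object instead, e.g. `open(path, newline="")`, where `path` is whatever the provided data file is named.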

Using this parameter-vector representation of the hypotheses in H1, devise a scheme for
finding the “best” model for the synthetic data set. This involves the following steps:

1. In MATLAB, devise a 3-dimensional parameter space representation for H1.
2. In MATLAB, devise a method for initializing the starting hypothesis in H1.
3. In MATLAB, devise a method for evaluating a hypothesis in H1 by computing its error.
The error is the number of data points that the classifier misclassifies.
4. In MATLAB, using your evaluation method, devise a method for searching through the
hypotheses in H1 for the best hypothesis. Run your search for a fixed maximum
number of iterations.
5. Plot the data set along with the final best hypothesis.
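One simple way to combine steps 2 to 5 is a random local search (a hill-climbing variant, chosen here only for simplicity; any search method is allowed). The Python sketch below is illustrative, since the submission must be MATLAB. The function names, the uniform initialization range [-1, 1], the perturbation size `step`, the default `max_iters`, and the toy data are all assumptions.

```python
import random


def classify(w, x):
    """Return sign(w . x) for an augmented input x = (1, x1, x2); a score of 0 maps to +1."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if score >= 0 else -1


def error_count(w, inputs, labels):
    """Step 3: number of data points the classifier gets wrong."""
    return sum(1 for x, y in zip(inputs, labels) if classify(w, x) != y)


def random_search(inputs, labels, max_iters=5000, step=0.5, seed=None):
    """Steps 2 and 4: random local search over the 3-D parameter space.

    Starts from a random w and, for at most max_iters iterations, proposes a
    Gaussian perturbation of w (standard deviation `step`) and keeps it if it
    does not increase the error.
    """
    rng = random.Random(seed)
    best_w = [rng.uniform(-1.0, 1.0) for _ in range(3)]  # step 2: random start
    best_err = error_count(best_w, inputs, labels)
    for _ in range(max_iters):
        if best_err == 0:
            break  # every sample is classified correctly; stop early
        cand = [wi + rng.gauss(0.0, step) for wi in best_w]
        cand_err = error_count(cand, inputs, labels)
        if cand_err <= best_err:  # accepting ties lets the search drift across plateaus
            best_w, best_err = cand, cand_err
    return best_w, best_err


def boundary_x2(w, x1):
    """Step 5 helper: x2 on the line w0 + w1*x1 + w2*x2 = 0 (requires w2 != 0)."""
    w0, w1, w2 = w
    return -(w0 + w1 * x1) / w2


# Tiny hypothetical sample, separable by the line x2 = x1 (+1 above, -1 below).
X = [(1.0, 0.0, 1.0), (1.0, 1.0, 2.0), (1.0, 0.0, -1.0), (1.0, 2.0, 1.0)]
y = [1, 1, -1, -1]
w, err = random_search(X, y, seed=0)
print("best w:", w, "errors:", err)
```

With the course data, the inputs and labels would come from the provided CSV file instead of the toy sample. The start point and perturbations are pseudo-random, so a fixed `seed` makes a run repeatable. For the plot in step 5, evaluating `boundary_x2` at the smallest and largest x1 in the data gives two endpoints of the final line.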

Note: You may use any search method you choose. You would be well served to keep it simple.


Questions (written answers, PDF or MS Word only)

1. Discuss the approach you took to search the hypothesis space for H1.
2. Does your best hypothesis from H1 change when you re-run your learning algorithm?
Why or why not?


Submission
1. Create a single ZIP archive (no tar, gz, rar, or 7-zip) containing your written work and
your MATLAB code. Your MATLAB code MUST include everything needed to run your
submission, including data file(s). Written work must be PDF or MS Word only. Other
formats will not be accepted and will receive a zero.
2. Test your submission by unzipping your code and verifying that it runs. Code that does
not run will not be graded and you will receive a zero for it.
3. Submit your assignment using BLACKBOARD only! Email submissions will not be
accepted and you will receive a zero.