CS6140: Machine Learning
Homework Assignment # 2
Assigned: 02/16/2021 Due: 03/01/2021, 11:59pm, through Canvas
Three problems, 100 points in total. Good luck!
Prof. Predrag Radivojac, Northeastern University
Problem 1. (20 points) Naive Bayes classifier. Consider a binary classification problem where there are
eight data points in the training set. That is,
$$D = \{(-1,-1,-1,-),\ (-1,-1,1,+),\ (-1,1,-1,+),\ (-1,1,1,-),\ (1,-1,-1,+),\ (1,-1,1,-),\ (1,1,-1,-),\ (1,1,1,+)\},$$
where each tuple $(x_1, x_2, x_3, y)$ represents a training example with input vector $(x_1, x_2, x_3)$ and class label $y$.
a) (10 points) Construct a naive Bayes classifier for this problem and evaluate its accuracy on the training
set. Measure accuracy as the fraction of correctly classified examples.
b) (10 points) Transform the input space into a higher-dimensional space
$$(x_1,\ x_2,\ x_3,\ x_1x_2,\ x_1x_3,\ x_2x_3,\ x_1x_2x_3,\ x_1^2,\ x_2^2,\ x_3^2,\ x_1^2x_2,\ x_1x_2^2,\ x_1x_3^2,\ x_2^2x_3,\ x_2x_3^2)$$
and repeat the previous step.
Carry out all steps manually and show all your calculations. Discuss your main observations.
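As a quick machine check of the hand calculations in part (a) (a minimal sketch tied to this dataset; it does not replace the required manual work), the following Python snippet estimates $P(y)$ and $P(x_j \mid y)$ by relative frequency and applies the naive Bayes decision rule $\hat{y} = \arg\max_y P(y) \prod_j P(x_j \mid y)$:

# Minimal naive Bayes sanity check for Problem 1(a); the assignment still
# requires all calculations to be shown by hand.
D = [(-1, -1, -1, -1), (-1, -1, +1, +1), (-1, +1, -1, +1), (-1, +1, +1, -1),
     (+1, -1, -1, +1), (+1, -1, +1, -1), (+1, +1, -1, -1), (+1, +1, +1, +1)]

classes = sorted({y for *_, y in D})
prior = {y: sum(yy == y for *_, yy in D) / len(D) for y in classes}

# cond[(j, v, y)] = estimated P(x_j = v | Y = y), by relative frequency
cond = {}
for y in classes:
    rows = [x for *x, yy in D if yy == y]
    for j in range(3):
        for v in (-1, +1):
            cond[(j, v, y)] = sum(r[j] == v for r in rows) / len(rows)

def predict(x):
    # naive Bayes decision rule: argmax_y P(y) * prod_j P(x_j | y)
    score = {y: prior[y] * cond[(0, x[0], y)] * cond[(1, x[1], y)] * cond[(2, x[2], y)]
             for y in classes}
    return max(score, key=score.get)

accuracy = sum(predict(x) == y for *x, y in D) / len(D)
print("training accuracy:", accuracy)

The same counting scheme extends to part (b) by expanding each input tuple into the fifteen transformed features before estimating the conditionals.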
Problem 2. (25 points) Consider a binary classification problem in which we want to determine the optimal
decision surface. A point $x$ is on the decision surface if $P(Y = 1 \mid x) = P(Y = 0 \mid x)$.
a) (10 points) Find the optimal decision surface assuming that each class-conditional distribution is defined
as a two-dimensional Gaussian distribution:
$$p(x \mid Y = i) = \frac{1}{(2\pi)^{d/2} |\Sigma_i|^{1/2}} \, e^{-\frac{1}{2}(x - m_i)^T \Sigma_i^{-1} (x - m_i)},$$
where $i \in \{0, 1\}$, $m_0 = (1, 2)$, $m_1 = (6, 3)$, $\Sigma_0 = \Sigma_1 = I_2$, $P(Y = 0) = P(Y = 1) = 1/2$, $I_d$ is the $d$-dimensional identity matrix, and $|\Sigma_i|$ is the determinant of $\Sigma_i$.
b) (5 points) Generalize the solution from part (a) using $m_0 = (m_{01}, m_{02})$, $m_1 = (m_{11}, m_{12})$, $\Sigma_0 = \Sigma_1 = \sigma^2 I_2$, and $P(Y = 0) \neq P(Y = 1)$.
c) (10 points) Generalize the solution from part (b) to arbitrary covariance matrices $\Sigma_0$ and $\Sigma_1$. Discuss the shape of the optimal decision surface.
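As a reminder of the standard setup (a hint only, not part of the problem statement), the defining condition of the decision surface can be rewritten with Bayes' rule, since the evidence $p(x)$ cancels on both sides:
$$P(Y = 1 \mid x) = P(Y = 0 \mid x) \;\Longleftrightarrow\; p(x \mid Y = 1)\,P(Y = 1) = p(x \mid Y = 0)\,P(Y = 0) \;\Longleftrightarrow\; \ln\frac{p(x \mid Y = 1)}{p(x \mid Y = 0)} = \ln\frac{P(Y = 0)}{P(Y = 1)}.$$
Substituting the Gaussian class-conditional densities into the log-ratio is one natural starting point for parts (a)-(c).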
Problem 3. (55 points) Consider a multivariate linear regression problem of mapping $\mathbb{R}^d$ to $\mathbb{R}$, with two different objective functions. The first objective function is the sum of squared errors, as presented in class; i.e., $\sum_{i=1}^{n} e_i^2$, where $e_i = w_0 + \sum_{j=1}^{d} w_j x_{ij} - y_i$. The second objective function is the sum of squared Euclidean distances to the hyperplane; i.e., $\sum_{i=1}^{n} r_i^2$, where $r_i$ is the Euclidean distance from the point $(x_i, y_i)$ to the hyperplane $f(x) = w_0 + \sum_{j=1}^{d} w_j x_j$.
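For part (b), it may help to recall the standard point-to-hyperplane distance formula (stated here as an aside, not as part of the problem): viewing the hyperplane as the set of points $(x, y) \in \mathbb{R}^{d+1}$ satisfying $w_0 + \sum_{j=1}^{d} w_j x_j - y = 0$, the distance from $(x_i, y_i)$ is
$$r_i = \frac{\left|\, w_0 + \sum_{j=1}^{d} w_j x_{ij} - y_i \,\right|}{\sqrt{1 + \sum_{j=1}^{d} w_j^2}} = \frac{|e_i|}{\sqrt{1 + \sum_{j=1}^{d} w_j^2}}.$$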
a) (10 points) Derive a gradient descent algorithm to find the parameters of the model that minimize the sum of squared errors.
b) (20 points) Derive a gradient descent algorithm to find the parameters of the model that minimize the sum of squared distances.
c) (20 points) Implement both algorithms and test them on 3 datasets. Datasets can be randomly generated, as in class, or obtained from resources such as the UCI Machine Learning Repository. Compare the solutions to the closed-form (maximum likelihood) solution derived in class and find the $R^2$ in all cases on the same dataset used to fit the parameters; i.e., do not implement cross-validation. Briefly describe the data you use and discuss your results (a minimal illustrative sketch follows part (d)).
d) (5 points) Normalize every feature and target using a linear transform such that the minimum value
for each feature and the target is 0 and the maximum value is 1. The new value for feature j of data
point i can be found as
$$x_{ij}^{\text{new}} = \frac{x_{ij} - \min_{k \in \{1,2,\ldots,n\}} x_{kj}}{\max_{k \in \{1,2,\ldots,n\}} x_{kj} - \min_{k \in \{1,2,\ldots,n\}} x_{kj}},$$
where n is the dataset size. The new value for the target i can be found as
$$y_i^{\text{new}} = \frac{y_i - \min_{k \in \{1,2,\ldots,n\}} y_k}{\max_{k \in \{1,2,\ldots,n\}} y_k - \min_{k \in \{1,2,\ldots,n\}} y_k}.$$
Measure the number of steps towards convergence and compare with the results from part (c). Briefly
discuss your results.
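The following Python sketch illustrates parts (a) and (c) on a synthetic dataset (the data, step size, iteration cap, and stopping tolerance are assumptions made for illustration; the derivations and the comparison on your own three datasets are still required):

# Sketch for Problem 3(a)/(c): batch gradient descent on the sum of squared
# errors vs. the closed-form (maximum likelihood) solution, with R^2 measured
# on the training data. Synthetic data, step size, and tolerance are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 3
X = rng.normal(size=(n, d))                      # synthetic inputs
y = 3.0 + X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=n)

A = np.hstack([np.ones((n, 1)), X])              # design matrix with bias column

# Gradient descent on sum_i e_i^2, where e = A w - y and the gradient is 2 A^T e
w = np.zeros(d + 1)
eta, tol = 1e-3, 1e-8                            # step size and stopping tolerance
for steps in range(1, 100001):
    w_new = w - eta * (2.0 * A.T @ (A @ w - y))
    if np.linalg.norm(w_new - w) < tol:
        w = w_new
        break
    w = w_new

# Closed-form solution of the normal equations: (A^T A) w = A^T y
w_closed = np.linalg.solve(A.T @ A, A.T @ y)

def r_squared(weights):
    residual = y - A @ weights
    return 1.0 - residual @ residual / np.sum((y - y.mean()) ** 2)

print("gradient descent steps:", steps)
print("R^2 (gradient descent):", r_squared(w))
print("R^2 (closed form):", r_squared(w_closed))

For part (b), the same loop structure applies with the gradient of $\sum_i r_i^2$ substituted for the sum-of-squared-errors gradient; for part (d), the min-max formulas above can be applied column-wise to X and to y before fitting, after which the step counts can be compared.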
Directions and Policies
Submit a single package containing all answers, results and code. Your submission package should be
compressed and named firstnamelastname.zip (e.g., predragradivojac.zip). In your package there should
be a single pdf file named main.pdf that will contain answers to all questions, all figures, and all relevant
results. Your solutions and answers must be typed¹ and make sure that you type your name and Northeastern
username (email) on top of the first page of the main.pdf file. The rest of the package should contain all
code that you used. The code should be properly organized in folders and subfolders, one for each question
or problem. All code, if applicable, should be turned in when you submit your assignment as it may be
necessary to demo your programs to the teaching assistants. Use Matlab, Python, R, Java, or C/C++.
However, you are encouraged to use languages with good machine learning libraries (e.g., Matlab, Python,
R), which may be handy in future assignments.
Unless there are legitimate circumstances, late assignments will be accepted up to 5 days after the due date
and graded using the following rules:
on time: your score × 1
1 day late: your score × 0.9
2 days late: your score × 0.7
3 days late: your score × 0.5
4 days late: your score × 0.3
5 days late: your score × 0.1
For example, this means that if you submit 3 days late and get 80 points for your answers, your total number
of points will be 80 × 0.5 = 40 points.
All assignments are individual, except when collaboration is explicitly allowed. All sources used in solving the problems must be acknowledged, e.g., websites, books, research papers, personal communication with people, etc. Academic honesty is taken seriously! For detailed information, see the Office of Student Conduct and Conflict Resolution.
¹ We recommend LaTeX; in particular, the TeXShop-MacTeX combination for a Mac and the TeXnicCenter-MiKTeX combination on Windows. An easy way to start with LaTeX is to use the freely available LyX. You can also use Microsoft Word or other programs that can display formulas professionally.
