STAT 542: Homework 11
Fall 2020, by Ruoqing Zhu (rqzhu)
Due: Monday, Nov 23, 11:59 PM CT

Contents
Instruction
About HW11
Question 1 [50 Points] Linear SVM
Question 2 [50 Points] Non-linear SVM

Instruction

Students are encouraged to work together on homework. However, sharing, copying, or providing any part of a homework solution or code is an infraction of the University's rules on Academic Integrity. Any violation will be punished as severely as possible. Final submissions must be uploaded to compass2g. No email or hardcopy will be accepted. For the late submission policy and grading rubrics, please refer to the course website.

• You are required to submit two files:
  – Your .rmd RMarkdown (or Python) file, which should be saved as HWx_yourNetID.Rmd. For example, HW1_rqzhu.Rmd.
  – The result of knitting your RMarkdown file as HWx_yourNetID.pdf. For example, HW1_rqzhu.pdf. Please note that this must be a .pdf file; .html format cannot be accepted.
• Include your Name and NetID in your report.
• If you use this file or the example homework .Rmd file as a template, be sure to remove this instruction section.
• Your .Rmd file should be written such that, if it is placed in a folder with any data you utilize, it will knit properly without modification.
• Make sure that you set the seed properly so that the results can be replicated.
• For some questions, there will be restrictions on what packages you can use. Please read the requirements carefully.

About HW11

We continue the SVM example from HW10 to perform linear and nonlinear classification using the penalized loss framework.
Both questions are based on the following logistic loss function:

$$L(y, f(x)) = \log\left(1 + e^{-y f(x)}\right)$$

The rest of the job is to solve this optimization problem given the functional form of f(x). To do this, we will utilize a general-purpose optimization package/function. For example, in R, you can use the optim() function. Read the documentation of this function (or equivalent ones in Python) and set up the objective function properly to solve for the parameters. If you need an example of how to use the optim() function, read the corresponding part in the example file provided on our course website (Section 10).

Question 1 [50 Points] Linear SVM

When f(x) is a linear function, SVM can be solved by optimizing the penalized loss:

$$\underset{\beta_0, \, \boldsymbol{\beta}}{\arg\min} \; \sum_{i=1}^{n} L(y_i, \beta_0 + x_i^T \boldsymbol{\beta}) + \lambda \lVert \boldsymbol{\beta} \rVert^2$$

You should generate the data using the code provided below (or write similar code in Python), and answer these questions:

• Write a function to define the penalized loss objective function. The R function optim() can run faster if you further define the gradient function. Hence, you should also define the gradient function properly and implement it in the optimization.
• Choose a reasonable λ value so that your optimization can run properly. In addition, I recommend using the BFGS method in the optimization.
• After solving the optimization problem, plot all data and the decision line.
• If needed, modify your λ so that the model fits reasonably well and re-plot. You don't have to tune λ as long as you obtain a reasonable decision line.
• Report your training error.
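Since the homework permits Python, here is a minimal sketch of the optim()-style workflow using scipy.optimize.minimize, which plays the role of optim() with method = "BFGS" and an analytic gradient. The sample size, the λ value (1.0 here), and the toy data below are illustrative assumptions, not the homework's prescribed values; the gradient follows from differentiating the penalized logistic loss above.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit  # numerically stable logistic function

def make_objective(x, y, lam):
    """Penalized logistic loss for f(x) = beta0 + x' beta, with its gradient."""
    def obj(theta):
        b0, b = theta[0], theta[1:]
        f = b0 + x @ b
        # log(1 + exp(-y f)) computed stably via logaddexp
        return np.logaddexp(0.0, -y * f).sum() + lam * (b @ b)

    def grad(theta):
        b0, b = theta[0], theta[1:]
        f = b0 + x @ b
        s = -y * expit(-y * f)  # dL/df_i for each sample
        # gradient w.r.t. (beta0, beta): intercept has no penalty
        return np.concatenate(([s.sum()], x.T @ s + 2.0 * lam * b))

    return obj, grad

# Illustrative two-class data in the spirit of Question 1 (assumed values)
rng = np.random.default_rng(0)
n = 20
x = np.vstack([rng.normal(0.0, 1.0, (n, 2)), rng.normal(1.5, 1.0, (n, 2))])
y = np.concatenate([-np.ones(n), np.ones(n)])

obj, grad = make_objective(x, y, lam=1.0)
fit = minimize(obj, np.zeros(3), jac=grad, method="BFGS")
beta0, beta = fit.x[0], fit.x[1:]
train_err = np.mean(np.sign(beta0 + x @ beta) != y)
```

Supplying jac=grad mirrors passing gr to optim(); without it, BFGS falls back to slower finite-difference gradients.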
```r
set.seed(20)
n = 100 # number of data points for each class
p = 2   # dimension

# Generate the positive and negative examples
xpos <- matrix(rnorm(n*p, mean = 0, sd = 1), n, p)
xneg <- matrix(rnorm(n*p, mean = 1.5, sd = 1), n, p)
x <- rbind(xpos, xneg)
y <- c(rep(-1, n), rep(1, n))

plot(x, col = ifelse(y > 0, "darkorange", "deepskyblue"),
     pch = 19, xlab = "x1", ylab = "x2")
legend("topleft", c("Positive", "Negative"),
       col = c("darkorange", "deepskyblue"), pch = c(19, 19),
       text.col = c("darkorange", "deepskyblue"))
```

[Figure: scatter plot of the two classes, x1 vs. x2]

Question 2 [50 Points] Non-linear SVM

You should generate the data using the code provided below (or write similar code in Python), and answer these questions:

• Use the kernel trick to solve for a nonlinear decision rule. Consider using the penalized optimization of the following form:

$$\sum_{i=1}^{n} L(y_i, \mathbf{w}^T K_i) + \lambda \, \mathbf{w}^T K \mathbf{w}$$

where K_i is the i-th column of the n × n kernel matrix K, and the (i, j)-th entry of K is K(x_i, x_j), for any two sample points. For this problem, we consider K(·, ·) to be the Gaussian kernel. For the bandwidth of the kernel function, you can borrow ideas from our previous homework. You do not have to optimize the bandwidth.
• Pre-calculate the n × n kernel matrix K of the observed data.
• Write a function to define the objective function. You should also define the gradient function properly and implement it in the optimization.
• Choose a reasonable λ value so that your optimization can run properly.
• It could be difficult to obtain the decision line itself. However, it is relatively easy to obtain the fitted labels. Hence, calculate the fitted labels (in-sample prediction) of your model and report the classification error.
• Plot the data using the fitted labels. This would allow you to visualize (approximately) the decision line. If needed, modify your λ so that the model fits reasonably well and re-plot. You don't have to tune λ as long as your result is reasonable.
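The pre-calculation step can be sketched as follows (in Python, which the homework permits). The n × n Gaussian kernel matrix is built from pairwise squared distances; the median-distance heuristic used for the bandwidth here is one common choice and an assumption on my part, since the homework only says to borrow ideas from the previous assignment.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(size=(50, 2))  # illustrative sample, not the homework data

# Pairwise squared distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
sq = np.sum(x**2, axis=1)
d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * x @ x.T, 0.0)

# Median heuristic bandwidth (assumed choice, not prescribed by the homework)
bw = np.median(np.sqrt(d2[np.triu_indices(len(x), k=1)]))

# Gaussian kernel matrix: K[i, j] = exp(-||x_i - x_j||^2 / (2 bw^2))
K = np.exp(-d2 / (2.0 * bw**2))
```

Computing d2 with the expansion above avoids an explicit double loop; clamping at zero guards against tiny negative values from floating-point round-off.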
You can judge if your model is reasonable based on the true model.

```r
set.seed(1)
n = 400
p = 2 # dimension

# Generate the positive and negative examples
x <- matrix(runif(n*p), n, p)
side <- (x[, 2] > 0.5 + 0.3*sin(3*pi*x[, 1]))
y <- sample(c(1, -1), n, TRUE, c(0.9, 0.1))*(side == 1) +
     sample(c(1, -1), n, TRUE, c(0.1, 0.9))*(side == 0)

plot(x, col = ifelse(y > 0, "darkorange", "deepskyblue"),
     pch = 19, xlab = "x1", ylab = "x2")
legend("topleft", c("Positive", "Negative"),
       col = c("darkorange", "deepskyblue"), pch = c(19, 19),
       text.col = c("darkorange", "deepskyblue"))
```

[Figure: scatter plot of the two classes with the nonlinear boundary, x1 vs. x2]
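The kernelized objective and its gradient can be set up the same way as in Question 1; here is a hedged Python sketch. The bandwidth (0.2), λ (0.01), sample size, and the noise-free labels below are all illustrative assumptions, not the homework's values. The gradient uses the fact that K is symmetric, so the loss term contributes K s and the penalty contributes 2λKw.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

def kernel_objective(K, y, lam):
    """Objective sum_i L(y_i, (Kw)_i) + lam * w'Kw and its gradient (K symmetric)."""
    def obj(w):
        f = K @ w
        return np.logaddexp(0.0, -y * f).sum() + lam * (w @ K @ w)

    def grad(w):
        f = K @ w
        s = -y * expit(-y * f)  # dL/df_i for each sample
        return K @ s + 2.0 * lam * (K @ w)

    return obj, grad

# Illustrative data in the spirit of Question 2, without the label noise;
# n, the bandwidth, and lambda are assumed values
rng = np.random.default_rng(1)
n = 120
x = rng.uniform(size=(n, 2))
y = np.where(x[:, 1] > 0.5 + 0.3 * np.sin(3 * np.pi * x[:, 0]), 1.0, -1.0)

sq = np.sum(x**2, axis=1)
K = np.exp(-np.maximum(sq[:, None] + sq[None, :] - 2.0 * x @ x.T, 0.0)
           / (2.0 * 0.2**2))

obj, grad = kernel_objective(K, y, lam=0.01)
fit = minimize(obj, np.zeros(n), jac=grad, method="BFGS")
yhat = np.sign(K @ fit.x)  # fitted (in-sample) labels
train_err = np.mean(yhat != y)
```

As the question notes, the fitted labels sign(Kw) are easy to obtain even though the decision line itself is not; plotting the data colored by yhat approximates the decision boundary.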