CS534 — Implementation Assignment 2 — Due 11:59PM Oct 21st, 2020

General instructions.

1. Please use Python 3 (preferably version 3.6+). You may use the packages NumPy, Pandas, and matplotlib, along with any from the standard library (such as 'math', 'os', or 'random', for example).

2. You should complete this assignment alone. Please do not share code with other students, and do not copy program files or structure from any outside sources such as GitHub. Your work should be your own.

3. Your source code and report will be submitted through Canvas.

4. You need to follow the submission instructions for file organization (located at the end of this document).

5. Please run your code before submission on one of the OSU EECS servers (e.g., babylon01.eecs.oregonstate.edu). You can make your own virtual environment with the packages we've listed, in either your user directory or the scratch directory. If you're unfamiliar with any part of this process, or have limited access, please contact one of the TAs.

6. Be sure to answer all the questions in your report. You will be graded on your code as well as the report. In particular, the clarity and quality of the report will be worth 10 pts, so please write your report in a clear and concise manner. Clearly label your figures, legends, and tables. It should be a PDF document.

7. In your report, the results should always be accompanied by a discussion. Do the results follow your expectations? Any surprises? What kind of explanation can you provide?

1 Logistic regression with L2 and L1 regularizations (total points: 90 pts + 10 report pts)

For this assignment, you need to implement and test logistic regression, which learns from a set of N training examples {(x_i, y_i)}_{i=1}^N a weight vector w that maximizes the log-likelihood objective. You will examine two different regularization methods: L2 (Ridge) and L1 (Lasso).

Data.
This dataset consists of health insurance customer demographics, along with information about each customer's driving situation. Your goal is to use this data to predict whether or not a customer may also be interested in purchasing vehicle insurance (this is your "Response" variable). The dataset description (data dictionary) is included. Do not use existing code from outside sources for any portion of this assignment; doing so would be a violation of the academic integrity policy.

The data is provided to you as both a training set, pa2_train.csv, and a validation set, pa2_dev.csv, with an X and y for each (X being features, y being labels). You have labels for both sets. You do not have to perform preprocessing on this dataset or modify the features; this has been done for you.

Preprocessing Information. In order to train on this data, we have pre-processed it into an appropriate format. This is done for you in this assignment to ensure results are similar across submissions (easier to grade). You should be familiar with this process already from the first assignment. In particular, we have treated [Gender, Driving License, Region Code, Previously Insured, Vehicle Age, Vehicle Damage, Policy Sales Channel] as categorical features. We have converted those with multiple categories (some of which originally contained textual descriptions) into one-hot vectors. Note that we left Age as an ordinal numeric feature. You are to leave these as is and not modify them further for this assignment, but you should understand the process. The numeric and ordinal features [Age, Annual Premium, Vintage] are also scaled to the range [0, 1]. Additionally, the dataset should be relatively class-balanced (close to the same number of 1's and 0's for Response). This was not the case in the raw data, so we downsampled to make training easier. There are other ways to handle class imbalance, beyond the scope of this assignment, but it is a common problem in real-world data.
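Although the preprocessing has been done for you, it is worth understanding the two operations described above. The following is a minimal NumPy sketch of one-hot encoding and [0, 1] min-max scaling; the function names and example values are our own illustrations, not the actual pipeline used to produce the provided files.

```python
import numpy as np

def one_hot(column, categories=None):
    """Encode a 1-D sequence of category labels as a one-hot matrix,
    with one output column per distinct category."""
    if categories is None:
        categories = sorted(set(column))
    index = {c: j for j, c in enumerate(categories)}
    out = np.zeros((len(column), len(categories)))
    for i, c in enumerate(column):
        out[i, index[c]] = 1.0
    return out, categories

def min_max_scale(x):
    """Scale a numeric feature to the range [0, 1]."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    # Guard against a constant column, which would divide by zero.
    return (x - lo) / (hi - lo) if hi > lo else np.zeros_like(x)
```

For example, a categorical column with values like "< 1 Year" and "1-2 Year" becomes two binary columns, while a numeric column such as Annual Premium is mapped so its minimum becomes 0 and its maximum becomes 1.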
General guidelines for training. For all parts, you should set an upper limit on the number of training iterations (e.g., 10k) and train your model until either the convergence condition is met, i.e., the improvement of the objective is small, or you hit the iteration limit. If you find that your algorithm needs more than 10k iterations to converge, feel free to use a higher limit. It is good practice to monitor the objective during training to ensure that it is not diverging. You will need to adjust your learning rate based on the observed training behavior.

Part 1 (45 pts): Logistic regression with L2 (Ridge) regularization.

Recall that logistic regression with L2 regularization aims to minimize the following loss function¹:

\[
\frac{1}{N}\sum_{i=1}^{N}\Big[-y_i \log \sigma(w^\top x_i) - (1-y_i)\log\big(1-\sigma(w^\top x_i)\big)\Big] + \lambda \sum_{j=1}^{d} w_j^2 \tag{1}
\]

See the following algorithm for batch gradient descent² optimization of Equation 1.

Algorithm 1: Gradient descent for Ridge logistic regression
  Input: {(x_i, y_i)}_{i=1}^N (training data), α (learning rate), λ (regularization parameter)
  Output: learned weight vector w
  Initialize w;
  while not converged do
      w ← w + (α/N) Σ_{i=1}^N (y_i − σ(w^⊤x_i)) x_i ;   // gradient step without the L2 term
      for j = 1 to d do
          w_j ← w_j − αλw_j ;   // L2 term contribution
      end
  end

For this part of the assignment, you will need to do the following:

(a) Implement Algorithm 1 and experiment with different regularization parameters λ ∈ {10^−i : i ∈ [0, 5]}.

(b) Plot the training accuracy and validation accuracy of the learned model as the λ value varies. What trend do you observe for the training accuracy as we increase λ? Why is this the case? What trend do you observe for the validation accuracy? What is the best λ value based on the validation accuracy?

(c) For the best model selected in (b), sort the features by |w_j|. What are the top 5 features considered important according to the learned weights? How many features have w_j = 0? If we use a larger λ value, do you expect more or fewer features to have w_j = 0?
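A minimal NumPy sketch of Algorithm 1 might look like the following. The convergence test, learning rate, and default λ are illustrative choices, not prescribed values, and `train_ridge` / `accuracy` are our own names.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_ridge(X, y, alpha=0.1, lam=1e-3, max_iters=10_000, tol=1e-7):
    """Batch gradient descent for L2-regularized logistic regression.
    X: (N, d) feature matrix; y: (N,) labels in {0, 1}."""
    N, d = X.shape
    w = np.zeros(d)
    prev = np.inf
    for _ in range(max_iters):
        p = sigmoid(X @ w)
        w = w + (alpha / N) * (X.T @ (y - p))     # gradient step, no L2 term
        w = w - alpha * lam * w                   # L2 contribution, all j at once
        eps = 1e-12                               # avoid log(0)
        loss = (-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
                + lam * np.sum(w ** 2))
        if abs(prev - loss) < tol:                # "improvement is small"
            break
        prev = loss
    return w

def accuracy(X, y, w):
    return float(np.mean((sigmoid(X @ w) >= 0.5) == y))
```

The inner `for j = 1 to d` loop of Algorithm 1 is vectorized here into a single array operation, which is equivalent but much faster in NumPy.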
¹ In class we presented the log-likelihood function as the objective to maximize. It is, however, more common to put a negative sign in front and turn it into a loss function, which is called the "negative log-likelihood".
² Our lecture presented gradient ascent; since we are working with a loss function here, we use gradient descent instead.

Part 2 (45 pts): Logistic regression with L1 (Lasso) regularization

For this part, you will need to implement L1-regularized logistic regression. Recall that the loss function for L1-regularized logistic regression is:

\[
\frac{1}{N}\sum_{i=1}^{N}\Big[-y_i \log \sigma(w^\top x_i) - (1-y_i)\log\big(1-\sigma(w^\top x_i)\big)\Big] + \lambda \sum_{j=1}^{d} |w_j| \tag{2}
\]

The following algorithm minimizes Equation 2 via a procedure called proximal gradient descent. For L1-regularized loss functions, proximal gradient descent often leads to substantially faster convergence than plain gradient descent (or, in this case, subgradient descent, since the L1 norm is not differentiable everywhere). You can refer to Ryan Tibshirani's notes (http://www.stat.cmu.edu/~ryantibs/convexopt/lectures/prox-grad.pdf) for an introduction to this method.

Algorithm 2: Proximal gradient descent for Lasso logistic regression
  Input: {(x_i, y_i)}_{i=1}^N (training data), α (learning rate), λ (regularization parameter)
  Output: learned weight vector w
  Initialize w;
  while not converged do
      w ← w + (α/N) Σ_{i=1}^N (y_i − σ(w^⊤x_i)) x_i ;   // gradient step without the L1 term
      for j = 1 to d do
          w_j ← sign(w_j) max(|w_j| − αλ, 0) ;   // soft-thresholding each w_j: if |w_j| < αλ, w_j ← 0
      end
  end

For this part of the assignment, you will need to do the following:

(a) Implement Algorithm 2 and experiment with different regularization parameters λ ∈ {10^−i : i ∈ [0, 5]}.

(b) Plot the training accuracy and validation accuracy of the learned model as the λ value varies. What trend do you observe for the training accuracy as we increase λ? Why is this the case? What trend do you observe for the validation accuracy? What is the best λ value based on the validation accuracy?
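The only change from Part 1 is that the per-weight L2 shrink is replaced by the soft-thresholding (proximal) step. A sketch of that operator, assuming the same training loop structure as in Part 1 (`soft_threshold` is our own name for it):

```python
import numpy as np

def soft_threshold(w, t):
    """Proximal operator of t * ||w||_1, applied elementwise:
    shrink each weight toward zero by t, and set it exactly to 0
    when its magnitude is below t."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

# Inside the training loop, after the plain gradient step on w:
#     w = soft_threshold(w, alpha * lam)
```

Unlike the L2 update, which only multiplies each weight by a factor slightly less than 1 and so never produces exact zeros, this step drives small weights exactly to zero, which is why Lasso yields sparse solutions.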
(c) For the best model, sort the features by |w_j|. What are the top 5 features considered important? How many features have w_j = 0? If we use a larger λ value, do you expect more or fewer features to have w_j = 0?

(d) Compare and discuss the differences between your results for Part 1 and Part 2, both in terms of performance and the sparsity of the solution.

Submission. Your submission should include the following:

1) Your source code, one file for each part. The files should be named (for example) part1.py and should run with simply python part1.py. You do not need to generate plots in the submitted code; just include those in your report.
2) Your report (see general instruction items 6 and 7 on page 1 of the assignment), which should begin with a general introduction section, followed by one section for each part of the assignment.
3) Please submit the report PDF, along with a .zip containing the code, to Canvas. The PDF should be outside the .zip so the report is easier to view.