STATS 415 001 FA 2021
Mock midterm
Started: Oct 13 at 9:11pm
Quiz Instructions
This exam has 5 T/F questions, 5 single-choice questions, 3 multiple-choice questions and 2 data analysis questions. The final two data analysis questions require submission of Rmd files that answer all the questions.

Question 1 (2 pts)
The larger K you choose in the K-nearest-neighbors (KNN) method, the more accurate the resulting classifier will become.
True
False

Question 2 (2 pts)
A credit card company would like to hire you to build an automatic system that tells whether a given applicant should be issued a credit card. This task is an unsupervised learning problem.
True
False

Question 3 (2 pts)
The more covariates you include in your linear regression model, the lower the training error you will get.
True
False

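Question 4 (2 pts)
The sample median of $\{x_i\}_{i \in [n]}$ solves a least squares problem in the sense that
$$\operatorname{median}(x) = \operatorname*{argmin}_{\mu \in (-\infty, \infty)} \sum_{i=1}^{n} (x_i - \mu)^2.$$
True
False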

Question 5 (2 pts)
The sample mean is sensitive to outliers.
True
False

Question 6 (3 pts)
Which of the following is true about linear regression?
The ordinary least squares (OLS) estimator does not have any bias.
The higher the $R^2$ on the training data is, the stronger the generalization capability of the learned model.
The residuals are independent across the observations.
The residuals are uncorrelated across the observations.

Question 7 (3 pts)
Suppose I have a sample of 100 data points, all of them unique positive numbers. If I multiply the 10 largest values by 100, how will the sample mean change?
It will not change
It will increase by 1
It will increase by some amount, but we do not have enough information to determine by how much

Question 8 (3 pts)
Which of the following is false about linear discriminant analysis (LDA)?
It is a discriminative method rather than a generative one in the sense that it does not involve the data generation mechanism.
It can be applied to practical data that are not exactly Gaussian.
It is motivated by the Bayes Rule.
Compared with general QDA, LDA requires fewer parameters to be estimated.

Question 9 (3 pts)
Which of the following is false about KNN?
The smaller K is, the lower the training MSE is.
The larger K is, the more robust the resulting classifier is.
The larger K is, the lower the testing error is.
The smaller K is, the more sensitive the resulting classifier is.

Question 10 (3 pts)
Suppose you have two variables, X and Y, and are interested in visualizing their relationship. X is categorical with three levels, and Y is a continuous variable that takes values between 0 and 1. What is the most appropriate plot to make?
A scatter plot
Side-by-side boxplots of Y for each level of X
Side-by-side boxplots of X for different ranges of Y

Question 11 (5 pts)
In each of the following setups, tick the box if Method (A) is preferred to Method (B).
In a two-class classification task, the feature vectors from the two classes are Gaussian and share the same covariance structure. Method (A): LDA vs. Method (B): QDA.
In a two-class classification task, the feature vectors from the two classes are Gaussian and have different means. Method (A): LDA vs. Method (B): QDA.
In a classification problem, the features are independent of each other. Method (A): QDA vs. Method (B): Naive Bayes.
In a regression problem with a continuous response $Y$ and a single covariate $X$, the scatter plot of $X$ and $Y$ is presented as follows: [scatter plot of $X$ and $Y$ not preserved in this copy]. Method (A): KNN vs. Method (B): Linear regression.


Question 12 (5 pts)
Suppose you are trying to predict the average rating ($Y$, a continuous response variable ranging from 0 to 10) of a movie based on whether the movie is a romantic movie ($X_1$: 1 if yes and 0 otherwise), its production cost ($X_2$: continuous, in USD) and the number of views ($X_3$: a discrete non-negative variable). Consider the following three models under the fixed design setup:
(M1): $Y = \beta_0 + \beta_1 X_1 + \epsilon$,
(M2): $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \epsilon$,
(M3): $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_1 X_2 + \epsilon$,
where $\mathbb{E}(\epsilon) = 0$. Which of the following arguments is true?
(M3) is problematic because we need to introduce extra dummy variables to express $X_3$, which is discrete.
In (M1), $\beta_0$ must be non-negative.
According to (M2), a one-unit increase in $X_2$, given all the other predictors fixed, implies a $\beta_2$ increase in $Y$. In words, one more USD spent on production will improve the rating by $\beta_2$ on average.
The in-sample $R^2$ of (M3) will be smaller than that of (M1) and (M2).

Question 13 (5 pts)
Given a design matrix $X \in \mathbb{R}^{n \times p}$, whose rows correspond to the feature vectors of $n$ observations, and a response vector $y \in \mathbb{R}^{n}$, which of the following arguments is true?
The projection matrix $H = X (X^\top X)^{-1} X^\top \in \mathbb{R}^{n \times n}$ is symmetric.
The projection matrix $H$ is always full-rank and thus invertible.
$(I - H)^2 = I - H$
$(I - H)^{100} H^{100} = 0$
$y^\top (I - H) y \ge 0$
$y^\top H y \ge 0$

Question 14 (30 pts)
Our goal in this problem is to predict the housing price using its different features. Download and read "housing.csv" in the folder "Mock midterm datasets" using R. Submit your R Markdown file to answer the following questions.
(1) Choose the first 10,000 observations of the loaded data frame as the training data and the rest as the testing data. Build a linear model on the training data by regressing the housing price on these variables: bedrooms, bathrooms, sqft_living and sqft_lot, including an intercept term. Report the coefficients you obtained. What are the training and testing MSEs?
(2) Add the zipcode to your linear model. Report the training and testing MSEs of your new model. How many parameters are there in this new model? (Hint: Pay attention to the data type of the zipcode.)
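
The workflow described in (1) and (2) can be sketched roughly as follows. This is only an illustration under stated assumptions: the data frame is simulated rather than read from "housing.csv", the response column name `price` is a guess, the zipcode values are made up, and the training set is the first 100 rows instead of the first 10,000. The helper `mse` is a hypothetical name introduced here.

```r
set.seed(1)
n <- 200

# Simulated stand-in for housing.csv (assumption: the real file has a
# response column that we call `price` here, plus the named predictors).
housing <- data.frame(
  bedrooms    = sample(1:5, n, replace = TRUE),
  bathrooms   = sample(1:3, n, replace = TRUE),
  sqft_living = runif(n, 500, 4000),
  sqft_lot    = runif(n, 1000, 20000),
  zipcode     = sample(c(98001, 98002, 98003), n, replace = TRUE)
)
housing$price <- 50000 + 150 * housing$sqft_living +
  20000 * (housing$zipcode == 98003) + rnorm(n, sd = 30000)

# zipcode is stored as a number but is really a label, so make it a factor
housing$zipcode <- factor(housing$zipcode)

# First rows for training, the rest for testing (the quiz uses 10,000)
n_train <- 100
train <- housing[1:n_train, ]
test  <- housing[(n_train + 1):n, ]

# Mean squared error of a fitted model on a given data frame
mse <- function(fit, data) mean((data$price - predict(fit, newdata = data))^2)

# (1) Linear model with an intercept (lm() includes one by default)
fit1 <- lm(price ~ bedrooms + bathrooms + sqft_living + sqft_lot, data = train)
print(coef(fit1))
print(c(train = mse(fit1, train), test = mse(fit1, test)))

# (2) Same model plus zipcode as a categorical predictor
fit2 <- lm(price ~ bedrooms + bathrooms + sqft_living + sqft_lot + zipcode,
           data = train)
print(c(train = mse(fit2, train), test = mse(fit2, test)))
print(length(coef(fit2)))  # number of regression coefficients
```

Because a factor with L levels enters the model as L - 1 dummy variables, the coefficient count in (2) depends on how many distinct zipcodes appear in the training data.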

Question 15 (30 pts)
Download and read "iris.csv" in the folder "Mock midterm datasets" using R. Submit an R Markdown file that answers the following questions.
(1) Give the scatter plot of the features "Petal.Width" and "Petal.Length" of all the observations. Use three different colors to represent the three species, and describe the correspondence between the colors and species in the legend of the plot.
(2) Choose the first 100 observations of the loaded data frame as the training data and the rest as the testing data. Apply LDA and QDA using only the features "Petal.Width" and "Petal.Length". Report the training and testing prediction accuracy of the LDA and QDA methods respectively.
(3) Apply the K-NN method on the same training and testing dataset as in (2). Give the plot of the testing MSE with respect to the choice of K.
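
One possible shape for parts (1) to (3) is sketched below. It rests on several assumptions: R's built-in `iris` data, shuffled with a fixed seed, stands in for "iris.csv" (whose row order is not known here); the `MASS` and `class` packages are used for `lda()`, `qda()` and `knn()`; the "testing MSE" in (3) is read as the test misclassification rate; and the range K = 1, ..., 20 is an arbitrary choice. The helper `accuracy` is a hypothetical name.

```r
library(MASS)   # lda(), qda()
library(class)  # knn()

set.seed(415)
# Assumption: shuffled built-in iris data as a stand-in for iris.csv
dat   <- iris[sample(nrow(iris)), ]
train <- dat[1:100, ]
test  <- dat[101:150, ]
vars  <- c("Petal.Width", "Petal.Length")

# (1) Scatter plot coloured by species, with the mapping shown in a legend
cols <- c(setosa = "red", versicolor = "darkgreen", virginica = "blue")
plot(dat$Petal.Width, dat$Petal.Length,
     col = cols[as.character(dat$Species)], pch = 19,
     xlab = "Petal.Width", ylab = "Petal.Length")
legend("topleft", legend = names(cols), col = cols, pch = 19)

# (2) LDA and QDA on the two petal features
lda_fit <- lda(Species ~ Petal.Width + Petal.Length, data = train)
qda_fit <- qda(Species ~ Petal.Width + Petal.Length, data = train)

# Fraction of observations whose predicted class matches the true species
accuracy <- function(fit, data) mean(predict(fit, data)$class == data$Species)
acc <- c(lda_train = accuracy(lda_fit, train), lda_test = accuracy(lda_fit, test),
         qda_train = accuracy(qda_fit, train), qda_test = accuracy(qda_fit, test))
print(acc)

# (3) KNN test misclassification rate over a range of K
ks <- 1:20
knn_err <- sapply(ks, function(k) {
  pred <- knn(train[, vars], test[, vars], cl = train$Species, k = k)
  mean(pred != test$Species)
})
plot(ks, knn_err, type = "b", xlab = "K", ylab = "Test misclassification rate")
```

If the rows of the real file are sorted by species, taking the first 100 rows without shuffling would leave one species out of the training data, so it is worth checking `table(train$Species)` before fitting.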
