IEOR 4650: Business Analytics
Fall 2016
Professor Jacob Leshno

Midterm

Instructions

1. The exam is 70 minutes long. You must stop working when time is up.
2. You are allowed to use one handwritten page of notes, front and back.
3. No electronics may be used during the exam, except for plain calculators. You may not use cell phones as calculators.
4. Write all your answers on this exam. Points will be deducted for solutions that are not clearly explained.
5. There is a total of more than 100 points, so you may, and should, answer as many questions as possible.
6. Sign the honor pledge below.

HONOR PLEDGE: I pledge that I have neither received nor given information regarding this exam before, during, or after my examination.

Name:
UNI:
Signature:

1  2  3  4  5  Total

1. Linear Regression [27pts]

You are part of the sales forecast team, and your task is to predict sales in different markets for 2017. To make the prediction, you downloaded a small part of the company's historical sales database. In the following question, Y is sales, which is your continuous dependent variable. You are trying to predict Y using features X1, ..., Xp. The small dataset you downloaded contains n observations of Y, X1, ..., Xp, which you divided into training, validation, and test datasets.

Using the training data, you ran a linear regression to predict Y using the features X1, ..., Xp and got the coefficients β̂ = (β̂0, β̂1, ..., β̂p). One of your teammates worked on the forecast model last quarter and, after comprehensive analysis, decided to use a similar linear model but with different coefficients β̃ = (β̃0, β̃1, ..., β̃p).

(a) Is the MSE on the training data using β̂ likely to be higher or lower than the MSE on the training data using β̃? Why? [3pts]

(b) Is the MSE on the test data using β̂ likely to be higher or lower than the MSE on the test data using β̃? Why? [3pts]

(c) What is the difference between MSE and R²?
[3pts]

You realize that there is another factor Xp+1 which is important but was not in your database. Luckily, you realize you can create it from the data you have using the formula Xp+1 = X2 + X5 − X3. You create Xp+1 and re-estimate the coefficients β̂′ = (β̂′0, β̂′1, ..., β̂′p, β̂′p+1) using the training data.

(d) Is the MSE on the training data using β̂ likely to be higher or lower than the MSE on the training data using β̂′? Why? [3pts]

(e) Is the MSE on the test data using β̂ likely to be higher or lower than the MSE on the test data using β̂′? Why? [3pts]

There is a different important factor Xp+2 which was missing from your dataset. The factor Xp+2 can be created using the formula Xp+2 = X4 / (X2 + X6). You create Xp+2 and re-estimate the coefficients β̂″ = (β̂″0, β̂″1, ..., β̂″p, β̂″p+1, β̂″p+2) using the training data.

(f) Is the MSE on the training data using β̂ likely to be higher or lower than the MSE on the training data using β̂″? Why? [3pts]

(g) Is the MSE on the test data using β̂ likely to be higher or lower than the MSE on the test data using β̂″? Why? [3pts]

(h) Among the models discussed above, how would you go about finding the final linear model to use for predictions? [3pts]

(i) How would you assess your final model from part (h)? [3pts]

2. Linear Model Selection [20pts]

Consider the following R code and output.

> dim(baseballData)
[1] 420  41
> library(leaps)
> k = 10
> m = 5
> folds = sample(1:k, nrow(baseballData), replace=TRUE)
> cv.errors = matrix(NA, k, m, dimnames=list(NULL, paste(1:m)))
> for(j in 1:k){
+   regfit_full = regsubsets(Rank ~ .
, data = baseballData[folds!=j,],
+     nvmax = m, really.big = T)
+   for(i in 1:m){
+     # the function predict.regsubsets uses the fitted model of the selected id
+     # to give predictions for the data
+     pred = predict.regsubsets(regfit_full, baseballData[folds==j,], id = i)
+     cv.errors[j,i] = mean((baseballData$Rank[folds==j] - pred)^2)
+   }
+ }
> colMeans(cv.errors)
        1         2         3         4         5
2.0528483 1.9630331 0.8725024 0.8732944 0.8103300

(a) Describe the model selection process performed by the code. What are the parameters used, and what is the selection method? [5pts]

(b) Which model is selected? [5pts]

(c) When we change the line
> m = 5
to the line
> m = 20
the code fails to run properly, and even after waiting half an hour there is no output. Why? [5pts]

(d) Suggest an alternative method we can use to select a good linear model that uses only a small number of covariates. (Give a brief description.) [5pts]

3. Financial Analytics [20pts]

We run a regression to predict stock returns using 1-day lagged returns of the same stock and got the following results:

> summary(lm_1day)

Call:
lm(formula = Return ~ X1D)

Residuals:
    Min      1Q  Median      3Q     Max
-5.0834 -0.4376 -0.0987  0.5737  3.3240

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.005323   0.092759  -0.057   0.9543
X1D          0.180536   0.087385   2.066   0.0409 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.037 on 123 degrees of freedom
Multiple R-squared:  0.03354,    Adjusted R-squared:  0.02568
F-statistic: 4.268 on 1 and 123 DF,  p-value: 0.04093

(a) What is the R² of the model? Does it describe in-sample or out-of-sample fit? [5pts]

(b) Describe the regression results in words. Give detail as to how the regression model fits the data and the relationship between the dependent and independent variables. [5pts]

(c) Can this prediction be used to create a profitable investment portfolio?
[5pts]

(d) Suppose you are given data on many stocks, and you find that this regression model performs equally well for all of them. How would you leverage this to construct an investment portfolio? Explain where the improved performance comes from. (Hint: follow the approach we used in class.) [5pts]

4. Nearest Neighbors [25pts]

You often go to restaurants with your friend J.D., and you have collected the following data about J.D.'s preferences over restaurants. You ranked each restaurant on Service, Desserts, and Drinks on a scale of -5 to 5.

Name        Service  Desserts  Drinks  J.D. Liked?
Talligant      5        2        3         1
Novoue         1       -2        0         0
Illustrat     -2        3        1         0
KamaKara       3        2        4         1
Hibachi        3       -3        2         0
Toto          -2        0        0         0
Savangan       0        0        3         1
Hourantan      5        2        0         1

A new restaurant named G63 recently opened, and you are trying to decide whether you should recommend it to your friend J.D. The restaurant G63 ranks 0 on Service, 0 on Desserts, and 0 on Drinks.

(a) Using 3-NN, what is the predicted probability that J.D. will like G63? Assume all attributes are equally important. [5pts]

(b) Since J.D. just went on a diet, you think the Desserts attribute should be ignored. If so, what are the 3 nearest neighbors and what is your prediction? [5pts]

(c) The local magazine came out with detailed evaluations of restaurants, ranking all these restaurants on 30 additional attributes. Will adding these additional attributes improve the prediction of the 3-NN algorithm? Explain. [5pts]

(d) Another friend can share with you his information about whether J.D. liked 53 other restaurants around the country, as well as their rankings on the 3 attributes in the table. Will adding these additional restaurants improve the prediction of the 3-NN algorithm? Explain. [5pts]

(e) Giving J.D. a bad recommendation will make him very upset, because he already knows many good restaurants that he would like to visit. Therefore, you should only recommend the restaurant to J.D. if you are at least 90% sure that he will like it.
Given this information, how should you translate the prediction from the 3-NN model into a decision of whether to recommend the restaurant G63 to J.D.? [5pts]

5. Decision Trees [20pts]

(a) Draw the decision tree corresponding to the figure above. The numbers inside the boxes indicate the mean of Y within each region. [8pts]

(b) Suppose we have a tree and we want to assign a value to each leaf to get a prediction for a continuous outcome variable. How do we assign a value to each leaf? [4pts]

(c) Explain the advantage of pruning a decision tree. [4pts]

(d) Why is it bad to have a leaf that contains only a single data point? [4pts]
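For study purposes, the cross-validation loop in question 2 estimates out-of-sample MSE by holding out one fold at a time and averaging the fold errors. A minimal Python sketch of that same logic (function names and the least-squares model here are illustrative choices, not the exam's R code):

```python
import numpy as np

def kfold_cv_mse(X, y, fit, predict, k=10, seed=0):
    """Estimate test MSE by k-fold cross-validation, mirroring the
    fold loop in question 2: assign each row a random fold, then for
    each fold train on the rest and score on the held-out rows."""
    rng = np.random.default_rng(seed)
    # like sample(1:k, n, replace=TRUE) in the R code
    folds = rng.integers(1, k + 1, size=len(y))
    errors = []
    for j in range(1, k + 1):
        train, test = folds != j, folds == j
        model = fit(X[train], y[train])
        pred = predict(model, X[test])
        errors.append(np.mean((y[test] - pred) ** 2))
    return float(np.mean(errors))

# Ordinary least squares with an intercept, as the model being scored.
def fit_ols(X, y):
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def predict_ols(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta
```

With a fixed seed the fold assignment is reproducible; in the exam's R code the analogous randomness comes from sample(1:k, nrow(baseballData), replace=TRUE), and colMeans(cv.errors) plays the role of the final averaging step.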