IEOR 4650: Business Analytics
Fall 2016
Professor Jacob Leshno

Midterm

Instructions

1. The exam is 70 minutes long. You must stop working when time is up.
2. You are allowed to use one handwritten page of notes, front and back.
3. No electronics may be used during the exam, except for plain calculators. You may not use cell phones as calculators.
4. Write all your answers on this exam. Points will be deducted for solutions that are not clearly explained.
5. There is a total of more than 100 points, so you may, and should, answer as many questions as possible.
6. Sign the honor pledge below.

HONOR PLEDGE: I pledge that I have neither received nor given information regarding this exam before, during, or after my examination.

Name:
UNI:
Signature:

1  2  3  4  5  Total

1. Linear Regression [27pts]

You are part of the sales forecast team, and your task is to predict sales in different markets for 2017. To make the prediction, you downloaded a small part of the company's historical sales database. In the following question, Y is sales, which is your continuous dependent variable. You are trying to predict Y using features X1, ..., Xp. The small dataset you downloaded contains n observations of Y, X1, ..., Xp, which you divided into training, validation, and test datasets.

Using the training data, you ran a linear regression to predict Y using the features X1, ..., Xp and got the coefficients β̂ = (β̂0, β̂1, ..., β̂p). One of your teammates worked on the forecast model last quarter and, after comprehensive analysis, decided to use a similar linear model but with different coefficients β̃ = (β̃0, β̃1, ..., β̃p).

(a) Is the MSE on the training data using β̂ likely to be higher or lower than the MSE on the training data using β̃? Why? [3pts]

(b) Is the MSE on the test data using β̂ likely to be higher or lower than the MSE on the test data using β̃? Why? [3pts]

(c) What is the difference between MSE and R²?
[3pts]

You realize that there is another factor Xp+1 which is important but was not in your database. Luckily, you realize you can create it from the data you have using the formula Xp+1 = X2 + X5 − X3. You create Xp+1 and re-estimate the coefficients β̂′ = (β̂′0, β̂′1, ..., β̂′p, β̂′p+1) using the training data.

(d) Is the MSE on the training data using β̂ likely to be higher or lower than the MSE on the training data using β̂′? Why? [3pts]

(e) Is the MSE on the test data using β̂ likely to be higher or lower than the MSE on the test data using β̂′? Why? [3pts]

There is a different important factor Xp+2 which was missing from your dataset. The factor Xp+2 can be created using the formula Xp+2 = X4 / (X2 + X6). You create Xp+2 and re-estimate the coefficients β̂″ = (β̂″0, β̂″1, ..., β̂″p, β̂″p+1, β̂″p+2) using the training data.

(f) Is the MSE on the training data using β̂ likely to be higher or lower than the MSE on the training data using β̂″? Why? [3pts]

(g) Is the MSE on the test data using β̂ likely to be higher or lower than the MSE on the test data using β̂″? Why? [3pts]

(h) Among the models discussed above, how would you go about finding the final linear model to use for predictions? [3pts]

(i) How would you assess your final model from part (h)? [3pts]

2. Linear Model Selection [20pts]

Consider the following R code and output.

> dim(baseballData)
[1] 420  41
> library(leaps)
> k = 10
> m = 5
> folds = sample(1:k, nrow(baseballData), replace=TRUE)
> cv.errors = matrix(NA, k, m, dimnames=list(NULL, paste(1:m)))
> for(j in 1:k){
+   regfit_full = regsubsets(Rank ~ .
, data = baseballData[folds!=j,],
+     nvmax = m, really.big = T)
+   for(i in 1:m){
+     # the function predict.regsubsets uses the fitted model of the selected id
+     # to give predictions for the data
+     pred = predict.regsubsets(regfit_full, baseballData[folds==j,], id = i)
+     cv.errors[j,i] = mean((baseballData$Rank[folds==j] - pred)^2)
+   }
+ }
> colMeans(cv.errors)
        1         2         3         4         5
2.0528483 1.9630331 0.8725024 0.8732944 0.8103300

(a) Describe the model selection process performed by the code. What are the parameters used, and what is the selection method? [5pts]

(b) Which model is selected? [5pts]

(c) When we change the line
> m = 5
to the line
> m = 20
the code fails to run properly, and even after waiting half an hour there is no output. Why? [5pts]

(d) Suggest an alternative method we can use to select a good linear model that uses only a small number of covariates. (Give a brief description.) [5pts]

3. Financial Analytics [20pts]

We run a regression to predict stock returns using 1-day lagged returns of the same stock and got the following results:

> summary(lm_1day)

Call:
lm(formula = Return ~ X1D)

Residuals:
    Min      1Q  Median      3Q     Max
-5.0834 -0.4376 -0.0987  0.5737  3.3240

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.005323   0.092759  -0.057   0.9543
X1D          0.180536   0.087385   2.066   0.0409 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.037 on 123 degrees of freedom
Multiple R-squared:  0.03354,    Adjusted R-squared:  0.02568
F-statistic: 4.268 on 1 and 123 DF,  p-value: 0.04093

(a) What is the R² of the model? Does it describe in-sample or out-of-sample fit? [5pts]

(b) Describe the regression results in words. Give detail as to how the regression model fits the data and the relationship between the dependent and independent variables. [5pts]

(c) Can this prediction be used to create a profitable investment portfolio?
[5pts]

(d) Suppose you are given data on many stocks, and you find that this regression model performs equally well for all of them. How would you leverage this to construct an investment portfolio? Explain where the improved performance comes from. (Hint: follow the approach we used in class.) [5pts]

4. Nearest Neighbors [25pts]

You often go to restaurants with your friend J.D., and you have collected the following data about J.D.'s preferences over restaurants. You ranked each restaurant on Service, Desserts, and Drinks on a scale of -5 to 5.

Name        Service  Desserts  Drinks  J.D. Liked?
Talligant      5        2        3         1
Novoue         1       -2        0         0
Illustrat     -2        3        1         0
KamaKara       3        2        4         1
Hibachi        3       -3        2         0
Toto          -2        0        0         0
Savangan       0        0        3         1
Hourantan      5        2        0         1

A new restaurant named G63 recently opened, and you are trying to decide whether you should recommend it to your friend J.D. The restaurant G63 ranks 0 on Service, 0 on Desserts, and 0 on Drinks.

(a) Using 3-NN, what is the predicted probability that J.D. will like G63? Assume all attributes are equally important. [5pts]

(b) Since J.D. just went on a diet, you think the Desserts attribute should be ignored. If so, what are the 3 nearest neighbors and what is your prediction? [5pts]

(c) The local magazine came out with detailed evaluations of restaurants, ranking all these restaurants on 30 additional attributes. Will adding these additional attributes improve the prediction of the 3-NN algorithm? Explain. [5pts]

(d) Another friend can share with you his information about whether J.D. liked 53 other restaurants around the country, as well as their rankings on the 3 attributes in the table. Will adding these additional restaurants improve the prediction of the 3-NN algorithm? Explain. [5pts]

(e) Giving J.D. a bad recommendation will make him very upset, because he already knows many good restaurants that he would like to visit. Therefore, you should only recommend the restaurant to J.D. if you are at least 90% sure that he will like it.
Given this information, how should you translate the prediction from the 3-NN model into a decision of whether to recommend the restaurant G63 to J.D.? [5pts]

5. Decision Trees [20pts]

(a) Draw the decision tree corresponding to the figure above. The numbers inside the boxes indicate the mean of Y within each region. [8pts]

(b) Suppose we have a tree and we want to assign a value to each leaf to get a prediction for a continuous outcome variable. How do we assign a value to each leaf? [4pts]

(c) Explain the advantage of pruning a decision tree. [4pts]

(d) Why is it bad to have a leaf that contains only a single data point? [4pts]
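For study purposes, the cross-validation loop in question 2 estimates out-of-sample MSE by holding out one fold at a time and averaging the fold errors. A minimal Python sketch of that same logic (function names and the least-squares model here are illustrative choices, not the exam's R code):

```python
import numpy as np

def kfold_cv_mse(X, y, fit, predict, k=10, seed=0):
    """Estimate test MSE by k-fold cross-validation, mirroring the
    fold loop in question 2: assign each row a random fold, then for
    each fold train on the rest and score on the held-out rows."""
    rng = np.random.default_rng(seed)
    # like sample(1:k, n, replace=TRUE) in the R code
    folds = rng.integers(1, k + 1, size=len(y))
    errors = []
    for j in range(1, k + 1):
        train, test = folds != j, folds == j
        model = fit(X[train], y[train])
        pred = predict(model, X[test])
        errors.append(np.mean((y[test] - pred) ** 2))
    return float(np.mean(errors))

# Ordinary least squares with an intercept, as the model being scored.
def fit_ols(X, y):
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def predict_ols(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta
```

With a fixed seed the fold assignment is reproducible; in the exam's R code the analogous randomness comes from sample(1:k, nrow(baseballData), replace=TRUE), and colMeans(cv.errors) plays the role of the final averaging step.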