
FINC 430 Midterm Exam Spring 2020

1. Decide whether each model is parametric or non-parametric (your answer should be either
parametric or non-parametric for the following models).
a.
b.
c. KNN regression
d.
a. parametric, b. parametric, c. non-parametric, d. parametric
2. Decide whether each model is linear or non-linear (your answer should be either linear or non-linear
for the following models).
a.
b.
c. KNN regression
d.
a. linear, b. linear (or nonlinear), c. non-linear, d. linear (or nonlinear)
3. Calculate approximate values of the parameters for the following LDA classification
(fill in the seven blanks in the table below):

                                Class 1 (yellow)   Class 2 (green)   Class 3 (blue)
Mean vector                     <1, 3>             <-1, -1>          <2, 1>
Covariance matrix               Any 2x2 matrix whose entries are all positive is correct;
(shared by all classes)         a matrix containing 0 or a negative value is wrong.
Class membership probabilities  1/3                1/3               1/3
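LDA estimates a mean vector for each class, one covariance matrix shared by all classes, and class probabilities equal to the class frequencies in the training data. A minimal sketch of those three estimates, assuming a small hypothetical data set (two made-up points per class, placed so the class means match the table above); the exam's plotted data are not reproduced, and the helper name `lda_estimates` is hypothetical:

```python
import numpy as np

def lda_estimates(X, y):
    """Return per-class means, the pooled (shared) covariance, and class priors.

    X: (n, p) array of predictors; y: (n,) array of class labels.
    """
    classes = np.unique(y)
    n, p = X.shape
    means = {k: X[y == k].mean(axis=0) for k in classes}
    # Pooled covariance: within-class scatter summed over classes, divided by n - K
    scatter = np.zeros((p, p))
    for k in classes:
        d = X[y == k] - means[k]
        scatter += d.T @ d
    pooled_cov = scatter / (n - len(classes))
    # Class membership probabilities = class frequencies
    priors = {k: np.mean(y == k) for k in classes}
    return means, pooled_cov, priors

# Hypothetical data: two points per class (not the exam's data)
X = np.array([[0.0, 2.0], [2.0, 4.0],      # class 1
              [-2.0, -2.0], [0.0, 0.0],    # class 2
              [1.0, 1.0], [3.0, 1.0]])     # class 3
y = np.array([1, 1, 2, 2, 3, 3])

means, pooled_cov, priors = lda_estimates(X, y)
print(means)       # class 1 -> [1, 3], class 2 -> [-1, -1], class 3 -> [2, 1]
print(pooled_cov)  # [[2, 4/3], [4/3, 4/3]]
print(priors)      # each class -> 1/3
```

With two points per class, each class probability is 2/6 = 1/3, and the pooled covariance divides the within-class scatter by n − K = 3.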

4. Which classification method, LDA or QDA, is better for the following 3-class classification
problem, and why? (Your answer should be either LDA or QDA.)
Either of the following answers is correct:
a. LDA: the covariance matrices are similar among classes (or a linear boundary is appropriate
for classification)
b. QDA: the covariance matrices differ among classes
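The two answers differ in what they assume about the class covariance matrices, which shows up in the discriminant functions. LDA scores class $k$ with $\delta_k(x) = x^T \Sigma^{-1} \mu_k - \frac{1}{2}\mu_k^T \Sigma^{-1} \mu_k + \log \pi_k$, which is linear in $x$ because $\Sigma$ is shared; QDA uses $\delta_k(x) = -\frac{1}{2}(x - \mu_k)^T \Sigma_k^{-1} (x - \mu_k) - \frac{1}{2}\log|\Sigma_k| + \log \pi_k$, which is quadratic in $x$. The minimal sketch below codes both and checks that, when every $\Sigma_k$ is the same matrix, QDA picks the same class as LDA. It reuses the class means and equal priors from Question 3; the identity covariance and the random test points are hypothetical choices.

```python
import numpy as np

def lda_scores(x, means, cov, priors):
    """Linear discriminant delta_k(x) for every class k (shared covariance)."""
    inv = np.linalg.inv(cov)
    return np.array([x @ inv @ m - 0.5 * m @ inv @ m + np.log(pi)
                     for m, pi in zip(means, priors)])

def qda_scores(x, means, covs, priors):
    """Quadratic discriminant delta_k(x) for every class k (class-specific covariance)."""
    scores = []
    for m, S, pi in zip(means, covs, priors):
        d = x - m
        _, logdet = np.linalg.slogdet(S)  # log|Sigma_k|
        scores.append(-0.5 * d @ np.linalg.inv(S) @ d - 0.5 * logdet + np.log(pi))
    return np.array(scores)

# Class means from Question 3, equal priors, hypothetical shared covariance
means = [np.array([1.0, 3.0]), np.array([-1.0, -1.0]), np.array([2.0, 1.0])]
priors = [1 / 3, 1 / 3, 1 / 3]
cov = np.eye(2)

rng = np.random.default_rng(0)
points = rng.uniform(-4, 5, size=(200, 2))
lda_pred = [int(np.argmax(lda_scores(x, means, cov, priors))) + 1 for x in points]
qda_pred = [int(np.argmax(qda_scores(x, means, [cov] * 3, priors))) + 1 for x in points]
print(lda_pred == qda_pred)  # True: with equal covariances QDA reduces to LDA
```

When the $\Sigma_k$ differ, the quadratic terms no longer cancel and the QDA boundaries curve, which is why QDA is preferred in that case.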

5. Fill in the missing part of the following Python program to calculate
$\alpha = \frac{\sigma_Y^2 - \sigma_{XY}}{\sigma_X^2 + \sigma_Y^2 - 2\sigma_{XY}}$, where X is BTC and Y is ETH.

import pandas as pd
df = pd.read_csv("Log returns.csv")
print(df)

Unnamed: 0 BTC ETH
0 12/31/2018 -0.040653 -0.057338
1 1/1/2019 0.039229 0.066670
2 1/2/2019 0.016204 0.090924
3 1/3/2019 -0.025809 -0.040519
4 1/4/2019 0.010956 0.040081
.. ... ... ...
394 1/29/2020 -0.011010 -0.011679
395 1/30/2020 0.022367 0.058923
396 1/31/2020 -0.017096 -0.026763
397 2/1/2020 0.005135 0.022159
398 2/2/2020 -0.006410 0.025335

[399 rows x 3 columns]

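import numpy as np
# Keep only the numeric return columns: the date column is text, and pandas 2.x
# raises an error in cov() instead of silently dropping it
cov = np.array(df[['BTC', 'ETH']].cov())  # index 0 = BTC (X), index 1 = ETH (Y)
alpha = (cov[1][1] - cov[0][1]) / (cov[0][0] + cov[1][1] - 2*cov[0][1])
# Equivalent answer (the covariance matrix is symmetric, so cov[0][1] == cov[1][0]):
# alpha = (cov[1][1] - cov[1][0]) / (cov[0][0] + cov[1][1] - 2*cov[1][0])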

Hint: your answer should be similar to, but different from, the following code:
alpha = (cov['ETH']['ETH'] - cov['BTC']['ETH']) \
/ (cov['BTC']['BTC'] + cov['ETH']['ETH'] - 2*cov['BTC']['ETH'])
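The formula is the weight that minimizes $\mathrm{Var}(\alpha X + (1 - \alpha) Y)$, the variance of a portfolio holding a fraction $\alpha$ in X and $1 - \alpha$ in Y. Because Log returns.csv is not available here, the sketch below uses simulated return series (hypothetical numbers, not the BTC/ETH data) and checks the closed-form $\alpha$ against a brute-force grid search. The helper name `min_variance_alpha` is hypothetical.

```python
import numpy as np

def min_variance_alpha(x, y):
    """alpha minimising Var(alpha*x + (1 - alpha)*y) for two return series."""
    cov = np.cov(x, y)  # 2x2 sample covariance; index 0 = x, index 1 = y
    return (cov[1, 1] - cov[0, 1]) / (cov[0, 0] + cov[1, 1] - 2 * cov[0, 1])

# Hypothetical correlated returns standing in for BTC (x) and ETH (y)
rng = np.random.default_rng(42)
x = rng.normal(0.0, 0.04, size=399)
y = 0.8 * x + rng.normal(0.0, 0.03, size=399)

alpha = min_variance_alpha(x, y)

# Brute-force check: sample variance of the portfolio over a grid of weights
grid = np.linspace(-1, 2, 3001)
variances = [np.var(a * x + (1 - a) * y, ddof=1) for a in grid]
print(alpha, grid[int(np.argmin(variances))])  # the two values agree to within the grid step
```

Swapping the roles of X and Y gives $1 - \alpha$, which is a quick sanity check on the indexing.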

6. The table below provides a training data set containing six observations, three predictors, and one
qualitative response variable.

Suppose we wish to use this data set to make a prediction for Y when $X_1 = 0$, $X_2 = 1$, $X_3 = 2$ using
K-nearest neighbors.

a. Compute the Euclidean distance between each observation and the test point, $X_1 = 0$, $X_2 = 1$, $X_3 = 2$.

b. What is our prediction with K = 1? (your answer should be either Red or Green) Why?
Our prediction is Y = Green, because Green is the response of the single nearest neighbor of
the test point $X_1 = 0$, $X_2 = 1$, $X_3 = 2$.

c. What is our prediction with K = 3? (your answer should be either Red or Green) Why?
Red, because the majority of the three nearest neighbors are Red.
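Part (a) computes ordinary Euclidean distances, and parts (b) and (c) take a majority vote among the K closest observations. The training table did not survive extraction, so the six rows below are hypothetical placeholders chosen to be consistent with the answers above; substitute the real table to reproduce part (a). The helper `knn_classify` is hypothetical.

```python
import numpy as np
from collections import Counter

def knn_classify(X, labels, test_point, k):
    """Return (prediction, distances) for K-nearest-neighbour classification."""
    distances = np.sqrt(((X - test_point) ** 2).sum(axis=1))  # Euclidean distance per row
    nearest = np.argsort(distances, kind="stable")[:k]        # indices of the k closest rows
    vote = Counter(labels[i] for i in nearest)                # count labels among neighbours
    return vote.most_common(1)[0][0], distances

# Hypothetical training data (X1, X2, X3) and responses; not the exam's table
X = np.array([[0, 3, 0], [2, 0, 0], [0, 1, 3],
              [0, 1, 2], [-1, 0, 1], [1, 1, 1]], dtype=float)
labels = ["Red", "Red", "Red", "Green", "Green", "Red"]
test_point = np.array([0.0, 1.0, 2.0])

for k in (1, 3):
    pred, dist = knn_classify(X, labels, test_point, k)
    print(k, pred)  # 1 Green, then 3 Red
print(np.round(dist, 3))
```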

7. Suppose that we wish to predict whether a given stock will issue a dividend this year (“Yes” or “No”)
based on X, last year’s percent profit. We examine a large number of companies and discover that
the mean value of X for companies that issued a dividend was $\bar{X} = 7$, while the mean for those that
didn’t was $\bar{X} = -1$. In addition, the variance of X for these two sets of companies was $\hat{\sigma}^2 = 25$.
Finally, 60% of companies issued dividends. Assuming that X follows a normal distribution, predict
the probability that a company will issue a dividend this year given that its percent profit
was X = 2 last year.

Answer: 0.52
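The answer follows from Bayes' theorem with normal class-conditional densities that share the variance $\hat{\sigma}^2 = 25$: $P(\text{Yes} \mid X = x) = \pi_{Yes} f_{Yes}(x) / (\pi_{Yes} f_{Yes}(x) + \pi_{No} f_{No}(x))$. A short check in Python using only the numbers stated in the question (the function names are illustrative):

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def posterior_yes(x, mean_yes=7, mean_no=-1, var=25, prior_yes=0.6):
    """P(dividend = Yes | X = x) by Bayes' theorem with normal class densities."""
    num = prior_yes * normal_pdf(x, mean_yes, var)
    den = num + (1 - prior_yes) * normal_pdf(x, mean_no, var)
    return num / den

print(round(posterior_yes(2), 4))  # 0.5214
```

Because both classes share the same variance, the $1/\sqrt{2\pi\sigma^2}$ factor cancels, leaving $0.6e^{-0.5} / (0.6e^{-0.5} + 0.4e^{-0.18}) \approx 0.52$.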

8. The table below shows the results of a linear regression model $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \epsilon$, where
Y is ETH, $X_1$ is BTC, and $X_2$ is USDT.

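a. What are the estimated values of $\beta_0$, $\beta_1$, and $\beta_2$?
i. $\hat{\beta}_0 = -0.0014$
ii. $\hat{\beta}_1 = 0.9786$
iii. $\hat{\beta}_2 = 0.5199$
b. What are the p-values of $\beta_0$, $\beta_1$, and $\beta_2$?
i. The p-value of $\beta_0$ is 0.251
ii. The p-value of $\beta_1$ is 0.000
iii. The p-value of $\beta_2$ is 0.185
c. Are $\beta_0$, $\beta_1$, and $\beta_2$ statistically significant at the 0.05 significance level?
i. $\beta_0$ is not statistically significant (p = 0.251 > 0.05)
ii. $\beta_1$ is statistically significant (p = 0.000 < 0.05)
iii. $\beta_2$ is not statistically significant (p = 0.185 > 0.05)
d. What is the 95% confidence interval of $\beta_1$?
[0.910, 1.047]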
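The quantities read off the table in parts (a) to (d) come from ordinary least squares. Since the ETH/BTC/USDT data are not included here, the sketch below fits simulated data (the true coefficients are hypothetical), computes standard errors from $\hat{\sigma}^2 (X^T X)^{-1}$, and, as a simplifying assumption, uses the large-sample normal approximation (the same 1.96 rule as in Question 12) for p-values and 95% intervals. Packages such as statsmodels use the t distribution instead, which gives nearly identical numbers for a sample of about 400. The helper `ols_summary` is hypothetical.

```python
import math
import numpy as np

def ols_summary(X, y):
    """OLS fit of y on X (intercept added): coefs, std errors, p-values, 95% CIs.

    p-values and CIs use a normal approximation, adequate for large n.
    """
    n = len(y)
    Xc = np.column_stack([np.ones(n), X])          # prepend intercept column
    coef, *_ = np.linalg.lstsq(Xc, y, rcond=None)  # least-squares estimates
    resid = y - Xc @ coef
    sigma2 = resid @ resid / (n - Xc.shape[1])     # residual variance estimate
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(Xc.T @ Xc)))
    t = coef / se
    p = np.array([math.erfc(abs(ti) / math.sqrt(2)) for ti in t])  # two-sided
    ci = np.column_stack([coef - 1.96 * se, coef + 1.96 * se])
    return coef, se, p, ci

# Hypothetical data: y depends on x1 only
rng = np.random.default_rng(1)
n = 399
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 * x1 + rng.normal(scale=0.5, size=n)

coef, se, p, ci = ols_summary(np.column_stack([x1, x2]), y)
for name, b, pv, (lo, hi) in zip(["b0", "b1", "b2"], coef, p, ci):
    print(f"{name}: coef={b:.4f} p={pv:.3f} 95% CI=[{lo:.3f}, {hi:.3f}] significant={pv < 0.05}")
```

A coefficient is significant at the 0.05 level exactly when its 95% interval excludes 0, which is why part (d)'s interval for $\beta_1$ lies entirely above zero.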

9. I created an object with the following Python code:
data = ('abc', 'def', 'ghi')

I then tried to change data[2] from ‘ghi’ to ‘jkl’, but the following error occurred:
TypeError: 'tuple' object does not support item assignment

a. Explain the reason for the error (your answer should not be “’tuple’ object does not support
item assignment”).
data is a tuple, and tuples are “immutable”, i.e., their elements cannot be modified after creation.

b. Modify the original code “data = ('abc', 'def', 'ghi')” to avoid the error.
data = ['abc', 'def', 'ghi']
(A list is mutable, so data[2] = 'jkl' then succeeds.)
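A quick demonstration of both parts: assigning to a tuple element raises TypeError, while the list version accepts the assignment. As an alternative not asked for in the question, a tuple can also be "changed" by building a new tuple from slices.

```python
data = ('abc', 'def', 'ghi')
try:
    data[2] = 'jkl'           # tuples are immutable, so this raises TypeError
except TypeError as err:
    print(err)                # 'tuple' object does not support item assignment

data = ['abc', 'def', 'ghi']  # a list is mutable
data[2] = 'jkl'
print(data)                   # ['abc', 'def', 'jkl']

# Keeping a tuple: build a new tuple instead of modifying the old one
t = ('abc', 'def', 'ghi')
t = t[:2] + ('jkl',)
print(t)                      # ('abc', 'def', 'jkl')
```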

10. The diagram below shows a KNN classification problem.

a. What is the KNN prediction at the point marked "?" with K=3? (your answer should be either Red or Green)
Green (or B)

b. What is the KNN prediction at the point marked "?" with K=7? (your answer should be either Red or Green)
Red (or A)

11. The Python program below generates a bootstrap data set:

import pandas as pd
df = pd.read_csv("Log returns.csv")
print(df)

Unnamed: 0 BTC ETH
0 12/31/2018 -0.040653 -0.057338
1 1/1/2019 0.039229 0.066670
2 1/2/2019 0.016204 0.090924
3 1/3/2019 -0.025809 -0.040519
4 1/4/2019 0.010956 0.040081
.. ... ... ...
394 1/29/2020 -0.011010 -0.011679
395 1/30/2020 0.022367 0.058923
396 1/31/2020 -0.017096 -0.026763
397 2/1/2020 0.005135 0.022159
398 2/2/2020 -0.006410 0.025335

[399 rows x 3 columns]

from sklearn.utils import resample
df_resmp = resample(df)

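Calculate the probability that df is the same as df_resmp.
resample() draws rows with replacement and, by default, draws as many rows as df contains (399).
Each of the 399 draws can therefore be any of the 399 rows, so there are $399^{399}$ equally likely
outcomes for df_resmp. Only one of them is identical to df (row i drawn in position i for every i).
Therefore, the probability is $399^{-399}$.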
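For a data set small enough to enumerate, the $n^n$ argument can be checked directly. The sketch below counts, for a hypothetical n = 3 (and n = 4), how many of the equally likely bootstrap index sequences reproduce the original row order, then uses a logarithm to show the size of $399^{399}$, since enumeration at n = 399 is infeasible. The function name is hypothetical.

```python
import math
from fractions import Fraction
from itertools import product

def prob_identical_bootstrap(n):
    """Probability that a size-n bootstrap sample (with replacement) reproduces
    the original rows in order, by enumerating all n**n index sequences."""
    original = tuple(range(n))
    hits = sum(1 for draw in product(range(n), repeat=n) if draw == original)
    return Fraction(hits, n ** n)

print(prob_identical_bootstrap(3))  # 1/27
print(prob_identical_bootstrap(4))  # 1/256
# log10 of 399**399, so the exam's probability is about 10**(-1037.8)
print(399 * math.log10(399))
```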

12. The table below shows the results of a logistic regression. Calculate the six missing values.

z = coef / std_err
[0.025, 0.975] = [coef - 1.96*std_err, coef + 1.96*std_err], where 1.96 is the 97.5th percentile of the standard normal distribution
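Each missing entry follows from the two formulas above. A small sketch applies them to a made-up row (coef = 0.5, std_err = 0.25, not values from the exam's table), plus the inverse direction for a row where only the interval is shown; both helper names are hypothetical.

```python
def fill_row(coef, std_err, z_crit=1.96):
    """Return (z, lower, upper) for one coefficient row of a regression table."""
    z = coef / std_err                # z statistic
    lower = coef - z_crit * std_err   # 0.025 column
    upper = coef + z_crit * std_err   # 0.975 column
    return z, lower, upper

def from_interval(lower, upper, z_crit=1.96):
    """Recover (coef, std_err) when only the 95% interval is shown."""
    coef = (lower + upper) / 2        # the interval is symmetric around coef
    std_err = (upper - lower) / (2 * z_crit)
    return coef, std_err

# Hypothetical row, not from the exam's table
z, lower, upper = fill_row(0.5, 0.25)
print(round(z, 3), round(lower, 3), round(upper, 3))  # 2.0 0.01 0.99
print(from_interval(lower, upper))                      # approximately (0.5, 0.25)
```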

13. The plot below shows the results of KNN regression with K=8.

a. Is the parameter value of K=8 appropriate?
No

b. Why do you think so? (explain in terms of bias and variance)
The fit has low bias but high variance, i.e., K=8 overfits the data.

c. Answer the following question if you think K=8 is inappropriate: what is an appropriate
value of K?
Any value greater than 8 (a larger K averages over more neighbors, which reduces variance)
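The bias-variance trade-off in K can be seen in a few lines. The sketch below implements one-dimensional KNN regression by hand on simulated data (the sine curve and noise level are hypothetical) and prints the training error for several K: K = 1 reproduces every training point, while K = n predicts the overall mean everywhere. Training error alone does not choose K; validation or cross-validation error across K is the usual criterion. The helper `knn_regress` is hypothetical.

```python
import numpy as np

def knn_regress(x_train, y_train, x_new, k):
    """1-D KNN regression: average y over the k nearest x_train values."""
    preds = []
    for x0 in np.atleast_1d(x_new):
        nearest = np.argsort(np.abs(x_train - x0), kind="stable")[:k]
        preds.append(y_train[nearest].mean())
    return np.array(preds)

# Hypothetical noisy nonlinear data
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, size=50))
y = np.sin(x) + rng.normal(scale=0.3, size=50)

for k in (1, 8, 20, 50):
    train_mse = np.mean((knn_regress(x, y, x, k) - y) ** 2)
    print(k, round(train_mse, 3))
# K = 1: zero training error (lowest bias, highest variance)
# K = 50 = n: a constant prediction (lowest variance, highest bias)
```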

14. List two regression methods to model the relationship between X and Y in the scatter plot below,
and explain why you chose each.

Polynomial regression: this method can represent a nonlinear relationship (e.g., a
quadratic function)
KNN regression: this method makes no assumption about the functional form, so it can also represent a nonlinear relationship
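Polynomial regression is still least squares, just on the features $x, x^2, \dots$. The sketch below fits a straight line and a quadratic to simulated data with a quadratic relationship (the data-generating coefficients are hypothetical); `numpy.polyfit` returns coefficients from the highest power down.

```python
import numpy as np

# Hypothetical quadratic relationship with noise
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 100)
y = 1.0 + 0.5 * x - 2.0 * x**2 + rng.normal(scale=0.5, size=x.size)

linear = np.polyfit(x, y, deg=1)     # straight line: misses the curvature
quadratic = np.polyfit(x, y, deg=2)  # polynomial (degree 2) regression

def mse(coefs):
    """Training mean squared error of a fitted polynomial."""
    return np.mean((np.polyval(coefs, x) - y) ** 2)

print("degree 1 MSE:", round(mse(linear), 3))
print("degree 2 MSE:", round(mse(quadratic), 3))
print("degree 2 coefficients (x^2, x, 1):", np.round(quadratic, 2))
```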
15. List two classification methods to build a classifier from the training data below, and explain why
you chose each.

LDA: the covariance matrices are similar among classes (or a linear boundary is appropriate for
classification)
QDA: QDA works even if the covariance matrices are similar
KNN classification: KNN works for linear as well as nonlinear boundaries
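All three classifiers are available in scikit-learn. The sketch below fits them to simulated three-class data (the cluster centers and spread are hypothetical, not the exam's training data) and reports training accuracy; a held-out test set or cross-validation would be the usual way to choose among them.

```python
import numpy as np
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical three-class training data: well-separated Gaussian clusters
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
X = np.vstack([rng.normal(c, 1.0, size=(100, 2)) for c in centers])
y = np.repeat([0, 1, 2], 100)

models = {
    "LDA": LinearDiscriminantAnalysis(),
    "QDA": QuadraticDiscriminantAnalysis(),
    "KNN (K=5)": KNeighborsClassifier(n_neighbors=5),
}
for name, model in models.items():
    model.fit(X, y)
    print(name, "training accuracy:", round(model.score(X, y), 3))
```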