ECOM20001: Econometrics 1
Final Exam 2019 S1 Solutions
Question 1: Multiple Choice (20 marks)
(1 mark per question)
1. Which of the following is not one of the broad categories into which econometric analyses
typically fall?
d. All of the above are broad categories for econometric analyses
2. Suppose you have a sample average of \(\bar{Y} = 12\) with sample variance \(s_Y^2 = 2\) and
sample size \(n = 25\). What is the 90% confidence interval for the population mean?
b. [11.53,12.47]
3. You estimate a single linear regression:
\[Y_i = \beta_0 + \beta_1 X_i + u_i\]
and obtain a 99% confidence interval for \(\beta_1\) of [−0.50, 0.10]. Which of the following
statements is necessarily true?
d. None of the above are necessarily true
4. Which of the following is correct about the value of \(\bar{R}^2\) in a multiple linear regression
model?
c. It can be negative
5. Suppose you run the following regression:
\[Y_i = \beta_0 + \beta_1 X_i + u_i\]
Further suppose there is an omitted variable \(Z_i\) that is positively correlated with
\(Y_i\) and negatively correlated with \(X_i\). Finally, suppose we obtained a regression
coefficient \(\hat{\beta}_1 < 0\). Based on the omitted variable \(Z_i\), which of the following is true?
b. \(\hat{\beta}_1\) exhibits a negative omitted variable bias, causing it to be too large in
magnitude
6. Suppose you run the following regression:
\[\ln(Y_i) = \beta_0 + \beta_1 \ln(X_i) + u_i\]
Which is the correct interpretation of \(\beta_1\)?
a. A 1% increase in \(X_i\) is associated with a \(\beta_1\)% increase in \(Y_i\)
7. Suppose your OLS estimates suffer from imperfect multicollinearity, but the
OLS assumptions hold. Your OLS estimates will be:
b. unbiased and inefficient
8. What problem does classical measurement error in a regressor create for regression
analysis?
b. it causes regression coefficients to be biased toward 0
9. Consider the following estimation results from a polynomial regression:
\[\widehat{Y}_i = \underset{(12.72)}{42} + \underset{(14.50)}{12.9}\,X_i - \underset{(17.92)}{0.34}\,X_i^2 + \underset{(8.22)}{0.01}\,X_i^3; \quad \bar{R}^2 = 0.23,\ SER = 42.11\]
What is the conclusion that can be drawn from these regression results?
c. A scatter plot of \(Y_i\) on \(X_i\) is likely to exhibit fewer than 2 changes (or inflection
points) in curvature in the relationship, which can give rise to imperfect
multicollinearity in this regression
10. When testing a joint hypothesis with a multiple linear regression model, you should:
b. use the F-statistic and reject at least one of the hypotheses if the statistic
exceeds the critical value
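The numeric parts of Questions 1.2, 1.4, 1.6 and 1.9 can be checked with a short Python sketch (standard library only). It assumes the numbers in parentheses in Question 1.9 are standard errors. The values n = 20, k = 5, R² = 0.10 and β1 = 0.7 are hypothetical and serve only to illustrate a negative adjusted R² and the elasticity reading of a log-log coefficient.

```python
import math
from statistics import NormalDist

# Q1.2: 90% CI for the population mean from summary statistics
y_bar, s2_y, n = 12.0, 2.0, 25
se_mean = math.sqrt(s2_y / n)              # SE(Y-bar) = s_Y / sqrt(n)
z90 = NormalDist().inv_cdf(0.95)           # two-sided 90% critical value, about 1.645
ci_90 = (y_bar - z90 * se_mean, y_bar + z90 * se_mean)
print([round(x, 2) for x in ci_90])        # [11.53, 12.47]

# Q1.4: adjusted R^2 = 1 - (n - 1)/(n - k - 1) * (1 - R^2) can be negative
def adjusted_r2(r2, n_obs, k):
    """Adjusted R^2 for a regression with k regressors plus an intercept."""
    return 1 - (n_obs - 1) / (n_obs - k - 1) * (1 - r2)

adj = adjusted_r2(0.10, n_obs=20, k=5)     # hypothetical inputs
print(round(adj, 3))                       # -0.221

# Q1.6: in a log-log model a 1% rise in X multiplies Y by 1.01**beta1,
# which is approximately a beta1 percent change in Y
beta1 = 0.7                                # hypothetical elasticity
pct_change_y = (1.01 ** beta1 - 1) * 100
print(round(pct_change_y, 2))              # 0.7

# Q1.9: t-ratios implied by the estimates and the (assumed) standard errors
coefs = {"X": 12.9, "X^2": -0.34, "X^3": 0.01}
ses = {"X": 14.50, "X^2": 17.92, "X^3": 8.22}
t_ratios = {name: coefs[name] / ses[name] for name in coefs}
print({name: round(t, 3) for name, t in t_ratios.items()})
```

None of the three polynomial terms has |t| above 1.96, so none is individually significant at the 5% level, which is the pattern answer (c) attributes to imperfect multicollinearity among \(X_i\), \(X_i^2\) and \(X_i^3\).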
Question 2: Short Answer Questions (20 marks)
a. Consider the following estimated single linear regression model using a sample of
\(n = 100\) observations:
\[\widehat{Y} = \underset{(0.2)}{10} - \underset{(0.1)}{0.5}\,\ln(X); \quad \bar{R}^2 = 0.15\]
where the standard errors of the coefficient estimates are in parentheses under the
estimates. Interpret the regression coefficient on \(\ln(X)\) and compute the 99% confidence
interval for the coefficient. (6 marks)
The interpretation of the regression coefficient is that a 1% increase in \(X\) is associated
with a \(0.5 \times 0.01 = 0.005\) unit decrease in \(Y\).
The 99% confidence interval uses a critical value of 2.58, the two-sided 1% critical value
of the standard normal distribution. The 99% CI for the coefficient on \(\ln(X)\) in the
above regression is thus:
\[[-0.5 - 2.58 \times 0.1,\ -0.5 + 2.58 \times 0.1] = [-0.758,\ -0.242]\]
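A minimal Python check of the interpretation and the interval, using the estimate of −0.5 with standard error 0.1 reported above and the rounded critical value 2.58. The exact normal quantile is printed only to show where 2.58 comes from.

```python
from statistics import NormalDist

beta_hat, se = -0.5, 0.1                   # coefficient on ln(X) and its standard error

# Linear-log model: a 1% increase in X changes Y by about 0.01 * beta_hat units
effect_of_1pct = 0.01 * beta_hat
print(round(effect_of_1pct, 3))            # -0.005

# Exact two-sided 1% critical value from the standard normal (rounded to 2.58 above)
print(round(NormalDist().inv_cdf(0.995), 3))   # 2.576

z = 2.58
ci_99 = (beta_hat - z * se, beta_hat + z * se)
print([round(x, 3) for x in ci_99])        # [-0.758, -0.242]
```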
b. Write down the formula of the Homoskedasticity-only F-statistic, explain the steps
you would take to compute it, and state the null and alternative hypotheses for
the test that corresponds to the statistic. Briefly comment on the connection between
the regression \(R^2\) and F-statistic for joint hypothesis testing highlighted by
the Homoskedasticity-only F-statistic. (6 marks)
The Homoskedasticity-only F-statistic is:
\[F^{act} = \frac{(SSR_{restricted} - SSR_{unrestricted})/q}{SSR_{unrestricted}/(n - k - 1)} = \frac{(R^2_{unrestricted} - R^2_{restricted})/q}{(1 - R^2_{unrestricted})/(n - k - 1)}\]
where \(n\) is the number of observations, \(k\) is the number of regressors in the unrestricted
regression, and \(q\) is the number of restrictions. To compute the two SSRs in the formula, we
run two different regressions:
– \(SSR_{unrestricted}\) is the SSR from the unrestricted regression (with \(k\) regressors), which
imposes no restrictions;
– \(SSR_{restricted}\) is the SSR from the restricted regression (with \(k - q\) unrestricted
regressors), which imposes the \(q\) restrictions under the null hypothesis \(H_0\).
Specifically, the null hypothesis tested with this F-statistic is:
\[H_0: \beta_j = \beta_{j,0},\ \beta_m = \beta_{m,0},\ \ldots \quad \text{for a total of } q \text{ restrictions}\]
with alternative:
\[H_1: \text{at least one of the } q \text{ restrictions in } H_0 \text{ does not hold}\]
Intuitively, the larger the increase in the SSR (equivalently, the larger the reduction in
\(R^2\)) when the \(q\) restrictions are imposed, the larger the F-statistic and the more likely
the null hypothesis is to be false (and rejected using the F-statistic).
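The equivalence of the SSR and \(R^2\) forms of the statistic can be verified numerically. The sketch below uses hypothetical sums of squares; the one assumption that matters is that both regressions share the same dependent variable, and hence the same total sum of squares (TSS), so that \(R^2 = 1 - SSR/TSS\) in each.

```python
def f_stat_ssr(ssr_r, ssr_u, q, n, k):
    """Homoskedasticity-only F from restricted/unrestricted SSRs (k = unrestricted regressors)."""
    return ((ssr_r - ssr_u) / q) / (ssr_u / (n - k - 1))

def f_stat_r2(r2_u, r2_r, q, n, k):
    """The same statistic written in terms of the two R^2 values."""
    return ((r2_u - r2_r) / q) / ((1 - r2_u) / (n - k - 1))

# Hypothetical numbers: same dependent variable, so the same TSS in both regressions
tss, ssr_r, ssr_u = 100.0, 60.0, 50.0
n, k, q = 100, 5, 2
r2_r, r2_u = 1 - ssr_r / tss, 1 - ssr_u / tss

f1 = f_stat_ssr(ssr_r, ssr_u, q, n, k)
f2 = f_stat_r2(r2_u, r2_r, q, n, k)
print(round(f1, 3), round(f2, 3))          # 9.4 9.4
```

Because the TSS cancels, a larger fall in \(R^2\) when the restrictions are imposed translates one-for-one into a larger F-statistic.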
c. Suppose you estimate a single linear regression model \(Y = \beta_0 + \beta_1 X + u\) and obtain
an OLS regression coefficient estimate \(\hat{\beta}_1\). Holding all other aspects of the data
fixed except the sample size \(n\), explain why you are more likely to find that the OLS
regression coefficient estimate \(\hat{\beta}_1\) is statistically significantly different from 0 using
a two-sided hypothesis test as \(n\) grows. Make explicit use of appropriate equations
and/or formulas in answering your question. (8 marks)
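The formula for the estimated variance of the OLS slope estimator is:
\[\hat{\sigma}^2_{\hat{\beta}_1} = \frac{1}{n}\,\frac{\mathrm{var}\left((X_i - \bar{X})\hat{u}_i\right)}{\left(\mathrm{var}(X_i)\right)^2}\]
with the standard error being:
\[SE(\hat{\beta}_1) = \sqrt{\hat{\sigma}^2_{\hat{\beta}_1}}\]
and the t-statistic for the test of the null that \(\beta_1\) equals 0 is given by:
\[t^{act} = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)}\]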
Now suppose we hold the OLS estimate fixed at \(\hat{\beta}_1\) and consider only an increase
in the sample size \(n\). This causes \(SE(\hat{\beta}_1) = \sqrt{\hat{\sigma}^2_{\hat{\beta}_1}}\) to fall, because \(n\) is in the
denominator of \(\hat{\sigma}^2_{\hat{\beta}_1}\). This in turn causes \(|t^{act}|\) to increase, because \(SE(\hat{\beta}_1)\) is in the
denominator of \(t^{act}\). Now, given that the p-value for the hypothesis test stated in the
question is computed as \(2\Phi(-|t^{act}|)\), increasing \(n\) will cause \(-|t^{act}|\) to become smaller
(more negative), and hence the p-value to become smaller. As the p-value becomes smaller,
by definition, you are more likely to find that \(\hat{\beta}_1\) is statistically significantly different from 0.
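To see the mechanism numerically, the Python sketch below holds \(\hat{\beta}_1\), \(\mathrm{var}((X_i - \bar{X})\hat{u}_i)\) and \(\mathrm{var}(X_i)\) fixed at hypothetical values and varies only \(n\), applying the variance formula and the p-value \(2\Phi(-|t^{act}|)\) from the answer.

```python
import math
from statistics import NormalDist

def t_and_p(beta_hat, var_xu, var_x, n):
    """t-statistic and two-sided p-value for H0: beta1 = 0 (large-sample normal approximation)."""
    var_beta = (1 / n) * var_xu / var_x ** 2      # sigma^2 of beta1_hat
    se = math.sqrt(var_beta)                      # SE(beta1_hat)
    t_act = beta_hat / se
    return t_act, 2 * NormalDist().cdf(-abs(t_act))

# Hypothetical inputs held fixed while only the sample size changes
beta_hat, var_xu, var_x = 0.2, 4.0, 2.0
results = {n: t_and_p(beta_hat, var_xu, var_x, n) for n in (25, 100, 400)}
for n, (t_act, p) in results.items():
    print(n, round(t_act, 2), round(p, 4))
```

Quadrupling \(n\) halves the standard error and doubles \(t^{act}\), so the p-value falls steadily as \(n\) grows.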
Question 3: MetricsBars (10 marks)
A chocolate bar retailer asks you to evaluate the impact of a marketing campaign they
ran for their MetricsBars chocolate bar. They provide you with a raw dataset called
marketing.csv, which contains the following variables:
\(sales_i\): sales of MetricsBars in market \(i\) in dollars ($)
\(campaign_i\): a dummy variable equalling 1 if the marketing campaign was run in
market \(i\), and equal to 0 otherwise.
\(income_i\): average individual income in market \(i\) in dollars ($)
These data are provided for a sample of n = 1000 markets, where 500 markets were
randomly chosen to have the marketing campaign, and the other 500 markets did not
have the marketing campaign. The sample average of salesi is $1,000,000 and the sample
average of incomei is $50,000.
MetricsBars wants to answer the following question:
If household income increases from $50,000 to $70,000, how much more effective
will the marketing campaign be at increasing MetricsBars sales in percentage
terms?
Starting from the raw data in marketing.csv, write down the pseudo-code in R you
would develop to provide a relevant 95% confidence interval that empirically informs this
question. List the steps you would include in your code, and if it helps in describing
your answer, you may state explicit R code though this is not necessary for obtaining full
marks. Be precise and explicitly describe any variable scaling you would use, regressions
run in your code, hypothesis tests required, test statistics used, or any other calculations
necessary for computing the 95% confidence interval.
Steps in the pseudo-code are as follows:
1. Load the AER package, load the data, and rescale average income by $10,000:
\[income_i = income_i / 10000\]
2. Compute a new logarithmic variable:
\[log\_sales_i = \ln(sales_i)\]
3. Compute a new interaction variable:
\[campaign\_income_i = campaign_i \times income_i\]
4. Run the following regression, using heteroskedasticity-robust standard errors:
\[log\_sales_i = \beta_0 + \beta_1 campaign_i + \beta_2 campaign\_income_i + \beta_3 income_i + u_i\]
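5. From the estimated log-linear regression, the percentage change in sales associated
with the marketing campaign when household income goes from $50,000 to $70,000
is computed in 2 steps. The impact of the marketing campaign at $50,000 is:
\[\Delta\hat{Y}_1 = (\beta_0 + \beta_1 \cdot 1 + \beta_2 \cdot 5 + \beta_3 \cdot 5) - (\beta_0 + \beta_3 \cdot 5) = \beta_1 + 5\beta_2\]
and the impact of the marketing campaign at $70,000 is:
\[\Delta\hat{Y}_2 = (\beta_0 + \beta_1 \cdot 1 + \beta_2 \cdot 7 + \beta_3 \cdot 7) - (\beta_0 + \beta_3 \cdot 7) = \beta_1 + 7\beta_2\]
where the 5 and 7 in the calculation correspond to the $10,000 scaling of
\(income_i\) above. So the change in the effectiveness of the marketing campaign when
income goes from $50,000 to $70,000 is:
\[\Delta\hat{Y} = \Delta\hat{Y}_2 - \Delta\hat{Y}_1 = 2\beta_2\]
which, because the dependent variable is in logs, is multiplied by 100 to express it in
percentage terms.
6. To compute the standard error of \(\Delta\hat{Y}\), we first run the following hypothesis test
(a single restriction):
\[H_0: 2\beta_2 = 0 \quad \text{vs.} \quad H_1: 2\beta_2 \neq 0\]
We compute the F-statistic associated with this test in R; call it \(F\). With this,
we can compute \(SE(\Delta\hat{Y})\) as:
\[SE(\Delta\hat{Y}) = \frac{|\Delta\hat{Y}|}{\sqrt{F}}\]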
7. Finally, we can compute the 95% CI for the change in the impact of the marketing
campaign when income goes from $50,000 to $70,000 as:
\[\left[\Delta\hat{Y} - 1.96 \times SE(\Delta\hat{Y}),\ \Delta\hat{Y} + 1.96 \times SE(\Delta\hat{Y})\right]\]
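A minimal Python sketch of steps 1 to 3 and 5 to 7 follows. The regression in step 4 is not re-estimated: the coefficient on the interaction term and its robust standard error are hypothetical placeholders for what the R output would report, and `prepare` is an illustrative helper operating on one row stored as a dict. In R, the F-statistic in step 6 could come from a restriction test such as `linearHypothesis()` in the car package (which AER loads) with a robust covariance matrix; here the sketch relies on the fact that, for a single restriction, F equals the squared t-statistic.

```python
import math

def prepare(row):
    """Hypothetical helper for steps 1-3: rescale income, log sales, build the interaction."""
    income = row["income"] / 10_000             # income now measured in $10,000s
    return {
        "log_sales": math.log(row["sales"]),
        "campaign": row["campaign"],
        "income": income,
        "campaign_income": row["campaign"] * income,
    }

# Placeholder step-4 estimates (NOT from any real output): coefficient on
# campaign_income and its heteroskedasticity-robust standard error
beta2_hat, se_beta2 = 0.03, 0.01

# Step 5: change in the campaign effect when income moves from 5 to 7 ($10,000s)
delta = (7 - 5) * beta2_hat                     # = 2 * beta2_hat, in log points

# Step 6: single restriction H0: 2*beta2 = 0, so F = t^2 and SE(delta) = |delta| / sqrt(F)
t_stat = delta / (2 * se_beta2)
F = t_stat ** 2
se_delta = abs(delta) / math.sqrt(F)

# Step 7: 95% CI, multiplied by 100 to read in percentage terms
ci = (100 * (delta - 1.96 * se_delta), 100 * (delta + 1.96 * se_delta))
print(prepare({"sales": 1_000_000, "campaign": 1, "income": 50_000}))
print(round(100 * delta, 2), [round(x, 2) for x in ci])
```

With these placeholder numbers \(SE(\Delta\hat{Y})\) works out to \(2 \times SE(\hat{\beta}_2)\), as it must when the quantity of interest simply rescales one coefficient.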
Question 4: Wages, Experience, and Education (20 Marks)
The Department of Jobs and Small Business has approached you to study the relationship
between wages, experience, and education. They have provided you a dataset which
contains the following information from a random sample of n = 801 individuals:
\(wage_i\): hourly wage earned by individual \(i\) in dollars ($)
\(exper_i\): work experience of individual \(i\), measured as the number of years they
have been working in the labour market
\(degree_i\): a dummy variable equalling 1 if individual \(i\) has a university degree and
0 if they do not have a university degree.
\(age_i\): age of individual \(i\) in years
\(urban_i\): a dummy variable equalling 1 if individual \(i\) lives in an urban location, and
0 if they live in a regional location.
The Department wants to understand how wages change with work experience in the
labour market. Figure 1 on the next page presents summary statistics for these data,
and regression output from R for three different regressions that focus on the relationship
between wages and experience. Based on this output, answer the following questions.
Throughout assume a 5% level of significance in conducting hypothesis tests.
a. What percentage of individuals in the data set have a university degree? (2 marks)
71.4% of individuals in the data set have a university degree.
b. Interpret the statistical significance, sign and magnitude of the regression coefficient
estimate on \(exper_i\) in Regression 1. (3 marks)
A 1-year increase in work experience is associated with a $0.41 increase in an
individual's hourly wage. This relationship is statistically significant at the 5% level,
as it has a p-value of 0.007 < 0.05 (one could also compare the t-statistic of 21.18 to
the 1.96 critical value).
c. The regression coefficient on \(exper_i\) changes substantially between Regressions 1
and 2. Carefully explain what might drive this large change in the regression
coefficient on \(exper_i\) between Regressions 1 and 2. (4 marks)
There are two important correlations to consider when assessing the omitted variable
bias:
– \(age_i\) is likely positively correlated with \(exper_i\), as older people tend to have
more work experience; call this correlation +
– \(age_i\) is likely positively correlated with \(wage_i\), as older people tend to have
higher earnings over time; call this correlation +
Therefore, the sign of the bias when \(age_i\) is in the error term in Regression 1 is
sign(bias) = + × + = +. That is, by not controlling for \(age_i\), we have an upward
(positive) bias in the coefficient on \(exper_i\) in Regression 1. Once Regression 2 removes
this bias by controlling for \(age_i\), we see, as argued, that the coefficient on \(exper_i\)
becomes smaller.
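The sign argument can be illustrated with a small simulation. The data-generating process below is entirely hypothetical (true experience effect 0.2, age effect 0.3, experience rising with age) and is not the Department's data. The short regression omits age; the long-regression coefficient on experience is obtained via the Frisch-Waugh-Lovell theorem, by regressing wage on the part of experience not explained by age.

```python
import random

def ols_slope(x, y):
    """Slope coefficient from a simple regression of y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

random.seed(1)
n = 20_000
# Hypothetical data: experience rises with age, and age also raises wages directly
age = [random.uniform(20, 60) for _ in range(n)]
exper = [max(0.0, a - 22 + random.gauss(0, 4)) for a in age]
wage = [10 + 0.2 * e + 0.3 * a + random.gauss(0, 3) for e, a in zip(exper, age)]

# Analogue of Regression 1: age omitted, so part of its effect loads onto exper
short_slope = ols_slope(exper, wage)

# Analogue of Regression 2 via Frisch-Waugh-Lovell: residualise exper on age first
b_age = ols_slope(age, exper)
a_bar, e_bar = sum(age) / n, sum(exper) / n
exper_resid = [e - e_bar - b_age * (a - a_bar) for e, a in zip(exper, age)]
long_slope = ols_slope(exper_resid, wage)

print(round(short_slope, 3), round(long_slope, 3))
```

The short-regression slope sits well above the true 0.2, while the slope that controls for age lands close to it, matching the + × + = + reasoning.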
d. Interpret the statistical significance, sign and magnitude of the regression coefficient
estimate on \(degree_i\) in Regression 3. (3 marks)
Holding experience, age, and urban location fixed, an individual with a university degree
is expected to have an $18.05 higher hourly wage than an individual without a university
degree. The difference is statistically significantly different from 0 at the 5% level, based
on a p-value that is less than 0.01 (reported as < 2e-16).
e. Based on the regression output in Figure 1, is there evidence of a nonlinear relationship
between \(wage_i\) and \(exper_i\)? Carefully state the null and alternative for the
relevant hypothesis test for conducting this test, and highlight what regression
coefficient estimate(s), t-statistic(s), and p-value(s) in Figure 1 allow you to conduct
such a test for a nonlinear relationship. (3 marks)
Holding other factors fixed, the coefficient on \(exper\_sq_i\) in the quadratic specification
in Regression 3 determines whether there is a nonlinear relationship between \(wage_i\)
and \(exper_i\). The relevant test is of the null that this coefficient equals 0 against the
two-sided alternative that it is not equal to 0. From the regression output, this test has
a p-value of 0.021 < 0.05, which implies the coefficient is statistically significant at the
5% level. Therefore, statistically, the regression implies a nonlinear relationship between
\(wage_i\) and \(exper_i\).
f. Based on Regression 3, holding age and urban status fixed, for an individual starting
with 0 years of experience and without a university degree, how many years will
they have to work in the labour market to “catch up” in terms of expected wages
to an individual with 0 years of experience but with a university degree? (5 marks)
Based on the results in Regression 3, expected wages rise with experience by
\(0.38\,exper - 0.014\,exper^2\) relative to having no experience. Moreover, holding other
factors fixed, an individual with a university degree is expected to have an $18.05 higher
wage than an individual without a university degree. Therefore, starting from no
experience, the number of years of experience required for someone without a degree to
catch up with an individual with a degree and no experience solves the following equation:
\[18.05 = 0.38\,exper - 0.014\,exper^2\]
In this particular case, however, this quadratic equation has no real solutions (its
discriminant, \(0.38^2 - 4 \times 0.014 \times 18.05\), is negative). This implies that the model
predicts that, starting from no experience, an individual without a degree will never catch
up with an individual with a degree and no experience.
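The no-solution conclusion can be checked directly. Rewriting the catch-up condition as \(-0.014\,exper^2 + 0.38\,exper - 18.05 = 0\), the sketch computes its discriminant and also the largest wage gain the fitted quadratic ever delivers (at its turning point), which falls well short of the $18.05 degree premium. The coefficients are the ones quoted in the answer above.

```python
gap = 18.05                 # degree premium from Regression 3 ($ per hour)
b1, b2 = 0.38, -0.014       # coefficients on exper and exper^2 in Regression 3

# Catch-up condition: b1*x + b2*x^2 = gap  <=>  b2*x^2 + b1*x - gap = 0
disc = b1 ** 2 - 4 * b2 * (-gap)
print(round(disc, 4))       # -0.8664: negative, so no real root

# Turning point of the quadratic and the maximum wage gain from experience
x_star = -b1 / (2 * b2)
max_gain = b1 * x_star + b2 * x_star ** 2
print(round(x_star, 2), round(max_gain, 2))   # 13.57 2.58
```

Even at about 13.6 years of experience, the fitted gain is roughly $2.58 per hour, so the gap never closes within this specification.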
Figure 1: Summary Statistics and Estimation Results for the Wage Regressions
Question 5: Understanding the Demand for Sunscreen (20 Marks)
The Department of Health wants to understand the relationship between month-to-month
sunscreen demand and the number of tourists visiting Australia. To conduct this
analysis, you have been provided the following dataset:
\(sunscreen_t\): total bottles of sunscreen sold in month \(t\), in 1000s of bottles
\(lag\_sunscreen_t\): one-month lag of \(sunscreen_t\) (i.e., \(lag\_sunscreen_t = sunscreen_{t-1}\))
\(tourists_t\): total number of tourists visiting Australia in month \(t\), in 1000s of tourists
\(lag\_tourists_t\): one-month lag of \(tourists_t\) (i.e., \(lag\_tourists_t = tourists_{t-1}\))
\(month_t\): month of year for month \(t\), taking on one of 12 values in the following list:
{Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec}
You have this information for T = 135 months. Figures 2 and 3 on the next two pages
respectively present time series plots and regression output generated by R that analyzes
these data. Based on this information, please answer the following questions:
a. Based on Figure 2, explain whether \(sunscreen_t\) appears to be stationary and whether
it exhibits seasonality. Similarly, based on Figure 2, explain whether \(tourists_t\) appears
to be stationary and whether it exhibits seasonality. (2 marks)
Neither time series plot exhibits a persistent trend, which suggests both series are
stationary. Both plots also exhibit cyclical patterns, which point to strong seasonality.
The seasonality is somewhat clearer for \(tourists_t\), but we also see regular spikes in
sunscreen sales.
b. Explain what the residuals from Regression 1 in Figure 3 would represent compared
to the raw \(sunscreen_t\) time series in the data set. (2 marks)
Note: For the remainder of the question, Figure 3 is displayed over pages 11 and 12
below.
Just before Regression 1 in the R code, a set of month-of-year dummies is created.
Therefore, the residuals from this regression would remove any month-of-year seasonal
patterns in \(sunscreen_t\), implying that the residual series would exhibit far less
cyclicality than the raw time series for \(sunscreen_t\).
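The claim that month-dummy residuals strip out the seasonal pattern can be illustrated on a hypothetical series. A regression on a constant plus 11 month dummies fits one mean per month of the year, so its residuals are deviations from the month-of-year means; the sketch computes those deviations directly instead of running the regression.

```python
import math

# Hypothetical 3 years of monthly data: a seasonal cycle plus a small wiggle
months = [t % 12 for t in range(36)]                      # 0 = Jan, ..., 11 = Dec
y = [1.0 + 0.3 * math.cos(2 * math.pi * m / 12) + 0.01 * ((7 * t) % 5 - 2)
     for t, m in enumerate(months)]

# Fitted values from a constant + 11 month dummies are the month-of-year means
month_mean = {
    m: sum(v for v, mm in zip(y, months) if mm == m) / months.count(m)
    for m in range(12)
}
resid = [v - month_mean[m] for v, m in zip(y, months)]

def spread(series):
    """Range of a series, a crude measure of how much it cycles."""
    return max(series) - min(series)

print(round(spread(y), 3), round(spread(resid), 3))
```

The raw series swings by about 0.6 over the year, while the residuals only retain the small non-seasonal wiggle.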
c. Explain how Regression 1 avoids a potential dummy variable trap based on the
dummy variables created in the code, and the dummy variables included in the
regression. (2 marks)
The regression includes a constant and drops the December dummy variable, which
avoids the dummy variable trap. If all 12 month dummies were included, they would sum
to 1 in every month and thus be perfectly collinear with the constant.
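The collinearity behind the dummy variable trap is easy to see on a small hypothetical month index. In the sketch below, `month_dummies` is an illustrative helper, and dropping December mirrors the base month used in Regression 1.

```python
months = [t % 12 for t in range(24)]      # two years of monthly data, 0 = Jan ... 11 = Dec

def month_dummies(m, drop=None):
    """One 0/1 indicator per month of the year, optionally dropping a base month."""
    return [1 if m == j else 0 for j in range(12) if j != drop]

all_twelve = [month_dummies(m) for m in months]
# Every row of the 12 dummies sums to 1, i.e. they reproduce the constant column
print(all(sum(row) == 1 for row in all_twelve))       # True

eleven = [month_dummies(m, drop=11) for m in months]  # December as the base month
# December rows are all zeros, so the 11 dummies no longer add up to the constant
print(any(sum(row) == 0 for row in eleven))           # True
```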
d. Suppose that the last data point in the sample, at \(T = 135\), is January, with
\(sunscreen_{135} = 8\) and \(tourists_{135} = 12\). Based on Regression 2 in Figure 3, what would be
your out-of-sample forecast for \(sunscreen_t\) in period \(T = 136\), and the 95% forecast
interval, assuming IID normal errors in the regression equation in Regression 2? Work with
3 digits after the decimal in conducting your calculations. (4 marks)
From the regression equation, the out-of-sample forecast would be:
\[\widehat{sunscreen}_{T+1} = 0.941 + 0.178 \times 8 + 0.118 \times 12 + 0.028 = 3.809\]
The SER in the regression is 0.096, which implies a 95% forecast interval of:
\[[3.809 - 1.96 \times 0.096,\ 3.809 + 1.96 \times 0.096] = [3.621,\ 3.997]\]
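The forecast arithmetic can be reproduced in a few lines of Python. The coefficient labels are assumptions based on the forecast equation: 0.941 is read as the intercept, 0.178 and 0.118 as the coefficients on the lagged sunscreen and tourist series, and 0.028 as the February dummy coefficient, since period 136 follows a January.

```python
intercept, b_lag_sun, b_lag_tour, feb_dummy = 0.941, 0.178, 0.118, 0.028
ser = 0.096                               # standard error of the regression
sunscreen_T, tourists_T = 8, 12           # values in the last sample period (January)

forecast = intercept + b_lag_sun * sunscreen_T + b_lag_tour * tourists_T + feb_dummy
interval = (forecast - 1.96 * ser, forecast + 1.96 * ser)   # IID normal errors, SER only
print(round(forecast, 3), [round(x, 3) for x in interval])  # 3.809 [3.621, 3.997]
```

Like the answer, this interval uses only the SER and ignores estimation uncertainty in the coefficients.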
e. Compute the Bayes-Schwarz Information Criterion for Regressions 1 and 2 in Figure 3
and explain which is the preferable time series model based on this criterion. Work
with 3 digits after the decimal in conducting your calculations. (6 marks)
The BIC for the two regressions, with \(K = 12\) and \(K = 14\) parameters respectively, can
be computed as:
– Regression 1:
\[BIC(12) = \ln\left[\frac{SSR(12)}{135}\right] + 12\,\frac{\ln(135)}{135}\]
– Regression 2:
\[BIC(14) = \ln\left[\frac{SSR(14)}{135}\right] + 14\,\frac{\ln(135)}{135}\]
We can compute the SSRs from the SER reported in each regression using the SER
formula:
\[SER = s_{\hat{u}} = \sqrt{s^2_{\hat{u}}}; \quad s^2_{\hat{u}} = \frac{SSR}{n - k - 1}\]
implying
\[SSR = SER^2 \times (n - k - 1)\]
and thus, from the regression output for Regressions 1 and 2,
\[SSR(12) = 0.1 \times 0.1 \times (135 - 11 - 1) = 1.23\]
and
\[SSR(14) = 0.096 \times 0.096 \times (135 - 13 - 1) = 1.115\]
So the BICs are computed as:
– Regression 1:
\[BIC(12) = \ln\left[\frac{1.23}{135}\right] + 12\,\frac{\ln(135)}{135} = -4.262\]
– Regression 2:
\[BIC(14) = \ln\left[\frac{1.115}{135}\right] + 14\,\frac{\ln(135)}{135} = -4.288\]
which implies Regression 2 is the preferred model according to the BIC, since it
yields a smaller value.
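The SSR and BIC calculations can be reproduced as follows, using the reported SERs (0.1 and 0.096), T = 135, and the same regressor counts (11 and 13) used above.

```python
import math

def bic(ssr, T, K):
    """BIC(K) = ln(SSR/T) + K * ln(T) / T."""
    return math.log(ssr / T) + K * math.log(T) / T

T = 135
ssr1 = 0.1 ** 2 * (T - 11 - 1)      # Regression 1: SER = 0.1, 11 regressors
ssr2 = 0.096 ** 2 * (T - 13 - 1)    # Regression 2: SER = 0.096, 13 regressors
bic1, bic2 = bic(ssr1, T, 12), bic(ssr2, T, 14)
print(round(ssr1, 3), round(ssr2, 3))   # 1.23 1.115
print(round(bic1, 3), round(bic2, 3))   # -4.262 -4.288
```

The fall in \(\ln(SSR/T)\) from adding the two lags (about 0.098) outweighs the extra penalty of \(2\ln(135)/135 \approx 0.073\), which is why Regression 2 wins.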
f. Conduct a Granger Causality test to determine whether \(tourists_t\) Granger causes
\(sunscreen_t\) based on the regression results in Regression 2 in Figure 3. Assume a
5% level of significance in conducting the test. Work with 3 digits after the decimal
in conducting your calculations. (4 marks)
Letting \(\beta_2\) be the coefficient on \(lag\_tourists_t\), the Granger Causality test in this
application has the following null and alternative hypotheses:
\[H_0: \beta_2 = 0 \quad \text{vs.} \quad H_1: \beta_2 \neq 0\]
From the regression results, we see this has a t-statistic of 2.4867, implying an
F-statistic for the test (which is the square of the t-statistic) of \(F = 2.487 \times 2.487 = 6.185\).
The test has \(q = 1\) numerator degree of freedom and
\(T - \max\{p, q_1, \ldots, q_k\} - p - \sum_{\ell=1}^{k} q_\ell - 1 = 135 - 1 - 13 - 1 = 120\) denominator degrees of
freedom, where the 13 counts all regressors in Regression 2 (the two lags plus the 11 month
dummies).
From the critical values table for the 95th percentile of the \(F_{1,120}\) distribution
at the end of the exam paper, we have a critical value of 3.92 for the test. Since
6.185 > 3.92, we conclude from the test at the 5% level that \(tourists_t\) does in fact
“Granger cause” \(sunscreen_t\).
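A short Python check of the test. The 3.92 critical value is copied from the exam's F table rather than computed, since the Python standard library has no F distribution. With a single restriction one can equivalently compare |t| with √3.92 ≈ 1.98.

```python
import math

t_stat = 2.487                     # t-statistic on lag_tourists (rounded to 3 decimals)
F = t_stat ** 2                    # one restriction, so F equals t squared
df_num = 1
df_den = 135 - 1 - 13 - 1          # denominator degrees of freedom used in the answer
f_crit = 3.92                      # 95th percentile of F(1, 120) from the exam's table
print(round(F, 3), df_num, df_den, F > f_crit)      # 6.185 1 120 True
print(abs(t_stat) > math.sqrt(f_crit))              # True
```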
Figure 2: Time Series Plots of Sunscreen Sales and Tourists by Month
[Figure: two panels plotted against Month: "Sunscreen Sales by Month" (y-axis: Sunscreen Sales in 1000s) and "Number of Tourists by 1000s" (y-axis: Number of Tourists in 1000s).]
Figure 3: Estimation Results for the Time Series Regressions
Estimation Results for the Time Series Regressions (Figure 3 continued)
