ECMT6002: Econometric Applications
School of Economics, University of Sydney,
Semester 2, 2019
Practice Questions (for Final Exam) - Answer key
INSTRUCTIONS
– The final exam is comprehensive and accounts for 50% of the final mark of ECMT6002.
– This is a closed-book exam. Non-programmable calculators are permitted but smart phones
should be turned off.
– Answer five (and five only) out of six questions clearly and concisely in the provided answer
booklet.
– Write down your SID on the exam paper and on the answer booklet.
– Return the exam paper with the answer booklet when you complete the exam.
Note: These sample questions are not meant to be exhaustive or to cover every aspect of the unit;
rather, they illustrate the format and type of questions to expect in the exam.
1. Let save denote the fraction of annual family income devoted to savings. We are interested in
determining whether the rich save a larger fraction of their income, other things equal. A simple
model is
\[ \ln(save) = \beta_0 + \beta_1\, age + \beta_2\, children + \beta_3\, earnings + u \qquad (1.1) \]
where age denotes the age of the family head, children is the number of dependent children in
the household, and earnings is the annual earnings of the family head (in $100,000). Based on a
sample, the OLS estimates (standard errors in parentheses) are:
\[ \widehat{\ln(save)} = \underset{(0.004)}{0.039} + \underset{(0.017)}{0.038}\, age - \underset{(0.051)}{0.125}\, children + \underset{(0.001)}{0.025}\, earnings \qquad (1.2) \]
\[ n = 366, \quad R^2 = 0.212 \]
(a) Based on the model (1.1), what is the interpretation of β1?
The coefficient β1 measures the change in the expected log of save (the fraction of annual family
income devoted to savings) from a one-year increase in the age of the family head, holding other
factors (children, earnings) constant. Equivalently, 100β1 is the approximate percentage change
in save from a one-year increase in the age of the family head, holding other factors (children,
earnings) constant.
(b) Based on the estimates (1.2), what is the exact percentage effect of an extra child on the
expected level of save?
The exact percentage change due to an increase in children by 1 (Δx = 1) is given by
\[ 100\,(e^{\hat\beta_2 \Delta x} - 1)\% = 100\,(e^{-0.125} - 1)\% \approx 100\,(0.88 - 1)\% = -12\% \]
that is, the exact percentage effect of an extra child on the expected level of save is −12%.
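The calculation can be checked numerically. The following is a minimal sketch that uses only the coefficient on children reported in (1.2); the function name exact_pct_effect is illustrative and not part of the unit material.

```python
import math


def exact_pct_effect(beta: float, delta_x: float = 1.0) -> float:
    """Exact percentage change in y when log(y) is the dependent variable."""
    return 100.0 * (math.exp(beta * delta_x) - 1.0)


beta_children = -0.125  # coefficient on children in (1.2)

approx_effect = 100.0 * beta_children           # log approximation, 100 * beta
exact_effect = exact_pct_effect(beta_children)  # exact effect, 100 * (e^beta - 1)

print(f"approximate: {approx_effect:.2f}%  exact: {exact_effect:.2f}%")
```

The gap between the two numbers is small here because the coefficient is small in absolute value; it widens as the coefficient grows.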
(c) Based on the estimates (1.2), test whether β2 is statistically significant at the 5% level,
against a two-sided alternative?
To test the hypotheses
\[ H_0: \beta_2 = 0 \quad \text{against} \quad H_1: \beta_2 \neq 0 \]
we use the t test statistic
\[ t = \frac{\hat\beta_2}{se(\hat\beta_2)} \sim t_{n-k-1} \]
Decision Rule: Reject H0 in favor of H1 if |t| > c, where c = t_{n−k−1, α/2} is the critical
value for the t distribution with df = n − k − 1 for a two-sided test at the 5% significance level.
In this case, df = n − k − 1 = 366 − 3 − 1 = 362 and c = 1.96.
Decision: Since |t| = |−0.125/0.051| = 2.45 > 1.96, we reject the null H0 : β2 = 0 in favor
of the alternative H1 : β2 ≠ 0 at the 5% significance level.
Conclusion: The number of dependent children in the household, holding other factors
constant, has a significant effect on expected saving.
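The decision rule can be reproduced in a few lines. This is a sketch under the assumption that the critical value 1.96 is taken from the answer key rather than computed; the variable names are illustrative.

```python
# Inputs from (1.2) and the answer key.
beta_hat = -0.125       # estimated coefficient on children
se_beta = 0.051         # its standard error
n, k = 366, 3           # sample size and number of slope coefficients
critical_value = 1.96   # two-sided 5% critical value used in the answer key

df = n - k - 1               # degrees of freedom
t_stat = beta_hat / se_beta  # t statistic under H0: beta2 = 0
reject_h0 = abs(t_stat) > critical_value

print(f"df = {df}, t = {t_stat:.2f}, reject H0: {reject_h0}")
```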
(d) Test whether model (1.2) has any explanatory power. Perform a test of the overall
significance of the regression using a 1% level of significance. Note the F-test statistic is given
by
\[ F = \frac{(R^2_{ur} - R^2_{r})/q}{(1 - R^2_{ur})/(n - k - 1)} \sim F_{q,\,n-k-1} \]
To test the hypotheses
\[ H_0: \beta_1 = \beta_2 = \beta_3 = 0 \quad \text{against} \quad H_1: H_0 \text{ is false} \]
we use the F test statistic. The restricted model contains only an intercept, so R2r = 0 and q = k:
\[ F = \frac{R^2_{ur}/k}{(1 - R^2_{ur})/(n - k - 1)} \sim F_{k,\,n-k-1} \]
Decision Rule: Reject H0 in favor of H1 if F > c, where c = F_{k, n−k−1, α} is the critical
value for the F distribution with df (k, n − k − 1) at the α = 1% significance level. In this case,
k = 3, (n − k − 1) = 362 and c = 3.78.
Decision: Since F = (0.212/3) / ((1 − 0.212)/362) = 32.46 > c, we reject the null H0 in favor
of H1 at the 1% significance level.
Conclusion: Therefore model (1.2) has significant explanatory power.
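As a check on the arithmetic, the overall F statistic can be computed directly from R2. This sketch assumes the critical value 3.78 quoted in the answer key; overall_f is an illustrative helper name.

```python
def overall_f(r_squared: float, n: int, k: int) -> float:
    """F statistic for H0: all k slope coefficients are zero."""
    numerator = r_squared / k
    denominator = (1.0 - r_squared) / (n - k - 1)
    return numerator / denominator


f_stat = overall_f(r_squared=0.212, n=366, k=3)
critical_value = 3.78  # 1% critical value used in the answer key
reject_h0 = f_stat > critical_value

print(f"F = {f_stat:.2f}, reject H0: {reject_h0}")
```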
(e) We are concerned the model contains heteroskedasticity. Outline the steps required to
estimate the model by generalized least squares (GLS).
The feasible GLS (FGLS) procedure to correct for heteroskedasticity is:
i. Estimate the model (1.1) by OLS and obtain the residuals û;
ii. Create log(û²) by squaring the residuals and then taking their natural log;
iii. Regress log(û²) on age, children, earnings; and obtain the fitted values ε̂;
iv. Exponentiate the fitted values to form ĥ = exp(ε̂);
v. Estimate (1.1) by weighted least squares with weights 1/ĥ; equivalently, multiply every
variable (including the intercept) by 1/√ĥ and apply OLS to the transformed model.
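The five steps can be sketched in code. The example below is a simplified illustration, not the estimation for (1.1): it assumes a single regressor x and simulated data whose error standard deviation grows with x, and the helper wls_simple is a hypothetical name introduced here. With several regressors the same steps apply with a multiple-regression routine in place of the helper.

```python
import math
import random


def wls_simple(x, y, w):
    """Weighted least squares of y on a constant and x; returns (b0, b1)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


# Simulated data (an assumption for illustration): y = 1 + 2x + u, sd(u) = 0.5x.
random.seed(0)
n = 500
x = [random.uniform(1.0, 10.0) for _ in range(n)]
y = [1.0 + 2.0 * xi + random.gauss(0.0, 0.5 * xi) for xi in x]
ones = [1.0] * n

# Step i: OLS (equal weights) and residuals.
b0_ols, b1_ols = wls_simple(x, y, ones)
resid = [yi - b0_ols - b1_ols * xi for xi, yi in zip(x, y)]

# Step ii: log of the squared residuals.
log_u2 = [math.log(u * u) for u in resid]

# Step iii: regress log(u^2) on the regressor and keep the fitted values.
a0, a1 = wls_simple(x, log_u2, ones)
fitted = [a0 + a1 * xi for xi in x]

# Step iv: exponentiate the fitted values to get h_hat.
h_hat = [math.exp(g) for g in fitted]

# Step v: WLS with weights 1/h_hat (same as OLS on data scaled by 1/sqrt(h_hat)).
weights = [1.0 / h for h in h_hat]
b0_fgls, b1_fgls = wls_simple(x, y, weights)

print(f"OLS slope:  {b1_ols:.3f}")
print(f"FGLS slope: {b1_fgls:.3f}")
```

Exponentiating in step iv guarantees that every estimated variance is positive, which is why the log form is used for the auxiliary regression.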
2. The following time series regression model relates the growth in real per capita consumption
(gcon) to the growth in real per capita income (ginc), the real interest rate (rint) and the
expected inflation rate (infl^e):
\[ gcon_t = \beta_0 + \delta_0\, ginc_t + \delta_1\, ginc_{t-1} + \beta_1\, rint_t + \beta_2\, infl^e_t + u_t \qquad (2.1) \]
(a) What is the interpretation of δ0 and δ1?
δ0 measures the expected change in the growth in real per capita consumption in period t
due to a one-unit increase in the growth in real per capita income in the same period, holding
other factors constant. δ1 measures the expected change in the growth in real per capita
consumption in period t due to a one-unit increase in the growth in real per capita income
in the previous period (t − 1), holding other factors constant.
(b) Briefly explain the reason for including the lagged term ginc_{t−1}.
The growth in real per capita income in the previous period (t − 1) may affect the growth
in real per capita consumption in this period (t). This is likely to happen as consumers try
to smooth consumption over time.
(c) What conditions must the time series process gcon_t satisfy for it to be weakly dependent?
What are the consequences of estimating a time series model using OLS with data that are not
weakly dependent?
Weak dependence restricts the strength of the relationship between elements of the time series
process, gcon_t and gcon_{t+h}, as the distance between them, h, gets large. Weak dependence
requires that gcon_t and gcon_{t+h} are “almost independent” as h → ∞. Applying OLS to
gcon_t when it is not weakly dependent can lead to “spurious regression” results.
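(d) Why might a time trend not be necessary in (2.1)? Explain briefly.
In (2.1), the dependent variable gcon_t and the key regressors ginc_t and ginc_{t−1} are
growth rates from one period to the next (that is, first differences). First differencing removes
a linear time trend from the underlying series, so these variables are unlikely to be trending and
a time trend is not needed.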
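(e) If the variables in (2.1) are measured quarterly, how can the model (2.1) be improved?
If the variables in (2.1) are measured quarterly, we need to control for seasonality. A simple
way is to add a set of dummies for the first three quarters to (2.1), with the fourth quarter as
the base period:
\[ gcon_t = \beta_0 + \delta_0\, ginc_t + \delta_1\, ginc_{t-1} + \beta_1\, rint_t + \beta_2\, infl^e_t + \theta_1 Q1_t + \theta_2 Q2_t + \theta_3 Q3_t + u_t \]
where Qj_t = 1 if period t falls in the j-th quarter and 0 otherwise. (The dummy coefficients are
written θ1, θ2, θ3 to avoid confusion with δ1 on ginc_{t−1}.)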
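Constructing the quarter dummies is mechanical. The sketch below assumes that each observation carries a quarter code from 1 to 4 and that the fourth quarter is the omitted base category; quarter_dummies is an illustrative name.

```python
def quarter_dummies(quarters):
    """Return one (Q1, Q2, Q3) tuple per observation; quarter 4 is the base."""
    rows = []
    for q in quarters:
        if q not in (1, 2, 3, 4):
            raise ValueError(f"quarter must be 1, 2, 3 or 4, got {q!r}")
        rows.append(tuple(1 if q == j else 0 for j in (1, 2, 3)))
    return rows


quarters = [1, 2, 3, 4, 1, 2]  # hypothetical quarter codes for six periods
dummies = quarter_dummies(quarters)

for q, row in zip(quarters, dummies):
    print(q, row)
```

Only three dummies are created because including all four together with the intercept would produce perfect collinearity.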
3. We are interested in analyzing the effect of the government building a new hospital on housing
prices in the suburb of Sydenham. Rumors that a new hospital would be built in Sydenham
began after 2006, and the hospital was built and began operating in 2008. We have data on the
prices of houses sold in Sydenham in 2006 and another sample on houses that sold in Sydenham
in 2010. The hypothesis we wish to test is that the price of houses located near the site of new
hospital would rise above the price of more distant houses. The data for each year includes the
dummy variable near which is equal to one if the house is located within 2 kilometers of the
new hospital. House prices, for both years of data, were measured in 2010 prices. The variable
rprice denotes the real house price (scaled by $100,000). The following simple regression model
was estimated using only the year 2010 sample of data
\[ \widehat{rprice} = \underset{(0.309)}{10.131} + \underset{(0.788)}{2.688}\, near \qquad (3.1) \]
\[ n = 96, \quad R^2 = 0.199 \]
while the following was estimated using only the 2006 sample of data
\[ \widehat{rprice} = \underset{(0.265)}{9.252} + \underset{(0.671)}{1.412}\, near \qquad (3.2) \]
\[ n = 105, \quad R^2 = 0.106 \]
(a) Explain, one by one, the interpretation of the estimates in model (3.2).
The intercept estimate 9.252 ($925,200) is the average 2006 selling price of houses that are not
near the site (near = 0), that is, without any location effect; the coefficient estimate 1.412 is the
location effect: in 2006, houses within 2 kilometers of the site of the new hospital sold, on
average, for $141,200 more than more distant houses. Also see lecture notes 8.
(b) Based on the estimates in (3.1) and (3.2), from 2006 to 2010, what is the average price
change for all houses in Sydenham?
Note that the baseline average selling price for houses (the intercept) is 9.252 in 2006 and
10.131 in 2010. The average price change for all houses in Sydenham is then
10.131 − 9.252 = 0.879, that is, $87,900.
(c) Explain why we cannot infer from the estimates in (3.1) that the location of the hospital
caused the price of houses located nearby to increase? What evidence from model (3.2)
supports this conclusion?
Before the rumors began, houses near the site of the future hospital already sold, on average,
for more than other houses, which is evident from the positive coefficient on near in (3.2).
The premium for nearby houses in (3.1) therefore partly reflects this pre-existing difference, and
we cannot infer from the estimates in (3.1) that the location of the hospital caused the price
of houses located nearby to increase.
(d) Using the information from models (3.1) and (3.2), calculate the difference-in-differences
estimate of the impact of the new hospital on the price of nearby houses?
The difference-in-differences estimate of the impact of the new hospital on the price of
nearby houses is 2.688 − 1.412 = 1.276, that is, $127,600.
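The same number can be obtained from the four group means implied by (3.1) and (3.2). This is a minimal sketch; the variable names are illustrative and prices are in units of $100,000.

```python
# Average prices implied by the two regressions (in $100,000).
far_2006 = 9.252             # intercept of (3.2)
near_2006 = 9.252 + 1.412    # intercept + coefficient on near in (3.2)
far_2010 = 10.131            # intercept of (3.1)
near_2010 = 10.131 + 2.688   # intercept + coefficient on near in (3.1)

change_near = near_2010 - near_2006  # price change for nearby houses
change_far = far_2010 - far_2006     # price change for distant houses
did = change_near - change_far       # difference-in-differences estimate

print(f"change near: {change_near:.3f}")
print(f"change far:  {change_far:.3f}")
print(f"DiD:         {did:.3f}  (about ${did * 100_000:,.0f})")
```

Subtracting the change for distant houses removes the price movement common to all houses, which is why the estimate equals the difference between the two coefficients on near.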
(e) Propose a linear regression model that can directly estimate the effect of new hospital on
housing price.
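We define a new dummy for the timing of treatment: d2010_{it} = 1 for t = 2010 and
d2010_{it} = 0 for t = 2006. Pooling the two samples, the linear regression model can be written as
\[ rprice_{it} = \beta_0 + \beta_1\, near_{it} + \beta_2\, d2010_{it} + \beta_3\, near_{it} \cdot d2010_{it} + u_{it} \]
where β3, the coefficient on the interaction term, is the difference-in-differences effect of the
new hospital.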
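As a worked illustration, assume the pooled model is estimated by OLS on the 2006 and 2010 samples combined. Because the model contains near, the 2010 dummy and their interaction, its fitted values reproduce the four group means, so the pooled estimates can be read off (3.1) and (3.2):

```latex
\begin{aligned}
\hat\beta_0 &= 9.252                    && \text{(distant houses, 2006)} \\
\hat\beta_1 &= 1.412                    && \text{(location premium in 2006)} \\
\hat\beta_2 &= 10.131 - 9.252 = 0.879   && \text{(price change for distant houses)} \\
\hat\beta_3 &= 2.688 - 1.412 = 1.276    && \text{(difference-in-differences effect)}
\end{aligned}
```

The advantage of the pooled regression is that it also delivers a standard error for the interaction coefficient, which the two separate regressions do not provide directly.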
4. An influential study tested for evidence of racial discrimination in the market for mortgages
(home loans). The dependent variable in the study was approve (an indicator variable =1 if
the loan is approved) and the explanatory variables considered were white (=1 if the applicant
is white), obrat (other financial obligations as a % of income), loanprc (the amount of loan /
price of the property), unem (the unemployment rate in the applicant's industry of employment),
male (=1 if the applicant is male), and cosign (=1 if there is a cosigner on the loan). The study
estimated the equation
Pr(approve = 1) = G(β0 + β1white+ β2obrat+ · · ·+ β6cosign) (4.1)
using the Probit and Logit models. The table below presents coefficient estimates for several
specifications.
Table 4.1. Estimates of Binary Choice Models (standard errors in parentheses)
Dependent Variable: approve

                        Probit (1)      Probit (2)      Logit
white                   .714 (.120)     .660 (.123)     1.178 (.217)
obrat                                   −.082 (.005)    −.038 (.011)
loanprc                                 −1.345 (.315)   −2.414 (.583)
unem                                    −.046 (.023)    −.083 (.040)
male                                    −.019 (.009)    −.024 (.010)
cosign                                  .081 (.355)     .156 (.659)
constant                .566 (.105)     2.536 (.362)    4.570 (.679)
Observations (n)        1000            1000            1000
Log-Likelihood (LLF)    -369.29         -348.46         -348.29
(a) The Probit and Logit models were estimated using the Maximum Likelihood Estimator
(MLE). What is the basic idea of Maximum Likelihood Estimation for obtaining coefficient
estimates?
The MLE chooses the value of β that maximizes the log-likelihood function, that is, the value
under which the observed sample of approvals and rejections is most likely. The log-likelihood
is derived from (4.1).
(b) Based on the estimates in Table 4.1, test the null hypothesis that the set of explanatory
variables obrat, loanprc, unem, male and cosign are jointly insignificant in the Probit
model after controlling for white. Use the Likelihood Ratio Test and a 1% significance
level. The Probit (1) contains the estimation results for the restricted model, and Probit
(2) contains the unrestricted estimates.
The null and alternative hypotheses respectively are H0 : β2 = · · · = β6 = 0 and H1 :
H0 is false. We use the LR test statistic:
\[ LRT = -2\,(\ln L_R - \ln L_{UR}) \sim \chi^2_q \]
\[ LRT = -2\,((-369.29) - (-348.46)) = 41.66 \]
Rejection Rule: Reject H0 in favour of H1 if LRT > c, where c is the critical value
for the χ2 distribution with q = 5 and a 1% significance level. Now LRT = 41.66 and
c = 15.09.
Decision: Since LRT > c, we reject the null at the 1% significance level.
Conclusion: The set of explanatory variables obrat, loanprc, unem, male and cosign are
jointly significant in the Probit model, after controlling for white.
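The likelihood ratio test can be reproduced from the two log-likelihood values in Table 4.1. This sketch takes the critical value 15.09 from the answer key rather than computing it; lr_statistic is an illustrative name.

```python
def lr_statistic(loglik_restricted: float, loglik_unrestricted: float) -> float:
    """Likelihood ratio statistic, -2 * (lnL_R - lnL_UR)."""
    return -2.0 * (loglik_restricted - loglik_unrestricted)


lr = lr_statistic(loglik_restricted=-369.29, loglik_unrestricted=-348.46)
q = 5                   # number of restrictions (obrat, loanprc, unem, male, cosign)
critical_value = 15.09  # 1% critical value of chi-squared with 5 df (answer key)
reject_h0 = lr > critical_value

print(f"LR = {lr:.2f}, q = {q}, reject H0: {reject_h0}")
```

The statistic is never negative, because the unrestricted log-likelihood cannot be below the restricted one.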
(c) What is the interpretation of the coefficient β1 in Probit (2)? Is the partial effect of white on
the probability of loan approval comparable across the Probit (2) and Logit, respectively?
Explain.
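We know that β1 itself cannot be interpreted as the marginal effect, since the marginal effect
is given by
\[ \frac{\partial p(x)}{\partial x_j} = f(x'\beta)\,\beta_j \]
where f(x′β) is the density of the standard normal (for Probit) or the density of the logistic (for
Logit). Only the sign of β1 is directly interpretable. Because the two densities are scaled
differently, the coefficients in Probit (2) and Logit are not directly comparable; the partial effect
of white on the probability of loan approval across the Probit (2) and Logit can be compared
using the rule of thumb
\[ \hat\beta_{LPM} \approx 0.4\,\hat\beta_{Probit}, \qquad \hat\beta_{LPM} \approx 0.25\,\hat\beta_{Logit}, \qquad \hat\beta_{Logit} \approx 1.6\,\hat\beta_{Probit} \]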
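The rule of thumb comes from the heights of the two densities at zero, and it can be checked against the coefficients on white in Table 4.1. The sketch below evaluates both densities at a hypothetical index value x′β = 0, which is an assumption made purely for illustration; since white is a dummy variable, the derivative-based effect is itself only an approximation.

```python
import math


def normal_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def logistic_pdf(z: float) -> float:
    """Standard logistic density."""
    e = math.exp(-z)
    return e / (1.0 + e) ** 2


beta_probit = 0.660  # coefficient on white, Probit (2)
beta_logit = 1.178   # coefficient on white, Logit
index = 0.0          # hypothetical value of x'beta, where both densities peak

pe_probit = normal_pdf(index) * beta_probit   # roughly 0.4 * beta_probit
pe_logit = logistic_pdf(index) * beta_logit   # exactly 0.25 * beta_logit
ratio = beta_logit / beta_probit              # rule of thumb suggests about 1.6

print(f"probit partial effect at index 0: {pe_probit:.3f}")
print(f"logit partial effect at index 0:  {pe_logit:.3f}")
print(f"logit/probit coefficient ratio:   {ratio:.2f}")
```

Although the raw coefficients differ by a factor of almost two, the scaled partial effects from the two models are of similar size.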
(d) Based on the estimates in Table 4.1, would you conclude that there is strong evidence of
racial discrimination in mortgage approvals? Explain briefly.
Although β1 cannot be interpreted as the marginal effect, the two have the same (positive)
sign. Furthermore, the estimate of β1 is statistically significant in Probit (1), Probit (2) and
Logit (as suggested by the corresponding t values, all of which exceed 5). Therefore, race has a
statistically significant impact on the likelihood of mortgage approval, even after controlling for
the other factors. This is strong evidence of racial discrimination in mortgage approvals.
5. A recent survey of retirees asked individuals whether their overall happiness had changed
following retirement from the labour force. We are interested in understanding the important
factors contributing to changes in well-being with retirement. The survey included measures
of an individual’s Income (in $100), Wealth (in $10000), PoorHealth (an indicator of
whether health has declined since retirement) and Married (an indicator of marital status). Let
the change in overall happiness following retirement, H*_i, be a function of individual
characteristics and an idiosyncratic error term u_i:
\[ H^*_i = \beta_1 Married_i + \beta_2 Income_i + \beta_3 Wealth_i + \beta_4 PoorHealth_i + u_i. \]
Although actual H*_i is not observed, individuals report H_i, which indicates that H*_i falls into
one of 3 ordered categories {worse off, the same, better off}:
\[ H_i = \begin{cases} 1 & \text{if } H^*_i \le c_1 \\ 2 & \text{if } c_1 < H^*_i \le c_2 \\ 3 & \text{if } c_2 < H^*_i \end{cases} \]
Using the Ordered Logit model for Hi, the estimates are presented below:
Table 5.1. Ordered Logit Estimates.
Dependent Variable: Pr(Hi = j), j = 1, 2, 3

                        β̂          (se)
Married                 .306       (.080)
Income                  .009       (.010)
Wealth                  .118       (.103)
PoorHealth              −.314      (.078)
c1                      −1.213     (.198)
c2                      .853       (.115)
Observations (n)        1344
R̃2                      0.078
Log-Likelihood (LLF)    -1031.09
(a) What is the interpretation of β̂1, the coefficient on the Married indicator variable? What
do we learn from the estimated value of 0.306 about the effects of Married on happiness
in retirement?
The coefficient in an Ordered Logit model cannot be directly interpreted by itself. For the
ordered Logit model with 3 categories, the marginal effects are given by
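\[ \frac{\partial \Pr(H_i = 1)}{\partial x_1} = -f(c_1 - x_i'\beta)\,\beta_1 \]
\[ \frac{\partial \Pr(H_i = 2)}{\partial x_1} = \left[ f(c_1 - x_i'\beta) - f(c_2 - x_i'\beta) \right] \beta_1 \]
\[ \frac{\partial \Pr(H_i = 3)}{\partial x_1} = f(c_2 - x_i'\beta)\,\beta_1 \]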
where f is the density of logistic distribution. From the estimated value of 0.306 about
the effects of Married on happiness in retirement, we know β̂1 has the same sign as the
marginal effect of Married on the top category better off (Hi = 3). This suggests that
being married has a positive impact on the likelihood of being better off after retirement.
The magnitude of the marginal effect is proportional to β̂1 = 0.306.
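These formulas can be evaluated numerically. The sketch below uses the two estimated cutoffs (−1.213 and 0.853) and the coefficient on Married from Table 5.1, and evaluates everything at a hypothetical index value x′β = 0, an assumption made only for illustration since no covariate values are given.

```python
import math


def logistic_cdf(z: float) -> float:
    """Standard logistic distribution function."""
    return 1.0 / (1.0 + math.exp(-z))


def logistic_pdf(z: float) -> float:
    """Standard logistic density, F(z) * (1 - F(z))."""
    p = logistic_cdf(z)
    return p * (1.0 - p)


c1, c2 = -1.213, 0.853  # estimated cutoffs from Table 5.1
beta_married = 0.306    # coefficient on Married
index = 0.0             # hypothetical value of x'beta

# Probabilities of the three ordered outcomes.
probs = (
    logistic_cdf(c1 - index),
    logistic_cdf(c2 - index) - logistic_cdf(c1 - index),
    1.0 - logistic_cdf(c2 - index),
)

# Marginal effects on each outcome probability.
marginal_effects = (
    -logistic_pdf(c1 - index) * beta_married,
    (logistic_pdf(c1 - index) - logistic_pdf(c2 - index)) * beta_married,
    logistic_pdf(c2 - index) * beta_married,
)

for j, (p, me) in enumerate(zip(probs, marginal_effects), start=1):
    print(f"H = {j}: probability {p:.3f}, marginal effect {me:+.4f}")
```

The three marginal effects sum to zero because the probabilities must always sum to one; only the signs for the bottom and top categories are pinned down by the sign of the coefficient.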
(b) Based on the set of estimates in Table 5.1, what can we conclude about the relative impor-
tance of the different explanatory variables (Married; Income; Wealth and PoorHealth) in
determining post-retirement well-being? Explain.
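As shown above, the magnitude of the marginal effect is proportional to the corresponding
coefficient. So the relative importance of the different explanatory variables depends on
their coefficient estimates. Based on the set of estimates in Table 5.1, in terms of magnitude,
PoorHealth has the largest effect, followed by Married and then by Wealth; Income has the
smallest effect. Note also that only the coefficients on Married and PoorHealth are statistically
significant at the 5% level (their t ratios are about 3.8 and −4.0, against about 0.9 for Income
and 1.1 for Wealth).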
(c) Using Stata to compute marginal effects, I found that the marginal effect of PoorHealth on
the probability of being better-off in retirement was -0.105 (with a standard error of 0.026).
Construct the 95% confidence interval for marginal effect of PoorHealth on the probability
of being better-off in retirement. Is zero in the confidence interval? Hint: M̂E4/se(M̂E4)
follows a tn−k−2 distribution (where -2 is for estimating the cutoff points).
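The 95% confidence interval for ME4, with lower and upper bounds written with an underline and
an overline respectively, is constructed by
\[ \underline{ME}_4 = \widehat{ME}_4 - se(\widehat{ME}_4)\, t_{n-k-2,\,\alpha/2} = -0.105 - 0.051 = -0.156 \]
\[ \overline{ME}_4 = \widehat{ME}_4 + se(\widehat{ME}_4)\, t_{n-k-2,\,\alpha/2} = -0.105 + 0.051 = -0.054 \]
where M̂E4 = −0.105, se(M̂E4) = 0.026, and t_{n−k−2, α/2} = 1.96, so that the margin is
0.026 × 1.96 ≈ 0.051. So the 95% confidence interval for ME4 is [−0.156, −0.054] and zero is
not in the confidence interval.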
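The interval can be verified with a short calculation. This sketch uses the reported marginal effect (−0.105), its standard error (0.026) and the critical value 1.96 from the answer key; the variable names are illustrative.

```python
me_hat = -0.105        # marginal effect of PoorHealth on Pr(better off)
se_me = 0.026          # its standard error
critical_value = 1.96  # 97.5th percentile used in the answer key

margin = critical_value * se_me
lower = me_hat - margin
upper = me_hat + margin
contains_zero = lower <= 0.0 <= upper

print(f"95% CI: [{lower:.3f}, {upper:.3f}], contains zero: {contains_zero}")
```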
(d) An alternative to the Ordered Logit model is the Multinomial Logit (MNL) model specifica-
tion. What are the advantages of the MNL specification? What, if any, are the limitations
of the MNL estimator for analyzing the determinants of changes in happiness with retire-
ment, H∗?
The main advantage of the MNL is that the MNL allows explanatory variables to have
different effects on different categories. The limitation of the MNL is that the MNL will be
less efficient than the Ordered Logit if the categories are indeed ordered.
6. Suppose that annual earnings (earnings) and alcohol consumption (alcohol) are determined by
the simultaneous equations models (SEM) below:
log(earnings) = β0 + β1alcohol + β2educ+ u1 (6.1)
alcohol = γ0 + γ1log(earnings) + γ2educ+ γ3log(price) + u2 (6.2)
where educ is years of schooling, and price is a local price index for alcohol (including state
and local taxes). Assume that educ and price are determined outside of the SEM, and that
β1, β2, γ1, γ2, and γ3 are all different from zero. Both equations (6.1) and (6.2) are treated as
structural equations.
(a) Explain briefly why the regressors alcohol in (6.1) and earnings in (6.2) are both endoge-
nous.
To see that alcohol in equation (6.1) is endogenous (that is, correlated with u1), plug (6.1) into
equation (6.2):
\[ alcohol = \gamma_0 + \gamma_1 (\beta_0 + \beta_1 alcohol + \beta_2 educ + u_1) + \gamma_2 educ + \gamma_3 \log(price) + u_2 \]
\[ \Rightarrow (1 - \gamma_1\beta_1)\, alcohol = (\gamma_0 + \gamma_1\beta_0) + (\gamma_1\beta_2 + \gamma_2)\, educ + \gamma_3 \log(price) + (\gamma_1 u_1 + u_2), \qquad (6.3) \]
and we see that alcohol depends on u1 with coefficient γ1/(1 − γ1β1), which is nonzero given
that γ1 ≠ 0 (assuming γ1β1 ≠ 1). Similarly, plug (6.2) into equation (6.1) and it is clear that
log(earnings) in equation (6.2) depends on u2, with coefficient β1/(1 − β1γ1), and is therefore
endogenous.
(b) Is equation (6.1) identified? How would you estimate the equation if it is identified?
Since alcohol in equation (6.1) is endogenous, at least one instrument is necessary to identify
(6.1). One candidate is log(price) from equation (6.2). It is exogenous (by assumption),
relevant to alcohol (as can be seen from (6.3), where its coefficient is nonzero because γ3 ≠ 0),
and excluded from (6.1). Therefore, equation (6.1) is identified. Usually we use the 2SLS to
estimate (6.1). Briefly state the estimation procedure of the 2SLS (see lecture notes).
(c) Find the reduced form equation for log(earnings) and explain briefly why the OLS applied
to this equation is consistent.
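To find the reduced form equation for log(earnings), plug (6.2) into equation (6.1):
\[ \begin{aligned}
\log(earnings) &= \beta_0 + \beta_1 alcohol + \beta_2 educ + u_1 \\
&= \beta_0 + \beta_1 (\gamma_0 + \gamma_1 \log(earnings) + \gamma_2 educ + \gamma_3 \log(price) + u_2) + \beta_2 educ + u_1 \\
\Rightarrow (1 - \beta_1\gamma_1) \log(earnings) &= (\beta_0 + \beta_1\gamma_0) + (\beta_1\gamma_2 + \beta_2)\, educ + \beta_1\gamma_3 \log(price) + (u_1 + \beta_1 u_2) \\
\Rightarrow \log(earnings) &= \frac{\beta_0 + \beta_1\gamma_0}{1 - \beta_1\gamma_1} + \frac{\beta_1\gamma_2 + \beta_2}{1 - \beta_1\gamma_1}\, educ + \frac{\beta_1\gamma_3}{1 - \beta_1\gamma_1} \log(price) + \frac{u_1 + \beta_1 u_2}{1 - \beta_1\gamma_1}
\end{aligned} \]
given that (1 − β1γ1) ≠ 0. The OLS applied to this equation is consistent because all the
regressors (educ, log(price)) are exogenous (by assumption).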
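The algebra can be checked numerically by solving the two structural equations for one observation and comparing the result with the reduced form. All parameter and data values below are hypothetical, chosen only so that 1 − β1γ1 ≠ 0; the function names are illustrative.

```python
def reduced_form_log_earnings(b0, b1, b2, g0, g1, g2, g3, educ, log_price, u1, u2):
    """Reduced form for log(earnings) derived from (6.1) and (6.2)."""
    d = 1.0 - b1 * g1
    return ((b0 + b1 * g0) + (b1 * g2 + b2) * educ
            + b1 * g3 * log_price + (u1 + b1 * u2)) / d


def solve_structural(b0, b1, b2, g0, g1, g2, g3, educ, log_price, u1, u2):
    """Solve the system: alcohol from (6.3), then log(earnings) from (6.1)."""
    d = 1.0 - g1 * b1
    alcohol = ((g0 + g1 * b0) + (g1 * b2 + g2) * educ
               + g3 * log_price + (g1 * u1 + u2)) / d
    log_earn = b0 + b1 * alcohol + b2 * educ + u1
    return log_earn, alcohol


# Hypothetical structural parameters and one hypothetical observation.
params = dict(b0=1.0, b1=-0.2, b2=0.08, g0=0.5, g1=0.3, g2=-0.05, g3=-0.4)
obs = dict(educ=12.0, log_price=1.5, u1=0.1, u2=-0.2)

log_earn, alcohol = solve_structural(**params, **obs)
rf_value = reduced_form_log_earnings(**params, **obs)

# The solved values should also satisfy the second structural equation (6.2).
rhs_62 = (params["g0"] + params["g1"] * log_earn + params["g2"] * obs["educ"]
          + params["g3"] * obs["log_price"] + obs["u2"])

print(f"log(earnings) from the system:  {log_earn:.6f}")
print(f"log(earnings) from reduced form: {rf_value:.6f}")
print(f"alcohol: {alcohol:.6f}, right-hand side of (6.2): {rhs_62:.6f}")
```

The right-hand side of the reduced form contains only exogenous variables and errors, which is what makes OLS consistent for its coefficients.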
(d) How is the interpretation of β2, the coefficient on educ in (6.1), different from that of the
coefficient on educ in the reduced form equation for log(earnings) in (c)? Explain briefly.
Since equation (6.1) is treated as a structural equation, we may interpret β2 in a causal,
ceteris paribus fashion: it is the effect of educ on log(earnings) holding alcohol fixed. In
contrast, the reduced form coefficient on educ, (β1γ2 + β2)/(1 − β1γ1), mixes the structural
parameters of both equations and does not have this causal, ceteris paribus interpretation.
(e) Is equation (6.2) identified? How would you estimate the equation if it is identified?
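Similarly, log(earnings) in equation (6.2) is endogenous. Unlike the case with (6.1), there is
no instrument for log(earnings): both exogenous variables (educ and log(price)) already
appear in (6.2), so no exogenous variable is excluded from it. Therefore (6.2) is not identified
(it is underidentified). To estimate the underidentified equation (6.2), we need to find additional
exogenous variables that affect log(earnings), are excluded from (6.2), and can be used as
instruments for log(earnings).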
End of Exam Paper
Table for Critical Values of the Chi-Squared Distribution