Columbia University GR 5411 Econometrics I
MA in Economics Seyhan Erden

Problem Set 5
due on Dec. 7th at 10am through Gradescope
___________________________________________________________________


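1. (Practice question, not graded) Consider the regression model $Y = X\beta + U$. Partition $X$ as $[X_1 \; X_2]$ and $\beta$ as $[\beta_1' \; \beta_2']'$, where $X_1$ has $k_1$ columns and $X_2$ has $k_2$ columns. Suppose that $X_2'Y = 0_{k_2 \times 1}$. Let $R = [I_{k_1} \; 0_{k_1 \times k_2}]$.
(a) Show that $\hat\beta'(X'X)\hat\beta = (R\hat\beta)'[R(X'X)^{-1}R']^{-1}(R\hat\beta)$.
(b) Consider the regression $\hat u_i^{TSLS} = \delta_0 + \delta_1 Z_{1i} + \cdots + \delta_m Z_{mi} + \delta_{m+1} W_{1i} + \cdots + \delta_{m+r} W_{ri} + e_i$, where $e_i$ is the regression error term. Let $W = [\mathbf{1} \; W_1 \; W_2 \; \cdots \; W_r]$, where $\mathbf{1}$ is
the $n \times 1$ vector of ones, $W_1$ is the $n \times 1$ vector with $i$th element $W_{1i}$, and so forth. Let $\hat U^{TSLS}$ denote the vector of two
stage least squares residuals. (i) Show that $W'\hat U^{TSLS} = 0$. (ii) Show that the method for
computing the $J$-statistic using the homoskedasticity-only $F$-statistic and that using the
formula $J = \dfrac{\hat U' P_Z \hat U}{\hat U' M_Z \hat U / (n - m - r)}$
produce the same value for the $J$-statistic. (Hint: use the results in parts (a) and (b.i).)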



2. (10p) Consider a simple linear model: $Y_i = \beta_1 + \beta_2 X_i + u_i$
with $\mathrm{Cov}(X_i, u_i) \neq 0$. Let $Z$ be an exogenous, relevant instrument for this model, and
assume that $Z$ is binary, taking on values of 0 or 1. Show the algebraic formulas for the
OLS estimators and IV estimators of both $\beta_1$ and $\beta_2$.
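
As a numerical companion to Problem 2 (not a substitute for the algebra), the sketch below simulates a binary instrument and compares the OLS slope, the IV slope written as a ratio of covariances, and the ratio of group-mean differences. All data, coefficients, and function names are hypothetical and chosen only for illustration.

```python
import random
from statistics import mean

def cov(a, b):
    """Sample covariance with divisor n (the divisor cancels in every ratio below)."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

def ols(y, x):
    """Intercept and slope from regressing y on a constant and x."""
    slope = cov(x, y) / cov(x, x)
    return mean(y) - slope * mean(x), slope

def iv(y, x, z):
    """Intercept and slope of the simple IV estimator using instrument z."""
    slope = cov(z, y) / cov(z, x)
    return mean(y) - slope * mean(x), slope

def group_mean_ratio(y, x, z):
    """Difference in mean y across z = 1 and z = 0, divided by the same difference in x."""
    y1 = mean(yi for yi, zi in zip(y, z) if zi == 1)
    y0 = mean(yi for yi, zi in zip(y, z) if zi == 0)
    x1 = mean(xi for xi, zi in zip(x, z) if zi == 1)
    x0 = mean(xi for xi, zi in zip(x, z) if zi == 0)
    return (y1 - y0) / (x1 - x0)

# Hypothetical data-generating process: X is endogenous because it depends on u.
random.seed(0)
n = 5000
z = [random.randint(0, 1) for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]
x = [0.8 * zi + 0.5 * ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]  # true slope is 2.0

b1_ols, b2_ols = ols(y, x)
b1_iv, b2_iv = iv(y, x, z)
b2_groups = group_mean_ratio(y, x, z)

print(f"OLS slope: {b2_ols:.3f}")
print(f"IV slope (covariance ratio): {b2_iv:.3f}")
print(f"IV slope (group-mean ratio): {b2_groups:.3f}")
```

In this simulated sample the two IV expressions agree up to rounding error, which is the pattern the algebra in Problem 2 should explain.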

3. (10p) Consider the linear model: $y = X\beta + u$

where $\beta$ is a $K \times 1$ parameter vector to be estimated.
Assume there exist instruments $Z$, with more than $K$ columns, that satisfy:
$E[u|Z] = 0$ and $E[uu'|Z] = \sigma^2 I$.
Premultiplication of the linear model above by $Z'$ yields the transformed model $Z'y = Z'X\beta + Z'u$.

Show that the efficient GLS estimator for $\beta$ in this transformed model is the same as the
2SLS estimator discussed in class.

4. (Practice question, not graded) Instrumental variable estimation of a labor supply equation:
Cornwell and Rupert (1988)¹ analyzed the returns to schooling in a panel data set of 595
observations on heads of households. The estimating equation is
$\ln(Wage_{it}) = \beta_1 + \beta_2 Exp_{it} + \beta_3 Exp_{it}^2 + \beta_4 Wks_{it} + \beta_5 Occ_{it} + \beta_6 Ind_{it} + \beta_7 South_{it} + \beta_8 SMSA_{it} + \beta_9 MS_{it} + \beta_{10} Union_{it} + \beta_{11} Ed_{it} + \beta_{12} Fem_{it} + \beta_{13} Blk_{it} + \varepsilon_{it}$
Description of Variables
Exp     Years of full-time work experience.
Wks     Weeks worked.
Occ     1 if blue-collar occupation, 0 otherwise.
Ind     1 if the individual works in a manufacturing industry, 0 otherwise.
South   1 if the individual resides in the south, 0 otherwise.
SMSA    1 if the individual resides in an SMSA, 0 otherwise.
MS      1 if the individual is married, 0 otherwise.
Union   1 if the individual’s wage is set by a union contract, 0 otherwise.
Ed      Years of education as of 1981.
Fem     1 if the individual is female, 0 otherwise.
Blk     1 if the individual is black, 0 otherwise.
The equation suggested is a reduced form equation; it contains all the variables in the
model but does not specify the underlying structural relationship. In contrast, the
following three equation model is a structural equation system () € = C + ( + * + € () @ = C + ( + * + l + @

1 Cornwell, C. and P. Rupert. “Efficient Estimation with Panel Data: An Empirical Comparison of Instrumental
Variable Estimators” Journal of Applied Econometrics, 3, 1988, pp. 149-155
() € = @
Arguably, the supply side of this market might consist of a household labor supply
equation such as $Wks_{it} = \gamma_1 + \gamma_2 \ln(Wage_{it}) + \gamma_3 Ed_{it} + \gamma_4 Union_{it} + \gamma_5 Fem_{it} + u_{it}$
If the number of weeks worked and the accepted wage offer are determined jointly, then $\ln(Wage_{it})$ and the errors are correlated due to simultaneous causality. We consider two sets of
instruments for $\ln(Wage_{it})$: $Z_1 = [Ind_{it}, Ed_{it}, Union_{it}, Fem_{it}]$
and $Z_2 = [Ind_{it}, Ed_{it}, Union_{it}, Fem_{it}, SMSA_{it}]$.
(a) The endogenous variables are $\ln(Wage_{it})$ and $Wks_{it}$, and all other variables are
exogenous. Assume that $\ln(Wage_{it})$ is determined by $Wks_{it}$ and other
“appropriate” exogenous variables. From the discussion above, deduce which variables
would appear in a labor “demand” equation for $\ln(Wage_{it})$ and which variables
would serve as instruments.
(b) Estimate the parameters of the labor demand equation you suggested in part (a) by OLS
and by 2SLS, and compare the results. (Ignore the panel nature of the data set and simply
pool the data.)
(c) Are the instruments used relevant? Explain.
(d) Are the instruments used exogenous? Explain.

5. (Practice question, not graded) Suggest an estimator that we discussed in class for use under
endogeneity, and show the asymptotic properties of the estimator you suggested.


6. (10p) Given the following regression $Y_i = \beta_0 + \beta_1 X_i + u_i$

Assume that X is endogenous and Z is an instrumental variable (IV). By studying the
probability limit (plim) of the IV estimator we can see that when Z and u are possibly
correlated, we can write

$\mathrm{plim}\,\hat\beta_{1,IV} = \beta_1 + \dfrac{\mathrm{Corr}(Z,u)}{\mathrm{Corr}(Z,X)} \cdot \dfrac{\sigma_u}{\sigma_X}$ (1)

where $\sigma_u$ and $\sigma_X$ are the standard deviations of u and X in the population, respectively. The
interesting part of this equation involves the correlation terms. It shows that, even if
Corr(Z,u) is small, the inconsistency in the IV estimator can be very large if Corr(Z,X) is
also small. Thus, even if we focus only on consistency, it is not necessarily better to use
IV than OLS if the correlation between Z and u is smaller than that between X and u.
Using the fact that $\mathrm{Corr}(X,u) = \mathrm{Cov}(X,u)/(\sigma_X \sigma_u)$, along with the fact that $\mathrm{plim}\,\hat\beta_1 = \beta_1 + \mathrm{Cov}(X,u)/\mathrm{Var}(X) = \beta_1$ when $\mathrm{Cov}(X,u) = 0$, we can write the plim of the OLS estimator, call
it $\hat\beta_{1,OLS}$, as

$\mathrm{plim}\,\hat\beta_{1,OLS} = \beta_1 + \mathrm{Corr}(X,u) \cdot \dfrac{\sigma_u}{\sigma_X}$ (2)

Assume that $\sigma_u = \sigma_X$, so that the population variance in the error term is the same as it
is in X. Suppose the instrumental variable, Z, is slightly correlated with u: Corr(Z,u) = 0.1. Suppose also that Z and X have a somewhat stronger correlation: Corr(Z,X) = 0.2.
(a) (5p) What is the asymptotic bias in the IV estimator?
(b) (5p) How much correlation would have to exist between X and u before OLS has more
asymptotic bias than TSLS?
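
The two probability limits in (1) and (2) are simple enough to evaluate directly. The sketch below is a small calculator for the inconsistency terms; the function names and the input values are hypothetical and deliberately differ from the numbers given in the problem.

```python
def iv_asymptotic_bias(corr_zu, corr_zx, sigma_u=1.0, sigma_x=1.0):
    """Inconsistency term in equation (1): [Corr(Z,u) / Corr(Z,X)] * (sigma_u / sigma_X)."""
    return (corr_zu / corr_zx) * (sigma_u / sigma_x)

def ols_asymptotic_bias(corr_xu, sigma_u=1.0, sigma_x=1.0):
    """Inconsistency term in equation (2): Corr(X,u) * (sigma_u / sigma_X)."""
    return corr_xu * (sigma_u / sigma_x)

# Hypothetical inputs for illustration only (not the values in the problem).
iv_bias = iv_asymptotic_bias(corr_zu=0.05, corr_zx=0.25)
ols_bias = ols_asymptotic_bias(corr_xu=0.30)
ols_is_worse = abs(ols_bias) > abs(iv_bias)

print(f"IV inconsistency:  {iv_bias:.3f}")
print(f"OLS inconsistency: {ols_bias:.3f}")
print(f"OLS has the larger asymptotic bias: {ols_is_worse}")
```

With equal standard deviations the comparison reduces to comparing Corr(X,u) with the ratio Corr(Z,u)/Corr(Z,X), which is the comparison part (b) asks about.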



7. (10p) In the discussion of the instrumental variables estimator, we showed that the least
squares estimator $\hat\beta_{OLS}$ is biased and inconsistent. Nonetheless, $\hat\beta_{OLS}$ does estimate something:
$\mathrm{plim}\,\hat\beta_{OLS} = \theta = \beta + Q^{-1}\gamma$
Derive the asymptotic covariance matrix of $\hat\beta_{OLS}$ and show that $\hat\beta_{OLS}$ is asymptotically
normally distributed.


8. (35p) You will replicate and extend the work reported in Acemoglu, Johnson and Robinson
(2001). The authors provided an expanded set of controls when they published their 2012
extension and posted the data on the AER website. This dataset is AJR2001 on the course
website.
(a) (3p) Estimate the OLS regression
$\widehat{\log(\text{GDP per capita})} = 0.5\,\text{risk}$ (1)
 (0.06)
the reduced form regression
$\text{risk} = -0.61 \log(\text{mortality}) + \hat u$ (2)
 (0.13)
and the 2SLS regression
$\log(\text{GDP per capita}) = 0.94\,\text{risk}$ (3)
 (0.16)
(Which point estimate differs by 0.01 from the reported values? This is a
common phenomenon in empirical replication.)
(b) (3p) For the above estimates, calculate both homoskedastic and heteroskedasticity-robust
standard errors. Which were used by the authors (as reported in (1)-(3))?
(c) (3p) Calculate the 2SLS estimates by the Indirect Least Squares formula. Are they the
same?
(d) (3p) Calculate the 2SLS estimates by the two-stage approach. Are they the same?
(e) (3p) Calculate the 2SLS estimates by the control variable approach. Are they the
same?
(f) (4p) Acemoglu, Johnson and Robinson (2001) reported many specifications including
alternative regressor controls, for example latitude and africa. Estimate by least
squares the equation for logGDP adding latitude and africa as regressors. Does this
regression suggest that latitude and africa are predictive of the level of GDP?
(g) (4p) Now estimate the same equation as in (f) but by 2SLS using log mortality as an
instrument for risk. How does the interpretation of the effect of latitude and africa
change?
(h) (4p) Return to our baseline model (without including latitude and africa). The
authors' reduced form equation uses log(mortality) as the instrument, rather than, say,
the level of mortality. Estimate the reduced form for risk with mortality as the
instrument. (This variable is not provided in the dataset, so you need to take the
exponential of the mortality variable.) Can you explain why the authors preferred the
equation with log(mortality)?
(i) (4p) Try an alternative reduced form, including both log(mortality) and the square of
log(mortality). Interpret the results. Re-estimate the structural equation by 2SLS using
both log(mortality) and its square as instruments. How do the results change?
(j) (4p) Calculate and interpret a test for exogeneity of the instruments.
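
Parts (c) through (e) ask for the same 2SLS coefficient by three routes. The sketch below illustrates those routes on synthetic data with one endogenous regressor and one instrument. The variable names mirror the problem, but every number is made up, and the helper functions are hypothetical; it does not use or reproduce the AJR2001 data.

```python
import random
from statistics import mean

def cov(a, b):
    """Sample covariance with divisor n (the divisor cancels in all ratios below)."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

def ols1(y, x):
    """Intercept and slope from regressing y on a constant and x."""
    slope = cov(x, y) / cov(x, x)
    return mean(y) - slope * mean(x), slope

def ols2_first_slope(y, x1, x2):
    """Coefficient on x1 from regressing y on a constant, x1 and x2."""
    s11, s12, s22 = cov(x1, x1), cov(x1, x2), cov(x2, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    return (s22 * s1y - s12 * s2y) / (s11 * s22 - s12 ** 2)

# Synthetic data only: coefficients are illustrative, not estimates from AJR2001.
random.seed(1)
n = 500
logmort = [random.gauss(4.6, 1.2) for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]
risk = [9.5 - 0.6 * m + 0.7 * ui + random.gauss(0, 1) for m, ui in zip(logmort, u)]
loggdp = [2.0 + 0.9 * r - 0.5 * ui for r, ui in zip(risk, u)]

# (1) Indirect least squares: reduced-form slope divided by first-stage slope.
_, rf_slope = ols1(loggdp, logmort)
fs_intercept, fs_slope = ols1(risk, logmort)
beta_ils = rf_slope / fs_slope

# (2) Two-stage approach: regress the outcome on first-stage fitted values.
risk_hat = [fs_intercept + fs_slope * m for m in logmort]
_, beta_two_stage = ols1(loggdp, risk_hat)

# (3) Control variable approach: add the first-stage residual as a regressor.
v_hat = [r - rh for r, rh in zip(risk, risk_hat)]
beta_control = ols2_first_slope(loggdp, risk, v_hat)

print(beta_ils, beta_two_stage, beta_control)
```

In this just-identified setup the three numbers coincide up to floating-point error. The point estimates say nothing about standard errors, which need separate treatment.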


9. (25p) Use MROZ.dta, which we discussed in lecture.
(a) (3p) Run the following regression, report (copy/paste) your results, and interpret the
coefficient of education.
$\log(wage) = \beta_0 + \beta_1\,educ + u$

(b) (3p) Explain why the regression in part (a) may suffer from an endogeneity problem.

(c) (3p) Use the variable mother’s education (motheduc) as an instrumental variable for
education. Report your result and interpret the coefficient of education. Is it significant?
Why? Write this regression in regression equation form.

(d) (4p) Run the same regression as in part (c) in two separate steps using OLS (that is, run the
first and second stage regressions separately) and report your results. What is the
difference between your result here and the result in part (c), where you used the ivregress
command? Be specific. Also, write both stages in regression equation form.

(e) (4p) Use both mother’s and father’s education as instrumental variables with the ivregress
2sls command, and check for endogeneity. Write this regression in regression equation
form.

(f) (4p) Run the following regression using the ivreg command and both motheduc and fatheduc
as instrumental variables for education:
$\log(wage) = \beta_0 + \beta_1\,educ + \beta_2\,exper + \beta_3\,exper^2 + u$

Report your results in equation form as well. Also write the first and second stages in
general equation form. (Note that you will not see the first stage with the ivreg command, but
you do know what it is.)

(g) (4p) Using the regression in part (f), test the exogeneity of both IVs (motheduc and fatheduc).
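
One common way to test instrument exogeneity when there are more instruments than endogenous regressors is an overidentification test. The sketch below computes the nR² form of that statistic on synthetic data, under two simplifying assumptions that are not part of the problem: the model has no exogenous controls (so exper and its square are left out), and errors are homoskedastic. Variable names echo the problem, but the data and helper functions are hypothetical, and the procedure covered in lecture may differ.

```python
import random
from statistics import mean

def cov(a, b):
    """Sample covariance with divisor n (the divisor cancels in all ratios below)."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

def fit2(y, x1, x2):
    """Intercept and two slopes from regressing y on a constant, x1 and x2."""
    s11, s12, s22 = cov(x1, x1), cov(x1, x2), cov(x2, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return mean(y) - b1 * mean(x1) - b2 * mean(x2), b1, b2

# Synthetic data only: not the MROZ sample, and all coefficients are made up.
random.seed(2)
n = 400
motheduc = [random.gauss(9, 3) for _ in range(n)]
fatheduc = [0.5 * m + random.gauss(4.5, 2.5) for m in motheduc]
ability = [random.gauss(0, 1) for _ in range(n)]  # unobserved source of endogeneity
educ = [6 + 0.3 * m + 0.2 * f + a + random.gauss(0, 1.5)
        for m, f, a in zip(motheduc, fatheduc, ability)]
lwage = [-0.2 + 0.08 * e + 0.3 * a + random.gauss(0, 0.6)
         for e, a in zip(educ, ability)]

# First stage: educ on both instruments.
c0, c1, c2 = fit2(educ, motheduc, fatheduc)
educ_hat = [c0 + c1 * m + c2 * f for m, f in zip(motheduc, fatheduc)]

# Second stage coefficients; residuals are formed with actual educ, not educ_hat.
b1 = cov(educ_hat, lwage) / cov(educ_hat, educ_hat)
b0 = mean(lwage) - b1 * mean(educ_hat)
resid = [y - b0 - b1 * e for y, e in zip(lwage, educ)]

# Auxiliary regression of the 2SLS residuals on all instruments, then n * R^2.
d0, d1, d2 = fit2(resid, motheduc, fatheduc)
fitted = [d0 + d1 * m + d2 * f for m, f in zip(motheduc, fatheduc)]
r2 = cov(fitted, fitted) / cov(resid, resid)
j_stat = n * r2

# Degrees of freedom = 2 instruments - 1 endogenous regressor = 1.
CHI2_1_CRIT_5PCT = 3.841
reject_exogeneity = j_stat > CHI2_1_CRIT_5PCT
print(f"nR^2 statistic: {j_stat:.3f}, reject at 5%: {reject_exogeneity}")
```

A large statistic relative to the chi-squared critical value would cast doubt on at least one instrument. A small one does not prove exogeneity, since the test has no power if all instruments are invalid in the same direction.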





