ECON 5027 Lecture 4: Introduction to Instrumental
Variable Methods
Professor Thomas Russell
Carleton University
Table of Contents
1 Review
2 Introduction to IV Methods
3 Properties of the Linear IV Estimator
4 Some Additional Remarks on IV Estimators
5 Preview for Next Lecture
From Last Lecture
In the last lecture we discussed the problems of omitted variable bias.
However, we did not provide satisfactory solutions to this problem.
In this lecture, we discuss the method of instrumental variables (IV).
IV methods are widely used by applied researchers.
IV models come with a lot of “jargon” that is important to know in order
to discuss them with other researchers.
Introduction to IV Methods
To gain some intuition, suppose that the model we wish to estimate is:
Yi = Xiβ + εi,
but suppose that Xi is endogenous in the sense that E[εi | Xi] ≠ 0.
This may be because Xi is correlated with an omitted variable, or because
Xi = X∗i + ei is a mismeasured version of the true regressor X∗i with
E[ei] = E[ei | X∗i] = 0 (as in classical errors-in-variables, CEV).
In this environment, IV methods attempt to obtain consistent estimates of β
by means of an instrumental variable, or simply “an instrument.”
An instrument Zi is a random variable that satisfies two criteria:
1 (Exclusion Restriction): E[εi | Zi] = 0.
2 (Relevance or First Stage): E[ZiXi] ≠ 0.
Note we will slightly modify these conditions later when we have multiple
regressors Xi and instruments Zi.
Intuitively, the exclusion restriction says that Zi cannot affect Yi beyond its
effects on Xi.
Note: The equation Yi = Xiβ + εi is sometimes called the structural
equation.
I.e. β usually is measuring some important parameter that we believe governs
the true DGP.
Intuitively, (if we first de-mean Xi and Zi) the relevance condition says that
Zi and Xi have a non-zero correlation.
This is equivalently imposed as the condition δ1 ≠ 0 in the linear projection (i.e.
the OLS solution to):
Xi = δ0 + Ziδ1 + νi.
The linear projection equation Xi = δ0 + Ziδ1 + νi is sometimes called the
reduced form equation.
Note, we have not assumed that E[νi|Zi] = 0. Thus, this equation has no
structural interpretation; it can be interpreted as a linear approximation to
the (possibly highly non-linear) CEF E[Xi | Zi].
I.e. the coefficients δ0 and δ1 may not actually be measuring any parameters
that exist in reality.
So how does the existence of an instrument Zi help us?
Note that we have:
E[Ziεi] = E[ZiE[εi | Zi]] = 0.
The condition E[Ziεi] = 0 is called a moment condition.
We then have:
$$0 = E[Z_i\varepsilon_i] = E[Z_i(Y_i - X_i\beta)] = E[Z_iY_i] - E[Z_iX_i]\beta \implies \beta = E[Z_iX_i]^{-1}E[Z_iY_i],$$
where the relevance condition ensures E[ZiXi] has a well-defined inverse.
With i.i.d. observations {(Yi, Xi, Zi)}, i = 1, . . . , n, this suggests a straightforward
“sample analog estimator” given by:
$$\hat\beta_{IV} = \left(\frac{1}{n}\sum_{i=1}^{n} Z_iX_i\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n} Z_iY_i\right).$$
$\hat\beta_{IV}$ is called the IV estimator.
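To make the sample analog concrete, here is a minimal simulation sketch in Python. The data-generating process (the coefficient 0.8, the shared omitted factor `u`, and the true value `beta = 2.0`) is entirely hypothetical and chosen only so that Xi is endogenous while Zi satisfies both instrument conditions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
beta = 2.0                              # true structural coefficient (illustrative choice)

z = rng.normal(size=n)                  # instrument, independent of the structural error
u = rng.normal(size=n)                  # omitted factor that drives both X and Y
x = 0.8 * z + u + rng.normal(size=n)    # relevance: E[Z X] = 0.8, which is non-zero
eps = u + rng.normal(size=n)            # E[eps | X] != 0 because eps and X share u
y = x * beta + eps

# Sample analog of beta = E[Z X]^{-1} E[Z Y]
beta_iv = np.mean(z * y) / np.mean(z * x)
# OLS without intercept for comparison: sample analog of E[X X]^{-1} E[X Y]
beta_ols = np.mean(x * y) / np.mean(x * x)
print(beta_iv, beta_ols)
```

In this design the IV estimate should land close to the true value, while OLS is pulled away from it by the omitted factor.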
Note that the relevance condition can be tested.
In particular, we can perform a t-test on the slope coefficient δ1 in the
linear projection:
Xi = δ0 + Ziδ1 + νi.
However, the exclusion restriction is in general an untestable assumption,
because it specifies a relationship between Zi and εi, and εi is unobservable.
This assumption must be argued to hold based on knowledge of the economic
environment.
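The t-test on δ1 described above can be sketched as follows. This is an illustration on simulated data with a hypothetical first-stage slope of 0.3, and it uses the usual homoskedastic OLS variance formula, which is an assumption of this sketch rather than something the slides specify.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000
z = rng.normal(size=n)
x = 1.0 + 0.3 * z + rng.normal(size=n)     # hypothetical first stage with delta_1 = 0.3

D = np.column_stack([np.ones(n), z])       # regressors: intercept and instrument
delta_hat = np.linalg.solve(D.T @ D, D.T @ x)
resid = x - D @ delta_hat
s2 = resid @ resid / (n - D.shape[1])      # homoskedastic error variance estimate
cov = s2 * np.linalg.inv(D.T @ D)
t_stat = delta_hat[1] / np.sqrt(cov[1, 1]) # t-statistic for H0: delta_1 = 0
print(delta_hat[1], t_stat)
```

A large absolute t-statistic is evidence in favour of relevance; nothing comparable can be computed for the exclusion restriction.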
Note that the IV approach is very different from the proxy variable approach.
Under proxy variables, we want a variable that is highly correlated with the
latent variable.
Under the IV approach, we want an instrument that is uncorrelated with the
latent variable.
What are some examples of environments where IV might be appropriate?
Example: Returns to Schooling
Suppose we would like to estimate the returns to schooling using the linear
regression model:
log(wagei) = β0 + educiβ1 + εi,
where wagei is an individual’s hourly wage, and educi is some measure of
schooling.
For example, educi might be the number of years of education.
The difficulty in this case is that an individual’s wage and education are likely
both correlated with omitted variables in εi; for example, intelligence, work
ethic, family background characteristics, or “ability.”
If these factors determine educi but not wagei there is no issue.
If these factors determine wagei but not educi there is also no issue.
The problem arises only when these factors affect both wagei and educi.
Given the presence of omitted variables in εi, from the previous lecture we
know (in general) OLS estimates of β will be biased and inconsistent.
Some proposed instruments:
Mother’s education, given by motheduci.
Last digit of SIN, given by sini.
Quarter of birth, given by frstqrti (a dummy equal to 1 if born in first
quarter).
The variable motheduci likely satisfies the relevance condition, but not the
exclusion restriction.
There is often a strong relation between education of the parent and
education of the child.
However, parental education is probably also correlated with factors omitted
from the structural equation.
The variable sini likely satisfies the exclusion restriction, but not the
relevance condition.
The last digit of a person’s SIN is randomized.
Because of randomization, sini is likely independent of all factors omitted
from the structural equation.
However, because of randomization sini is likely unrelated to a person’s
educational attainment.
Angrist and Krueger (1991) suggest using the quarter of birth instrument.
They argue that quarter of birth is unrelated to factors omitted from the
structural equation that are related to educi.
They argue that quarter of birth determines years of schooling through
compulsory education laws.
A long time ago drop out rates were much higher.
Suppose students are required to register for Kindergarten in the 1935 school
year if their 5th birthday is on or before December 31, 1934.
Suppose the legal drop out age is 16.
Now consider individuals who drop out of school as soon as they legally can.
Then someone who turns 5 on January 1, 1935 will always have 1 year less
education than someone who turns 5 on December 31, 1934.
If the quarter of birth instrument satisfies our IV conditions, then it can be
used as an IV to obtain consistent estimates of β1 in the equation:
log(wagei) = β0 + educiβ1 + εi,
Note that there is an inherent tension between the exclusion restriction and
relevance condition.
Any instrument that perfectly explains Xi is clearly very relevant, but will
fail the exclusion restriction by construction.
Any instrument that is randomly generated will almost certainly satisfy the
exclusion restriction, but will also likely not be very relevant.
In RCTs, the randomization device (i.e. assignment to treatment or control
groups) works as an excellent instrument.
Instruments can also come from “natural experiments,” which are characterized by
the occurrence of some event that can be used as an instrument.
For example, Angrist and Evans (1998) study the effects of family size on
female labor market outcomes using the gender of the first two children as an
instrument.
Properties of the Linear IV Estimator
We will now consider the general case with many instruments and many
(possibly endogenous) regressors.
Assumption (2SLS1)
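The true DGP is given by $Y_i = x_i^\top\beta + \varepsilon_i$, where $\beta \in \mathbb{R}^{d_x}$ is a finite parameter
and where $\{(x_i, z_i, \varepsilon_i)\}_{i=1}^{n}$ is an independent and identically distributed sequence
of integrable random vectors with $x_i \in \mathbb{R}^{d_x}$, $z_i \in \mathbb{R}^{d_z}$ and $\varepsilon_i \in \mathbb{R}$ satisfying
$E[\varepsilon_i \mid z_i] = 0$ and $E[\varepsilon_i^2 \mid z_i] = \sigma^2 < \infty$.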
Note we have not placed any assumptions on the dependence between xi and
εi.
The condition E[εi | zi] = 0 is the appropriate multivariate analog of the
exclusion restriction.
The condition E[εi | zi] = 0 is slightly stronger than necessary for our
purposes, but is a nice analog to our OLS assumptions.
This delivers us our key moment condition:
E[ziεi] = E[ziE[εi | zi]] = 0.
As usual, let Z denote the n × dz matrix obtained by vertically stacking the row
vectors $z_i^\top$; X and y are defined analogously from $x_i^\top$ and $Y_i$.
Assumption (2SLS2)
The following are satisfied:
(a) There exists a positive finite constant ∆ such that for all i, $E[\varepsilon_i^2] < \Delta$,
$E[X_{ij}^2] < \Delta$ and $E[Z_{ik}^2] < \Delta$ for all j = 1, . . . , dx and k = 1, . . . , dz.
(b) The matrices $E[z_iz_i^\top]$ and $E[z_ix_i^\top]$ are finite and have full column rank, and
$E[z_iz_i^\top]$ is positive definite.
Part (b) imposing full column rank of $E[z_ix_i^\top]$ provides the multivariate
analog of the relevance (or first-stage) condition.
This is sometimes called the rank condition.
A necessary (but not sufficient) condition for the rank condition is the order
condition, which says that dz ≥ dx.
NOTE: we will always include any exogenous variables from the vector xi in
the vector zi. Given this, the order condition says we need at least as many
instruments as there are endogenous variables.
If dz < dx, the model is said to be under-identified.
If the rank condition holds and dz = dx, the model is said to be
just-identified.
If the rank condition holds and dz > dx, the model is said to be
over-identified.
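In the just-identified case, $E[Z^\top X]$ is invertible, so we have:
$$\begin{aligned}
E[Z^\top\varepsilon] = 0 &\implies E[Z^\top(y - X\beta)] = 0 \\
&\implies E[Z^\top X]\beta = E[Z^\top y] \\
&\implies \beta = E[Z^\top X]^{-1}E[Z^\top y].
\end{aligned}$$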
This again leads us back to the (now generalized) IV estimator, which is defined for n
large enough under Assumption 2SLS2:
$$\hat\beta_{IV} := \left(\frac{1}{n}\sum_{i=1}^{n} z_ix_i^\top\right)^{-1}\left(\frac{1}{n}\sum_{i=1}^{n} z_iY_i\right).$$
However, when the model is over-identified, $E[Z^\top X]$ is no longer square, and hence no
longer invertible.
Recall our dz × 1 moment conditions $E[Z^\top(y - X\beta)] = 0$.
When dz > dx, verify there may be no value of β that satisfies all these
moment conditions exactly!
Indeed, there are dz moments (or equations), and only dx parameters β.
Instead we often target a value of β that minimizes the quadratic form:
$$Q(\beta, W) := \left(E[Z^\top(y - X\beta)]\right)^\top W \left(E[Z^\top(y - X\beta)]\right),$$
where W is a dz × dz weighting matrix.
The sample analog of this quadratic form is given by:
$$Q_n(\beta, \hat W) := \left(\frac{1}{n}\sum_{i=1}^{n} z_i(y_i - x_i^\top\beta)\right)^\top \hat W \left(\frac{1}{n}\sum_{i=1}^{n} z_i(y_i - x_i^\top\beta)\right),$$
where $\hat W$ can be some stochastic matrix estimator of W. Then choose β to
minimize $Q_n(\beta, \hat W)$.
This is an instance of the generalized method of moments (GMM).
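The sample quadratic form can be written as a short function. The sketch below is illustrative only: the function name `gmm_objective` and the simulated design (three instruments, one endogenous regressor, true coefficient 1.5) are assumptions made for this example.

```python
import numpy as np

def gmm_objective(beta, y, X, Z, W):
    """Sample quadratic form Q_n(beta, W) = g' W g with g = (1/n) Z'(y - X beta)."""
    g = Z.T @ (y - X @ beta) / len(y)    # the d_z sample moment conditions
    return float(g @ W @ g)

rng = np.random.default_rng(2)
n = 2_000
Z = rng.normal(size=(n, 3))                          # d_z = 3 simulated instruments
u = rng.normal(size=n)                               # omitted factor causing endogeneity
X = (Z @ np.array([1.0, 0.5, -0.5]) + u + rng.normal(size=n)).reshape(-1, 1)  # d_x = 1
beta_true = np.array([1.5])
y = X @ beta_true + u + rng.normal(size=n)

W = np.linalg.inv(Z.T @ Z / n)                       # one possible weighting matrix
q_true = gmm_objective(beta_true, y, X, Z, W)        # close to zero at the true value
q_far = gmm_objective(np.array([0.0]), y, X, Z, W)   # much larger away from it
print(q_true, q_far)
```

With more moments than parameters, the objective is generally not exactly zero even at the true β in a finite sample, which is why it is minimized rather than solved.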
The idea of GMM is natural: choose the value of β that minimizes the
weighted, squared violations of the sample analog moment conditions.
GMM is VERY general, and unifies many estimators we have already seen.
Replace each zi with xi and $\hat W$ with I. Then $\hat\beta$ (the OLS estimator)
minimizes $Q_n(\beta, \hat W)$.
Replace each zi with xi and $\hat W$ with $\hat\Omega^{-1}$. Then $\hat\beta_{FGLS}$ minimizes
$Q_n(\beta, \hat W)$.
Suppose dz = dx and replace $\hat W$ with I. Then $\hat\beta_{IV}$ minimizes $Q_n(\beta, \hat W)$.
How should we set Ŵ in the case of an over-identified linear IV model?
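For now we will set $\hat W = (Z^\top Z/n)^{-1}$ and will define the two-stage least
squares (2SLS) estimator $\hat\beta_{2SLS}$ as:
$$\hat\beta_{2SLS} := \arg\min_{\beta \in \mathbb{R}^{d_x}} Q_n(\beta, \hat W).$$
Lemma
Under assumptions 2SLS1-2SLS2, for large enough n the matrix $n^{-1}(X^\top P_zX)$ is
finite and non-singular a.s., where $P_z := Z(Z^\top Z)^{-1}Z^\top$ is the projection matrix
onto the column space of Z.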
Proposition
Suppose that 2SLS1-2SLS2 hold. Then (for n large enough) the estimator $\hat\beta_{2SLS}$
is given by:
$$\hat\beta_{2SLS} = (X^\top P_zX)^{-1}X^\top P_zy.$$
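As a sketch of where this closed form comes from (a standard first-order-condition argument, not necessarily the proof given in lecture), take $\hat W = (Z^\top Z/n)^{-1}$ and write $P_z = Z(Z^\top Z)^{-1}Z^\top$. The objective then collapses to a projected sum of squares:

```latex
\begin{aligned}
Q_n(\beta,\hat W)
  &= \frac{1}{n^2}(y-X\beta)^\top Z\left(\frac{Z^\top Z}{n}\right)^{-1} Z^\top (y-X\beta)
   = \frac{1}{n}(y-X\beta)^\top P_z (y-X\beta), \\
\frac{\partial Q_n(\beta,\hat W)}{\partial \beta}
  &= -\frac{2}{n} X^\top P_z (y-X\beta) = 0
  \;\Longrightarrow\; X^\top P_z X\,\hat\beta_{2SLS} = X^\top P_z y .
\end{aligned}
```

Because $X^\top P_zX = (P_zX)^\top(P_zX)$ is positive semi-definite, and non-singular for large enough n by the Lemma, the Hessian $2n^{-1}X^\top P_zX$ is positive definite and the stationary point is the unique minimizer.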
So what are the properties of $\hat\beta_{2SLS}$?
Taking expectations we have:
$$E[\hat\beta_{2SLS}] = \beta + E[(X^\top P_zX)^{-1}X^\top P_z\varepsilon].$$
In general, there are no reasonable additional assumptions which guarantee
the second term is zero.
Under our current assumptions, the second expectation may not even be
defined!
Unbiasedness is thus not relevant in the context of IV estimators, but we can
still discuss consistency.
Before doing so, we should answer an important question: why is $\hat\beta_{2SLS}$
called the two stage least squares estimator?
Let's suppose dx = 1 so that X is a vector.
Then $P_zX$ is just the OLS fitted value of a regression of X on Z.
The OLS coefficient of a regression of y on the fitted value $P_zX$ is given by:
$$\hat\gamma = ((P_zX)^\top(P_zX))^{-1}(P_zX)^\top y.$$
But $P_z$ is a projection matrix: symmetric ($P_z^\top = P_z$) and idempotent
($P_z = P_zP_z$).
Thus:
$$\begin{aligned}
\hat\gamma &= ((P_zX)^\top(P_zX))^{-1}(P_zX)^\top y \\
&= (X^\top P_zP_zX)^{-1}X^\top P_zy \\
&= (X^\top P_zX)^{-1}X^\top P_zy \\
&= \hat\beta_{2SLS}.
\end{aligned}$$
Thus, $\hat\beta_{2SLS}$ can be obtained by (i) regressing X on Z and obtaining the
fitted values $P_zX$, and (ii) regressing y on the fitted values $P_zX$. Hence its
name!
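The equivalence between the closed form and the two-stage procedure can be checked numerically. The following sketch uses simulated data with dx = 1 and two instruments; the design and coefficient values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000
Z = rng.normal(size=(n, 2))                        # two simulated instruments
u = rng.normal(size=n)                             # omitted factor causing endogeneity
X = (Z @ np.array([1.0, -1.0]) + u + rng.normal(size=n)).reshape(-1, 1)
y = X[:, 0] * 2.0 + u + rng.normal(size=n)         # true coefficient 2.0 (illustrative)

# Closed form (X' Pz X)^{-1} X' Pz y, computing Pz X without forming the n x n matrix Pz
PzX = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
beta_closed = np.linalg.solve(X.T @ PzX, PzX.T @ y)

# Stage (i): regress X on Z and keep the fitted values
first_stage_coef, *_ = np.linalg.lstsq(Z, X, rcond=None)
X_hat = Z @ first_stage_coef
# Stage (ii): regress y on the fitted values
beta_two_stage, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
print(beta_closed, beta_two_stage)
```

The two point estimates agree up to floating-point error. The standard errors reported by a naive second-stage OLS regression are a different matter, since they would be based on residuals computed from the fitted values rather than from X itself.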
Theorem
Under Assumption 2SLS1-2SLS2, $\hat\beta_{2SLS}$ is (strongly) consistent for β.
We will now derive the asymptotic distribution.
Theorem
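Under Assumption 2SLS1-2SLS3, the 2SLS estimator $\hat\beta_{2SLS}$ satisfies:
$$\sqrt{n}(\hat\beta_{2SLS} - \beta) \overset{d}{\to} \xi_{d_z} \sim N(0, \Sigma_{2SLS}),$$
where:
$$\Sigma_{2SLS} = \sigma^2\left(E[x_iz_i^\top]\,E[z_iz_i^\top]^{-1}\,E[z_ix_i^\top]\right)^{-1}.$$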
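Note that this implies:
$$\hat\beta_{2SLS} \overset{a}{\sim} N(\beta, n^{-1}\Sigma_{2SLS}).$$
We have:
$$n^{-1}\Sigma_{2SLS} = \frac{\sigma^2}{n}\left(\left(\operatorname*{plim}_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n} x_iz_i^\top\right)\left(\operatorname*{plim}_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n} z_iz_i^\top\right)^{-1}\left(\operatorname*{plim}_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n} z_ix_i^\top\right)\right)^{-1}.$$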
How should we estimate the covariance matrix?
To estimate the covariance matrix in the previous theorem, typically we first
compute the 2SLS residuals:
$$\hat\varepsilon_i = y_i - x_i^\top\hat\beta_{2SLS}.$$
We then estimate $\sigma^2$ using:
$$\hat\sigma^2 = \frac{1}{n - d_x}\sum_{i=1}^{n}\hat\varepsilon_i^2.$$
A consistent estimator of the covariance matrix $n^{-1}\Sigma_{2SLS}$ is then:
$$\hat\Sigma_{2SLS} = \hat\sigma^2\left(X^\top P_zX\right)^{-1}.$$
NOTE: This estimates $n^{-1}\Sigma_{2SLS}$, not $\Sigma_{2SLS}$.
NOTE: the asymptotic standard error of $\hat\beta_{j,2SLS}$ is the square root of the jth
diagonal element.
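The three steps above (residuals, variance estimate, covariance matrix) translate directly into code. This is a sketch on simulated data; the helper name `tsls_with_se` and the design are assumptions made for illustration.

```python
import numpy as np

def tsls_with_se(y, X, Z):
    """Return (beta_hat, cov_hat, se) for 2SLS under homoskedasticity."""
    n, d_x = X.shape
    PzX = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)    # fitted values P_z X
    XPzX = X.T @ PzX                               # X' P_z X
    beta_hat = np.linalg.solve(XPzX, PzX.T @ y)
    resid = y - X @ beta_hat                       # residuals use X, not P_z X
    sigma2_hat = resid @ resid / (n - d_x)
    cov_hat = sigma2_hat * np.linalg.inv(XPzX)     # estimates n^{-1} Sigma_2SLS
    return beta_hat, cov_hat, np.sqrt(np.diag(cov_hat))

rng = np.random.default_rng(4)
n = 2_000
Z = rng.normal(size=(n, 2))
u = rng.normal(size=n)
X = (Z @ np.array([1.0, -1.0]) + u + rng.normal(size=n)).reshape(-1, 1)
y = X[:, 0] * 2.0 + u + rng.normal(size=n)         # true coefficient 2.0 (illustrative)

beta_hat, cov_hat, se = tsls_with_se(y, X, Z)
print(beta_hat, se)
```

Because `cov_hat` already estimates $n^{-1}\Sigma_{2SLS}$, its diagonal square roots are the standard errors themselves and need no further division by $\sqrt{n}$.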
We will not prove any theoretical results for the case of heteroskedasticity.
However, a “White-Type” heteroskedasticity robust covariance matrix exists
for the 2SLS estimator.
In particular, the “robust” asymptotic covariance matrix estimator is:
$$(X^\top P_zX)^{-1}\left(X^\top P_z\,\mathrm{diag}(\hat\varepsilon_1^2, \ldots, \hat\varepsilon_n^2)\,P_zX\right)(X^\top P_zX)^{-1}.$$
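A sketch of a White-type robust covariance for 2SLS follows. It assumes the usual sandwich form, with $(X^\top P_zX)^{-1}$ as the outer terms and the squared 2SLS residuals weighting the fitted values in the middle; the function name and the heteroskedastic simulated design are hypothetical.

```python
import numpy as np

def tsls_robust_cov(y, X, Z):
    """2SLS estimate with a White-type sandwich covariance (assumed standard form)."""
    PzX = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)      # fitted values P_z X
    bread = np.linalg.inv(X.T @ PzX)                 # (X' P_z X)^{-1}
    beta_hat = bread @ (PzX.T @ y)
    resid = y - X @ beta_hat                         # 2SLS residuals
    meat = (PzX * resid[:, None] ** 2).T @ PzX       # sum_i e_i^2 xhat_i xhat_i'
    return beta_hat, bread @ meat @ bread

rng = np.random.default_rng(7)
n = 2_000
Z = rng.normal(size=(n, 2))
u = rng.normal(size=n)
X = (Z @ np.array([1.0, -1.0]) + u + rng.normal(size=n)).reshape(-1, 1)
eps = u + np.abs(Z[:, 0]) * rng.normal(size=n)       # error variance depends on Z
y = X[:, 0] * 2.0 + eps

beta_hat, cov_robust = tsls_robust_cov(y, X, Z)
print(beta_hat, np.sqrt(np.diag(cov_robust)))
```

The middle term is built row by row from the fitted values, which avoids forming any n × n matrix.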
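There is one loose end to tie off: recall that the 2SLS estimator minimizes:
$$Q_n(\beta, \hat W) := \left(\frac{1}{n}\sum_{i=1}^{n} z_i(y_i - x_i^\top\beta)\right)^\top \hat W \left(\frac{1}{n}\sum_{i=1}^{n} z_i(y_i - x_i^\top\beta)\right),$$
where we chose $\hat W = (Z^\top Z/n)^{-1}$.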
But why choose this form for Ŵ ? It turns out this choice was optimal in a
specific sense.
For an arbitrary choice of positive definite $\hat W$ (at least a.s. for large enough
n), we have that the minimizer of $Q_n(\beta, \hat W)$ is of the form:
$$\tilde\beta = (X^\top Z\hat WZ^\top X)^{-1}X^\top Z\hat WZ^\top y.$$
This follows by a similar proof to our previous Lemma/Proposition for the
2SLS estimator.
White (1984) Definition 4.43: For two $\sqrt{n}$-consistent (for $\theta_0$),
asymptotically normal estimators $\hat\theta_1$ and $\hat\theta_2$, the estimator $\hat\theta_1$ is
asymptotically efficient relative to $\hat\theta_2$ if and only if for all n sufficiently
large $\mathrm{Avar}(\sqrt{n}(\hat\theta_2 - \theta_0)) - \mathrm{Avar}(\sqrt{n}(\hat\theta_1 - \theta_0))$ is positive semi-definite for any
$\theta_0$.
Given a class of estimators, an estimator is asymptotically efficient within the
class if it is asymptotically efficient relative to every member of the class.
Theorem
Suppose Assumptions 2SLS1-2SLS2 hold, and suppose $\hat W \overset{a.s.}{\to} W$, a symmetric
and positive definite matrix for large enough n. Then the 2SLS estimator $\hat\beta_{2SLS}$
is asymptotically efficient in the class of estimators of the form:
$$\tilde\beta = (X^\top Z\hat WZ^\top X)^{-1}X^\top Z\hat WZ^\top y.$$
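The efficiency claim can be illustrated numerically. The sketch below checks two things on simulated data: that choosing $\hat W = (Z^\top Z/n)^{-1}$ in the general formula reproduces 2SLS, and that the variance gap relative to $\hat W = I$ is positive semi-definite. The variance expression in `avar` is the standard sandwich form under homoskedasticity, which is an assumption of this sketch and is not derived in the slides.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5_000
mix = np.array([[1.0, 0.5, 0.0], [0.0, 1.0, 0.5], [0.0, 0.0, 1.0]])
Z = rng.normal(size=(n, 3)) @ mix                  # correlated simulated instruments
u = rng.normal(size=n)
X = (Z @ np.array([1.0, 0.5, -0.5]) + u + rng.normal(size=n)).reshape(-1, 1)
y = X[:, 0] * 1.5 + u + rng.normal(size=n)

def beta_tilde(W):
    """Minimizer of Q_n(beta, W): (X'Z W Z'X)^{-1} X'Z W Z'y."""
    XZ = X.T @ Z
    return np.linalg.solve(XZ @ W @ XZ.T, XZ @ W @ (Z.T @ y))

def avar(W, sigma2=1.0):
    """Sample-analog sandwich variance under homoskedasticity (assumed form)."""
    A = Z.T @ X / n                                # estimates E[z x']
    Qzz = Z.T @ Z / n                              # estimates E[z z']
    bread = np.linalg.inv(A.T @ W @ A)
    return sigma2 * bread @ (A.T @ W @ Qzz @ W @ A) @ bread

W_2sls = np.linalg.inv(Z.T @ Z / n)
W_id = np.eye(Z.shape[1])

PzX = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
beta_2sls = np.linalg.solve(X.T @ PzX, PzX.T @ y)
gap = avar(W_id) - avar(W_2sls)                    # should be positive semi-definite
print(beta_tilde(W_2sls), beta_2sls, gap)
```

With a single regressor the gap is a scalar, so positive semi-definiteness just means it is non-negative.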
Before proving this Theorem, we must prove the following Lemma:
Lemma
Suppose A and B are two k × k symmetric and p.d. matrices. Then A − B is
p.s.d. if and only if $B^{-1} - A^{-1}$ is p.s.d.
Some Additional Remarks on IV Estimators
2SLS is useful to resolve the problem of omitted variables when instruments
are available satisfying the exclusion restriction and rank condition.
Note we have shown the 2SLS estimator is:
1 generally (very) biased;
2 consistent;
3 asymptotically normal; and,
4 asymptotically efficient within a certain class of estimators.
Furthermore, the 2SLS estimator is a natural generalization of the OLS
estimator and the IV estimator.
Reduces to the OLS estimator if all regressors are exogenous.
Reduces to the IV estimator if dz = dx (“just-identified” case).
2SLS can also be useful for the measurement error problem.
Recall the measurement error problem from the previous lecture.
The true model is:
$$Y_i = X_{i1}\beta_1 + \ldots + X_{id}^*\beta_d + \varepsilon_i,$$
with $E[\varepsilon_i \mid X_{i1}, \ldots, X_{id}^*] = 0$.
Suppose that $X_{id}^*$ is unobserved, but we observe:
$$X_{id} = X_{id}^* + e_{id}.$$
The model we can estimate is:
$$Y_i = X_{i1}\beta_1 + \ldots + X_{id}\beta_d - e_{id}\beta_d + \varepsilon_i.$$
Intuitively, the source of inconsistency in the measurement error problem is
the fact that $X_{id}$ is correlated with the composite error term $\varepsilon_i - e_{id}\beta_d$.
The IV solution to this problem is to find an instrument Zid for the variable
Xid that, intuitively, is (i) uncorrelated with the composite error term
εi − eidβd, and (ii) explains some of the variation in Xid.
One common instrument is simply another “measurement” of $X_{id}^*$:
$$Z_{id} = X_{id}^* + v_{id}.$$
Define the vectors:
$$x_i = \begin{pmatrix} X_{i1} \\ \vdots \\ X_{i,d-1} \\ X_{id} \end{pmatrix}, \qquad z_i = \begin{pmatrix} X_{i1} \\ \vdots \\ X_{i,d-1} \\ Z_{id} \end{pmatrix}.$$
Then for IV consistency the main conditions are (i) $E[\varepsilon_i - e_{id}\beta_d \mid z_i] = 0$ and
(ii) $E[z_ix_i^\top]$ has full column rank.
Both these conditions are reasonable if $Z_{id}$ is a measure of $X_{id}^*$ and, for
example, if $v_{id} \perp e_{id}$.
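A small simulation illustrates the second-measurement idea in the single-regressor case. All numbers here (unit variances, true coefficient 1.0) are hypothetical, and the two measurement errors are drawn independently, in line with the condition that vid and eid are unrelated.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200_000
beta = 1.0                                 # true coefficient (illustrative choice)
x_star = rng.normal(size=n)                # true regressor X*, unobserved in practice
e = rng.normal(size=n)                     # measurement error in X
v = rng.normal(size=n)                     # measurement error in Z, independent of e
x = x_star + e                             # first (mismeasured) measurement
z = x_star + v                             # second measurement, used as the instrument
y = beta * x_star + rng.normal(size=n)

beta_ols = (x @ y) / (x @ x)               # attenuated toward zero by the error e
beta_iv = (z @ y) / (z @ x)                # sample analog of E[Z X]^{-1} E[Z Y]
print(beta_ols, beta_iv)
```

In this design OLS converges to about half the true coefficient, while the IV estimate recovers it, because Z is correlated with X only through X*.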
Despite the fact that IV methods can solve important omitted variable and
measurement error problems, they have their own issues.
It may be that “the cure is worse than the disease.”
In particular, even small violations of the main IV assumptions can have
serious implications.
Consider the simplified system:
Yi = β0 +Xiβ1 + εi,
Xi = δ0 + Ziδ1 + vi.
where Xi is an endogenous variable, and Zi is our instrument.
Recall that we require E[εi | Zi] = 0 and E[ZiXi] ≠ 0.
Focus on consistent estimation of β1; thus it is WLOG that we take
E[εi] = E[vi] = 0.
WLOG we can also take E[Zi] = 0.
Consider a small violation of the exogeneity assumption.
Berkowitz, Caner and Fang (2008) study the problem of “nearly exogenous”
instruments, where:
$$\mathrm{Cov}(Z_i, \varepsilon_i) = E[Z_i\varepsilon_i] = E[Z_iE[\varepsilon_i \mid Z_i]] = c/\sqrt{n}.$$
Under this assumption, it is possible to show:
$$\sqrt{n}(\hat\beta_{2SLS} - \beta) \overset{d}{\to} \xi_{d_z} \sim N(Hc, \Sigma_{2SLS}),$$
where
$$H = \left(E[x_iz_i^\top]\,E[z_iz_i^\top]^{-1}\,E[z_ix_i^\top]\right)^{-1}E[x_iz_i^\top].$$
Thus, the distribution of the 2SLS estimator has been “shifted” by the
unknown parameter c, which is impossible to consistently estimate.
In other words, there is no way of telling from the data whether we are in this
situation...
Note, without dividing by $\sqrt{n}$, it is easy to show that this asymptotic bias
diverges to ±∞ if |c| > 0 (elementwise).
Serious issues can also arise in the presence of weak instruments.
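It is easy to show in our simple model that:
$$\operatorname*{plim}_{n\to\infty}\hat\beta_{2SLS} = \beta + \frac{\mathrm{Cov}(Z_i, \varepsilon_i)}{\mathrm{Cov}(Z_i, X_i)} = \beta + \frac{\sigma_\varepsilon}{\sigma_x}\cdot\frac{\mathrm{Corr}(Z_i, \varepsilon_i)}{\mathrm{Corr}(Z_i, X_i)}.$$
Thus, even if there is an extremely small correlation between Zi and εi, if
Corr(Zi, Xi) is small then the inconsistency of 2SLS can be very large.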
In fact, you can show that the magnitude of inconsistency is smaller with
OLS if:
|Corr(Xi, εi)| · |Corr(Zi, Xi)| < |Corr(Zi, εi)|.
Thus, with weak enough instruments you can make a good claim that it is
actually better to just use our inconsistent OLS estimator!
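A quick arithmetic check makes the comparison tangible. The correlations below are made-up values for illustration, and the OLS inconsistency is taken to be (σε/σx)·Corr(Xi, εi), the standard expression that the inequality above implicitly relies on.

```python
def iv_inconsistency(sigma_eps, sigma_x, corr_z_eps, corr_z_x):
    """plim(beta_2SLS) - beta in the simple model."""
    return (sigma_eps / sigma_x) * corr_z_eps / corr_z_x

def ols_inconsistency(sigma_eps, sigma_x, corr_x_eps):
    """plim(beta_OLS) - beta in the simple model (standard expression, assumed here)."""
    return (sigma_eps / sigma_x) * corr_x_eps

# Hypothetical values: a slightly invalid but very weak instrument
corr_z_eps, corr_z_x, corr_x_eps = 0.02, 0.05, 0.30

iv_gap = iv_inconsistency(1.0, 1.0, corr_z_eps, corr_z_x)
ols_gap = ols_inconsistency(1.0, 1.0, corr_x_eps)
ols_is_better = abs(corr_x_eps) * abs(corr_z_x) < abs(corr_z_eps)
print(iv_gap, ols_gap, ols_is_better)
```

Here a correlation of only 0.02 between the instrument and the error is enough to make the 2SLS inconsistency (about 0.4) exceed that of OLS (0.3), because the instrument is so weakly related to Xi.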
Weak instruments can also cause 2SLS standard errors to be large.
The lesson: even very small violations of the IV assumptions can cause very
erratic behaviour for the 2SLS estimand.
Preview for Next Lecture
In this lecture we discussed how IV methods can be helpful with problems of
omitted variables or measurement error.
Note we have shown the 2SLS estimator is:
1 generally (very) biased;
2 consistent;
3 asymptotically normal; and,
4 asymptotically efficient within a certain class of estimators.
However, the 2SLS estimand can behave erratically if exogeneity is violated,
or if instruments are weak.
In the next lecture we will generalize further, and will consider simultaneous
equations models.
References
References cited in the slides:
Angrist, J.D. and Krueger, A.B. (1991). Does compulsory school attendance
affect schooling and earnings? The Quarterly Journal of Economics,
106(4):979-1014.
Angrist, J.D. and Evans, W.N. (1998). Children and Their Parents’ Labor
Supply: Evidence from Exogenous Variation in Family Size. The American
Economic Review, 88(3):450-477.
Berkowitz, D., Caner, M. and Fang, Y. (2008). Are “Nearly Exogenous
Instruments” reliable? Economics Letters, 101(1):20-23.
White, H. (1984). Asymptotic Theory for Econometricians. Academic Press.
