Imperial College London Business School
FINANCIAL STATISTICS
LEC 4: MLE and GMM
Paolo Zaffaroni
Maximum Likelihood Estimation
Let $y = (y_1, \ldots, y_T)$ be a sample of dimension $T$ of a rv with known pdf $f(y; \theta_0)$ that depends on an unknown parameter $\theta_0$ of finite dimension $k \times 1$.
$\theta_0$ is called the true parameter vector.
Knowing what the pdf is amounts to a lot of information!
We know that if the $y_t$ are independent then
$$f(y; \theta_0) = f(y_1; \theta_0) f(y_2; \theta_0) \cdots f(y_T; \theta_0).$$
In the real world, nothing is independent! In general:
$$f(y; \theta_0) = f(y_1; \theta_0) f(y_2 \mid y_1; \theta_0) \cdots f(y_T \mid y_{T-1}, \ldots, y_1; \theta_0),$$
where $f(y_t \mid y_{t-1}, \ldots, y_1; \theta_0)$ is the conditional pdf.
For an arbitrary vector $\theta$ we define the likelihood function:
$$L(\theta) \equiv f(y; \theta),$$
which, taking logarithms and using the above factorization (remember: the log of a product is the sum of the logs), yields
$$\ell(\theta) = \log L(\theta) = \sum_{t=2}^{T} \log f(y_t \mid y_{t-1}, \ldots, y_1; \theta) + \log f(y_1; \theta).$$
The MLE is defined by $\hat\theta = \arg\max_{\theta \in \Theta} \ell(\theta)$, where $\Theta \subseteq \mathbb{R}^k$.
Remark: in probability we use the pdf $f(y; \theta_0)$ for a given true value $\theta_0$; in statistics we use the likelihood $f(y; \theta) = L(\theta)$ for given data $y$. Complete change of perspective!
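Not from the slides: a minimal numerical sketch of this idea in Python (the course demos use MatLab), maximizing an i.i.d. Gaussian log-likelihood with a generic optimizer. The simulated data and starting values are my own choices.

```python
# Minimal numerical MLE sketch (illustration only): maximize the i.i.d.
# Gaussian log-likelihood l(theta) over theta = (mu, log sigma).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
y = rng.normal(loc=1.0, scale=2.0, size=500)    # simulated sample

def neg_loglik(theta, y):
    mu, log_sigma = theta
    sigma2 = np.exp(2 * log_sigma)              # enforce sigma^2 > 0
    T = y.size
    # -l(theta) = T/2 log(2 pi sigma^2) + sum (y_t - mu)^2 / (2 sigma^2)
    return 0.5 * T * np.log(2 * np.pi * sigma2) + np.sum((y - mu) ** 2) / (2 * sigma2)

res = minimize(neg_loglik, x0=np.array([0.0, 0.0]), args=(y,), method="BFGS")
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
print(mu_hat, sigma_hat)    # close to the sample mean and (biased) sample std
```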
Since the log function is monotone, is there any difference from using $\ell(\theta)$ rather than $L(\theta)$?
Assuming $\ell(\cdot)$ twice differentiable, define the main 'actors':
$$s(\theta) \equiv \frac{\partial \ell(\theta)}{\partial \theta} \quad \text{(score vector)},$$
$$H(\theta) \equiv \frac{\partial^2 \ell(\theta)}{\partial \theta \partial \theta'} \quad \text{(Hessian matrix)},$$
$$I(\theta) \equiv -E H(\theta) \quad \text{(information matrix)}.$$
All the above quantities are also functions of the data $y$, except the information matrix $I(\theta_0)$. (Why?)
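A sketch (my own illustration, continuing the Gaussian example above) of how these 'actors' can be approximated numerically: finite-difference score and Hessian, with standard errors from the inverse of $-H$.

```python
# Finite-difference score and Hessian at the optimum (sketch; continues the
# Gaussian MLE example above). Standard errors follow from the inverse of -H.
import numpy as np

def num_grad(f, x, h=1e-5):
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x); e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def num_hess(f, x, h=1e-4):
    H = np.zeros((x.size, x.size))
    for i in range(x.size):
        e = np.zeros_like(x); e[i] = h
        H[:, i] = (num_grad(f, x + e, h) - num_grad(f, x - e, h)) / (2 * h)
    return H

# With y, res and neg_loglik from the previous sketch:
# loglik = lambda th: -neg_loglik(th, y)
# score  = num_grad(loglik, res.x)               # ~ 0 at the MLE (FOC)
# H      = num_hess(loglik, res.x)               # Hessian; I_hat = -H
# se     = np.sqrt(np.diag(np.linalg.inv(-H)))   # asymptotic standard errors
```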
Now you understand why the MLE is so powerful (remember, we know the pdf!).
Very desirable properties of the MLE: under regularity conditions, as $T \to \infty$,
$\hat\theta \to_p \theta_0$ (consistency);
$\sqrt{T}(\hat\theta - \theta_0) \to_d N(0, \mathcal{I}^{-1}(\theta_0))$ (asymptotic normality), setting $\mathcal{I}(\theta_0) = \mathrm{plim}\, I(\theta_0)/T$;
$\hat\theta$ is asymptotically efficient and invariant.
Asymptotic efficiency means that for any other estimator with acm (asymptotic covariance matrix) $V$, the difference $V - \mathcal{I}^{-1}(\theta_0)$ is a 'non-negative' matrix (more formally, it is positive semi-definite).
The MLE is more precise than anything else! This is known as the Cramér-Rao result.
Invariance means that the MLE of $g(\theta_0)$, for any continuously differentiable function $g(\cdot)$, is $g(\hat\theta)$; for example, the MLE of $\sigma_0$ is $\sqrt{\hat\sigma^2}$. Important, since there is no need to perform another numerical optimization!
Now, the MLE $\hat\theta$ is (almost) never obtained in closed form, unlike the OLS. This is why we distinguish $\theta_0$ (unique true value), $\hat\theta$ (unique MLE), $\theta$ (generic parameter value).
To gauge some of its properties, differentiating both sides of $1 = \int f(y; \theta_0)\, dy$ gives (note that $\int$ is a complicated $T$-dimensional integral)
$$\frac{\partial 1}{\partial \theta} = 0 = \frac{\partial}{\partial \theta} \int f(y; \theta_0)\, dy = \int \frac{\partial \log f(y; \theta_0)}{\partial \theta}\, f(y; \theta_0)\, dy = E\, s(\theta_0).$$
We learn that the (random) score has zero mean!
Differentiating the above expression once more:
$$\int \left( \frac{\partial^2 \log f(y; \theta_0)}{\partial \theta \partial \theta'}\, f(y; \theta_0) + \frac{\partial \log f(y; \theta_0)}{\partial \theta}\, \frac{\partial f(y; \theta_0)}{\partial \theta'} \right) dy = 0.$$
Rearranging, and using $\partial f / \partial \theta = f\, \partial \log f / \partial \theta$,
$$-E \frac{\partial^2 \log f(y; \theta_0)}{\partial \theta \partial \theta'} = -E H(\theta_0) = E\, s(\theta_0) s(\theta_0)'.$$
But this is precisely $I(\theta_0) = E\, s(\theta_0) s(\theta_0)'$, so the information matrix gives $\mathrm{var}(s(\theta_0))$, since $s(\theta_0)$ has zero mean!
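A quick simulation check of this information-matrix equality (my own illustration), under an assumed $N(\theta_0, 1)$ model, where $s(\theta_0) = \sum_t (y_t - \theta_0)$ and $H(\theta_0) = -T$:

```python
# Simulation check of -E[H] = E[s s'] for y_t ~ N(theta0, 1), i.i.d. (sketch).
import numpy as np

rng = np.random.default_rng(1)
theta0, T, n_rep = 0.5, 50, 100_000
y = rng.normal(theta0, 1.0, size=(n_rep, T))
scores = (y - theta0).sum(axis=1)    # s(theta0) in each replication
print(scores.mean())                 # ~ 0 : the score has zero mean
print(scores.var(), T)               # ~ T : E[s^2] equals -E[H] = T
```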
Maximum Likelihood Estimation: opening black box
Opening the black box (a little bit)!
How can we say that the MLE has all these nice statistical properties, without having a closed form? Use a Taylor expansion of the first-order condition (why do we have a zero?):
$$s(\hat\theta) = 0 = s(\theta_0) + \left( \frac{\partial s(\theta_0)}{\partial \theta'} \right) (\hat\theta - \theta_0) + \text{small remainder} \ldots$$
But what is the derivative of the score, $\partial s(\theta)/\partial \theta'$? The Hessian (why?), so one gets
$$-H^{-1}(\theta_0)\, s(\theta_0) = (\hat\theta - \theta_0),$$
re-written as
$$I^{-1}(\theta_0)\, s(\theta_0) \approx (\hat\theta - \theta_0),$$
since $I(\theta_0)$ is the average of $-H(\theta_0)$ (why?).
Maximum Likelihood Estimation: regression example
Consider the linear regression model
$$y = X\beta_0 + u, \qquad u \sim N(0, \sigma_0^2 I),$$
where now we denote by $\beta_0$ and $\sigma_0^2$ the true values. Assume that the Gauss-Markov conditions hold.
Then $y$ is also multi-normal, with mean and variance
$$E(y|X) = X\beta_0, \qquad \mathrm{var}(y|X) = \sigma_0^2 I,$$
so $y|X \sim N(X\beta_0, \sigma_0^2 I)$.
We could also use the Jacobian of the transformation to get the pdf of $y$, but this is easier!
For arbitrary $\theta = (\sigma^2, \beta')'$ (not the true values!)
$$\ell(\theta) = -\frac{T}{2} \log(2\pi) - \frac{T}{2} \log(\sigma^2) - \frac{1}{2\sigma^2} (y - X\beta)'(y - X\beta).$$
Differentiating and equating to zero yields the FOC:
$$\frac{\partial \ell(\hat\theta)}{\partial \beta} = \frac{1}{\hat\sigma^2} (X'y - X'X\hat\beta) = 0,$$
$$\frac{\partial \ell(\hat\theta)}{\partial \sigma^2} = -\frac{T}{2\hat\sigma^2} + \frac{1}{2\hat\sigma^4} (y - X\hat\beta)'(y - X\hat\beta) = 0,$$
yielding the MLE
$$\hat\beta = (X'X)^{-1} X'y, \qquad \hat\sigma^2 = (y - X\hat\beta)'(y - X\hat\beta)/T.$$
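A short sketch (simulated data, my own illustration) of this closed-form Gaussian-regression MLE, checked against OLS:

```python
# Sketch: closed-form Gaussian-regression MLE, compared with OLS.
import numpy as np

rng = np.random.default_rng(2)
T, k = 200, 3
X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])
beta0, sigma0 = np.array([1.0, 0.5, -0.3]), 0.8
y = X @ beta0 + rng.normal(0.0, sigma0, size=T)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)    # (X'X)^{-1} X'y : OLS = MLE
resid = y - X @ beta_hat
sigma2_mle = resid @ resid / T                  # divides by T (biased)
sigma2_ols = resid @ resid / (T - k)            # divides by T-k (unbiased)
print(beta_hat, sigma2_mle, sigma2_ols)
```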
Do you recognize $\hat\beta$? It coincides with the OLS! This means that when the $u$ are normal, OLS is not just BLUE but is in fact the best estimator (most efficient)!
For $\hat\sigma^2$, we can easily see that it is biased (but only so in finite samples). In fact
$$E\hat\sigma^2 = \frac{E\, e'e}{T} = \sigma^2\, \frac{T - k}{T} < \sigma^2,$$
but it approaches $\sigma^2$ as $T \to \infty$. We call it asymptotically unbiased.
The second-order derivatives are:
$$\frac{\partial^2 \ell(\theta)}{\partial \beta \partial \beta'} = -\frac{X'X}{\sigma^2}, \qquad
\frac{\partial^2 \ell(\theta)}{\partial \beta \partial \sigma^2} = -\frac{X'u}{\sigma^4}, \qquad
\frac{\partial^2 \ell(\theta)}{\partial (\sigma^2)^2} = \frac{T}{2\sigma^4} - \frac{u'u}{\sigma^6}.$$
Taking conditional expectations:
$$E\left( \frac{\partial^2 \ell(\theta)}{\partial \beta \partial \beta'} \,\Big|\, X \right) = -\frac{X'X}{\sigma^2}, \qquad
E\left( \frac{\partial^2 \ell(\theta)}{\partial \beta \partial \sigma^2} \,\Big|\, X \right) = -\frac{E(X'u|X)}{\sigma^4}, \qquad
E\left( \frac{\partial^2 \ell(\theta)}{\partial (\sigma^2)^2} \,\Big|\, X \right) = \frac{T}{2\sigma^4} - \frac{E(u'u|X)}{\sigma^6}.$$
Since $E(X'u|X) = X'E(u) = 0$ and $E(u'u|X) = T\sigma^2$, substituting and changing sign gives
$$I(\theta) = \frac{1}{\sigma^2} \begin{pmatrix} X'X & 0 \\ 0 & \frac{T}{2\sigma^2} \end{pmatrix}.$$
Dividing by $T$ and taking the limit,
$$\frac{I(\theta)}{T} = \frac{1}{\sigma^2} \begin{pmatrix} \frac{X'X}{T} & 0 \\ 0 & \frac{1}{2\sigma^2} \end{pmatrix} \to_p \mathcal{I}(\theta_0) = \frac{1}{\sigma^2} \begin{pmatrix} Q & 0 \\ 0 & \frac{1}{2\sigma^2} \end{pmatrix}.$$
Its inverse gives the asymptotic covariance matrix (the precision!) of the MLE:
$$\mathcal{I}^{-1}(\theta_0) = \sigma^2 \begin{pmatrix} Q^{-1} & 0 \\ 0 & 2\sigma^2 \end{pmatrix}.$$
Notice again how the formula coincides with the asymptotic properties of OLS for $\hat\beta$.
See how the off-diagonal terms are zero: $\hat\beta$ and $\hat\sigma^2$ are asymptotically independent. We can construct F tests, etc.!
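A sketch of the resulting plug-in standard errors (simulated data, my own illustration), using the block-diagonal structure above: the finite-sample versions $\hat\sigma^2 (X'X)^{-1}$ for $\hat\beta$ and $2\hat\sigma^4/T$ for $\hat\sigma^2$.

```python
# Sketch: plug-in standard errors from the block-diagonal information matrix.
import numpy as np

rng = np.random.default_rng(3)
T = 500
X = np.column_stack([np.ones(T), rng.normal(size=T)])
y = X @ np.array([0.2, 1.0]) + rng.normal(0.0, 0.5, size=T)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ beta_hat
s2_hat = e @ e / T

se_beta = np.sqrt(np.diag(s2_hat * np.linalg.inv(X.T @ X)))  # sigma^2 (X'X)^{-1}
se_s2 = np.sqrt(2.0 * s2_hat**2 / T)                         # 2 sigma^4 / T
print(se_beta, se_s2)
```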
Maximum Likelihood Estimation: empirical example
Estimation results
IBM on the S&P 500 - MLE with normal distribution

                   $\hat\alpha$   $\hat\beta$
  Estimate           0.0007        0.8993
  Standard Error     0.00063       0.0794
  t-stat             1.0443       11.3251

Standard errors derived from the information matrix:
$$\mathcal{I}^{-1}(\hat\theta) = \begin{pmatrix} 0.0001 & 0.0011 \\ 0.0011 & 1.5763 \end{pmatrix}$$
If you compare with the previous empirical example, (nearly) the same as OLS!
Estimation results (cont.)
IBM on the S&P 500 - MLE with Student t distribution with 5 dof

                   $\hat\alpha$   $\hat\beta$
  Estimate           0.0005        0.8962
  Standard Error     0.0004        0.0685
  t-stat             1.0963       13.0923

Standard errors derived from the information matrix:
$$\mathcal{I}^{-1}(\hat\theta) = \begin{pmatrix} 0.00004 & 0.0007 \\ 0.0007 & 1.1713 \end{pmatrix}$$
Now we get different parameter estimates.
Generalized Method of Moments (GMM)
GMM is a general method that nests all the previous ones (OLS, MLE, IV, etc.).
It does not necessarily require knowing the pdf, so it is useful when the likelihood is cumbersome, if not impossible, to compute.
Let's start with the general concept of the Method of Moments: when the regression model $y = X\beta + u$ is correctly specified, then the population moment $E X'u = 0$, which becomes $E X'(y - X\beta) = 0$. (Why?)
Method of Moments
Replacing the population moment by the sample moment and equating it to zero yields 'implicitly' the Method-of-Moments estimator (MM): $T^{-1} X'(y - X\hat\beta_{MM}) = 0$.
Provided $X$ has full column rank, we can solve the system to get
$$\hat\beta_{MM} = (X'X)^{-1} X'y.$$
So MM coincides with OLS!
Notice that we did not need to do any optimization to get $\hat\beta_{MM}$!
Notice that in MM we get as many parameters (elements of $\beta$) as moment conditions $E X'u = 0$.
Generalized Method of Moments (GMM)
Let us consider the GMM approach: often finance and economic theory leads to orthogonality conditions, which characterize some aspects of a given economic model, say
$$E\, g(y_t, X_t, \theta_0) = 0.$$
Here $y$ and $X$ are the endogenous and exogenous variables respectively, $\theta_0$ a $k$-dimensional vector of parameters, and $g(\cdot)$ an $L$-dimensional vector, with $L \ge k$.
Example: $y_t$ is the short-term interest rate, and the theory says that its conditional mean and conditional variance satisfy
$$E_{t-1}(y_t) = a_0 + b_0 y_{t-1}, \qquad \mathrm{var}_{t-1}(y_t) = c_0 + d_0 y_{t-1}^2,$$
for some parameters $a_0, b_0, c_0, d_0$.
These can be re-written in the GMM notation as
$$g(y_t, y_{t-1}, X_{t-1}, \theta_0) = \begin{pmatrix} y_t - a_0 - b_0 y_{t-1} \\ (y_t - E_{t-1}(y_t))^2 - c_0 - d_0 y_{t-1}^2 \end{pmatrix} \otimes X_{t-1},$$
with $X_{t-1}$ being a $k \times 1$ vector of observed variables (can be any!) and $\theta_0 = (a_0, b_0, c_0, d_0)'$. We need $2 \times k \ge 4$ (why?).
The GMM estimator is defined as
$$\hat\theta_{GMM} \equiv \arg\min_{\theta \in \Theta} \hat g'(y, X, \theta)\, W_T\, \hat g(y, X, \theta),$$
where
$$\hat g(y, X, \theta) \equiv \frac{1}{T} \sum_{t=1}^{T} g(y_t, X_t, \theta),$$
and $W_T$ is an $L \times L$ matrix (independent of $\theta$), such that $W_T \to_p W > 0$.
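A toy implementation of this estimator for the interest-rate example above, with instruments $X_{t-1} = (1, y_{t-1})'$ and $W_T = I$; the simulated process and tuning choices are my own assumptions.

```python
# Toy GMM sketch for the interest-rate example: mean and variance conditions
# interacted with instruments X_{t-1} = (1, y_{t-1})', weighting W_T = I.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
T = 1_000
y = np.empty(T); y[0] = 5.0
for t in range(1, T):                  # simulate: mean a+b*y, variance c+d*y^2
    mean, var = 0.25 + 0.95 * y[t - 1], 0.01 + 0.001 * y[t - 1] ** 2
    y[t] = mean + np.sqrt(var) * rng.normal()

def g_bar(theta, y):
    a, b, c, d = theta
    yt, ylag = y[1:], y[:-1]
    u1 = yt - a - b * ylag                                   # mean condition
    u2 = u1 ** 2 - c - d * ylag ** 2                         # variance condition
    Z = np.column_stack([np.ones_like(ylag), ylag])          # instruments
    g = np.column_stack([u1[:, None] * Z, u2[:, None] * Z])  # L = 4 moments
    return g.mean(axis=0)

def objective(theta, y, W=np.eye(4)):
    gb = g_bar(theta, y)
    return gb @ W @ gb                 # ghat' W_T ghat

res = minimize(objective, x0=np.array([0.1, 0.9, 0.05, 0.01]), args=(y,),
               method="Nelder-Mead", options={"maxiter": 5000})
print(res.x)    # roughly (0.25, 0.95, 0.01, 0.001)
```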
Under regularity conditions, $\hat\theta_{GMM} \to_p \theta_0$ and
$$\sqrt{T}(\hat\theta_{GMM} - \theta_0) \to_d N\big(0,\, (G'WG)^{-1} G'W \Omega W G (G'WG)^{-1}\big),$$
setting
$$G = E\, \frac{\partial g(y_t, X_t, \theta)}{\partial \theta'} \Big|_{\theta = \theta_0}, \qquad
\Omega = E\, g(y_t, X_t, \theta_0)\, g(y_t, X_t, \theta_0)'.$$
Notice the sandwich form of the acm, a symptom of non-efficiency!
A particular case is $W_T = W = I$. Then the acm becomes
$$V_{GMM} = (G'G)^{-1} G' \Omega G (G'G)^{-1}.$$
Can we choose $W_T$ optimally, so that we minimize the acm? Yes! Setting $W = \Omega^{-1}$ yields the acm
$$V_{GMM} = (G' \Omega^{-1} G)^{-1}.$$
This is also called EMM (Efficient MM). Sometimes this cannot be computed directly, though an iterative procedure usually works.
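A sketch of the usual two-step (feasible efficient) procedure for the toy example above: first step with $W_T = I$, then $W_T = \hat\Omega^{-1}$ built from first-step moments.

```python
# Two-step (feasible efficient) GMM sketch for the toy example above.
import numpy as np
from scipy.optimize import minimize

def g_t(theta, y):
    """Per-observation moment vector g(y_t, X_{t-1}, theta), shape (T-1, L)."""
    a, b, c, d = theta
    yt, ylag = y[1:], y[:-1]
    u1 = yt - a - b * ylag
    u2 = u1 ** 2 - c - d * ylag ** 2
    Z = np.column_stack([np.ones_like(ylag), ylag])
    return np.column_stack([u1[:, None] * Z, u2[:, None] * Z])

def two_step_gmm(y, x0):
    def obj(theta, W):
        gb = g_t(theta, y).mean(axis=0)
        return gb @ W @ gb
    L = g_t(x0, y).shape[1]
    step1 = minimize(obj, x0, args=(np.eye(L),), method="Nelder-Mead")
    G = g_t(step1.x, y)
    W_opt = np.linalg.inv(G.T @ G / G.shape[0])   # Omega_hat^{-1}
    step2 = minimize(obj, step1.x, args=(W_opt,), method="Nelder-Mead")
    return step2.x, W_opt

# usage: theta_hat, W_opt = two_step_gmm(y, np.array([0.1, 0.9, 0.05, 0.01]))
```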
We have seen that OLS can be cast in a GMM framework. The same is true for MLE.
In fact, setting
$$g(y_t, X_t, \theta) = s(\theta),$$
then
$$\Omega = E\, s(\theta_0) s(\theta_0)' = I(\theta_0),$$
so the EMM is
$$\hat\theta_{GMM} \equiv \arg\min_{\theta \in \Theta} s(\theta)'\, I^{-1}(\theta)\, s(\theta).$$
For this choice of $W_T$ and $g(\cdot)$, GMM coincides with MLE!
When the number of moment conditions ($L$) is larger than the number of parameters ($k$), we can test the over-identifying restrictions.
In fact, when the model is correctly specified (we picked the correct moments!),
$$T\, \hat g'(y, X, \hat\theta_{GMM})\, W_T\, \hat g(y, X, \hat\theta_{GMM}) \to_d \chi^2_{L-k}.$$
So a large p-value suggests correct specification!
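A sketch of the corresponding test statistic (often called Hansen's J test). It assumes $L > k$, so in the exactly-identified toy example above one would first add instruments, e.g. $y_{t-2}$.

```python
# J-test sketch: T ghat' W ghat ~ chi2(L - k) under correct specification.
import numpy as np
from scipy.stats import chi2

def j_test(g_t, y, theta_hat, W_opt, k):
    G = g_t(theta_hat, y)              # per-observation moments, (T, L)
    T, L = G.shape
    gb = G.mean(axis=0)
    J = T * gb @ W_opt @ gb            # the statistic on this slide
    return J, chi2.sf(J, df=L - k)     # large p-value: cannot reject the model
```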
MLE of Vasicek’s model of term structure
You will do this at length in your fixed income course. I am reporting it here for completeness, but you can skip it.
Start from the general asset pricing condition (Euler equation):
$$1 = E_t\big((1 + R_{i,t+1})\, M_{t+1}\big),$$
where
$R_{i,t}$ ≡ real return on asset $i$,
$M_t$ ≡ stochastic discount factor (or pricing kernel).
For zero-coupon bonds with price $P_{n,t}$ and maturity $n$, one has $P_{n,t} = E_t(P_{n-1,t+1} M_{t+1})$, since $(1 + R_{n,t+1}) = P_{n-1,t+1}/P_{n,t}$.
We need the following result: given a random variable $X \sim N(\mu, \sigma^2)$,
$$E e^X = e^{\mu + \sigma^2/2}.$$
Then apply it to the Euler equation:
$$p_{n,t} = E_t(m_{t+1} + p_{n-1,t+1}) + \frac{1}{2} \mathrm{Var}_t(m_{t+1} + p_{n-1,t+1}),$$
where $p_{n,t} = \log P_{n,t}$, $m_t = \log M_t$.
Assumption: $m_{t+1} = -x_t$ (pricing kernel $m_t$ linear in the state variable $x_t$).
Assumption: the state variable follows the dynamic regression $x_t - \mu_0 = \phi_0 (x_{t-1} - \mu_0) + \epsilon_t$ with $\epsilon_t \sim NID(0, \sigma_0^2)$.
What is this state variable? Answer: setting $n = 1$ gives $p_{n-1,t+1} = p_{0,t+1} = 0$, yielding
$$p_{1,t} = E_t(m_{t+1}) + \frac{1}{2} \mathrm{Var}_t(m_{t+1}) = -x_t.$$
The one-period bond yield is then $y_{1,t} = -p_{1,t} = x_t$.
The Vasicek model is usually set in continuous time, whereby $y_{1,t}$ is an Ornstein-Uhlenbeck process $dy_{1,t} = \kappa(\theta - y_{1,t})\, dt + \sigma\, dB_t$ and $B_t$ is Brownian motion.
The general solution for the $n$-maturity bond (log) price is
$$p_{n,t} = A_n + B_n y_{1,t},$$
and in terms of yields
$$y_{n,t} = a_n + b_n y_{1,t},$$
with $a_n = -\frac{A_n}{n}$, $b_n = -\frac{B_n}{n}$.
The loadings $A_n, B_n$ are implicit functions of $\theta = (\mu, \phi, \sigma^2)$:
$$A_n = A_{n-1} + B_{n-1}(1 - \phi_0)\mu_0 - \frac{1}{2} B_{n-1}^2 \sigma_0^2, \qquad (1)$$
$$B_n = -1 + B_{n-1}\phi_0 = -\frac{1 - \phi_0^n}{1 - \phi_0}. \qquad (2)$$
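A sketch computing the loadings by iterating (1)-(2), with initial conditions $A_1 = 0$, $B_1 = -1$ (so that $p_{1,t} = -y_{1,t}$); the sign conventions follow the slides as reconstructed above.

```python
# Sketch: Vasicek loadings A_n, B_n from recursions (1)-(2), A_1 = 0, B_1 = -1.
import numpy as np

def vasicek_loadings(mu, phi, sigma2, n_max):
    A = np.zeros(n_max + 1)
    B = np.zeros(n_max + 1)
    B[1] = -1.0                                  # one-period bond: p_1 = -y_1
    for n in range(2, n_max + 1):
        A[n] = A[n - 1] + B[n - 1] * (1 - phi) * mu - 0.5 * B[n - 1] ** 2 * sigma2
        B[n] = -1.0 + B[n - 1] * phi             # = -(1 - phi^n) / (1 - phi)
    a = -A[1:] / np.arange(1, n_max + 1)         # yield intercepts a_n = -A_n/n
    b = -B[1:] / np.arange(1, n_max + 1)         # yield slopes    b_n = -B_n/n
    return a, b

# usage with the estimates reported below:
# a, b = vasicek_loadings(4.9393, 0.9503, 0.0029, 120)
```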
Let us turn to the most interesting part (for us!): estimation.
Since we assume Gaussianity, we can implement MLE (why?).
First, we estimate the model parameters $\theta_0 = (\mu_0, \phi_0, \sigma_0^2)$ by the MLE $\hat\theta$ applied to $y_{1,t}$.
Second, we plug $\hat\theta$ into $A_n, B_n$ to get the estimated term structure curve:
$$\hat y_{n,t} = \hat a_n + \hat b_n y_{1,t}.$$
First pass: for $\theta = (\sigma^2, \phi, \mu)$ the log-likelihood equals
$$\ell(\theta) = \text{cst} + \frac{1}{2}\log(1 - \phi^2) - \frac{(1 - \phi^2)}{2\sigma^2}(y_{1,1} - \mu)^2 - \frac{1}{2}\log\sigma^2 - \frac{T-1}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_{t=2}^{T}\big(y_{1,t} - \mu(1 - \phi) - \phi y_{1,t-1}\big)^2.$$
The last two terms are the sum of conditional log-pdfs, $\sum_{t=2}^{T} \log f(y_{1,t} \mid y_{1,t-1}, \ldots; \theta)$. The other part is the marginal of $y_{1,1}$. More to come when we do time series!
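A sketch of this exact AR(1) log-likelihood (marginal of $y_{1,1}$ plus the sum of conditional log-pdfs), maximized numerically; constants are dropped and $|\phi| < 1$ is imposed.

```python
# Sketch: exact AR(1) log-likelihood of the slide, maximized numerically.
import numpy as np
from scipy.optimize import minimize

def neg_loglik(params, y):
    mu, phi, log_s2 = params
    s2 = np.exp(log_s2)                      # enforce sigma^2 > 0
    if abs(phi) >= 1.0:
        return np.inf                        # stationarity required
    # marginal of y_1: N(mu, s2 / (1 - phi^2))
    ll = 0.5 * np.log(1 - phi ** 2) - 0.5 * np.log(s2) \
         - (1 - phi ** 2) * (y[0] - mu) ** 2 / (2 * s2)
    # conditionals: y_t | y_{t-1} ~ N(mu (1 - phi) + phi y_{t-1}, s2)
    e = y[1:] - mu * (1 - phi) - phi * y[:-1]
    ll += -(y.size - 1) / 2 * np.log(s2) - np.sum(e ** 2) / (2 * s2)
    return -ll                               # constants dropped

# usage, for a short-rate series y (numpy array):
# res = minimize(neg_loglik, x0=np.array([y.mean(), 0.5, np.log(y.var())]),
#                args=(y,), method="Nelder-Mead")
```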
Following the same steps undertaken for the MLE of the regression model (e.g. taking second derivatives, etc.) gives the acm of $\hat\theta$:
$$\mathcal{I}^{-1}(\theta) = \begin{pmatrix} 2\sigma^4 & 0 & 0 \\ 0 & 1 - \phi^2 & 0 \\ 0 & 0 & \frac{\sigma^2}{(1 - \phi)^2} \end{pmatrix}.$$
Estimation results:
Vasicek's model for the 3-month UK interest rate

                   $\hat\mu$   $\hat\phi$   $\hat\sigma^2$
  Estimate          4.9393      0.9503       0.0029
  Standard Error    0.0108      0.0197       0.0003
  t-stat          455.9924     48.3404      11.2027

Standard errors derived from the information matrix.
See the MatLab film!
MLE and GMM: analytical questions
1. Consider the binomial rv $y$, which takes value 1 with probability $\theta$ and zero with probability $1 - \theta$. Verify that $E(y) = \theta$ and $\mathrm{var}(y) = \theta(1 - \theta)$. Find the MLE of $\theta$ when a random sample is available. Find the acm of the MLE.
2. A sample of three values $(1, 2, 3)$ is drawn from the exponential distribution with pdf
$$f(x) = \frac{1}{\theta}\, e^{-x/\theta}.$$
Derive the MLE of $\theta$ and the ML estimate for the above sample. Derive the acm of the MLE and derive an estimate of it based on the ML estimate.
3. Assume we have a random sample from the distribution
$$f(x) = \theta - 0.5\,\theta^2 x \quad \text{for } 0 \le x \le 2/\theta,$$
and zero everywhere else. Derive $E(X)$ and derive the MM estimate of $\theta$.
MLE and GMM: Summary
General concepts on MLE.
Example: MLE of linear regression.
General concepts on the method of moments: MM and GMM.
Empirical example of MLE: Vasicek model of the term
structure.
MLE and GMM: analytical questions.