Tutorial Case - LEC 4
Imperial College London Business School
FINANCIAL STATISTICS, LEC 4: MLE and GMM
Paolo Zaffaroni
Outline: MLE; GMM; Application: Vasicek

Maximum Likelihood Estimation

Let y = (y_1, ..., y_T) be a sample of dimension T of a random variable with known pdf f(y; θ_0) that depends on an unknown parameter θ_0 of finite dimension k × 1. θ_0 is called the true parameter vector. To know what the pdf is, that is a lot of information!

We know that if the y_t are independent, then

    f(y; θ_0) = f(y_1; θ_0) f(y_2; θ_0) ... f(y_T; θ_0).

In the real world, nothing is independent! In general,

    f(y; θ_0) = f(y_1; θ_0) f(y_2 | y_1; θ_0) ... f(y_T | y_{T-1}, ..., y_1; θ_0),

where f(y_t | y_{t-1}, ..., y_1; θ_0) is the conditional pdf.

For an arbitrary vector θ we define the likelihood function

    L(θ) ≡ f(y; θ),

which, taking logarithms and using the factorization above (remember: the log of a product is the sum of the logs), yields

    l(θ) = log L(θ) = Σ_{t=2}^{T} log f(y_t | y_{t-1}, ..., y_1; θ) + log f(y_1; θ).

The MLE is defined by θ̂ = argmax_{θ ∈ Θ} l(θ), where Θ ⊆ R^k.

Remark: in probability we use the pdf f(y; θ_0) for the given true value θ_0; in statistics we use the likelihood f(y; θ) = L(θ) for the given data y. A complete change of perspective!

Since the log function is monotone, is there any difference between using l(θ) rather than L(θ)? Assuming l(·) is twice differentiable, define the main 'actors':

    s(θ) ≡ ∂l(θ)/∂θ          (score vector),
    H(θ) ≡ ∂²l(θ)/∂θ∂θ′      (Hessian matrix),
    I(θ) ≡ −E H(θ)           (information matrix).

All the above quantities are also functions of the data y, except the information matrix I(θ_0). (Why?)

Now you understand why MLE is so powerful (remember, we know the pdf!).
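As a minimal illustration (added here, not part of the slides), the definition θ̂ = argmax l(θ) can be put to work numerically. The sketch below assumes an i.i.d. Gaussian model, so the log-likelihood is a plain sum of log densities; the true parameter θ_0 = (1.5, 2.0) and the optimizer choice are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
y = rng.normal(loc=1.5, scale=2.0, size=5000)  # i.i.d. sample; theta_0 = (1.5, 2.0)

def neg_loglik(theta, y):
    """Negative log-likelihood of i.i.d. N(mu, sigma^2) data.
    Independence makes l(theta) a sum of marginal log densities."""
    mu, sigma = theta
    if sigma <= 0:           # keep the optimizer inside the parameter space Theta
        return np.inf
    return -np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                   - 0.5 * (y - mu)**2 / sigma**2)

# theta_hat = argmax l(theta), obtained numerically (as is typical for MLE)
res = minimize(neg_loglik, x0=np.array([0.0, 1.0]), args=(y,), method="Nelder-Mead")
mu_hat, sigma_hat = res.x

print(mu_hat, y.mean())    # numerical MLE vs closed-form MLE: nearly identical
print(sigma_hat, y.std())  # (the Gaussian model is a rare case with a closed form)
```

For this particular model the MLE happens to exist in closed form (sample mean and standard deviation), which gives a check on the numerical answer; in general no closed form is available and the numerical step is unavoidable.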
Very desirable properties of MLE: under regularity conditions, as T → ∞,

    θ̂ →_p θ_0                             (consistency);
    √T (θ̂ − θ_0) →_d N(0, I⁻¹(θ_0))       (asymptotic normality),

setting I(θ_0) = plim_{T→∞} I(θ_0)/T; and θ̂ is asymptotically efficient and invariant.

Asymptotic efficiency means that for any other estimator with acm (asymptotic covariance matrix) V, the difference V − I⁻¹(θ_0) is a 'non-negative' matrix (more formally, it is positive semi-definite). MLE is more precise than anything else! This is known as the Cramer-Rao result.

Invariance means that the MLE of g(θ_0), for any continuously differentiable function g(·), is g(θ̂). Important, since there is no need to perform another numerical optimization!

Now, the MLE θ̂ is (almost) never obtained in closed form, unlike OLS. This is why we distinguish θ_0 (the unique true value), θ̂ (the unique MLE), and θ (a generic parameter value).

To gauge some of its properties, differentiating both sides of 1 = ∫ f(y; θ_0) dy gives (note that ∫ is a complicated T-dimensional integral)

    ∂1/∂θ = 0 = ∂/∂θ ∫ f(y; θ_0) dy = ∫ [∂ log f(y; θ_0)/∂θ] f(y; θ_0) dy = E s(θ_0).

We learn that the (random) score has zero mean!

Differentiating the above expression once more,

    ∫ ( [∂² log f(y; θ_0)/∂θ∂θ′] f(y; θ_0) + [∂ log f(y; θ_0)/∂θ] [∂f(y; θ_0)/∂θ′] ) dy = 0.

Rearranging and using ∂f/∂θ = f ∂ log f/∂θ
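The two facts derived above, E s(θ_0) = 0 and (after the rearrangement) the information equality E[s(θ_0) s(θ_0)′] = −E H(θ_0), can be checked by simulation. This sketch (an illustration added here, not from the slides) uses a single observation y ~ N(μ_0, σ_0²) with θ = (μ, σ), for which the score and Hessian are available analytically; μ_0 = 0.5 and σ_0 = 2.0 are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
mu0, sigma0 = 0.5, 2.0
y = rng.normal(mu0, sigma0, size=200_000)   # draws of a single observation

# Analytic score of log f(y; theta) at theta_0, one column per draw:
#   d/d mu    = (y - mu) / sigma^2
#   d/d sigma = -1/sigma + (y - mu)^2 / sigma^3
s = np.vstack([(y - mu0) / sigma0**2,
               -1.0 / sigma0 + (y - mu0)**2 / sigma0**3])

# Analytic Hessian entries of log f at theta_0
h11 = np.full_like(y, -1.0 / sigma0**2)
h12 = -2.0 * (y - mu0) / sigma0**3
h22 = 1.0 / sigma0**2 - 3.0 * (y - mu0)**2 / sigma0**4

score_mean = s.mean(axis=1)                   # E s(theta_0): should be ~ (0, 0)
outer = s @ s.T / y.size                      # estimate of E[s s']
neg_EH = -np.array([[h11.mean(), h12.mean()],
                    [h12.mean(), h22.mean()]])  # estimate of -E[H]

print(score_mean)   # ~ [0, 0]: the score has zero mean
print(outer)        # ~ [[1/sigma0^2, 0], [0, 2/sigma0^2]]
print(neg_EH)       # ~ the same matrix: the information equality
```

Both Monte Carlo estimates converge to the same matrix, [[1/σ_0², 0], [0, 2/σ_0²]], which is exactly the information matrix I(θ_0) for one Gaussian observation.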