Generalised Linear Models (GLMs) I

Learning goals
- Be able to define a Generalised Linear Model (GLM).
- Be able to obtain the canonical link function for a GLM.
- (Challenging) Be able to explain the ideas behind the iterated weighted least squares (IWLS) algorithm.
- Be able to compute the variance of parameter estimates in a GLM using the IWLS algorithm.
- Understand (scaled) deviance:
  - Be able to define and compute (scaled) deviance.
  - Be able to use it to test model adequacy and to perform model selection for nested and non-nested models.
- Be able to perform diagnostics for a GLM using R:
  - Understand what the different diagnostic measurements aim to measure.
  - Be able to obtain those diagnostic measurements from the glm output.

Generalised Linear Model

Definition: $Y$ follows a GLM if its distribution is from an exponential family and
  $\mu := EY = g^{-1}(x^T \beta),$
where $g$ is a monotonic, differentiable function called the link function, $x$ is a vector of independent (predictor) variables, and $\beta$ is a vector of parameters.

Remark: we model the location through the linear predictor $\eta = x^T \beta$ and let the scale sort itself out; that is, we do not model the scale explicitly.

Canonical link

Recall that $Y$ is from an exponential family if
  $f(y; \theta, \phi) = \exp\left[ \frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi) \right].$
If $g(\mu) = g(EY) = x^T \beta = \theta$, then $g$ is called the canonical link. Since $\mu = b'(\theta)$, the canonical link must be $(b')^{-1}$.
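The identity "canonical link $= (b')^{-1}$" can be checked numerically. The following is my own illustrative sketch (not from the slides), in Python rather than the course's R, for the Poisson family, where $b(\theta) = e^\theta$:

```python
import math

# Sketch (assumed setup): for the Poisson family the cumulant function is
# b(theta) = exp(theta), so the mean is mu = b'(theta) = exp(theta), and
# the canonical link g = (b')^{-1} is the log function.

def b_prime(theta):
    """Derivative of the Poisson cumulant function b(theta) = exp(theta)."""
    return math.exp(theta)

def canonical_link(mu):
    """Inverse of b', i.e. the log link."""
    return math.log(mu)

theta = 0.7
mu = b_prime(theta)                    # mean implied by theta
recovered = canonical_link(mu)         # g(mu) should recover theta
print(abs(recovered - theta) < 1e-12)  # True: g = (b')^{-1}
```

The same check works for any exponential family once $b'$ is written down, which is exactly how the examples on the next slide arise.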
Examples: canonical links
- Normal: $\theta = \mu$, so $g(\mu) = \mu$ (the identity link).
- Poisson: $\theta = \log\lambda = \log\mu$, so $g(\mu) = \log\mu$ (the log link).
- Binomial: $\theta = \log\frac{p}{1-p} = \log\frac{\mu}{m-\mu}$, so $g(\mu) = \log\frac{\mu}{m-\mu}$ (the logit link).

Estimation of parameters in a GLM
- The iterated weighted least squares (IWLS) algorithm.
- Implemented in the glm function in R.

Estimation of parameters in a GLM using maximum likelihood

Suppose we have independent observations $y_i$ from an exponential family, with canonical parameter $\theta_i$ and dispersion parameter $\phi$, for $i = 1, \ldots, n$. Suppose further that $y_i$ has mean $\mu_i = b'(\theta_i) = g^{-1}(x_i^T \beta)$. If $g$ is the canonical link, then $\theta_i = x_i^T \beta$. The log-likelihood is then
  $l(\beta, \phi; y) = \sum_{i=1}^n \left( \frac{y_i \theta_i - b(\theta_i)}{a(\phi)} + c(y_i, \phi) \right).$
Maximum likelihood methods find the parameter values that maximise this log-likelihood.

Aside: Newton-Raphson method

If $f(x)$ has a continuous derivative $f'(x)$, then finding the value that maximises $f(x)$ is equivalent to (1) finding the roots $x_1^*, \ldots, x_k^*$ of $f'(x)$, and then (2) choosing, among them, the one that maximises $f(x)$. We apply the Newton-Raphson root-finding method to $f'(x)$:
  $x_{n+1} = x_n - \frac{f'(x_n)}{f''(x_n)}.$

Newton-Raphson method to find the MLE

To maximise a log-likelihood $l(\theta)$ with the Newton-Raphson method, the update step is
  $\theta_{n+1} = \theta_n - H(\theta_n)^{-1} U(\theta_n),$
where $U(\theta) = \partial l(\theta)/\partial\theta$ is the score and $H(\theta) = \partial^2 l(\theta)/\partial\theta\,\partial\theta^T = -J(\theta)$, with $J$ the observed information. If we replace $J$ by $I$, the Fisher information, the algorithm is called Fisher scoring. The Fisher information is guaranteed to be positive definite (unlike the observed information).

Connection with the iterated weighted least squares (IWLS) algorithm?
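Before turning to that connection, here is a toy illustration of the Newton-Raphson update above (my own example, not from the slides): maximising the one-parameter Poisson log-likelihood, whose MLE is the sample mean.

```python
# Toy Newton-Raphson sketch (illustrative data): maximise the Poisson
# log-likelihood l(lam) = sum(y) * log(lam) - n * lam (constants dropped)
# by applying Newton-Raphson to its derivative. The MLE is the sample mean.

y = [2, 4, 3, 5, 1, 3]
n, s = len(y), sum(y)

def l_prime(lam):         # score: dl/dlam
    return s / lam - n

def l_double_prime(lam):  # second derivative; the observed information is -l''
    return -s / lam ** 2

lam = 1.0                 # starting value
for _ in range(50):
    step = l_prime(lam) / l_double_prime(lam)
    lam = lam - step      # x_{n+1} = x_n - f'(x_n) / f''(x_n)
    if abs(step) < 1e-12:
        break

print(abs(lam - s / n) < 1e-10)  # True: converged to the MLE, the sample mean
```

For this family the observed and Fisher information coincide at the optimum, so Fisher scoring would take essentially the same steps; in general the two differ, and Fisher scoring trades some speed for the guarantee of a positive definite information matrix.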
It turns out that Fisher scoring applied to a GLM can be interpreted as a least squares problem (if you want the details, read "McCullagh & Nelder on IWLS" posted on LMS). The glm function in R computes the MLEs of the parameters in a GLM using an iterated weighted "least squares" algorithm.

Aside: weighted least squares method

Suppose $Y = X\beta + \varepsilon$, where $\varepsilon \sim N(0, \Sigma)$ and both $X$ and $\Sigma$ are full rank. Multiplying by $\Sigma^{-1/2}$ gives $\Sigma^{-1/2} Y = \Sigma^{-1/2} X \beta + \varepsilon'$, where $\varepsilon' \sim N(0, I)$. What is the least squares estimator of $\beta$? The estimator that minimises the sum of squares
  $(\Sigma^{-1/2} y - \Sigma^{-1/2} X \beta)^T (\Sigma^{-1/2} y - \Sigma^{-1/2} X \beta) = (y - X\beta)^T \Sigma^{-1} (y - X\beta)$
is
  $\hat\beta = (X^T \Sigma^{-1} X)^{-1} X^T \Sigma^{-1} y = (X^T W X)^{-1} X^T W y,$
where the weight matrix is $W = \Sigma^{-1}$.

Weighted least squares for a GLM

A first-order Taylor expansion of $g$ about $\mu_i$ gives
  $g(Y_i) \approx g(\mu_i) + (Y_i - \mu_i) g'(\mu_i).$
Let $Z_i = g(\mu_i) + (Y_i - \mu_i) g'(\mu_i)$ and $\varepsilon_i = (Y_i - \mu_i) g'(\mu_i)$. Then
  $Z_i = x_i^T \beta + \varepsilon_i$, where $\operatorname{Var}\varepsilon_i = (g'(\mu_i))^2 \operatorname{Var} Y_i.$
If we knew $\operatorname{Var}\varepsilon_i$, the estimator of $\beta$ would be the solution to the weighted least squares problem
  $\min_\beta\; (z - X\beta)^T \Sigma^{-1} (z - X\beta),$
where $\Sigma$ is diagonal with $\Sigma_{ii} = \operatorname{Var}\varepsilon_i$.

In summary, $g(Y) \approx Z = X\beta + \varepsilon$, where $\varepsilon_i = (Y_i - \mu_i) g'(\mu_i)$ and $\Sigma = \operatorname{Var}\varepsilon$ is diagonal with entries
  $\Sigma_{ii} = (g'(\mu_i))^2 \operatorname{Var} Y_i = (g'(\mu_i))^2 v(\mu_i)\, a(\phi).$
This suggests $\hat\beta = (X^T \Sigma^{-1} X)^{-1} X^T \Sigma^{-1} z$. Problem: $z_i$ and $\Sigma_{ii}$ depend on $\beta$, because $\mu_i = g^{-1}(x_i^T \beta)$. Solution: iterate! Note that the $a(\phi)$ factor in the expression for $\hat\beta$ cancels out.

Iterated weighted least squares (IWLS) algorithm

IWLS algorithm (for finding the MLE of $\beta$):
1. Start with $\hat\mu^0 = y$.
2. Given $\hat\mu^n$, calculate, for each $i$,
   $z_i^n = g(\hat\mu_i^n) + (y_i - \hat\mu_i^n) g'(\hat\mu_i^n)$ and $W_{ii}^n = \dfrac{1}{g'(\hat\mu_i^n)^2 v(\hat\mu_i^n)}$
   (the $a(\phi)$ factor in $\Sigma_{ii}$ is omitted, since it cancels).
3. Put $\hat\beta^{n+1} = (X^T W^n X)^{-1} X^T W^n z^n$ and, for each $i$, $\hat\mu_i^{n+1} = g^{-1}(x_i^T \hat\beta^{n+1})$.
4. If $\hat\beta^{n+1}$ is sufficiently close to $\hat\beta^n$, stop; otherwise return to step 2.
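The steps above can be sketched in a few lines of code. This is my own illustrative translation (the course uses R's glm; the data and function name here are made up) for a Poisson GLM with the canonical log link, so $g(\mu) = \log\mu$, $g'(\mu) = 1/\mu$, and $v(\mu) = \mu$:

```python
import numpy as np

# Sketch of the IWLS steps for a Poisson GLM with the canonical log link:
# g(mu) = log(mu), g'(mu) = 1/mu, variance function v(mu) = mu.
# The data below are made up for illustration.

def iwls_poisson(X, y, tol=1e-10, max_iter=50):
    mu = np.maximum(y.astype(float), 0.5)  # step 1: mu^0 = y (guarded away from 0)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        # step 2: working response z and weights W (the a(phi) factor cancels)
        z = np.log(mu) + (y - mu) / mu     # z_i = g(mu_i) + (y_i - mu_i) g'(mu_i)
        w = mu                             # 1 / (g'(mu)^2 v(mu)) = mu
        # step 3: weighted least squares update
        WX = X * w[:, None]
        beta_new = np.linalg.solve(X.T @ WX, X.T @ (w * z))
        mu = np.exp(X @ beta_new)          # mu_i = g^{-1}(x_i^T beta)
        # step 4: stop when the update is small
        if np.max(np.abs(beta_new - beta)) < tol:
            beta = beta_new
            break
        beta = beta_new
    return beta, mu

# Illustrative data: intercept plus one covariate.
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5])
X = np.column_stack([np.ones_like(x), x])
y = np.array([1, 2, 2, 4, 7, 11])

beta_hat, mu_hat = iwls_poisson(X, y)
# At the MLE with a canonical link, the score X^T (y - mu_hat) is zero.
print(np.allclose(X.T @ (y - mu_hat), 0.0, atol=1e-8))
```

The zero-score check at the end is the practical signature that IWLS has found the maximum likelihood estimate, matching the Fisher-scoring equivalence stated next.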
For a GLM, the IWLS algorithm is equivalent to Fisher scoring. Example: see the section "IWLS" in Bliss.pdf.

Variance of $\hat\beta$ from the IWLS algorithm

Suppose the IWLS algorithm converges to the estimate $\hat\beta$. Then
  $\hat\beta = (X^T \hat\Sigma^{-1} X)^{-1} X^T \hat\Sigma^{-1} z,$
where, elementwise,
  $\hat\mu_i = g^{-1}(x_i^T \hat\beta),$
  $z_i = g(\hat\mu_i) + (y_i - \hat\mu_i) g'(\hat\mu_i),$
  $\hat\Sigma_{ii} = (g'(\hat\mu_i))^2 v(\hat\mu_i)\, a(\phi).$
Since $\operatorname{Var} z = \hat\Sigma$, we have
  $\operatorname{Var}\hat\beta = [(X^T \hat\Sigma^{-1} X)^{-1} X^T \hat\Sigma^{-1}]\, \hat\Sigma\, [(X^T \hat\Sigma^{-1} X)^{-1} X^T \hat\Sigma^{-1}]^T = (X^T \hat\Sigma^{-1} X)^{-1}.$
Note that the $a(\phi)$ term in $\hat\Sigma$ does not cancel here, as it did in the IWLS algorithm, so we need to estimate it.

Estimator for $a(\phi)$

$Y_i$ has mean $\mu_i$ and variance $v(\mu_i) a(\phi)$, so
  $\sum_i \frac{(Y_i - \hat\mu_i)^2}{v(\hat\mu_i)\, a(\phi)} \approx \chi^2_{n-p},$
where $p$ is the number of parameters used to estimate $\mu$. Let
  $X^2 := \sum_i \frac{(Y_i - \hat\mu_i)^2}{v(\hat\mu_i)}.$
Then $X^2/(n-p)$ is an estimator of $a(\phi)$. $X^2$ is called Pearson's $\chi^2$ statistic, and it can be shown that $X^2/(n-p)$ is a consistent estimator of $a(\phi)$.

Variance of $\hat\beta$ from the IWLS algorithm, continued

$\operatorname{Var}\hat\beta = (X^T \hat\Sigma^{-1} X)^{-1}$, where
  $\hat\Sigma_{ii} = (g'(\hat\mu_i))^2 v(\hat\mu_i)\, \widehat{a(\phi)} = (g'(\hat\mu_i))^2 v(\hat\mu_i)\, X^2/(n-p).$
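To make these formulas concrete, here is my own worked example (not from the slides) for the simplest case, the normal family with identity link. There $g'(\mu) = 1$ and $v(\mu) = 1$, so Pearson's $X^2$ is just the residual sum of squares, $X^2/(n-p)$ is the usual residual variance estimate, and $\operatorname{Var}\hat\beta$ reduces to $\hat\sigma^2 (X^T X)^{-1}$:

```python
import numpy as np

# Sketch (illustrative data): normal family, identity link. Here g'(mu) = 1
# and v(mu) = 1, so Pearson's X^2 is the residual sum of squares,
# X^2 / (n - p) estimates a(phi) = sigma^2, and
# Var(beta_hat) = (X^T Sigma_hat^{-1} X)^{-1} = sigma_hat^2 (X^T X)^{-1}.

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
X = np.column_stack([np.ones_like(x), x])
y = np.array([0.1, 1.2, 1.9, 3.2, 3.8, 5.1])
n, p = X.shape

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # IWLS converges in one step here
mu_hat = X @ beta_hat

# Pearson's chi^2 statistic with v(mu) = 1, and the dispersion estimate.
X2 = np.sum((y - mu_hat) ** 2)
a_phi_hat = X2 / (n - p)

# Var(beta_hat) = (X^T Sigma^{-1} X)^{-1} with Sigma_ii = a_phi_hat.
var_beta = a_phi_hat * np.linalg.inv(X.T @ X)
se = np.sqrt(np.diag(var_beta))               # standard errors of the estimates
print(np.round(se, 4))
```

These are exactly the standard errors reported by summary output of a fitted linear model, which is reassuring: the GLM variance formula contains ordinary least squares as a special case.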