Binomial Regression II

Reminder: questions from the Challenger disaster

- Forecast the probability of an O-ring being damaged when the launch temperature is 29 °F.
- How good is our forecast? Can we provide a confidence interval?
- Is temperature useful for predicting O-ring failure?

Is temperature useful to predict the O-ring failing?

$Y_i$, the number of damaged O-rings on the $i$-th launch, has distribution $Y_i \sim \mathrm{Bin}(6, p_i)$, where $\log(p_i/(1-p_i)) = \eta_i = \beta_0 + \beta_1 t_i$.

Test for association between the number of damaged O-rings and temperature:
$H_0: \beta_1 = 0$ vs. $H_a: \beta_1 \neq 0$.

Two approaches: the Wald test and the likelihood ratio test.

Wald Test

Reminder, asymptotic normality of the MLE: $\hat\theta_i \overset{\text{asy.}}{\sim} N(\theta^*_i, [I(\hat\theta)^{-1}]_{i,i})$.

Wald test statistic for $H_0: \beta_1 = 0$ vs. $H_a: \beta_1 \neq 0$:
$z^* = \hat\beta_1 / \mathrm{se}(\hat\beta_1) \overset{\text{asy.}}{\sim} N(0, 1)$ under $H_0$.

Challenger disaster: see the R script and results in "Wald Test" of Challenger.pdf. $|z^*| = 4.07 > 1.96$ (the critical value of $N(0,1)$ at $\alpha = 0.05$), so we reject $H_0$; the p-value is 0.0000476.

Likelihood Ratio Test (LRT)

Test for association between the number of damaged O-rings and temperature: $H_0: \beta_1 = 0$ vs. $H_a: \beta_1 \neq 0$.

Full model (F): $\eta_i = \beta_0 + \beta_1 t_i$, with maximum log likelihood $\log L(\hat\beta^F)$, where $\hat\beta^F$ is the MLE of the parameters in the full model.

Reduced model (R): $\eta_i = \beta_0$, with maximum log likelihood $\log L(\hat\beta^R)$, where $\hat\beta^R$ is the MLE of the parameters in the reduced model.

Compare the two models with the likelihood ratio test statistic
$LR^* = -2\,[\log L(\hat\beta^R) - \log L(\hat\beta^F)] \overset{\text{asy.}}{\sim} \chi^2_1$ under $H_0$.
If $LR^*$ exceeds the critical value from $\chi^2_1$ at level $\alpha$, reject $H_0$.

Wald test vs. Likelihood Ratio Test (LRT)

The Wald test and the LRT are asymptotically equivalent:
$[z^*]^2 = [\hat\beta_1 / \mathrm{se}(\hat\beta_1)]^2 \overset{\text{asy.}}{\sim} \chi^2_1$.
Precisely speaking, the two tests are asymptotically equivalent in the sense that under $H_0$ they reach the same decision with probability approaching 1 as $n$ goes to infinity.
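This equivalence can be checked numerically with the slide's Wald statistic $z^* = 4.07$ (the course's own computations are R scripts in Challenger.pdf; the stdlib-only Python below is an illustrative substitute). The two-sided $N(0,1)$ p-value of $z^*$ coincides with the upper-tail $\chi^2_1$ p-value of $(z^*)^2$, because $\chi^2_1$ is exactly the distribution of $Z^2$ for $Z \sim N(0,1)$:

```python
import math

def norm_sf(z):
    """P(Z > z) for Z ~ N(0, 1), via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def chi2_1_sf(q):
    """P(X > q) for X ~ chi-squared with 1 df; since X = Z^2,
    this is P(|Z| > sqrt(q))."""
    return 2.0 * norm_sf(math.sqrt(q))

z_star = 4.07                          # Wald statistic from the slide
p_wald = 2.0 * norm_sf(abs(z_star))    # two-sided p-value, roughly 4.7e-5
p_chi2 = chi2_1_sf(z_star ** 2)        # the same number, via [z*]^2 ~ chi2_1
```

Both routes give the same p-value, which is the finite-sample face of the asymptotic identity $[z^*]^2 \sim \chi^2_1$.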
However, the chi-squared approximation to the log likelihood ratio is generally better than the normal approximation to the MLE.

Likelihood Ratio Test (LRT): Challenger disaster

See the R script and results in "Likelihood Ratio test" and "Wald Test vs Likelihood Ratio test" of Challenger.pdf. $LR^* = 21.98 > 3.84$ (the critical value from $\chi^2_1$ at $\alpha = 0.05$), so we reject $H_0$; the p-value is 0.0000027.

Likelihood Ratio Test (LRT) for model selection

In general, the likelihood ratio test is used to select between two nested models (one model can be obtained by constraining parameters of the other).

Full model (F): maximum log likelihood $\log L(\hat\theta^F)$.
Reduced model (R): maximum log likelihood $\log L(\hat\theta^R)$.

Let $k$ be the difference in the number of parameters between the two models. Under the reduced model,
$LR^* = -2\,[\log L(\hat\theta^R) - \log L(\hat\theta^F)] \overset{\text{asy.}}{\sim} \chi^2_k$.
If $LR^*$ exceeds the critical value from $\chi^2_k$ at level $\alpha$, select the full model.

(Scaled) Deviance

The scaled deviance is used to judge model adequacy. For the binomial regression model the deviance is the same as the scaled deviance, which is defined as the log likelihood ratio for the fitted model compared to the saturated model.

Full model (F): the saturated model, which has as many parameters as observations; maximum log likelihood $\log L(\hat\theta^F)$.
Reduced model (R): the fitted model; maximum log likelihood $\log L(\hat\theta^R)$.

The scaled deviance is $D = -2\,[\log L(\hat\theta^R) - \log L(\hat\theta^F)]$.

Warning: the number of parameters in the saturated model is $n$, which is not fixed, so the theory of maximum likelihood does not apply, and $D$ need not converge to a chi-squared distribution.

(Scaled) Deviance for the binomial regression model

The saturated model allows one parameter for each observation: for binomial regression the saturated model has $p_1, p_2, \ldots, p_n$ as parameters.
Clearly, for this model we estimate $p_i$ by $y_i/m_i$. Let $\hat p_i = g^{-1}(x_i^T \hat\beta)$ be the estimate of $p_i$ under our (non-saturated) model. Then the scaled deviance is
$$D = -2 \sum_{i=1}^n \left[ y_i \left( \log \hat p_i - \log \frac{y_i}{m_i} \right) + (m_i - y_i) \left( \log(1 - \hat p_i) - \log\Big(1 - \frac{y_i}{m_i}\Big) \right) \right] = -2 \sum_{i=1}^n \left[ y_i \log \frac{\hat y_i}{y_i} + (m_i - y_i) \log \frac{m_i - \hat y_i}{m_i - y_i} \right],$$
where $\hat y_i = m_i \hat p_i$ is the $i$-th fitted value.

(Scaled) Deviance for the binomial regression model, for testing model adequacy

It just happens that if $m_i p_i$ and $m_i(1 - p_i)$ are large enough ($\geq 5$ is a common rule of thumb) and the binomial regression model is adequate, then $D \approx \chi^2_{n-k}$, where $k$ is the number of parameters in the fitted model (including $\beta_0$). In this case the (scaled) deviance can be used as a test of model adequacy: if $D$ is too large compared to a $\chi^2_{n-k}$, then the model is missing something.

For a binomial model with small $m_i$ we cannot use the (scaled) deviance directly to test model adequacy, but we can still use it for model selection.

Using the scaled deviance for model selection (LRT)

If model A is nested within model B, and model A has (scaled) deviance $D_A$ while model B has (scaled) deviance $D_B$, then
$$D_A - D_B = -2\,[\log L(\hat\theta^A) - \log L(\hat\theta^B)],$$
where $\hat\theta^A$ and $\hat\theta^B$ are the MLEs for models A and B, respectively. That is, the log likelihood of the saturated model cancels, and we are left with the log likelihood ratio.

Using the scaled deviance for model selection (AIC)

The Akaike Information Criterion, used for model selection, is
$$AIC = 2k - 2\log L(\hat\theta),$$
where $k$ is the number of parameters in the model. Given a choice, we prefer the model with the smaller AIC. If model B has $s$ more parameters than model A (A not necessarily nested within B), then
$$AIC_B - AIC_A = 2s - 2\log L(\hat\theta_B) + 2\log L(\hat\theta_A) = 2s - D_A + D_B.$$
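The deviance formula, its role as an LRT statistic, and the AIC identity can be sketched in stdlib-only Python (the course works in R; this is an illustrative translation). The convention $0 \cdot \log 0 = 0$ handles observations with $y_i = 0$ or $y_i = m_i$, and the $LR^*$ value plugged in is the slide's Challenger number:

```python
import math

def binomial_deviance(y, m, p_hat):
    """Scaled deviance D = -2 * sum_i [ y_i log(yhat_i / y_i)
    + (m_i - y_i) log((m_i - yhat_i) / (m_i - y_i)) ], with yhat_i = m_i * p_hat_i.
    A term whose count is zero contributes 0 (the 0*log(0) convention)."""
    def term(count, num, den):
        return 0.0 if count == 0 else count * math.log(num / den)
    total = 0.0
    for yi, mi, pi in zip(y, m, p_hat):
        yhat = mi * pi
        total += term(yi, yhat, yi) + term(mi - yi, mi - yhat, mi - yi)
    return -2.0 * total

def chi2_sf_1df(q):
    """Upper tail of chi-squared with 1 df: P(X > q) = erfc(sqrt(q / 2))."""
    return math.erfc(math.sqrt(q / 2.0))

# Deviances turn the LRT into a subtraction, LR* = D_R - D_F, because the
# saturated log likelihood cancels. Using the slide's Challenger value:
lr_star = 21.98
p_value = chi2_sf_1df(lr_star)  # about 2.7e-6, as on the slide

# AIC via deviances: if model B has s more parameters than model A, then
# AIC_B - AIC_A = 2*s - D_A + D_B.
```

A quick sanity check of the formula: a saturated fit ($\hat p_i = y_i/m_i$) gives $D = 0$, and any other fit gives $D > 0$.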
(Scaled) deviance: Challenger disaster

See the R script and results in "Deviance" of Challenger.pdf.

Reminder: questions from the Challenger disaster

- Forecast the probability of an O-ring being damaged when the launch temperature is 29 °F.
- How good is our forecast? Can we provide a confidence interval?
- Is temperature useful for predicting O-ring failure?

Learning goals

Understand binomial regression:
- know when you should use binomial regression;
- be able to write the binomial regression model and its likelihood;
- be able to obtain estimators of parameters, or of functions of parameters, using an R script;
- be able to quantify the uncertainty of the estimators (e.g., by computing a CI);
- be able to test hypotheses;
- be able to do model selection.

Understand asymptotic properties of MLEs (maximum likelihood estimators):
- use the asymptotic normality of MLEs to quantify the uncertainty of the estimators (e.g., a Wald CI).

Understand the Wald test and the likelihood ratio test (LRT):
- use them to test hypotheses in binomial regression.

Understand the (scaled) deviance:
- use it to test model adequacy, or to perform an LRT and model comparison.
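These goals can be exercised end to end. The following is a minimal stdlib-only Python sketch (the course works in R's `glm`; the Newton-Raphson fitter below is a from-scratch stand-in, and the data are hypothetical, invented for illustration, not the real launch records). It fits $\mathrm{logit}(p_i) = \beta_0 + \beta_1 t_i$ to grouped binomial data and forms the Wald statistic for $H_0: \beta_1 = 0$:

```python
import math

# HYPOTHETICAL grouped binomial data (not the real Challenger records):
# y_i damaged O-rings out of m_i = 6 at launch temperature t_i (degrees F).
t = [53.0, 57.0, 63.0, 66.0, 70.0, 75.0, 81.0]
y = [5, 1, 1, 0, 1, 0, 0]
m = [6] * len(t)
t_bar = sum(t) / len(t)
x = [ti - t_bar for ti in t]  # centering t improves Newton's conditioning
                              # and leaves beta_1 (and its Wald test) unchanged

def score_info(b0, b1):
    """Score U = X^T (y - m p) and observed information J = X^T W X,
    with W = diag(m_i p_i (1 - p_i)), for logit(p_i) = b0 + b1 * x_i."""
    u0 = u1 = j00 = j01 = j11 = 0.0
    for xi, yi, mi in zip(x, y, m):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        r, w = yi - mi * p, mi * p * (1.0 - p)
        u0 += r
        u1 += r * xi
        j00 += w
        j01 += w * xi
        j11 += w * xi * xi
    return u0, u1, j00, j01, j11

def fit_logistic(iters=25):
    """Newton-Raphson for the MLE: beta <- beta + J^{-1} U (2x2 solve by hand)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        u0, u1, j00, j01, j11 = score_info(b0, b1)
        det = j00 * j11 - j01 * j01
        b0 += (j11 * u0 - j01 * u1) / det
        b1 += (j00 * u1 - j01 * u0) / det
    _, _, j00, j01, j11 = score_info(b0, b1)
    se_b1 = math.sqrt(j00 / (j00 * j11 - j01 * j01))  # sqrt of [J^{-1}]_{1,1}
    return b0, b1, se_b1

b0_hat, b1_hat, se_b1 = fit_logistic()
z_star = b1_hat / se_b1  # Wald statistic for H0: beta_1 = 0
```

At convergence the score is (numerically) zero, $\mathrm{se}(\hat\beta_1)$ comes from the inverse observed information, and comparing $|z^*|$ to 1.96 gives the $\alpha = 0.05$ Wald decision, just as in the slides.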