Binomial Regression I

Learning goals

Understand binomial regression:
- know when you should use binomial regression;
- be able to write down the binomial regression model and its likelihood;
- be able to obtain estimates of parameters, or functions of parameters, using an R script;
- be able to quantify the uncertainty of the estimators (e.g., by computing a CI);
- be able to test hypotheses;
- be able to do model selection.

Understand asymptotic properties of MLEs (maximum likelihood estimators):
- use asymptotic normality of MLEs to quantify the uncertainty of the estimators (e.g., Wald CI).

Understand the Wald test and the likelihood ratio test (LRT):
- use them to test hypotheses in binomial regression.

Understand (scaled) deviance:
- use it to test model adequacy, or to perform LRT and model comparison.

Challenger example

Challenger disaster

On the 28th of January 1986 the Space Shuttle Challenger broke apart after an O-ring seal failed at liftoff, leading to the deaths of its seven crew members. Despite concerns about the O-rings failing due to the cold (the forecast temperature was 29 ± 3 °F), no one was able to provide an analysis that convinced NASA (who were under pressure to launch following several previous delays) not to go ahead. The way the data was presented didn't help matters.

Challenger disaster: data

Summarise the data as the number of damaged O-rings (out of 6) per launch. The data are in the dataframe orings:
- Response damage is the number of damaged O-rings (out of 6).
- Predictor temp is the temperature (°F).

> library(faraway)
> data(orings)
> str(orings)
'data.frame': 23 obs. of 2 variables:
 $ temp  : num 53 57 58 63 66 67 67 67 68 69 ...
 $ damage: num 5 1 1 1 0 0 0 0 0 0 ...
> plot(damage/6 ~ temp, data = orings,
+      xlim = c(30, 90), ylim = c(0, 1))

Challenger disaster: visualize data

[Figure: scatterplot of damage/6 against temp, with temp from 30 to 90 °F and damage/6 from 0 to 1.]

Challenger disaster: model

Assume that Yi, the number of damaged O-rings on the i-th launch, has distribution Yi ~ bin(6, pi), where pi depends on the temperature ti. We also assume that the Yi are independent.

How do we relate pi to ti? Question: should we just use a linear relationship, pi = β0 + β1 ti? No: the parameter bounds 0 ≤ pi ≤ 1 must hold, so we need an alternative function to relate pi to ti.

Alternative function to relate pi with ti

For a single launch, the best estimate of pi is just yi/6. From the plot of yi/6 against ti it is reasonable to assume that pi = p(ti), where p is a smooth function of the temperature, decreasing from 1 down to 0 as the temperature increases.

[Figure: the same scatterplot of damage/6 against temp.]

We choose the logistic function: suppose that for some β0 and β1

  p(t) = 1 / (1 + e^{-(β0 + β1 t)}) = e^{β0 + β1 t} / (1 + e^{β0 + β1 t}).

- This restricts 0 ≤ p(t) ≤ 1.
- It is a monotonic (increasing or decreasing) function.
- It can be linearized through the logit transformation:

  log( p(t) / (1 - p(t)) ) = β0 + β1 t.

Furthermore:
- p(t) = 1/2 for t = -β0/β1, so -β0/β1 controls the location of the curve.
- p'(-β0/β1) = β1/4, so β1 controls the steepness of the curve.

Example: see the R script and results in the section "logistic function with different values for beta0 and beta1" of Challenger.pdf.

Challenger disaster: model

Assume that Yi, the number of damaged O-rings on the i-th launch, has distribution Yi ~ bin(6, pi), where

  pi = e^{β0 + β1 ti} / (1 + e^{β0 + β1 ti}).

We also assume that the Yi are independent.

Challenger disaster: model fitting

Assume that Yi, the number of damaged O-rings on the i-th launch, has distribution Yi ~ bin(6, pi), where

  pi = e^{β0 + β1 ti} / (1 + e^{β0 + β1 ti}).
We also assume that the Yi are independent.

Maximum likelihood estimates (MLE) β̂0, β̂1: the values of β0, β1 which maximize the log-likelihood.

Challenger disaster: model fitting

The log-likelihood is

  l(β0, β1) = log L(β0, β1) = log P(Y = y | β0, β1)
            = log ∏i P(Yi = yi | β0, β1)
            = Σi log[ (6 choose yi) pi^{yi} (1 - pi)^{6 - yi} ]
            = c + Σi [ yi log pi + (6 - yi) log(1 - pi) ]
            = c + Σi [ yi log( pi / (1 - pi) ) + 6 log(1 - pi) ].

Put ηi = β0 + β1 ti; then log( pi / (1 - pi) ) = ηi and log(1 - pi) = -log(1 + e^{ηi}).

Challenger disaster: model fitting

There is no closed-form solution for the MLE β̂0, β̂1 in this model; numerical search procedures are required to find it.
- The glm function uses the iterative weighted least squares (IWLS) algorithm; I will cover this later.
- For now, let's use the optim function.

Example: see the R script and results in "maximum likelihood fitting" of Challenger.pdf: β̂0 = 11.667 and β̂1 = -0.216.

Challenger disaster: questions

- Forecast the probability of an O-ring being damaged when the launch temperature is 29 °F.
- How good is our forecast? Can we provide a confidence interval?
- Is temperature useful for predicting O-ring failure?

Binomial regression

Binomial regression model

We suppose that we observe Yi ~ bin(mi, pi), i = 1, ..., n, independent. The mi are known, and we suppose that for some link function g,

  g(pi) = ηi = xiᵀβ = β0 + β1 xi1 + ... + βq xiq,

where the xi are known predictors and β are unknown parameters.

Binomial regression model: link function

Usual choices for g:
- logit: η = log( p / (1 - p) ), p = 1 / (1 + exp(-η)) = exp(η) / (1 + exp(η));
- complementary log-log: η = log(-log(1 - p)), p = 1 - exp(-e^{η});
- probit: η = Φ⁻¹(p), p = Φ(η), where Φ is the cumulative distribution function (cdf) of the standard normal distribution.
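Each of these links has a ready-made inverse pair in base R, so the formulas above can be checked numerically. A quick sketch, base R only (the cloglog helpers are written out by hand and are not part of the lecture code):

```r
# Numeric check of the three link / inverse-link pairs
p <- c(0.01, 0.1, 0.5, 0.9, 0.99)

# logit: eta = log(p / (1 - p)); inverse: p = exp(eta) / (1 + exp(eta))
stopifnot(all(abs(plogis(qlogis(p)) - p) < 1e-12))

# probit: eta = qnorm(p); inverse: p = pnorm(eta)
stopifnot(all(abs(pnorm(qnorm(p)) - p) < 1e-12))

# complementary log-log: eta = log(-log(1 - p)); inverse: p = 1 - exp(-exp(eta))
cloglog     <- function(p) log(-log(1 - p))
inv_cloglog <- function(eta) 1 - exp(-exp(eta))
stopifnot(all(abs(inv_cloglog(cloglog(p)) - p) < 1e-12))
```

Here qlogis/plogis and qnorm/pnorm are the base R logit and probit pairs.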
Binomial regression model: link function

> curve(1/(1+exp(-x)), -4, 4, ylim=c(0,1),
+       xlab="eta", ylab="p", col="red",
+       main="binomial link functions")
> curve(pnorm(x), -4, 4, add=TRUE, col="blue", lty=2)
> curve(1-exp(-exp(x)), -4, 4, add=TRUE, col="black", lty=3)
> legend("topleft", c("logit", "probit", "comp. log-log"),
+        col=c("red", "blue", "black"), lty=c(1,2,3), bty="n")

[Figure: "binomial link functions": p against η for η from -4 to 4, showing the logit, probit and complementary log-log inverse link curves.]

Binomial regression model: likelihood

Given observations yi of Yi ~ bin(mi, pi = g⁻¹(ηi)), where ηi = xiᵀβ, the log-likelihood is

  l(β) = Σ_{i=1}^{n} log P(Yi = yi)
       = Σ_{i=1}^{n} log[ (mi choose yi) pi^{yi} (1 - pi)^{mi - yi} ]
       = c + Σ_{i=1}^{n} [ yi log(g⁻¹(ηi)) + (mi - yi) log(1 - g⁻¹(ηi)) ].

There is no closed-form solution for the MLE of β; numerical search procedures are required to find it.

Reminder: questions from the Challenger disaster

- Forecast the probability of an O-ring being damaged when the launch temperature is 29 °F.
- How good is our forecast? Can we provide a confidence interval?
- Is temperature useful for predicting O-ring failure?

Forecast the probability of an O-ring being damaged when the launch temperature is 29 °F:
- p̂ = g⁻¹(η̂) when t = 29;
- p̂ = exp(η̂) / (1 + exp(η̂)), where η̂ = β̂0 + β̂1 · 29;
- p̂ = 0.995: see the R script and results in "prediction for temp of 29" of Challenger.pdf.

Reminder: questions from the Challenger disaster

- Forecast the probability of an O-ring being damaged when the launch temperature is 29 °F.
- How good is our forecast? Can we provide a confidence interval?
- Is temperature useful for predicting O-ring failure?

We need to know the properties of the MLE!
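Since there is no closed-form MLE, the fit rests on a numerical optimizer. A minimal, self-contained sketch of the optim approach, using simulated data with a known true curve in place of orings (which needs the faraway package); β0 = 10 and β1 = -0.2 are illustrative values, not the Challenger estimates:

```r
set.seed(1)

# Simulated launches: m trials at each temperature, true curve known
b0 <- 10; b1 <- -0.2
temp <- seq(40, 85, by = 1)
m <- 6
y <- rbinom(length(temp), size = m, prob = plogis(b0 + b1 * temp))

# Negative log-likelihood of the binomial model with a logit link
negll <- function(beta) {
  eta <- beta[1] + beta[2] * temp
  -sum(dbinom(y, size = m, prob = plogis(eta), log = TRUE))
}

# Numerical search for the MLE, starting from a crude guess
fit <- optim(c(0, 0), negll, method = "BFGS")
beta_hat <- fit$par

# Forecast: fitted damage probability at a cold launch temperature
p_hat_29 <- plogis(beta_hat[1] + beta_hat[2] * 29)
```

With enough simulated launches, beta_hat lands close to the true (b0, b1), and p_hat_29 is close to the true plogis(b0 + b1 * 29).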
Asymptotic properties of the MLE

Reminder: maximum likelihood estimate (MLE)

Suppose that Yi, i = 1, ..., n, are independent, with densities/mass functions fi(·; θ). Given observations yi of the Yi, the log-likelihood is

  l(θ) = l(θ; y) = Σi log fi(yi; θ).

The MLE θ̂ is the value of θ which maximises l(θ).

Note that allowing fi to depend on i means that we can include the case where the distribution of Yi depends on some covariate xi. That is, we can have fi(·; θ) = f(·; xi, θ) for some common f.

Asymptotic properties of the MLE

Under certain regularity conditions (to be introduced later), the MLE is
- asymptotically consistent,
- asymptotically normal,
- asymptotically efficient.

MLE: asymptotic consistency

Let θ* denote the true value of θ. As n → ∞, θ̂ →p θ*. That is, for any ε > 0,

  P(|θ̂ - θ*| > ε) → 0 as n → ∞.

MLE: asymptotic normality

Let θ* denote the true value of θ. Then θ̂ ≈d N(θ*, I(θ*)⁻¹).

The observed information is

  J(θ) = -∂²l(θ)/∂θ∂θᵀ = -H_l(θ),

where the Hessian matrix H_l(θ) is the square matrix of second-order partial derivatives of the log-likelihood. For binomial regression with one predictor (i.e., θ = β = (β0, β1)ᵀ):

  J(θ) = J(β) = [ -∂²l(β)/∂β0²      -∂²l(β)/∂β0∂β1 ]
                [ -∂²l(β)/∂β1∂β0    -∂²l(β)/∂β1²   ]

Clearly J(θ) = J(θ; y) depends on y through l(θ) = l(θ; y).

The Fisher information is I(θ) = E J(θ; Y). I(θ) does not depend on y.

Exercise: binomial regression with a logit link

We suppose that we observe Yi ~ bin(mi, pi), i = 1, ..., n, independent, where pi = 1 / (1 + exp(-ηi)) and ηi = β0 + β1 xi. The log-likelihood is

  l(β0, β1) = c + Σi [ yi log( pi / (1 - pi) ) + mi log(1 - pi) ]
            = c + Σi [ yi (β0 + β1 xi) - mi log(1 + e^{β0 + β1 xi}) ].
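As a sanity check, the observed information for the logit link can be computed numerically from this log-likelihood and compared with the closed form, whose entries are Σ mi pi(1-pi), Σ mi xi pi(1-pi) and Σ mi xi² pi(1-pi). A sketch with toy, made-up data (base R's optimHess does the numerical differentiation):

```r
# Toy data, illustrative values only
x <- c(-1, 0, 1, 2)
m <- c(5, 5, 5, 5)
y <- c(1, 2, 4, 5)

# Negative log-likelihood of the binomial model with a logit link
negll <- function(beta) {
  p <- plogis(beta[1] + beta[2] * x)
  -sum(dbinom(y, size = m, prob = p, log = TRUE))
}

beta <- c(0.5, 1)              # evaluate the information at an arbitrary point
p <- plogis(beta[1] + beta[2] * x)
w <- m * p * (1 - p)

# Closed-form observed information for the logit link
J_analytic <- matrix(c(sum(w),     sum(w * x),
                       sum(w * x), sum(w * x^2)), nrow = 2)

# Hessian of the negative log-likelihood, by numerical differentiation
J_numeric <- optimHess(beta, negll)

max(abs(J_numeric - J_analytic))   # should be close to zero
```

For the logit link the observed information contains no yi terms, which is why the two matrices agree at any β, not just at the MLE.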
Exercise: binomial regression with a logit link

Then

  J(β) = [ Σi mi pi (1 - pi)       Σi mi xi pi (1 - pi)  ]
         [ Σi mi xi pi (1 - pi)    Σi mi xi² pi (1 - pi) ].

See my derivation in "observed information binomial regression.pdf". Since there are no yi terms left,

  I(β) = [ Σi mi pi (1 - pi)       Σi mi xi pi (1 - pi)  ]
         [ Σi mi xi pi (1 - pi)    Σi mi xi² pi (1 - pi) ].

MLE: asymptotic normality

Let θ* denote the true value of θ. As n → ∞,

  I(θ*)^{1/2} (θ̂ - θ*) →d N(0, I).

That is, θ̂ ≈d N(θ*, I(θ*)⁻¹).

Any intuition for why the variance is the inverse of the Fisher information, which is related to the Hessian matrix?

MLE: asymptotic efficiency

- Asymptotic consistency: θ̂ →p θ*.
- Asymptotic normality: θ̂ ≈d N(θ*, I(θ*)⁻¹).
- The MLE is an asymptotically unbiased estimator with the smallest variance, I(θ*)⁻¹.

Wald CI for tᵀθ (a linear combination of the parameters)

Let t = (t1, ..., tm)ᵀ and θ = (θ1, ..., θm)ᵀ, with true value θ* and MLE θ̂. We have θ̂ ≈ N(θ*, I(θ*)⁻¹). In practice we do not know θ*, the true value of θ, so we approximate I(θ*)⁻¹ by I(θ̂)⁻¹, leading to θ̂ ≈ N(θ*, I(θ̂)⁻¹). Then

  tᵀθ̂ ≈ N(tᵀθ*, tᵀ I(θ̂)⁻¹ t).

An approximate 100(1 - α)% confidence interval for tᵀθ is

  tᵀθ̂ ± z_α sqrt( tᵀ I(θ̂)⁻¹ t ),   where Φ(z_α) = 1 - α/2.

In particular, taking t to be the standard unit vectors (1, 0, ..., 0)ᵀ, (0, 1, ..., 0)ᵀ, ..., (0, 0, ..., 1)ᵀ, we obtain a CI for each of the parameters θ1, θ2, ..., θm. An approximate 100(1 - α)% confidence interval for θi is

  θ̂i ± z_α sqrt( (I(θ̂)⁻¹)_{i,i} ).

If I is unavailable, we can approximate it by the observed information J, i.e., I(θ̂)⁻¹ ≈ J(θ̂; y)⁻¹.

Reminder: questions from the Challenger disaster

- Forecast the probability of an O-ring being damaged when the launch temperature is 29 °F.
- How good is our forecast? Can we provide a confidence interval?
- Is temperature useful for predicting O-ring failure?

How good is our forecast? Can we provide a confidence interval?

CI for p = g⁻¹(η) = exp(η) / (1 + exp(η)), where η = β0 + β1 · 29:
- Step 1: compute a CI (ηl, ηr) for η.
- Step 2: the CI for p = g⁻¹(η) is (g⁻¹(ηl), g⁻¹(ηr)).

CI for p: (0.864307, 0.9998686). See the R script and results in "Confidence Interval for p" of Challenger.pdf.

Log-likelihood ratio CI

We have

  2 l(θ̂) - 2 l(θ*) ≈ χ²_k,

where k is the dimension of θ*. This result can also, in principle, be used to construct a 100(1 - α)% confidence region for θ:

  { θ : 2 l(θ̂) - 2 l(θ) ≤ χ²_k(1 - α) },

where χ²_k(1 - α) is the 100(1 - α)% point of the χ²_k distribution. One lab problem will be to plot a CI using the log-likelihood ratio. This approximation is generally better than the normal approximation for θ̂; that is, it holds for smaller sample sizes.

MLE: regularity conditions

For maximum likelihood theory to hold we require:
- l is smooth enough with respect to θ (third derivatives exist and are continuous);
- the third-order derivatives of l have bounded expectations;
- the support of Yi does not depend on θ;
- the domain Θ of θ is finite-dimensional and does not depend on Yi;
- θ* is not on the boundary of Θ.

References
- McCullagh & Nelder (1989), Appendix A.
- F.W. Scholz, Maximum likelihood estimation. Encyclopedia of Statistical Sciences, Vol. 7, p. 4629ff. Wiley, 2006.
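The two-step interval construction above (a Wald CI for η on the link scale, mapped through g⁻¹) can be sketched end to end with glm. Simulated data stand in for orings here (the true curve is chosen for illustration), so this does not reproduce the reported interval (0.864307, 0.9998686):

```r
set.seed(42)

# Simulated data standing in for orings
temp <- seq(50, 85, by = 1)
m <- 6
y <- rbinom(length(temp), size = m, prob = plogis(10 - 0.2 * temp))

# Binomial regression, logit link; response is (successes, failures)
fit <- glm(cbind(y, m - y) ~ temp, family = binomial)

# Step 1: Wald CI for eta = beta0 + beta1 * 29 on the link scale
pr <- predict(fit, newdata = data.frame(temp = 29),
              type = "link", se.fit = TRUE)
eta_ci <- pr$fit + c(-1, 1) * qnorm(0.975) * pr$se.fit

# Step 2: map the endpoints through the inverse link to get a CI for p
p_ci <- plogis(eta_ci)
```

Because the inverse logit is monotonic, mapping the endpoints of the η-interval gives a valid interval for p, and it automatically stays inside (0, 1), unlike a Wald interval built directly on the probability scale.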