Binomial Regression I

Learning goals

Understand binomial regression:
- know when you should use binomial regression;
- be able to write down the binomial regression model and its likelihood;
- be able to obtain estimates of parameters, or functions of parameters, using an R script;
- be able to quantify the uncertainty of the estimators (e.g., by computing a CI);
- be able to test hypotheses;
- be able to do model selection.

Understand asymptotic properties of MLEs (maximum likelihood estimators):
- use asymptotic normality of MLEs to quantify the uncertainty of the estimators (e.g., Wald CI).

Understand the Wald test and the likelihood ratio test (LRT):
- use them to test hypotheses in binomial regression.

Understand (scaled) deviance:
- use it to test model adequacy, or to perform LRT and model comparison.

Challenger example

Challenger disaster

On the 28th of January 1986 the Space Shuttle Challenger broke apart after an O-ring seal failed at liftoff, leading to the deaths of its seven crew members. Despite concerns about the O-rings failing due to the cold (the forecast temperature was 29 ± 3 °F), no one was able to provide an analysis that convinced NASA (who were under pressure to launch following several previous delays) not to go ahead. The way the data was presented didn't help matters.

Challenger disaster: data

Summarise the data as the number of damaged O-rings (out of 6) per launch. The data are in the dataframe orings:
- Response damage is the number of damaged O-rings (out of 6).
- Predictor temp is the temperature (°F).

> library(faraway)
> data(orings)
> str(orings)
'data.frame': 23 obs. of 2 variables:
 $ temp  : num 53 57 58 63 66 67 67 67 68 69 ...
 $ damage: num 5 1 1 1 0 0 0 0 0 0 ...
> plot(damage/6 ~ temp, data = orings,
+      xlim = c(30, 90), ylim = c(0, 1))

Challenger disaster: visualize data

[Figure: scatterplot of damage/6 against temp, with temp from 30 to 90 °F and damage/6 from 0 to 1.]

Challenger disaster: model

Assume that Yi, the number of damaged O-rings on the i-th launch, has distribution Yi ~ bin(6, pi), where pi depends on the temperature ti. We also assume that the Yi are independent.

How do we relate pi to ti? Question: should we just use a linear relationship, pi = β0 + β1 ti? No: the parameter bounds 0 ≤ pi ≤ 1 must hold, so we need an alternative function to relate pi to ti.

Alternative function to relate pi with ti

For a single launch, the best estimate of pi is just yi/6. From the plot of yi/6 against ti it is reasonable to assume that pi = p(ti), where p is a smooth function of the temperature, decreasing from 1 down to 0 as the temperature increases.

[Figure: the same scatterplot of damage/6 against temp.]

We choose the logistic function: suppose that for some β0 and β1

  p(t) = 1 / (1 + e^{-(β0 + β1 t)}) = e^{β0 + β1 t} / (1 + e^{β0 + β1 t}).

- This restricts 0 ≤ p(t) ≤ 1.
- It is a monotonic (increasing or decreasing) function.
- It can be linearized through the logit transformation:

  log( p(t) / (1 - p(t)) ) = β0 + β1 t.

Furthermore:
- p(t) = 1/2 for t = -β0/β1, so -β0/β1 controls the location of the curve.
- p'(-β0/β1) = β1/4, so β1 controls the steepness of the curve.

Example: see the R script and results in the section "logistic function with different values for beta0 and beta1" of Challenger.pdf.

Challenger disaster: model

Assume that Yi, the number of damaged O-rings on the i-th launch, has distribution Yi ~ bin(6, pi), where

  pi = e^{β0 + β1 ti} / (1 + e^{β0 + β1 ti}).

We also assume that the Yi are independent.

Challenger disaster: model fitting

Assume that Yi, the number of damaged O-rings on the i-th launch, has distribution Yi ~ bin(6, pi), where

  pi = e^{β0 + β1 ti} / (1 + e^{β0 + β1 ti}).
We also assume that the Yi are independent.

Maximum likelihood estimates (MLE) β̂0, β̂1: the values of β0, β1 which maximize the log-likelihood.

Challenger disaster: model fitting

The log-likelihood is

  l(β0, β1) = log L(β0, β1) = log P(Y = y | β0, β1)
            = log ∏i P(Yi = yi | β0, β1)
            = Σi log[ (6 choose yi) pi^{yi} (1 - pi)^{6 - yi} ]
            = c + Σi [ yi log pi + (6 - yi) log(1 - pi) ]
            = c + Σi [ yi log( pi / (1 - pi) ) + 6 log(1 - pi) ].

Put ηi = β0 + β1 ti; then log( pi / (1 - pi) ) = ηi and log(1 - pi) = -log(1 + e^{ηi}).

Challenger disaster: model fitting

There is no closed-form solution for the MLE β̂0, β̂1 in this model; numerical search procedures are required to find it.
- The glm function uses the iterative weighted least squares (IWLS) algorithm; I will cover this later.
- For now, let's use the optim function.

Example: see the R script and results in "maximum likelihood fitting" of Challenger.pdf: β̂0 = 11.667 and β̂1 = -0.216.

Challenger disaster: questions

- Forecast the probability of an O-ring being damaged when the launch temperature is 29 °F.
- How good is our forecast? Can we provide a confidence interval?
- Is temperature useful for predicting O-ring failure?

Binomial regression

Binomial regression model

We suppose that we observe Yi ~ bin(mi, pi), i = 1, ..., n, independent. The mi are known, and we suppose that for some link function g,

  g(pi) = ηi = xiᵀβ = β0 + β1 xi1 + ... + βq xiq,

where the xi are known predictors and β are unknown parameters.

Binomial regression model: link function

Usual choices for g:
- logit: η = log( p / (1 - p) ), p = 1 / (1 + exp(-η)) = exp(η) / (1 + exp(η));
- complementary log-log: η = log(-log(1 - p)), p = 1 - exp(-e^{η});
- probit: η = Φ⁻¹(p), p = Φ(η), where Φ is the cumulative distribution function (cdf) of the standard normal distribution.
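Each of these links has a ready-made inverse pair in base R, so the formulas above can be checked numerically. A quick sketch, base R only (the cloglog helpers are written out by hand and are not part of the lecture code):

```r
# Numeric check of the three link / inverse-link pairs
p <- c(0.01, 0.1, 0.5, 0.9, 0.99)

# logit: eta = log(p / (1 - p)); inverse: p = exp(eta) / (1 + exp(eta))
stopifnot(all(abs(plogis(qlogis(p)) - p) < 1e-12))

# probit: eta = qnorm(p); inverse: p = pnorm(eta)
stopifnot(all(abs(pnorm(qnorm(p)) - p) < 1e-12))

# complementary log-log: eta = log(-log(1 - p)); inverse: p = 1 - exp(-exp(eta))
cloglog     <- function(p) log(-log(1 - p))
inv_cloglog <- function(eta) 1 - exp(-exp(eta))
stopifnot(all(abs(inv_cloglog(cloglog(p)) - p) < 1e-12))
```

Here qlogis/plogis and qnorm/pnorm are the base R logit and probit pairs.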
Binomial regression model: link function

> curve(1/(1+exp(-x)), -4, 4, ylim=c(0,1),
+       xlab="eta", ylab="p", col="red",
+       main="binomial link functions")
> curve(pnorm(x), -4, 4, add=TRUE, col="blue", lty=2)
> curve(1-exp(-exp(x)), -4, 4, add=TRUE, col="black", lty=3)
> legend("topleft", c("logit", "probit", "comp. log-log"),
+        col=c("red", "blue", "black"), lty=c(1,2,3), bty="n")

[Figure: "binomial link functions": p against η for η from -4 to 4, showing the logit, probit and complementary log-log inverse link curves.]

Binomial regression model: likelihood

Given observations yi of Yi ~ bin(mi, pi = g⁻¹(ηi)), where ηi = xiᵀβ, the log-likelihood is

  l(β) = Σ_{i=1}^{n} log P(Yi = yi)
       = Σ_{i=1}^{n} log[ (mi choose yi) pi^{yi} (1 - pi)^{mi - yi} ]
       = c + Σ_{i=1}^{n} [ yi log(g⁻¹(ηi)) + (mi - yi) log(1 - g⁻¹(ηi)) ].

There is no closed-form solution for the MLE of β; numerical search procedures are required to find it.

Reminder: questions from the Challenger disaster

- Forecast the probability of an O-ring being damaged when the launch temperature is 29 °F.
- How good is our forecast? Can we provide a confidence interval?
- Is temperature useful for predicting O-ring failure?

Forecast the probability of an O-ring being damaged when the launch temperature is 29 °F:
- p̂ = g⁻¹(η̂) when t = 29;
- p̂ = exp(η̂) / (1 + exp(η̂)), where η̂ = β̂0 + β̂1 · 29;
- p̂ = 0.995: see the R script and results in "prediction for temp of 29" of Challenger.pdf.

Reminder: questions from the Challenger disaster

- Forecast the probability of an O-ring being damaged when the launch temperature is 29 °F.
- How good is our forecast? Can we provide a confidence interval?
- Is temperature useful for predicting O-ring failure?

We need to know the properties of the MLE!
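Since there is no closed-form MLE, the fit rests on a numerical optimizer. A minimal, self-contained sketch of the optim approach, using simulated data with a known true curve in place of orings (which needs the faraway package); β0 = 10 and β1 = -0.2 are illustrative values, not the Challenger estimates:

```r
set.seed(1)

# Simulated launches: m trials at each temperature, true curve known
b0 <- 10; b1 <- -0.2
temp <- seq(40, 85, by = 1)
m <- 6
y <- rbinom(length(temp), size = m, prob = plogis(b0 + b1 * temp))

# Negative log-likelihood of the binomial model with a logit link
negll <- function(beta) {
  eta <- beta[1] + beta[2] * temp
  -sum(dbinom(y, size = m, prob = plogis(eta), log = TRUE))
}

# Numerical search for the MLE, starting from a crude guess
fit <- optim(c(0, 0), negll, method = "BFGS")
beta_hat <- fit$par

# Forecast: fitted damage probability at a cold launch temperature
p_hat_29 <- plogis(beta_hat[1] + beta_hat[2] * 29)
```

With enough simulated launches, beta_hat lands close to the true (b0, b1), and p_hat_29 is close to the true plogis(b0 + b1 * 29).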
Asymptotic properties of the MLE

Reminder: maximum likelihood estimate (MLE)

Suppose that Yi, i = 1, ..., n, are independent, with densities/mass functions fi(·; θ). Given observations yi of the Yi, the log-likelihood is

  l(θ) = l(θ; y) = Σi log fi(yi; θ).

The MLE θ̂ is the value of θ which maximises l(θ).

Note that allowing fi to depend on i means that we can include the case where the distribution of Yi depends on some covariate xi. That is, we can have fi(·; θ) = f(·; xi, θ) for some common f.

Asymptotic properties of the MLE

Under certain regularity conditions (to be introduced later), the MLE is
- asymptotically consistent,
- asymptotically normal,
- asymptotically efficient.

MLE: asymptotic consistency

Let θ* denote the true value of θ. As n → ∞, θ̂ →p θ*. That is, for any ε > 0,

  P(|θ̂ - θ*| > ε) → 0 as n → ∞.

MLE: asymptotic normality

Let θ* denote the true value of θ. Then θ̂ ≈d N(θ*, I(θ*)⁻¹).

The observed information is

  J(θ) = -∂²l(θ)/∂θ∂θᵀ = -H_l(θ),

where the Hessian matrix H_l(θ) is the square matrix of second-order partial derivatives of the log-likelihood. For binomial regression with one predictor (i.e., θ = β = (β0, β1)ᵀ):

  J(θ) = J(β) = [ -∂²l(β)/∂β0²      -∂²l(β)/∂β0∂β1 ]
                [ -∂²l(β)/∂β1∂β0    -∂²l(β)/∂β1²   ]

Clearly J(θ) = J(θ; y) depends on y through l(θ) = l(θ; y).

The Fisher information is I(θ) = E J(θ; Y). I(θ) does not depend on y.

Exercise: binomial regression with a logit link

We suppose that we observe Yi ~ bin(mi, pi), i = 1, ..., n, independent, where pi = 1 / (1 + exp(-ηi)) and ηi = β0 + β1 xi. The log-likelihood is

  l(β0, β1) = c + Σi [ yi log( pi / (1 - pi) ) + mi log(1 - pi) ]
            = c + Σi [ yi (β0 + β1 xi) - mi log(1 + e^{β0 + β1 xi}) ].
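As a sanity check, the observed information for the logit link can be computed numerically from this log-likelihood and compared with the closed form, whose entries are Σ mi pi(1-pi), Σ mi xi pi(1-pi) and Σ mi xi² pi(1-pi). A sketch with toy, made-up data (base R's optimHess does the numerical differentiation):

```r
# Toy data, illustrative values only
x <- c(-1, 0, 1, 2)
m <- c(5, 5, 5, 5)
y <- c(1, 2, 4, 5)

# Negative log-likelihood of the binomial model with a logit link
negll <- function(beta) {
  p <- plogis(beta[1] + beta[2] * x)
  -sum(dbinom(y, size = m, prob = p, log = TRUE))
}

beta <- c(0.5, 1)              # evaluate the information at an arbitrary point
p <- plogis(beta[1] + beta[2] * x)
w <- m * p * (1 - p)

# Closed-form observed information for the logit link
J_analytic <- matrix(c(sum(w),     sum(w * x),
                       sum(w * x), sum(w * x^2)), nrow = 2)

# Hessian of the negative log-likelihood, by numerical differentiation
J_numeric <- optimHess(beta, negll)

max(abs(J_numeric - J_analytic))   # should be close to zero
```

For the logit link the observed information contains no yi terms, which is why the two matrices agree at any β, not just at the MLE.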
Exercise: binomial regression with a logit link

Then

  J(β) = [ Σi mi pi (1 - pi)       Σi mi xi pi (1 - pi)  ]
         [ Σi mi xi pi (1 - pi)    Σi mi xi² pi (1 - pi) ].

See my derivation in "observed information binomial regression.pdf". Since there are no yi terms left,

  I(β) = [ Σi mi pi (1 - pi)       Σi mi xi pi (1 - pi)  ]
         [ Σi mi xi pi (1 - pi)    Σi mi xi² pi (1 - pi) ].

MLE: asymptotic normality

Let θ* denote the true value of θ. As n → ∞,

  I(θ*)^{1/2} (θ̂ - θ*) →d N(0, I).

That is, θ̂ ≈d N(θ*, I(θ*)⁻¹).

Any intuition for why the variance is the inverse of the Fisher information, which is related to the Hessian matrix?

MLE: asymptotic efficiency

- Asymptotic consistency: θ̂ →p θ*.
- Asymptotic normality: θ̂ ≈d N(θ*, I(θ*)⁻¹).
- The MLE is an asymptotically unbiased estimator with the smallest variance, I(θ*)⁻¹.

Wald CI for tᵀθ (a linear combination of the parameters)

Let t = (t1, ..., tm)ᵀ and θ = (θ1, ..., θm)ᵀ, with true value θ* and MLE θ̂. We have θ̂ ≈ N(θ*, I(θ*)⁻¹). In practice we do not know θ*, the true value of θ, so we approximate I(θ*)⁻¹ by I(θ̂)⁻¹, leading to θ̂ ≈ N(θ*, I(θ̂)⁻¹). Then

  tᵀθ̂ ≈ N(tᵀθ*, tᵀ I(θ̂)⁻¹ t).

An approximate 100(1 - α)% confidence interval for tᵀθ is

  tᵀθ̂ ± z_α sqrt( tᵀ I(θ̂)⁻¹ t ),   where Φ(z_α) = 1 - α/2.

In particular, taking t to be the standard unit vectors (1, 0, ..., 0)ᵀ, (0, 1, ..., 0)ᵀ, ..., (0, 0, ..., 1)ᵀ, we obtain a CI for each of the parameters θ1, θ2, ..., θm. An approximate 100(1 - α)% confidence interval for θi is

  θ̂i ± z_α sqrt( (I(θ̂)⁻¹)_{i,i} ).

If I is unavailable, we can approximate it by the observed information J, i.e., I(θ̂)⁻¹ ≈ J(θ̂; y)⁻¹.

Reminder: questions from the Challenger disaster

- Forecast the probability of an O-ring being damaged when the launch temperature is 29 °F.
- How good is our forecast? Can we provide a confidence interval?
- Is temperature useful for predicting O-ring failure?

How good is our forecast? Can we provide a confidence interval?

CI for p = g⁻¹(η) = exp(η) / (1 + exp(η)), where η = β0 + β1 · 29:
- Step 1: compute a CI (ηl, ηr) for η.
- Step 2: the CI for p = g⁻¹(η) is (g⁻¹(ηl), g⁻¹(ηr)).

CI for p: (0.864307, 0.9998686). See the R script and results in "Confidence Interval for p" of Challenger.pdf.

Log-likelihood ratio CI

We have

  2 l(θ̂) - 2 l(θ*) ≈ χ²_k,

where k is the dimension of θ*. This result can also, in principle, be used to construct a 100(1 - α)% confidence region for θ:

  { θ : 2 l(θ̂) - 2 l(θ) ≤ χ²_k(1 - α) },

where χ²_k(1 - α) is the 100(1 - α)% point of the χ²_k distribution. One lab problem will be to plot a CI using the log-likelihood ratio. This approximation is generally better than the normal approximation for θ̂; that is, it holds for smaller sample sizes.

MLE: regularity conditions

For maximum likelihood theory to hold we require:
- l is smooth enough with respect to θ (third derivatives exist and are continuous);
- the third-order derivatives of l have bounded expectations;
- the support of Yi does not depend on θ;
- the domain Θ of θ is finite-dimensional and does not depend on Yi;
- θ* is not on the boundary of Θ.

References
- McCullagh & Nelder (1989), Appendix A.
- F.W. Scholz, Maximum likelihood estimation. Encyclopedia of Statistical Sciences, Vol. 7, p. 4629ff. Wiley, 2006.
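The two-step interval construction above (a Wald CI for η on the link scale, mapped through g⁻¹) can be sketched end to end with glm. Simulated data stand in for orings here (the true curve is chosen for illustration), so this does not reproduce the reported interval (0.864307, 0.9998686):

```r
set.seed(42)

# Simulated data standing in for orings
temp <- seq(50, 85, by = 1)
m <- 6
y <- rbinom(length(temp), size = m, prob = plogis(10 - 0.2 * temp))

# Binomial regression, logit link; response is (successes, failures)
fit <- glm(cbind(y, m - y) ~ temp, family = binomial)

# Step 1: Wald CI for eta = beta0 + beta1 * 29 on the link scale
pr <- predict(fit, newdata = data.frame(temp = 29),
              type = "link", se.fit = TRUE)
eta_ci <- pr$fit + c(-1, 1) * qnorm(0.975) * pr$se.fit

# Step 2: map the endpoints through the inverse link to get a CI for p
p_ci <- plogis(eta_ci)
```

Because the inverse logit is monotonic, mapping the endpoints of the η-interval gives a valid interval for p, and it automatically stays inside (0, 1), unlike a Wald interval built directly on the probability scale.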