© 2020 Imperial College London
MATH97082
BSc, MSci and MSc EXAMINATIONS (MATHEMATICS)
May-June 2020
This paper is also taken for the relevant examination for the
Associateship of the Royal College of Science
Statistical Modelling 2
SUBMIT YOUR ANSWERS AS SEPARATE PDFs TO THE RELEVANT DROPBOXES ON
BLACKBOARD (ONE FOR EACH QUESTION) WITH COMPLETED COVERSHEETS WITH
YOUR CID NUMBER, QUESTION NUMBERS ANSWERED AND PAGE NUMBERS PER
QUESTION.
Date: 22nd May 2020
Time: 13.00 - 15.30 (BST)
Time Allowed: 2 Hours 30 Minutes
Upload Time Allowed: 30 Minutes
This paper has 5 Questions.
Candidates should start their solutions to each question on a new sheet of paper.
Each sheet of paper should have your CID, Question Number and Page Number on the
top.
Only use 1 side of the paper.
Allow margins for marking.
Any required additional material(s) will be provided.
Credit will be given for all questions attempted.
Each question carries equal weight.
Throughout this paper, numerical answers need not be simplified.
1. Consider the linear model with n observations,
\[ Y = X\beta + \epsilon, \qquad \epsilon \sim N(0, \sigma^2 I_n). \]
(a) Show that the maximum likelihood estimator β̂ satisfies
\[ X^T(y - X\hat\beta) = 0, \]
and give a geometric interpretation of this result. (4 marks)
(b) Derive an expression in terms of X for a matrix P such that the fitted values ŷ = Py and
the residuals e = (In − P )y. (4 marks)
(c) Explain what is meant by the leverage of an observation, and state its relationship to the
variance of the corresponding residual. (2 marks)
(d) In the case of simple linear regression with an intercept, where
\[ X = \begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}, \]
give the condition for X to have full rank, and interpret this condition in practical terms.
(2 marks)
(e) In the setting of part (d), show that the leverage of the ith observation can be written as
\[ \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_{j=1}^n (x_j - \bar{x})^2}, \]
where \(\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i\).
(5 marks)
(f) Explain how leverages can be used in model criticism.
(3 marks)
(Total: 20 marks)
2. Consider a Poisson regression model, in which the random variables Yi ∼ Poisson(µi) are
independent, and µ is related to the linear predictor η = Xβ by the canonical link function.
(a) Write the Poisson mass function fY (y) in exponential family form, identifying the canonical
parameter. (2 marks)
(b) Show that the score and Fisher information can be written as
\[ U = X^T(y - \mu), \qquad I = X^T W X, \]
respectively, where W is a matrix to be determined. (6 marks)
(c) Define what is meant by the deviance of a generalized linear model and show that in this
case,
\[ D = 2\sum_{i=1}^n \left\{ y_i \log\left(\frac{y_i}{\hat\mu_i}\right) - (y_i - \hat\mu_i) \right\}. \]
(3 marks)
Consider a Poisson GLM for the number of new cases of a disease as a function of time, where
\[ \mu_i = \exp(\beta_0 + \beta_1 t_i + \beta_2 t_i^2). \]
The R output at the end of the question represents the result of fitting this model to observed
data.
(d) Assuming that the model is adequate, carry out a hypothesis test for β2 = 0, stating your
conclusion in plain language. Describe the conclusions that can be drawn from your preferred
model, in the original data context. (4 marks)
(e) State a feature of the output that suggests that the model fit is not adequate. Propose an
approximate solution and comment on how this would affect your conclusions in part (d).
(3 marks)
(f) Suggest features of the data context that may violate the modelling assumptions employed
here. (2 marks)
Call:
glm(formula = cases ~ year + I(year^2), family = poisson)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.408883 0.400405 1.021 0.3072
year 0.134293 0.058848 2.282 0.0225 *
I(year^2) -0.002327 0.001971 -1.181 0.2378
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 127.624 on 24 degrees of freedom
Residual deviance: 94.204 on 22 degrees of freedom
AIC: 176.44
(Total: 20 marks)
3. The output below shows the result of fitting a model to the pulp dataset that was considered
in a tutorial class. It consists of 20 observations, balanced between four operators labelled a to
d. Some output has been obscured with ####. For questions that refer to obscured output, you
should briefly justify your answers.
Call:
lm(formula = bright ~ operator, data = pulp)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 60.2400 0.1458 413.243 <2e-16 ***
operatorb -0.1800 0.2062 -0.873 0.3955
operatorc 0.3800 0.2062 1.843 0.0839 .
operatord 0.4400 0.2062 2.134 0.0486 *
Residual standard error: 0.326 on ## degrees of freedom
Multiple R-squared: 0.4408,Adjusted R-squared: 0.3359
F-statistic: 4.204 on ## and ## DF, p-value: 0.02261
(a) Explain what is meant by the Adjusted R-squared, and how it can be used. (2 marks)
(b) Give the forms of the row of the design matrix for an observation from operator a and an
observation from operator b.
(2 marks)
(c) State the null hypothesis and the number of degrees of freedom for the F test given in the
output. (3 marks)
(d) Explain why the three model parameters for operators b, c and d have equal standard errors.
(2 marks)
Suppose now we fit the linear mixed model as given in the code below.
Linear mixed model fit by REML ['lmerMod']
Formula: bright ~ 1 + (1 | operator)
Data: pulp
REML criterion at convergence: 18.6
Scaled residuals:
Min 1Q Median 3Q Max
-1.4666 -0.7595 -0.1244 0.6281 1.6012
Random effects:
Groups Name Variance Std.Dev.
operator (Intercept) 0.06808 0.2609
Residual ##### #####
Number of obs: 20, groups: operator, 4
Fixed effects:
Estimate Std. Error t value
(Intercept) 60.4000 0.1494 404.2
(e) State, with justification, the value of the residual standard deviation. (2 marks)
(f) Explaining your reasoning, determine an estimate of the intra-class correlation (you need not
simplify your answer). (3 marks)
(g) Explain the difficulty that arises when using standard asymptotic results for the null distribution
of the likelihood ratio test statistic to compare the two models that have been fit, and suggest
an alternative approach. (3 marks)
(h) Suppose that the mixed model here is to be compared with one in which an additional fixed
effect is included. Explain how this could be done, noting any changes to the fitting procedure
that would be required.
(3 marks)
(Total: 20 marks)
4. (a) Define the three components of a generalized linear model (GLM).
(2 marks)
(b) Suppose that it is desired to estimate the concentration ρ0 of bacteria per unit volume of a
solution, by means of a dilution assay. At dilution stage x, a sample of the solution is diluted
so that it contains
\[ \rho_x = \frac{\rho_0}{2^x} \]
bacteria per unit volume on average.
A unit volume of solution is applied to n plates at each dilution stage. The number of bacteria
on a plate at dilution stage x is then Poisson distributed with mean ρx. A plate is said to be
infected if any bacteria are present.
(i) Write down, in terms of ρ_x, the probability π_x that a plate at stage x is infected.
(1 mark)
(ii) Write down the distribution of the number Yx out of n plates that are infected, stating
relevant modelling assumptions.
(2 marks)
(iii) Show that ρ_0 can be estimated using a GLM with the complementary log-log link
g(π_x) = log(−log(1 − π_x)), where the linear predictor η = β_0 + β_1 x depends on ρ_0
in a way that should be determined.
(2 marks)
(iv) The R output below is from a GLM as described in part (iii). Select values from the
output to give a point estimate of ρ0, together with an approximate 95% confidence
interval. State any assumptions needed.
(3 marks)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.7443 0.5207 7.191 6.42e-13 ***
x -0.8185 0.1044 -7.841 4.47e-15 ***
---
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 379.3628 on 24 degrees of freedom
Residual deviance: 5.9539 on 23 degrees of freedom
AIC: 28.014
(c) A study is conducted to determine the association between smoking and circulatory disease.
The table below shows the number of disease sufferers (D) and non-sufferers (D̄), by smoking status.

             D    D̄   Total
Smoker      18   19      37
Non-smoker  38  175     213
Total       56  194     250
Table 1: Smoking and circulatory disease data.
To analyze this data, a GLM was fit in R as follows. Note that the output has been abridged.
fit0<-glm(disease~smoker,family=binomial,data=circ_dat)
summary(fit0)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.5272 0.1790 -8.533 < 2e-16 ***
smoker ####1 ####2 3.934 8.35e-05 ***
(i) State the link function that was used to fit the model. (1 mark)
(ii) Explain how to use the values in the table to determine the intercept reported in the
output. (2 marks)
(iii) Write down the log odds ratio ∆, marked ####1 for the effect of smoking, leaving your
answer in terms of fractions. (3 marks)
(iv) Give the standard error for the log odds ratio marked ####2, leaving your answer in terms of fractions. (2 marks)
(v) State which (if any) of the parameter estimates would change if these data had come
from a retrospective rather than a prospective study. State an important assumption
needed when interpreting results from a retrospective study. (2 marks)
(Total: 20 marks)
5. Example 2.1.1 of the extract discusses an experiment in which tree seedlings are grown under two
different concentration regimes for carbon dioxide. Three trees are assigned to each of the two
conditions, and the stomatal area is measured at four random locations on each plant.
Models are fit in R as follows
m0 <- lm(area ~ CO2, stomata)
m1 <- lm(area ~ CO2 + tree, stomata)
anova(m0,m1)
Analysis of Variance Table
Model 1: area ~ CO2
Model 2: area ~ CO2 + tree
Res.Df RSS Df Sum of Sq
1 22 2.1348
2 18 0.8604 4 1.2744
(a) Explain the difficulties in Example 2.1.1 with using a fixed effects model
\[ y_i = \alpha_j + \beta_k + \epsilon_i, \]
where observation i is of tree k, exposed to CO2 level j.
(4 marks).
(b) Explain why m0 and m1 have 22 and 18 residual degrees of freedom, respectively.
(4 marks)
(c) Write down a numerical expression, in terms of the values in the table, for the F statistic.
State the distribution followed by the F statistic under the null hypothesis. (3 marks)
(d) Describe how least squares is used in the example above to obtain an unbiased estimate of
the random effects variance σ_b². (3 marks)
Section 2.2.1 of the extract discusses numerical methods for parameter estimation.
(e) Explain briefly how Newton’s method is used to obtain maximum likelihood estimators.
(2 marks)
(f) Show by means of a sketch in the case of a one-dimensional optimization that it is possible
for a Newton step to decrease the log likelihood. Explain why a sufficiently small step in
the Newton direction must increase the likelihood, so long as the Hessian matrix is negative
definite. (3 marks)
(g) Give an example of a model for which Newton’s method would converge after precisely one
iteration. Justify your answer briefly. (1 mark)
(Total: 20 marks)
Course: MATH96051/MATH97082
Setter: Hallsworth
Checker: Nason
Editor: Hallsworth
External: Jennison
Date: April 18, 2020
MSc EXAMINATIONS (MATHEMATICS)
May 2020
MATH96051/MATH97082
Statistical Modelling II [SOLUTIONS]
1. (a) [Seen] The likelihood is given by
\[ \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left( -\frac{1}{2\sigma^2} (y - X\beta)^T (y - X\beta) \right). \]
Hence the log likelihood is
\[ \ell(\beta) = -\frac{n}{2}\log\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2} (y - X\beta)^T (y - X\beta). \]
For fixed σ², we can maximize this log likelihood by expanding
\[ (y - X\beta)^T (y - X\beta) = y^T y - 2\beta^T X^T y + \beta^T X^T X \beta \]
(or by using the chain rule) and taking the gradient with respect to β, to see that
\[ -2X^T y + 2X^T X \beta = 0 \]
at a stationary point. Since the Gram matrix X^T X is positive (semi-)definite, this stationary point is a minimum of the quadratic form, and hence a maximum of the log likelihood.
Geometrically, this says that the residual vector e = y − Xβ̂ is orthogonal to every column of the design matrix X: ŷ = Xβ̂ is the orthogonal projection of y onto the column space of X.
(b) [Seen] From part (a), X^T X β̂ = X^T y, so if X has full rank,
\[ \hat\beta = (X^T X)^{-1} X^T y. \]
Then ŷ = Xβ̂ = X(X^T X)^{-1} X^T y, so P = X(X^T X)^{-1} X^T is the required matrix.
Now e = y − ŷ = (I − P)y, so the residuals are obtained by applying the projection I − P onto the orthogonal complement of the column space of X.
(c) [Seen] The leverage h_i of the ith observation is defined to be the ith diagonal entry of the matrix P given in part (b).
Note that E(e) = E[(I − P)y] = (I − P)E(y) = (I − P)Xβ = 0, since I − P maps every column of X to 0.
Hence the variance-covariance matrix of e is Var(e) = (I − P) Var(y) (I − P)^T = σ²(I − P), using the fact that (I − P)^T = (I − P) = (I − P)². In particular, the variance of the ith residual is (1 − h_i)σ².
(d) [Seen Similar] X has full rank when the vectors (1, . . . , 1)^T and (x_1, . . . , x_n)^T are not scalar multiples of one another, which is the case precisely when the x_i are not all equal.
Practically, we cannot learn how y changes when x changes if we have only a single x value.
(e) [Unseen] In this case the ith row of X is (1, x_i), so
\[ X^T X = \begin{pmatrix} n & \sum_j x_j \\ \sum_j x_j & \sum_j x_j^2 \end{pmatrix}, \]
and
\[ (X^T X)^{-1} = \frac{1}{n\sum_j x_j^2 - \left(\sum_j x_j\right)^2} \begin{pmatrix} \sum_j x_j^2 & -\sum_j x_j \\ -\sum_j x_j & n \end{pmatrix} = \frac{1}{n\sum_j (x_j - \bar{x})^2} \begin{pmatrix} \sum_j x_j^2 & -\sum_j x_j \\ -\sum_j x_j & n \end{pmatrix}. \]
The leverage h_i is the ith diagonal entry of P = X(X^T X)^{-1} X^T, which is
\[ h_i = \begin{pmatrix} 1 & x_i \end{pmatrix} (X^T X)^{-1} \begin{pmatrix} 1 \\ x_i \end{pmatrix} = \frac{1}{n\sum_j (x_j - \bar{x})^2} \begin{pmatrix} 1 & x_i \end{pmatrix} \begin{pmatrix} \sum_j x_j^2 - x_i \sum_j x_j \\ -\sum_j x_j + n x_i \end{pmatrix} = \frac{\sum_j x_j^2 - 2x_i \sum_j x_j + n x_i^2}{n\sum_j (x_j - \bar{x})^2}. \]
Using \(\sum_j x_j^2 = \sum_j (x_j - \bar{x})^2 + n\bar{x}^2\) and \(\sum_j x_j = n\bar{x}\), the numerator becomes \(\sum_j (x_j - \bar{x})^2 + n\bar{x}^2 - 2n x_i \bar{x} + n x_i^2 = \sum_j (x_j - \bar{x})^2 + n(x_i - \bar{x})^2\), so
\[ h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_j (x_j - \bar{x})^2}. \]
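As a quick numerical check of this result, the leverages obtained from the hat matrix can be compared with the closed form in R; the covariate values below are illustrative, not taken from the paper.
x <- c(1.2, 2.5, 3.1, 4.8, 6.0)                  # illustrative covariate values
X <- cbind(1, x)                                 # design matrix with intercept column
P <- X %*% solve(t(X) %*% X) %*% t(X)            # hat matrix
all.equal(diag(P),
          1 / length(x) + (x - mean(x))^2 / sum((x - mean(x))^2))   # TRUE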
(f) [Seen Similar] Leverage can be used to identify observations that have the potential to have a
substantial effect on the fit, in virtue of their position in covariate space. Specifically, it identifies
points with a large (Mahalanobis) distance from the centroid of covariate space.
High leverage points with a large residual correspond to a large Cook’s distance. This is a measure
of how much the predictions from the model would change if a particular observation were omitted.
So a plot of leverage against residual, marked with contours of Cook’s distance (which is a function
of these), allows points with a substantial effect on the fit to be identified.
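In R, a sketch of this kind of check for a hypothetical fitted linear model object fit might be:
h <- hatvalues(fit)        # leverages h_i (diagonal of the hat matrix)
d <- cooks.distance(fit)   # Cook's distances
plot(fit, which = 5)       # residuals vs leverage, with contours of Cook's distance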
2. (a) [Seen]
\[ f_Y(y; \lambda) = \exp\left( -\lambda + y\log\lambda - \log(y!) \right) \]
is in exponential family form, with canonical parameter θ = log λ.
(b) [Seen Similar] With the canonical link, θ_i = η_i = X_i β, the log likelihood takes the form of a sum over contributions from the individual observations,
\[ \ell(\beta) = \sum_{i=1}^n \ell_i = \sum_{i=1}^n \left\{ y_i \eta_i - \exp(\eta_i) - \log(y_i!) \right\}, \]
so that the jth entry of the gradient is given by
\[ \frac{\partial \ell(\beta)}{\partial \beta_j} = \sum_{i=1}^n \frac{\partial \ell_i}{\partial \eta_i}\,\frac{\partial \eta_i}{\partial \beta_j} = \sum_{i=1}^n \left\{ y_i x_{ij} - x_{ij}\exp(\eta_i) \right\}. \]
Hence the score vector is U = X^T(y − µ).
We obtain the (j, k)th entry of the observed information by differentiating the expression above with respect to β_k:
\[ \frac{\partial^2 \ell(\beta)}{\partial \beta_k\, \partial \beta_j} = -\sum_{i=1}^n x_{ij} x_{ik} \exp(\eta_i). \]
This does not depend on y, so taking the expectation with respect to y gives
\[ I_{jk} = \mathrm{E}\left( -\frac{\partial^2 \ell(\beta)}{\partial \beta_k\, \partial \beta_j} \right) = \sum_{i=1}^n x_{ij} x_{ik} w_{ii}, \]
where W is the diagonal matrix with entries w_{ii} = exp(η_i) = µ_i. Hence I = X^T W X.
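This identity can be checked numerically for a fitted Poisson GLM with log link; fit below is a hypothetical glm object, and with dispersion 1 its vcov() is the inverse of the estimated Fisher information.
X <- model.matrix(fit)
W <- diag(fitted(fit))                     # w_ii = estimated mu_i = exp(eta_i)
info <- t(X) %*% W %*% X                   # X^T W X at the fitted values
all.equal(solve(info), vcov(fit), check.attributes = FALSE)   # TRUE up to numerical tolerance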
(c) [Seen] The deviance of a model is twice the difference between the log likelihood for the saturated
model, and the log likelihood evaluated at the MLE of the model parameters.
\[ D = 2\left\{ \ell(y; y) - \ell(y; \hat\mu) \right\}. \]
In the Poisson case this gives
\[ D = 2\sum_{i=1}^n \left\{ y_i \log(y_i) - y_i \right\} - 2\sum_{i=1}^n \left\{ y_i \log(\hat\mu_i) - \hat\mu_i \right\} = 2\sum_{i=1}^n \left\{ y_i \log\left(\frac{y_i}{\hat\mu_i}\right) - (y_i - \hat\mu_i) \right\}. \]
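As a small numerical check of this expression against the deviance reported by R, for a hypothetical Poisson fit object fit (the convention that y_i log y_i = 0 when y_i = 0 is handled explicitly):
y  <- fit$y                                # glm() stores the response by default
mu <- fitted(fit)
D  <- 2 * sum(ifelse(y == 0, 0, y * log(y / mu)) - (y - mu))
all.equal(D, deviance(fit))                # TRUE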
(d) [Seen Similar] We have no reason to reject the null hypothesis β_2 = 0, because the p-value for this test is 0.2378. This suggests that the data are consistent with the simpler model µ_i = exp(β_0 + β_1 t_i). In that model, since β̂_1 > 0, the expected number of new cases grows exponentially with time: the spread of the disease is essentially unchecked.
(e) [Seen Similar] If the model were adequate and the sample large enough that the deviance is roughly χ²(n − p), we would not expect a residual deviance of 94.2 on 22 degrees of freedom; it is far too large. This suggests overdispersion: the variance is larger than can be accounted for by the Poisson mean-variance relationship.
One could use a quasi-Poisson model to address this, estimating the dispersion parameter from the data, e.g. using the residual deviance, φ̂ = D/(n − p).
This would not change the point estimate β̂, but the standard errors of the entries of β̂ would be inflated by a factor of √φ̂ ≈ 2. Hence the coefficient for year would no longer be significantly different from zero, and the quadratic term would remain non-significant.
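A minimal sketch of the quasi-Poisson refit, assuming the data sit in a data frame dat with columns cases and year (the names are illustrative; they are not given in the paper):
fit_q <- glm(cases ~ year + I(year^2), family = quasipoisson, data = dat)
summary(fit_q)   # same point estimates; standard errors inflated by the square root of the estimated dispersion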
(f) [Unseen]
∗ The data are structured in time and so there may be autocorrelation between observations in
successive years.
∗ Poisson-distributed response suggests new cases as independent, rare events. Instead, more
likely to see clusters of cases - hence overdispersion.
3. (a) [Seen] The adjusted R² is a measure of goodness of fit defined by
\[ \bar{R}^2 = 1 - (1 - R^2)\,\frac{n-1}{n-p-1}, \]
where n is the number of observations and p the number of predictors (excluding the intercept). It is an attempt to adjust R², which is not comparable between models with different numbers of predictors.
It can be used to compare the goodness of fit of two linear models fit to the same data, with different
numbers of variables.
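As a check, with n = 20 observations and p = 3 operator contrasts in addition to the intercept, the reported value is reproduced:
1 - (1 - 0.4408) * (20 - 1) / (20 - 3 - 1)   # = 0.3359, matching the output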
(b) [Seen Similar] For an observation with operator a, the design matrix row would be (1, 0, 0, 0).
For an observation with operator b, the design matrix row would be (1, 1, 0, 0).
(c) [Seen Similar] H0 : βb = βc = βd = 0. The test statistic has an F (3, 16) distribution under the null
hypothesis.
(d) [Seen Similar] Balanced design, so the matrix XTX is invariant under permutation of the b, c, d
class labels. Hence the standard errors arising from the diagonal entries of the inverse of this matrix
must be identical.
(e) [Seen Similar] The residual standard deviation is the same as the residual standard error from the linear model fit, σ̂ = 0.326; for this balanced one-way layout the REML estimate of the residual variance coincides with the least-squares estimate.
(f) [Seen Similar] The intra-class correlation is given by
\[ \rho = \frac{\sigma_b^2}{\sigma_b^2 + \sigma^2} = \frac{0.06808}{0.06808 + 0.10625}. \]
(g) [Seen Similar] The standard asymptotic null distribution of the generalized likelihood ratio test statistic is valid only when the parameter value being tested lies in the interior of the parameter space. Here the value under consideration, σ_b² = 0, lies on the boundary, so the asymptotic result no longer applies.
Instead, one could use a parametric bootstrap to estimate the probability, under the null hypothesis σ_b² = 0, of a likelihood ratio test statistic at least as large as the one observed. To do this, generate a large number of independent data sets from the fitted null model, refit both models to each and compute the test statistic, and compare the observed value with the resulting bootstrap distribution.
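A hedged sketch of such a bootstrap, with both models fitted by maximum likelihood so that their log likelihoods are comparable (1000 resamples for illustration):
library(lme4)
null_fit <- lm(bright ~ 1, data = pulp)
alt_fit  <- lmer(bright ~ 1 + (1 | operator), data = pulp, REML = FALSE)
lrt_obs  <- as.numeric(2 * (logLik(alt_fit) - logLik(null_fit)))
lrt_sim  <- replicate(1000, {
  y_star <- unlist(simulate(null_fit))     # data generated under H0: sigma_b^2 = 0
  as.numeric(2 * (logLik(refit(alt_fit, y_star)) - logLik(lm(y_star ~ 1))))
})
mean(lrt_sim >= lrt_obs)                   # bootstrap p-value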
(h) [Seen Similar] One can use a likelihood ratio test, with its asymptotic chi-squared distribution, to compare models fitted to the same data with different fixed-effect structures. However, the model above was estimated by REML, and REML likelihoods cannot be compared between models with different fixed-effect structures: REML works with the likelihood of a transformation of the data, and the transformation depends on the fixed-effect design matrix, so in general the two criteria are not even densities on the same space. Hence both models would need to be refitted by maximum likelihood rather than REML, as in the sketch below.
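For instance, with a hypothetical additional covariate x in the pulp data, the comparison could be carried out as follows:
library(lme4)
m_small <- lmer(bright ~ 1 + (1 | operator), data = pulp, REML = FALSE)
m_big   <- lmer(bright ~ 1 + x + (1 | operator), data = pulp, REML = FALSE)   # x is hypothetical
anova(m_small, m_big)                      # likelihood ratio test for the fixed effect of x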
4. (a) [Seen]
∗ The random component specifies the probability distribution of the response variables.
Specifically, the components of y have pdf or pmf from an exponential family of distributions,
with E(Y ) = µ.
∗ The systematic component specifies a linear predictor η = Xβ as a function of the covariates
and the unknown parameters.
∗ The link function g may be any monotonic differentiable function. It provides the functional relationship between the systematic component and the expectation of the response in the random component, namely η = g(µ).
(b) (i) [Seen Similar] π_x = 1 − exp(−ρ_x).
(ii) [Seen Similar] Assuming that the solution is well-mixed, and growth on different plates proceeds
independently,
Y_x ∼ Binomial(n, π_x).
(iii) [Unseen] By part (i), log(1 − π_x) = −ρ_x = −ρ_0/2^x, so that
\[ \log(-\log(1 - \pi_x)) = \log\rho_0 - x\log 2. \]
Hence applying the complementary log-log link gives a GLM with linear predictor η = β_0 + β_1 x in which the intercept β_0 = log ρ_0 can be used to estimate the unknown concentration.
(iv) [Unseen] Assuming that the number n of plates at each dilution is sufficiently large that the maximum likelihood estimators of the β parameters can be taken to be normally distributed, a point estimate is ρ̂_0 = exp(3.7443), and an approximate 95% confidence interval for log ρ_0 is 3.7443 ± 1.96 × 0.5207. A 95% confidence interval for ρ_0 is therefore
(exp(3.7443 − 1.96 × 0.5207), exp(3.7443 + 1.96 × 0.5207)).
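A sketch of how such a fit and interval could be produced in R, assuming vectors y (numbers of infected plates), n (plates per stage) and x (dilution stage); these names are illustrative:
fit <- glm(cbind(y, n - y) ~ x, family = binomial(link = "cloglog"))
est <- coef(summary(fit))["(Intercept)", c("Estimate", "Std. Error")]
exp(est["Estimate"])                                          # point estimate of rho_0
exp(est["Estimate"] + c(-1, 1) * 1.96 * est["Std. Error"])    # approximate 95% CI for rho_0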
(c) (i) [Seen] Since no link function is specified, R uses the canonical link; for the binomial family this is the logit link.
(ii) [Seen Similar] The intercept is the log odds of disease for non-smokers,
\[ \log\left(\frac{38}{175}\right) = -1.5272. \]
(iii) [Seen Similar]
\[ \Delta = \log\left(\frac{18}{19}\right) - \log\left(\frac{38}{175}\right) \quad (= 1.4731). \]
(The coefficient is the log odds for smokers minus the log odds for non-smokers.)
(iv) [Seen Similar]
\[ \mathrm{SE}(\Delta) = \sqrt{\frac{1}{18} + \frac{1}{19} + \frac{1}{38} + \frac{1}{175}}. \]
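These quantities can be verified directly from Table 1; the standard error is Woolf's formula for a log odds ratio.
tab <- matrix(c(18, 19, 38, 175), nrow = 2, byrow = TRUE,
              dimnames = list(c("Smoker", "Non-smoker"), c("D", "Dbar")))
log(tab["Non-smoker", "D"] / tab["Non-smoker", "Dbar"])       # intercept: log odds for non-smokers = -1.527
log((tab["Smoker", "D"] / tab["Smoker", "Dbar"]) /
    (tab["Non-smoker", "D"] / tab["Non-smoker", "Dbar"]))     # Delta = 1.473
sqrt(sum(1 / tab))                                            # SE(Delta) = 0.374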
(v) [Seen Similar] The intercept will change by an additive factor relating to the relative probability
of being sampled within the two disease conditions. The coefficient for smoking will be
unchanged. An important assumption is that the sampling probability depends only on disease
status, and not on the covariate (in this case, smoking status).
5. (a) ∗ Since trees are nested within treatment, the α and β parameters are not identifiable: the individual tree effects and the treatment effects are confounded.
∗ The model assumes that the trees are wholly unrelated, so we cannot use the results obtained to generalize to the population of trees beyond the six that have been studied.
(b) There are 4× 3× 2 = 24 observations.
The design matrix for m0 contains an intercept column and a column indicating which observations
are in the second CO2 condition. These two columns are linearly independent, giving a design matrix
of rank 2, so there are 24− 2 = 22 residual degrees of freedom.
The design matrix for m1 contains the same intercept and CO2 columns, together with five further columns indicating (by means of a binary indicator) which observations come from tree k = 2, 3, 4, 5, 6. However, the sum of the k = 4, 5, 6 columns is equal to the column indicating the second CO2 condition, so the rank of the design matrix is 1 + 1 + 5 − 1 = 6 and there are 24 − 6 = 18 residual degrees of freedom.
(c)
\[ F = \frac{(2.1348 - 0.8604)/4}{0.8604/18} \quad (= 6.665). \]
Under H_0, this statistic has the F(4, 18) distribution.
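Numerically, with the tail probability under F(4, 18) included as a check:
f_stat <- ((2.1348 - 0.8604) / 4) / (0.8604 / 18)    # = 6.665
pf(f_stat, df1 = 4, df2 = 18, lower.tail = FALSE)    # p-value under the null hypothesis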
(d) Suppose that we use a random effect to model differences between trees,
\[ y_i = \alpha_j + b_k + \epsilon_i, \]
where b_k ∼ N(0, σ_b²) and ε_i ∼ N(0, σ²). We can obtain an unbiased estimate of σ² from least squares in the usual way (the residual mean square from m1).
For balanced data, averaging over the levels of a random effect yields a simplified model. Taking the sample average of the four measurements on each tree gives
\[ \bar{y}_k = \alpha_j + e_k, \]
where tree k is in condition j and the errors e_k ∼ N(0, σ_b² + σ²/4) are independent. Least squares applied to these averages gives an unbiased estimate of σ_b² + σ²/4, and combining the two variance estimates yields an unbiased estimate of σ_b².
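A sketch of this two-stage calculation for the stomata example, assuming the data frame from the extract (columns area, CO2 and tree, with six distinct tree labels):
m1 <- lm(area ~ CO2 + tree, stomata)
sigma2_hat  <- summary(m1)$sigma^2                           # unbiased estimate of sigma^2
tree_means  <- aggregate(area ~ CO2 + tree, stomata, mean)   # one row per tree
m_bar       <- lm(area ~ CO2, tree_means)
sigma2b_hat <- summary(m_bar)$sigma^2 - sigma2_hat / 4       # unbiased estimate of sigma_b^2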
(e) Suppose we have an initial value θ_0 close to the optimum. The method approximates the log likelihood by a quadratic function of θ (a Taylor expansion to second order) and maximizes this quadratic approximation; the maximizer is available analytically as
\[ \theta_1 = \theta_0 - \left(\nabla^2 \ell\right)^{-1} \nabla \ell. \]
This procedure is iterated, using θ_1 as the new starting value, and in regular circumstances it converges to a local maximizer of the log likelihood.
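A hedged one-dimensional illustration, using a Poisson log likelihood whose maximizer is the sample mean; numerical derivatives are used for simplicity.
newton_step <- function(theta, loglik, eps = 1e-4) {
  grad <- (loglik(theta + eps) - loglik(theta - eps)) / (2 * eps)                  # first derivative
  hess <- (loglik(theta + eps) - 2 * loglik(theta) + loglik(theta - eps)) / eps^2  # second derivative
  theta - grad / hess                                                              # Newton update
}
y <- c(2, 4, 3, 5)
loglik <- function(theta) sum(dpois(y, lambda = theta, log = TRUE))
theta <- 1
for (k in 1:10) theta <- newton_step(theta, loglik)
theta   # converges to mean(y) = 3.5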
(f) A sketch similar to Figure 2.5 in the reading material is expected. If the inverse of the Hessian is a negative definite matrix Q, then by a first-order Taylor expansion of ℓ, for sufficiently small α > 0,
\[ \ell(\theta_0 - \alpha Q \nabla\ell) \approx \ell(\theta_0) - \alpha\,(\nabla\ell)^T Q\,(\nabla\ell) > \ell(\theta_0), \]
since (∇ℓ)^T Q (∇ℓ) < 0.
(g) For a Normal linear model, the log likelihood is a quadratic function of its parameters, so the Newton
step performs exact maximization in a single step.
Comments for Students
MATH96051, Question 1: Generally good. Some students struggle with both the precise definition and the interpretation of leverage, and with the interpretation of a non-singular design matrix. The likelihood for the linear model was not always properly specified.
MATH96051, Question 2: Generally good. Some confusion about link functions, and some sloppy interpretation of significance tests.
MATH96051, Question 3: (b) Many answers either forgot the intercept completely or removed it for some categories. (d) Many answers lacked detail. (e) "State" meant the answer did not need to be derived (and was available in the R output). The other parts were mostly answered reasonably.
MATH97082, Question 4: Well attempted for the most part.
MATH97082, Question 5: Well attempted for the most part.