Chapter 6. Models for Multi-categorical Responses:
Multivariate Extensions of GLM
MAST90139 Statistical Modelling for Data Science Slides
Guoqi Qian
SCHOOL OF MATHEMATICS AND STATISTICS
THE UNIVERSITY OF MELBOURNE
Multivariate GLM (Ch6) Guoqi Qian 1 / 143
Outline
1 §6.1 Four examples
§6.1.1 Example 1. Caesarian birth study
§6.1.2 Example 2. Breathing test results
§6.1.3 Example 3. Visual impairment study
§6.1.4 Example 4. Ohio children respiratory infection
2 §6.2 Multicategorical response models
§6.2.1 Multinomial distribution and Data
§6.2.2 The multi-categorical logit model
§6.2.3 Multivariate generalised linear models (MGLM)
3 §6.3 Models for nominal responses
§6.3.1 Principle of maximum random utility
§6.3.2 Modelling explanatory variables: choice of design matrix
4 §6.4 Models for ordinal responses
§6.4.1 Cumulative (or threshold) models
§6.4.2 Extended version of cumulative models
§6.4.3 Link functions and design matrices for cumulative models
§6.4.4 Sequential models and other approaches
5 §6.5 Statistical inference for MGLM (optional)
6 §6.6 Multivariate models for correlated responses
§6.6.1 Conditional models (optional)
§6.6.2 Margin models
§6.6.3 Marginal models for longitudinal data
Multivariate GLM (Ch6) Guoqi Qian 2 / 143
Chapter 6 Outline
In this chapter we consider the following multivariate extensions of GLM:
1 One response variable with k > 2 nominal categories;
2 One response variable with k > 2 ordinal categories;
3 Multiple correlated binary response variables.
Cases 1 and 2 refer to the polychotomous response which is often
modelled by a multinomial distribution.
Case 3 is often seen in situations involving repeated measurements or
longitudinal data.
We first present four examples which will be extensively analysed in
this chapter.
Multivariate GLM (Ch6) Guoqi Qian 3 / 143
Example 1. Caesarian birth study (1)
Table 1: Data on classification of 251 births by caesarian section
Caesarian planned Caesarian not planned
Infection Infection
I II non I II non
Antibiotics
Risk-factors 0 1 17 4 7 87
No risk-factors 0 0 2 0 0 0
No antibiotics
Risk-factors 11 17 30 10 13 3
No risk-factors 4 4 32 0 0 9
Response variable — Infection, with 3 levels (I, II, non)
Three dichotomous covariates (factors)
1 CP: caesarian planned or not;
2 RF: risk factors (e.g. diabetes, over-weight, etc.);
3 AB: antibiotics given as a prophylaxis?
Multivariate GLM (Ch6) §6.1 Four examples Guoqi Qian 4 / 143
Example 1. Caesarian birth study (2)
Aim: Analysing the effects of the covariates on the risk of infection.
Note:Response variable is also called dependent variable or outcome
variable.
Note: Covariate is also called independent variable, explanatory
variable, or predictor.
The response variable Infection is categorical having > 2 levels.
The level values are nominal, but may also be treated as ordinal
which makes the analysis even more difficult.
Multivariate GLM (Ch6) §6.1 Four examples Guoqi Qian 5 / 143
Example 1. Caesarian birth study (3)
The data in Table 1 are grouped data. They can be converted to
ungrouped data.
Individual CP RF AB Infection
1 1 1 1 II
2 1 1 1 non
...
...
...
...
...
18 1 1 1 non
19 1 0 1 non
20 1 0 1 non
21 1 1 0 I
...
...
...
...
...
251 0 0 0 non
CP=1 if yes and 0 if no; RF=1 if yes and 0 if no; AB=1 if yes and 0 if no.
Multivariate GLM (Ch6) §6.1 Four examples Guoqi Qian 6 / 143
Example 2. Breathing test results
Forthofer & Lehnen (1981) considered the effect of age & smoking on
breathing test results (BTRs) for workers in industrial plants in Texas.
The response variable is BTR, ordinal with K = 3 levels.
Age (2 levels) and smoking status (3 levels) are predictors.
Table 2: BTS of Houston industrial workers
Breathing test results
Age Smoking status Normal Borderline Abnormal
< 40
Never smoked 577 27 7
Former smoker 192 20 3
Current smoker 682 46 11
40 ∼ 59
Never smoked 164 4 0
Former smoker 145 15 7
Current smoker 245 47 27
Multivariate GLM (Ch6) §6.1 Four examples Guoqi Qian 7 / 143
Example 3. Visual impairment study
Table 3: Visual impairment data, from Liang, Zeger & Qaqish (1992)
White Black
Visual Age
impairment 40-50 51-60 61-70 70+ 40-50 51-60 61-70 70+ Total
Left eye
Yes 15 24 42 139 29 38 50 85 422
No 617 557 789 673 750 574 473 344 4777
Right eye
Yes 19 25 48 146 31 37 49 93 448
No 613 556 783 666 748 575 474 336 4751
Vector binary response variable (y1, y2), where y1 = 1 if left-eye
impaired, 0 otherwise; y2 = 1 if right-eye impaired, 0 otherwise.
Covariates: Age (yrs.), Race (W or B).
Aim: find the effect of race and age on visual impairment.
Complication: y1 and y2 are correlated.
Methods: multivariate models for correlated responses; conditional
models; asymmetric models, marginal models, GEE, etc..
Multivariate GLM (Ch6) §6.1 Four examples Guoqi Qian 8 / 143
Example 4. Respiratory infection in Ohio children (1)
Table 4: Presence and absence of respiratory infection
Mother did not smoke Mother smoked
Age of child Frequency Age of child Frequency
7 8 9 10 7 8 9 10
0 0 0 0 237 0 0 0 0 118
0 0 0 1 10 0 0 0 1 6
0 0 1 0 15 0 0 1 0 8
0 0 1 1 4 0 0 1 1 2
0 1 0 0 16 0 1 0 0 11
0 1 0 1 2 0 1 0 1 1
0 1 1 0 7 0 1 1 0 6
0 1 1 1 3 0 1 1 1 4
1 0 0 0 24 1 0 0 0 7
1 0 0 1 3 1 0 0 1 3
1 0 1 0 3 1 0 1 0 3
1 0 1 1 2 1 0 1 1 1
1 1 0 0 6 1 1 0 0 4
1 1 0 1 2 1 1 0 1 2
1 1 1 0 5 1 1 1 0 4
1 1 1 1 1 1 1 1 1 7
Multivariate GLM (Ch6) §6.1 Four examples Guoqi Qian 9 / 143
Example 4. Respiratory infection in Ohio children (2)
Data in Table 4 are grouped ones. They can be equivalently expressed as
the following ungrouped ones:
individual mother smoking status Infection History
1 0 0 0 0 0
...
...
...
...
...
...
237 0 0 0 0 0
238 0 0 0 0 1
...
...
...
...
...
...
247 0 0 0 0 1
...
...
...
...
...
...
531 1 1 1 1 1
...
...
...
...
...
...
537 1 1 1 1 1
Multivariate GLM (Ch6) §6.1 Four examples Guoqi Qian 10 / 143
Example 4. Respiratory infection in Ohio children (3)
Data come from the Harvard Study of Air Pollution and Health
(Laird, Beck & Ware, 1984).
537 children were exam annually from age 7 to 10 on the presence or
absence of R.I.
Response variable: presence/absence of R.I.
Repeated measurements of the response variable for each child,
regarded as a short time series, may be referred to as longitudinal
data. Responses of one child may be correlated.
Covariate: mother’s smoking status (regular, non-regular) at the
beginning of the study.
Aim: Analysing the effect of mother’s smoking on children’s R.I.
Methods: Multivariate models for correlated responses, generalised
estimating equations (GEEs), generalised linear mixed effects models
(GLMM).
For examples 1 to 4, statistical modelling techniques and models are
needed for performing data analysis and drawing conclusions.
Multivariate GLM (Ch6) §6.1 Four examples Guoqi Qian 11 / 143
§6.2.1 Multinomial distribution (1)
Multinomial distribution is an important one for multi-categorical response
variables.
Response variable Y has k levels, labelled 1, 2, · · · , k, i.e.
Y ∈ {1, 2, · · · , k}.
Y can be represented by y = (y1, · · · , yq)T , q = k − 1, with binary
dummy variable yr =
{
1 if Y = r
0 otherwise
, r = 1, · · · , q.
Hence Y = r ⇐⇒ y = (0, · · · , 0, 1︸ ︷︷ ︸
r
, 0, · · · , 0)T and
Pr(Y = r) = Pr(yr = 1 and yj = 0 for j 6= r).
Suppose we have m independent observations of Y : Y1, · · · ,Ym.
They can be represented by y1, · · · , ym, with yi = (yi1, · · · , yiq)T ,
i = 1, · · · ,m; with yir = I (Yi = r).
Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 12 / 143
§6.2.1 Multinomial distribution (2)
Suppose pir = Pr(Yi = r), r = 1, · · · , k, remain constants for all
i = 1, · · · ,m. Note
k∑
r=1
pir = 1 and pi = (pi1, · · · , piq)T , q = k − 1.
Let y˜ =
m∑
i=1
yi =
(
m∑
i=1
yi1, · · · ,
m∑
i=1
yiq
)T
denoted
= (y˜1, · · · , y˜q)T which
is a vector of frequencies of Yi = r , r = 1, · · · , q. Then y˜ follows a
multinomial distribution M(m,pi) with pmf
Pr (y˜ = (m1, · · · ,mq)) =
m!
m1! · · ·mq!(m−m1−· · ·−mq)!pi
m1
1 · · ·pimqq (1−pi1−· · ·−piq)m−m1−···−mq
It can be shown that E (y˜) = (mpi1, · · · ,mpiq)T = mpi.
Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 13 / 143
§6.2.1 Multinomial distribution (3)
It can also be shown that
Cov(y˜) = m
[
diag(pi)− pipiT
]
=

mpi1(1−pi1) −mpi1pi2 · · · −mpi1piq
−mpi2pi1 mpi2(1−pi2) · · · −mpi2piq
...
...
. . .
...
−mpiqpi1 −mpiqpi2 · · · mpiq(1−piq)

Let y¯ = y˜/m be the vector of relevant frequencies.
It can be shown that E (y¯) = (pi1, · · · , piq)T = pi and
Cov(y¯) =
1
m
[
diag(pi)− pipiT ] = 1
m2
Cov(y˜).
Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 14 / 143
§6.2.1 Data (1)
Data under analysis are usually of two forms:
Ungrouped data
Response variable obs. Explanatory variables obs.
Unit 1

y11 · · · y1q
... · · · ...
yi1 · · · yiq
... · · · ...
yn1 · · · ynq
 =

yT1
...
yTi
...
yTn

n×q
,

x11 · · · x1L
... · · · ...
xi1 · · · xiL
... · · · ...
xn1 · · · xnL
 =

xT1
...
xTi
...
xTn

n×L
...
Unit i
...
Unit n
Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 15 / 143
§6.2.1 Data (2)
Grouped data
Suppose y
(1)
i , · · · , y(ni )i are response observations with fixed covariate
observations xi = (xi1, · · · , xiL)T .
Let y¯i =
1
ni
∑ni
j=1 y
(j)
i be the mean vector over the group-i units,
giving the relative frequencies of (Yi = 1, · · · ,Yi = q).
Let y˜i =
∑ni
j=1 y
(j)
i be the total vector over the group-i units, giving
the frequencies of (Yi = 1, · · · ,Yi = q).
Then the grouped data have the following form
Group Size Response variable obs. Explanatory variables obs.
1 n1

y∗11 · · · y∗1q
... · · ·
...
y∗i1 · · · y∗iq
... · · ·
...
y∗g1 · · · y∗gq
 =

y∗T1
...
y∗Ti
...
y∗Tg

g×q
,

x11 · · · x1L
... · · ·
...
xi1 · · · xiL
... · · ·
...
xg1 · · · xgL
 =

xT1
...
xTi
...
xTg

g×L
...
...
i ni
...
...
g ng
where y∗i is either the group mean y¯i or the group total y˜i , i = 1, · · · , g .
Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 16 / 143
§6.2.2 The multi-categorical logit model (1)
In the multinomial response case pii = µi = E (yi |xi ) is a q × 1
vector. pii = (pii1, · · · , piiq)T .
The structure part of the GLM has the form pii = h(Ziβ), where
h(·) is a q × 1 vector-valued response function (also called inverse
Zi is a q × p design matrix constructed from xi ;
β is a p × 1 unknown vector parameter.
The q × 1 vector linear predictor for unit i is ηi = Ziβ.
And
[
ηT1 , , · · · ,ηTn
]T
= vec(η1, · · · ,ηn) is the nq × 1 vector linear
predictor for all n units.
Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 17 / 143
§6.2.2 The multi-categorical logit model (2)
A multi-categorical logit model is expressed as
piir = P(Yi = r) =
exp
(
βr0 + zTi βr
)
1 +
∑q
s=1 exp
(
βs0 + zTi βs
) , r = 1, 2, · · · , q
where zi is an (a− 1)× 1 vector from xi . Equivalently,
log
piir
piik
= log
P(Yi = r)
P(Yi = k)
= βr0 + z
T
i βr , r = 1, 2, · · · , q.
Note piik = 1−
∑q
s=1 piis for all i = 1, · · · , n.
We see for this model, h = (h1, · · · , hq)T is a q × 1 vector response
function, where for each unit i ,
hr = hr (ηi ) = hr (Ziβ) =
exp(ηir )
1+
∑q
s=1 exp(ηis)
; r =1, 2, · · ·, q; i =1, · · ·, n
where ηir = βr0 + z
T
i βr and ηi = (ηi1, · · · , ηiq)T .
Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 18 / 143
§6.2.2 The multi-categorical logit model (3)
We also see for this model, the design matrix for each unit i is
Zi =

1 zTi 0 0
T · · · 0 0T
0 0T 1 zTi · · · 0 0T
...
...
...
...
. . .
...
...
0 0T 0 0T · · · 1 zTi

q×(aq)
β = (β10,β
T
1 , · · · , βq0,βTq )T is an (aq)× 1 vector parameter. η1...
ηn
 = [ηT1 , · · · ,ηTn ]T = [η11, · · · , η1q, · · · , ηn1, · · · , ηnq]T
=
[
(Z1β)
T , · · · , (Znβ)T
]T
=
 Z1β...
Znβ
=
 Z1...
Zn

(nq)×(aq)
β
Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 19 / 143
§6.2.2 The multi-categorical logit model (4)
Note that the vector link function of this multi-categorical logit model
is g = (g1, · · · , gq)T , where for each unit i ,
gr = gr (pii1, · · · , piiq) = log piir
piik
= log
piiir
1− (pii1 + · · ·+ piiq) ;
r = 1, · · · , q and i = 1, · · · , n.
It shows that a multivariate GLM with multinomial responses is
characterised by two specifications:
the vector response function h (or the vector link function g = h−1);
the design matrix that depends on the covariates and the model.
Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 20 / 143
§6.2.2 Example: Caesarian birth study (1)
Consider the data on infection following Caesarian birth given in §6.1.
Now we want to distinguish between the two different types of
infection.
Therefore the response variable Y has 3 possible outcomes: 1 for type
I infection, 2 for type II infection, and 3 for no infection.
Represent Yi by 2 dummy variables (yi1, yi2), where
yi1 = I (the ith Caesarian was followed by infection I)
yi2 = I (the ith Caesarian was followed by infection II)
There are 3 binary covariates: NoPlan, Antib, RiskF, coded by 3
dummy variables:
NoPlan = I (the Caesarian has not been planned)
Antib = I (antibiotics were given as a prophylaxis)
RiskF = I (risk factors were present)
Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 21 / 143
§6.2.2 Example: Caesarian birth study (2)
The data can be condensed in the grouped data (group total) form.
Size Response variable obs. Explanatory variables obs.
(y˜i1, y˜i2) NoPlan, Antib, RiskF
Group 1 n1 = 40

4 4
11 17
0 0
0 1
0 0
10 13
4 7


0 0 0
0 0 1
0 1 0
0 1 1
1 0 0
1 0 1
1 1 1

Group 2 n2 = 58
Group 3 n3 = 2
Group 4 n4 = 18
Group 5 n5 = 9
Group 6 n6 = 26
Group 7 n7 = 98
Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 22 / 143
§6.2.2 Example: Caesarian birth study (3)
We fit the following multi-categorical logit model to the data
log
pii1
pii3
= log
P(infection type I)
P(no infection)
= β10 + β1NNoPlan + β1AAntib + β1RRiskF
log
pii2
pii3
= log
P(infection type II)
P(no infection)
= β20 + β2NNoPlan + β2AAntib + β2RRiskF
Equivalently,
P(infection type j)
P(no infection)
= eβj0eβjNNoPlaneβjAAntibeβjRRiskF, j = I, II.
Thus, the exponential of the parameter gives the factorial
contribution to the odds if the corresponding explanatory variable
takes value 1 instead of 0.
Alternatively, the exponential of the parameter may be seen as odds
ratio between odds for value 1 and odds for value 0.
Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 23 / 143
§6.2.2 Example: Caesarian birth study (4)
For example, for NoPlan one obtains
eβjN =
P(infection type j |NoPlan=1,Antib,RiskF)
P(no infection|NoPlan=1,Antib,RiskF)
P(infection type j |NoPlan=0,Antib,RiskF)
P(no infection|NoPlan=0,Antib,RiskF)
, j = I, II.
where Antib and RiskF can take any fixed values.
Note the vector-valued link function for this model is
g(pii1, pii2) =
(
log
pii1
1− pii1 − pii2 , log
pii2
1− pii1 − pii2
)T
Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 24 / 143
§6.2.2 Example: Caesarian birth study (5)
The estimates of the parameters and their exponential are given in
the following table
Table 5: Parameter estimates in Caesarian birth study
β exp(β) β exp(β)
constant -2.621 0.072 constant -2.560 0.077
NoPlan (I) 1.174 3.235 NoPlan (II) 0.996 2.707
Antib (I) -3.520 0.030 Antib (II) -3.087 0.046
RiskF (I) 1.829 6.228 RiskF (II) 2.195 8.980
NoPlan of 1 vs. 0 increases the type I infection odds by a factor of
3.235.
Taking Antib decreases the type I infection OR to 0.030 (from 1).
RiskF increases the type I infection odds ratio to 6.228.
Similar things can be said of the type II infection odds ratio.
Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 25 / 143
§6.2.2 Example: Caesarian birth study (6)
is.data.frame(Caes.dat) #TRUE
Caes.dat
size noInf Inf1 Inf2 NoPlan Antib RiskF
1 40 32 4 4 0 0 0
2 58 30 11 17 0 0 1
3 2 2 0 0 0 1 0
4 18 17 0 1 0 1 1
5 9 9 0 0 1 0 0
6 26 3 10 13 1 0 1
7 98 87 4 7 1 1 1
library(nnet)
##package nnet is used for fitting multinomial log-linear model
##or multi-categorical logit model.
Caes1=multinom(as.matrix(Caes.dat[,2:4])~NoPlan+Antib+RiskF,data=Caes.dat)
# weights: 15 (8 variable)
initial value 275.751684
iter 10 value 161.068578
final value 160.937147
converged
Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 26 / 143
§6.2.2 Example: Caesarian birth study (7)
Caes1=multinom(as.matrix(Caes.dat[,2:4])~NoPlan+Antib+RiskF,data=Caes.dat)
summary(Caes1)
Call:
multinom(formula=as.matrix(Caes.dat[,2:4]) ~ NoPlan + Antib + RiskF, data=Caes.dat)
Coefficients:
(Intercept) NoPlan Antib RiskF
Inf1 -2.621012 1.1742513 -3.520249 1.829241
Inf2 -2.559917 0.9959762 -3.087160 2.195458
Std. Errors:
(Intercept) NoPlan Antib RiskF
Inf1 0.5567209 0.5213014 0.6717416 0.6023322
Inf2 0.5462995 0.4813634 0.5498675 0.5869601
Residual Deviance: 321.8743 AIC: 337.8743
attributes(Caes1)
$names  "n" "nunits" "nconn" "conn" "nsunits"  "decay" "entropy" "softmax" "censored" "value"  "wts" "convergence" "fitted.values" "residuals" "call"  "terms" "weights" "deviance" "rank" "lab"  "coefnames" "vcoefnames" "xlevels" "edf" "AIC"$class
 "multinom" "nnet"
Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 27 / 143
§6.2.2 Example: Caesarian birth study (8)
Caes1$fitted ##fitted category probabilities noInf Inf1 Inf2 1 0.8695347 0.063240568 0.067224753 2 0.4656329 0.210951192 0.323415876 3 0.9943521 0.002140051 0.003507889 4 0.9568456 0.012827892 0.030326540 5 0.6922135 0.162899513 0.144886971 6 0.2300766 0.337273035 0.432650355 7 0.8855925 0.038416539 0.075990954 Caes1$residuals
#resid(Caes1) does the same thing, = category means - fitted category probabilities.
noInf Inf1 Inf2
1 -0.069534679 0.036759432 0.032775247
2 0.051608447 -0.021296019 -0.030312428
3 0.005647940 -0.002140051 -0.003507889
4 -0.012401124 -0.012827892 0.025229016
5 0.307786484 -0.162899513 -0.144886971
6 -0.114691994 0.047342349 0.067349645
7 0.002162595 0.002399788 -0.004562382
Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 28 / 143
§6.2.2 Example: Caesarian birth study (9)
Caes1$deviance #residual deviance of the model, also given by deviance(Caes1)  321.8743 Caes1$edf
 8
predict(Caes1, data.frame(NoPlan=1, Antib=1, RiskF=0), type="probs")
noInf Inf1 Inf2
0.983753294 0.006850796 0.009395909
predict(Caes1, data.frame(NoPlan=1, Antib=1, RiskF=0), type="class")
 noInf
Levels: noInf Inf1 Inf2
predict(Caes1, data.frame(NoPlan=1, Antib=1, RiskF=0), type="probs", se.fit=TRUE)
noInf Inf1 Inf2
0.983753294 0.006850796 0.009395909
Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 29 / 143
§6.2.2 Example: Caesarian birth study (10)
step(Caes1)
Start: AIC=337.87
as.matrix(Caes.dat[, 2:4]) ~ NoPlan + Antib + RiskF
trying - NoPlan
# weights: 12 (6 variable)
...................................................................
converged
Df AIC
8 337.8743
- NoPlan 6 340.8649
- RiskF 6 358.3103
- Antib 6 397.4663
###The model ~ NoPlan + Antib + RiskF has the smallest AIC and is the best model.
Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 30 / 143
§6.2.2 Example: Caesarian birth study (11)
Caes2=multinom(as.matrix(Caes.dat[,2:4])~NoPlan+Antib,data=Caes.dat)
summary(Caes2)
Coefficients:
(Intercept) NoPlan Antib
Inf1 -1.431678 1.255595 -2.962218
Inf2 -1.058278 1.086508 -2.484319
Std. Errors:
(Intercept) NoPlan Antib
Inf1 0.2870129 0.4954738 0.6408664
Inf2 0.2470012 0.4475669 0.5136334
Residual Deviance: 346.3103
AIC: 358.3103
Caes2$edf  6 Caes2$deviance-Caes1$deviance  24.43605 1-pchisq(24.43605,2)  4.940594e-06 ## Hence RiskF has significant effect on the response variable. Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 31 / 143 §6.2.3 Extending Multicategory logit model to MGLM (1) Consider a multivariate q × 1 response variable yi for unit i . Let µi = E (yi |xi ), i = 1, · · · , n. Distributional assumption: The yi ’s given respective xi ’s are independent. Each yi has a distribution belonging to a simple (multivariate) exponential family with pdf f (yi |θi , φ, ωi ) = exp { yTi θi − b(θi ) φ · ωi + c(yi , φ, ωi ) } Structural assumption: µi ’s depends on the linear predictor ηi = Ziβ in the form µi = h(ηi ) = h(Ziβ) = [h1(Ziβ), · · · , hq(Ziβ)]T , i = 1, · · · , n where the response function h : S → M is defined on S ⊂ Rq, taking values in M ⊂ Rq; Zi is a q × p design matrix for unit i ; β = (β1, · · · , βp)T is an unknown parameter vector from B ⊂ Rp. Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 32 / 143 §6.2.3 Extending Multicategory logit model to MGLM (2) For the case of a multi-categorical response, one has to consider the multinomial distribution which may be embedded into the framework of a simple multivariate exponential family. For yi = (yi1, · · · , yiq)T ∼ Multinomial(ni ,pii ), the pmf of y¯i = n−1i yi has the form f (y¯i |θi , φ, ωi ) = exp { y¯Ti θi − b(θi ) φ · ωi + c(y¯i , φ, ωi ) } , i = 1, · · · , g where the natural parameter θi is given by θi = [ log pii1 piik , · · · , log piiq piik ]T ; piik = 1− q∑ j=1 piij ; and b(θi )=− log(1−pii1−· · ·−piiq) = − log piik = log ( 1+eθi1 +· · ·+eθiq); c(yi , φ, ωi ) = log ni ! yi1!···yiq!(ni−yi1−···−yiq)! ; and ωi = ni . Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 33 / 143 §6.3.1 Principle of maximum random utility (1) Models for response variables with unordered categories may be motivated from a consideration of latent variables. In probabilistic choice theory, it is often assumed that an unobserved utility Ur is associated with category r of the response variable Y . Let Ur be a latent variable associated with category r . Now assume Ur = ur + εr , with ur being a fixed value and ε1, · · · , εk i.i.d. having continuous cdf F . Following the principle of maximum random utility Y = r ⇔ Ur = max{U1, · · · ,Uk}, r = 1, · · · , k. Now it follows that P(Y = r) = P(Ur − U1 ≥ 0, · · · ,Ur − Uk ≥ 0) = P(ε1 ≤ ur − u1 + εr , · · · , εk ≤ ur − uk + εr ) = ∫ +∞ −∞ s 6=r F (ur−us +ε) · f (ε)dε, where f =F ′ is the pdf of εr . Multivariate GLM (Ch6) §6.3 Models for nominal responses Guoqi Qian 34 / 143 §6.3.1 Principle of maximum random utility (2) Different distributional assumption for εr ’s yields different models. If εr ’s are i.i.d. N(0, 1), one gets independent probit model for Y . (This is true if k = 2. But I am not able to prove it when k > 2.) Try the case k = 3. Then Ui ∼ N(ui , 1), i = 1, 2, 3; and U1,U2,U3 are independent. Let W1 = U1 − U2 and W2 = U1 − U3. Then[ W1 W2 ] ∼ N ([ u1 − u2 u1 − u3 ] , [ 2 1 1 2 ]) , with joint pdf f (w1,w2) = 1 2 3pi exp { −1 6 [ w1−(u1−u2) w2−(u1−u3) ]T[ 2 −1 −1 2 ][ w1−(u1−u2) w2−(u1−u3) ]} = 1 2 3pi e{− 13 [(w1−(u1−u2))2−(w1−(u1−u2))(w2−(u1−u3))+(w2−(u1−u3))2]} Then e.g. P(Y =1)=P(U1−U2 ≥ 0,U1−U3 ≥ 0)=P(W1 ≥ 0,W2 ≥ 0) = ∫∞ 0 ∫∞ 0 f (w1,w2)dw1dw2 = ∫∞ 0 1 4 pi e −1 4 (w2−(u1−u3))2 [ erf ( 1 2 3 [w2−(u1−u3)+2(u1−u2)] ) +1 ] dw2 where erf(x) = 2√ pi ∫ x 0 e−t 2 dt = 2[Φ( 2x)− 12 ]. This does not seem to result in a probit model. Multivariate GLM (Ch6) §6.3 Models for nominal responses Guoqi Qian 35 / 143 §6.3.1 Principle of maximum random utility (3) If ε1, · · · , εk i.i.d. following the extreme-value distribution with cdf F (x) = exp{−e−x} and pdf f (x) = exp{−e−x} exp(−x), then we get the multinomial logit model P(Y = r) = eur∑k s=1 e us = eur−uk 1+ ∑q s=1 e us−uk = e u˜r 1+ ∑q s=1 e u˜s , with u˜r = ur − uk . We see a specific cdf F determines the link or response function of the model. Proof. P(Y = r) = ∫ +∞ −∞ s 6=r exp{−e−ur+us−ε} · exp{−e−ε} · exp{−ε}dε = ∫ +∞ −∞ e− s 6=r e us−ur−ε−e−εe−εdε = ∫ +∞ −∞ e−[ s 6=r e us−ur +1]e−εe−εdε = e−[ s 6=r e us−ur +1]e−ε∑ s 6=r e us−ur + 1 ∣∣∣∣∣ +∞ −∞ = 1∑ s 6=r e us−ur + 1 = eur∑k s=1 e us . Multivariate GLM (Ch6) §6.3 Models for nominal responses Guoqi Qian 36 / 143 §6.3.2 Choice of design matrix (1) Explanatory variables and their coefficients in a linear predictor function ηi = Ziβ are the terms influencing the response variable. Assume the response variable has multiple categories. Explanatory variables can be classified into 3 types: 1 global: the ones whose values depend on individuals but not the response categories; 2 alternative-specific: the ones whose values depend on the response categories but not the individuals; 3 neither (1) or (2); or both (1) and (2). The coefficient parameters (which always do not depend on individuals) are classified into 1 global: whose values don’t depend on response categories; 2 category-specific: depend on categories. Multivariate GLM (Ch6) §6.3 Models for nominal responses Guoqi Qian 37 / 143 §6.3.2 Choice of design matrix (2) Example. Suppose the mean utility for individual i is uir , satisfying uir = αr0 + z T i αr , r = 1, · · · , k; i = 1, · · · , n. Then zi is global and αr is category-specific. The multinomial logit model becomes P(Yi = r |zi ) = e βr0+zTi βr 1 + ∑q s=1 e βs0+zTi βs where βr0 + zTi βr = u˜ir = uir − uik = (αr0 − αk0) + zTi (αr −αk). Multivariate GLM (Ch6) §6.3 Models for nominal responses Guoqi Qian 38 / 143 §6.3.2 Choice of design matrix (3) Example (continued 1) The associated GLM is pii1... piiq  =  P(Yi = 1|zi )... P(Yi = q|zi )  =  eηi1 1+ ∑q s=1 e ηis ... e ηiq 1+ ∑q s=1 e ηis  denoted= h(ηi ) where the linear predictor has the form ηi = Ziβ, with the design matrix for individual i being Zi =  1 zTi 0 0 T · · · 0 0T 0 0T 1 zTi · · · 0 0T ... ... ... ... . . . ... ... 0 0T 0 0T · · · 1 zTi  q×p and β = (β10,β T 1 , · · · , βq0,βTq )T is a p × 1 parameter vector. Multivariate GLM (Ch6) §6.3 Models for nominal responses Guoqi Qian 39 / 143 §6.3.2 Choice of design matrix (4) Example (continued 2) If uir also depends on some alternative-specific explanatory variables, then uir may be assumed of the form uir = αr0 + z T i αr + w T r γ, r = 1, · · · , k; i = 1, · · · , n where wr is an alternative-specific explanatory vector variable and γ is a global parameter. Then u˜ir = uir − uik = βr0 + zTi βr + (wr −wk)Tγ. The resultant multinomial logit model will have the following design matrix for individual i Zi =  1 zTi 0 0 T · · · 0 0T (w1 −wk)T 0 0T 1 zTi · · · 0 0T (w2 −wk)T ... ... ... ... . . . ... ... ... 0 0T 0 0T · · · 1 zTi (wq −wk)T  q×p and β = (β10,β T 1 , · · · , βq0,βTq ,γT )T is a p × 1 parameter vector. Multivariate GLM (Ch6) §6.3 Models for nominal responses Guoqi Qian 40 / 143 §6.4 Models for ordinal responses Response variable having k > 2 categories may be of ordinal nature. Example: Breathing test results Ordinal variables may come from quite different mechanisms. Two distinguished types are: grouped continuous and assessed ordered categorical variables. Grouped continuous — merely a categorized continuous variable Assessed ordered — obtained after an assessor’s judgment. Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 41 / 143 §6.4.1 Cumulative models: the threshold approach Suppose there is an unobserved continuous variable U for the response variable Y having k categories. Suppose Y is determined by U in the following way Y = r ⇔ θr−1 < U ≤ θr ; r = 1, · · · , k where −∞ = θ0 < θ1 < · · · < θk = +∞ unknown. That means Y is a coarser (categorized) version of U determined by the thresholds θ1, · · · , θk−1. Suppose U depends on a p × 1 vector explanatory variable x in the following way U = −xTγ + ε where γ = (γ1, · · · , γp)T is an unknown vector parameter and ε is random with cdf F . It follows that P(Y ≤ r |x) = P(U ≤ θr ) = F (θr + xTγ). This is called a cumulative model or threshold model with cdf F . Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 42 / 143 §6.4.1 Cumulative logistic (or proportional odds) model (1) If choose F (x) = [ 1 + e−x ]−1 which is the cdf of a logistic distribution, then P(Y ≤ r |x) = e θr+xTγ 1 + eθr+x Tγ , r = 1, · · · , q = k − 1. (1) It is equivalent to log P(Y ≤ r |x) P(Y > r |x) = θr + x Tγ or (2) P(Y ≤ r |x) P(Y > r |x) = exp{θr + x Tγ} (3) The model given by (1), (2) or (3) is called the cumulative logistic model. Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 43 / 143 §6.4.1 Cumulative logistic (or proportional odds) model (2) The cumulative logistic model is also called a proportional odds model due to following interpretation: Suppose there are 2 populations (groups) identified by x1 and x2 respectively. Then P(Y ≤ r |x1) P(Y > r |x1) = e θr+xT1 γ and P(Y ≤ r |x2) P(Y > r |x2) = e θr+xT2 γ . It follows that cumulative odds ratio = P(Y ≤ r |x1)/P(Y > r |x1) P(Y ≤ r |x2)/P(Y > r |x2) = e (x1−x2)Tγ which is the same for all r = 1, · · · , q. Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 44 / 143 §6.4.1 Grouped Cox (or proportional hazards) model If choose F as the cdf of the extreme minimal-value distribution, i.e. F (x) = 1− exp{−ex}, then P(Y ≤ r |x) = 1− exp{−eθr+xTγ} (4) ⇐⇒ log [− log P(Y > r |x)]︸ ︷︷ ︸ complementary log-log link = θr + x Tγ, r = 1, · · · , q Model (4) is referred to as the grouped Cox model or proportional hazards model due to the following interpretation: For 2 populations (groups) identified by x1 and x2 respectively log P(Y > r |x1) = log P(Y > r |x2) · e(x1−x2)Tγ , r = 1, · · · , q Differentiating w.r.t. r on both sides results in λ(r |x1) = λ(r |x2) · e(x1−x2)Tγ , ⇔ λ(r |x1) λ(r |x2) = e (x1−x2)Tγ , which is the same for all r = 1, · · · , q. Here λ(r |·) = − d dr logP(Y > r |·) is the hazard function. Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 45 / 143 §6.4.1 Cumulative model based on extreme maximal-value CDF If choose F as the cdf of the extreme maximal-value distribution, i.e. F (x) = exp{−e−x}, then P(Y ≤ r |x) = exp{−e−(θr+xTγ)} (5) ⇐⇒ log [− log P(Y ≤ r |x)]︸ ︷︷ ︸ log-log link = −(θr + xTγ), r = 1, · · · , q. (6) Model (5) or (6) is referred to as the extreme maximal-value distribution model. Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 46 / 143 S6.4.2 Extended version of cumulative models (1) Recall that an ordinal categorical response Y with k categories is determined by a latent variable U as Y = r ⇔ θr−1and U = −xTγ + ε, with ε having cdf F . Then a cumulative model or threshold model with cdf F is P(Y ≤ r |x) = P(U ≤ θr ) = F (θr + xTγ). (7) One may assume a linear form for thresholds θ1, · · · , θq, θr = βr0 + w Tβr where w is a vector explanatory variable and βr = (βr1, · · · , βrm)T is a category-specific parameter vector. Then an extended cumulative model is obtained: P(Y ≤ r |x,w) = F (βr0 + wTβr + xTγ). (8) Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 47 / 143 S6.4.2 Extended version of cumulative models (2) Remarks 1 Model (8) becomes unidentifiable if wi = xi for each individual i = 1, · · · , n; i.e., it will not be possible to separate γ from each βr if wi = xi . 2 So wi and xi must not be the same. 3 βr can be global, while γ can be replaced by a category-specific parameter. 4 wi are called threshold variables, while xi are called shift variables. Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 48 / 143 §6.4.3 Link functions for cumulative models For a cumulative GLM, the link function g =(g1, · · · , gq), as a vector function of (pi1 = P(Y =1), · · · , piq = P(Y =q)), is given by gr (pi1, · · · , piq) = F−1(pi1 + · · ·+ pir ), r = 1, · · · , q, where F−1(·) is the inverse function for F that specifies the cumulative model pi1 + · · ·+ pir = P(Y ≤ r |x,w) = F (βr0 + wTβr + xTγ). F being the N(0, 1) cdf gives the probit link gr (pi1, · · · , piq) = Φ−1(pi1 + · · ·+ pir ), r = 1, · · · , q. For proportional odds model, the link is the cumulative logit gr (pi1, · · · , piq) = log pi1 + · · ·+ pir 1− (pi1 + · · ·+ pir ) , r = 1, · · · , q. For proportional hazards model, the link is the complement log-log gr (pi1, · · · , piq) = log{− log(1− (pi1 + · · ·+ pir ))}, r = 1, · · · , q. Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 49 / 143 §6.4.3 Design matrices for cumulative models (1) For the simple cumulative model P(Yi ≤ r |xi ) = F (θr + xTi γ), the design matrix Zi in the linear predictor ηi = Ziβ is given by Zi =  1 xTi 1 xTi . . . ... 1 xTi  q×(q+dim(xi )) and β = (θ1, · · · , θq,γT )T . Here xi should not contain a constant component. Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 50 / 143 §6.4.3 Design matrices for cumulative models (2) For the extended cumulative model P(Yi ≤ r |xi ,wi ) = F (βr0 + wTi βr + xTi γ), the design matrix Zi in the linear predictor ηi = Ziβ is given by Zi =  1 wTi x T i 1 wTi x T i . . . . . . ... 1 wTi x T i  q×(q(1+dim(wi ))+dim(xi )) and β = (β10,β T 1 , · · · , βq0,βTq ,γT )T . Here neither wi nor xi should contain a constant component. Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 51 / 143 §6.4.3 Alternative formulation/parameterisation in cumulative models (1) The threshold parameters θ1, · · · , θq in cumulative models P(Y ≤ r |x) = F (θr + xTγ) are restricted by θ1 < θ2 < · · · < θq The corresponding restriction for the extended cumulative model is β10 + w Tβ1 < β20 + w Tβ2 < · · · < βq0 + wTβq which is more complex. If such constraints are not explicitly accounted for in estimation, the iterative estimation procedure may fail due to fitting inadmissible parameters. Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 52 / 143 §6.4.3 Alternative formulation/parameterisation in cumulative models (2) For the simple cumulative model, the problem can be avoided by using an alternative formulation or reparameterisation: α1 = θ1, αr = log(θr − θr−1), r = 2, · · · , q ⇐⇒ θ1 = α1, θr = θ1 + r∑ i=2 eαi , r = 2, · · · , q. The new parameters α1, · · · , αq are unconstrained. The cumulative model under the new formulation has a linear structure of the following form F−1 (P(Y = 1|x)) = α1 + xTγ log [ F−1 (P(Y ≤ r |x))− F−1 (P(Y ≤ r − 1|x))] = αr , r = 2, · · · , q. The link function corresponding to the new formulation can be determined accordingly. Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 53 / 143 §6.4.3 Alternative formulation/parameterisation in cumulative models (3) For the special case of the cumulative logistic model, one obtains the following link function (log-logit link) g1(pi1, · · · , piq) = log ( pi1 1− pi1 ) gr (pi1, · · · , piq) = log [ log { pi1 + · · ·+ pir 1− pi1 − · · · − pir } − log { pi1 + · · ·+ pir−1 1− pi1 − · · · − pir−1 }] , q = 2, · · · , q. If this alternative link function is used, the design matrix Zi has to be adapted. Now Zi =  1 xTi 1 0 . . . ... 1 0  q×(q+dim(xi )) and the parameter vector becomes β = (α1, · · · , αq,γT )T . Here xi should not contain a constant component. Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 54 / 143 §6.4.3 Computing packages and an example The polr() function in the R MASS package can be used to fit cumulative models (but not extended cumulative models). More generally, the ordinal package in R can be used for fitting ordinal regression models. Example 2 in §6.1. Breathing test results. The response variable is BTR, ordinal with k = 3 levels. Age (2 levels) and Smoking status (3 levels) are predictors. The aim is to see how Age and Smoking are associated with BTR. Command polr in R MASS package is used in the analysis. Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 55 / 143 §6.4.3 Breathing testing example (1) Usage of polr: polr(formula, data, weights, start, ..., subset, na.action, contrasts = NULL, Hess = FALSE, model = TRUE, method = c("logistic", "probit", "loglog", "cloglog", "cauchit")) Note the estimates of γ are those from polr multiplied by −1 due to the model used in polr being P(Y ≤ r |x) = F (θr − xTγ). > library(MASS); BTR.dat <- read.csv("D:/MAST90139/Example6-2BTR.csv"); BTR.dat BTR Age Smoking Freq 1 1Normal <40 1Never 577 2 2Border <40 1Never 27 3 3Abnorm <40 1Never 7 4 1Normal <40 2Former 192 5 2Border <40 2Former 20 6 3Abnorm <40 2Former 3 7 1Normal <40 3Current 682 8 2Border <40 3Current 46 9 3Abnorm <40 3Current 11 10 1Normal 40to59 1Never 164 11 2Border 40to59 1Never 4 12 3Abnorm 40to59 1Never 0 13 1Normal 40to59 2Former 145 14 2Border 40to59 2Former 15 15 3Abnorm 40to59 2Former 7 16 1Normal 40to59 3Current 245 17 2Border 40to59 3Current 47 18 3Abnorm 40to59 3Current 27 Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 56 / 143 §6.4.3 Breathing testing example (2) > is.factor(BTR.dat$BTR) #TRUE; > is.ordered(BTR.dat$BTR) #FALSE > is.factor(BTR.dat$Age) #TRUE; > is.factor(BTR.dat$Smoking) #TRUE > is.factor(BTR.dat$Freq) #FALSE;
> BTR.dat$BTR=as.ordered(BTR.dat$BTR) #Set BTR as an ordinal factor
> help(polr)
> BTR.logit=polr(BTR~Age+Smoking+Age:Smoking, data=BTR.dat, weights=Freq,
Hess=T,method="logistic")
> summary(BTR.logit)
Coefficients:
Value Std. Error t value
Age40to59 -0.8857 0.5359 -1.653
Smoking2Former 0.6973 0.2822 2.471
Smoking3Current 0.3472 0.2238 1.551
Age40to59:Smoking2Former 1.1458 0.6228 1.840
Age40to59:Smoking3Current 2.2007 0.5689 3.868
Intercepts:
Value Std. Error t value
1Normal|2Border 2.8334 0.1764 16.0603
2Border|3Abnorm 4.3078 0.2130 20.2210
Residual Deviance: 1564.968
AIC: 1578.968
Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 57 / 143
§6.4.3 Breathing testing example (3)
BTR.logit1=polr(BTR~Age+Smoking,data=BTR.dat, weights=Freq,Hess=T,method="logistic")
summary(BTR.logit1)
Coefficients:
Value Std. Error t value
Age40to59 0.7772 0.1482 5.243
Smoking2Former 0.7815 0.2333 3.350
Smoking3Current 0.9607 0.1918 5.010
Intercepts:
Value Std. Error t value
1Normal|2Border 3.1927 0.1749 18.2544
2Border|3Abnorm 4.6543 0.2125 21.9076
Residual Deviance: 1589.744
AIC: 1599.744
anova(BTR.logit, BTR.logit1)
Likelihood ratio tests of ordinal regression models
Response: BTR
Model Resid. df Resid. Dev Test Df LR stat. Pr(Chi)
1 Age + Smoking 2214 1589.744
2 Age + Smoking + Age:Smoking 2212 1564.968 1 vs 2 2 24.77585 4.168e-06
Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 58 / 143
§6.4.3 Breathing testing example (4)
> BTR.logit$fitted 1Normal 2Border 3Abnorm 1 0.9444565 0.04225961 0.013283873 2 0.9444565 0.04225961 0.013283873 3 0.9444565 0.04225961 0.013283873 4 0.8943685 0.07930628 0.026325236 5 0.8943685 0.07930628 0.026325236 6 0.8943685 0.07930628 0.026325236 7 0.9231700 0.05813458 0.018695369 8 0.9231700 0.05813458 0.018695369 9 0.9231700 0.05813458 0.018695369 10 0.9763207 0.01815786 0.005521451 11 0.9763207 0.01815786 0.005521451 12 0.9763207 0.01815786 0.005521451 13 0.8671630 0.09895796 0.033879045 14 0.8671630 0.09895796 0.033879045 15 0.8671630 0.09895796 0.033879045 16 0.7633793 0.17036521 0.066255462 17 0.7633793 0.17036521 0.066255462 18 0.7633793 0.17036521 0.066255462 > BTR.logit$residual #does not exist.
> BTR.logit$lp #linear predictor values 1 2 3 4 5 6 7 8 9 0.0000000 0.0000000 0.0000000 0.6972823 0.6972823 0.6972823 0.3472245 0.3472245 0.3472245 10 11 12 13 14 15 16 17 18 -0.8857462 -0.8857462 -0.8857462 0.9573393 0.9573393 0.9573393 1.6621466 1.6621466 1.6621466 > attributes(BTR.logit)  "coefficients" "zeta" "deviance" "fitted.values" "lev" "terms"  "df.residual" "edf" "n" "nobs" "call" "method"  "convergence" "niter" "lp" "Hessian" "model" "contrasts"  "xlevels"$class
 "polr"
Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 59 / 143
§6.4.3 Breathing testing example (5)
On the previous 3 slides we fit two cumulative logistic models to BTR
using Age and Smoking: the first (BTR.logit includes the
Age:Smokings interaction-effect terms as well as the main-effect
terms; while the second (BTR.logit1 includes main-effect terms only.
Dummy coding (i.e. contr.treatment coding in R) is used for
representing Age and Smoking:
Age40to59 =
{
1, if age is 40 ∼ 59
0 if age < 40
Smoking2Former =
{
1, former smoker
0 else
,
Smoking3Current =
{
1, current smoker
0 else
Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 60 / 143
§6.4.3 Breathing testing example (6)
The first model can be written as
log
P(BTR = Normal)
P(BTR > Normal)
= θ1 + γ1Age40to59 + γ2Smoking2Former + γ3Smoking3Current
+γ4Age40to59 · Smoking2Former + γ5Age40to59 · Smoking3Current
log
P(BTR ≤ Bordering)
P(BTR > Bordering)
= θ2 + γ1Age40to59 + γ2Smoking2Former + γ3Smoking3Current
+γ4Age40to59 · Smoking2Former + γ5Age40to59 · Smoking3Current
MLEs of the parameters in the first model are
θˆ1 = 2.833, θˆ2 = 4.308, γˆ1 = 0.886, γˆ2 = −0.697, γˆ3 = −0.347, γˆ4 = −1.146, γˆ5 = −2.201
with their standard errors given in the R output.
Estimated values of the γ parameters can be interpreted as log
cumulative odds ratios. For example, the odds of having a normal
test result (or normal or bordering result) for a worker aged 40 to 59
who never smoked is estimated to be e−γˆ2−γˆ4 = e0.697+1.146 = 6.32
times of that for a worker aged 40 to 59 who is a former worker.
Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 61 / 143
§6.4.3 Breathing testing example (7)
The second model can be written as
log
P(BTR = Normal)
P(BTR > Normal)
= θ1 + γ1Age40to59 + γ2Smoking2Former + γ3Smoking3Current
log
P(BTR ≤ Bordering)
P(BTR > Bordering)
= θ2 + γ1Age40to59 + γ2Smoking2Former + γ3Smoking3Current
MLEs of the parameters in the first model are
θˆ1 = 3.193, θˆ2 = 4.654, γˆ1 = −0.777, γˆ2 = −0.782, γˆ3 = −0.961
with their standard errors given in the R output.
“H0 : γ4 = γ5 = 0 vs. H1 : not H0” is the hypothesis about the
Age:Smoking interaction effects on BTR.
A χ2 analysis of deviance test comparing models BTR.logit and
textttBTR.logit1 gives an LR statistic of 22.776 and p-value of
4.168× 10−6. Therefore, there is very strong evidence to reject H0.
We conclude that Age and Smoking have very strong interaction
effects on BTR.
Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 62 / 143
§6.4.3 Breathing testing example (8)
Linear predictor of the first model is
η1 = θ1 + γ1Age40to59 + γ2Smoking2Former + γ3Smoking3Current
+γ4Age40to59 · Smoking2Former + γ5Age40to59 · Smoking3Current
η2 = θ2 + γ1Age40to59 + γ2Smoking2Former + γ3Smoking3Current
+γ4Age40to59 · Smoking2Former + γ5Age40to59 · Smoking3Current
However, the linear predictor returned from polr is just xTγ, i.e.
γ1Age40to59 + γ2Smoking2Former + γ3Smoking3Current
+γ4Age40to59 · Smoking2Former + γ5Age40to59 · Smoking3Current
thus has just 6 different values knowing that there are 6
combinations of Age and Smoking.
Knowing these tricks, there should be no difficulty to reveal how the
fitted values returned by BTR.logit$fitted are obtained. Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 63 / 143 §6.4.3 Breathing testing example (9) Effect coding (i.e. contr.sum coding in R, −1 for last category) may also be used for representing Age and Smoking: Age1 = { 1, if age < 40 −1, if age is 40 ∼ 59 Smoking1 =  1, never smoked0, former smoker−1, current smoker , Smoking2 =  0, never smoked1, former smoker−1, current smoker With this coding, the first model can be written as log P(BTR = Normal) P(BTR > Normal) = θ1 + γ1Age1 + γ2Smoking1 + γ3Smoking2 +γ4Age1 · Smoking1 + γ5Age1 · Smoking2 log P(BTR ≤ Bordering) P(BTR > Bordering) = θ2 + γ1Age1 + γ2Smoking1 + γ3Smoking2 +γ4Age1 · Smoking1 + γ5Age1 · Smoking2 Although the parameter estimates will be different now, other results from the model will be the same as that using dummy coding. Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 64 / 143 §6.4.3 Breathing testing example (10) > options(contrasts = c("contr.sum", "contr.poly")) > BTR.logit=polr(BTR~Age+Smoking+Age:Smoking, data=BTR.dat, weights=Freq, Hess=T,method="logistic") > summary(BTR.logit) Coefficients: Value Std. Error t value Age1 -0.11487 0.1086 -1.0580 Smoking1 -0.90596 0.1890 -4.7933 Smoking2 0.36428 0.1421 2.5644 Age1:Smoking1 0.55777 0.1890 2.9512 Age1:Smoking2 -0.01517 0.1421 -0.1068 Intercepts: Value Std. Error t value 1Normal|2Border 2.3704 0.1086 21.8209 2Border|3Abnorm 3.8447 0.1599 24.0486 Residual Deviance: 1564.968 AIC: 1578.968 > BTR.logit1=polr(BTR~Age+Smoking, data=BTR.dat, weights=Freq, Hess=T,method="logistic") > anova(BTR.logit, BTR.logit1) Likelihood ratio tests of ordinal regression models Response: BTR Model Resid. df Resid. Dev Test Df LR stat. Pr(Chi) 1 Age + Smoking 2214 1589.744 2 Age + Smoking + Age:Smoking 2212 1564.968 1 vs 2 2 24.77585 4.168618e-06 Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 65 / 143 §6.4.3 Breathing testing example (11) > attributes(BTR.logit)$names
 "coefficients" "zeta" "deviance" "fitted.values" "lev" "terms" "df.residual"
 "edf" "n" "nobs" "call" "method" "convergence" "niter"
 "lp" "Hessian" "model" "contrasts" "xlevels"
> BTR.logit$fitted 1Normal 2Border 3Abnorm 1 0.9444565 0.04225911 0.013284361 2 0.9444565 0.04225911 0.013284361 3 0.9444565 0.04225911 0.013284361 4 0.8943660 0.07930714 0.026326874 5 0.8943660 0.07930714 0.026326874 6 0.8943660 0.07930714 0.026326874 7 0.9231672 0.05813603 0.018696801 8 0.9231672 0.05813603 0.018696801 9 0.9231672 0.05813603 0.018696801 10 0.9763222 0.01815653 0.005521309 11 0.9763222 0.01815653 0.005521309 12 0.9763222 0.01815653 0.005521309 13 0.8671585 0.09895993 0.033881540 14 0.8671585 0.09895993 0.033881540 15 0.8671585 0.09895993 0.033881540 16 0.7633676 0.17037058 0.066261785 17 0.7633676 0.17037058 0.066261785 18 0.7633676 0.17037058 0.066261785 > BTR.logit$lp #linear predictor values
1 2 3 4 5 6 7 8 9 10
-0.4630592 -0.4630592 -0.4630592 0.2342498 0.2342498 0.2342498 -0.1157938 -0.1157938 -0.1157938 -1.3488684
11 12 13 14 15 16 17 18
-1.3488684 -1.3488684 0.4943191 0.4943191 0.4943191 1.1991525 1.1991525 1.1991525
Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 66 / 143
§6.4.3 Breathing testing example (12)
We also fit a grouped Cox or proportional hazards model.
> BTR.ph=polr(BTR~Age+Smoking+Age:Smoking,data=BTR.dat, weights=Freq,
Hess=T, method="cloglog")
> summary(BTR.ph)
Coefficients:
Value Std. Error t value
Age40to59 -0.2865 0.14264 -2.008
Smoking2Former 0.2125 0.10054 2.114
Smoking3Current 0.1088 0.07301 1.490
Age40to59:Smoking2Former 0.4333 0.18941 2.288
Age40to59:Smoking3Current 0.8379 0.16357 5.122
Intercepts:
Value Std. Error t value
1Normal|2Border 1.0477 0.0558 18.7612
2Border|3Abnorm 1.5528 0.0629 24.6916
Residual Deviance: 1559.949
AIC: 1573.949
Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 67 / 143
§6.4.3 Breathing testing example (13)
We further fit a cumulative model with the extreme maximal-value
distribution.
> BTR.extm=polr(BTR~Age+Smoking+Age:Smoking,data=BTR.dat,weights=Freq,
Hess=T,method="loglog")
> summary(BTR.extm)
Coefficients:
Value Std. Error t value
Age40to59 -0.8672 0.5286 -1.640
Smoking2Former 0.6755 0.2700 2.501
Smoking3Current 0.3368 0.2167 1.554
Age40to59:Smoking2Former 1.1003 0.6070 1.813
Age40to59:Smoking3Current 2.0724 0.5573 3.719
Intercepts:
Value Std. Error t value
1Normal|2Border 2.8614 0.1715 16.6838
2Border|3Abnorm 4.2761 0.2080 20.5605
Residual Deviance: 1566.335
AIC: 1580.335
Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 68 / 143
§6.4.3 Breathing testing example (14)
We fit a grouped Cox or proportional hazards model using the effect
coding.
> options(contrasts = c("contr.sum", "contr.poly"))
> BTR.ph=polr(BTR~Age+Smoking+Age:Smoking,data=BTR.dat, weights=Freq,
Hess=T, method="cloglog")
> summary(BTR.ph)
Coefficients:
Value Std. Error t value
Age1 -0.06862 0.03421 -2.00575
Smoking1 -0.31898 0.05366 -5.94408
Smoking2 0.11022 0.04961 2.22168
Age1:Smoking1 0.21188 0.05361 3.95205
Age1:Smoking2 -0.00481 0.04960 -0.09699
Intercepts:
Value Std. Error t value
1Normal|2Border 0.8720 0.0353 24.7072
2Border|3Abnorm 1.3771 0.0441 31.2213
Residual Deviance: 1559.949
AIC: 1573.949
Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 69 / 143
§6.4.3 Breathing testing example (15)
We finally fit a cumulative model with the extreme maximal-value
distribution using the effect coding.
> options(contrasts = c("contr.sum", "contr.poly"))
> BTR.extm=polr(BTR~Age+Smoking+Age:Smoking,data=BTR.dat,weights=Freq,
Hess=T,method="loglog")
> summary(BTR.extm)
Coefficients:
Value Std. Error t value
Age1 -0.09523 0.1053 -0.904
Smoking1 -0.86618 0.1854 -4.671
Smoking2 0.35943 0.1361 2.642
Age1:Smoking1 0.52877 0.1854 2.852
Age1:Smoking2 -0.02136 0.1361 -0.157
Intercepts:
Value Std. Error t value
1Normal|2Border 2.4288 0.1054 23.0536
2Border|3Abnorm 3.8434 0.1573 24.4373
Residual Deviance: 1566.335
AIC: 1580.335
Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 70 / 143
§6.4.4 Motivating example for sequential models
In many applications the ordering of the response categories is due to
a sequential mechanism. The categories are ordered since they can be
reached only successively.
Example: Tonsil size. Children have been classified according to their
tonsil size and whether or not they are carriers of Streptococcus pyogenes.
Table 6: Tonsil size and Streptococcus pyogenes (Holmes & Williams, 1954)
Present but
not enlarged
Enlarged
Greatly
enlarged
Carriers 19 29 24
Noncarriers 497 560 269
It may be assumed that tonsil size always starts in the normal state “present but not
enlarged” (category 1). If the tonsils grow abnormally, they may become “enlarged”
(category 2); if the process does not stop, tonsils may become “greatly enlarged” (category
3). But in order to get greatly enlarged tonsils, they first have to be enlarged for the
duration of the intermediate state “enlarged.”
Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 71 / 143
§6.4.4 Sequential models (1)
Let latent variables Ur , r = 1, · · · , q with q = k − 1 have the linear
form
Ur = −xTγ + εr , where εr has cdf F .
We assume the ordinal response variable Y is determined by Ur ’s in
the following sequential way:
Y = 1 ⇔ U1 ≤ θ1
Y = 2 given Y ≥ 2 ⇔ U2 ≤ θ2; and in general
Y = r given Y ≥ r ⇔ Ur ≤ θr ; or equivalently
Y > r given Y ≥ r ⇔ Ur > θr , r = 1, · · · , q (9)
The sequential mechanism (9) models the transition from category r
to category r + 1 given that category r is reached.
The main difference to the threshold approach used in the cumulative
model is the conditional modelling of the transitions. The sequential
mechanism assumes a binary decision in each step.
Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 72 / 143
§6.4.4 Sequential models (2)
The sequential response mechanism (9) combined with the linear
form of the latent variable
Ur = −xTγ + εr , where εr has cdf F ,
leads to the sequential model with cdf F
P(Y = r |Y ≥ r , x) = P(Ur ≤ θr ) = F (θr + xTγ), r = 1, · · · , k,
(10)
where θk = +∞. It follows that
P(Y = r |x) = P(Y = r |Y ≥ r , x) · P(Y ≥ r |x)
= F (θr + x
Tγ) · P(Y > r − 1|Y ≥ r − 1, x) ·
P(Y > r − 2|Y ≥ r − 2, x) · · ·P(Y > 1|Y ≥ 1, x)
= F (θr + x
Tγ)[1− F (θr−1 + xTγ)] · · · [1− F (θ1 + xTγ)]
= F (θr + x
Tγ)
r−1∏
i=1
[1− F (θi + xTγ)], r = 1, · · · , k; Note ∏0i=1[·] = 0
Model (10) is also called the continuation ratio model.
Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 73 / 143
§6.4.4 Sequential models (3)
Note that there is no need to impose any order restriction on
θ1, · · · , θq in (10).
Now specify a cdf F will give a specific sequential model.
1 Choosing F (x) = (1 + e−x)−1 gives the sequential logit model
P(Y = r |Y ≥ r , x) = exp(θr + x
Tγ)
1 + exp(θr + xTγ)
, r = 1, · · · , k
which is equivalent to log
{
P(Y = r |x)
P(Y > r |x)
}
= θr + x
Tγ.
2 Choosing F (x) = 1− exp(−ex), the sequential model has the form
P(Y = r |Y ≥ r , x) = 1− exp
(
−eθr+xTγ
)
, r = 1, · · · , k (11)
which is equivalent to
log
[
− log
{
P(Y > r |x)
P(Y ≥ r |x)
}]
= θr + x
Tγ, r = 1, · · · , k. (12)
Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 74 / 143
§6.4.4 Sequential models (4)
It is worth to note that (12) is equivalent to the cumulative model
with F (x) = 1− exp(−ex). This means (12) is a special parametric
form of the grouped Cox model. This equivalence follows by the
reparameterisation:
θr = log{e θ˜r − e θ˜r−1}, r = 1, · · · , k − 1.
Note θ˜0 = −∞ and θ1 = θ˜1. Further θ˜r = log
(
r∑
i=1
eθi
)
.
Substituting θr = log{e θ˜r − e θ˜r−1} into (12), one obtains
− logP(Y > r |x) + logP(Y > r − 1|x) = e θ˜r+xTγ − e θ˜r−1+xTγ , r = 1, · · · , q
⇒ − logP(Y > r |x) = e θ˜r+xTγ , r = 1, · · · , q
⇒ P(Y ≤ r |x) = 1− exp
(
−e θ˜r+xTγ
)
which is a grouped Cox model.
Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 75 / 143
§6.4.4 Sequential models (5)
3 Choosing exponential cdf F (x) = 1− e−x , the exponential
sequential model becomes
P(Y = r |Y ≥ r , x) = 1− e−(θr+xTγ)
⇔ − log
{
P(Y>r |x)
P(Y≥r |x)
}
= θr + xTγ, r = 1, · · · , q.
Generalised sequential models:
If assuming θr = δr0 + zTδr , where δr = (δr1, · · · , δrm)T is a
category-specific vector parameter, then one gets the Generalised
sequential model :
P(Y = r |Y ≥ r , x, z) = F (δr0 + zTδr + xTγ) (13)
Remark: The effect of z is non-homogeneous over the categories, while
the effect of x is homogeneous.
Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 76 / 143
§6.4.4 Link functions and design matrices in sequential models
Response and link functions may be derived directly from model (10).
The link function g = (g1, · · · , gq)T is given by
gr (pi1, · · · , piq) = F−1
(
pir
1− pi1 − · · · − pir−1
)
, r = 1, · · · , q
and the response function h = (h1, · · · , hq)T has the form
hr (η1, · · · , ηq) = F (ηr )
r−1∏
i=1
(1− F (ηi )), r = 1, · · · , q.
For the sequential logit model, with r = 1, · · · , q
gr (pi1, · · · , piq) = log pir
1−pi1−· · ·−pir−1 and hr (η1, · · · , ηq) = e
ηr
r−1∏
i=1
(1 + eηi )−1.
In regard to the design matrices there is no difference between
sequential and cumulative models. Thus the design matrices from
§3.3.3 apply.
Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 77 / 143
§6.4.4 Strict stochastic ordering (optional)
Strict stochastic ordering is a concept generalizing the proportional
odds for simple cumulative model (7)
∆c(x1, x2) = F
−1 (P(Y ≤ r |x1))− F−1 (P(Y ≤ r |x2)) .
If ∆c(x1, x2) = γT (x1 − x2), we say strict stochastic ordering holds.
Then for general cumulative model (8) where w = x is set and xTγ is
omitted, one can test strict stochastic ordering by testing
β1 = β2 = · · · = βq.
The concept can be generalized to the sequential model (10) as well.
Let
∆s(x1, x2) = F
−1 (P(Y = r |Y ≥ r , x1))−F−1 (P(Y = r |Y ≥ r , x2)) .
If ∆s(x1, x2) = γT (x1 − x2), we say strict stochastic ordering holds
for the sequential model.
Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 78 / 143
§6.4.4 Two-step models (optional) (1)
It may be more appropriate to divide the ordered categories into sets
of categories with very homogeneous responses in the first step, then
model the categories within each set separately in the second step.
This is the two-step modelling approach.
Example: Rheumatoid arthritis. Mehta, Patel & Tsiatis (1984) analysed
data of patients with acute rheumatoid arthritis. A new agent was compared
with an active control, and each patient was evaluated on a 5-point
assessment scale.
Table 7: Clinical trial of a new agent and an active control
Global assessment
Drug
Much
improved
Improved
No
change
Worse
Much
worse
New agent 24 37 21 19 6
Active control 11 51 22 21 7
Global assessment may be subdivided into“improvement”, “no change” and
“worse” first. Then “improvement” is split up into “much improved” and
“improved” and the “worse” category is split into “worse” and “much worse.”
Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 79 / 143
§6.4.4 Two-step models (optional) (2)
Specifically, let the categories 1, · · · , k be subdivided into t basic sets
S1, · · · ,St , where Sj = {mj−1 + 1, · · · ,mj}, m0 = 0 and mt = k.
In Step 1: Y ∈ Sj ⇔ θj−1 < U0 ≤ θj , j = 1, · · · , t.
Assume U0 = −xTγ0 + ε.
In Step 2: Y = R | Y ∈ Sj ⇔ θj ,r−1 < Uj ≤ θjr .
Assume Uj = −xTγ j + εj . Also ε, εj ’s are iid with cdf F .
Then the two-step model becomes
P(Y ∈ Tj |x) = F (θj + xTγ0), P(Y ≤ r |Y ∈ Sj , x) = F (θjr + xTγ j)
(14)
where Tj = S1 ∪ · · · ∪ Sj , θ1 < · · · < θt−1, θt =∞;
θj ,mj−1+1 < · · · < θj ,mj−1 , θj ,mj =∞, j = 1, · · · , t.
Since cumulative mechanisms are used in both steps, (14) is called a
two-step cumulative model. Two-step sequential model can be
similarly defined.
Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 80 / 143
§6.4.4 Alternative approaches (optional)
1 Anderson (1984).
P(Y = r |x) = e
βr0−φrβT x
1 +
∑q
i=1 e
βi0−φiβT x
, r = 1, · · · , q;
where 1 = φ1 > · · · > φk = 0.
2 Williams and Grizzle (1972).
k∑
r=1
srP(Y = r |x) = xTβ
where s1, · · · , sk are some given scores for the categories.
3 Adjacent categories logit model (Agresti 1984).
log
P(Y = r |x)
P(Y = r − 1|x) = x
Tβr
⇔ P(Y = r |Y ∈ {r , r + 1}, x) = F (xTβr )
where F is the logistic cdf.
Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 81 / 143
§6.5 Statistical inference for MGLM overview
1 MLE
2 Testing of linear hypotheses
3 Goodness of fit (model adequacy) statistics
4 Power-divergence statistics
Multivariate GLM (Ch6) §6.5 Statistical inference for MGLM (optional) Guoqi Qian 82 / 143
§6.5.1 Maximum likelihood estimation in MGLM (1)
An extension of the exponential family is the multivariate
exponential family, having the pdf
f (yi |θi , φ, ωi ) = exp
{
yTi θi − b(θi )
φ
· ωi + c(yi , φ, ωi )
}
For q × 1 vector observations y1, · · · , yn which are suppose to be
independent of each other, let µi = E (yi |xi ), i = 1, · · · , n, with xi
being the covariate vector.
The structural assumption of MGLM is that µi ’s depend on the
linear predictor ηi = Ziβ in the form
µi = h(ηi ) = h(Ziβ) = [h1(Ziβ), · · · , h1(Ziβ)]T , i = 1, · · · , n
where
the response function h : S → M is defined on S ⊂ Rq, taking values
in M ⊂ Rq;
Zi is a q × p design matrix for unit i ;
β = (β1, · · · , βp)T is an unknown parameter vector from B ⊂ Rp.
Multivariate GLM (Ch6) §6.5 Statistical inference for MGLM (optional) Guoqi Qian 83 / 143
§6.5.1 Maximum likelihood estimation in MGLM (2)
The MLE of β may be derived in analogy to the one-dimensional case.
The log-likelihood kernel for observation yi is
i (µi ) =
yTi θi − b(θi )
φ
· ωi , where θi = θ(µi ) (15)
Using the relation µi = h(ηi ) = h(Ziβ), it can be found that the
score function is s(β) =
∂
∂β
=
n∑
i=1
si (β), with
si (β) = Z
T
i Di (β)Σ
−1
i (β)[yi − µi (β)], (16)
where Di (β) =
∂hT (ηi )
∂ηi
is the derivative matrix of hT (ηi ) evaluated
at ηi = Ziβ, which is also called the Jacobian (matrix). And
Σi (β) = cov(yi ).
Multivariate GLM (Ch6) §6.5 Statistical inference for MGLM (optional) Guoqi Qian 84 / 143
§6.5.1 Maximum likelihood estimation in MGLM (3)
An alternative form for si (β) is
si (β) = Z
T
i Wi (β)
∂g(µi )
∂µTi
[yi − µi (β)], where
Wi (β) = Di (β)Σ
−1
i (β)Di (β) =
{
∂g(µi )
∂µTi
Σi (β)
∂gT (µi )
∂µi
}−1
(17)
which approximately equals (cov(g(yi ))
−1.
The expected Fisher information matrix is
F (β) = cov(s(β)) =
n∑
i=1
ZTi Wi (β)Zi .
Multivariate GLM (Ch6) §6.5 Statistical inference for MGLM (optional) Guoqi Qian 85 / 143
§6.5.1 Maximum likelihood estimation in MGLM (4)
In matrix form the above may be rewritten as
s(β) = ZTD(β)Σ−1(β)[y− µ(β)]
F (β) = ZTW (β)Z .
where y = (yT1 , · · · , yTn )T , µ(β) = (µ1(β)T , · · · ,µn(β)T )T ,
Σ(β) = diag (Σi (β)), W (β) = diag (Wi (β)), D(β) = diag (Di (β)),
all being block diagonal, and the total design matrix
Z =
[
ZT1 , · · · ,ZTn
]T
.
In the situation of grouped data, yi is the mean vector over ni
observations and Σi (β) is replaced by n
−1
i Σi (β).
Under regularity conditions,
MLE βˆ
a∼ N(β,F−1(βˆ)).
Multivariate GLM (Ch6) §6.5 Statistical inference for MGLM (optional) Guoqi Qian 86 / 143
§6.5.1 Numerical computing in MGLM
This is the same as in the univariate case, except using the
multivariate versions.
1 Working or pseudo observation vector
y˜(β) = (y˜1(β)
T , · · · , y˜n(β)T )T , where
y˜i (β) = Ziβ +
(
D−1i (β)
)T
[yi − µi (β)]
is an approximation for g(µi (β)).
2 The IRWLS estimate is
βˆ
(k+1)
=
(
ZTW (βˆ
(k)
)Z
)−1
ZTW (βˆ
(k)
)y˜(β(k))
which is equivalent to the Fisher scoring iteration
βˆ
(k+1)
= βˆ
(k)
+
(
ZTW (βˆ
(k)
)Z
)−1
s(β(k)).
Multivariate GLM (Ch6) §6.5 Statistical inference for MGLM (optional) Guoqi Qian 87 / 143
§6.5.2 Testing linear hypotheses in MGLM
The linear hypotheses
H0 : Cβ = ξ vs. H1 : Cβ 6= ξ
can be tested in the same way as in the univariate GLM, except replacing
the score and Fisher information by their multivariate versions.
Multivariate GLM (Ch6) §6.5 Statistical inference for MGLM (optional) Guoqi Qian 88 / 143
§6.5.2 Goodness-of-fit statistics in MGLM (1)
Goodness of fit (or model adequacy) of the MGLM can be assessed based
on the Pearson statistics and the deviance (the data need be grouped as
much as possible).
General Pearson Statistics:
χ2 =
g∑
i=1
(yi − µˆi )TΣ−1i (βˆ)(yi − µˆi )
In the case of multi-categorical response variable with multinomial
distribution niyi ∼ M(ni ,pii ), we have
χ2 =
g∑
i=1
χ2p(yi , pˆii ) =
g∑
i=1
ni
k∑
j=1
(yij − pˆiij)2
pˆiij
with yik = 1− yi1 − · · · − yiq and pˆiik = 1− pˆii1 − · · · − pˆiiq.
Multivariate GLM (Ch6) §6.5 Statistical inference for MGLM (optional) Guoqi Qian 89 / 143
§6.5.2 Goodness-of-fit statistics in MGLM (2)
Deviance or likelihood ratio (LR) statistics:
D = −2
g∑
i=1
{i (pˆii )− i (yi )}
.
For multinomial data,
D = 2
g∑
i=1
χ2D(yi , pˆii ) = 2
g∑
i=1
ni
k∑
j=1
yij log
yij
pˆiij
Under regularity conditions,
χ2
a∼ χ2(g(k − 1)− p) and D a∼ χ2(g(k − 1)− p)
where p is the number of estimated parameters.
For sparse data with small ni , one should use alternative asymtotics.
Multivariate GLM (Ch6) §6.5 Statistical inference for MGLM (optional) Guoqi Qian 90 / 143
§6.5.3 Power-divergence family (1)
For categorical data a more general single-parameter family of
goodness-of-fit statistics is the power-divergence family, introduced by
The power-divergence statistic with parameter λ ∈ R is given by
Sλ =
g∑
i=1
SDλ(yi , pˆii )
where the sum of deviations over observations at yi is
SDλ(yi , pˆii ) =
2ni
λ(λ+ 1)
k∑
j=1
yij
[(
yij
pˆiij

− 1
]
, −∞ < λ <∞ (18)
Multivariate GLM (Ch6) §6.5 Statistical inference for MGLM (optional) Guoqi Qian 91 / 143
§6.5.3 Power-divergence family (2)
1 At λ = 1, S1 =
g∑
i=1
ni
k∑
j=1
(yij − pˆiij)2
pˆiij
, the usual Pearson statistic.
2 At λ→ 0, S0 = 2
g∑
i=1
ni
k∑
j=1
yij log
yij
pˆiij
, the likelihood ratio statistic.
3 At λ→ −1, S−1 = 2
g∑
i=1
ni
k∑
j=1
pˆiij log
pˆiij
yij
, Kullback’s minimum
discrimination information statistic.
4 At λ = −2, S−2 =
g∑
i=1
ni
k∑
j=1
(yij − pˆiij)2
yij
, Neyman’s minimum
modified χ2-statistic.
5 At λ = −12 , S− 12 =4
g∑
i=1
k∑
j=1
(

niyij −

ni pˆiij)
2, Freeman-Tukey statistic.
In general, λ ∈ [−1, 2] is recommended.
Multivariate GLM (Ch6) §6.5 Statistical inference for MGLM (optional) Guoqi Qian 92 / 143
§6.5.3 Asymptotic properties under classical “fixed cells” assumptions
Classical assumptions in asymptotic theory for grouped data imply
1 a fixed number of groups (g is fixed)
2 increasing sample sizes ni →∞, i = 1, · · · , g , such that nin → λi for
fixed proportions λi > 0, i = 1, · · · , g .
3 a fixed “number of cells” k in each group
4 a fixed number of parameters.
If these assumptions hold and in addition, the model under consideration is
correct, then

a∼ χ2(g(k − 1)− p), p is the # of estimated parameters.
And under local alternative hypothesis, the limit distribution of Sλ is
non-central χ2.
Multivariate GLM (Ch6) §6.5 Statistical inference for MGLM (optional) Guoqi Qian 93 / 143
§6.5.3 Asymptotic properties under sparseness and “increasing-cells”
assumptions
If several explanatory variables are considered, the number of
observations for a fixed explanatory variable x is often small and the
usual asymptotic machinery will fail.
Under such sparseness conditions, it may be assumed that with
increasing sample size n→∞ the number of groups (values of the
explanatory variables) is also increasing with g →∞, resulting in the
“increasing-cells” setting.
Now Sλ is no longer χ
2-distributed in the limit. Rather Sλ has an
asymptotic normal distribution under H0 that the model holds.
Details are not pursued here.
Multivariate GLM (Ch6) §6.5 Statistical inference for MGLM (optional) Guoqi Qian 94 / 143
§6.6 Multivariate models for correlated responses
So far we have been assuming there is only one response variable,
possibly multi-categorical, being under consideration.
Now we consider the situations where a vector of correlated or
clustered response variables is observed, together with covariates, for
each unit in the sample.
These situations often happen in longitudinal studies, repeated
measurements studies, and grouped (clustered) studies, etc.
When the response variables are approximately Gaussian, multivariate
linear models are commonly used for the underlying data analysis,
which has been well established.
The situations become more difficult when the response variables are
discrete, categorical or mixed discrete/categorical/continuous.
Three main approaches will be introduced here:
1 conditional models
2 marginal models
3 random effects models
Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 95 / 143
§6.6.1 Conditional models: Asymmetric models (optional)1
Asymmetric models
In many applications, the components of a response vector are
ordered in a way that some components are prior to the other
components, e.g. if they refer to events that take place earlier.
Consider the simplest case of a response vector Y = (Y1,Y2), with
categorical components Y1 ∈ {1, · · · , k1} and Y2 ∈ {1, · · · , k2}. Let
Y2 refer to events that may be considered conditional on Y1.
An example: response years in school (Y1) and the statement about
happiness (Y2).
Then Y1 may be modelled to be dependent on the explanatory
variables x, using a model introduced in §3.2 and §3.3.
And Y2 may be modelled based on Y1 and x, also using a model
introduced in §3.2 and §3.3.
Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 96 / 143
§6.6.1 Conditional models: Asymmetric models (optional)2
Asymmetric models
In general, consider m categorical responses Y1, · · · ,Ym, where Yj
depends on Y1, · · · ,Yj−1 but not on Yj+1, · · · ,Ym. Models that
make use of this dependence structure are based on the decomposition
P(Y1, · · · ,Ym|x) = P(Y1|x) · P(Y2|Y1, x) · · ·P(Ym|Y1, · · · ,Ym−1, x)
(19)
Simple models arise if each component in (19) is specified by a GLM:
P(Yj = r |Y1, · · · ,Yj−1, x) = hj(Zjβ) (20)
where Zj = Z (Y1, · · · ,Yj−1, x) is a function of previous components
Y1, · · · ,Yj−1 and the explanatory variables x. Models of form (20)
are sometimes called data-driven ones.
Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 97 / 143
§6.6.1 Conditional models: Asymmetric models (optional)3
assumption P(Yj = r |Y1, · · · ,Yj−1, x) = P(Yj = r |Yj−1, x).
One such example for binary responses is
log
P(y1 = 1|x)
P(y1 = 0|x) = β01 + z
T
1 β1
log
P(yj = 1|y1, · · · , yj−1, x)
P(yj = 0|y1, · · · , yj−1, x) = β0j + z
T
j βj + yj−1γj , j = 2, · · · , k.
It is called a regressive logistic model when
log
P(yj = 1|y1, · · · , yj−1, x)
P(yj = 0|y1, · · · , yj−1, x) = β0 + z
T
j β + γ1y1 + · · ·+ γj−1yj−1.
Models of the type (20) may be embedded into the GLM framework,
meaning some built-in functions in R (e.g. glm()) may be used to do
the computing.
In general, however, you need to write your own program to do the
analysis.
Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 98 / 143
§6.6.1 Conditional models: Asymmetric models (optional)4
Example: Reported happiness. Clogg (1982) investigated the
association between gender (x), years in school (Y1), and reported
happiness (Y2). The data are given in the following. Since x and Y1 are
prior to the statement about happiness, Y2 is modelled conditionally on Y1
and x .
Table 8: Cross classification of gender, reported happiness, and years of schooling
Years of school completed
Gender Reported happiness < 12 12 13-16 ≥ 17
Male Not too happy 40 21 14 3
Pretty happy 131 116 112 27
Very happy 82 61 55 27
Female Not too happy 62 26 12 3
Pretty happy 155 156 95 15
Very happy 87 127 76 15
Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 99 / 143
§6.6.1 Conditional models: Symmetric models (optional)1
Symmetric models
Suppose the response vector Y = (y1, · · · , ym).
If there is no natural ordering among y1, · · · , ym, or if one does not
want to use this ordering, models that treat response components in a
symmetric way are more sensible. An example is Visual Impairment
Study seen in Chapter 1.
Now focus on the case where all y1, · · · , ym are binary. (General
cases can be handled similarly.)
Symmetric conditional models can be developed by specifying a
conditional distribution to be used:
P(yj = 1|yk , k 6= j ; xj), j = 1, · · · ,m (21)
For binary responses such conditional distributions uniquely determine
the joint distribution.
Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 100 / 143
§6.6.1 Example: Visual impairment study
Table 9: Visual impairment data, from Liang, Zeger & Qaqish (1992)
White Black
Visual Age
impairment 40-50 51-60 61-70 70+ 40-50 51-60 61-70 70+ Total
Left eye
Yes 15 24 42 139 29 38 50 85 422
No 617 557 789 673 750 574 473 344 4777
Right eye
Yes 19 25 48 146 31 37 49 93 448
No 613 556 783 666 748 575 474 336 4751
Vector binary response variable (y1, y2), where y1 = 1 if left-eye
impaired, 0 otherwise; y2 = 1 if right-eye impaired, 0 otherwise.
Covariates: Age (yrs.), Race (W or B).
Aim: find the effect of race and age on visual impairment.
Complication: y1 and y2 are correlated.
Methods: multivariate models for correlated responses; conditional
models; asymmetric models, marginal models, GEE, etc..
Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 101 / 143
§6.6.1 Conditional models: Symmetric models (optional)2
Symmetric models
Symmetric logistic models are a natural choice for (21):
pi = P(yj = 1|yk , k 6= j ; xj) = h(α(wj ;θ) + xTj βj), j = 1, · · · ,m
(22)
where h(t) =
et
1 + et
is the logistic cdf; and α(·) is some function of a
parameter θ and wj =

k 6=j yk .
When k = 2,
pi = P(yj = 1|yk , k 6= j ; xj) = h(θ0+θ1yk +xTj βj), j , k = 1, 2. (23)
Full likelihood estimation procedure for (22) or (23) may be
computationally cumbersome, because of the normalising constant
involved in the joint pmf.
Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 102 / 143
§6.6.1 Conditional models: Symmetric models (optional)3
Instead, a quasi-likelihood approach (Conolly and Liang, 1988) may
be used, with an independent working quasi-likelihood and
quasi-score function for each cluster i(= 1, · · · , n):
Li (β,θ) =
m∏
j=1
pi
yij
ij (1− piij)1−yij
Si (β,θ) =
m∑
j=1
∂piij
∂(βT ,θT )T
σ−2ij (yij − piij(β,θ))
where yi = (yi1, · · · , yij , · · · , yim)T are the responses in cluster i ,
piij(β,θ) = P(yij = 1|·) is defined by (22), and
σ2ij = piij(β,θ)(1− piij(β,θ)).
Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 103 / 143
§6.6.1 Conditional models: Symmetric models (optional)4
Denoting Mi = diag
{
∂pii1
∂(βT ,θT )T
, · · · , ∂piim
∂(βT ,θT )T
}
,
Σi = diag{σ2i1, · · · , σ2im} and pi = (pii1, · · · , piim)T , we can rewrite
Si (β,θ) in matrix form
Si (β,θ) = MiΣ
−1
i (yi − pii )
which is a multivariate extension of the quasi-score.
Roots (βˆ, θˆ) of the resulting generalised estimating equation (GEE)
S(β,θ) =
n∑
i=1
Si (β,θ) = 0
are consistent & asymptotically normal under regularity assumptions:
(βˆ
T
, θˆ
T
)T
a∼ N((βT ,θT )T , Fˆ−1Vˆ Fˆ−1)
with Fˆ =
∑n
i=1 Mˆi Σˆ
−1
i Mˆi and Vˆ =
∑n
i=1 Sˆi Sˆ
T
i .
Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 104 / 143
§6.6.1 Two drawbacks of the conditional models
1 They measure the effect of x on a binary component yj conditional on
incorporating into the effects of other yk , k 6= j . Thus the model is
not able to provide prediction based on x alone.
2 Interpretation of the effects depends on the dimension of y.
Both drawbacks can be avoided by using a marginal modelling approach.
Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 105 / 143
§6.6.2 Marginal models
In many situations the primary interest is to analyse the marginal
mean of the response given the covariates. The association between
the responses is often of secondary interest. An example is the visual
impairment study.
Marginal models were first proposed by Liang & Zeger (1986) and
Zeger & Liang (1986) in the context of longitudinal data with many
short time series. Their modelling and estimation approach is based
on GEE and has subsequently been modified and further developed.
Focus in this subsection is on modelling and estimating marginal
means of correlated and categorical response data by the first-order
generalised estimating equations (GEE1).
Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 106 / 143
§6.6.2 Marginal models for correlated univariate responses (1)
In marginal models, the effects of covariates on responses and the
association between responses is modelled separately.
Let yi = (yi1, · · · , yimi )T be the vector of responses, and
xTi = (x
T
i1, · · · , xTimi )
be the vector of covariates from cluster i , i = 1, · · · , n. Here the
cluster size mi may vary with the cluster index i .
Response observations within the same cluster are correlated.
Response observations between different clusters are assumed
independent of each other.
Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 107 / 143
§6.6.2 Marginal models for correlated univariate responses (2)
Specification of a marginal model is as following
1 The marginal means of yij , j = 1, · · · ,mi , are assumed correctly
specified by common univariate response models:
µij(β) = E (yij |xij) = h(zTij β) (24)
where h(·) is a response function, e.g. a logit function, and zij is an
appropriate design vector.
2 The marginal variance of each yij is specified as a function of µij :
σ2ij = var(yij |xij) = v(µij)φ (25)
where v(·) is the variance function of known form.
3 The correlation between yij and yik is a function of µij = µij(β),
µik = µik(β) and perhaps of additional association parameters α:
corr(yij , yik) = c(µij , µik ,α) (26)
with the function c(·, ·, ·) being of known form.
Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 108 / 143
§6.6.2 Marginal models for correlated univariate responses (3)
Remarks:
1 Parameters (β,α) are the same for each cluster. Hence marginal
models are appropriate for analysing population-averaged effects.
2 Marginal effects β can be consistently estimated even if the correlation
function is incorrectly specified. However, the efficiency of the
estimator βˆ can be compromised.
Since the primary scientific objective is often the regression
relationship, it is important to correctly specify the marginal mean
structure (24), while c(µij , µik ,α) only needs to be a working
correlation for the association between yij and yik . Together with
(25) a working covariance matrix cov(yi ) = Σi (β,α) can be
obtained, where an additional dispersion parameter φ is involved but
suppressed for presentation simplicity.
Two main approaches for specifying the working correlations or
covariances have been considered in the literature.
Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 109 / 143
§6.6.2 Marginal models for correlated univariate responses (4)
First approach: The variance structure (25) is supplemented by a
working correlation matrix Ri (α), so that the working covariance
matrix is of the form
Σi (β,α) = C
1/2
i (β)Ri (α)C
1/2
i (β),
where Ci (β) = diag [var(yij |xij)] = diag{σ2i1, · · · , σ2imi}. Common choices
for Ri (α):
1 working independence model: Ri (α) = I , the identity matrix.
2 equicorrelation (or exchangeable) model: corr(yij , yik) = α for all
j 6= k, i.e. α reduces to be a scalar, and Ri (α) =

1 α · · · α
α 1 · · · α
...
...
. . .
...
α α · · · 1
.
3 Ri (α) completely unspecified, except being positive definite, i.e.
αjk = corr(yij , yik), j < k.
Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 110 / 143
§6.6.2 Marginal models for correlated univariate responses (5)
Second approach: For binary responses the odds ratio is an alternative
measure of association that is easier to interpret and has some desirable
properties.
The odds ratio for yij , yik , j , k = 1, · · · ,m, j 6= k, is defined as
γijk =
P(yij = 1, yik = 1) · P(yij = 0, yik = 0)
P(yij = 1, yik = 0) · P(yij = 0, yik = 1)
From this definition, the probability P(yij = yik = 1), denoted as pii11,
can be shown to satisfy
pii11 = E (yijyik) =
{
1−(piij+piik )(1−γijk )−s(piij ,piik ,γijk)
2(γijk−1) if γijk 6= 1
piijpiik if γijk = 1
(27)
with
s(piij , piik , γijk) =
(
[1−(piij +piik)(1−γijk)]2 − 4(γijk−1)γijkpiijpiik
)1/2
.
It can be seen that the covariance matrix of yi = (yi1 · · · , yimi )T can
be expressed as a function of (piij , piik , γijk).
Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 111 / 143
§6.6.2 Marginal models for correlated univariate responses (6)
One advantage of the odds ratio is that it is unconstrained, being able
to take any positive values. In contrast,
corr(yij , yik) =
pii11 − piijpiik√
piij(1− piij)piik(1− piik)
,
the correlation between yij and yik is constrained (i.e. its possible
values may all fall into a subset of [−1, 1]), because the involved pii11
is constrained by
max(0, piij + piik − 1) ≤ pii11 ≤ min(piij , piik).
Note
pii11 = P(yij = 1) + P(yik = 1)−P(yij = 1 or yik = 1) ≥ piij +piik − 1.
Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 112 / 143
§6.6.2 Marginal models for correlated univariate responses (7)
To reduce the number of unknown parameters involved in the odds
ratio, we assume γijk = γijk(α), a function of a vector parameter α.
Common choices of γijk(α):
1 γijk = γ, for all i , j , k.
2 log γijk = α
Tωijk .
Using the relation (27), cov(yi ) = Σi (β,α) is also a function of β
and α.
Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 113 / 143
§6.6.2 Marginal models for correlated univariate responses (8)
Some examples of marginal models:
I Continuous responses:
µij(β) = E (yij |xij) = zTij β; var(yij |xij) = φ = σ2; corr(yij , yik) = αjk .
II Binary responses:
µij(β) = piij(β) = P(yij = 1|xij), log piij(β)
1− piij(β) = z
T
ij β;
var(yij |xij) = piij(β)(1− piij(β));
corr(yij , yik) = 0 (independence struc.) or γijk = α (equal odds ratio).
III Count data:
logµij(β) = log E (yij |xij) = zTij β;
var(yij |xij) = µij(β)φ;
corr(yij , yik) = α (equicorrelation).
Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 114 / 143
§6.6.2 GEE approach in statistical inference (1)
For cluster i = 1, · · · , n, let yi = (yi1, · · · , yimi )T , xi = (xTi1, · · · , xTimi )T ,
µi (β) = (µi1(β1), · · · , µimi (βmi ))T and Σi (β,α) be defined as before.
In linear models for Gaussian responses, there is no problem to
perform maximum likelihood estimation, since specification of means
and covariances determines the likelihood function.
This is not the case with non-Gaussian data. Therefore a generalised
estimating approach is proposed.
Keeping the association parameter α and φ, if present, fixed for the
moment, the generalised estimating equation (GEE) for effect β is
Sβ(β,α) =
n∑
i=1
ZTi Di (β)Σ
−1
i (β,α)(yi − µi (β)) = 0, (28)
with mi -row design matrix Zi = (zi1, · · · , zimi )T and diagonal
matrices Di (β) = diag{Dij(β)}, Dij(β) = ∂h∂ηij evaluated at ηij = zTij β.
The middle-part of (28) is a multivariate quasi-score function.
Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 115 / 143
§6.6.2 GEE approach in statistical inference (2)
A GEE estimate βˆ is computed by iterating a modified Fisher scoring
algorithm w.r.t. β and estimation of α (and φ):
(i) Given current estimates αˆ (and φˆ), the GEE (28) for βˆ is solved by
the iterations
βˆ
(k+1)
= βˆ
(k)
+ (Fˆ (k))−1Sˆβ(βˆ
(k)
, αˆ), k = 0, 1, 2, · · · ,
with
Fˆ (k) =
n∑
i=1
ZTi Di (βˆ
(k)
)Σ−1i (βˆ
(k)
, αˆ)Di (βˆ
(k)
)Zi
being the observed quasi-information matrix.
Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 116 / 143
§6.6.2 GEE approach in statistical inference (3)
(ii) If α is unknown, it can be consistently estimated by a method of
moments based on Pearson residuals rˆij =
yij − µˆij√
v(µˆij)
.
The dispersion parameter φ is estimated consistently by
φˆ =
1
N − p
n∑
i=1
mi∑
j=1
rˆ 2ij , with N =
∑n
i=1 mi and p = dim(β).
Estimation of α depends on the choice of Ri (α). For exchangeable
correlation matrix Ri (α) with dim(α) = 1,
αˆ =
[
φˆ
{
n∑
i=1
1
2
mi (mi − 1)− p
}]−1 n∑
i=1

k>j
rˆik rˆij .
An unspecified working correlation matrix R can be estimated by
Rˆ =
1
nφˆ
n∑
i=1
C
− 12
i (yi − µˆi )(yi − µˆi )TC
− 12
i
if all cluster sizes mi = m and m << n.
Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 117 / 143
§6.6.2 GEE approach in statistical inference (4)
Cycling between Fisher scoring steps for β and consistent estimation
of α and φ leads to a consistent estimation of β.
Alternatively, α (and possible φ) can be estimated by simultaneously
solving an additional estimating equation. Details are not pursued
here.
Under regularity conditions, the GEE estimator βˆ satisfies
βˆ
a∼ N(β,F−1VF−1)
with F =
∑n
i=1 Z
T
i DiΣ
−1
i DiZi and V =
∑n
i=1 Z
T
i DiΣ
−1
i SiΣ
−1
i DiZi ,
where Si is the true covariance matrix of yi .
cov(βˆ) is approximated by the “sandwich matrix”:
Aˆ = Fˆ−1
{
n∑
i=1
ZTi Dˆi Σˆ
−1
i (yi − µˆi )(yi − µˆi )T Σˆ−1i DˆiZi
}
Fˆ−1
cov(Sβ(β,α)) = V and Sβ(β,α)
a∼ N(0,V ).
Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 118 / 143
§6.6.2 Visual impairment example (1)
We use GEE approach to fit a marginal model to the visual impairment
data. R packages gee with command gee() and geepack with command
geeglm() can be used. (There are a few problems with both packages).
library(gee); library(geepack)
ID Yes No Age Race
1 1 15 617 40to50 White
2 1 19 613 40to50 White
3 2 24 557 51to60 White
4 2 25 556 51to60 White
5 3 42 789 61to70 White
6 3 48 783 61to70 White
7 4 139 673 70+ White
8 4 146 666 70+ White
9 5 29 750 40to50 Black
10 5 31 748 40to50 Black
11 6 38 574 51to60 Black
12 6 37 575 51to60 Black
13 7 50 473 61to70 Black
14 7 49 474 61to70 Black
15 8 85 344 70+ Black
16 8 93 336 70+ Black
Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 119 / 143
§6.6.2 Visual impairment example (2)
VI1e<- gee(cbind(Yes, No) ~ Age + Race, id = ID,
data = VI.dat, family = binomial, corstr = "exchangeable")
summary(VI1e)
GEE: GENERALIZED LINEAR MODELS FOR DEPENDENT DATA
gee S-function, version 4.13 modified 98/01/27 (1998)
Model:
Variance to Mean Relation: Binomial
Correlation Structure: Exchangeable
Call:
gee(formula = cbind(Yes, No) ~ Age + Race, id = ID, data = VI.dat,
family = binomial, corstr = "exchangeable")
Summary of Residuals:
Min 1Q Median 3Q Max
15.0 28.0 39.9 58.6 145.8
Coefficients:
Estimate Naive S.E. Naive z Robust S.E. Robust z
(Intercept) -3.225 0.1125 -28.66 0.0254 -127.22
Age51to60 0.478 0.1451 3.30 0.0167 28.65
Age61to70 0.837 0.1348 6.21 0.0905 9.26
Age70+ 1.973 0.1227 16.08 0.0531 37.13
RaceWhite -0.350 0.0768 -4.56 0.0662 -5.28
Estimated Scale Parameter: 0.571
Number of Iterations: 1
Working Correlation
[,1] [,2]
[1,] 1.000 0.886
[2,] 0.886 1.000
Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 120 / 143
§6.6.2 Visual impairment example (3)
Call:
gee(formula = cbind(Yes, No) ~ Age + Race, id = ID, data = VI.dat,
family = binomial, corstr = "exchangeable")
Summary of Residuals: ###Residuals calculation is not correct in gee().
Min 1Q Median 3Q Max
15.0 28.0 39.9 58.6 145.8
Coefficients:
Estimate Naive S.E. Naive z Robust S.E. Robust z
(Intercept) -3.225 0.1125 -28.66 0.0254 -127.22
Age51to60 0.478 0.1451 3.30 0.0167 28.65
Age61to70 0.837 0.1348 6.21 0.0905 9.26
Age70+ 1.973 0.1227 16.08 0.0531 37.13
RaceWhite -0.350 0.0768 -4.56 0.0662 -5.28
Estimated Scale Parameter: 0.571
Number of Iterations: 1
Working Correlation
[,1] [,2]
[1,] 1.000 0.886
[2,] 0.886 1.000
Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 121 / 143
§6.6.2 Visual impairment example (4)
total=VI.dat$Yes+VI.dat$No
pearson.res1e=
(VI.dat$Yes-total*VI1e$fitted)/sqrt(total*(VI1e$fitted)*(1-VI1e$fitted))
sum(pearson.res1e^2)/11
 0.5708722 #estimate of \phi based on Pearson residuals
> VI1e$scale  0.5708722 (VI1e$scale*3)^(-1)*sum(pearson.res1e[2*(1:8)-1]*pearson.res1e[2*(1:8)])
 0.8861748
> VI1e$working.correlation [,1] [,2] [1,] 1.0000000 0.8861748 [2,] 0.8861748 1.0000000 Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 122 / 143 §6.6.2 Visual impairment example (5) VI3 <- geeglm(cbind(Yes, No)~Age + Race, family=binomial(link="logit"), data=VI.dat, id=ID, corstr = "exchangeable", std.err="san.se") summary(VI3) Call: geeglm(formula = cbind(Yes, No) ~ Age + Race, family = binomial(link = "logit"), data = VI.dat, id = ID, corstr = "exchangeable", std.err = "san.se") Coefficients: Estimate Std.err Wald Pr(>|W|) (Intercept) -3.2253 0.0254 16184.7 < 2e-16 *** Age51to60 0.4783 0.0167 820.7 < 2e-16 *** Age61to70 0.8374 0.0905 85.7 < 2e-16 *** Age70+ 1.9728 0.0531 1378.3 < 2e-16 *** RaceWhite -0.3498 0.0662 27.9 1.3e-07 *** --- Estimated Scale Parameters: #IScale seems defined differently in gee() and geeglm() Estimate Std.err (Intercept) 0.000604 0.000323 Correlation: Structure = exchangeable Link = identity Estimated Correlation Parameters: Estimate Std.err #The estimation here is about half of that in gee(). alpha 0.441 0.274 Number of clusters: 8 Maximum cluster size: 2 Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 123 / 143 §6.6.2 Visual impairment example (6) pearson.res3=(VI.dat$Yes/total-VI3$fitted)/sqrt((VI3$fitted)*(1-VI3$fitted)/total) phi.est=sum(pearson.res3^2)/11 #=0.5709 sum((summary(VI3)$deviance.resid)^2)/11
 0.577 #very close to 0.5709.
#Using deviance residuals or Pearson residuals give similar results.
VI3$geese$gamma #How is gamma defined?
(Intercept)
0.000604
(phi.est*3)^(-1)*sum(pearson.res3[2*(1:8)-1]*pearson.res3[2*(1:8)])
 0.8862 #estimated alpha
(0.577*3)^(-1)*sum(summary(VI3)$devi[2*(1:8)-1]*summary(VI3)$devi[2*(1:8)])
 0.872
VI3$geese$alpha
alpha
0.4415 #approximately half of 0.8862. Then How is alpha defined here?
Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 124 / 143
§6.6.2 Marginal models for correlated responses having k categories (1)
Suppose categorical responses Yij , j = 1, · · · ,mi , are observed in
cluster i , i = 1, · · · , n.
For simplicity, each Yij has the same k categories and is coded by a
dummy vector
yij = (yij1, · · · , yijq)T , q = k − 1
Let yTi = (y
T
i1, · · · , yTimi ) be observations of yij in cluster i ;
xTi = (x
T
i1, · · · , xTimi ) be the corresponding covariate observations.
For data involving categorical responses, a marginal categorical
response model can be defined for each response variable, and then
supplemented by a working association model to relate to the
responses with each other with in each cluster.
Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 125 / 143
§6.6.2 Marginal models for correlated responses having k categories (2)
Here we focus on ordinal responses. Other types of categorical responses
can be handled similarly.
(i) The vector of marginal means or categorical probabilities of Yij is
assumed being correctly specified by an ordinal response model :
piij(β) = (piij1(β), · · · , piijq(β))T = h(Zijβ)
with piijr = P(Yij ≤ r |xij) =
∑r
=1 P(yij = 1|xij), and the response
function h(·) and design matrix Zij .
(ii) The marginal covariance function of yij is given by
Σij = cov(yij |xij) = diag(piij)− piijpiTij
i.e. the covariance matrix of a multinomial random variable.
Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 126 / 143
§6.6.2 Marginal models for correlated responses having k categories (3)
(iii) Association between Yij and Yik is modelled by a working correlation matrix
Ri or by global cross-ratios. For example, the working matrix of
exchangeable correlations is
Ri (α) =

I Q · · · Q
QT I · · · Q
...
...
. . .
...
QT QT · · · I

where Qq×q contains α to be estimated by a method of moments.
Global cross-ratios (GCR), for each pair of categories  and m of Yij and
Yik , are defined as
γijk (,m) =
P(Yij ≤ ,Yik ≤ m)P(Yij > ,Yik > m)
P(Yij > ,Yik ≤ m)P(Yij ≤ ,Yik > m)
GCR can be modelled log-linearly, i.e. log(γijk(,m)) = αm or by a
regression model including covariate effects.
For the model specified by (i), (ii) and (iii), the involved regression and association
parameters can be estimated by a multivariate extension of the GEE approach
discussed before. Details not pursued here.
R packages multgee, geepack and repolr may be used to fit the above model.
Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 127 / 143
§6.6.2 Likelihood-based inference for marginal models
The GEE approach is not likelihood-based as it does not require
specification of the joint distribution of multivariate response vector
yi , i = 1, · · · , n.
Difficulty with the likelihood-based inference is due to the difficulty in
formulating this joint distribution.
So in general it is difficult to perform likelihood-based inference for
marginal models.
But there are new developments:
Constructing joint multivariate distribution by copulas
Vector GLM
Bayesian inference.
Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 128 / 143
§6.6.3 Marginal models for longitudinal data (1)
In many longitudinal studies, data consist of a small or moderate
number of repeated observations for many subjects; and the main
objective is to analyse the effects of covariates on a response variable,
without conditioning on previous responses.
In this setting it is more natural to view the data as a cross section of
individual time series yi = (y
T
i1, · · · , yTiTi )T , xi = (xTi1, · · · , xTiTi )T ,
i = 1, · · · , n. It is also often assume that, given the covariates,
individual time series are mutually independent.
Marginal models introduced in §3.5.2 can be used for these cases.
Suppose yit = (yit1, · · · , yitq)T are q = k − 1 dummy variables
describing a multi-categorical response Yit with k categories.
For simplicity, assume the number of categories is the same for all Yit
responses.
Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 129 / 143
§6.6.3 Marginal models for longitudinal data (2)
Marginal GEE models for longitudinal data are then defined as
(i) The marginal mean is correctly specified by a response model
µit(β) = E (yit |xit) = h(ηit), ηit = Zitβ
where ηit is q-dimensional with components ηitr = z
T
itrβ, z
T
itr is the
rth row of Zit .
(ii) The association structure for yi = (y
T
i1, · · · , yTiTi )T is specified by a
“working” covariance matrix Σi (β,α), depending on regression
parameters β, association parameters α, and possibly an additional
nuisance parameter φ.
Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 130 / 143
§6.6.3 Marginal models for longitudinal data (3)
Σi (β,α) can be determined by a variance function for yit together
with a “working” correlation matrix Ri (α) or some odds ratio model
γ(α).
Common choices for Ri (α):
1 stationary correlations between yis and yit
(Ri (α))st = α(|t − s|) (29)
2 its special form α(|t − s|) = α|t−s|. AR(1) correlation.
Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 131 / 143
Statistical inference in marginal GEE models (1)
Estimation of β using GEEs is carried out in complete analogy to
§3.5.2.
Denote E (yi |xi ) = µi (β) = (µTi1, · · · ,µTiTi )T , ZTi = (Zi1, · · · ,ZiTi ),
and (block-) diagnol matrices Di (β) = diag{Dit(β)}, with
Dit(β) =
∂hT
∂ηit
|ηit= Zitβ.
The GEE for β is
sβ(β,α) =
n∑
i=1
ZTi Di (β)Σ
−1
i (β,α)(yi − µi (β)) = 0 (30)
Given current estimates αˆ and φˆ, the GEE (30) for βˆ is solved by
βˆ
(k+1)
= βˆ
(k)
+
(
Fˆ (k)
)−1
sˆ(k), k = 1, 2, · · ·
with the “working” Fisher information matrix
Fˆ (k) =
∑n
i=1 Z
T
i Di (βˆ
(k)
)Σ−1i (βˆ
(k)
, αˆ)DTi (βˆ
(k)
)Zi and
sˆ(k) = sβ(βˆ
(k)
, αˆ).
Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 132 / 143
Statistical inference in marginal GEE models (2)
Given a current estimate of β, α and φ can be estimated from
Pearson residuals
rˆit =
yit − µˆit
(v(µˆit))1/2
by the method of moments.
1 φˆ =
1
N − p
n∑
i=1
Ti∑
t=1
rˆ 2it , N =
n∑
i=1
Ti
2 Estimation of α depends on the choice of Ri (α). For the exchangeable
correlation model
αˆ =
{
φˆ
[
n∑
i=1
1
2
Ti (Ti − 1)− p
]}−1 n∑
i=1

t>s
rˆit rˆis
3 A totally unspecified R = R(α) can be estimated if T << n.
4 In particular for binary and categorical observations, estimation of α
via GEE2 is often preferrable.
Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 133 / 143
Statistical inference in marginal GEE models (3)
Consistency and asymptotic normality results for βˆ can be obtained
along the lines of asymptotic theory of quasi-MLEs for independent
observations if Ti , i = 1, · · · , n are fixed and n→∞.
If αˆ is

n-consistent given β and φ; and φˆ is

n-consistent given β,
then βˆ is

n-consistent and asymptotically normal, i.e.
βˆ
a∼ N(β,F−1VF−1)
with F =
n∑
i=1
ZTi DiΣ
−1
i D
T
i Zi , V =
n∑
i=1
ZTi DiΣ
−1
i Cov(yi )Σ
−1
i D
T
i Zi .
Asymptotic covariance matrix A = F−1VF−1 can be consistently
estimated by replacing β, α and φ by their respective consistent
estimates and Cov(yi ) by (yi − µˆi )(yi − µˆi )T , i.e. by sandwich matrix
Aˆ = Ĉov(βˆ)
a≈ Fˆ−1
{
n∑
i=1
ZTi Dˆi Σˆ
−1
i (yi−µˆi )(yi−µˆi )T Σˆ−1i DˆTi Zi
}
Fˆ−1 (31)
Based on the preceeding asymptotic result, confidence intervals for β
and Wald and score tests may be constucted; cf. §2.3.1.
Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 134 / 143
Example 6.4 Ohio children (1)
The ohio data concern 537 children from Steubenville, Ohio and
were taken as part of a study on the effects of air pollution.
Children were in the study for four years from age seven to ten.
The response is whether they wheezed or not.
The variables are
resp: an indicator of wheeze status (1=yes, 0=no)
id: an identifier for the child, taking values from 0 to 536
age: 7 yrs = −2; 8 yrs = −1; 9 yrs =0; 10 yrs =1
smoke: mother’s smoking status at start of the study
(1=smoker, 0=nonsmoker)
Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 135 / 143
Example 6.4 Ohio children (2)
Some analysis has been done to the data in R, producing the following
output.
resp id age smoke
1 0 0 -2 0
2 0 0 -1 0
3 0 0 0 0
4 0 0 1 0
5 0 1 -2 0
6 0 1 -1 0
> tail(ohio)
resp id age smoke
2143 1 535 0 1
2144 1 535 1 1
2145 1 536 -2 1
2146 1 536 -1 1
2147 1 536 0 1
2148 1 536 1 1
Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 136 / 143
Example 6.4 Ohio children (3)
> str(ohio)
'data.frame': 2148 obs. of 4 variables:
$resp : int 0 0 0 0 0 0 0 0 0 0 ...$ id : int 0 0 0 0 1 1 1 1 2 2 ...
$age : int -2 -1 0 1 -2 -1 0 1 -2 -1 ...$ smoke: int 0 0 0 0 0 0 0 0 0 0 ...
> library(geepack}
data=ohio, id=id, corstr = "exchangeable", std.err="san.se")
> summary(fit.exch)
Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 137 / 143
Example 6.4 Ohio children (4)
> summary(fit.exch)
Call:
geeglm(formula = resp ~ age + smoke, family = binomial(link = "logit"),
data = ohio, id = id, corstr = "exchangeable", std.err = "san.se")
Coefficients:
Estimate Std.err Wald Pr(>|W|)
(Intercept) -1.880 0.114 272.597 < 2e-16 ***
age -0.113 0.044 6.684 0.00973 **
smoke 0.265 0.178 2.224 0.13588
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Estimated Scale Parameters:
Estimate Std.err
(Intercept) 0.9985 0.1116
Correlation: Structure = exchangeable Link = identity
Estimated Correlation Parameters:
Estimate Std.err
alpha 0.3543 0.06244
Number of clusters: 537 Maximum cluster size: 4
Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 138 / 143
Example 6.4 Ohio children (5)
Use the above output to answer the following questions.
1 Let yit be the response value resp of child i at age t. Write down the
model involved in the analysis, including the mean, variance and
correlation coefficient of yit ’s. Give the estimates of the parameters
appearing in the model.
2 Write down the model’s design matrix for data where id=536.
3 Estimate the odds ratio of wheezing for a child for every one year
older in age. Also calculate an approximate standard error of your
odds ratio estimate.
4 Estimate the odds ratio of wheezing for a 10-year old child with a
smoking mother versus a 9-year old child with nonsmoking mother.
Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 139 / 143
Example 6.4 Ohio children (6)
1 Let yit be the response value resp of child i at age t. Write down the
model involved in the analysis, including the mean, variance and
correlation coefficient of yit ’s. Give the estimates of the parameters
appearing in the model.
Let piit = P(yit = 1|age, smoke), t = 1, · · · , 4; i = 1, · · · , 537. Then
the model is specified by the following
log
pˆiit
1− pˆiit = βˆ0 + βˆ1 · age + βˆ2 · smoke
= −1.880− 0.113× age + 0.265× smoke
V̂ar(yit) = φˆpˆiit(1− pˆiit) = 0.9985e
−1.880−0.113·age+0.265·smoke
1 + e−1.880−0.113·age+0.265·smoke
Ĉorr(yit , yis) = αˆ = 0.3543 for t 6= s
Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 140 / 143
Example 6.4 Ohio children (7)
2 Write down the model’s design matrix for data where id=536.
Z (id = 536) =

1 −2 1
1 −1 1
1 0 1
1 1 1

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 141 / 143
Example 6.4 Ohio children (8)
3 Estimate the odds ratio of wheezing for a child for every one year
older in age. Also calculate an approximate standard error of your
odds ratio estimate.
ÔR = eβˆ1 = e−0.113 = 0.893
s.e.(ÔR) ≈ eβˆ1 · s.e.(βˆ1) = e−0.113 × 0.044 = 0.039.
Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 142 / 143
Example 6.4 Ohio children (9)
4 Estimate the odds ratio of wheezing for a 10-year old child with a
smoking mother versus a 9-year old child with nonsmoking mother.
For a 10-year old child with a smoking mother
log
pˆiit
1− pˆiit = βˆ0 + βˆ1 · 1 + βˆ2 · 1 = −1.880− 0.113 + 0.265
For a 9-year old child with a non-smoking mother
log
pˆiit
1− pˆiit = βˆ0 + βˆ1 · 0 + βˆ2 · 0 = −1.880
Therefore
ÔR =
eβˆ0+βˆ1+βˆ2
eβˆ0
= eβˆ1+βˆ2 = e−0.113+0.265 = e0.152 = 1.164.
Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 143 / 143  Email:51zuoyejun

@gmail.com