Multivariate Extensions of GLM

MAST90139 Statistical Modelling for Data Science Slides

Guoqi Qian

SCHOOL OF MATHEMATICS AND STATISTICS

THE UNIVERSITY OF MELBOURNE

Multivariate GLM (Ch6) Guoqi Qian 1 / 143

Outline

1 §6.1 Four examples

§6.1.1 Example 1. Caesarian birth study

§6.1.2 Example 2. Breathing test results

§6.1.3 Example 3. Visual impairment study

§6.1.4 Example 4. Ohio children respiratory infection

2 §6.2 Multicategorical response models

§6.2.1 Multinomial distribution and Data

§6.2.2 The multi-categorical logit model

§6.2.3 Multivariate generalised linear models (MGLM)

3 §6.3 Models for nominal responses

§6.3.1 Principle of maximum random utility

§6.3.2 Modelling explanatory variables: choice of design matrix

4 §6.4 Models for ordinal responses

§6.4.1 Cumulative (or threshold) models

§6.4.2 Extended version of cumulative models

§6.4.3 Link functions and design matrices for cumulative models

§6.4.4 Sequential models and other approaches

5 §6.5 Statistical inference for MGLM (optional)

6 §6.6 Multivariate models for correlated responses

§6.6.1 Conditional models (optional)

§6.6.2 Margin models

§6.6.3 Marginal models for longitudinal data

Multivariate GLM (Ch6) Guoqi Qian 2 / 143

Chapter 6 Outline

In this chapter we consider the following multivariate extensions of GLM:

1 One response variable with k > 2 nominal categories;

2 One response variable with k > 2 ordinal categories;

3 Multiple correlated binary response variables.

Cases 1 and 2 refer to the polychotomous response which is often

modelled by a multinomial distribution.

Case 3 is often seen in situations involving repeated measurements or

longitudinal data.

We first present four examples which will be extensively analysed in

this chapter.

Multivariate GLM (Ch6) Guoqi Qian 3 / 143

Example 1. Caesarian birth study (1)

Table 1: Data on classification of 251 births by caesarian section

Caesarian planned Caesarian not planned

Infection Infection

I II non I II non

Antibiotics

Risk-factors 0 1 17 4 7 87

No risk-factors 0 0 2 0 0 0

No antibiotics

Risk-factors 11 17 30 10 13 3

No risk-factors 4 4 32 0 0 9

Response variable — Infection, with 3 levels (I, II, non)

Three dichotomous covariates (factors)

1 CP: caesarian planned or not;

2 RF: risk factors (e.g. diabetes, over-weight, etc.);

3 AB: antibiotics given as a prophylaxis?

Multivariate GLM (Ch6) §6.1 Four examples Guoqi Qian 4 / 143

Example 1. Caesarian birth study (2)

Aim: Analysing the effects of the covariates on the risk of infection.

Note:Response variable is also called dependent variable or outcome

variable.

Note: Covariate is also called independent variable, explanatory

variable, or predictor.

The response variable Infection is categorical having > 2 levels.

The level values are nominal, but may also be treated as ordinal

which makes the analysis even more difficult.

Multivariate GLM (Ch6) §6.1 Four examples Guoqi Qian 5 / 143

Example 1. Caesarian birth study (3)

The data in Table 1 are grouped data. They can be converted to

ungrouped data.

Individual CP RF AB Infection

1 1 1 1 II

2 1 1 1 non

...

...

...

...

...

18 1 1 1 non

19 1 0 1 non

20 1 0 1 non

21 1 1 0 I

...

...

...

...

...

251 0 0 0 non

CP=1 if yes and 0 if no; RF=1 if yes and 0 if no; AB=1 if yes and 0 if no.

Multivariate GLM (Ch6) §6.1 Four examples Guoqi Qian 6 / 143

Example 2. Breathing test results

Forthofer & Lehnen (1981) considered the effect of age & smoking on

breathing test results (BTRs) for workers in industrial plants in Texas.

The response variable is BTR, ordinal with K = 3 levels.

Age (2 levels) and smoking status (3 levels) are predictors.

Table 2: BTS of Houston industrial workers

Breathing test results

Age Smoking status Normal Borderline Abnormal

< 40

Never smoked 577 27 7

Former smoker 192 20 3

Current smoker 682 46 11

40 ∼ 59

Never smoked 164 4 0

Former smoker 145 15 7

Current smoker 245 47 27

Multivariate GLM (Ch6) §6.1 Four examples Guoqi Qian 7 / 143

Example 3. Visual impairment study

Table 3: Visual impairment data, from Liang, Zeger & Qaqish (1992)

White Black

Visual Age

impairment 40-50 51-60 61-70 70+ 40-50 51-60 61-70 70+ Total

Left eye

Yes 15 24 42 139 29 38 50 85 422

No 617 557 789 673 750 574 473 344 4777

Right eye

Yes 19 25 48 146 31 37 49 93 448

No 613 556 783 666 748 575 474 336 4751

Vector binary response variable (y1, y2), where y1 = 1 if left-eye

impaired, 0 otherwise; y2 = 1 if right-eye impaired, 0 otherwise.

Covariates: Age (yrs.), Race (W or B).

Aim: find the effect of race and age on visual impairment.

Complication: y1 and y2 are correlated.

Methods: multivariate models for correlated responses; conditional

models; asymmetric models, marginal models, GEE, etc..

Multivariate GLM (Ch6) §6.1 Four examples Guoqi Qian 8 / 143

Example 4. Respiratory infection in Ohio children (1)

Table 4: Presence and absence of respiratory infection

Mother did not smoke Mother smoked

Age of child Frequency Age of child Frequency

7 8 9 10 7 8 9 10

0 0 0 0 237 0 0 0 0 118

0 0 0 1 10 0 0 0 1 6

0 0 1 0 15 0 0 1 0 8

0 0 1 1 4 0 0 1 1 2

0 1 0 0 16 0 1 0 0 11

0 1 0 1 2 0 1 0 1 1

0 1 1 0 7 0 1 1 0 6

0 1 1 1 3 0 1 1 1 4

1 0 0 0 24 1 0 0 0 7

1 0 0 1 3 1 0 0 1 3

1 0 1 0 3 1 0 1 0 3

1 0 1 1 2 1 0 1 1 1

1 1 0 0 6 1 1 0 0 4

1 1 0 1 2 1 1 0 1 2

1 1 1 0 5 1 1 1 0 4

1 1 1 1 1 1 1 1 1 7

Multivariate GLM (Ch6) §6.1 Four examples Guoqi Qian 9 / 143

Example 4. Respiratory infection in Ohio children (2)

Data in Table 4 are grouped ones. They can be equivalently expressed as

the following ungrouped ones:

individual mother smoking status Infection History

1 0 0 0 0 0

...

...

...

...

...

...

237 0 0 0 0 0

238 0 0 0 0 1

...

...

...

...

...

...

247 0 0 0 0 1

...

...

...

...

...

...

531 1 1 1 1 1

...

...

...

...

...

...

537 1 1 1 1 1

Multivariate GLM (Ch6) §6.1 Four examples Guoqi Qian 10 / 143

Example 4. Respiratory infection in Ohio children (3)

Data come from the Harvard Study of Air Pollution and Health

(Laird, Beck & Ware, 1984).

537 children were exam annually from age 7 to 10 on the presence or

absence of R.I.

Response variable: presence/absence of R.I.

Repeated measurements of the response variable for each child,

regarded as a short time series, may be referred to as longitudinal

data. Responses of one child may be correlated.

Covariate: mother’s smoking status (regular, non-regular) at the

beginning of the study.

Aim: Analysing the effect of mother’s smoking on children’s R.I.

Methods: Multivariate models for correlated responses, generalised

estimating equations (GEEs), generalised linear mixed effects models

(GLMM).

For examples 1 to 4, statistical modelling techniques and models are

needed for performing data analysis and drawing conclusions.

Multivariate GLM (Ch6) §6.1 Four examples Guoqi Qian 11 / 143

§6.2.1 Multinomial distribution (1)

Multinomial distribution is an important one for multi-categorical response

variables.

Response variable Y has k levels, labelled 1, 2, · · · , k, i.e.

Y ∈ {1, 2, · · · , k}.

Y can be represented by y = (y1, · · · , yq)T , q = k − 1, with binary

dummy variable yr =

{

1 if Y = r

0 otherwise

, r = 1, · · · , q.

Hence Y = r ⇐⇒ y = (0, · · · , 0, 1︸ ︷︷ ︸

r

, 0, · · · , 0)T and

Pr(Y = r) = Pr(yr = 1 and yj = 0 for j 6= r).

Suppose we have m independent observations of Y : Y1, · · · ,Ym.

They can be represented by y1, · · · , ym, with yi = (yi1, · · · , yiq)T ,

i = 1, · · · ,m; with yir = I (Yi = r).

Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 12 / 143

§6.2.1 Multinomial distribution (2)

Suppose pir = Pr(Yi = r), r = 1, · · · , k, remain constants for all

i = 1, · · · ,m. Note

k∑

r=1

pir = 1 and pi = (pi1, · · · , piq)T , q = k − 1.

Let y˜ =

m∑

i=1

yi =

(

m∑

i=1

yi1, · · · ,

m∑

i=1

yiq

)T

denoted

= (y˜1, · · · , y˜q)T which

is a vector of frequencies of Yi = r , r = 1, · · · , q. Then y˜ follows a

multinomial distribution M(m,pi) with pmf

Pr (y˜ = (m1, · · · ,mq)) =

m!

m1! · · ·mq!(m−m1−· · ·−mq)!pi

m1

1 · · ·pimqq (1−pi1−· · ·−piq)m−m1−···−mq

It can be shown that E (y˜) = (mpi1, · · · ,mpiq)T = mpi.

Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 13 / 143

§6.2.1 Multinomial distribution (3)

It can also be shown that

Cov(y˜) = m

[

diag(pi)− pipiT

]

=

mpi1(1−pi1) −mpi1pi2 · · · −mpi1piq

−mpi2pi1 mpi2(1−pi2) · · · −mpi2piq

...

...

. . .

...

−mpiqpi1 −mpiqpi2 · · · mpiq(1−piq)

Let y¯ = y˜/m be the vector of relevant frequencies.

It can be shown that E (y¯) = (pi1, · · · , piq)T = pi and

Cov(y¯) =

1

m

[

diag(pi)− pipiT ] = 1

m2

Cov(y˜).

Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 14 / 143

§6.2.1 Data (1)

Data under analysis are usually of two forms:

Ungrouped data

Response variable obs. Explanatory variables obs.

Unit 1

y11 · · · y1q

... · · · ...

yi1 · · · yiq

... · · · ...

yn1 · · · ynq

=

yT1

...

yTi

...

yTn

n×q

,

x11 · · · x1L

... · · · ...

xi1 · · · xiL

... · · · ...

xn1 · · · xnL

=

xT1

...

xTi

...

xTn

n×L

...

Unit i

...

Unit n

Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 15 / 143

§6.2.1 Data (2)

Grouped data

Suppose y

(1)

i , · · · , y(ni )i are response observations with fixed covariate

observations xi = (xi1, · · · , xiL)T .

Let y¯i =

1

ni

∑ni

j=1 y

(j)

i be the mean vector over the group-i units,

giving the relative frequencies of (Yi = 1, · · · ,Yi = q).

Let y˜i =

∑ni

j=1 y

(j)

i be the total vector over the group-i units, giving

the frequencies of (Yi = 1, · · · ,Yi = q).

Then the grouped data have the following form

Group Size Response variable obs. Explanatory variables obs.

1 n1

y∗11 · · · y∗1q

... · · ·

...

y∗i1 · · · y∗iq

... · · ·

...

y∗g1 · · · y∗gq

=

y∗T1

...

y∗Ti

...

y∗Tg

g×q

,

x11 · · · x1L

... · · ·

...

xi1 · · · xiL

... · · ·

...

xg1 · · · xgL

=

xT1

...

xTi

...

xTg

g×L

...

...

i ni

...

...

g ng

where y∗i is either the group mean y¯i or the group total y˜i , i = 1, · · · , g .

Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 16 / 143

§6.2.2 The multi-categorical logit model (1)

In the multinomial response case pii = µi = E (yi |xi ) is a q × 1

vector. pii = (pii1, · · · , piiq)T .

The structure part of the GLM has the form pii = h(Ziβ), where

h(·) is a q × 1 vector-valued response function (also called inverse

link function);

Zi is a q × p design matrix constructed from xi ;

β is a p × 1 unknown vector parameter.

The q × 1 vector linear predictor for unit i is ηi = Ziβ.

And

[

ηT1 , , · · · ,ηTn

]T

= vec(η1, · · · ,ηn) is the nq × 1 vector linear

predictor for all n units.

Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 17 / 143

§6.2.2 The multi-categorical logit model (2)

A multi-categorical logit model is expressed as

piir = P(Yi = r) =

exp

(

βr0 + zTi βr

)

1 +

∑q

s=1 exp

(

βs0 + zTi βs

) , r = 1, 2, · · · , q

where zi is an (a− 1)× 1 vector from xi . Equivalently,

log

piir

piik

= log

P(Yi = r)

P(Yi = k)

= βr0 + z

T

i βr , r = 1, 2, · · · , q.

Note piik = 1−

∑q

s=1 piis for all i = 1, · · · , n.

We see for this model, h = (h1, · · · , hq)T is a q × 1 vector response

function, where for each unit i ,

hr = hr (ηi ) = hr (Ziβ) =

exp(ηir )

1+

∑q

s=1 exp(ηis)

; r =1, 2, · · ·, q; i =1, · · ·, n

where ηir = βr0 + z

T

i βr and ηi = (ηi1, · · · , ηiq)T .

Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 18 / 143

§6.2.2 The multi-categorical logit model (3)

We also see for this model, the design matrix for each unit i is

Zi =

1 zTi 0 0

T · · · 0 0T

0 0T 1 zTi · · · 0 0T

...

...

...

...

. . .

...

...

0 0T 0 0T · · · 1 zTi

q×(aq)

β = (β10,β

T

1 , · · · , βq0,βTq )T is an (aq)× 1 vector parameter. η1...

ηn

= [ηT1 , · · · ,ηTn ]T = [η11, · · · , η1q, · · · , ηn1, · · · , ηnq]T

=

[

(Z1β)

T , · · · , (Znβ)T

]T

=

Z1β...

Znβ

=

Z1...

Zn

(nq)×(aq)

β

Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 19 / 143

§6.2.2 The multi-categorical logit model (4)

Note that the vector link function of this multi-categorical logit model

is g = (g1, · · · , gq)T , where for each unit i ,

gr = gr (pii1, · · · , piiq) = log piir

piik

= log

piiir

1− (pii1 + · · ·+ piiq) ;

r = 1, · · · , q and i = 1, · · · , n.

It shows that a multivariate GLM with multinomial responses is

characterised by two specifications:

the vector response function h (or the vector link function g = h−1);

the design matrix that depends on the covariates and the model.

Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 20 / 143

§6.2.2 Example: Caesarian birth study (1)

Consider the data on infection following Caesarian birth given in §6.1.

Now we want to distinguish between the two different types of

infection.

Therefore the response variable Y has 3 possible outcomes: 1 for type

I infection, 2 for type II infection, and 3 for no infection.

Represent Yi by 2 dummy variables (yi1, yi2), where

yi1 = I (the ith Caesarian was followed by infection I)

yi2 = I (the ith Caesarian was followed by infection II)

There are 3 binary covariates: NoPlan, Antib, RiskF, coded by 3

dummy variables:

NoPlan = I (the Caesarian has not been planned)

Antib = I (antibiotics were given as a prophylaxis)

RiskF = I (risk factors were present)

Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 21 / 143

§6.2.2 Example: Caesarian birth study (2)

The data can be condensed in the grouped data (group total) form.

Size Response variable obs. Explanatory variables obs.

(y˜i1, y˜i2) NoPlan, Antib, RiskF

Group 1 n1 = 40

4 4

11 17

0 0

0 1

0 0

10 13

4 7

0 0 0

0 0 1

0 1 0

0 1 1

1 0 0

1 0 1

1 1 1

Group 2 n2 = 58

Group 3 n3 = 2

Group 4 n4 = 18

Group 5 n5 = 9

Group 6 n6 = 26

Group 7 n7 = 98

Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 22 / 143

§6.2.2 Example: Caesarian birth study (3)

We fit the following multi-categorical logit model to the data

log

pii1

pii3

= log

P(infection type I)

P(no infection)

= β10 + β1NNoPlan + β1AAntib + β1RRiskF

log

pii2

pii3

= log

P(infection type II)

P(no infection)

= β20 + β2NNoPlan + β2AAntib + β2RRiskF

Equivalently,

P(infection type j)

P(no infection)

= eβj0eβjNNoPlaneβjAAntibeβjRRiskF, j = I, II.

Thus, the exponential of the parameter gives the factorial

contribution to the odds if the corresponding explanatory variable

takes value 1 instead of 0.

Alternatively, the exponential of the parameter may be seen as odds

ratio between odds for value 1 and odds for value 0.

Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 23 / 143

§6.2.2 Example: Caesarian birth study (4)

For example, for NoPlan one obtains

eβjN =

P(infection type j |NoPlan=1,Antib,RiskF)

P(no infection|NoPlan=1,Antib,RiskF)

P(infection type j |NoPlan=0,Antib,RiskF)

P(no infection|NoPlan=0,Antib,RiskF)

, j = I, II.

where Antib and RiskF can take any fixed values.

Note the vector-valued link function for this model is

g(pii1, pii2) =

(

log

pii1

1− pii1 − pii2 , log

pii2

1− pii1 − pii2

)T

Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 24 / 143

§6.2.2 Example: Caesarian birth study (5)

The estimates of the parameters and their exponential are given in

the following table

Table 5: Parameter estimates in Caesarian birth study

β exp(β) β exp(β)

constant -2.621 0.072 constant -2.560 0.077

NoPlan (I) 1.174 3.235 NoPlan (II) 0.996 2.707

Antib (I) -3.520 0.030 Antib (II) -3.087 0.046

RiskF (I) 1.829 6.228 RiskF (II) 2.195 8.980

NoPlan of 1 vs. 0 increases the type I infection odds by a factor of

3.235.

Taking Antib decreases the type I infection OR to 0.030 (from 1).

RiskF increases the type I infection odds ratio to 6.228.

Similar things can be said of the type II infection odds ratio.

Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 25 / 143

§6.2.2 Example: Caesarian birth study (6)

Caes.dat <- read.csv("D:/MAST90139/Example6-1Caes.csv")

is.data.frame(Caes.dat) #TRUE

Caes.dat

size noInf Inf1 Inf2 NoPlan Antib RiskF

1 40 32 4 4 0 0 0

2 58 30 11 17 0 0 1

3 2 2 0 0 0 1 0

4 18 17 0 1 0 1 1

5 9 9 0 0 1 0 0

6 26 3 10 13 1 0 1

7 98 87 4 7 1 1 1

library(nnet)

##package nnet is used for fitting multinomial log-linear model

##or multi-categorical logit model.

Caes1=multinom(as.matrix(Caes.dat[,2:4])~NoPlan+Antib+RiskF,data=Caes.dat)

# weights: 15 (8 variable)

initial value 275.751684

iter 10 value 161.068578

final value 160.937147

converged

Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 26 / 143

§6.2.2 Example: Caesarian birth study (7)

Caes1=multinom(as.matrix(Caes.dat[,2:4])~NoPlan+Antib+RiskF,data=Caes.dat)

summary(Caes1)

Call:

multinom(formula=as.matrix(Caes.dat[,2:4]) ~ NoPlan + Antib + RiskF, data=Caes.dat)

Coefficients:

(Intercept) NoPlan Antib RiskF

Inf1 -2.621012 1.1742513 -3.520249 1.829241

Inf2 -2.559917 0.9959762 -3.087160 2.195458

Std. Errors:

(Intercept) NoPlan Antib RiskF

Inf1 0.5567209 0.5213014 0.6717416 0.6023322

Inf2 0.5462995 0.4813634 0.5498675 0.5869601

Residual Deviance: 321.8743 AIC: 337.8743

attributes(Caes1)

$names

[1] "n" "nunits" "nconn" "conn" "nsunits"

[6] "decay" "entropy" "softmax" "censored" "value"

[11] "wts" "convergence" "fitted.values" "residuals" "call"

[16] "terms" "weights" "deviance" "rank" "lab"

[21] "coefnames" "vcoefnames" "xlevels" "edf" "AIC"

$class

[1] "multinom" "nnet"

Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 27 / 143

§6.2.2 Example: Caesarian birth study (8)

Caes1$fitted ##fitted category probabilities

noInf Inf1 Inf2

1 0.8695347 0.063240568 0.067224753

2 0.4656329 0.210951192 0.323415876

3 0.9943521 0.002140051 0.003507889

4 0.9568456 0.012827892 0.030326540

5 0.6922135 0.162899513 0.144886971

6 0.2300766 0.337273035 0.432650355

7 0.8855925 0.038416539 0.075990954

Caes1$residuals

#resid(Caes1) does the same thing, = category means - fitted category probabilities.

noInf Inf1 Inf2

1 -0.069534679 0.036759432 0.032775247

2 0.051608447 -0.021296019 -0.030312428

3 0.005647940 -0.002140051 -0.003507889

4 -0.012401124 -0.012827892 0.025229016

5 0.307786484 -0.162899513 -0.144886971

6 -0.114691994 0.047342349 0.067349645

7 0.002162595 0.002399788 -0.004562382

Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 28 / 143

§6.2.2 Example: Caesarian birth study (9)

Caes1$deviance #residual deviance of the model, also given by deviance(Caes1)

[1] 321.8743

Caes1$edf

[1] 8

predict(Caes1, data.frame(NoPlan=1, Antib=1, RiskF=0), type="probs")

noInf Inf1 Inf2

0.983753294 0.006850796 0.009395909

predict(Caes1, data.frame(NoPlan=1, Antib=1, RiskF=0), type="class")

[1] noInf

Levels: noInf Inf1 Inf2

predict(Caes1, data.frame(NoPlan=1, Antib=1, RiskF=0), type="probs", se.fit=TRUE)

noInf Inf1 Inf2

0.983753294 0.006850796 0.009395909

Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 29 / 143

§6.2.2 Example: Caesarian birth study (10)

step(Caes1)

Start: AIC=337.87

as.matrix(Caes.dat[, 2:4]) ~ NoPlan + Antib + RiskF

trying - NoPlan

# weights: 12 (6 variable)

...................................................................

converged

Df AIC

- NoPlan 6 340.8649

- RiskF 6 358.3103

- Antib 6 397.4663

###The model ~ NoPlan + Antib + RiskF has the smallest AIC and is the best model.

Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 30 / 143

§6.2.2 Example: Caesarian birth study (11)

Caes2=multinom(as.matrix(Caes.dat[,2:4])~NoPlan+Antib,data=Caes.dat)

summary(Caes2)

Coefficients:

(Intercept) NoPlan Antib

Inf1 -1.431678 1.255595 -2.962218

Inf2 -1.058278 1.086508 -2.484319

Std. Errors:

(Intercept) NoPlan Antib

Inf1 0.2870129 0.4954738 0.6408664

Inf2 0.2470012 0.4475669 0.5136334

Residual Deviance: 346.3103

AIC: 358.3103

Caes2$edf

[1] 6

Caes2$deviance-Caes1$deviance

[1] 24.43605

1-pchisq(24.43605,2)

[1] 4.940594e-06

## Hence RiskF has significant effect on the response variable.

Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 31 / 143

§6.2.3 Extending Multicategory logit model to MGLM (1)

Consider a multivariate q × 1 response variable yi for unit i . Let

µi = E (yi |xi ), i = 1, · · · , n.

Distributional assumption: The yi ’s given respective xi ’s are

independent. Each yi has a distribution belonging to a simple

(multivariate) exponential family with pdf

f (yi |θi , φ, ωi ) = exp

{

yTi θi − b(θi )

φ

· ωi + c(yi , φ, ωi )

}

Structural assumption: µi ’s depends on the linear predictor

ηi = Ziβ in the form

µi = h(ηi ) = h(Ziβ) = [h1(Ziβ), · · · , hq(Ziβ)]T , i = 1, · · · , n

where

the response function h : S → M is defined on S ⊂ Rq, taking values

in M ⊂ Rq;

Zi is a q × p design matrix for unit i ;

β = (β1, · · · , βp)T is an unknown parameter vector from B ⊂ Rp.

Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 32 / 143

§6.2.3 Extending Multicategory logit model to MGLM (2)

For the case of a multi-categorical response, one has to consider the

multinomial distribution which may be embedded into the framework

of a simple multivariate exponential family.

For yi = (yi1, · · · , yiq)T ∼ Multinomial(ni ,pii ), the pmf of y¯i = n−1i yi

has the form

f (y¯i |θi , φ, ωi ) = exp

{

y¯Ti θi − b(θi )

φ

· ωi + c(y¯i , φ, ωi )

}

, i = 1, · · · , g

where the natural parameter θi is given by

θi =

[

log

pii1

piik

, · · · , log piiq

piik

]T

; piik = 1−

q∑

j=1

piij ; and

b(θi )=− log(1−pii1−· · ·−piiq) = − log piik = log

(

1+eθi1 +· · ·+eθiq);

c(yi , φ, ωi ) = log

ni !

yi1!···yiq!(ni−yi1−···−yiq)! ; and ωi = ni .

Multivariate GLM (Ch6) §6.2 Multicategorical response models Guoqi Qian 33 / 143

§6.3.1 Principle of maximum random utility (1)

Models for response variables with unordered categories may be

motivated from a consideration of latent variables.

In probabilistic choice theory, it is often assumed that an unobserved

utility Ur is associated with category r of the response variable Y .

Let Ur be a latent variable associated with category r . Now assume

Ur = ur + εr , with ur being a fixed value and ε1, · · · , εk i.i.d. having

continuous cdf F .

Following the principle of maximum random utility

Y = r ⇔ Ur = max{U1, · · · ,Uk}, r = 1, · · · , k.

Now it follows that

P(Y = r) = P(Ur − U1 ≥ 0, · · · ,Ur − Uk ≥ 0)

= P(ε1 ≤ ur − u1 + εr , · · · , εk ≤ ur − uk + εr )

=

∫ +∞

−∞

∏

s 6=r

F (ur−us +ε) · f (ε)dε, where f =F ′ is the pdf of εr .

Multivariate GLM (Ch6) §6.3 Models for nominal responses Guoqi Qian 34 / 143

§6.3.1 Principle of maximum random utility (2)

Different distributional assumption for εr ’s yields different models.

If εr ’s are i.i.d. N(0, 1), one gets independent probit model for Y .

(This is true if k = 2. But I am not able to prove it when k > 2.)

Try the case k = 3.

Then Ui ∼ N(ui , 1), i = 1, 2, 3; and U1,U2,U3 are independent.

Let W1 = U1 − U2 and W2 = U1 − U3. Then[

W1

W2

]

∼ N

([

u1 − u2

u1 − u3

]

,

[

2 1

1 2

])

, with joint pdf

f (w1,w2) =

1

2

√

3pi

exp

{

−1

6

[

w1−(u1−u2)

w2−(u1−u3)

]T[

2 −1

−1 2

][

w1−(u1−u2)

w2−(u1−u3)

]}

=

1

2

√

3pi

e{− 13 [(w1−(u1−u2))2−(w1−(u1−u2))(w2−(u1−u3))+(w2−(u1−u3))2]}

Then e.g. P(Y =1)=P(U1−U2 ≥ 0,U1−U3 ≥ 0)=P(W1 ≥ 0,W2 ≥ 0)

=

∫∞

0

∫∞

0 f (w1,w2)dw1dw2 =

∫∞

0

1

4

√

pi

e

−1

4

(w2−(u1−u3))2

[

erf

(

1

2

√

3

[w2−(u1−u3)+2(u1−u2)]

)

+1

]

dw2

where erf(x) = 2√

pi

∫ x

0

e−t

2

dt = 2[Φ(

√

2x)− 12 ]. This does not seem to

result in a probit model.

Multivariate GLM (Ch6) §6.3 Models for nominal responses Guoqi Qian 35 / 143

§6.3.1 Principle of maximum random utility (3)

If ε1, · · · , εk i.i.d. following the extreme-value distribution

with cdf F (x) = exp{−e−x} and pdf f (x) = exp{−e−x} exp(−x),

then we get the multinomial logit model

P(Y = r) =

eur∑k

s=1 e

us

=

eur−uk

1+

∑q

s=1 e

us−uk =

e u˜r

1+

∑q

s=1 e

u˜s

, with u˜r = ur − uk .

We see a specific cdf F determines the link or response function of

the model.

Proof.

P(Y = r) =

∫ +∞

−∞

∏

s 6=r

exp{−e−ur+us−ε} · exp{−e−ε} · exp{−ε}dε

=

∫ +∞

−∞

e−

∑

s 6=r e

us−ur−ε−e−εe−εdε =

∫ +∞

−∞

e−[

∑

s 6=r e

us−ur +1]e−εe−εdε

=

e−[

∑

s 6=r e

us−ur +1]e−ε∑

s 6=r e

us−ur + 1

∣∣∣∣∣

+∞

−∞

=

1∑

s 6=r e

us−ur + 1

=

eur∑k

s=1 e

us

.

Multivariate GLM (Ch6) §6.3 Models for nominal responses Guoqi Qian 36 / 143

§6.3.2 Choice of design matrix (1)

Explanatory variables and their coefficients in a linear predictor

function ηi = Ziβ are the terms influencing the response variable.

Assume the response variable has multiple categories. Explanatory

variables can be classified into 3 types:

1 global: the ones whose values depend on individuals but not the

response categories;

2 alternative-specific: the ones whose values depend on the response

categories but not the individuals;

3 neither (1) or (2); or both (1) and (2).

The coefficient parameters (which always do not depend on

individuals) are classified into

1 global: whose values don’t depend on response categories;

2 category-specific: depend on categories.

Multivariate GLM (Ch6) §6.3 Models for nominal responses Guoqi Qian 37 / 143

§6.3.2 Choice of design matrix (2)

Example.

Suppose the mean utility for individual i is uir , satisfying

uir = αr0 + z

T

i αr , r = 1, · · · , k; i = 1, · · · , n.

Then zi is global and αr is category-specific.

The multinomial logit model becomes

P(Yi = r |zi ) = e

βr0+zTi βr

1 +

∑q

s=1 e

βs0+zTi βs

where βr0 + zTi βr = u˜ir = uir − uik = (αr0 − αk0) + zTi (αr −αk).

Multivariate GLM (Ch6) §6.3 Models for nominal responses Guoqi Qian 38 / 143

§6.3.2 Choice of design matrix (3)

Example (continued 1)

The associated GLM is pii1...

piiq

=

P(Yi = 1|zi )...

P(Yi = q|zi )

=

eηi1

1+

∑q

s=1 e

ηis

...

e

ηiq

1+

∑q

s=1 e

ηis

denoted= h(ηi )

where the linear predictor has the form ηi = Ziβ, with the design

matrix for individual i being

Zi =

1 zTi 0 0

T · · · 0 0T

0 0T 1 zTi · · · 0 0T

...

...

...

...

. . .

...

...

0 0T 0 0T · · · 1 zTi

q×p

and β = (β10,β

T

1 , · · · , βq0,βTq )T is a p × 1 parameter vector.

Multivariate GLM (Ch6) §6.3 Models for nominal responses Guoqi Qian 39 / 143

§6.3.2 Choice of design matrix (4)

Example (continued 2)

If uir also depends on some alternative-specific explanatory variables,

then uir may be assumed of the form

uir = αr0 + z

T

i αr + w

T

r γ, r = 1, · · · , k; i = 1, · · · , n

where wr is an alternative-specific explanatory vector variable and γ

is a global parameter.

Then u˜ir = uir − uik = βr0 + zTi βr + (wr −wk)Tγ.

The resultant multinomial logit model will have the following design

matrix for individual i

Zi =

1 zTi 0 0

T · · · 0 0T (w1 −wk)T

0 0T 1 zTi · · · 0 0T (w2 −wk)T

...

...

...

...

. . .

...

...

...

0 0T 0 0T · · · 1 zTi (wq −wk)T

q×p

and β = (β10,β

T

1 , · · · , βq0,βTq ,γT )T is a p × 1 parameter vector.

Multivariate GLM (Ch6) §6.3 Models for nominal responses Guoqi Qian 40 / 143

§6.4 Models for ordinal responses

Response variable having k > 2 categories may be of ordinal nature.

Example: Breathing test results

Ordinal variables may come from quite different mechanisms. Two

distinguished types are: grouped continuous and assessed ordered

categorical variables.

Grouped continuous — merely a categorized continuous variable

Assessed ordered — obtained after an assessor’s judgment.

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 41 / 143

§6.4.1 Cumulative models: the threshold approach

Suppose there is an unobserved continuous variable U for the

response variable Y having k categories.

Suppose Y is determined by U in the following way

Y = r ⇔ θr−1 < U ≤ θr ; r = 1, · · · , k

where −∞ = θ0 < θ1 < · · · < θk = +∞ unknown.

That means Y is a coarser (categorized) version of U determined by

the thresholds θ1, · · · , θk−1.

Suppose U depends on a p × 1 vector explanatory variable x in the

following way

U = −xTγ + ε

where γ = (γ1, · · · , γp)T is an unknown vector parameter and ε is

random with cdf F .

It follows that

P(Y ≤ r |x) = P(U ≤ θr ) = F (θr + xTγ).

This is called a cumulative model or threshold model with cdf F .

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 42 / 143

§6.4.1 Cumulative logistic (or proportional odds) model (1)

If choose F (x) =

[

1 + e−x

]−1

which is the cdf of a logistic

distribution, then

P(Y ≤ r |x) = e

θr+xTγ

1 + eθr+x

Tγ , r = 1, · · · , q = k − 1. (1)

It is equivalent to

log

P(Y ≤ r |x)

P(Y > r |x) = θr + x

Tγ or (2)

P(Y ≤ r |x)

P(Y > r |x) = exp{θr + x

Tγ} (3)

The model given by (1), (2) or (3) is called the cumulative logistic

model.

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 43 / 143

§6.4.1 Cumulative logistic (or proportional odds) model (2)

The cumulative logistic model is also called a proportional odds

model due to following interpretation:

Suppose there are 2 populations (groups) identified by x1 and x2

respectively. Then

P(Y ≤ r |x1)

P(Y > r |x1) = e

θr+xT1 γ and

P(Y ≤ r |x2)

P(Y > r |x2) = e

θr+xT2 γ .

It follows that

cumulative odds ratio =

P(Y ≤ r |x1)/P(Y > r |x1)

P(Y ≤ r |x2)/P(Y > r |x2) = e

(x1−x2)Tγ

which is the same for all r = 1, · · · , q.

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 44 / 143

§6.4.1 Grouped Cox (or proportional hazards) model

If choose F as the cdf of the extreme minimal-value distribution, i.e.

F (x) = 1− exp{−ex}, then

P(Y ≤ r |x) = 1− exp{−eθr+xTγ} (4)

⇐⇒ log [− log P(Y > r |x)]︸ ︷︷ ︸

complementary log-log link

= θr + x

Tγ, r = 1, · · · , q

Model (4) is referred to as the grouped Cox model or proportional

hazards model due to the following interpretation:

For 2 populations (groups) identified by x1 and x2 respectively

log P(Y > r |x1) = log P(Y > r |x2) · e(x1−x2)Tγ , r = 1, · · · , q

Differentiating w.r.t. r on both sides results in

λ(r |x1) = λ(r |x2) · e(x1−x2)Tγ , ⇔ λ(r |x1)

λ(r |x2) = e

(x1−x2)Tγ ,

which is the same for all r = 1, · · · , q. Here λ(r |·) = − d

dr

logP(Y > r |·)

is the hazard function.

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 45 / 143

§6.4.1 Cumulative model based on extreme maximal-value CDF

If choose F as the cdf of the extreme maximal-value distribution, i.e.

F (x) = exp{−e−x}, then

P(Y ≤ r |x) = exp{−e−(θr+xTγ)} (5)

⇐⇒ log [− log P(Y ≤ r |x)]︸ ︷︷ ︸

log-log link

= −(θr + xTγ), r = 1, · · · , q. (6)

Model (5) or (6) is referred to as the extreme maximal-value

distribution model.

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 46 / 143

S6.4.2 Extended version of cumulative models (1)

Recall that an ordinal categorical response Y with k categories is

determined by a latent variable U as

Y = r ⇔ θr−1

__and U = −xTγ + ε, with ε having cdf F .__

Then a cumulative model or threshold model with cdf F is

P(Y ≤ r |x) = P(U ≤ θr ) = F (θr + xTγ). (7)

One may assume a linear form for thresholds θ1, · · · , θq,

θr = βr0 + w

Tβr

where w is a vector explanatory variable and βr = (βr1, · · · , βrm)T is

a category-specific parameter vector.

Then an extended cumulative model is obtained:

P(Y ≤ r |x,w) = F (βr0 + wTβr + xTγ). (8)

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 47 / 143

S6.4.2 Extended version of cumulative models (2)

Remarks

1 Model (8) becomes unidentifiable if wi = xi for each individual

i = 1, · · · , n; i.e., it will not be possible to separate γ from each βr if

wi = xi .

2 So wi and xi must not be the same.

3 βr can be global, while γ can be replaced by a category-specific

parameter.

4 wi are called threshold variables, while xi are called shift variables.

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 48 / 143

§6.4.3 Link functions for cumulative models

For a cumulative GLM, the link function g =(g1, · · · , gq), as a vector

function of (pi1 = P(Y =1), · · · , piq = P(Y =q)), is given by

gr (pi1, · · · , piq) = F−1(pi1 + · · ·+ pir ), r = 1, · · · , q,

where F−1(·) is the inverse function for F that specifies the

cumulative model

pi1 + · · ·+ pir = P(Y ≤ r |x,w) = F (βr0 + wTβr + xTγ).

F being the N(0, 1) cdf gives the probit link

gr (pi1, · · · , piq) = Φ−1(pi1 + · · ·+ pir ), r = 1, · · · , q.

For proportional odds model, the link is the cumulative logit

gr (pi1, · · · , piq) = log pi1 + · · ·+ pir

1− (pi1 + · · ·+ pir ) , r = 1, · · · , q.

For proportional hazards model, the link is the complement log-log

gr (pi1, · · · , piq) = log{− log(1− (pi1 + · · ·+ pir ))}, r = 1, · · · , q.

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 49 / 143

§6.4.3 Design matrices for cumulative models (1)

For the simple cumulative model

P(Yi ≤ r |xi ) = F (θr + xTi γ),

the design matrix Zi in the linear predictor ηi = Ziβ is given by

Zi =

1 xTi

1 xTi

. . .

...

1 xTi

q×(q+dim(xi ))

and

β = (θ1, · · · , θq,γT )T .

Here xi should not contain a constant component.

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 50 / 143

§6.4.3 Design matrices for cumulative models (2)

For the extended cumulative model

P(Yi ≤ r |xi ,wi ) = F (βr0 + wTi βr + xTi γ),

the design matrix Zi in the linear predictor ηi = Ziβ is given by

Zi =

1 wTi x

T

i

1 wTi x

T

i

. . .

. . .

...

1 wTi x

T

i

q×(q(1+dim(wi ))+dim(xi ))

and

β = (β10,β

T

1 , · · · , βq0,βTq ,γT )T .

Here neither wi nor xi should contain a constant component.

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 51 / 143

§6.4.3 Alternative formulation/parameterisation in

cumulative models (1)

The threshold parameters θ1, · · · , θq in cumulative models

P(Y ≤ r |x) = F (θr + xTγ) are restricted by

θ1 < θ2 < · · · < θq

The corresponding restriction for the extended cumulative model is

β10 + w

Tβ1 < β20 + w

Tβ2 < · · · < βq0 + wTβq

which is more complex.

If such constraints are not explicitly accounted for in estimation, the

iterative estimation procedure may fail due to fitting inadmissible

parameters.

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 52 / 143

§6.4.3 Alternative formulation/parameterisation in

cumulative models (2)

For the simple cumulative model, the problem can be avoided by

using an alternative formulation or reparameterisation:

α1 = θ1, αr = log(θr − θr−1), r = 2, · · · , q

⇐⇒ θ1 = α1, θr = θ1 +

r∑

i=2

eαi , r = 2, · · · , q.

The new parameters α1, · · · , αq are unconstrained.

The cumulative model under the new formulation has a linear

structure of the following form

F−1 (P(Y = 1|x)) = α1 + xTγ

log

[

F−1 (P(Y ≤ r |x))− F−1 (P(Y ≤ r − 1|x))] = αr , r = 2, · · · , q.

The link function corresponding to the new formulation can be

determined accordingly.

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 53 / 143

§6.4.3 Alternative formulation/parameterisation in

cumulative models (3)

For the special case of the cumulative logistic model, one obtains the

following link function (log-logit link)

g1(pi1, · · · , piq) = log

(

pi1

1− pi1

)

gr (pi1, · · · , piq) = log

[

log

{

pi1 + · · ·+ pir

1− pi1 − · · · − pir

}

− log

{

pi1 + · · ·+ pir−1

1− pi1 − · · · − pir−1

}]

,

q = 2, · · · , q.

If this alternative link function is used, the design matrix Zi has to be

adapted. Now

Zi =

1 xTi

1 0

. . .

...

1 0

q×(q+dim(xi ))

and the parameter vector becomes β = (α1, · · · , αq,γT )T .

Here xi should not contain a constant component.

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 54 / 143

§6.4.3 Computing packages and an example

The polr() function in the R MASS package can be used to fit

cumulative models (but not extended cumulative models).

More generally, the ordinal package in R can be used for fitting

ordinal regression models.

Example 2 in §6.1. Breathing test results.

The response variable is BTR, ordinal with k = 3 levels.

Age (2 levels) and Smoking status (3 levels) are predictors.

The aim is to see how Age and Smoking are associated with BTR.

Command polr in R MASS package is used in the analysis.

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 55 / 143

§6.4.3 Breathing testing example (1)

Usage of polr:

polr(formula, data, weights, start, ..., subset, na.action,

contrasts = NULL, Hess = FALSE, model = TRUE,

method = c("logistic", "probit", "loglog", "cloglog", "cauchit"))

Note the estimates of γ are those from polr multiplied by −1 due to

the model used in polr being P(Y ≤ r |x) = F (θr − xTγ).

> library(MASS); BTR.dat <- read.csv("D:/MAST90139/Example6-2BTR.csv"); BTR.dat

BTR Age Smoking Freq

1 1Normal <40 1Never 577

2 2Border <40 1Never 27

3 3Abnorm <40 1Never 7

4 1Normal <40 2Former 192

5 2Border <40 2Former 20

6 3Abnorm <40 2Former 3

7 1Normal <40 3Current 682

8 2Border <40 3Current 46

9 3Abnorm <40 3Current 11

10 1Normal 40to59 1Never 164

11 2Border 40to59 1Never 4

12 3Abnorm 40to59 1Never 0

13 1Normal 40to59 2Former 145

14 2Border 40to59 2Former 15

15 3Abnorm 40to59 2Former 7

16 1Normal 40to59 3Current 245

17 2Border 40to59 3Current 47

18 3Abnorm 40to59 3Current 27

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 56 / 143

§6.4.3 Breathing testing example (2)

> is.factor(BTR.dat$BTR) #TRUE; > is.ordered(BTR.dat$BTR) #FALSE

> is.factor(BTR.dat$Age) #TRUE; > is.factor(BTR.dat$Smoking) #TRUE

> is.factor(BTR.dat$Freq) #FALSE;

> BTR.dat$BTR=as.ordered(BTR.dat$BTR) #Set BTR as an ordinal factor

> help(polr)

> BTR.logit=polr(BTR~Age+Smoking+Age:Smoking, data=BTR.dat, weights=Freq,

Hess=T,method="logistic")

> summary(BTR.logit)

Coefficients:

Value Std. Error t value

Age40to59 -0.8857 0.5359 -1.653

Smoking2Former 0.6973 0.2822 2.471

Smoking3Current 0.3472 0.2238 1.551

Age40to59:Smoking2Former 1.1458 0.6228 1.840

Age40to59:Smoking3Current 2.2007 0.5689 3.868

Intercepts:

Value Std. Error t value

1Normal|2Border 2.8334 0.1764 16.0603

2Border|3Abnorm 4.3078 0.2130 20.2210

Residual Deviance: 1564.968

AIC: 1578.968

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 57 / 143

§6.4.3 Breathing testing example (3)

BTR.logit1=polr(BTR~Age+Smoking,data=BTR.dat, weights=Freq,Hess=T,method="logistic")

summary(BTR.logit1)

Coefficients:

Value Std. Error t value

Age40to59 0.7772 0.1482 5.243

Smoking2Former 0.7815 0.2333 3.350

Smoking3Current 0.9607 0.1918 5.010

Intercepts:

Value Std. Error t value

1Normal|2Border 3.1927 0.1749 18.2544

2Border|3Abnorm 4.6543 0.2125 21.9076

Residual Deviance: 1589.744

AIC: 1599.744

anova(BTR.logit, BTR.logit1)

Likelihood ratio tests of ordinal regression models

Response: BTR

Model Resid. df Resid. Dev Test Df LR stat. Pr(Chi)

1 Age + Smoking 2214 1589.744

2 Age + Smoking + Age:Smoking 2212 1564.968 1 vs 2 2 24.77585 4.168e-06

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 58 / 143

§6.4.3 Breathing testing example (4)

> BTR.logit$fitted

1Normal 2Border 3Abnorm

1 0.9444565 0.04225961 0.013283873

2 0.9444565 0.04225961 0.013283873

3 0.9444565 0.04225961 0.013283873

4 0.8943685 0.07930628 0.026325236

5 0.8943685 0.07930628 0.026325236

6 0.8943685 0.07930628 0.026325236

7 0.9231700 0.05813458 0.018695369

8 0.9231700 0.05813458 0.018695369

9 0.9231700 0.05813458 0.018695369

10 0.9763207 0.01815786 0.005521451

11 0.9763207 0.01815786 0.005521451

12 0.9763207 0.01815786 0.005521451

13 0.8671630 0.09895796 0.033879045

14 0.8671630 0.09895796 0.033879045

15 0.8671630 0.09895796 0.033879045

16 0.7633793 0.17036521 0.066255462

17 0.7633793 0.17036521 0.066255462

18 0.7633793 0.17036521 0.066255462

> BTR.logit$residual #does not exist.

> BTR.logit$lp #linear predictor values

1 2 3 4 5 6 7 8 9

0.0000000 0.0000000 0.0000000 0.6972823 0.6972823 0.6972823 0.3472245 0.3472245 0.3472245

10 11 12 13 14 15 16 17 18

-0.8857462 -0.8857462 -0.8857462 0.9573393 0.9573393 0.9573393 1.6621466 1.6621466 1.6621466

> attributes(BTR.logit)

[1] "coefficients" "zeta" "deviance" "fitted.values" "lev" "terms"

[7] "df.residual" "edf" "n" "nobs" "call" "method"

[13] "convergence" "niter" "lp" "Hessian" "model" "contrasts"

[19] "xlevels"

$class

[1] "polr"

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 59 / 143

§6.4.3 Breathing testing example (5)

On the previous 3 slides we fit two cumulative logistic models to BTR

using Age and Smoking: the first (BTR.logit includes the

Age:Smokings interaction-effect terms as well as the main-effect

terms; while the second (BTR.logit1 includes main-effect terms only.

Dummy coding (i.e. contr.treatment coding in R) is used for

representing Age and Smoking:

Age40to59 =

{

1, if age is 40 ∼ 59

0 if age < 40

Smoking2Former =

{

1, former smoker

0 else

,

Smoking3Current =

{

1, current smoker

0 else

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 60 / 143

§6.4.3 Breathing testing example (6)

The first model can be written as

log

P(BTR = Normal)

P(BTR > Normal)

= θ1 + γ1Age40to59 + γ2Smoking2Former + γ3Smoking3Current

+γ4Age40to59 · Smoking2Former + γ5Age40to59 · Smoking3Current

log

P(BTR ≤ Bordering)

P(BTR > Bordering)

= θ2 + γ1Age40to59 + γ2Smoking2Former + γ3Smoking3Current

+γ4Age40to59 · Smoking2Former + γ5Age40to59 · Smoking3Current

MLEs of the parameters in the first model are

θˆ1 = 2.833, θˆ2 = 4.308, γˆ1 = 0.886, γˆ2 = −0.697, γˆ3 = −0.347, γˆ4 = −1.146, γˆ5 = −2.201

with their standard errors given in the R output.

Estimated values of the γ parameters can be interpreted as log

cumulative odds ratios. For example, the odds of having a normal

test result (or normal or bordering result) for a worker aged 40 to 59

who never smoked is estimated to be e−γˆ2−γˆ4 = e0.697+1.146 = 6.32

times of that for a worker aged 40 to 59 who is a former worker.

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 61 / 143

§6.4.3 Breathing testing example (7)

The second model can be written as

log

P(BTR = Normal)

P(BTR > Normal)

= θ1 + γ1Age40to59 + γ2Smoking2Former + γ3Smoking3Current

log

P(BTR ≤ Bordering)

P(BTR > Bordering)

= θ2 + γ1Age40to59 + γ2Smoking2Former + γ3Smoking3Current

MLEs of the parameters in the first model are

θˆ1 = 3.193, θˆ2 = 4.654, γˆ1 = −0.777, γˆ2 = −0.782, γˆ3 = −0.961

with their standard errors given in the R output.

“H0 : γ4 = γ5 = 0 vs. H1 : not H0” is the hypothesis about the

Age:Smoking interaction effects on BTR.

A χ2 analysis of deviance test comparing models BTR.logit and

textttBTR.logit1 gives an LR statistic of 22.776 and p-value of

4.168× 10−6. Therefore, there is very strong evidence to reject H0.

We conclude that Age and Smoking have very strong interaction

effects on BTR.

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 62 / 143

§6.4.3 Breathing testing example (8)

Linear predictor of the first model is

η1 = θ1 + γ1Age40to59 + γ2Smoking2Former + γ3Smoking3Current

+γ4Age40to59 · Smoking2Former + γ5Age40to59 · Smoking3Current

η2 = θ2 + γ1Age40to59 + γ2Smoking2Former + γ3Smoking3Current

+γ4Age40to59 · Smoking2Former + γ5Age40to59 · Smoking3Current

However, the linear predictor returned from polr is just xTγ, i.e.

γ1Age40to59 + γ2Smoking2Former + γ3Smoking3Current

+γ4Age40to59 · Smoking2Former + γ5Age40to59 · Smoking3Current

thus has just 6 different values knowing that there are 6

combinations of Age and Smoking.

Knowing these tricks, there should be no difficulty to reveal how the

fitted values returned by BTR.logit$fitted are obtained.

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 63 / 143

§6.4.3 Breathing testing example (9)

Effect coding (i.e. contr.sum coding in R, −1 for last category) may

also be used for representing Age and Smoking:

Age1 =

{

1, if age < 40

−1, if age is 40 ∼ 59

Smoking1 =

1, never smoked0, former smoker−1, current smoker , Smoking2 =

0, never smoked1, former smoker−1, current smoker

With this coding, the first model can be written as

log

P(BTR = Normal)

P(BTR > Normal)

= θ1 + γ1Age1 + γ2Smoking1 + γ3Smoking2

+γ4Age1 · Smoking1 + γ5Age1 · Smoking2

log

P(BTR ≤ Bordering)

P(BTR > Bordering)

= θ2 + γ1Age1 + γ2Smoking1 + γ3Smoking2

+γ4Age1 · Smoking1 + γ5Age1 · Smoking2

Although the parameter estimates will be different now, other results

from the model will be the same as that using dummy coding.

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 64 / 143

§6.4.3 Breathing testing example (10)

> options(contrasts = c("contr.sum", "contr.poly"))

> BTR.logit=polr(BTR~Age+Smoking+Age:Smoking, data=BTR.dat, weights=Freq, Hess=T,method="logistic")

> summary(BTR.logit)

Coefficients:

Value Std. Error t value

Age1 -0.11487 0.1086 -1.0580

Smoking1 -0.90596 0.1890 -4.7933

Smoking2 0.36428 0.1421 2.5644

Age1:Smoking1 0.55777 0.1890 2.9512

Age1:Smoking2 -0.01517 0.1421 -0.1068

Intercepts:

Value Std. Error t value

1Normal|2Border 2.3704 0.1086 21.8209

2Border|3Abnorm 3.8447 0.1599 24.0486

Residual Deviance: 1564.968

AIC: 1578.968

> BTR.logit1=polr(BTR~Age+Smoking, data=BTR.dat, weights=Freq, Hess=T,method="logistic")

> anova(BTR.logit, BTR.logit1)

Likelihood ratio tests of ordinal regression models

Response: BTR

Model Resid. df Resid. Dev Test Df LR stat. Pr(Chi)

1 Age + Smoking 2214 1589.744

2 Age + Smoking + Age:Smoking 2212 1564.968 1 vs 2 2 24.77585 4.168618e-06

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 65 / 143

§6.4.3 Breathing testing example (11)

> attributes(BTR.logit)

$names

[1] "coefficients" "zeta" "deviance" "fitted.values" "lev" "terms" "df.residual"

[8] "edf" "n" "nobs" "call" "method" "convergence" "niter"

[15] "lp" "Hessian" "model" "contrasts" "xlevels"

> BTR.logit$fitted

1Normal 2Border 3Abnorm

1 0.9444565 0.04225911 0.013284361

2 0.9444565 0.04225911 0.013284361

3 0.9444565 0.04225911 0.013284361

4 0.8943660 0.07930714 0.026326874

5 0.8943660 0.07930714 0.026326874

6 0.8943660 0.07930714 0.026326874

7 0.9231672 0.05813603 0.018696801

8 0.9231672 0.05813603 0.018696801

9 0.9231672 0.05813603 0.018696801

10 0.9763222 0.01815653 0.005521309

11 0.9763222 0.01815653 0.005521309

12 0.9763222 0.01815653 0.005521309

13 0.8671585 0.09895993 0.033881540

14 0.8671585 0.09895993 0.033881540

15 0.8671585 0.09895993 0.033881540

16 0.7633676 0.17037058 0.066261785

17 0.7633676 0.17037058 0.066261785

18 0.7633676 0.17037058 0.066261785

> BTR.logit$lp #linear predictor values

1 2 3 4 5 6 7 8 9 10

-0.4630592 -0.4630592 -0.4630592 0.2342498 0.2342498 0.2342498 -0.1157938 -0.1157938 -0.1157938 -1.3488684

11 12 13 14 15 16 17 18

-1.3488684 -1.3488684 0.4943191 0.4943191 0.4943191 1.1991525 1.1991525 1.1991525

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 66 / 143

§6.4.3 Breathing testing example (12)

We also fit a grouped Cox or proportional hazards model.

> BTR.ph=polr(BTR~Age+Smoking+Age:Smoking,data=BTR.dat, weights=Freq,

Hess=T, method="cloglog")

> summary(BTR.ph)

Coefficients:

Value Std. Error t value

Age40to59 -0.2865 0.14264 -2.008

Smoking2Former 0.2125 0.10054 2.114

Smoking3Current 0.1088 0.07301 1.490

Age40to59:Smoking2Former 0.4333 0.18941 2.288

Age40to59:Smoking3Current 0.8379 0.16357 5.122

Intercepts:

Value Std. Error t value

1Normal|2Border 1.0477 0.0558 18.7612

2Border|3Abnorm 1.5528 0.0629 24.6916

Residual Deviance: 1559.949

AIC: 1573.949

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 67 / 143

§6.4.3 Breathing testing example (13)

We further fit a cumulative model with the extreme maximal-value

distribution.

> BTR.extm=polr(BTR~Age+Smoking+Age:Smoking,data=BTR.dat,weights=Freq,

Hess=T,method="loglog")

> summary(BTR.extm)

Coefficients:

Value Std. Error t value

Age40to59 -0.8672 0.5286 -1.640

Smoking2Former 0.6755 0.2700 2.501

Smoking3Current 0.3368 0.2167 1.554

Age40to59:Smoking2Former 1.1003 0.6070 1.813

Age40to59:Smoking3Current 2.0724 0.5573 3.719

Intercepts:

Value Std. Error t value

1Normal|2Border 2.8614 0.1715 16.6838

2Border|3Abnorm 4.2761 0.2080 20.5605

Residual Deviance: 1566.335

AIC: 1580.335

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 68 / 143

§6.4.3 Breathing testing example (14)

We fit a grouped Cox or proportional hazards model using the effect

coding.

> options(contrasts = c("contr.sum", "contr.poly"))

> BTR.ph=polr(BTR~Age+Smoking+Age:Smoking,data=BTR.dat, weights=Freq,

Hess=T, method="cloglog")

> summary(BTR.ph)

Coefficients:

Value Std. Error t value

Age1 -0.06862 0.03421 -2.00575

Smoking1 -0.31898 0.05366 -5.94408

Smoking2 0.11022 0.04961 2.22168

Age1:Smoking1 0.21188 0.05361 3.95205

Age1:Smoking2 -0.00481 0.04960 -0.09699

Intercepts:

Value Std. Error t value

1Normal|2Border 0.8720 0.0353 24.7072

2Border|3Abnorm 1.3771 0.0441 31.2213

Residual Deviance: 1559.949

AIC: 1573.949

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 69 / 143

§6.4.3 Breathing testing example (15)

We finally fit a cumulative model with the extreme maximal-value

distribution using the effect coding.

> options(contrasts = c("contr.sum", "contr.poly"))

> BTR.extm=polr(BTR~Age+Smoking+Age:Smoking,data=BTR.dat,weights=Freq,

Hess=T,method="loglog")

> summary(BTR.extm)

Coefficients:

Value Std. Error t value

Age1 -0.09523 0.1053 -0.904

Smoking1 -0.86618 0.1854 -4.671

Smoking2 0.35943 0.1361 2.642

Age1:Smoking1 0.52877 0.1854 2.852

Age1:Smoking2 -0.02136 0.1361 -0.157

Intercepts:

Value Std. Error t value

1Normal|2Border 2.4288 0.1054 23.0536

2Border|3Abnorm 3.8434 0.1573 24.4373

Residual Deviance: 1566.335

AIC: 1580.335

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 70 / 143

§6.4.4 Motivating example for sequential models

In many applications the ordering of the response categories is due to

a sequential mechanism. The categories are ordered since they can be

reached only successively.

Example: Tonsil size. Children have been classified according to their

tonsil size and whether or not they are carriers of Streptococcus pyogenes.

Table 6: Tonsil size and Streptococcus pyogenes (Holmes & Williams, 1954)

Present but

not enlarged

Enlarged

Greatly

enlarged

Carriers 19 29 24

Noncarriers 497 560 269

It may be assumed that tonsil size always starts in the normal state “present but not

enlarged” (category 1). If the tonsils grow abnormally, they may become “enlarged”

(category 2); if the process does not stop, tonsils may become “greatly enlarged” (category

3). But in order to get greatly enlarged tonsils, they first have to be enlarged for the

duration of the intermediate state “enlarged.”

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 71 / 143

§6.4.4 Sequential models (1)

Let latent variables Ur , r = 1, · · · , q with q = k − 1 have the linear

form

Ur = −xTγ + εr , where εr has cdf F .

We assume the ordinal response variable Y is determined by Ur ’s in

the following sequential way:

Y = 1 ⇔ U1 ≤ θ1

Y = 2 given Y ≥ 2 ⇔ U2 ≤ θ2; and in general

Y = r given Y ≥ r ⇔ Ur ≤ θr ; or equivalently

Y > r given Y ≥ r ⇔ Ur > θr , r = 1, · · · , q (9)

The sequential mechanism (9) models the transition from category r

to category r + 1 given that category r is reached.

The main difference to the threshold approach used in the cumulative

model is the conditional modelling of the transitions. The sequential

mechanism assumes a binary decision in each step.

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 72 / 143

§6.4.4 Sequential models (2)

The sequential response mechanism (9) combined with the linear

form of the latent variable

Ur = −xTγ + εr , where εr has cdf F ,

leads to the sequential model with cdf F

P(Y = r |Y ≥ r , x) = P(Ur ≤ θr ) = F (θr + xTγ), r = 1, · · · , k,

(10)

where θk = +∞. It follows that

P(Y = r |x) = P(Y = r |Y ≥ r , x) · P(Y ≥ r |x)

= F (θr + x

Tγ) · P(Y > r − 1|Y ≥ r − 1, x) ·

P(Y > r − 2|Y ≥ r − 2, x) · · ·P(Y > 1|Y ≥ 1, x)

= F (θr + x

Tγ)[1− F (θr−1 + xTγ)] · · · [1− F (θ1 + xTγ)]

= F (θr + x

Tγ)

r−1∏

i=1

[1− F (θi + xTγ)], r = 1, · · · , k; Note ∏0i=1[·] = 0

Model (10) is also called the continuation ratio model.

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 73 / 143

§6.4.4 Sequential models (3)

Note that there is no need to impose any order restriction on

θ1, · · · , θq in (10).

Now specify a cdf F will give a specific sequential model.

1 Choosing F (x) = (1 + e−x)−1 gives the sequential logit model

P(Y = r |Y ≥ r , x) = exp(θr + x

Tγ)

1 + exp(θr + xTγ)

, r = 1, · · · , k

which is equivalent to log

{

P(Y = r |x)

P(Y > r |x)

}

= θr + x

Tγ.

2 Choosing F (x) = 1− exp(−ex), the sequential model has the form

P(Y = r |Y ≥ r , x) = 1− exp

(

−eθr+xTγ

)

, r = 1, · · · , k (11)

which is equivalent to

log

[

− log

{

P(Y > r |x)

P(Y ≥ r |x)

}]

= θr + x

Tγ, r = 1, · · · , k. (12)

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 74 / 143

§6.4.4 Sequential models (4)

It is worth to note that (12) is equivalent to the cumulative model

with F (x) = 1− exp(−ex). This means (12) is a special parametric

form of the grouped Cox model. This equivalence follows by the

reparameterisation:

θr = log{e θ˜r − e θ˜r−1}, r = 1, · · · , k − 1.

Note θ˜0 = −∞ and θ1 = θ˜1. Further θ˜r = log

(

r∑

i=1

eθi

)

.

Substituting θr = log{e θ˜r − e θ˜r−1} into (12), one obtains

− logP(Y > r |x) + logP(Y > r − 1|x) = e θ˜r+xTγ − e θ˜r−1+xTγ , r = 1, · · · , q

⇒ − logP(Y > r |x) = e θ˜r+xTγ , r = 1, · · · , q

⇒ P(Y ≤ r |x) = 1− exp

(

−e θ˜r+xTγ

)

which is a grouped Cox model.

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 75 / 143

§6.4.4 Sequential models (5)

3 Choosing exponential cdf F (x) = 1− e−x , the exponential

sequential model becomes

P(Y = r |Y ≥ r , x) = 1− e−(θr+xTγ)

⇔ − log

{

P(Y>r |x)

P(Y≥r |x)

}

= θr + xTγ, r = 1, · · · , q.

Generalised sequential models:

If assuming θr = δr0 + zTδr , where δr = (δr1, · · · , δrm)T is a

category-specific vector parameter, then one gets the Generalised

sequential model :

P(Y = r |Y ≥ r , x, z) = F (δr0 + zTδr + xTγ) (13)

Remark: The effect of z is non-homogeneous over the categories, while

the effect of x is homogeneous.

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 76 / 143

§6.4.4 Link functions and design matrices in sequential models

Response and link functions may be derived directly from model (10).

The link function g = (g1, · · · , gq)T is given by

gr (pi1, · · · , piq) = F−1

(

pir

1− pi1 − · · · − pir−1

)

, r = 1, · · · , q

and the response function h = (h1, · · · , hq)T has the form

hr (η1, · · · , ηq) = F (ηr )

r−1∏

i=1

(1− F (ηi )), r = 1, · · · , q.

For the sequential logit model, with r = 1, · · · , q

gr (pi1, · · · , piq) = log pir

1−pi1−· · ·−pir−1 and hr (η1, · · · , ηq) = e

ηr

r−1∏

i=1

(1 + eηi )−1.

In regard to the design matrices there is no difference between

sequential and cumulative models. Thus the design matrices from

§3.3.3 apply.

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 77 / 143

§6.4.4 Strict stochastic ordering (optional)

Strict stochastic ordering is a concept generalizing the proportional

odds for simple cumulative model (7)

∆c(x1, x2) = F

−1 (P(Y ≤ r |x1))− F−1 (P(Y ≤ r |x2)) .

If ∆c(x1, x2) = γT (x1 − x2), we say strict stochastic ordering holds.

Then for general cumulative model (8) where w = x is set and xTγ is

omitted, one can test strict stochastic ordering by testing

β1 = β2 = · · · = βq.

The concept can be generalized to the sequential model (10) as well.

Let

∆s(x1, x2) = F

−1 (P(Y = r |Y ≥ r , x1))−F−1 (P(Y = r |Y ≥ r , x2)) .

If ∆s(x1, x2) = γT (x1 − x2), we say strict stochastic ordering holds

for the sequential model.

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 78 / 143

§6.4.4 Two-step models (optional) (1)

It may be more appropriate to divide the ordered categories into sets

of categories with very homogeneous responses in the first step, then

model the categories within each set separately in the second step.

This is the two-step modelling approach.

Example: Rheumatoid arthritis. Mehta, Patel & Tsiatis (1984) analysed

data of patients with acute rheumatoid arthritis. A new agent was compared

with an active control, and each patient was evaluated on a 5-point

assessment scale.

Table 7: Clinical trial of a new agent and an active control

Global assessment

Drug

Much

improved

Improved

No

change

Worse

Much

worse

New agent 24 37 21 19 6

Active control 11 51 22 21 7

Global assessment may be subdivided into“improvement”, “no change” and

“worse” first. Then “improvement” is split up into “much improved” and

“improved” and the “worse” category is split into “worse” and “much worse.”

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 79 / 143

§6.4.4 Two-step models (optional) (2)

Specifically, let the categories 1, · · · , k be subdivided into t basic sets

S1, · · · ,St , where Sj = {mj−1 + 1, · · · ,mj}, m0 = 0 and mt = k.

In Step 1: Y ∈ Sj ⇔ θj−1 < U0 ≤ θj , j = 1, · · · , t.

Assume U0 = −xTγ0 + ε.

In Step 2: Y = R | Y ∈ Sj ⇔ θj ,r−1 < Uj ≤ θjr .

Assume Uj = −xTγ j + εj . Also ε, εj ’s are iid with cdf F .

Then the two-step model becomes

P(Y ∈ Tj |x) = F (θj + xTγ0), P(Y ≤ r |Y ∈ Sj , x) = F (θjr + xTγ j)

(14)

where Tj = S1 ∪ · · · ∪ Sj , θ1 < · · · < θt−1, θt =∞;

θj ,mj−1+1 < · · · < θj ,mj−1 , θj ,mj =∞, j = 1, · · · , t.

Since cumulative mechanisms are used in both steps, (14) is called a

two-step cumulative model. Two-step sequential model can be

similarly defined.

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 80 / 143

§6.4.4 Alternative approaches (optional)

1 Anderson (1984).

P(Y = r |x) = e

βr0−φrβT x

1 +

∑q

i=1 e

βi0−φiβT x

, r = 1, · · · , q;

where 1 = φ1 > · · · > φk = 0.

2 Williams and Grizzle (1972).

k∑

r=1

srP(Y = r |x) = xTβ

where s1, · · · , sk are some given scores for the categories.

3 Adjacent categories logit model (Agresti 1984).

log

P(Y = r |x)

P(Y = r − 1|x) = x

Tβr

⇔ P(Y = r |Y ∈ {r , r + 1}, x) = F (xTβr )

where F is the logistic cdf.

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 81 / 143

§6.5 Statistical inference for MGLM overview

1 MLE

2 Testing of linear hypotheses

3 Goodness of fit (model adequacy) statistics

4 Power-divergence statistics

Multivariate GLM (Ch6) §6.5 Statistical inference for MGLM (optional) Guoqi Qian 82 / 143

§6.5.1 Maximum likelihood estimation in MGLM (1)

An extension of the exponential family is the multivariate

exponential family, having the pdf

f (yi |θi , φ, ωi ) = exp

{

yTi θi − b(θi )

φ

· ωi + c(yi , φ, ωi )

}

For q × 1 vector observations y1, · · · , yn which are suppose to be

independent of each other, let µi = E (yi |xi ), i = 1, · · · , n, with xi

being the covariate vector.

The structural assumption of MGLM is that µi ’s depend on the

linear predictor ηi = Ziβ in the form

µi = h(ηi ) = h(Ziβ) = [h1(Ziβ), · · · , h1(Ziβ)]T , i = 1, · · · , n

where

the response function h : S → M is defined on S ⊂ Rq, taking values

in M ⊂ Rq;

Zi is a q × p design matrix for unit i ;

β = (β1, · · · , βp)T is an unknown parameter vector from B ⊂ Rp.

Multivariate GLM (Ch6) §6.5 Statistical inference for MGLM (optional) Guoqi Qian 83 / 143

§6.5.1 Maximum likelihood estimation in MGLM (2)

The MLE of β may be derived in analogy to the one-dimensional case.

The log-likelihood kernel for observation yi is

`i (µi ) =

yTi θi − b(θi )

φ

· ωi , where θi = θ(µi ) (15)

Using the relation µi = h(ηi ) = h(Ziβ), it can be found that the

score function is s(β) =

∂`

∂β

=

n∑

i=1

si (β), with

si (β) = Z

T

i Di (β)Σ

−1

i (β)[yi − µi (β)], (16)

where Di (β) =

∂hT (ηi )

∂ηi

is the derivative matrix of hT (ηi ) evaluated

at ηi = Ziβ, which is also called the Jacobian (matrix). And

Σi (β) = cov(yi ).

Multivariate GLM (Ch6) §6.5 Statistical inference for MGLM (optional) Guoqi Qian 84 / 143

§6.5.1 Maximum likelihood estimation in MGLM (3)

An alternative form for si (β) is

si (β) = Z

T

i Wi (β)

∂g(µi )

∂µTi

[yi − µi (β)], where

Wi (β) = Di (β)Σ

−1

i (β)Di (β) =

{

∂g(µi )

∂µTi

Σi (β)

∂gT (µi )

∂µi

}−1

(17)

which approximately equals (cov(g(yi ))

−1.

The expected Fisher information matrix is

F (β) = cov(s(β)) =

n∑

i=1

ZTi Wi (β)Zi .

Multivariate GLM (Ch6) §6.5 Statistical inference for MGLM (optional) Guoqi Qian 85 / 143

§6.5.1 Maximum likelihood estimation in MGLM (4)

In matrix form the above may be rewritten as

s(β) = ZTD(β)Σ−1(β)[y− µ(β)]

F (β) = ZTW (β)Z .

where y = (yT1 , · · · , yTn )T , µ(β) = (µ1(β)T , · · · ,µn(β)T )T ,

Σ(β) = diag (Σi (β)), W (β) = diag (Wi (β)), D(β) = diag (Di (β)),

all being block diagonal, and the total design matrix

Z =

[

ZT1 , · · · ,ZTn

]T

.

In the situation of grouped data, yi is the mean vector over ni

observations and Σi (β) is replaced by n

−1

i Σi (β).

Under regularity conditions,

MLE βˆ

a∼ N(β,F−1(βˆ)).

Multivariate GLM (Ch6) §6.5 Statistical inference for MGLM (optional) Guoqi Qian 86 / 143

§6.5.1 Numerical computing in MGLM

This is the same as in the univariate case, except using the

multivariate versions.

1 Working or pseudo observation vector

y˜(β) = (y˜1(β)

T , · · · , y˜n(β)T )T , where

y˜i (β) = Ziβ +

(

D−1i (β)

)T

[yi − µi (β)]

is an approximation for g(µi (β)).

2 The IRWLS estimate is

βˆ

(k+1)

=

(

ZTW (βˆ

(k)

)Z

)−1

ZTW (βˆ

(k)

)y˜(β(k))

which is equivalent to the Fisher scoring iteration

βˆ

(k+1)

= βˆ

(k)

+

(

ZTW (βˆ

(k)

)Z

)−1

s(β(k)).

Multivariate GLM (Ch6) §6.5 Statistical inference for MGLM (optional) Guoqi Qian 87 / 143

§6.5.2 Testing linear hypotheses in MGLM

The linear hypotheses

H0 : Cβ = ξ vs. H1 : Cβ 6= ξ

can be tested in the same way as in the univariate GLM, except replacing

the score and Fisher information by their multivariate versions.

Multivariate GLM (Ch6) §6.5 Statistical inference for MGLM (optional) Guoqi Qian 88 / 143

§6.5.2 Goodness-of-fit statistics in MGLM (1)

Goodness of fit (or model adequacy) of the MGLM can be assessed based

on the Pearson statistics and the deviance (the data need be grouped as

much as possible).

General Pearson Statistics:

χ2 =

g∑

i=1

(yi − µˆi )TΣ−1i (βˆ)(yi − µˆi )

In the case of multi-categorical response variable with multinomial

distribution niyi ∼ M(ni ,pii ), we have

χ2 =

g∑

i=1

χ2p(yi , pˆii ) =

g∑

i=1

ni

k∑

j=1

(yij − pˆiij)2

pˆiij

with yik = 1− yi1 − · · · − yiq and pˆiik = 1− pˆii1 − · · · − pˆiiq.

Multivariate GLM (Ch6) §6.5 Statistical inference for MGLM (optional) Guoqi Qian 89 / 143

§6.5.2 Goodness-of-fit statistics in MGLM (2)

Deviance or likelihood ratio (LR) statistics:

D = −2

g∑

i=1

{`i (pˆii )− `i (yi )}

.

For multinomial data,

D = 2

g∑

i=1

χ2D(yi , pˆii ) = 2

g∑

i=1

ni

k∑

j=1

yij log

yij

pˆiij

Under regularity conditions,

χ2

a∼ χ2(g(k − 1)− p) and D a∼ χ2(g(k − 1)− p)

where p is the number of estimated parameters.

For sparse data with small ni , one should use alternative asymtotics.

Multivariate GLM (Ch6) §6.5 Statistical inference for MGLM (optional) Guoqi Qian 90 / 143

§6.5.3 Power-divergence family (1)

For categorical data a more general single-parameter family of

goodness-of-fit statistics is the power-divergence family, introduced by

Cressie and Read (1984, JRSSB).

The power-divergence statistic with parameter λ ∈ R is given by

Sλ =

g∑

i=1

SDλ(yi , pˆii )

where the sum of deviations over observations at yi is

SDλ(yi , pˆii ) =

2ni

λ(λ+ 1)

k∑

j=1

yij

[(

yij

pˆiij

)λ

− 1

]

, −∞ < λ <∞ (18)

Multivariate GLM (Ch6) §6.5 Statistical inference for MGLM (optional) Guoqi Qian 91 / 143

§6.5.3 Power-divergence family (2)

1 At λ = 1, S1 =

g∑

i=1

ni

k∑

j=1

(yij − pˆiij)2

pˆiij

, the usual Pearson statistic.

2 At λ→ 0, S0 = 2

g∑

i=1

ni

k∑

j=1

yij log

yij

pˆiij

, the likelihood ratio statistic.

3 At λ→ −1, S−1 = 2

g∑

i=1

ni

k∑

j=1

pˆiij log

pˆiij

yij

, Kullback’s minimum

discrimination information statistic.

4 At λ = −2, S−2 =

g∑

i=1

ni

k∑

j=1

(yij − pˆiij)2

yij

, Neyman’s minimum

modified χ2-statistic.

5 At λ = −12 , S− 12 =4

g∑

i=1

k∑

j=1

(

√

niyij −

√

ni pˆiij)

2, Freeman-Tukey statistic.

In general, λ ∈ [−1, 2] is recommended.

Multivariate GLM (Ch6) §6.5 Statistical inference for MGLM (optional) Guoqi Qian 92 / 143

§6.5.3 Asymptotic properties under classical “fixed cells” assumptions

Classical assumptions in asymptotic theory for grouped data imply

1 a fixed number of groups (g is fixed)

2 increasing sample sizes ni →∞, i = 1, · · · , g , such that nin → λi for

fixed proportions λi > 0, i = 1, · · · , g .

3 a fixed “number of cells” k in each group

4 a fixed number of parameters.

If these assumptions hold and in addition, the model under consideration is

correct, then

Sλ

a∼ χ2(g(k − 1)− p), p is the # of estimated parameters.

And under local alternative hypothesis, the limit distribution of Sλ is

non-central χ2.

Multivariate GLM (Ch6) §6.5 Statistical inference for MGLM (optional) Guoqi Qian 93 / 143

§6.5.3 Asymptotic properties under sparseness and “increasing-cells”

assumptions

If several explanatory variables are considered, the number of

observations for a fixed explanatory variable x is often small and the

usual asymptotic machinery will fail.

Under such sparseness conditions, it may be assumed that with

increasing sample size n→∞ the number of groups (values of the

explanatory variables) is also increasing with g →∞, resulting in the

“increasing-cells” setting.

Now Sλ is no longer χ

2-distributed in the limit. Rather Sλ has an

asymptotic normal distribution under H0 that the model holds.

Details are not pursued here.

Multivariate GLM (Ch6) §6.5 Statistical inference for MGLM (optional) Guoqi Qian 94 / 143

§6.6 Multivariate models for correlated responses

So far we have been assuming there is only one response variable,

possibly multi-categorical, being under consideration.

Now we consider the situations where a vector of correlated or

clustered response variables is observed, together with covariates, for

each unit in the sample.

These situations often happen in longitudinal studies, repeated

measurements studies, and grouped (clustered) studies, etc.

When the response variables are approximately Gaussian, multivariate

linear models are commonly used for the underlying data analysis,

which has been well established.

The situations become more difficult when the response variables are

discrete, categorical or mixed discrete/categorical/continuous.

Three main approaches will be introduced here:

1 conditional models

2 marginal models

3 random effects models

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 95 / 143

§6.6.1 Conditional models: Asymmetric models (optional)1

Asymmetric models

In many applications, the components of a response vector are

ordered in a way that some components are prior to the other

components, e.g. if they refer to events that take place earlier.

Consider the simplest case of a response vector Y = (Y1,Y2), with

categorical components Y1 ∈ {1, · · · , k1} and Y2 ∈ {1, · · · , k2}. Let

Y2 refer to events that may be considered conditional on Y1.

An example: response years in school (Y1) and the statement about

happiness (Y2).

Then Y1 may be modelled to be dependent on the explanatory

variables x, using a model introduced in §3.2 and §3.3.

And Y2 may be modelled based on Y1 and x, also using a model

introduced in §3.2 and §3.3.

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 96 / 143

§6.6.1 Conditional models: Asymmetric models (optional)2

Asymmetric models

In general, consider m categorical responses Y1, · · · ,Ym, where Yj

depends on Y1, · · · ,Yj−1 but not on Yj+1, · · · ,Ym. Models that

make use of this dependence structure are based on the decomposition

P(Y1, · · · ,Ym|x) = P(Y1|x) · P(Y2|Y1, x) · · ·P(Ym|Y1, · · · ,Ym−1, x)

(19)

Simple models arise if each component in (19) is specified by a GLM:

P(Yj = r |Y1, · · · ,Yj−1, x) = hj(Zjβ) (20)

where Zj = Z (Y1, · · · ,Yj−1, x) is a function of previous components

Y1, · · · ,Yj−1 and the explanatory variables x. Models of form (20)

are sometimes called data-driven ones.

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 97 / 143

§6.6.1 Conditional models: Asymmetric models (optional)3

Markov-type transition models follow from the additional

assumption P(Yj = r |Y1, · · · ,Yj−1, x) = P(Yj = r |Yj−1, x).

One such example for binary responses is

log

P(y1 = 1|x)

P(y1 = 0|x) = β01 + z

T

1 β1

log

P(yj = 1|y1, · · · , yj−1, x)

P(yj = 0|y1, · · · , yj−1, x) = β0j + z

T

j βj + yj−1γj , j = 2, · · · , k.

It is called a regressive logistic model when

log

P(yj = 1|y1, · · · , yj−1, x)

P(yj = 0|y1, · · · , yj−1, x) = β0 + z

T

j β + γ1y1 + · · ·+ γj−1yj−1.

Models of the type (20) may be embedded into the GLM framework,

meaning some built-in functions in R (e.g. glm()) may be used to do

the computing.

In general, however, you need to write your own program to do the

analysis.

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 98 / 143

§6.6.1 Conditional models: Asymmetric models (optional)4

Example: Reported happiness. Clogg (1982) investigated the

association between gender (x), years in school (Y1), and reported

happiness (Y2). The data are given in the following. Since x and Y1 are

prior to the statement about happiness, Y2 is modelled conditionally on Y1

and x .

Table 8: Cross classification of gender, reported happiness, and years of schooling

Years of school completed

Gender Reported happiness < 12 12 13-16 ≥ 17

Male Not too happy 40 21 14 3

Pretty happy 131 116 112 27

Very happy 82 61 55 27

Female Not too happy 62 26 12 3

Pretty happy 155 156 95 15

Very happy 87 127 76 15

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 99 / 143

§6.6.1 Conditional models: Symmetric models (optional)1

Symmetric models

Suppose the response vector Y = (y1, · · · , ym).

If there is no natural ordering among y1, · · · , ym, or if one does not

want to use this ordering, models that treat response components in a

symmetric way are more sensible. An example is Visual Impairment

Study seen in Chapter 1.

Now focus on the case where all y1, · · · , ym are binary. (General

cases can be handled similarly.)

Symmetric conditional models can be developed by specifying a

conditional distribution to be used:

P(yj = 1|yk , k 6= j ; xj), j = 1, · · · ,m (21)

For binary responses such conditional distributions uniquely determine

the joint distribution.

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 100 / 143

§6.6.1 Example: Visual impairment study

Table 9: Visual impairment data, from Liang, Zeger & Qaqish (1992)

White Black

Visual Age

impairment 40-50 51-60 61-70 70+ 40-50 51-60 61-70 70+ Total

Left eye

Yes 15 24 42 139 29 38 50 85 422

No 617 557 789 673 750 574 473 344 4777

Right eye

Yes 19 25 48 146 31 37 49 93 448

No 613 556 783 666 748 575 474 336 4751

Vector binary response variable (y1, y2), where y1 = 1 if left-eye

impaired, 0 otherwise; y2 = 1 if right-eye impaired, 0 otherwise.

Covariates: Age (yrs.), Race (W or B).

Aim: find the effect of race and age on visual impairment.

Complication: y1 and y2 are correlated.

Methods: multivariate models for correlated responses; conditional

models; asymmetric models, marginal models, GEE, etc..

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 101 / 143

§6.6.1 Conditional models: Symmetric models (optional)2

Symmetric models

Symmetric logistic models are a natural choice for (21):

pi = P(yj = 1|yk , k 6= j ; xj) = h(α(wj ;θ) + xTj βj), j = 1, · · · ,m

(22)

where h(t) =

et

1 + et

is the logistic cdf; and α(·) is some function of a

parameter θ and wj =

∑

k 6=j yk .

When k = 2,

pi = P(yj = 1|yk , k 6= j ; xj) = h(θ0+θ1yk +xTj βj), j , k = 1, 2. (23)

Full likelihood estimation procedure for (22) or (23) may be

computationally cumbersome, because of the normalising constant

involved in the joint pmf.

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 102 / 143

§6.6.1 Conditional models: Symmetric models (optional)3

Instead, a quasi-likelihood approach (Conolly and Liang, 1988) may

be used, with an independent working quasi-likelihood and

quasi-score function for each cluster i(= 1, · · · , n):

Li (β,θ) =

m∏

j=1

pi

yij

ij (1− piij)1−yij

Si (β,θ) =

m∑

j=1

∂piij

∂(βT ,θT )T

σ−2ij (yij − piij(β,θ))

where yi = (yi1, · · · , yij , · · · , yim)T are the responses in cluster i ,

piij(β,θ) = P(yij = 1|·) is defined by (22), and

σ2ij = piij(β,θ)(1− piij(β,θ)).

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 103 / 143

§6.6.1 Conditional models: Symmetric models (optional)4

Denoting Mi = diag

{

∂pii1

∂(βT ,θT )T

, · · · , ∂piim

∂(βT ,θT )T

}

,

Σi = diag{σ2i1, · · · , σ2im} and pi = (pii1, · · · , piim)T , we can rewrite

Si (β,θ) in matrix form

Si (β,θ) = MiΣ

−1

i (yi − pii )

which is a multivariate extension of the quasi-score.

Roots (βˆ, θˆ) of the resulting generalised estimating equation (GEE)

S(β,θ) =

n∑

i=1

Si (β,θ) = 0

are consistent & asymptotically normal under regularity assumptions:

(βˆ

T

, θˆ

T

)T

a∼ N((βT ,θT )T , Fˆ−1Vˆ Fˆ−1)

with Fˆ =

∑n

i=1 Mˆi Σˆ

−1

i Mˆi and Vˆ =

∑n

i=1 Sˆi Sˆ

T

i .

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 104 / 143

§6.6.1 Two drawbacks of the conditional models

1 They measure the effect of x on a binary component yj conditional on

incorporating into the effects of other yk , k 6= j . Thus the model is

not able to provide prediction based on x alone.

2 Interpretation of the effects depends on the dimension of y.

Both drawbacks can be avoided by using a marginal modelling approach.

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 105 / 143

§6.6.2 Marginal models

In many situations the primary interest is to analyse the marginal

mean of the response given the covariates. The association between

the responses is often of secondary interest. An example is the visual

impairment study.

Marginal models were first proposed by Liang & Zeger (1986) and

Zeger & Liang (1986) in the context of longitudinal data with many

short time series. Their modelling and estimation approach is based

on GEE and has subsequently been modified and further developed.

Focus in this subsection is on modelling and estimating marginal

means of correlated and categorical response data by the first-order

generalised estimating equations (GEE1).

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 106 / 143

§6.6.2 Marginal models for correlated univariate responses (1)

In marginal models, the effects of covariates on responses and the

association between responses is modelled separately.

Let yi = (yi1, · · · , yimi )T be the vector of responses, and

xTi = (x

T

i1, · · · , xTimi )

be the vector of covariates from cluster i , i = 1, · · · , n. Here the

cluster size mi may vary with the cluster index i .

Response observations within the same cluster are correlated.

Response observations between different clusters are assumed

independent of each other.

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 107 / 143

§6.6.2 Marginal models for correlated univariate responses (2)

Specification of a marginal model is as following

1 The marginal means of yij , j = 1, · · · ,mi , are assumed correctly

specified by common univariate response models:

µij(β) = E (yij |xij) = h(zTij β) (24)

where h(·) is a response function, e.g. a logit function, and zij is an

appropriate design vector.

2 The marginal variance of each yij is specified as a function of µij :

σ2ij = var(yij |xij) = v(µij)φ (25)

where v(·) is the variance function of known form.

3 The correlation between yij and yik is a function of µij = µij(β),

µik = µik(β) and perhaps of additional association parameters α:

corr(yij , yik) = c(µij , µik ,α) (26)

with the function c(·, ·, ·) being of known form.

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 108 / 143

§6.6.2 Marginal models for correlated univariate responses (3)

Remarks:

1 Parameters (β,α) are the same for each cluster. Hence marginal

models are appropriate for analysing population-averaged effects.

2 Marginal effects β can be consistently estimated even if the correlation

function is incorrectly specified. However, the efficiency of the

estimator βˆ can be compromised.

Since the primary scientific objective is often the regression

relationship, it is important to correctly specify the marginal mean

structure (24), while c(µij , µik ,α) only needs to be a working

correlation for the association between yij and yik . Together with

(25) a working covariance matrix cov(yi ) = Σi (β,α) can be

obtained, where an additional dispersion parameter φ is involved but

suppressed for presentation simplicity.

Two main approaches for specifying the working correlations or

covariances have been considered in the literature.

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 109 / 143

§6.6.2 Marginal models for correlated univariate responses (4)

First approach: The variance structure (25) is supplemented by a

working correlation matrix Ri (α), so that the working covariance

matrix is of the form

Σi (β,α) = C

1/2

i (β)Ri (α)C

1/2

i (β),

where Ci (β) = diag [var(yij |xij)] = diag{σ2i1, · · · , σ2imi}. Common choices

for Ri (α):

1 working independence model: Ri (α) = I , the identity matrix.

2 equicorrelation (or exchangeable) model: corr(yij , yik) = α for all

j 6= k, i.e. α reduces to be a scalar, and Ri (α) =

1 α · · · α

α 1 · · · α

...

...

. . .

...

α α · · · 1

.

3 Ri (α) completely unspecified, except being positive definite, i.e.

αjk = corr(yij , yik), j < k.

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 110 / 143

§6.6.2 Marginal models for correlated univariate responses (5)

Second approach: For binary responses the odds ratio is an alternative

measure of association that is easier to interpret and has some desirable

properties.

The odds ratio for yij , yik , j , k = 1, · · · ,m, j 6= k, is defined as

γijk =

P(yij = 1, yik = 1) · P(yij = 0, yik = 0)

P(yij = 1, yik = 0) · P(yij = 0, yik = 1)

From this definition, the probability P(yij = yik = 1), denoted as pii11,

can be shown to satisfy

pii11 = E (yijyik) =

{

1−(piij+piik )(1−γijk )−s(piij ,piik ,γijk)

2(γijk−1) if γijk 6= 1

piijpiik if γijk = 1

(27)

with

s(piij , piik , γijk) =

(

[1−(piij +piik)(1−γijk)]2 − 4(γijk−1)γijkpiijpiik

)1/2

.

It can be seen that the covariance matrix of yi = (yi1 · · · , yimi )T can

be expressed as a function of (piij , piik , γijk).

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 111 / 143

§6.6.2 Marginal models for correlated univariate responses (6)

One advantage of the odds ratio is that it is unconstrained, being able

to take any positive values. In contrast,

corr(yij , yik) =

pii11 − piijpiik√

piij(1− piij)piik(1− piik)

,

the correlation between yij and yik is constrained (i.e. its possible

values may all fall into a subset of [−1, 1]), because the involved pii11

is constrained by

max(0, piij + piik − 1) ≤ pii11 ≤ min(piij , piik).

Note

pii11 = P(yij = 1) + P(yik = 1)−P(yij = 1 or yik = 1) ≥ piij +piik − 1.

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 112 / 143

§6.6.2 Marginal models for correlated univariate responses (7)

To reduce the number of unknown parameters involved in the odds

ratio, we assume γijk = γijk(α), a function of a vector parameter α.

Common choices of γijk(α):

1 γijk = γ, for all i , j , k.

2 log γijk = α

Tωijk .

Using the relation (27), cov(yi ) = Σi (β,α) is also a function of β

and α.

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 113 / 143

§6.6.2 Marginal models for correlated univariate responses (8)

Some examples of marginal models:

I Continuous responses:

µij(β) = E (yij |xij) = zTij β; var(yij |xij) = φ = σ2; corr(yij , yik) = αjk .

II Binary responses:

µij(β) = piij(β) = P(yij = 1|xij), log piij(β)

1− piij(β) = z

T

ij β;

var(yij |xij) = piij(β)(1− piij(β));

corr(yij , yik) = 0 (independence struc.) or γijk = α (equal odds ratio).

III Count data:

logµij(β) = log E (yij |xij) = zTij β;

var(yij |xij) = µij(β)φ;

corr(yij , yik) = α (equicorrelation).

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 114 / 143

§6.6.2 GEE approach in statistical inference (1)

For cluster i = 1, · · · , n, let yi = (yi1, · · · , yimi )T , xi = (xTi1, · · · , xTimi )T ,

µi (β) = (µi1(β1), · · · , µimi (βmi ))T and Σi (β,α) be defined as before.

In linear models for Gaussian responses, there is no problem to

perform maximum likelihood estimation, since specification of means

and covariances determines the likelihood function.

This is not the case with non-Gaussian data. Therefore a generalised

estimating approach is proposed.

Keeping the association parameter α and φ, if present, fixed for the

moment, the generalised estimating equation (GEE) for effect β is

Sβ(β,α) =

n∑

i=1

ZTi Di (β)Σ

−1

i (β,α)(yi − µi (β)) = 0, (28)

with mi -row design matrix Zi = (zi1, · · · , zimi )T and diagonal

matrices Di (β) = diag{Dij(β)}, Dij(β) = ∂h∂ηij evaluated at ηij = zTij β.

The middle-part of (28) is a multivariate quasi-score function.

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 115 / 143

§6.6.2 GEE approach in statistical inference (2)

A GEE estimate βˆ is computed by iterating a modified Fisher scoring

algorithm w.r.t. β and estimation of α (and φ):

(i) Given current estimates αˆ (and φˆ), the GEE (28) for βˆ is solved by

the iterations

βˆ

(k+1)

= βˆ

(k)

+ (Fˆ (k))−1Sˆβ(βˆ

(k)

, αˆ), k = 0, 1, 2, · · · ,

with

Fˆ (k) =

n∑

i=1

ZTi Di (βˆ

(k)

)Σ−1i (βˆ

(k)

, αˆ)Di (βˆ

(k)

)Zi

being the observed quasi-information matrix.

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 116 / 143

§6.6.2 GEE approach in statistical inference (3)

(ii) If α is unknown, it can be consistently estimated by a method of

moments based on Pearson residuals rˆij =

yij − µˆij√

v(µˆij)

.

The dispersion parameter φ is estimated consistently by

φˆ =

1

N − p

n∑

i=1

mi∑

j=1

rˆ 2ij , with N =

∑n

i=1 mi and p = dim(β).

Estimation of α depends on the choice of Ri (α). For exchangeable

correlation matrix Ri (α) with dim(α) = 1,

αˆ =

[

φˆ

{

n∑

i=1

1

2

mi (mi − 1)− p

}]−1 n∑

i=1

∑

k>j

rˆik rˆij .

An unspecified working correlation matrix R can be estimated by

Rˆ =

1

nφˆ

n∑

i=1

C

− 12

i (yi − µˆi )(yi − µˆi )TC

− 12

i

if all cluster sizes mi = m and m << n.

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 117 / 143

§6.6.2 GEE approach in statistical inference (4)

Cycling between Fisher scoring steps for β and consistent estimation

of α and φ leads to a consistent estimation of β.

Alternatively, α (and possible φ) can be estimated by simultaneously

solving an additional estimating equation. Details are not pursued

here.

Under regularity conditions, the GEE estimator βˆ satisfies

βˆ

a∼ N(β,F−1VF−1)

with F =

∑n

i=1 Z

T

i DiΣ

−1

i DiZi and V =

∑n

i=1 Z

T

i DiΣ

−1

i SiΣ

−1

i DiZi ,

where Si is the true covariance matrix of yi .

cov(βˆ) is approximated by the “sandwich matrix”:

Aˆ = Fˆ−1

{

n∑

i=1

ZTi Dˆi Σˆ

−1

i (yi − µˆi )(yi − µˆi )T Σˆ−1i DˆiZi

}

Fˆ−1

cov(Sβ(β,α)) = V and Sβ(β,α)

a∼ N(0,V ).

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 118 / 143

§6.6.2 Visual impairment example (1)

We use GEE approach to fit a marginal model to the visual impairment

data. R packages gee with command gee() and geepack with command

geeglm() can be used. (There are a few problems with both packages).

library(gee); library(geepack)

VI.dat <- read.csv("D:/MAST90139/Visual-impairment.csv"); VI.dat

ID Yes No Age Race

1 1 15 617 40to50 White

2 1 19 613 40to50 White

3 2 24 557 51to60 White

4 2 25 556 51to60 White

5 3 42 789 61to70 White

6 3 48 783 61to70 White

7 4 139 673 70+ White

8 4 146 666 70+ White

9 5 29 750 40to50 Black

10 5 31 748 40to50 Black

11 6 38 574 51to60 Black

12 6 37 575 51to60 Black

13 7 50 473 61to70 Black

14 7 49 474 61to70 Black

15 8 85 344 70+ Black

16 8 93 336 70+ Black

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 119 / 143

§6.6.2 Visual impairment example (2)

VI1e<- gee(cbind(Yes, No) ~ Age + Race, id = ID,

data = VI.dat, family = binomial, corstr = "exchangeable")

summary(VI1e)

GEE: GENERALIZED LINEAR MODELS FOR DEPENDENT DATA

gee S-function, version 4.13 modified 98/01/27 (1998)

Model:

Link: Logit

Variance to Mean Relation: Binomial

Correlation Structure: Exchangeable

Call:

gee(formula = cbind(Yes, No) ~ Age + Race, id = ID, data = VI.dat,

family = binomial, corstr = "exchangeable")

Summary of Residuals:

Min 1Q Median 3Q Max

15.0 28.0 39.9 58.6 145.8

Coefficients:

Estimate Naive S.E. Naive z Robust S.E. Robust z

(Intercept) -3.225 0.1125 -28.66 0.0254 -127.22

Age51to60 0.478 0.1451 3.30 0.0167 28.65

Age61to70 0.837 0.1348 6.21 0.0905 9.26

Age70+ 1.973 0.1227 16.08 0.0531 37.13

RaceWhite -0.350 0.0768 -4.56 0.0662 -5.28

Estimated Scale Parameter: 0.571

Number of Iterations: 1

Working Correlation

[,1] [,2]

[1,] 1.000 0.886

[2,] 0.886 1.000

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 120 / 143

§6.6.2 Visual impairment example (3)

Call:

gee(formula = cbind(Yes, No) ~ Age + Race, id = ID, data = VI.dat,

family = binomial, corstr = "exchangeable")

Summary of Residuals: ###Residuals calculation is not correct in gee().

Min 1Q Median 3Q Max

15.0 28.0 39.9 58.6 145.8

Coefficients:

Estimate Naive S.E. Naive z Robust S.E. Robust z

(Intercept) -3.225 0.1125 -28.66 0.0254 -127.22

Age51to60 0.478 0.1451 3.30 0.0167 28.65

Age61to70 0.837 0.1348 6.21 0.0905 9.26

Age70+ 1.973 0.1227 16.08 0.0531 37.13

RaceWhite -0.350 0.0768 -4.56 0.0662 -5.28

Estimated Scale Parameter: 0.571

Number of Iterations: 1

Working Correlation

[,1] [,2]

[1,] 1.000 0.886

[2,] 0.886 1.000

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 121 / 143

§6.6.2 Visual impairment example (4)

total=VI.dat$Yes+VI.dat$No

pearson.res1e=

(VI.dat$Yes-total*VI1e$fitted)/sqrt(total*(VI1e$fitted)*(1-VI1e$fitted))

sum(pearson.res1e^2)/11

[1] 0.5708722 #estimate of \phi based on Pearson residuals

> VI1e$scale

[1] 0.5708722

(VI1e$scale*3)^(-1)*sum(pearson.res1e[2*(1:8)-1]*pearson.res1e[2*(1:8)])

[1] 0.8861748

> VI1e$working.correlation

[,1] [,2]

[1,] 1.0000000 0.8861748

[2,] 0.8861748 1.0000000

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 122 / 143

§6.6.2 Visual impairment example (5)

VI3 <- geeglm(cbind(Yes, No)~Age + Race, family=binomial(link="logit"),

data=VI.dat, id=ID, corstr = "exchangeable", std.err="san.se")

summary(VI3)

Call:

geeglm(formula = cbind(Yes, No) ~ Age + Race, family = binomial(link = "logit"),

data = VI.dat, id = ID, corstr = "exchangeable", std.err = "san.se")

Coefficients:

Estimate Std.err Wald Pr(>|W|)

(Intercept) -3.2253 0.0254 16184.7 < 2e-16 ***

Age51to60 0.4783 0.0167 820.7 < 2e-16 ***

Age61to70 0.8374 0.0905 85.7 < 2e-16 ***

Age70+ 1.9728 0.0531 1378.3 < 2e-16 ***

RaceWhite -0.3498 0.0662 27.9 1.3e-07 ***

---

Estimated Scale Parameters: #IScale seems defined differently in gee() and geeglm()

Estimate Std.err

(Intercept) 0.000604 0.000323

Correlation: Structure = exchangeable Link = identity

Estimated Correlation Parameters:

Estimate Std.err #The estimation here is about half of that in gee().

alpha 0.441 0.274

Number of clusters: 8 Maximum cluster size: 2

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 123 / 143

§6.6.2 Visual impairment example (6)

pearson.res3=(VI.dat$Yes/total-VI3$fitted)/sqrt((VI3$fitted)*(1-VI3$fitted)/total)

phi.est=sum(pearson.res3^2)/11 #=0.5709

sum((summary(VI3)$deviance.resid)^2)/11

[1] 0.577 #very close to 0.5709.

#Using deviance residuals or Pearson residuals give similar results.

VI3$geese$gamma #How is gamma defined?

(Intercept)

0.000604

(phi.est*3)^(-1)*sum(pearson.res3[2*(1:8)-1]*pearson.res3[2*(1:8)])

[1] 0.8862 #estimated alpha

(0.577*3)^(-1)*sum(summary(VI3)$devi[2*(1:8)-1]*summary(VI3)$devi[2*(1:8)])

[1] 0.872

VI3$geese$alpha

alpha

0.4415 #approximately half of 0.8862. Then How is alpha defined here?

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 124 / 143

§6.6.2 Marginal models for correlated responses having k categories (1)

Suppose categorical responses Yij , j = 1, · · · ,mi , are observed in

cluster i , i = 1, · · · , n.

For simplicity, each Yij has the same k categories and is coded by a

dummy vector

yij = (yij1, · · · , yijq)T , q = k − 1

Let yTi = (y

T

i1, · · · , yTimi ) be observations of yij in cluster i ;

xTi = (x

T

i1, · · · , xTimi ) be the corresponding covariate observations.

For data involving categorical responses, a marginal categorical

response model can be defined for each response variable, and then

supplemented by a working association model to relate to the

responses with each other with in each cluster.

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 125 / 143

§6.6.2 Marginal models for correlated responses having k categories (2)

Here we focus on ordinal responses. Other types of categorical responses

can be handled similarly.

(i) The vector of marginal means or categorical probabilities of Yij is

assumed being correctly specified by an ordinal response model :

piij(β) = (piij1(β), · · · , piijq(β))T = h(Zijβ)

with piijr = P(Yij ≤ r |xij) =

∑r

`=1 P(yij` = 1|xij), and the response

function h(·) and design matrix Zij .

(ii) The marginal covariance function of yij is given by

Σij = cov(yij |xij) = diag(piij)− piijpiTij

i.e. the covariance matrix of a multinomial random variable.

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 126 / 143

§6.6.2 Marginal models for correlated responses having k categories (3)

(iii) Association between Yij and Yik is modelled by a working correlation matrix

Ri or by global cross-ratios. For example, the working matrix of

exchangeable correlations is

Ri (α) =

I Q · · · Q

QT I · · · Q

...

...

. . .

...

QT QT · · · I

where Qq×q contains α to be estimated by a method of moments.

Global cross-ratios (GCR), for each pair of categories ` and m of Yij and

Yik , are defined as

γijk (`,m) =

P(Yij ≤ `,Yik ≤ m)P(Yij > `,Yik > m)

P(Yij > `,Yik ≤ m)P(Yij ≤ `,Yik > m)

GCR can be modelled log-linearly, i.e. log(γijk(`,m)) = α`m or by a

regression model including covariate effects.

For the model specified by (i), (ii) and (iii), the involved regression and association

parameters can be estimated by a multivariate extension of the GEE approach

discussed before. Details not pursued here.

R packages multgee, geepack and repolr may be used to fit the above model.

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 127 / 143

§6.6.2 Likelihood-based inference for marginal models

The GEE approach is not likelihood-based as it does not require

specification of the joint distribution of multivariate response vector

yi , i = 1, · · · , n.

Difficulty with the likelihood-based inference is due to the difficulty in

formulating this joint distribution.

So in general it is difficult to perform likelihood-based inference for

marginal models.

But there are new developments:

Constructing joint multivariate distribution by copulas

Vector GLM

Bayesian inference.

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 128 / 143

§6.6.3 Marginal models for longitudinal data (1)

In many longitudinal studies, data consist of a small or moderate

number of repeated observations for many subjects; and the main

objective is to analyse the effects of covariates on a response variable,

without conditioning on previous responses.

In this setting it is more natural to view the data as a cross section of

individual time series yi = (y

T

i1, · · · , yTiTi )T , xi = (xTi1, · · · , xTiTi )T ,

i = 1, · · · , n. It is also often assume that, given the covariates,

individual time series are mutually independent.

Marginal models introduced in §3.5.2 can be used for these cases.

Suppose yit = (yit1, · · · , yitq)T are q = k − 1 dummy variables

describing a multi-categorical response Yit with k categories.

For simplicity, assume the number of categories is the same for all Yit

responses.

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 129 / 143

§6.6.3 Marginal models for longitudinal data (2)

Marginal GEE models for longitudinal data are then defined as

(i) The marginal mean is correctly specified by a response model

µit(β) = E (yit |xit) = h(ηit), ηit = Zitβ

where ηit is q-dimensional with components ηitr = z

T

itrβ, z

T

itr is the

rth row of Zit .

(ii) The association structure for yi = (y

T

i1, · · · , yTiTi )T is specified by a

“working” covariance matrix Σi (β,α), depending on regression

parameters β, association parameters α, and possibly an additional

nuisance parameter φ.

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 130 / 143

§6.6.3 Marginal models for longitudinal data (3)

Σi (β,α) can be determined by a variance function for yit together

with a “working” correlation matrix Ri (α) or some odds ratio model

γ(α).

Common choices for Ri (α):

1 stationary correlations between yis and yit

(Ri (α))st = α(|t − s|) (29)

2 its special form α(|t − s|) = α|t−s|. AR(1) correlation.

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 131 / 143

Statistical inference in marginal GEE models (1)

Estimation of β using GEEs is carried out in complete analogy to

§3.5.2.

Denote E (yi |xi ) = µi (β) = (µTi1, · · · ,µTiTi )T , ZTi = (Zi1, · · · ,ZiTi ),

and (block-) diagnol matrices Di (β) = diag{Dit(β)}, with

Dit(β) =

∂hT

∂ηit

|ηit= Zitβ.

The GEE for β is

sβ(β,α) =

n∑

i=1

ZTi Di (β)Σ

−1

i (β,α)(yi − µi (β)) = 0 (30)

Given current estimates αˆ and φˆ, the GEE (30) for βˆ is solved by

βˆ

(k+1)

= βˆ

(k)

+

(

Fˆ (k)

)−1

sˆ(k), k = 1, 2, · · ·

with the “working” Fisher information matrix

Fˆ (k) =

∑n

i=1 Z

T

i Di (βˆ

(k)

)Σ−1i (βˆ

(k)

, αˆ)DTi (βˆ

(k)

)Zi and

sˆ(k) = sβ(βˆ

(k)

, αˆ).

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 132 / 143

Statistical inference in marginal GEE models (2)

Given a current estimate of β, α and φ can be estimated from

Pearson residuals

rˆit =

yit − µˆit

(v(µˆit))1/2

by the method of moments.

1 φˆ =

1

N − p

n∑

i=1

Ti∑

t=1

rˆ 2it , N =

n∑

i=1

Ti

2 Estimation of α depends on the choice of Ri (α). For the exchangeable

correlation model

αˆ =

{

φˆ

[

n∑

i=1

1

2

Ti (Ti − 1)− p

]}−1 n∑

i=1

∑

t>s

rˆit rˆis

3 A totally unspecified R = R(α) can be estimated if T << n.

4 In particular for binary and categorical observations, estimation of α

via GEE2 is often preferrable.

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 133 / 143

Statistical inference in marginal GEE models (3)

Consistency and asymptotic normality results for βˆ can be obtained

along the lines of asymptotic theory of quasi-MLEs for independent

observations if Ti , i = 1, · · · , n are fixed and n→∞.

If αˆ is

√

n-consistent given β and φ; and φˆ is

√

n-consistent given β,

then βˆ is

√

n-consistent and asymptotically normal, i.e.

βˆ

a∼ N(β,F−1VF−1)

with F =

n∑

i=1

ZTi DiΣ

−1

i D

T

i Zi , V =

n∑

i=1

ZTi DiΣ

−1

i Cov(yi )Σ

−1

i D

T

i Zi .

Asymptotic covariance matrix A = F−1VF−1 can be consistently

estimated by replacing β, α and φ by their respective consistent

estimates and Cov(yi ) by (yi − µˆi )(yi − µˆi )T , i.e. by sandwich matrix

Aˆ = Ĉov(βˆ)

a≈ Fˆ−1

{

n∑

i=1

ZTi Dˆi Σˆ

−1

i (yi−µˆi )(yi−µˆi )T Σˆ−1i DˆTi Zi

}

Fˆ−1 (31)

Based on the preceeding asymptotic result, confidence intervals for β

and Wald and score tests may be constucted; cf. §2.3.1.

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 134 / 143

Example 6.4 Ohio children (1)

The ohio data concern 537 children from Steubenville, Ohio and

were taken as part of a study on the effects of air pollution.

Children were in the study for four years from age seven to ten.

The response is whether they wheezed or not.

The variables are

resp: an indicator of wheeze status (1=yes, 0=no)

id: an identifier for the child, taking values from 0 to 536

age: 7 yrs = −2; 8 yrs = −1; 9 yrs =0; 10 yrs =1

smoke: mother’s smoking status at start of the study

(1=smoker, 0=nonsmoker)

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 135 / 143

Example 6.4 Ohio children (2)

Some analysis has been done to the data in R, producing the following

output.

> head(ohio)

resp id age smoke

1 0 0 -2 0

2 0 0 -1 0

3 0 0 0 0

4 0 0 1 0

5 0 1 -2 0

6 0 1 -1 0

> tail(ohio)

resp id age smoke

2143 1 535 0 1

2144 1 535 1 1

2145 1 536 -2 1

2146 1 536 -1 1

2147 1 536 0 1

2148 1 536 1 1

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 136 / 143

Example 6.4 Ohio children (3)

> str(ohio)

'data.frame': 2148 obs. of 4 variables:

$ resp : int 0 0 0 0 0 0 0 0 0 0 ...

$ id : int 0 0 0 0 1 1 1 1 2 2 ...

$ age : int -2 -1 0 1 -2 -1 0 1 -2 -1 ...

$ smoke: int 0 0 0 0 0 0 0 0 0 0 ...

> library(geepack}

> fit.exch <- geeglm(resp~age+smoke, family=binomial(link="logit"),

data=ohio, id=id, corstr = "exchangeable", std.err="san.se")

> summary(fit.exch)

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 137 / 143

Example 6.4 Ohio children (4)

> summary(fit.exch)

Call:

geeglm(formula = resp ~ age + smoke, family = binomial(link = "logit"),

data = ohio, id = id, corstr = "exchangeable", std.err = "san.se")

Coefficients:

Estimate Std.err Wald Pr(>|W|)

(Intercept) -1.880 0.114 272.597 < 2e-16 ***

age -0.113 0.044 6.684 0.00973 **

smoke 0.265 0.178 2.224 0.13588

---

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Estimated Scale Parameters:

Estimate Std.err

(Intercept) 0.9985 0.1116

Correlation: Structure = exchangeable Link = identity

Estimated Correlation Parameters:

Estimate Std.err

alpha 0.3543 0.06244

Number of clusters: 537 Maximum cluster size: 4

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 138 / 143

Example 6.4 Ohio children (5)

Use the above output to answer the following questions.

1 Let yit be the response value resp of child i at age t. Write down the

model involved in the analysis, including the mean, variance and

correlation coefficient of yit ’s. Give the estimates of the parameters

appearing in the model.

2 Write down the model’s design matrix for data where id=536.

3 Estimate the odds ratio of wheezing for a child for every one year

older in age. Also calculate an approximate standard error of your

odds ratio estimate.

4 Estimate the odds ratio of wheezing for a 10-year old child with a

smoking mother versus a 9-year old child with nonsmoking mother.

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 139 / 143

Example 6.4 Ohio children (6)

1 Let yit be the response value resp of child i at age t. Write down the

model involved in the analysis, including the mean, variance and

correlation coefficient of yit ’s. Give the estimates of the parameters

appearing in the model.

Let piit = P(yit = 1|age, smoke), t = 1, · · · , 4; i = 1, · · · , 537. Then

the model is specified by the following

log

pˆiit

1− pˆiit = βˆ0 + βˆ1 · age + βˆ2 · smoke

= −1.880− 0.113× age + 0.265× smoke

V̂ar(yit) = φˆpˆiit(1− pˆiit) = 0.9985e

−1.880−0.113·age+0.265·smoke

1 + e−1.880−0.113·age+0.265·smoke

Ĉorr(yit , yis) = αˆ = 0.3543 for t 6= s

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 140 / 143

Example 6.4 Ohio children (7)

2 Write down the model’s design matrix for data where id=536.

Z (id = 536) =

1 −2 1

1 −1 1

1 0 1

1 1 1

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 141 / 143

Example 6.4 Ohio children (8)

3 Estimate the odds ratio of wheezing for a child for every one year

older in age. Also calculate an approximate standard error of your

odds ratio estimate.

ÔR = eβˆ1 = e−0.113 = 0.893

s.e.(ÔR) ≈ eβˆ1 · s.e.(βˆ1) = e−0.113 × 0.044 = 0.039.

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 142 / 143

Example 6.4 Ohio children (9)

4 Estimate the odds ratio of wheezing for a 10-year old child with a

smoking mother versus a 9-year old child with nonsmoking mother.

For a 10-year old child with a smoking mother

log

pˆiit

1− pˆiit = βˆ0 + βˆ1 · 1 + βˆ2 · 1 = −1.880− 0.113 + 0.265

For a 9-year old child with a non-smoking mother

log

pˆiit

1− pˆiit = βˆ0 + βˆ1 · 0 + βˆ2 · 0 = −1.880

Therefore

ÔR =

eβˆ0+βˆ1+βˆ2

eβˆ0

= eβˆ1+βˆ2 = e−0.113+0.265 = e0.152 = 1.164.

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 143 / 143

Then a cumulative model or threshold model with cdf F is

P(Y ≤ r |x) = P(U ≤ θr ) = F (θr + xTγ). (7)

One may assume a linear form for thresholds θ1, · · · , θq,

θr = βr0 + w

Tβr

where w is a vector explanatory variable and βr = (βr1, · · · , βrm)T is

a category-specific parameter vector.

Then an extended cumulative model is obtained:

P(Y ≤ r |x,w) = F (βr0 + wTβr + xTγ). (8)

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 47 / 143

S6.4.2 Extended version of cumulative models (2)

Remarks

1 Model (8) becomes unidentifiable if wi = xi for each individual

i = 1, · · · , n; i.e., it will not be possible to separate γ from each βr if

wi = xi .

2 So wi and xi must not be the same.

3 βr can be global, while γ can be replaced by a category-specific

parameter.

4 wi are called threshold variables, while xi are called shift variables.

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 48 / 143

§6.4.3 Link functions for cumulative models

For a cumulative GLM, the link function g =(g1, · · · , gq), as a vector

function of (pi1 = P(Y =1), · · · , piq = P(Y =q)), is given by

gr (pi1, · · · , piq) = F−1(pi1 + · · ·+ pir ), r = 1, · · · , q,

where F−1(·) is the inverse function for F that specifies the

cumulative model

pi1 + · · ·+ pir = P(Y ≤ r |x,w) = F (βr0 + wTβr + xTγ).

F being the N(0, 1) cdf gives the probit link

gr (pi1, · · · , piq) = Φ−1(pi1 + · · ·+ pir ), r = 1, · · · , q.

For proportional odds model, the link is the cumulative logit

gr (pi1, · · · , piq) = log pi1 + · · ·+ pir

1− (pi1 + · · ·+ pir ) , r = 1, · · · , q.

For proportional hazards model, the link is the complement log-log

gr (pi1, · · · , piq) = log{− log(1− (pi1 + · · ·+ pir ))}, r = 1, · · · , q.

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 49 / 143

§6.4.3 Design matrices for cumulative models (1)

For the simple cumulative model

P(Yi ≤ r |xi ) = F (θr + xTi γ),

the design matrix Zi in the linear predictor ηi = Ziβ is given by

Zi =

1 xTi

1 xTi

. . .

...

1 xTi

q×(q+dim(xi ))

and

β = (θ1, · · · , θq,γT )T .

Here xi should not contain a constant component.

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 50 / 143

§6.4.3 Design matrices for cumulative models (2)

For the extended cumulative model

P(Yi ≤ r |xi ,wi ) = F (βr0 + wTi βr + xTi γ),

the design matrix Zi in the linear predictor ηi = Ziβ is given by

Zi =

1 wTi x

T

i

1 wTi x

T

i

. . .

. . .

...

1 wTi x

T

i

q×(q(1+dim(wi ))+dim(xi ))

and

β = (β10,β

T

1 , · · · , βq0,βTq ,γT )T .

Here neither wi nor xi should contain a constant component.

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 51 / 143

§6.4.3 Alternative formulation/parameterisation in

cumulative models (1)

The threshold parameters θ1, · · · , θq in cumulative models

P(Y ≤ r |x) = F (θr + xTγ) are restricted by

θ1 < θ2 < · · · < θq

The corresponding restriction for the extended cumulative model is

β10 + w

Tβ1 < β20 + w

Tβ2 < · · · < βq0 + wTβq

which is more complex.

If such constraints are not explicitly accounted for in estimation, the

iterative estimation procedure may fail due to fitting inadmissible

parameters.

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 52 / 143

§6.4.3 Alternative formulation/parameterisation in

cumulative models (2)

For the simple cumulative model, the problem can be avoided by

using an alternative formulation or reparameterisation:

α1 = θ1, αr = log(θr − θr−1), r = 2, · · · , q

⇐⇒ θ1 = α1, θr = θ1 +

r∑

i=2

eαi , r = 2, · · · , q.

The new parameters α1, · · · , αq are unconstrained.

The cumulative model under the new formulation has a linear

structure of the following form

F−1 (P(Y = 1|x)) = α1 + xTγ

log

[

F−1 (P(Y ≤ r |x))− F−1 (P(Y ≤ r − 1|x))] = αr , r = 2, · · · , q.

The link function corresponding to the new formulation can be

determined accordingly.

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 53 / 143

§6.4.3 Alternative formulation/parameterisation in

cumulative models (3)

For the special case of the cumulative logistic model, one obtains the

following link function (log-logit link)

g1(pi1, · · · , piq) = log

(

pi1

1− pi1

)

gr (pi1, · · · , piq) = log

[

log

{

pi1 + · · ·+ pir

1− pi1 − · · · − pir

}

− log

{

pi1 + · · ·+ pir−1

1− pi1 − · · · − pir−1

}]

,

q = 2, · · · , q.

If this alternative link function is used, the design matrix Zi has to be

adapted. Now

Zi =

1 xTi

1 0

. . .

...

1 0

q×(q+dim(xi ))

and the parameter vector becomes β = (α1, · · · , αq,γT )T .

Here xi should not contain a constant component.

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 54 / 143

§6.4.3 Computing packages and an example

The polr() function in the R MASS package can be used to fit

cumulative models (but not extended cumulative models).

More generally, the ordinal package in R can be used for fitting

ordinal regression models.

Example 2 in §6.1. Breathing test results.

The response variable is BTR, ordinal with k = 3 levels.

Age (2 levels) and Smoking status (3 levels) are predictors.

The aim is to see how Age and Smoking are associated with BTR.

Command polr in R MASS package is used in the analysis.

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 55 / 143

§6.4.3 Breathing testing example (1)

Usage of polr:

polr(formula, data, weights, start, ..., subset, na.action,

contrasts = NULL, Hess = FALSE, model = TRUE,

method = c("logistic", "probit", "loglog", "cloglog", "cauchit"))

Note the estimates of γ are those from polr multiplied by −1 due to

the model used in polr being P(Y ≤ r |x) = F (θr − xTγ).

> library(MASS); BTR.dat <- read.csv("D:/MAST90139/Example6-2BTR.csv"); BTR.dat

BTR Age Smoking Freq

1 1Normal <40 1Never 577

2 2Border <40 1Never 27

3 3Abnorm <40 1Never 7

4 1Normal <40 2Former 192

5 2Border <40 2Former 20

6 3Abnorm <40 2Former 3

7 1Normal <40 3Current 682

8 2Border <40 3Current 46

9 3Abnorm <40 3Current 11

10 1Normal 40to59 1Never 164

11 2Border 40to59 1Never 4

12 3Abnorm 40to59 1Never 0

13 1Normal 40to59 2Former 145

14 2Border 40to59 2Former 15

15 3Abnorm 40to59 2Former 7

16 1Normal 40to59 3Current 245

17 2Border 40to59 3Current 47

18 3Abnorm 40to59 3Current 27

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 56 / 143

§6.4.3 Breathing testing example (2)

> is.factor(BTR.dat$BTR) #TRUE; > is.ordered(BTR.dat$BTR) #FALSE

> is.factor(BTR.dat$Age) #TRUE; > is.factor(BTR.dat$Smoking) #TRUE

> is.factor(BTR.dat$Freq) #FALSE;

> BTR.dat$BTR=as.ordered(BTR.dat$BTR) #Set BTR as an ordinal factor

> help(polr)

> BTR.logit=polr(BTR~Age+Smoking+Age:Smoking, data=BTR.dat, weights=Freq,

Hess=T,method="logistic")

> summary(BTR.logit)

Coefficients:

Value Std. Error t value

Age40to59 -0.8857 0.5359 -1.653

Smoking2Former 0.6973 0.2822 2.471

Smoking3Current 0.3472 0.2238 1.551

Age40to59:Smoking2Former 1.1458 0.6228 1.840

Age40to59:Smoking3Current 2.2007 0.5689 3.868

Intercepts:

Value Std. Error t value

1Normal|2Border 2.8334 0.1764 16.0603

2Border|3Abnorm 4.3078 0.2130 20.2210

Residual Deviance: 1564.968

AIC: 1578.968

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 57 / 143

§6.4.3 Breathing testing example (3)

BTR.logit1=polr(BTR~Age+Smoking,data=BTR.dat, weights=Freq,Hess=T,method="logistic")

summary(BTR.logit1)

Coefficients:

Value Std. Error t value

Age40to59 0.7772 0.1482 5.243

Smoking2Former 0.7815 0.2333 3.350

Smoking3Current 0.9607 0.1918 5.010

Intercepts:

Value Std. Error t value

1Normal|2Border 3.1927 0.1749 18.2544

2Border|3Abnorm 4.6543 0.2125 21.9076

Residual Deviance: 1589.744

AIC: 1599.744

anova(BTR.logit, BTR.logit1)

Likelihood ratio tests of ordinal regression models

Response: BTR

Model Resid. df Resid. Dev Test Df LR stat. Pr(Chi)

1 Age + Smoking 2214 1589.744

2 Age + Smoking + Age:Smoking 2212 1564.968 1 vs 2 2 24.77585 4.168e-06

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 58 / 143

§6.4.3 Breathing testing example (4)

> BTR.logit$fitted

1Normal 2Border 3Abnorm

1 0.9444565 0.04225961 0.013283873

2 0.9444565 0.04225961 0.013283873

3 0.9444565 0.04225961 0.013283873

4 0.8943685 0.07930628 0.026325236

5 0.8943685 0.07930628 0.026325236

6 0.8943685 0.07930628 0.026325236

7 0.9231700 0.05813458 0.018695369

8 0.9231700 0.05813458 0.018695369

9 0.9231700 0.05813458 0.018695369

10 0.9763207 0.01815786 0.005521451

11 0.9763207 0.01815786 0.005521451

12 0.9763207 0.01815786 0.005521451

13 0.8671630 0.09895796 0.033879045

14 0.8671630 0.09895796 0.033879045

15 0.8671630 0.09895796 0.033879045

16 0.7633793 0.17036521 0.066255462

17 0.7633793 0.17036521 0.066255462

18 0.7633793 0.17036521 0.066255462

> BTR.logit$residual #does not exist.

> BTR.logit$lp #linear predictor values

1 2 3 4 5 6 7 8 9

0.0000000 0.0000000 0.0000000 0.6972823 0.6972823 0.6972823 0.3472245 0.3472245 0.3472245

10 11 12 13 14 15 16 17 18

-0.8857462 -0.8857462 -0.8857462 0.9573393 0.9573393 0.9573393 1.6621466 1.6621466 1.6621466

> attributes(BTR.logit)

[1] "coefficients" "zeta" "deviance" "fitted.values" "lev" "terms"

[7] "df.residual" "edf" "n" "nobs" "call" "method"

[13] "convergence" "niter" "lp" "Hessian" "model" "contrasts"

[19] "xlevels"

$class

[1] "polr"

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 59 / 143

§6.4.3 Breathing testing example (5)

On the previous 3 slides we fit two cumulative logistic models to BTR

using Age and Smoking: the first (BTR.logit includes the

Age:Smokings interaction-effect terms as well as the main-effect

terms; while the second (BTR.logit1 includes main-effect terms only.

Dummy coding (i.e. contr.treatment coding in R) is used for

representing Age and Smoking:

Age40to59 =

{

1, if age is 40 ∼ 59

0 if age < 40

Smoking2Former =

{

1, former smoker

0 else

,

Smoking3Current =

{

1, current smoker

0 else

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 60 / 143

§6.4.3 Breathing testing example (6)

The first model can be written as

log

P(BTR = Normal)

P(BTR > Normal)

= θ1 + γ1Age40to59 + γ2Smoking2Former + γ3Smoking3Current

+γ4Age40to59 · Smoking2Former + γ5Age40to59 · Smoking3Current

log

P(BTR ≤ Bordering)

P(BTR > Bordering)

= θ2 + γ1Age40to59 + γ2Smoking2Former + γ3Smoking3Current

+γ4Age40to59 · Smoking2Former + γ5Age40to59 · Smoking3Current

MLEs of the parameters in the first model are

θˆ1 = 2.833, θˆ2 = 4.308, γˆ1 = 0.886, γˆ2 = −0.697, γˆ3 = −0.347, γˆ4 = −1.146, γˆ5 = −2.201

with their standard errors given in the R output.

Estimated values of the γ parameters can be interpreted as log

cumulative odds ratios. For example, the odds of having a normal

test result (or normal or bordering result) for a worker aged 40 to 59

who never smoked is estimated to be e−γˆ2−γˆ4 = e0.697+1.146 = 6.32

times of that for a worker aged 40 to 59 who is a former worker.

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 61 / 143

§6.4.3 Breathing testing example (7)

The second model can be written as

log

P(BTR = Normal)

P(BTR > Normal)

= θ1 + γ1Age40to59 + γ2Smoking2Former + γ3Smoking3Current

log

P(BTR ≤ Bordering)

P(BTR > Bordering)

= θ2 + γ1Age40to59 + γ2Smoking2Former + γ3Smoking3Current

MLEs of the parameters in the first model are

θˆ1 = 3.193, θˆ2 = 4.654, γˆ1 = −0.777, γˆ2 = −0.782, γˆ3 = −0.961

with their standard errors given in the R output.

“H0 : γ4 = γ5 = 0 vs. H1 : not H0” is the hypothesis about the

Age:Smoking interaction effects on BTR.

A χ2 analysis of deviance test comparing models BTR.logit and

textttBTR.logit1 gives an LR statistic of 22.776 and p-value of

4.168× 10−6. Therefore, there is very strong evidence to reject H0.

We conclude that Age and Smoking have very strong interaction

effects on BTR.

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 62 / 143

§6.4.3 Breathing testing example (8)

Linear predictor of the first model is

η1 = θ1 + γ1Age40to59 + γ2Smoking2Former + γ3Smoking3Current

+γ4Age40to59 · Smoking2Former + γ5Age40to59 · Smoking3Current

η2 = θ2 + γ1Age40to59 + γ2Smoking2Former + γ3Smoking3Current

+γ4Age40to59 · Smoking2Former + γ5Age40to59 · Smoking3Current

However, the linear predictor returned from polr is just xTγ, i.e.

γ1Age40to59 + γ2Smoking2Former + γ3Smoking3Current

+γ4Age40to59 · Smoking2Former + γ5Age40to59 · Smoking3Current

thus has just 6 different values knowing that there are 6

combinations of Age and Smoking.

Knowing these tricks, there should be no difficulty to reveal how the

fitted values returned by BTR.logit$fitted are obtained.

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 63 / 143

§6.4.3 Breathing testing example (9)

Effect coding (i.e. contr.sum coding in R, −1 for last category) may

also be used for representing Age and Smoking:

Age1 =

{

1, if age < 40

−1, if age is 40 ∼ 59

Smoking1 =

1, never smoked0, former smoker−1, current smoker , Smoking2 =

0, never smoked1, former smoker−1, current smoker

With this coding, the first model can be written as

log

P(BTR = Normal)

P(BTR > Normal)

= θ1 + γ1Age1 + γ2Smoking1 + γ3Smoking2

+γ4Age1 · Smoking1 + γ5Age1 · Smoking2

log

P(BTR ≤ Bordering)

P(BTR > Bordering)

= θ2 + γ1Age1 + γ2Smoking1 + γ3Smoking2

+γ4Age1 · Smoking1 + γ5Age1 · Smoking2

Although the parameter estimates will be different now, other results

from the model will be the same as that using dummy coding.

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 64 / 143

§6.4.3 Breathing testing example (10)

> options(contrasts = c("contr.sum", "contr.poly"))

> BTR.logit=polr(BTR~Age+Smoking+Age:Smoking, data=BTR.dat, weights=Freq, Hess=T,method="logistic")

> summary(BTR.logit)

Coefficients:

Value Std. Error t value

Age1 -0.11487 0.1086 -1.0580

Smoking1 -0.90596 0.1890 -4.7933

Smoking2 0.36428 0.1421 2.5644

Age1:Smoking1 0.55777 0.1890 2.9512

Age1:Smoking2 -0.01517 0.1421 -0.1068

Intercepts:

Value Std. Error t value

1Normal|2Border 2.3704 0.1086 21.8209

2Border|3Abnorm 3.8447 0.1599 24.0486

Residual Deviance: 1564.968

AIC: 1578.968

> BTR.logit1=polr(BTR~Age+Smoking, data=BTR.dat, weights=Freq, Hess=T,method="logistic")

> anova(BTR.logit, BTR.logit1)

Likelihood ratio tests of ordinal regression models

Response: BTR

Model Resid. df Resid. Dev Test Df LR stat. Pr(Chi)

1 Age + Smoking 2214 1589.744

2 Age + Smoking + Age:Smoking 2212 1564.968 1 vs 2 2 24.77585 4.168618e-06

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 65 / 143

§6.4.3 Breathing testing example (11)

> attributes(BTR.logit)

$names

[1] "coefficients" "zeta" "deviance" "fitted.values" "lev" "terms" "df.residual"

[8] "edf" "n" "nobs" "call" "method" "convergence" "niter"

[15] "lp" "Hessian" "model" "contrasts" "xlevels"

> BTR.logit$fitted

1Normal 2Border 3Abnorm

1 0.9444565 0.04225911 0.013284361

2 0.9444565 0.04225911 0.013284361

3 0.9444565 0.04225911 0.013284361

4 0.8943660 0.07930714 0.026326874

5 0.8943660 0.07930714 0.026326874

6 0.8943660 0.07930714 0.026326874

7 0.9231672 0.05813603 0.018696801

8 0.9231672 0.05813603 0.018696801

9 0.9231672 0.05813603 0.018696801

10 0.9763222 0.01815653 0.005521309

11 0.9763222 0.01815653 0.005521309

12 0.9763222 0.01815653 0.005521309

13 0.8671585 0.09895993 0.033881540

14 0.8671585 0.09895993 0.033881540

15 0.8671585 0.09895993 0.033881540

16 0.7633676 0.17037058 0.066261785

17 0.7633676 0.17037058 0.066261785

18 0.7633676 0.17037058 0.066261785

> BTR.logit$lp #linear predictor values

1 2 3 4 5 6 7 8 9 10

-0.4630592 -0.4630592 -0.4630592 0.2342498 0.2342498 0.2342498 -0.1157938 -0.1157938 -0.1157938 -1.3488684

11 12 13 14 15 16 17 18

-1.3488684 -1.3488684 0.4943191 0.4943191 0.4943191 1.1991525 1.1991525 1.1991525

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 66 / 143

§6.4.3 Breathing testing example (12)

We also fit a grouped Cox or proportional hazards model.

> BTR.ph=polr(BTR~Age+Smoking+Age:Smoking,data=BTR.dat, weights=Freq,

Hess=T, method="cloglog")

> summary(BTR.ph)

Coefficients:

Value Std. Error t value

Age40to59 -0.2865 0.14264 -2.008

Smoking2Former 0.2125 0.10054 2.114

Smoking3Current 0.1088 0.07301 1.490

Age40to59:Smoking2Former 0.4333 0.18941 2.288

Age40to59:Smoking3Current 0.8379 0.16357 5.122

Intercepts:

Value Std. Error t value

1Normal|2Border 1.0477 0.0558 18.7612

2Border|3Abnorm 1.5528 0.0629 24.6916

Residual Deviance: 1559.949

AIC: 1573.949

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 67 / 143

§6.4.3 Breathing testing example (13)

We further fit a cumulative model with the extreme maximal-value

distribution.

> BTR.extm=polr(BTR~Age+Smoking+Age:Smoking,data=BTR.dat,weights=Freq,

Hess=T,method="loglog")

> summary(BTR.extm)

Coefficients:

Value Std. Error t value

Age40to59 -0.8672 0.5286 -1.640

Smoking2Former 0.6755 0.2700 2.501

Smoking3Current 0.3368 0.2167 1.554

Age40to59:Smoking2Former 1.1003 0.6070 1.813

Age40to59:Smoking3Current 2.0724 0.5573 3.719

Intercepts:

Value Std. Error t value

1Normal|2Border 2.8614 0.1715 16.6838

2Border|3Abnorm 4.2761 0.2080 20.5605

Residual Deviance: 1566.335

AIC: 1580.335

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 68 / 143

§6.4.3 Breathing testing example (14)

We fit a grouped Cox or proportional hazards model using the effect

coding.

> options(contrasts = c("contr.sum", "contr.poly"))

> BTR.ph=polr(BTR~Age+Smoking+Age:Smoking,data=BTR.dat, weights=Freq,

Hess=T, method="cloglog")

> summary(BTR.ph)

Coefficients:

Value Std. Error t value

Age1 -0.06862 0.03421 -2.00575

Smoking1 -0.31898 0.05366 -5.94408

Smoking2 0.11022 0.04961 2.22168

Age1:Smoking1 0.21188 0.05361 3.95205

Age1:Smoking2 -0.00481 0.04960 -0.09699

Intercepts:

Value Std. Error t value

1Normal|2Border 0.8720 0.0353 24.7072

2Border|3Abnorm 1.3771 0.0441 31.2213

Residual Deviance: 1559.949

AIC: 1573.949

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 69 / 143

§6.4.3 Breathing testing example (15)

We finally fit a cumulative model with the extreme maximal-value

distribution using the effect coding.

> options(contrasts = c("contr.sum", "contr.poly"))

> BTR.extm=polr(BTR~Age+Smoking+Age:Smoking,data=BTR.dat,weights=Freq,

Hess=T,method="loglog")

> summary(BTR.extm)

Coefficients:

Value Std. Error t value

Age1 -0.09523 0.1053 -0.904

Smoking1 -0.86618 0.1854 -4.671

Smoking2 0.35943 0.1361 2.642

Age1:Smoking1 0.52877 0.1854 2.852

Age1:Smoking2 -0.02136 0.1361 -0.157

Intercepts:

Value Std. Error t value

1Normal|2Border 2.4288 0.1054 23.0536

2Border|3Abnorm 3.8434 0.1573 24.4373

Residual Deviance: 1566.335

AIC: 1580.335

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 70 / 143

§6.4.4 Motivating example for sequential models

In many applications the ordering of the response categories is due to

a sequential mechanism. The categories are ordered since they can be

reached only successively.

Example: Tonsil size. Children have been classified according to their

tonsil size and whether or not they are carriers of Streptococcus pyogenes.

Table 6: Tonsil size and Streptococcus pyogenes (Holmes & Williams, 1954)

Present but

not enlarged

Enlarged

Greatly

enlarged

Carriers 19 29 24

Noncarriers 497 560 269

It may be assumed that tonsil size always starts in the normal state “present but not

enlarged” (category 1). If the tonsils grow abnormally, they may become “enlarged”

(category 2); if the process does not stop, tonsils may become “greatly enlarged” (category

3). But in order to get greatly enlarged tonsils, they first have to be enlarged for the

duration of the intermediate state “enlarged.”

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 71 / 143

§6.4.4 Sequential models (1)

Let latent variables Ur , r = 1, · · · , q with q = k − 1 have the linear

form

Ur = −xTγ + εr , where εr has cdf F .

We assume the ordinal response variable Y is determined by Ur ’s in

the following sequential way:

Y = 1 ⇔ U1 ≤ θ1

Y = 2 given Y ≥ 2 ⇔ U2 ≤ θ2; and in general

Y = r given Y ≥ r ⇔ Ur ≤ θr ; or equivalently

Y > r given Y ≥ r ⇔ Ur > θr , r = 1, · · · , q (9)

The sequential mechanism (9) models the transition from category r

to category r + 1 given that category r is reached.

The main difference to the threshold approach used in the cumulative

model is the conditional modelling of the transitions. The sequential

mechanism assumes a binary decision in each step.

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 72 / 143

§6.4.4 Sequential models (2)

The sequential response mechanism (9) combined with the linear

form of the latent variable

Ur = −xTγ + εr , where εr has cdf F ,

leads to the sequential model with cdf F

P(Y = r |Y ≥ r , x) = P(Ur ≤ θr ) = F (θr + xTγ), r = 1, · · · , k,

(10)

where θk = +∞. It follows that

P(Y = r |x) = P(Y = r |Y ≥ r , x) · P(Y ≥ r |x)

= F (θr + x

Tγ) · P(Y > r − 1|Y ≥ r − 1, x) ·

P(Y > r − 2|Y ≥ r − 2, x) · · ·P(Y > 1|Y ≥ 1, x)

= F (θr + x

Tγ)[1− F (θr−1 + xTγ)] · · · [1− F (θ1 + xTγ)]

= F (θr + x

Tγ)

r−1∏

i=1

[1− F (θi + xTγ)], r = 1, · · · , k; Note ∏0i=1[·] = 0

Model (10) is also called the continuation ratio model.

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 73 / 143

§6.4.4 Sequential models (3)

Note that there is no need to impose any order restriction on

θ1, · · · , θq in (10).

Now specify a cdf F will give a specific sequential model.

1 Choosing F (x) = (1 + e−x)−1 gives the sequential logit model

P(Y = r |Y ≥ r , x) = exp(θr + x

Tγ)

1 + exp(θr + xTγ)

, r = 1, · · · , k

which is equivalent to log

{

P(Y = r |x)

P(Y > r |x)

}

= θr + x

Tγ.

2 Choosing F (x) = 1− exp(−ex), the sequential model has the form

P(Y = r |Y ≥ r , x) = 1− exp

(

−eθr+xTγ

)

, r = 1, · · · , k (11)

which is equivalent to

log

[

− log

{

P(Y > r |x)

P(Y ≥ r |x)

}]

= θr + x

Tγ, r = 1, · · · , k. (12)

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 74 / 143

§6.4.4 Sequential models (4)

It is worth to note that (12) is equivalent to the cumulative model

with F (x) = 1− exp(−ex). This means (12) is a special parametric

form of the grouped Cox model. This equivalence follows by the

reparameterisation:

θr = log{e θ˜r − e θ˜r−1}, r = 1, · · · , k − 1.

Note θ˜0 = −∞ and θ1 = θ˜1. Further θ˜r = log

(

r∑

i=1

eθi

)

.

Substituting θr = log{e θ˜r − e θ˜r−1} into (12), one obtains

− logP(Y > r |x) + logP(Y > r − 1|x) = e θ˜r+xTγ − e θ˜r−1+xTγ , r = 1, · · · , q

⇒ − logP(Y > r |x) = e θ˜r+xTγ , r = 1, · · · , q

⇒ P(Y ≤ r |x) = 1− exp

(

−e θ˜r+xTγ

)

which is a grouped Cox model.

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 75 / 143

§6.4.4 Sequential models (5)

3 Choosing exponential cdf F (x) = 1− e−x , the exponential

sequential model becomes

P(Y = r |Y ≥ r , x) = 1− e−(θr+xTγ)

⇔ − log

{

P(Y>r |x)

P(Y≥r |x)

}

= θr + xTγ, r = 1, · · · , q.

Generalised sequential models:

If assuming θr = δr0 + zTδr , where δr = (δr1, · · · , δrm)T is a

category-specific vector parameter, then one gets the Generalised

sequential model :

P(Y = r |Y ≥ r , x, z) = F (δr0 + zTδr + xTγ) (13)

Remark: The effect of z is non-homogeneous over the categories, while

the effect of x is homogeneous.

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 76 / 143

§6.4.4 Link functions and design matrices in sequential models

Response and link functions may be derived directly from model (10).

The link function g = (g1, · · · , gq)T is given by

gr (pi1, · · · , piq) = F−1

(

pir

1− pi1 − · · · − pir−1

)

, r = 1, · · · , q

and the response function h = (h1, · · · , hq)T has the form

hr (η1, · · · , ηq) = F (ηr )

r−1∏

i=1

(1− F (ηi )), r = 1, · · · , q.

For the sequential logit model, with r = 1, · · · , q

gr (pi1, · · · , piq) = log pir

1−pi1−· · ·−pir−1 and hr (η1, · · · , ηq) = e

ηr

r−1∏

i=1

(1 + eηi )−1.

In regard to the design matrices there is no difference between

sequential and cumulative models. Thus the design matrices from

§3.3.3 apply.

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 77 / 143

§6.4.4 Strict stochastic ordering (optional)

Strict stochastic ordering is a concept generalizing the proportional

odds for simple cumulative model (7)

∆c(x1, x2) = F

−1 (P(Y ≤ r |x1))− F−1 (P(Y ≤ r |x2)) .

If ∆c(x1, x2) = γT (x1 − x2), we say strict stochastic ordering holds.

Then for general cumulative model (8) where w = x is set and xTγ is

omitted, one can test strict stochastic ordering by testing

β1 = β2 = · · · = βq.

The concept can be generalized to the sequential model (10) as well.

Let

∆s(x1, x2) = F

−1 (P(Y = r |Y ≥ r , x1))−F−1 (P(Y = r |Y ≥ r , x2)) .

If ∆s(x1, x2) = γT (x1 − x2), we say strict stochastic ordering holds

for the sequential model.

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 78 / 143

§6.4.4 Two-step models (optional) (1)

It may be more appropriate to divide the ordered categories into sets

of categories with very homogeneous responses in the first step, then

model the categories within each set separately in the second step.

This is the two-step modelling approach.

Example: Rheumatoid arthritis. Mehta, Patel & Tsiatis (1984) analysed

data of patients with acute rheumatoid arthritis. A new agent was compared

with an active control, and each patient was evaluated on a 5-point

assessment scale.

Table 7: Clinical trial of a new agent and an active control

Global assessment

Drug

Much

improved

Improved

No

change

Worse

Much

worse

New agent 24 37 21 19 6

Active control 11 51 22 21 7

Global assessment may be subdivided into“improvement”, “no change” and

“worse” first. Then “improvement” is split up into “much improved” and

“improved” and the “worse” category is split into “worse” and “much worse.”

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 79 / 143

§6.4.4 Two-step models (optional) (2)

Specifically, let the categories 1, · · · , k be subdivided into t basic sets

S1, · · · ,St , where Sj = {mj−1 + 1, · · · ,mj}, m0 = 0 and mt = k.

In Step 1: Y ∈ Sj ⇔ θj−1 < U0 ≤ θj , j = 1, · · · , t.

Assume U0 = −xTγ0 + ε.

In Step 2: Y = R | Y ∈ Sj ⇔ θj ,r−1 < Uj ≤ θjr .

Assume Uj = −xTγ j + εj . Also ε, εj ’s are iid with cdf F .

Then the two-step model becomes

P(Y ∈ Tj |x) = F (θj + xTγ0), P(Y ≤ r |Y ∈ Sj , x) = F (θjr + xTγ j)

(14)

where Tj = S1 ∪ · · · ∪ Sj , θ1 < · · · < θt−1, θt =∞;

θj ,mj−1+1 < · · · < θj ,mj−1 , θj ,mj =∞, j = 1, · · · , t.

Since cumulative mechanisms are used in both steps, (14) is called a

two-step cumulative model. Two-step sequential model can be

similarly defined.

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 80 / 143

§6.4.4 Alternative approaches (optional)

1 Anderson (1984).

P(Y = r |x) = e

βr0−φrβT x

1 +

∑q

i=1 e

βi0−φiβT x

, r = 1, · · · , q;

where 1 = φ1 > · · · > φk = 0.

2 Williams and Grizzle (1972).

k∑

r=1

srP(Y = r |x) = xTβ

where s1, · · · , sk are some given scores for the categories.

3 Adjacent categories logit model (Agresti 1984).

log

P(Y = r |x)

P(Y = r − 1|x) = x

Tβr

⇔ P(Y = r |Y ∈ {r , r + 1}, x) = F (xTβr )

where F is the logistic cdf.

Multivariate GLM (Ch6) §6.4 Models for ordinal responses Guoqi Qian 81 / 143

§6.5 Statistical inference for MGLM overview

1 MLE

2 Testing of linear hypotheses

3 Goodness of fit (model adequacy) statistics

4 Power-divergence statistics

Multivariate GLM (Ch6) §6.5 Statistical inference for MGLM (optional) Guoqi Qian 82 / 143

§6.5.1 Maximum likelihood estimation in MGLM (1)

An extension of the exponential family is the multivariate

exponential family, having the pdf

f (yi |θi , φ, ωi ) = exp

{

yTi θi − b(θi )

φ

· ωi + c(yi , φ, ωi )

}

For q × 1 vector observations y1, · · · , yn which are suppose to be

independent of each other, let µi = E (yi |xi ), i = 1, · · · , n, with xi

being the covariate vector.

The structural assumption of MGLM is that µi ’s depend on the

linear predictor ηi = Ziβ in the form

µi = h(ηi ) = h(Ziβ) = [h1(Ziβ), · · · , h1(Ziβ)]T , i = 1, · · · , n

where

the response function h : S → M is defined on S ⊂ Rq, taking values

in M ⊂ Rq;

Zi is a q × p design matrix for unit i ;

β = (β1, · · · , βp)T is an unknown parameter vector from B ⊂ Rp.

Multivariate GLM (Ch6) §6.5 Statistical inference for MGLM (optional) Guoqi Qian 83 / 143

§6.5.1 Maximum likelihood estimation in MGLM (2)

The MLE of β may be derived in analogy to the one-dimensional case.

The log-likelihood kernel for observation yi is

`i (µi ) =

yTi θi − b(θi )

φ

· ωi , where θi = θ(µi ) (15)

Using the relation µi = h(ηi ) = h(Ziβ), it can be found that the

score function is s(β) =

∂`

∂β

=

n∑

i=1

si (β), with

si (β) = Z

T

i Di (β)Σ

−1

i (β)[yi − µi (β)], (16)

where Di (β) =

∂hT (ηi )

∂ηi

is the derivative matrix of hT (ηi ) evaluated

at ηi = Ziβ, which is also called the Jacobian (matrix). And

Σi (β) = cov(yi ).

Multivariate GLM (Ch6) §6.5 Statistical inference for MGLM (optional) Guoqi Qian 84 / 143

§6.5.1 Maximum likelihood estimation in MGLM (3)

An alternative form for si (β) is

si (β) = Z

T

i Wi (β)

∂g(µi )

∂µTi

[yi − µi (β)], where

Wi (β) = Di (β)Σ

−1

i (β)Di (β) =

{

∂g(µi )

∂µTi

Σi (β)

∂gT (µi )

∂µi

}−1

(17)

which approximately equals (cov(g(yi ))

−1.

The expected Fisher information matrix is

F (β) = cov(s(β)) =

n∑

i=1

ZTi Wi (β)Zi .

Multivariate GLM (Ch6) §6.5 Statistical inference for MGLM (optional) Guoqi Qian 85 / 143

§6.5.1 Maximum likelihood estimation in MGLM (4)

In matrix form the above may be rewritten as

s(β) = ZTD(β)Σ−1(β)[y− µ(β)]

F (β) = ZTW (β)Z .

where y = (yT1 , · · · , yTn )T , µ(β) = (µ1(β)T , · · · ,µn(β)T )T ,

Σ(β) = diag (Σi (β)), W (β) = diag (Wi (β)), D(β) = diag (Di (β)),

all being block diagonal, and the total design matrix

Z =

[

ZT1 , · · · ,ZTn

]T

.

In the situation of grouped data, yi is the mean vector over ni

observations and Σi (β) is replaced by n

−1

i Σi (β).

Under regularity conditions,

MLE βˆ

a∼ N(β,F−1(βˆ)).

Multivariate GLM (Ch6) §6.5 Statistical inference for MGLM (optional) Guoqi Qian 86 / 143

§6.5.1 Numerical computing in MGLM

This is the same as in the univariate case, except using the

multivariate versions.

1 Working or pseudo observation vector

y˜(β) = (y˜1(β)

T , · · · , y˜n(β)T )T , where

y˜i (β) = Ziβ +

(

D−1i (β)

)T

[yi − µi (β)]

is an approximation for g(µi (β)).

2 The IRWLS estimate is

βˆ

(k+1)

=

(

ZTW (βˆ

(k)

)Z

)−1

ZTW (βˆ

(k)

)y˜(β(k))

which is equivalent to the Fisher scoring iteration

βˆ

(k+1)

= βˆ

(k)

+

(

ZTW (βˆ

(k)

)Z

)−1

s(β(k)).

Multivariate GLM (Ch6) §6.5 Statistical inference for MGLM (optional) Guoqi Qian 87 / 143

§6.5.2 Testing linear hypotheses in MGLM

The linear hypotheses

H0 : Cβ = ξ vs. H1 : Cβ 6= ξ

can be tested in the same way as in the univariate GLM, except replacing

the score and Fisher information by their multivariate versions.

Multivariate GLM (Ch6) §6.5 Statistical inference for MGLM (optional) Guoqi Qian 88 / 143

§6.5.2 Goodness-of-fit statistics in MGLM (1)

Goodness of fit (or model adequacy) of the MGLM can be assessed based

on the Pearson statistics and the deviance (the data need be grouped as

much as possible).

General Pearson Statistics:

χ2 =

g∑

i=1

(yi − µˆi )TΣ−1i (βˆ)(yi − µˆi )

In the case of multi-categorical response variable with multinomial

distribution niyi ∼ M(ni ,pii ), we have

χ2 =

g∑

i=1

χ2p(yi , pˆii ) =

g∑

i=1

ni

k∑

j=1

(yij − pˆiij)2

pˆiij

with yik = 1− yi1 − · · · − yiq and pˆiik = 1− pˆii1 − · · · − pˆiiq.

Multivariate GLM (Ch6) §6.5 Statistical inference for MGLM (optional) Guoqi Qian 89 / 143

§6.5.2 Goodness-of-fit statistics in MGLM (2)

Deviance or likelihood ratio (LR) statistics:

D = −2

g∑

i=1

{`i (pˆii )− `i (yi )}

.

For multinomial data,

D = 2

g∑

i=1

χ2D(yi , pˆii ) = 2

g∑

i=1

ni

k∑

j=1

yij log

yij

pˆiij

Under regularity conditions,

χ2

a∼ χ2(g(k − 1)− p) and D a∼ χ2(g(k − 1)− p)

where p is the number of estimated parameters.

For sparse data with small ni , one should use alternative asymtotics.

Multivariate GLM (Ch6) §6.5 Statistical inference for MGLM (optional) Guoqi Qian 90 / 143

§6.5.3 Power-divergence family (1)

For categorical data a more general single-parameter family of

goodness-of-fit statistics is the power-divergence family, introduced by

Cressie and Read (1984, JRSSB).

The power-divergence statistic with parameter λ ∈ R is given by

Sλ =

g∑

i=1

SDλ(yi , pˆii )

where the sum of deviations over observations at yi is

SDλ(yi , pˆii ) =

2ni

λ(λ+ 1)

k∑

j=1

yij

[(

yij

pˆiij

)λ

− 1

]

, −∞ < λ <∞ (18)

Multivariate GLM (Ch6) §6.5 Statistical inference for MGLM (optional) Guoqi Qian 91 / 143

§6.5.3 Power-divergence family (2)

1 At λ = 1, S1 =

g∑

i=1

ni

k∑

j=1

(yij − pˆiij)2

pˆiij

, the usual Pearson statistic.

2 At λ→ 0, S0 = 2

g∑

i=1

ni

k∑

j=1

yij log

yij

pˆiij

, the likelihood ratio statistic.

3 At λ→ −1, S−1 = 2

g∑

i=1

ni

k∑

j=1

pˆiij log

pˆiij

yij

, Kullback’s minimum

discrimination information statistic.

4 At λ = −2, S−2 =

g∑

i=1

ni

k∑

j=1

(yij − pˆiij)2

yij

, Neyman’s minimum

modified χ2-statistic.

5 At λ = −12 , S− 12 =4

g∑

i=1

k∑

j=1

(

√

niyij −

√

ni pˆiij)

2, Freeman-Tukey statistic.

In general, λ ∈ [−1, 2] is recommended.

Multivariate GLM (Ch6) §6.5 Statistical inference for MGLM (optional) Guoqi Qian 92 / 143

§6.5.3 Asymptotic properties under classical “fixed cells” assumptions

Classical assumptions in asymptotic theory for grouped data imply

1 a fixed number of groups (g is fixed)

2 increasing sample sizes ni →∞, i = 1, · · · , g , such that nin → λi for

fixed proportions λi > 0, i = 1, · · · , g .

3 a fixed “number of cells” k in each group

4 a fixed number of parameters.

If these assumptions hold and in addition, the model under consideration is

correct, then

Sλ

a∼ χ2(g(k − 1)− p), p is the # of estimated parameters.

And under local alternative hypothesis, the limit distribution of Sλ is

non-central χ2.

Multivariate GLM (Ch6) §6.5 Statistical inference for MGLM (optional) Guoqi Qian 93 / 143

§6.5.3 Asymptotic properties under sparseness and “increasing-cells”

assumptions

If several explanatory variables are considered, the number of

observations for a fixed explanatory variable x is often small and the

usual asymptotic machinery will fail.

Under such sparseness conditions, it may be assumed that with

increasing sample size n→∞ the number of groups (values of the

explanatory variables) is also increasing with g →∞, resulting in the

“increasing-cells” setting.

Now Sλ is no longer χ

2-distributed in the limit. Rather Sλ has an

asymptotic normal distribution under H0 that the model holds.

Details are not pursued here.

Multivariate GLM (Ch6) §6.5 Statistical inference for MGLM (optional) Guoqi Qian 94 / 143

§6.6 Multivariate models for correlated responses

So far we have been assuming there is only one response variable,

possibly multi-categorical, being under consideration.

Now we consider the situations where a vector of correlated or

clustered response variables is observed, together with covariates, for

each unit in the sample.

These situations often happen in longitudinal studies, repeated

measurements studies, and grouped (clustered) studies, etc.

When the response variables are approximately Gaussian, multivariate

linear models are commonly used for the underlying data analysis,

which has been well established.

The situations become more difficult when the response variables are

discrete, categorical or mixed discrete/categorical/continuous.

Three main approaches will be introduced here:

1 conditional models

2 marginal models

3 random effects models

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 95 / 143

§6.6.1 Conditional models: Asymmetric models (optional)1

Asymmetric models

In many applications, the components of a response vector are

ordered in a way that some components are prior to the other

components, e.g. if they refer to events that take place earlier.

Consider the simplest case of a response vector Y = (Y1,Y2), with

categorical components Y1 ∈ {1, · · · , k1} and Y2 ∈ {1, · · · , k2}. Let

Y2 refer to events that may be considered conditional on Y1.

An example: response years in school (Y1) and the statement about

happiness (Y2).

Then Y1 may be modelled to be dependent on the explanatory

variables x, using a model introduced in §3.2 and §3.3.

And Y2 may be modelled based on Y1 and x, also using a model

introduced in §3.2 and §3.3.

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 96 / 143

§6.6.1 Conditional models: Asymmetric models (optional)2

Asymmetric models

In general, consider m categorical responses Y1, · · · ,Ym, where Yj

depends on Y1, · · · ,Yj−1 but not on Yj+1, · · · ,Ym. Models that

make use of this dependence structure are based on the decomposition

P(Y1, · · · ,Ym|x) = P(Y1|x) · P(Y2|Y1, x) · · ·P(Ym|Y1, · · · ,Ym−1, x)

(19)

Simple models arise if each component in (19) is specified by a GLM:

P(Yj = r |Y1, · · · ,Yj−1, x) = hj(Zjβ) (20)

where Zj = Z (Y1, · · · ,Yj−1, x) is a function of previous components

Y1, · · · ,Yj−1 and the explanatory variables x. Models of form (20)

are sometimes called data-driven ones.

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 97 / 143

§6.6.1 Conditional models: Asymmetric models (optional)3

Markov-type transition models follow from the additional

assumption P(Yj = r |Y1, · · · ,Yj−1, x) = P(Yj = r |Yj−1, x).

One such example for binary responses is

log

P(y1 = 1|x)

P(y1 = 0|x) = β01 + z

T

1 β1

log

P(yj = 1|y1, · · · , yj−1, x)

P(yj = 0|y1, · · · , yj−1, x) = β0j + z

T

j βj + yj−1γj , j = 2, · · · , k.

It is called a regressive logistic model when

log

P(yj = 1|y1, · · · , yj−1, x)

P(yj = 0|y1, · · · , yj−1, x) = β0 + z

T

j β + γ1y1 + · · ·+ γj−1yj−1.

Models of the type (20) may be embedded into the GLM framework,

meaning some built-in functions in R (e.g. glm()) may be used to do

the computing.

In general, however, you need to write your own program to do the

analysis.

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 98 / 143

§6.6.1 Conditional models: Asymmetric models (optional)4

Example: Reported happiness. Clogg (1982) investigated the

association between gender (x), years in school (Y1), and reported

happiness (Y2). The data are given in the following. Since x and Y1 are

prior to the statement about happiness, Y2 is modelled conditionally on Y1

and x .

Table 8: Cross classification of gender, reported happiness, and years of schooling

Years of school completed

Gender Reported happiness < 12 12 13-16 ≥ 17

Male Not too happy 40 21 14 3

Pretty happy 131 116 112 27

Very happy 82 61 55 27

Female Not too happy 62 26 12 3

Pretty happy 155 156 95 15

Very happy 87 127 76 15

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 99 / 143

§6.6.1 Conditional models: Symmetric models (optional)1

Symmetric models

Suppose the response vector Y = (y1, · · · , ym).

If there is no natural ordering among y1, · · · , ym, or if one does not

want to use this ordering, models that treat response components in a

symmetric way are more sensible. An example is Visual Impairment

Study seen in Chapter 1.

Now focus on the case where all y1, · · · , ym are binary. (General

cases can be handled similarly.)

Symmetric conditional models can be developed by specifying a

conditional distribution to be used:

P(yj = 1|yk , k 6= j ; xj), j = 1, · · · ,m (21)

For binary responses such conditional distributions uniquely determine

the joint distribution.

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 100 / 143

§6.6.1 Example: Visual impairment study

Table 9: Visual impairment data, from Liang, Zeger & Qaqish (1992)

White Black

Visual Age

impairment 40-50 51-60 61-70 70+ 40-50 51-60 61-70 70+ Total

Left eye

Yes 15 24 42 139 29 38 50 85 422

No 617 557 789 673 750 574 473 344 4777

Right eye

Yes 19 25 48 146 31 37 49 93 448

No 613 556 783 666 748 575 474 336 4751

Vector binary response variable (y1, y2), where y1 = 1 if left-eye

impaired, 0 otherwise; y2 = 1 if right-eye impaired, 0 otherwise.

Covariates: Age (yrs.), Race (W or B).

Aim: find the effect of race and age on visual impairment.

Complication: y1 and y2 are correlated.

Methods: multivariate models for correlated responses; conditional

models; asymmetric models, marginal models, GEE, etc..

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 101 / 143

§6.6.1 Conditional models: Symmetric models (optional)2

Symmetric models

Symmetric logistic models are a natural choice for (21):

pi = P(yj = 1|yk , k 6= j ; xj) = h(α(wj ;θ) + xTj βj), j = 1, · · · ,m

(22)

where h(t) =

et

1 + et

is the logistic cdf; and α(·) is some function of a

parameter θ and wj =

∑

k 6=j yk .

When k = 2,

pi = P(yj = 1|yk , k 6= j ; xj) = h(θ0+θ1yk +xTj βj), j , k = 1, 2. (23)

Full likelihood estimation procedure for (22) or (23) may be

computationally cumbersome, because of the normalising constant

involved in the joint pmf.

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 102 / 143

§6.6.1 Conditional models: Symmetric models (optional)3

Instead, a quasi-likelihood approach (Conolly and Liang, 1988) may

be used, with an independent working quasi-likelihood and

quasi-score function for each cluster i(= 1, · · · , n):

Li (β,θ) =

m∏

j=1

pi

yij

ij (1− piij)1−yij

Si (β,θ) =

m∑

j=1

∂piij

∂(βT ,θT )T

σ−2ij (yij − piij(β,θ))

where yi = (yi1, · · · , yij , · · · , yim)T are the responses in cluster i ,

piij(β,θ) = P(yij = 1|·) is defined by (22), and

σ2ij = piij(β,θ)(1− piij(β,θ)).

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 103 / 143

§6.6.1 Conditional models: Symmetric models (optional)4

Denoting Mi = diag

{

∂pii1

∂(βT ,θT )T

, · · · , ∂piim

∂(βT ,θT )T

}

,

Σi = diag{σ2i1, · · · , σ2im} and pi = (pii1, · · · , piim)T , we can rewrite

Si (β,θ) in matrix form

Si (β,θ) = MiΣ

−1

i (yi − pii )

which is a multivariate extension of the quasi-score.

Roots (βˆ, θˆ) of the resulting generalised estimating equation (GEE)

S(β,θ) =

n∑

i=1

Si (β,θ) = 0

are consistent & asymptotically normal under regularity assumptions:

(βˆ

T

, θˆ

T

)T

a∼ N((βT ,θT )T , Fˆ−1Vˆ Fˆ−1)

with Fˆ =

∑n

i=1 Mˆi Σˆ

−1

i Mˆi and Vˆ =

∑n

i=1 Sˆi Sˆ

T

i .

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 104 / 143

§6.6.1 Two drawbacks of the conditional models

1 They measure the effect of x on a binary component yj conditional on

incorporating into the effects of other yk , k 6= j . Thus the model is

not able to provide prediction based on x alone.

2 Interpretation of the effects depends on the dimension of y.

Both drawbacks can be avoided by using a marginal modelling approach.

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 105 / 143

§6.6.2 Marginal models

In many situations the primary interest is to analyse the marginal

mean of the response given the covariates. The association between

the responses is often of secondary interest. An example is the visual

impairment study.

Marginal models were first proposed by Liang & Zeger (1986) and

Zeger & Liang (1986) in the context of longitudinal data with many

short time series. Their modelling and estimation approach is based

on GEE and has subsequently been modified and further developed.

Focus in this subsection is on modelling and estimating marginal

means of correlated and categorical response data by the first-order

generalised estimating equations (GEE1).

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 106 / 143

§6.6.2 Marginal models for correlated univariate responses (1)

In marginal models, the effects of covariates on responses and the

association between responses is modelled separately.

Let yi = (yi1, · · · , yimi )T be the vector of responses, and

xTi = (x

T

i1, · · · , xTimi )

be the vector of covariates from cluster i , i = 1, · · · , n. Here the

cluster size mi may vary with the cluster index i .

Response observations within the same cluster are correlated.

Response observations between different clusters are assumed

independent of each other.

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 107 / 143

§6.6.2 Marginal models for correlated univariate responses (2)

Specification of a marginal model is as following

1 The marginal means of yij , j = 1, · · · ,mi , are assumed correctly

specified by common univariate response models:

µij(β) = E (yij |xij) = h(zTij β) (24)

where h(·) is a response function, e.g. a logit function, and zij is an

appropriate design vector.

2 The marginal variance of each yij is specified as a function of µij :

σ2ij = var(yij |xij) = v(µij)φ (25)

where v(·) is the variance function of known form.

3 The correlation between yij and yik is a function of µij = µij(β),

µik = µik(β) and perhaps of additional association parameters α:

corr(yij , yik) = c(µij , µik ,α) (26)

with the function c(·, ·, ·) being of known form.

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 108 / 143

§6.6.2 Marginal models for correlated univariate responses (3)

Remarks:

1 Parameters (β,α) are the same for each cluster. Hence marginal

models are appropriate for analysing population-averaged effects.

2 Marginal effects β can be consistently estimated even if the correlation

function is incorrectly specified. However, the efficiency of the

estimator βˆ can be compromised.

Since the primary scientific objective is often the regression

relationship, it is important to correctly specify the marginal mean

structure (24), while c(µij , µik ,α) only needs to be a working

correlation for the association between yij and yik . Together with

(25) a working covariance matrix cov(yi ) = Σi (β,α) can be

obtained, where an additional dispersion parameter φ is involved but

suppressed for presentation simplicity.

Two main approaches for specifying the working correlations or

covariances have been considered in the literature.

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 109 / 143

§6.6.2 Marginal models for correlated univariate responses (4)

First approach: The variance structure (25) is supplemented by a

working correlation matrix Ri (α), so that the working covariance

matrix is of the form

Σi (β,α) = C

1/2

i (β)Ri (α)C

1/2

i (β),

where Ci (β) = diag [var(yij |xij)] = diag{σ2i1, · · · , σ2imi}. Common choices

for Ri (α):

1 working independence model: Ri (α) = I , the identity matrix.

2 equicorrelation (or exchangeable) model: corr(yij , yik) = α for all

j 6= k, i.e. α reduces to be a scalar, and Ri (α) =

1 α · · · α

α 1 · · · α

...

...

. . .

...

α α · · · 1

.

3 Ri (α) completely unspecified, except being positive definite, i.e.

αjk = corr(yij , yik), j < k.

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 110 / 143

§6.6.2 Marginal models for correlated univariate responses (5)

Second approach: For binary responses the odds ratio is an alternative

measure of association that is easier to interpret and has some desirable

properties.

The odds ratio for yij , yik , j , k = 1, · · · ,m, j 6= k, is defined as

γijk =

P(yij = 1, yik = 1) · P(yij = 0, yik = 0)

P(yij = 1, yik = 0) · P(yij = 0, yik = 1)

From this definition, the probability P(yij = yik = 1), denoted as pii11,

can be shown to satisfy

pii11 = E (yijyik) =

{

1−(piij+piik )(1−γijk )−s(piij ,piik ,γijk)

2(γijk−1) if γijk 6= 1

piijpiik if γijk = 1

(27)

with

s(piij , piik , γijk) =

(

[1−(piij +piik)(1−γijk)]2 − 4(γijk−1)γijkpiijpiik

)1/2

.

It can be seen that the covariance matrix of yi = (yi1 · · · , yimi )T can

be expressed as a function of (piij , piik , γijk).

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 111 / 143

§6.6.2 Marginal models for correlated univariate responses (6)

One advantage of the odds ratio is that it is unconstrained, being able

to take any positive values. In contrast,

corr(yij , yik) =

pii11 − piijpiik√

piij(1− piij)piik(1− piik)

,

the correlation between yij and yik is constrained (i.e. its possible

values may all fall into a subset of [−1, 1]), because the involved pii11

is constrained by

max(0, piij + piik − 1) ≤ pii11 ≤ min(piij , piik).

Note

pii11 = P(yij = 1) + P(yik = 1)−P(yij = 1 or yik = 1) ≥ piij +piik − 1.

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 112 / 143

§6.6.2 Marginal models for correlated univariate responses (7)

To reduce the number of unknown parameters involved in the odds

ratio, we assume γijk = γijk(α), a function of a vector parameter α.

Common choices of γijk(α):

1 γijk = γ, for all i , j , k.

2 log γijk = α

Tωijk .

Using the relation (27), cov(yi ) = Σi (β,α) is also a function of β

and α.

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 113 / 143

§6.6.2 Marginal models for correlated univariate responses (8)

Some examples of marginal models:

I Continuous responses:

µij(β) = E (yij |xij) = zTij β; var(yij |xij) = φ = σ2; corr(yij , yik) = αjk .

II Binary responses:

µij(β) = piij(β) = P(yij = 1|xij), log piij(β)

1− piij(β) = z

T

ij β;

var(yij |xij) = piij(β)(1− piij(β));

corr(yij , yik) = 0 (independence struc.) or γijk = α (equal odds ratio).

III Count data:

logµij(β) = log E (yij |xij) = zTij β;

var(yij |xij) = µij(β)φ;

corr(yij , yik) = α (equicorrelation).

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 114 / 143

§6.6.2 GEE approach in statistical inference (1)

For cluster i = 1, · · · , n, let yi = (yi1, · · · , yimi )T , xi = (xTi1, · · · , xTimi )T ,

µi (β) = (µi1(β1), · · · , µimi (βmi ))T and Σi (β,α) be defined as before.

In linear models for Gaussian responses, there is no problem to

perform maximum likelihood estimation, since specification of means

and covariances determines the likelihood function.

This is not the case with non-Gaussian data. Therefore a generalised

estimating approach is proposed.

Keeping the association parameter α and φ, if present, fixed for the

moment, the generalised estimating equation (GEE) for effect β is

Sβ(β,α) =

n∑

i=1

ZTi Di (β)Σ

−1

i (β,α)(yi − µi (β)) = 0, (28)

with mi -row design matrix Zi = (zi1, · · · , zimi )T and diagonal

matrices Di (β) = diag{Dij(β)}, Dij(β) = ∂h∂ηij evaluated at ηij = zTij β.

The middle-part of (28) is a multivariate quasi-score function.

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 115 / 143

§6.6.2 GEE approach in statistical inference (2)

A GEE estimate βˆ is computed by iterating a modified Fisher scoring

algorithm w.r.t. β and estimation of α (and φ):

(i) Given current estimates αˆ (and φˆ), the GEE (28) for βˆ is solved by

the iterations

βˆ

(k+1)

= βˆ

(k)

+ (Fˆ (k))−1Sˆβ(βˆ

(k)

, αˆ), k = 0, 1, 2, · · · ,

with

Fˆ (k) =

n∑

i=1

ZTi Di (βˆ

(k)

)Σ−1i (βˆ

(k)

, αˆ)Di (βˆ

(k)

)Zi

being the observed quasi-information matrix.

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 116 / 143

§6.6.2 GEE approach in statistical inference (3)

(ii) If α is unknown, it can be consistently estimated by a method of

moments based on Pearson residuals rˆij =

yij − µˆij√

v(µˆij)

.

The dispersion parameter φ is estimated consistently by

φˆ =

1

N − p

n∑

i=1

mi∑

j=1

rˆ 2ij , with N =

∑n

i=1 mi and p = dim(β).

Estimation of α depends on the choice of Ri (α). For exchangeable

correlation matrix Ri (α) with dim(α) = 1,

αˆ =

[

φˆ

{

n∑

i=1

1

2

mi (mi − 1)− p

}]−1 n∑

i=1

∑

k>j

rˆik rˆij .

An unspecified working correlation matrix R can be estimated by

Rˆ =

1

nφˆ

n∑

i=1

C

− 12

i (yi − µˆi )(yi − µˆi )TC

− 12

i

if all cluster sizes mi = m and m << n.

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 117 / 143

§6.6.2 GEE approach in statistical inference (4)

Cycling between Fisher scoring steps for β and consistent estimation

of α and φ leads to a consistent estimation of β.

Alternatively, α (and possible φ) can be estimated by simultaneously

solving an additional estimating equation. Details are not pursued

here.

Under regularity conditions, the GEE estimator βˆ satisfies

βˆ

a∼ N(β,F−1VF−1)

with F =

∑n

i=1 Z

T

i DiΣ

−1

i DiZi and V =

∑n

i=1 Z

T

i DiΣ

−1

i SiΣ

−1

i DiZi ,

where Si is the true covariance matrix of yi .

cov(βˆ) is approximated by the “sandwich matrix”:

Aˆ = Fˆ−1

{

n∑

i=1

ZTi Dˆi Σˆ

−1

i (yi − µˆi )(yi − µˆi )T Σˆ−1i DˆiZi

}

Fˆ−1

cov(Sβ(β,α)) = V and Sβ(β,α)

a∼ N(0,V ).

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 118 / 143

§6.6.2 Visual impairment example (1)

We use GEE approach to fit a marginal model to the visual impairment

data. R packages gee with command gee() and geepack with command

geeglm() can be used. (There are a few problems with both packages).

library(gee); library(geepack)

VI.dat <- read.csv("D:/MAST90139/Visual-impairment.csv"); VI.dat

ID Yes No Age Race

1 1 15 617 40to50 White

2 1 19 613 40to50 White

3 2 24 557 51to60 White

4 2 25 556 51to60 White

5 3 42 789 61to70 White

6 3 48 783 61to70 White

7 4 139 673 70+ White

8 4 146 666 70+ White

9 5 29 750 40to50 Black

10 5 31 748 40to50 Black

11 6 38 574 51to60 Black

12 6 37 575 51to60 Black

13 7 50 473 61to70 Black

14 7 49 474 61to70 Black

15 8 85 344 70+ Black

16 8 93 336 70+ Black

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 119 / 143

§6.6.2 Visual impairment example (2)

VI1e<- gee(cbind(Yes, No) ~ Age + Race, id = ID,

data = VI.dat, family = binomial, corstr = "exchangeable")

summary(VI1e)

GEE: GENERALIZED LINEAR MODELS FOR DEPENDENT DATA

gee S-function, version 4.13 modified 98/01/27 (1998)

Model:

Link: Logit

Variance to Mean Relation: Binomial

Correlation Structure: Exchangeable

Call:

gee(formula = cbind(Yes, No) ~ Age + Race, id = ID, data = VI.dat,

family = binomial, corstr = "exchangeable")

Summary of Residuals:

Min 1Q Median 3Q Max

15.0 28.0 39.9 58.6 145.8

Coefficients:

Estimate Naive S.E. Naive z Robust S.E. Robust z

(Intercept) -3.225 0.1125 -28.66 0.0254 -127.22

Age51to60 0.478 0.1451 3.30 0.0167 28.65

Age61to70 0.837 0.1348 6.21 0.0905 9.26

Age70+ 1.973 0.1227 16.08 0.0531 37.13

RaceWhite -0.350 0.0768 -4.56 0.0662 -5.28

Estimated Scale Parameter: 0.571

Number of Iterations: 1

Working Correlation

[,1] [,2]

[1,] 1.000 0.886

[2,] 0.886 1.000

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 120 / 143

§6.6.2 Visual impairment example (3)

Call:

gee(formula = cbind(Yes, No) ~ Age + Race, id = ID, data = VI.dat,

family = binomial, corstr = "exchangeable")

Summary of Residuals: ###Residuals calculation is not correct in gee().

Min 1Q Median 3Q Max

15.0 28.0 39.9 58.6 145.8

Coefficients:

Estimate Naive S.E. Naive z Robust S.E. Robust z

(Intercept) -3.225 0.1125 -28.66 0.0254 -127.22

Age51to60 0.478 0.1451 3.30 0.0167 28.65

Age61to70 0.837 0.1348 6.21 0.0905 9.26

Age70+ 1.973 0.1227 16.08 0.0531 37.13

RaceWhite -0.350 0.0768 -4.56 0.0662 -5.28

Estimated Scale Parameter: 0.571

Number of Iterations: 1

Working Correlation

[,1] [,2]

[1,] 1.000 0.886

[2,] 0.886 1.000

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 121 / 143

§6.6.2 Visual impairment example (4)

total=VI.dat$Yes+VI.dat$No

pearson.res1e=

(VI.dat$Yes-total*VI1e$fitted)/sqrt(total*(VI1e$fitted)*(1-VI1e$fitted))

sum(pearson.res1e^2)/11

[1] 0.5708722 #estimate of \phi based on Pearson residuals

> VI1e$scale

[1] 0.5708722

(VI1e$scale*3)^(-1)*sum(pearson.res1e[2*(1:8)-1]*pearson.res1e[2*(1:8)])

[1] 0.8861748

> VI1e$working.correlation

[,1] [,2]

[1,] 1.0000000 0.8861748

[2,] 0.8861748 1.0000000

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 122 / 143

§6.6.2 Visual impairment example (5)

VI3 <- geeglm(cbind(Yes, No)~Age + Race, family=binomial(link="logit"),

data=VI.dat, id=ID, corstr = "exchangeable", std.err="san.se")

summary(VI3)

Call:

geeglm(formula = cbind(Yes, No) ~ Age + Race, family = binomial(link = "logit"),

data = VI.dat, id = ID, corstr = "exchangeable", std.err = "san.se")

Coefficients:

Estimate Std.err Wald Pr(>|W|)

(Intercept) -3.2253 0.0254 16184.7 < 2e-16 ***

Age51to60 0.4783 0.0167 820.7 < 2e-16 ***

Age61to70 0.8374 0.0905 85.7 < 2e-16 ***

Age70+ 1.9728 0.0531 1378.3 < 2e-16 ***

RaceWhite -0.3498 0.0662 27.9 1.3e-07 ***

---

Estimated Scale Parameters: #IScale seems defined differently in gee() and geeglm()

Estimate Std.err

(Intercept) 0.000604 0.000323

Correlation: Structure = exchangeable Link = identity

Estimated Correlation Parameters:

Estimate Std.err #The estimation here is about half of that in gee().

alpha 0.441 0.274

Number of clusters: 8 Maximum cluster size: 2

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 123 / 143

§6.6.2 Visual impairment example (6)

pearson.res3=(VI.dat$Yes/total-VI3$fitted)/sqrt((VI3$fitted)*(1-VI3$fitted)/total)

phi.est=sum(pearson.res3^2)/11 #=0.5709

sum((summary(VI3)$deviance.resid)^2)/11

[1] 0.577 #very close to 0.5709.

#Using deviance residuals or Pearson residuals give similar results.

VI3$geese$gamma #How is gamma defined?

(Intercept)

0.000604

(phi.est*3)^(-1)*sum(pearson.res3[2*(1:8)-1]*pearson.res3[2*(1:8)])

[1] 0.8862 #estimated alpha

(0.577*3)^(-1)*sum(summary(VI3)$devi[2*(1:8)-1]*summary(VI3)$devi[2*(1:8)])

[1] 0.872

VI3$geese$alpha

alpha

0.4415 #approximately half of 0.8862. Then How is alpha defined here?

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 124 / 143

§6.6.2 Marginal models for correlated responses having k categories (1)

Suppose categorical responses Yij , j = 1, · · · ,mi , are observed in

cluster i , i = 1, · · · , n.

For simplicity, each Yij has the same k categories and is coded by a

dummy vector

yij = (yij1, · · · , yijq)T , q = k − 1

Let yTi = (y

T

i1, · · · , yTimi ) be observations of yij in cluster i ;

xTi = (x

T

i1, · · · , xTimi ) be the corresponding covariate observations.

For data involving categorical responses, a marginal categorical

response model can be defined for each response variable, and then

supplemented by a working association model to relate to the

responses with each other with in each cluster.

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 125 / 143

§6.6.2 Marginal models for correlated responses having k categories (2)

Here we focus on ordinal responses. Other types of categorical responses

can be handled similarly.

(i) The vector of marginal means or categorical probabilities of Yij is

assumed being correctly specified by an ordinal response model :

piij(β) = (piij1(β), · · · , piijq(β))T = h(Zijβ)

with piijr = P(Yij ≤ r |xij) =

∑r

`=1 P(yij` = 1|xij), and the response

function h(·) and design matrix Zij .

(ii) The marginal covariance function of yij is given by

Σij = cov(yij |xij) = diag(piij)− piijpiTij

i.e. the covariance matrix of a multinomial random variable.

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 126 / 143

§6.6.2 Marginal models for correlated responses having k categories (3)

(iii) Association between Yij and Yik is modelled by a working correlation matrix

Ri or by global cross-ratios. For example, the working matrix of

exchangeable correlations is

Ri (α) =

I Q · · · Q

QT I · · · Q

...

...

. . .

...

QT QT · · · I

where Qq×q contains α to be estimated by a method of moments.

Global cross-ratios (GCR), for each pair of categories ` and m of Yij and

Yik , are defined as

γijk (`,m) =

P(Yij ≤ `,Yik ≤ m)P(Yij > `,Yik > m)

P(Yij > `,Yik ≤ m)P(Yij ≤ `,Yik > m)

GCR can be modelled log-linearly, i.e. log(γijk(`,m)) = α`m or by a

regression model including covariate effects.

For the model specified by (i), (ii) and (iii), the involved regression and association

parameters can be estimated by a multivariate extension of the GEE approach

discussed before. Details not pursued here.

R packages multgee, geepack and repolr may be used to fit the above model.

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 127 / 143

§6.6.2 Likelihood-based inference for marginal models

The GEE approach is not likelihood-based as it does not require

specification of the joint distribution of multivariate response vector

yi , i = 1, · · · , n.

Difficulty with the likelihood-based inference is due to the difficulty in

formulating this joint distribution.

So in general it is difficult to perform likelihood-based inference for

marginal models.

But there are new developments:

Constructing joint multivariate distribution by copulas

Vector GLM

Bayesian inference.

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 128 / 143

§6.6.3 Marginal models for longitudinal data (1)

In many longitudinal studies, data consist of a small or moderate

number of repeated observations for many subjects; and the main

objective is to analyse the effects of covariates on a response variable,

without conditioning on previous responses.

In this setting it is more natural to view the data as a cross section of

individual time series yi = (y

T

i1, · · · , yTiTi )T , xi = (xTi1, · · · , xTiTi )T ,

i = 1, · · · , n. It is also often assume that, given the covariates,

individual time series are mutually independent.

Marginal models introduced in §3.5.2 can be used for these cases.

Suppose yit = (yit1, · · · , yitq)T are q = k − 1 dummy variables

describing a multi-categorical response Yit with k categories.

For simplicity, assume the number of categories is the same for all Yit

responses.

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 129 / 143

§6.6.3 Marginal models for longitudinal data (2)

Marginal GEE models for longitudinal data are then defined as

(i) The marginal mean is correctly specified by a response model

µit(β) = E (yit |xit) = h(ηit), ηit = Zitβ

where ηit is q-dimensional with components ηitr = z

T

itrβ, z

T

itr is the

rth row of Zit .

(ii) The association structure for yi = (y

T

i1, · · · , yTiTi )T is specified by a

“working” covariance matrix Σi (β,α), depending on regression

parameters β, association parameters α, and possibly an additional

nuisance parameter φ.

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 130 / 143

§6.6.3 Marginal models for longitudinal data (3)

Σi (β,α) can be determined by a variance function for yit together

with a “working” correlation matrix Ri (α) or some odds ratio model

γ(α).

Common choices for Ri (α):

1 stationary correlations between yis and yit

(Ri (α))st = α(|t − s|) (29)

2 its special form α(|t − s|) = α|t−s|. AR(1) correlation.

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 131 / 143

Statistical inference in marginal GEE models (1)

Estimation of β using GEEs is carried out in complete analogy to

§3.5.2.

Denote E (yi |xi ) = µi (β) = (µTi1, · · · ,µTiTi )T , ZTi = (Zi1, · · · ,ZiTi ),

and (block-) diagnol matrices Di (β) = diag{Dit(β)}, with

Dit(β) =

∂hT

∂ηit

|ηit= Zitβ.

The GEE for β is

sβ(β,α) =

n∑

i=1

ZTi Di (β)Σ

−1

i (β,α)(yi − µi (β)) = 0 (30)

Given current estimates αˆ and φˆ, the GEE (30) for βˆ is solved by

βˆ

(k+1)

= βˆ

(k)

+

(

Fˆ (k)

)−1

sˆ(k), k = 1, 2, · · ·

with the “working” Fisher information matrix

Fˆ (k) =

∑n

i=1 Z

T

i Di (βˆ

(k)

)Σ−1i (βˆ

(k)

, αˆ)DTi (βˆ

(k)

)Zi and

sˆ(k) = sβ(βˆ

(k)

, αˆ).

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 132 / 143

Statistical inference in marginal GEE models (2)

Given a current estimate of β, α and φ can be estimated from

Pearson residuals

rˆit =

yit − µˆit

(v(µˆit))1/2

by the method of moments.

1 φˆ =

1

N − p

n∑

i=1

Ti∑

t=1

rˆ 2it , N =

n∑

i=1

Ti

2 Estimation of α depends on the choice of Ri (α). For the exchangeable

correlation model

αˆ =

{

φˆ

[

n∑

i=1

1

2

Ti (Ti − 1)− p

]}−1 n∑

i=1

∑

t>s

rˆit rˆis

3 A totally unspecified R = R(α) can be estimated if T << n.

4 In particular for binary and categorical observations, estimation of α

via GEE2 is often preferrable.

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 133 / 143

Statistical inference in marginal GEE models (3)

Consistency and asymptotic normality results for βˆ can be obtained

along the lines of asymptotic theory of quasi-MLEs for independent

observations if Ti , i = 1, · · · , n are fixed and n→∞.

If αˆ is

√

n-consistent given β and φ; and φˆ is

√

n-consistent given β,

then βˆ is

√

n-consistent and asymptotically normal, i.e.

βˆ

a∼ N(β,F−1VF−1)

with F =

n∑

i=1

ZTi DiΣ

−1

i D

T

i Zi , V =

n∑

i=1

ZTi DiΣ

−1

i Cov(yi )Σ

−1

i D

T

i Zi .

Asymptotic covariance matrix A = F−1VF−1 can be consistently

estimated by replacing β, α and φ by their respective consistent

estimates and Cov(yi ) by (yi − µˆi )(yi − µˆi )T , i.e. by sandwich matrix

Aˆ = Ĉov(βˆ)

a≈ Fˆ−1

{

n∑

i=1

ZTi Dˆi Σˆ

−1

i (yi−µˆi )(yi−µˆi )T Σˆ−1i DˆTi Zi

}

Fˆ−1 (31)

Based on the preceeding asymptotic result, confidence intervals for β

and Wald and score tests may be constucted; cf. §2.3.1.

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 134 / 143

Example 6.4 Ohio children (1)

The ohio data concern 537 children from Steubenville, Ohio and

were taken as part of a study on the effects of air pollution.

Children were in the study for four years from age seven to ten.

The response is whether they wheezed or not.

The variables are

resp: an indicator of wheeze status (1=yes, 0=no)

id: an identifier for the child, taking values from 0 to 536

age: 7 yrs = −2; 8 yrs = −1; 9 yrs =0; 10 yrs =1

smoke: mother’s smoking status at start of the study

(1=smoker, 0=nonsmoker)

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 135 / 143

Example 6.4 Ohio children (2)

Some analysis has been done to the data in R, producing the following

output.

> head(ohio)

resp id age smoke

1 0 0 -2 0

2 0 0 -1 0

3 0 0 0 0

4 0 0 1 0

5 0 1 -2 0

6 0 1 -1 0

> tail(ohio)

resp id age smoke

2143 1 535 0 1

2144 1 535 1 1

2145 1 536 -2 1

2146 1 536 -1 1

2147 1 536 0 1

2148 1 536 1 1

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 136 / 143

Example 6.4 Ohio children (3)

> str(ohio)

'data.frame': 2148 obs. of 4 variables:

$ resp : int 0 0 0 0 0 0 0 0 0 0 ...

$ id : int 0 0 0 0 1 1 1 1 2 2 ...

$ age : int -2 -1 0 1 -2 -1 0 1 -2 -1 ...

$ smoke: int 0 0 0 0 0 0 0 0 0 0 ...

> library(geepack}

> fit.exch <- geeglm(resp~age+smoke, family=binomial(link="logit"),

data=ohio, id=id, corstr = "exchangeable", std.err="san.se")

> summary(fit.exch)

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 137 / 143

Example 6.4 Ohio children (4)

> summary(fit.exch)

Call:

geeglm(formula = resp ~ age + smoke, family = binomial(link = "logit"),

data = ohio, id = id, corstr = "exchangeable", std.err = "san.se")

Coefficients:

Estimate Std.err Wald Pr(>|W|)

(Intercept) -1.880 0.114 272.597 < 2e-16 ***

age -0.113 0.044 6.684 0.00973 **

smoke 0.265 0.178 2.224 0.13588

---

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Estimated Scale Parameters:

Estimate Std.err

(Intercept) 0.9985 0.1116

Correlation: Structure = exchangeable Link = identity

Estimated Correlation Parameters:

Estimate Std.err

alpha 0.3543 0.06244

Number of clusters: 537 Maximum cluster size: 4

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 138 / 143

Example 6.4 Ohio children (5)

Use the above output to answer the following questions.

1 Let yit be the response value resp of child i at age t. Write down the

model involved in the analysis, including the mean, variance and

correlation coefficient of yit ’s. Give the estimates of the parameters

appearing in the model.

2 Write down the model’s design matrix for data where id=536.

3 Estimate the odds ratio of wheezing for a child for every one year

older in age. Also calculate an approximate standard error of your

odds ratio estimate.

4 Estimate the odds ratio of wheezing for a 10-year old child with a

smoking mother versus a 9-year old child with nonsmoking mother.

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 139 / 143

Example 6.4 Ohio children (6)

1 Let yit be the response value resp of child i at age t. Write down the

model involved in the analysis, including the mean, variance and

correlation coefficient of yit ’s. Give the estimates of the parameters

appearing in the model.

Let piit = P(yit = 1|age, smoke), t = 1, · · · , 4; i = 1, · · · , 537. Then

the model is specified by the following

log

pˆiit

1− pˆiit = βˆ0 + βˆ1 · age + βˆ2 · smoke

= −1.880− 0.113× age + 0.265× smoke

V̂ar(yit) = φˆpˆiit(1− pˆiit) = 0.9985e

−1.880−0.113·age+0.265·smoke

1 + e−1.880−0.113·age+0.265·smoke

Ĉorr(yit , yis) = αˆ = 0.3543 for t 6= s

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 140 / 143

Example 6.4 Ohio children (7)

2 Write down the model’s design matrix for data where id=536.

Z (id = 536) =

1 −2 1

1 −1 1

1 0 1

1 1 1

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 141 / 143

Example 6.4 Ohio children (8)

3 Estimate the odds ratio of wheezing for a child for every one year

older in age. Also calculate an approximate standard error of your

odds ratio estimate.

ÔR = eβˆ1 = e−0.113 = 0.893

s.e.(ÔR) ≈ eβˆ1 · s.e.(βˆ1) = e−0.113 × 0.044 = 0.039.

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 142 / 143

Example 6.4 Ohio children (9)

4 Estimate the odds ratio of wheezing for a 10-year old child with a

smoking mother versus a 9-year old child with nonsmoking mother.

For a 10-year old child with a smoking mother

log

pˆiit

1− pˆiit = βˆ0 + βˆ1 · 1 + βˆ2 · 1 = −1.880− 0.113 + 0.265

For a 9-year old child with a non-smoking mother

log

pˆiit

1− pˆiit = βˆ0 + βˆ1 · 0 + βˆ2 · 0 = −1.880

Therefore

ÔR =

eβˆ0+βˆ1+βˆ2

eβˆ0

= eβˆ1+βˆ2 = e−0.113+0.265 = e0.152 = 1.164.

Multivariate GLM (Ch6) §6.6 Multivariate models for correlated responses Guoqi Qian 143 / 143