STATISTICS 330 COURSE NOTES
Cyntha A. Struthers
Department of Statistics and Actuarial Science, University of Waterloo
Spring 2021 Edition

Contents

1. Preview
  1.1 Example
  1.2 Example

2. Univariate Random Variables
  2.1 Probability
  2.2 Random Variables
  2.3 Discrete Random Variables
  2.4 Continuous Random Variables
  2.5 Location and Scale Parameters
  2.6 Functions of a Random Variable
  2.7 Expectation
  2.8 Inequalities
  2.9 Variance Stabilizing Transformation
  2.10 Moment Generating Functions
  2.11 Calculus Review
  2.12 Chapter 2 Problems

3. Multivariate Random Variables
  3.1 Joint and Marginal Cumulative Distribution Functions
  3.2 Bivariate Discrete Distributions
  3.3 Bivariate Continuous Distributions
  3.4 Independent Random Variables
  3.5 Conditional Distributions
  3.6 Joint Expectations
  3.7 Conditional Expectation
  3.8 Joint Moment Generating Functions
  3.9 Multinomial Distribution
  3.10 Bivariate Normal Distribution
  3.11 Calculus Review
  3.12 Chapter 3 Problems

4. Functions of Two or More Random Variables
  4.1 Cumulative Distribution Function Technique
  4.2 One-to-One Transformations
  4.3 Moment Generating Function Technique
  4.4 Chapter 4 Problems

5. Limiting or Asymptotic Distributions
  5.1 Convergence in Distribution
  5.2 Convergence in Probability
  5.3 Weak Law of Large Numbers
  5.4 Moment Generating Function Technique for Limiting Distributions
  5.5 Additional Limit Theorems
  5.6 Chapter 5 Problems

6. Maximum Likelihood Estimation - One Parameter
  6.1 Introduction
  6.2 Maximum Likelihood Method
  6.3 Score and Information Functions
  6.4 Likelihood Intervals
  6.5 Limiting Distribution of Maximum Likelihood Estimator
  6.6 Confidence Intervals
  6.7 Approximate Confidence Intervals
  6.8 Chapter 6 Problems

7. Maximum Likelihood Estimation - Multiparameter
  7.1 Likelihood and Related Functions
  7.2 Likelihood Regions
  7.3 Limiting Distribution of Maximum Likelihood Estimator
  7.4 Approximate Confidence Regions
  7.5 Chapter 7 Problems

8. Hypothesis Testing
  8.1 Test of Hypothesis
  8.2 Likelihood Ratio Tests for Simple Hypotheses
  8.3 Likelihood Ratio Tests for Composite Hypotheses
  8.4 Chapter 8 Problems

9. Solutions to Chapter Exercises
  9.1 Chapter 2
  9.2 Chapter 3
  9.3 Chapter 4
  9.4 Chapter 5
  9.5 Chapter 6
  9.6 Chapter 7
  9.7 Chapter 8

10. Solutions to Selected End of Chapter Problems
  10.1 Chapter 2
  10.2 Chapter 3
  10.3 Chapter 4
  10.4 Chapter 5
  10.5 Chapter 6
  10.6 Chapter 7
  10.7 Chapter 8

11. Summary of Named Distributions

12. Distribution Tables
Preface
In order to provide improved versions of these Course Notes for students in subsequent
terms, please email corrections, sections that are confusing, or comments/suggestions to
[email protected].
1. Preview
The following examples will illustrate the ideas and concepts discussed in these Course
Notes. They also indicate how these ideas and concepts are connected to each other.
1.1 Example
The number of service interruptions in a communications system over 200 separate days is summarized in the following frequency table:

Number of interruptions   0    1    2    3   4   5   >5   Total
Observed frequency       64   71   42   18   4   1    0     200

It is believed that a Poisson model will fit these data well. Why might this be a reasonable assumption? (PROBABILITY MODELS)

If we let the random variable $X$ = number of interruptions in a day and assume that the Poisson model is reasonable, then the probability function of $X$ is given by
$P(X = x) = \dfrac{\lambda^x e^{-\lambda}}{x!}$ for $x = 0, 1, \ldots$
where $\lambda$ is a parameter of the model which represents the mean number of service interruptions in a day. (RANDOM VARIABLES, PROBABILITY FUNCTIONS, EXPECTATION, MODEL PARAMETERS) Since $\lambda$ is unknown we might estimate it using the sample mean
$\bar{x} = \dfrac{64(0) + 71(1) + \cdots + 1(5)}{200} = \dfrac{230}{200} = 1.15$
(POINT ESTIMATION) The estimate $\hat{\lambda} = \bar{x}$ is the maximum likelihood estimate of $\lambda$. It is the value of $\lambda$ which maximizes the likelihood function. (MAXIMUM LIKELIHOOD ESTIMATION) The likelihood function is the probability of the observed data as a function of the unknown parameter(s) in the model. The maximum likelihood estimate is thus the value of $\lambda$ which maximizes the probability of observing the given data.
In this example the likelihood function is given by
$L(\lambda) = P(\text{observing 0 interruptions 64 times}, \ldots, \text{more than 5 interruptions 0 times}; \lambda)$
$= \dfrac{200!}{64!\,71!\cdots 1!\,0!}\left(\dfrac{\lambda^0 e^{-\lambda}}{0!}\right)^{64}\left(\dfrac{\lambda^1 e^{-\lambda}}{1!}\right)^{71}\cdots\left(\dfrac{\lambda^5 e^{-\lambda}}{5!}\right)^{1}\left(\sum_{x=6}^{\infty}\dfrac{\lambda^x e^{-\lambda}}{x!}\right)^{0}$
$= c\,\lambda^{64(0)+71(1)+\cdots+1(5)}\,e^{-200\lambda} = c\,\lambda^{230}e^{-200\lambda}$ for $\lambda > 0$
where
$c = \dfrac{200!}{64!\,71!\cdots 1!\,0!}\left(\dfrac{1}{0!}\right)^{64}\left(\dfrac{1}{1!}\right)^{71}\cdots\left(\dfrac{1}{5!}\right)^{1}$
The maximum likelihood estimate of $\lambda$ can be found by solving $\dfrac{dL}{d\lambda} = 0$, or equivalently $\dfrac{d\log L}{d\lambda} = 0$, and verifying that the solution corresponds to a maximum.

If we want an interval of values for $\lambda$ which are reasonable given the data then we could construct a confidence interval for $\lambda$. (INTERVAL ESTIMATION) To construct confidence intervals we need to find the sampling distribution of the estimator. In this example we would need to find the distribution of the estimator
$\bar{X} = \dfrac{X_1 + X_2 + \cdots + X_n}{n}$
where $X_i$ = number of interruptions on day $i$, $i = 1, 2, \ldots, 200$. (FUNCTIONS OF RANDOM VARIABLES: cumulative distribution function technique, one-to-one transformations, moment generating function technique) Since $X_i \sim \text{Poisson}(\lambda)$ with $E(X_i) = \lambda$ and $Var(X_i) = \lambda$, the distribution of $\bar{X}$ for large $n$ is approximately $N(\lambda, \lambda/n)$ by the Central Limit Theorem. (LIMITING DISTRIBUTIONS)

Suppose the manufacturer of the communications system claimed that the mean number of interruptions was 1. Then we would like to test the hypothesis $H: \lambda = 1$. (TESTS OF HYPOTHESIS) A test of hypothesis uses a test statistic to measure the evidence in the observed data against the hypothesis. A test statistic with good properties for testing $H: \lambda = \lambda_0$ is the likelihood ratio statistic, $-2\log[L(\lambda_0)/L(\hat{\lambda})]$. (LIKELIHOOD RATIO STATISTIC) For large $n$ the distribution of the likelihood ratio statistic is approximately $\chi^2(1)$ if the hypothesis $H: \lambda = \lambda_0$ is true.
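The calculation above is easy to check numerically. The following is a minimal Python sketch (variable names are illustrative) which computes the maximum likelihood estimate from the frequency table and the likelihood ratio statistic for $H: \lambda = 1$; it returns $\hat{\lambda} = 1.15$.

import numpy as np
from scipy.stats import chi2

counts = np.array([64, 71, 42, 18, 4, 1])        # observed frequencies for x = 0,...,5
x_vals = np.arange(6)

n = counts.sum()                                  # 200 days
lam_hat = (x_vals * counts).sum() / n             # MLE = sample mean = 1.15

def log_lik(lam):
    # log-likelihood up to the additive constant log(c)
    return (x_vals * counts).sum() * np.log(lam) - n * lam

lr_stat = -2 * (log_lik(1.0) - log_lik(lam_hat))  # likelihood ratio statistic
p_value = 1 - chi2.cdf(lr_stat, df=1)             # approximate chi-squared(1) p-value
print(lam_hat, lr_stat, p_value)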
1.2 Example
The following are relief times in hours for 20 patients receiving a pain killer:

1.1  1.4  1.3  1.7  1.9  1.8  1.6  2.2  1.7  2.7
4.1  1.8  1.5  1.2  1.4  3.0  1.7  2.3  1.6  2.0

It is believed that the Weibull distribution with probability density function
$f(x) = \dfrac{\beta}{\theta^{\beta}}\,x^{\beta-1}e^{-(x/\theta)^{\beta}}$ for $x > 0$, $\theta > 0$, $\beta > 0$
will provide a good fit to the data. (CONTINUOUS MODELS, PROBABILITY DENSITY FUNCTIONS) Assuming independent observations the (approximate) likelihood function is
$L(\theta, \beta) = \prod_{i=1}^{20}\dfrac{\beta}{\theta^{\beta}}\,x_i^{\beta-1}e^{-(x_i/\theta)^{\beta}}$ for $\theta > 0$, $\beta > 0$
where $x_i$ is the observed relief time for the $i$th patient. (MULTIPARAMETER LIKELIHOODS) The maximum likelihood estimates $\hat{\theta}$ and $\hat{\beta}$ are found by simultaneously solving
$\dfrac{\partial \log L}{\partial \theta} = 0$ and $\dfrac{\partial \log L}{\partial \beta} = 0$
Since an explicit solution to these equations cannot be obtained, a numerical solution must be found using an iterative method. (NEWTON'S METHOD) Also, since the maximum likelihood estimators cannot be given explicitly, approximate confidence intervals and tests of hypothesis must be based on the asymptotic distributions of the maximum likelihood estimators. (LIMITING OR ASYMPTOTIC DISTRIBUTIONS OF MAXIMUM LIKELIHOOD ESTIMATORS)
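The numerical fitting step can be sketched as follows; scipy's general-purpose optimizer stands in here for a hand-coded Newton iteration, and the parameterization matches the Weibull density above.

import numpy as np
from scipy.optimize import minimize

# relief times in hours for the 20 patients
x = np.array([1.1, 1.4, 1.3, 1.7, 1.9, 1.8, 1.6, 2.2, 1.7, 2.7,
              4.1, 1.8, 1.5, 1.2, 1.4, 3.0, 1.7, 2.3, 1.6, 2.0])

def neg_log_lik(params):
    # negative log-likelihood for the Weibull model with scale theta and shape beta
    theta, beta = params
    if theta <= 0 or beta <= 0:
        return np.inf
    return -np.sum(np.log(beta) - beta * np.log(theta)
                   + (beta - 1) * np.log(x) - (x / theta) ** beta)

# iterative maximization starting from a rough guess
res = minimize(neg_log_lik, x0=[2.0, 2.0], method="Nelder-Mead")
theta_hat, beta_hat = res.x
print(theta_hat, beta_hat)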
2. Univariate Random Variables
In this chapter we review concepts that were introduced in a previous probability course such as STAT 220/230/240, as well as introducing new concepts. In Section 2.1 the concepts of random experiments, sample spaces, probability models, and rules of probability are reviewed. The concepts of a sigma algebra and probability set function are also introduced. In Section 2.2 we define a random variable and its cumulative distribution function. It is important to note that the definition of a cumulative distribution function is the same for all types of random variables. In Section 2.3 we define a discrete random variable and review the named discrete distributions (Hypergeometric, Binomial, Geometric, Negative Binomial, Poisson). In Section 2.4 we define a continuous random variable and review the named continuous distributions (Uniform, Exponential, Normal, and Chi-squared). We also introduce new continuous distributions (Gamma, Two Parameter Exponential, Weibull, Cauchy, Pareto). A summary of the named discrete and continuous distributions that are used in these Course Notes can be found in Chapter 11. In Section 2.5 we discuss location and scale parameters. In Section 2.6 we review the cumulative distribution function technique for finding the distribution of a function of a random variable and prove a theorem which can be used in the case of a monotone function. In Section 2.7 we review expectations of functions of random variables. In Sections 2.8-2.10 we introduce new material related to expectation such as inequalities, variance stabilizing transformations and moment generating functions. Section 2.11 contains a number of useful calculus results which will be used throughout these Course Notes.
2.1 Probability
To model real life phenomena for which we cannot predict exactly what will happen, we assign numbers, called probabilities, to outcomes of interest which reflect the likelihood of such outcomes. To do this it is useful to introduce the concepts of an experiment and its associated sample space. Consider some phenomenon or process which is repeatable, at least in theory. We call the phenomenon or process a random experiment and refer to a single repetition of the experiment as a trial. For such an experiment we consider the set of all possible outcomes.
2.1.1 Definition - Sample Space
A sample space $S$ is a set of all the distinct outcomes for a random experiment, with the property that in a single trial, one and only one of these outcomes occurs.

To assign probabilities to the events of interest for a given experiment we begin by defining a collection of subsets of a sample space $S$ which is rich enough to define all the events of interest for the experiment. We call such a collection of subsets a sigma algebra.

2.1.2 Definition - Sigma Algebra
A collection of subsets of a set $S$ is called a sigma algebra, denoted by $\mathcal{B}$, if it satisfies the following properties:
(1) $\emptyset \in \mathcal{B}$ where $\emptyset$ is the empty set
(2) If $A \in \mathcal{B}$ then $\bar{A} \in \mathcal{B}$
(3) If $A_1, A_2, \ldots \in \mathcal{B}$ then $\bigcup_{i=1}^{\infty} A_i \in \mathcal{B}$

Suppose $A_1, A_2, \ldots$ are subsets of the sample space $S$ which correspond to events of interest for the experiment. To complete the probability model for the experiment we need to assign real numbers $P(A_i)$, $i = 1, 2, \ldots$, where $P(A_i)$ is called the probability of $A_i$. To develop the theory of probability these probabilities must satisfy certain properties. The following Axioms of Probability allow a mathematical structure to be developed.

2.1.3 Definition - Probability Set Function
Let $\mathcal{B}$ be a sigma algebra associated with the sample space $S$. A probability set function is a function $P$ with domain $\mathcal{B}$ that satisfies the following axioms:
(A1) $P(A) \ge 0$ for all $A \in \mathcal{B}$
(A2) $P(S) = 1$
(A3) If $A_1, A_2, \ldots \in \mathcal{B}$ are pairwise mutually exclusive events, that is, $A_i \cap A_j = \emptyset$ for all $i \ne j$, then
$P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$

Note: The probabilities $P(A_1), P(A_2), \ldots$ can be assigned in any way as long as they satisfy these three axioms. However, if we wish to model real life phenomena we would assign the probabilities so that they correspond to the relative frequencies of events in a repeatable experiment.
2.1.4 Example
Let $\mathcal{B}$ be a sigma algebra associated with the sample space $S$ and let $P$ be a probability set function with domain $\mathcal{B}$. If $A, B \in \mathcal{B}$ then prove the following:
(a) $P(\emptyset) = 0$
(b) If $A$ and $B$ are mutually exclusive events then $P(A \cup B) = P(A) + P(B)$.
(c) $P(\bar{A}) = 1 - P(A)$
(d) If $A \subseteq B$ then $P(A) \le P(B)$. Note: $A \subseteq B$ means $a \in A$ implies $a \in B$.

Solution
(a) Let $A_1 = S$ and $A_i = \emptyset$ for $i = 2, 3, \ldots$. Since $\bigcup_{i=1}^{\infty} A_i = S$ then by Definition 2.1.3 (A3) it follows that
$P(S) = P(S) + \sum_{i=2}^{\infty} P(\emptyset)$
and by (A2) we have
$1 = 1 + \sum_{i=2}^{\infty} P(\emptyset)$
By (A1) the right side is a series of non-negative terms which must converge to the left side, which equals 1 and is finite. This gives a contradiction unless $P(\emptyset) = 0$, as required.
(b) Let $A_1 = A$, $A_2 = B$, and $A_i = \emptyset$ for $i = 3, 4, \ldots$. Since $\bigcup_{i=1}^{\infty} A_i = A \cup B$ then by (A3)
$P(A \cup B) = P(A) + P(B) + \sum_{i=3}^{\infty} P(\emptyset)$
and since $P(\emptyset) = 0$ by the result in (a) it follows that
$P(A \cup B) = P(A) + P(B)$
(c) Since $S = A \cup \bar{A}$ and $A \cap \bar{A} = \emptyset$ then by (A2) and the result proved in (b) it follows that
$1 = P(S) = P(A \cup \bar{A}) = P(A) + P(\bar{A})$
or
$P(\bar{A}) = 1 - P(A)$
(d) Since $B = (A \cap B) \cup (\bar{A} \cap B) = A \cup (\bar{A} \cap B)$ and $A \cap (\bar{A} \cap B) = \emptyset$ then by (b)
$P(B) = P(A) + P(\bar{A} \cap B)$. But by (A1), $P(\bar{A} \cap B) \ge 0$, so it follows that $P(B) \ge P(A)$.
2.1.5 Exercise
Let $\mathcal{B}$ be a sigma algebra associated with the sample space $S$ and let $P$ be a probability set function with domain $\mathcal{B}$. If $A, B \in \mathcal{B}$ then prove the following:
(a) $0 \le P(A) \le 1$
(b) $P(A \cap \bar{B}) = P(A) - P(A \cap B)$
(c) $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
For a given experiment we are sometimes interested in the probability of an event given that the outcome is known to lie in a certain subset of $S$. For example, the experiment might involve people of different ages and we may be interested in an event only for a given age group. This leads us to define conditional probability.

2.1.6 Definition - Conditional Probability
Let $\mathcal{B}$ be a sigma algebra associated with the sample space $S$ and suppose $A, B \in \mathcal{B}$. The conditional probability of event $A$ given event $B$ is
$P(A|B) = \dfrac{P(A \cap B)}{P(B)}$ provided $P(B) > 0$
2.1.7 Example
The following table of probabilities is based on data from the 2011 Canadian census. The probabilities are for Canadians aged 25-34.

Highest level of education attained            Employed   Unemployed
No certificate, diploma or degree                0.066       0.010
High school diploma or equivalent                0.185       0.016
Postsecondary certificate, diploma or degree     0.683       0.040

If a person is selected at random, what is the probability that the person
(a) is employed?
(b) has no certificate, diploma or degree?
(c) is unemployed and has at least a high school diploma or equivalent?
(d) has at least a high school diploma or equivalent given that they are unemployed?

Solution
(a) Let $E$ be the event "employed", $A_1$ be the event "no certificate, diploma or degree", $A_2$ be the event "high school diploma or equivalent", and $A_3$ be the event "postsecondary certificate, diploma or degree".
$P(E) = P(E \cap A_1) + P(E \cap A_2) + P(E \cap A_3) = 0.066 + 0.185 + 0.683 = 0.934$
(b)
$P(A_1) = P(E \cap A_1) + P(\bar{E} \cap A_1) = 0.066 + 0.010 = 0.076$
(c)
$P(\text{unemployed and has at least a high school diploma or equivalent}) = P(\bar{E} \cap (A_2 \cup A_3)) = P(\bar{E} \cap A_2) + P(\bar{E} \cap A_3) = 0.016 + 0.040 = 0.056$
(d)
$P(A_2 \cup A_3 \mid \bar{E}) = \dfrac{P(\bar{E} \cap (A_2 \cup A_3))}{P(\bar{E})} = \dfrac{0.056}{0.066} = 0.848$
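The same arithmetic, as a small Python sketch (the array layout is purely illustrative):

import numpy as np

# joint probabilities: rows = education (A1, A2, A3), columns = (employed, unemployed)
table = np.array([[0.066, 0.010],
                  [0.185, 0.016],
                  [0.683, 0.040]])

p_employed = table[:, 0].sum()                                 # (a) 0.934
p_A1 = table[0, :].sum()                                       # (b) 0.076
p_unemp_and_hs = table[1:, 1].sum()                            # (c) 0.056
p_hs_given_unemp = p_unemp_and_hs / table[:, 1].sum()          # (d) approximately 0.848
print(p_employed, p_A1, p_unemp_and_hs, p_hs_given_unemp)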
If the occurrence of event $B$ does not affect the probability of the event $A$, then the events are called independent events.

2.1.8 Definition - Independent Events
Let $\mathcal{B}$ be a sigma algebra associated with the sample space $S$ and suppose $A, B \in \mathcal{B}$. $A$ and $B$ are independent events if
$P(A \cap B) = P(A)P(B)$

2.1.9 Example
In Example 2.1.7 are the events "unemployed" and "no certificate, diploma or degree" independent events?

Solution
The events "unemployed" and "no certificate, diploma or degree" are not independent since
$0.010 = P(\bar{E} \cap A_1) \ne P(\bar{E})P(A_1) = (0.066)(0.076)$
2.2 Random Variables
A probability model for a random experiment is often easier to construct if the outcomes of the experiment are real numbers. When the outcomes are not real numbers, the outcomes can be mapped to real numbers using a function called a random variable. When the observed data are numerical values, such as the number of interruptions in a day in a communications system or the length of time until relief after taking a pain killer, random variables are still used in constructing probability models.

2.2.1 Definition of a Random Variable
A random variable $X$ is a function from a sample space $S$ to the real numbers $\Re$, that is,
$X : S \to \Re$
such that $P(X \le x)$ is defined for all $x \in \Re$.
Note: '$X \le x$' is an abbreviation for $\{\omega \in S : X(\omega) \le x\}$ where $\{\omega \in S : X(\omega) \le x\} \in \mathcal{B}$ and $\mathcal{B}$ is a sigma algebra associated with the sample space $S$.
2.2.2 Example
Three friends Ali, Benita and Chen are enrolled in STAT 330. Suppose we are interested in whether these friends earn a grade of 70 or more. If we let $A$ represent the event "Ali earns a grade of 70 or more", $B$ represent the event "Benita earns a grade of 70 or more", and $C$ represent the event "Chen earns a grade of 70 or more", then a suitable sample space is
$S = \{ABC,\ \bar{A}BC,\ A\bar{B}C,\ AB\bar{C},\ \bar{A}\bar{B}C,\ \bar{A}B\bar{C},\ A\bar{B}\bar{C},\ \bar{A}\bar{B}\bar{C}\}$
Suppose we are mostly interested in how many of these friends earn a grade of 70 or more. We can define the random variable $X$ = "number of friends who earn a grade of 70 or more". The range of $X$ is $\{0, 1, 2, 3\}$ with associated mapping
$X(ABC) = 3$
$X(\bar{A}BC) = X(A\bar{B}C) = X(AB\bar{C}) = 2$
$X(\bar{A}\bar{B}C) = X(\bar{A}B\bar{C}) = X(A\bar{B}\bar{C}) = 1$
$X(\bar{A}\bar{B}\bar{C}) = 0$

An important function associated with random variables is the cumulative distribution function.
2.2.3 Definition - Cumulative Distribution Function
The cumulative distribution function (c.d.f.) of a random variable $X$ is defined by
$F(x) = P(X \le x)$ for $x \in \Re$
Note: The cumulative distribution function is defined for all real numbers.

2.2.4 Properties - Cumulative Distribution Function
(1) $F$ is a non-decreasing function, that is, $F(x_1) \le F(x_2)$ for all $x_1 < x_2$
(2) $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to \infty} F(x) = 1$
(3) $F$ is a right-continuous function, that is, $\lim_{x \to a^+} F(x) = F(a)$
(4) For all $a < b$, $P(a < X \le b) = P(X \le b) - P(X \le a) = F(b) - F(a)$
(5) For all $b$, $P(X = b) = F(b) - \lim_{a \to b^-} F(a)$
2.2.5 Example
Suppose $X$ is a random variable with cumulative distribution function
$F(x) = P(X \le x) = \begin{cases} 0 & x < 0 \\ 0.1 & 0 \le x < 1 \\ 0.3 & 1 \le x < 2 \\ 0.6 & 2 \le x < 3 \\ 1 & x \ge 3 \end{cases}$
(a) Graph the function $F(x)$.
(b) Determine the probabilities
(i) $P(X \le 1)$
(ii) $P(X \le 2)$
(iii) $P(X \le 2.4)$
(iv) $P(X = 2)$
(v) $P(0 < X \le 2)$
(vi) $P(0 \le X \le 2)$.

[Figure 2.1: Graph of $F(x) = P(X \le x)$ for Example 2.2.5]

Solution
(a) See Figure 2.1.
(b) (i) $P(X \le 1) = F(1) = 0.3$
(ii) $P(X \le 2) = F(2) = 0.6$
(iii) $P(X \le 2.4) = P(X \le 2) = F(2) = 0.6$
(iv) $P(X = 2) = F(2) - \lim_{x \to 2^-} F(x) = 0.6 - 0.3 = 0.3$
or
$P(X = 2) = P(X \le 2) - P(X \le 1) = F(2) - F(1) = 0.6 - 0.3 = 0.3$
(v) $P(0 < X \le 2) = P(X \le 2) - P(X \le 0) = F(2) - F(0) = 0.6 - 0.1 = 0.5$
(vi) $P(0 \le X \le 2) = P(X \le 2) - P(X < 0) = F(2) - 0 = 0.6$
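These step-function calculations translate directly into code. A minimal sketch, in which the left-hand limits used in properties (4) and (5) of Section 2.2.4 are approximated by evaluating just below the point of interest:

def F(x):
    # step c.d.f. from Example 2.2.5
    if x < 0:
        return 0.0
    if x < 1:
        return 0.1
    if x < 2:
        return 0.3
    if x < 3:
        return 0.6
    return 1.0

eps = 1e-9                       # stand-in for a left-hand limit
print(F(1))                      # (i)   P(X <= 1)     = 0.3
print(F(2.4))                    # (iii) P(X <= 2.4)   = 0.6
print(F(2) - F(2 - eps))         # (iv)  P(X = 2)      = 0.3
print(F(2) - F(0))               # (v)   P(0 < X <= 2) = 0.5
print(F(2) - F(0 - eps))         # (vi)  P(0 <= X <= 2) = 0.6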
2.2.6 Example
Suppose $X$ is a random variable with cumulative distribution function
$F(x) = P(X \le x) = \begin{cases} 0 & x \le 0 \\ \frac{2}{5}x^3 & 0 < x < 1 \\ \frac{1}{5}\left(12x - 3x^2 - 7\right) & 1 \le x < 2 \\ 1 & x \ge 2 \end{cases}$
(a) Graph the function $F(x)$.
(b) Determine the probabilities
(i) $P(X \le 1)$
(ii) $P(X \le 2)$
(iii) $P(X \le 2.4)$
(iv) $P(X = 0.5)$
(v) $P(X = b)$, for $b \in \Re$
(vi) $P(1 < X \le 2.4)$
(vii) $P(1 \le X \le 2.4)$.

Solution
(a) See Figure 2.2.

[Figure 2.2: Graph of $F(x) = P(X \le x)$ for Example 2.2.6]

(b) (i) $P(X \le 1) = F(1) = \frac{2}{5}(1)^3 = \frac{2}{5} = 0.4$
(ii) $P(X \le 2) = F(2) = 1$
(iii) $P(X \le 2.4) = F(2.4) = 1$
(iv) $P(X = 0.5) = F(0.5) - \lim_{x \to 0.5^-} F(x) = F(0.5) - F(0.5) = 0$
(v) $P(X = b) = F(b) - \lim_{a \to b^-} F(a) = F(b) - F(b) = 0$ for all $b \in \Re$
(vi) $P(1 < X \le 2.4) = F(2.4) - F(1) = 1 - 0.4 = 0.6$
(vii) $P(1 \le X \le 2.4) = P(1 < X \le 2.4) = 0.6$ since $P(X = 1) = 0$
2.3 Discrete Random Variables
A set $A$ is countable if the number of elements in the set is finite or the elements of the set can be put into a one-to-one correspondence with the positive integers.

2.3.1 Definition - Discrete Random Variable
A random variable $X$ defined on a sample space $S$ is a discrete random variable if there is a countable subset $A \subseteq \Re$ such that $P(X \in A) = 1$.

2.3.2 Definition - Probability Function
If $X$ is a discrete random variable then the probability function (p.f.) of $X$ is given by
$f(x) = P(X = x) = F(x) - \lim_{\varepsilon \to 0^+} F(x - \varepsilon)$ for $x \in \Re$
The set $A = \{x : f(x) > 0\}$ is called the support set of $X$.

2.3.3 Properties - Probability Function
(1) $f(x) \ge 0$ for $x \in \Re$
(2) $\sum_{x \in A} f(x) = 1$

2.3.4 Example
In Example 2.2.5 find the support set $A$, show that $X$ is a discrete random variable and determine its probability function.

Solution
The support set of $X$ is $A = \{0, 1, 2, 3\}$ which is a countable set. Its probability function is
$f(x) = P(X = x) = \begin{cases} 0.1 & \text{if } x = 0 \\ P(X \le 1) - P(X \le 0) = 0.3 - 0.1 = 0.2 & \text{if } x = 1 \\ P(X \le 2) - P(X \le 1) = 0.6 - 0.3 = 0.3 & \text{if } x = 2 \\ P(X \le 3) - P(X \le 2) = 1 - 0.6 = 0.4 & \text{if } x = 3 \end{cases}$
or

x                  0     1     2     3    Total
f(x) = P(X = x)   0.1   0.2   0.3   0.4      1

Since $P(X \in A) = \sum_{x=0}^{3} P(X = x) = 1$, $X$ is a discrete random variable.
In the next example we review four of the named distributions which were introduced in a
previous probability course.
2.3.5 Example
Suppose a box contains $a$ red balls and $b$ black balls. For each of the following find the probability function of the random variable $X$ and show that $\sum_{x \in A} f(x) = 1$ where $A = \{x : f(x) > 0\}$ is the support set of $X$.
(a) $X$ = number of red balls among $n$ balls drawn at random without replacement.
(b) $X$ = number of red balls among $n$ balls drawn at random with replacement.
(c) $X$ = number of black balls selected before obtaining the first red ball if sampling is done at random with replacement.
(d) $X$ = number of black balls selected before obtaining the $k$th red ball if sampling is done at random with replacement.

Solution
(a) If $n$ balls are selected at random without replacement from a box of $a$ red balls and $b$ black balls then the random variable $X$ = number of red balls has a Hypergeometric distribution with probability function
$f(x) = P(X = x) = \dfrac{\binom{a}{x}\binom{b}{n-x}}{\binom{a+b}{n}}$ for $x = \max(0, n-b), \ldots, \min(a, n)$
By the Hypergeometric identity 2.11.6
$\sum_{x \in A} f(x) = \sum_{x=\max(0,n-b)}^{\min(a,n)} \dfrac{\binom{a}{x}\binom{b}{n-x}}{\binom{a+b}{n}} = \dfrac{1}{\binom{a+b}{n}}\sum_{x=0}^{\infty}\binom{a}{x}\binom{b}{n-x} = \dfrac{\binom{a+b}{n}}{\binom{a+b}{n}} = 1$
(b) If $n$ balls are selected at random with replacement from a box of $a$ red balls and $b$ black balls then we have a sequence of Bernoulli trials and the random variable $X$ = number of red balls has a Binomial distribution with probability function
$f(x) = P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}$ for $x = 0, 1, \ldots, n$
where $p = \frac{a}{a+b}$. By the Binomial series 2.11.3(1)
$\sum_{x \in A} f(x) = \sum_{x=0}^{n} \binom{n}{x} p^x (1-p)^{n-x} = (p + 1 - p)^n = 1$
(c) If sampling is done with replacement then we have a sequence of Bernoulli trials and the random variable $X$ = number of black balls selected before obtaining the first red ball has a Geometric distribution with probability function
$f(x) = P(X = x) = p(1-p)^x$ for $x = 0, 1, \ldots$
By the Geometric series 2.11.1
$\sum_{x \in A} f(x) = \sum_{x=0}^{\infty} p(1-p)^x = \dfrac{p}{1 - (1-p)} = 1$
(d) If sampling is done with replacement then we have a sequence of Bernoulli trials and the random variable $X$ = number of black balls selected before obtaining the $k$th red ball has a Negative Binomial distribution with probability function
$f(x) = P(X = x) = \binom{x+k-1}{x} p^k (1-p)^x$ for $x = 0, 1, \ldots$
Using the identity 2.11.4(1)
$\binom{x+k-1}{x} = \binom{-k}{x}(-1)^x$
the probability function can be written as
$f(x) = P(X = x) = \binom{-k}{x} p^k (p-1)^x$ for $x = 0, 1, \ldots$
By the Binomial series 2.11.3(2)
$\sum_{x \in A} f(x) = \sum_{x=0}^{\infty} \binom{-k}{x} p^k (p-1)^x = p^k \sum_{x=0}^{\infty} \binom{-k}{x} (p-1)^x = p^k (1 + p - 1)^{-k} = 1$
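A numerical check of the four probability functions (a sketch; scipy's parameterizations are noted in the comments and the values $a = 6$, $b = 4$, $n = 5$, $k = 3$ are arbitrary):

import numpy as np
from scipy.stats import hypergeom, binom, geom, nbinom

a, b, n, k = 6, 4, 5, 3
p = a / (a + b)

# (a) Hypergeometric: scipy's arguments are (x, total = a+b, successes = a, draws = n)
x = np.arange(max(0, n - b), min(a, n) + 1)
print(hypergeom.pmf(x, a + b, a, n).sum())           # 1.0

# (b) Binomial
print(binom.pmf(np.arange(0, n + 1), n, p).sum())    # 1.0

# (c) Geometric: scipy's geom counts trials, so shift to failures before the 1st success
x = np.arange(0, 200)
print(geom.pmf(x + 1, p).sum())                      # approximately 1.0

# (d) Negative Binomial: failures before the k-th success
print(nbinom.pmf(x, k, p).sum())                     # approximately 1.0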
2.3.6 Example
If $X$ is a random variable with probability function
$f(x) = \dfrac{\lambda^x e^{-\lambda}}{x!}$ for $x = 0, 1, \ldots$; $\lambda > 0$   (2.1)
show that $\sum_{x=0}^{\infty} f(x) = 1$.

Solution
By the Exponential series 2.11.7
$\sum_{x=0}^{\infty} f(x) = \sum_{x=0}^{\infty} \dfrac{\lambda^x e^{-\lambda}}{x!} = e^{-\lambda}\sum_{x=0}^{\infty}\dfrac{\lambda^x}{x!} = e^{-\lambda}e^{\lambda} = 1$
The probability function (2.1) is called the Poisson probability function.

2.3.7 Exercise
If $X$ is a random variable with probability function
$f(x) = \dfrac{-(1-p)^x}{x\log p}$ for $x = 1, 2, \ldots$; $0 < p < 1$
show that $\sum_{x=1}^{\infty} f(x) = 1$.
Hint: Use the Logarithmic series 2.11.8.

Important Note: A summary of the named distributions used in these Course Notes can be found in Chapter 11.
2.4 Continuous Random Variables
2.4.1 Definition - Continuous Random Variable
Suppose $X$ is a random variable with cumulative distribution function $F$. If $F$ is a continuous function for all $x \in \Re$ and $F$ is differentiable except possibly at countably many points then $X$ is called a continuous random variable.
Note: The definition (2.2.3) and properties (2.2.4) of the cumulative distribution function hold for the random variable $X$ regardless of whether $X$ is discrete or continuous.

2.4.2 Example
Suppose $X$ is a random variable with cumulative distribution function
$F(x) = P(X \le x) = \begin{cases} 0 & x \le -1 \\ -\frac{1}{2}(x+1)^2 + x + 1 & -1 < x \le 0 \\ \frac{1}{2}(x-1)^2 + x & 0 < x < 1 \\ 1 & x \ge 1 \end{cases}$
Show that $X$ is a continuous random variable.

Solution
The cumulative distribution function $F$ is a continuous function for all $x \in \Re$ since it is a piecewise function composed of continuous functions and
$\lim_{x \to a} F(x) = F(a)$
at the break points $a = -1, 0, 1$.
The function $F$ is differentiable for all $x \ne -1, 0, 1$ since it is a piecewise function composed of differentiable functions. Since
$\lim_{h \to 0^-}\dfrac{F(-1+h) - F(-1)}{h} = 0 \ne \lim_{h \to 0^+}\dfrac{F(-1+h) - F(-1)}{h} = 1$
$\lim_{h \to 0^-}\dfrac{F(0+h) - F(0)}{h} = 0 = \lim_{h \to 0^+}\dfrac{F(0+h) - F(0)}{h}$
$\lim_{h \to 0^-}\dfrac{F(1+h) - F(1)}{h} = 1 \ne \lim_{h \to 0^+}\dfrac{F(1+h) - F(1)}{h} = 0$
$F$ is differentiable at $x = 0$ but not differentiable at $x = -1, 1$. The set $\{-1, 1\}$ is a countable set.
Since $F$ is a continuous function for all $x \in \Re$ and $F$ is differentiable except at countably many points, $X$ is a continuous random variable.
2.4.3 Definition - Probability Density Function
If $X$ is a continuous random variable with cumulative distribution function $F(x)$ then the probability density function (p.d.f.) of $X$ is $f(x) = F'(x)$ if $F$ is differentiable at $x$. The set $A = \{x : f(x) > 0\}$ is called the support set of $X$.
Note: At the countably many points at which $F'(a)$ does not exist, $f(a)$ may be assigned any convenient value since the probabilities $P(X \le x)$ will be unaffected by the choice. We usually choose $f(a) \ge 0$ and most often we choose $f(a) = 0$.

2.4.4 Properties - Probability Density Function
(1) $f(x) \ge 0$ for all $x \in \Re$
(2) $\int_{-\infty}^{\infty} f(x)\,dx = \lim_{x \to \infty} F(x) - \lim_{x \to -\infty} F(x) = 1$
(3) $f(x) = \lim_{h \to 0}\dfrac{F(x+h) - F(x)}{h} = \lim_{h \to 0}\dfrac{P(x \le X \le x+h)}{h}$ if this limit exists
(4) $F(x) = \int_{-\infty}^{x} f(t)\,dt$, $x \in \Re$
(5) $P(a < X \le b) = P(X \le b) - P(X \le a) = F(b) - F(a) = \int_{a}^{b} f(x)\,dx$
(6) $P(X = b) = F(b) - \lim_{a \to b^-} F(a) = F(b) - F(b) = 0$ (since $F$ is continuous).

2.4.5 Example
Find and sketch the probability density function of the random variable $X$ with the cumulative distribution function in Example 2.2.6.

Solution
By taking the derivative of $F(x)$ we obtain
$F'(x) = \begin{cases} 0 & x < 0 \\ \frac{d}{dx}\left(\frac{2}{5}x^3\right) = \frac{6}{5}x^2 & 0 < x < 1 \\ \frac{d}{dx}\left(\frac{12x - 3x^2 - 7}{5}\right) = \frac{1}{5}(12 - 6x) & 1 < x < 2 \\ 0 & x > 2 \end{cases}$
We can assign any values to $f(0)$, $f(1)$, and $f(2)$. For convenience we choose $f(0) = f(2) = 0$ and $f(1) = \frac{6}{5}$.
The probability density function is
$f(x) = F'(x) = \begin{cases} \frac{6}{5}x^2 & 0 < x \le 1 \\ \frac{1}{5}(12 - 6x) & 1 < x < 2 \\ 0 & \text{otherwise} \end{cases}$
The graph of $f(x)$ is given in Figure 2.3.

[Figure 2.3: Graph of probability density function for Example 2.4.5]

Note: See Table 2.1 in Section 2.7 for a summary of the differences between the properties of a discrete and a continuous random variable.
2.4.6 Example
Suppose $X$ is a random variable with cumulative distribution function
$F(x) = \begin{cases} 0 & x \le a \\ \frac{x-a}{b-a} & a < x \le b \\ 1 & x > b \end{cases}$
where $b > a$.
(a) Sketch $F(x)$, the cumulative distribution function of $X$.
(b) Find $f(x)$, the probability density function of $X$, and sketch it.
(c) Is it possible for $f(x)$ to take on values greater than one?

Solution
(a) See Figure 2.4 for a sketch of the cumulative distribution function.

[Figure 2.4: Graph of cumulative distribution function for Example 2.4.6]

(b) By taking the derivative of $F(x)$ we obtain
$F'(x) = \begin{cases} 0 & x < a \\ \frac{1}{b-a} & a < x < b \\ 0 & x > b \end{cases}$
The derivative of $F(x)$ does not exist for $x = a$ or $x = b$. For convenience we define $f(a) = f(b) = \frac{1}{b-a}$ so that
$f(x) = \begin{cases} \frac{1}{b-a} & a \le x \le b \\ 0 & \text{otherwise} \end{cases}$
See Figure 2.5 for a sketch of the probability density function. Note that we could define $f(a)$ and $f(b)$ to be any values and the cumulative distribution function would remain the same, since $x = a$ and $x = b$ are only countably many points.
The random variable $X$ is said to have a Uniform$(a, b)$ distribution. We write this as $X \sim$ Uniform$(a, b)$.
(c) If $a = 0$ and $b = 0.5$ then $f(x) = 2$ for $0 \le x \le 0.5$. This example illustrates that the probability density function is not a probability and that the probability density function can take on values greater than one. The important restriction for continuous random variables is
$\int_{-\infty}^{\infty} f(x)\,dx = 1$

[Figure 2.5: Graph of the probability density function of a Uniform$(a, b)$ random variable]
2.4.7 Example
Consider the function
$f(x) = \dfrac{\beta}{x^{\beta+1}}$ for $x \ge 1$
and 0 otherwise. For what values of $\beta$ is this function a probability density function?

Solution
Using the result (2.8) from Section 2.11
$\int_{-\infty}^{\infty} f(x)\,dx = \int_{1}^{\infty}\dfrac{\beta}{x^{\beta+1}}\,dx = \lim_{b \to \infty}\int_{1}^{b}\beta x^{-\beta-1}\,dx = \lim_{b \to \infty}\left[-x^{-\beta}\right]_{1}^{b} = 1 - \lim_{b \to \infty}\dfrac{1}{b^{\beta}} = 1$ if $\beta > 0$
Also $f(x) \ge 0$ if $\beta > 0$. Therefore $f(x)$ is a probability density function for all $\beta > 0$. $X$ is said to have a Pareto$(1, \beta)$ distribution.
A useful function for evaluating integrals associated with several named random variables is the Gamma function.

2.4.8 Definition - Gamma Function
The gamma function, denoted by $\Gamma(\alpha)$ for all $\alpha > 0$, is given by
$\Gamma(\alpha) = \int_{0}^{\infty} y^{\alpha-1} e^{-y}\,dy$

2.4.9 Properties - Gamma Function
(1) $\Gamma(\alpha) = (\alpha - 1)\Gamma(\alpha - 1)$ for $\alpha > 1$
(2) $\Gamma(n) = (n - 1)!$ for $n = 1, 2, \ldots$
(3) $\Gamma\left(\frac{1}{2}\right) = \sqrt{\pi}$
2.4.10 Example
Suppose $X$ is a random variable with probability density function
$f(x) = \dfrac{x^{\alpha-1}e^{-x/\beta}}{\Gamma(\alpha)\beta^{\alpha}}$ for $x > 0$, $\alpha > 0$, $\beta > 0$
and 0 otherwise.
$X$ is said to have a Gamma distribution with parameters $\alpha$ and $\beta$ and we write $X \sim$ Gamma$(\alpha, \beta)$.
(a) Verify that $\int_{-\infty}^{\infty} f(x)\,dx = 1$.
(b) What special probability density function is obtained for $\alpha = 1$?
(c) Graph the probability density functions for
(i) $\alpha = 1$, $\beta = 3$
(ii) $\alpha = 2$, $\beta = 1.5$
(iii) $\alpha = 5$, $\beta = 0.6$
(iv) $\alpha = 10$, $\beta = 0.3$
on the same graph.
Note: See Chapter 11 - Summary of Named Distributions. Note that the notation for parameters used for named distributions is not necessarily the same in all textbooks. This is especially true for distributions with two or more parameters.

Solution
(a)
$\int_{-\infty}^{\infty} f(x)\,dx = \int_{0}^{\infty}\dfrac{x^{\alpha-1}e^{-x/\beta}}{\Gamma(\alpha)\beta^{\alpha}}\,dx$   (let $y = x/\beta$)
$= \dfrac{1}{\Gamma(\alpha)\beta^{\alpha}}\int_{0}^{\infty}(\beta y)^{\alpha-1}e^{-y}\,\beta\,dy = \dfrac{1}{\Gamma(\alpha)}\int_{0}^{\infty} y^{\alpha-1}e^{-y}\,dy = \dfrac{\Gamma(\alpha)}{\Gamma(\alpha)} = 1$
(b) If $\alpha = 1$ the probability density function is
$f(x) = \dfrac{1}{\beta}e^{-x/\beta}$ for $x > 0$, $\beta > 0$
and 0 otherwise, which is the Exponential$(\beta)$ probability density function.
(c) See Figure 2.6.

[Figure 2.6: Gamma$(\alpha, \beta)$ probability density functions for $(\alpha, \beta) = (1, 3), (2, 1.5), (5, 0.6), (10, 0.3)$]
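Part (c) can be reproduced with a few lines of Python; scipy's gamma distribution uses the same shape/scale parameterization ($a = \alpha$, scale $= \beta$) as the density above.

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gamma

# parameter pairs (alpha, beta) from part (c)
params = [(1, 3), (2, 1.5), (5, 0.6), (10, 0.3)]
x = np.linspace(0.01, 9, 400)

for a, b in params:
    plt.plot(x, gamma.pdf(x, a, scale=b), label=f"alpha={a}, beta={b}")

plt.xlabel("x")
plt.ylabel("f(x)")
plt.legend()
plt.show()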
2.4.11 Exercise
Suppose $X$ is a random variable with probability density function
$f(x) = \dfrac{\beta}{\theta^{\beta}}\,x^{\beta-1}e^{-(x/\theta)^{\beta}}$ for $x > 0$, $\theta > 0$, $\beta > 0$
and 0 otherwise.
$X$ is said to have a Weibull distribution with parameters $\beta$ and $\theta$ and we write $X \sim$ Weibull$(\beta, \theta)$.
(a) Verify that $\int_{-\infty}^{\infty} f(x)\,dx = 1$.
(b) What special probability density function is obtained for $\beta = 1$?
(c) Graph the probability density functions for
(i) $\beta = 1$, $\theta = 0.5$
(ii) $\beta = 2$, $\theta = 0.5$
(iii) $\beta = 2$, $\theta = 1$
(iv) $\beta = 3$, $\theta = 1$
on the same graph.

2.4.12 Exercise
Suppose $X$ is a random variable with probability density function
$f(x) = \dfrac{\beta\alpha^{\beta}}{x^{\beta+1}}$ for $x > \alpha$, $\alpha > 0$, $\beta > 0$
and 0 otherwise.
$X$ is said to have a Pareto distribution with parameters $\alpha$ and $\beta$ and we write $X \sim$ Pareto$(\alpha, \beta)$.
(a) Verify that $\int_{-\infty}^{\infty} f(x)\,dx = 1$.
(b) Graph the probability density functions for
(i) $\alpha = 1$, $\beta = 1$
(ii) $\alpha = 1$, $\beta = 2$
(iii) $\alpha = 0.5$, $\beta = 1$
(iv) $\alpha = 0.5$, $\beta = 2$
on the same graph.
2.5 Location and Scale Parameters
In Chapter 6 we will look at methods for constructing confidence intervals for an unknown parameter $\theta$. If the parameter $\theta$ is either a location parameter or a scale parameter then a confidence interval is easier to construct.

2.5.1 Definition - Location Parameter
Suppose $X$ is a continuous random variable with probability density function $f(x; \theta)$ where $\theta$ is a parameter of the distribution. Let $F_0(x) = F(x; \theta = 0)$ and $f_0(x) = f(x; \theta = 0)$. The parameter $\theta$ is called a location parameter of the distribution if
$F(x; \theta) = F_0(x - \theta)$ for $\theta \in \Re$
or equivalently
$f(x; \theta) = f_0(x - \theta)$ for $\theta \in \Re$

2.5.2 Definition - Scale Parameter
Suppose $X$ is a continuous random variable with probability density function $f(x; \theta)$ where $\theta$ is a parameter of the distribution. Let $F_1(x) = F(x; \theta = 1)$ and $f_1(x) = f(x; \theta = 1)$. The parameter $\theta$ is called a scale parameter of the distribution if
$F(x; \theta) = F_1\left(\dfrac{x}{\theta}\right)$ for $\theta > 0$
or equivalently
$f(x; \theta) = \dfrac{1}{\theta}f_1\left(\dfrac{x}{\theta}\right)$ for $\theta > 0$

2.5.3 Example
Suppose $X$ is a continuous random variable with probability density function
$f(x) = \dfrac{1}{\beta}e^{-(x-\theta)/\beta}$ for $x \ge \theta$, $\theta \in \Re$, $\beta > 0$
and 0 otherwise.
$X$ is said to have a Two Parameter Exponential distribution and we write $X \sim$ Two Parameter Exponential$(\theta, \beta)$.
(a) If $X \sim$ Two Parameter Exponential$(\theta, 1)$ show that $\theta$ is a location parameter for this distribution. Sketch the probability density function for $\theta = -1, 0, 1$ on the same graph.
(b) If $X \sim$ Two Parameter Exponential$(0, \beta)$ show that $\beta$ is a scale parameter for this distribution. Sketch the probability density function for $\beta = 0.5, 1, 2$ on the same graph.
Solution
(a) For $X \sim$ Two Parameter Exponential$(\theta, 1)$ the probability density function is
$f(x; \theta) = e^{-(x-\theta)}$ for $x \ge \theta$, $\theta \in \Re$
and 0 otherwise.
Let
$f_0(x) = f(x; \theta = 0) = e^{-x}$ for $x > 0$
and 0 otherwise. Then
$f(x; \theta) = e^{-(x-\theta)} = f_0(x - \theta)$ for all $\theta \in \Re$
and therefore $\theta$ is a location parameter of this distribution.
See Figure 2.7 for a sketch of the probability density function for $\theta = -1, 0, 1$.

[Figure 2.7: Two Parameter Exponential$(\theta, 1)$ probability density function for $\theta = -1, 0, 1$]

(b) For $X \sim$ Two Parameter Exponential$(0, \beta)$ the probability density function is
$f(x; \beta) = \dfrac{1}{\beta}e^{-x/\beta}$ for $x > 0$, $\beta > 0$
and 0 otherwise, which is the Exponential$(\beta)$ probability density function.
Let
$f_1(x) = f(x; \beta = 1) = e^{-x}$ for $x > 0$
and 0 otherwise. Then
$f(x; \beta) = \dfrac{1}{\beta}e^{-x/\beta} = \dfrac{1}{\beta}f_1\left(\dfrac{x}{\beta}\right)$ for all $\beta > 0$
and therefore $\beta$ is a scale parameter of this distribution.
See Figure 2.8 for a sketch of the probability density function for $\beta = 0.5, 1, 2$.

[Figure 2.8: Exponential$(\beta)$ probability density function for $\beta = 0.5, 1, 2$]
2.5.4 Exercise
Suppose $X$ is a continuous random variable with probability density function
$f(x) = \dfrac{1}{\pi\beta\left\{1 + [(x - \theta)/\beta]^2\right\}}$ for $x \in \Re$, $\theta \in \Re$, $\beta > 0$
$X$ is said to have a two parameter Cauchy distribution and we write $X \sim$ Cauchy$(\theta, \beta)$.
(a) If $X \sim$ Cauchy$(\theta, 1)$ then show that $\theta$ is a location parameter for the distribution. Graph the Cauchy$(\theta, 1)$ probability density function for $\theta = -1, 0$ and $1$ on the same graph.
(b) If $X \sim$ Cauchy$(0, \beta)$ then show that $\beta$ is a scale parameter for the distribution. Graph the Cauchy$(0, \beta)$ probability density function for $\beta = 0.5, 1$ and $2$ on the same graph.
2.6 Functions of a Random Variable
Suppose $X$ is a continuous random variable with probability density function $f$ and cumulative distribution function $F$ and we wish to find the probability density function of the random variable $Y = h(X)$ where $h$ is a real-valued function. In this section we look at techniques for determining the distribution of $Y$.

2.6.1 Cumulative Distribution Function Technique
A useful technique for determining the distribution of a function of a random variable $Y = h(X)$ is the cumulative distribution function technique. This technique involves obtaining an expression for $G(y) = P(Y \le y)$, the cumulative distribution function of $Y$, in terms of $F$, the cumulative distribution function of $X$. The corresponding probability density function $g$ of $Y$ is found by differentiating $G$. Care must be taken to determine the support set of the random variable $Y$.

2.6.2 Example
If $Z \sim N(0, 1)$ find the probability density function of $Y = Z^2$. What type of random variable is $Y$?

Solution
If $Z \sim N(0, 1)$ then the probability density function of $Z$ is
$f(z) = \dfrac{1}{\sqrt{2\pi}}e^{-z^2/2}$ for $z \in \Re$
Let $G(y) = P(Y \le y)$ be the cumulative distribution function of $Y = Z^2$. Since the support set of the random variable $Z$ is $\Re$ and $Y = Z^2$, the support set of $Y$ is $B = \{y : y > 0\}$. For $y \in B$
$G(y) = P(Y \le y) = P(Z^2 \le y) = P(-\sqrt{y} \le Z \le \sqrt{y}) = \int_{-\sqrt{y}}^{\sqrt{y}}\dfrac{1}{\sqrt{2\pi}}e^{-z^2/2}\,dz = 2\int_{0}^{\sqrt{y}}\dfrac{1}{\sqrt{2\pi}}e^{-z^2/2}\,dz$
since $f(z)$ is an even function.
For $y \in B$ the probability density function of $Y$ is
$g(y) = \dfrac{d}{dy}G(y) = \dfrac{d}{dy}\left[\int_{0}^{\sqrt{y}}\dfrac{2}{\sqrt{2\pi}}e^{-z^2/2}\,dz\right] = \dfrac{2}{\sqrt{2\pi}}e^{-(\sqrt{y})^2/2}\,\dfrac{d}{dy}(\sqrt{y}) = \dfrac{2}{\sqrt{2\pi}}e^{-y/2}\,\dfrac{1}{2\sqrt{y}}$   (2.2)
$= \dfrac{1}{\sqrt{2\pi y}}e^{-y/2}$   (2.3)
by 2.11.10.
We recognize (2.3) as the probability density function of a Chi-squared(1) random variable, so $Y = Z^2 \sim \chi^2(1)$.
The above solution provides a proof of the following theorem.
2.6.3 Theorem
If $Z \sim N(0, 1)$ then $Z^2 \sim \chi^2(1)$.
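A quick Monte Carlo check of Theorem 2.6.3 (the sample size and seed are arbitrary):

import numpy as np
from scipy.stats import chi2, kstest

rng = np.random.default_rng(330)
z = rng.standard_normal(100_000)
y = z ** 2

# compare the sample of Z^2 values against the chi-squared(1) distribution
print(np.mean(y), chi2(df=1).mean())     # both close to 1
print(kstest(y, chi2(df=1).cdf))         # a large p-value is consistent with the theorem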
2.6.4 Example
Suppose $X \sim$ Exponential$(\theta)$. The cumulative distribution function of $X$ is
$F(x) = P(X \le x) = \begin{cases} 0 & x \le 0 \\ 1 - e^{-x/\theta} & x > 0 \end{cases}$
Determine the distribution of the random variable $Y = F(X) = 1 - e^{-X/\theta}$.

Solution
Since $Y = 1 - e^{-X/\theta}$, we have $X = -\theta\log(1 - Y) = F^{-1}(Y)$ for $X > 0$ and $0 < Y < 1$.
For $0 < y < 1$ the cumulative distribution function of $Y$ is
$G(y) = P(Y \le y) = P\left(1 - e^{-X/\theta} \le y\right) = P(X \le -\theta\log(1 - y)) = F(-\theta\log(1 - y)) = 1 - e^{\log(1-y)} = 1 - (1 - y) = y$
If $U \sim$ Uniform$(0, 1)$ then the cumulative distribution function of $U$ is
$P(U \le u) = \begin{cases} 0 & u \le 0 \\ u & 0 < u < 1 \\ 1 & u \ge 1 \end{cases}$
Therefore $Y = F(X) = 1 - e^{-X/\theta}$ has a Uniform$(0, 1)$ distribution. This is an example of a result which holds more generally, as summarized in the following theorem.

2.6.5 Theorem - Probability Integral Transformation
If $X$ is a continuous random variable with cumulative distribution function $F$ then the random variable
$Y = F(X) = \int_{-\infty}^{X} f(t)\,dt$   (2.4)
has a Uniform$(0, 1)$ distribution.

Proof
Suppose the continuous random variable $X$ has support set $A = \{x : f(x) > 0\}$. For all $x \in A$, $F$ is an increasing function since $F$ is a cumulative distribution function. Therefore for all $x \in A$ the function $F$ has an inverse function $F^{-1}$.
For $0 < y < 1$, the cumulative distribution function of $Y = F(X)$ is
$G(y) = P(Y \le y) = P(F(X) \le y) = P\left(X \le F^{-1}(y)\right) = F\left(F^{-1}(y)\right) = y$
which is the cumulative distribution function of a Uniform$(0, 1)$ random variable. Therefore $Y = F(X) \sim$ Uniform$(0, 1)$, as required.

Note: Because of the form of the function (transformation) $Y = F(X)$ in (2.4), this transformation is called the probability integral transformation. This result holds for any cumulative distribution function $F$ corresponding to a continuous random variable.
2.6.6 Theorem
Suppose $F$ is a cumulative distribution function for a continuous random variable. If $U \sim$ Uniform$(0, 1)$ then the random variable $X = F^{-1}(U)$ also has cumulative distribution function $F$.

Proof
Suppose that the support set of the random variable $X = F^{-1}(U)$ is $A$. For $x \in A$, the cumulative distribution function of $X = F^{-1}(U)$ is
$P(X \le x) = P\left(F^{-1}(U) \le x\right) = P(U \le F(x)) = F(x)$
since $P(U \le u) = u$ for $0 < u < 1$ if $U \sim$ Uniform$(0, 1)$. Therefore $X = F^{-1}(U)$ has cumulative distribution function $F$.

Note: The result of the previous theorem is important because it provides a method for generating observations from a continuous distribution. Let $u$ be an observation generated from a Uniform$(0, 1)$ distribution using a random number generator. Then by Theorem 2.6.6, $x = F^{-1}(u)$ is an observation from the distribution with cumulative distribution function $F$.

2.6.7 Example
Explain how Theorem 2.6.6 can be used to generate observations from a Weibull$(\beta, 1)$ distribution.

Solution
If $X$ has a Weibull$(\beta, 1)$ distribution then the cumulative distribution function is
$F(x; \beta) = \int_{0}^{x}\beta y^{\beta-1}e^{-y^{\beta}}\,dy = 1 - e^{-x^{\beta}}$ for $x > 0$
The inverse cumulative distribution function is
$F^{-1}(u) = [-\log(1 - u)]^{1/\beta}$ for $0 < u < 1$
If $u$ is an observation from the Uniform$(0, 1)$ distribution then $x = [-\log(1 - u)]^{1/\beta}$ is an observation from the Weibull$(\beta, 1)$ distribution by Theorem 2.6.6.
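A short sketch of this inverse-transform recipe in Python (the shape value $\beta = 2$ is arbitrary); the generated values are compared against numpy's built-in standard Weibull generator, which also uses scale 1.

import numpy as np

rng = np.random.default_rng(1)
beta = 2.0                                   # illustrative shape parameter
u = rng.uniform(size=50_000)                 # Uniform(0,1) observations
x = (-np.log(1 - u)) ** (1 / beta)           # Weibull(beta, 1) observations by Theorem 2.6.6

# sanity check: compare sample means with numpy's generator
print(np.mean(x), np.mean(rng.weibull(beta, size=50_000)))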
If we wish to find the distribution of the random variable $Y = h(X)$ and $h$ is a one-to-one real-valued function then the following theorem can be used.
2.6.8 Theorem - One-to-One Transformation of a Random Variable
Suppose $X$ is a continuous random variable with probability density function $f$ and support set $A = \{x : f(x) > 0\}$. Let $Y = h(X)$ where $h$ is a real-valued function. Let $B = \{y : g(y) > 0\}$ be the support set of the random variable $Y$. If $h$ is a one-to-one function from $A$ to $B$ and $\frac{d}{dx}h(x)$ is continuous for $x \in A$, then the probability density function of $Y$ is
$g(y) = f\left(h^{-1}(y)\right)\left|\dfrac{d}{dy}h^{-1}(y)\right|$ for $y \in B$

Proof
We prove this theorem using the cumulative distribution function technique.
(1) Suppose $h$ is an increasing function and $\frac{d}{dx}h(x)$ is continuous for $x \in A$. Then $h^{-1}(y)$ is also an increasing function and $\frac{d}{dy}h^{-1}(y) > 0$ for $y \in B$. The cumulative distribution function of $Y = h(X)$ is
$G(y) = P(Y \le y) = P(h(X) \le y) = P\left(X \le h^{-1}(y)\right)$ since $h$ is an increasing function
$= F\left(h^{-1}(y)\right)$
Therefore
$g(y) = \dfrac{d}{dy}G(y) = \dfrac{d}{dy}F\left(h^{-1}(y)\right) = F'\left(h^{-1}(y)\right)\dfrac{d}{dy}h^{-1}(y)$ by the Chain Rule
$= f\left(h^{-1}(y)\right)\left|\dfrac{d}{dy}h^{-1}(y)\right|$ for $y \in B$, since $\dfrac{d}{dy}h^{-1}(y) > 0$
(2) Suppose $h$ is a decreasing function and $\frac{d}{dx}h(x)$ is continuous for $x \in A$. Then $h^{-1}(y)$ is also a decreasing function and $\frac{d}{dy}h^{-1}(y) < 0$ for $y \in B$. The cumulative distribution function of $Y = h(X)$ is
$G(y) = P(Y \le y) = P(h(X) \le y) = P\left(X \ge h^{-1}(y)\right)$ since $h$ is a decreasing function
$= 1 - F\left(h^{-1}(y)\right)$
Therefore
$g(y) = \dfrac{d}{dy}G(y) = \dfrac{d}{dy}\left[1 - F\left(h^{-1}(y)\right)\right] = -F'\left(h^{-1}(y)\right)\dfrac{d}{dy}h^{-1}(y)$ by the Chain Rule
$= f\left(h^{-1}(y)\right)\left|\dfrac{d}{dy}h^{-1}(y)\right|$ for $y \in B$, since $\dfrac{d}{dy}h^{-1}(y) < 0$
These two cases give the desired result.
2.6.9 Example
The following two results were used extensively in your previous probability and statistics courses.
(a) If $Z \sim N(0, 1)$ then $Y = \mu + \sigma Z \sim N(\mu, \sigma^2)$.
(b) If $X \sim N(\mu, \sigma^2)$ then $Z = \dfrac{X - \mu}{\sigma} \sim N(0, 1)$.
Prove these results using Theorem 2.6.8.

Solution
(a) If $Z \sim N(0, 1)$ then the probability density function of $Z$ is
$f(z) = \dfrac{1}{\sqrt{2\pi}}e^{-z^2/2}$ for $z \in \Re$
$Y = \mu + \sigma Z = h(Z)$ is an increasing function with inverse function $Z = h^{-1}(Y) = \dfrac{Y - \mu}{\sigma}$. Since the support set of $Z$ is $A = \Re$, the support set of $Y$ is $B = \Re$.
Since
$\dfrac{d}{dy}h^{-1}(y) = \dfrac{d}{dy}\left(\dfrac{y - \mu}{\sigma}\right) = \dfrac{1}{\sigma}$
then by Theorem 2.6.8 the probability density function of $Y$ is
$g(y) = f\left(h^{-1}(y)\right)\left|\dfrac{d}{dy}h^{-1}(y)\right| = \dfrac{1}{\sqrt{2\pi}\,\sigma}e^{-\left(\frac{y - \mu}{\sigma}\right)^2/2}$ for $y \in \Re$
which is the probability density function of an $N(\mu, \sigma^2)$ random variable. Therefore if $Z \sim N(0, 1)$ then $Y = \mu + \sigma Z \sim N(\mu, \sigma^2)$.
(b) If $X \sim N(\mu, \sigma^2)$ then the probability density function of $X$ is
$f(x) = \dfrac{1}{\sqrt{2\pi}\,\sigma}e^{-\left(\frac{x - \mu}{\sigma}\right)^2/2}$ for $x \in \Re$
$Z = \dfrac{X - \mu}{\sigma} = h(X)$ is an increasing function with inverse function $X = h^{-1}(Z) = \mu + \sigma Z$. Since the support set of $X$ is $A = \Re$, the support set of $Z$ is $B = \Re$.
Since
$\dfrac{d}{dz}h^{-1}(z) = \dfrac{d}{dz}(\mu + \sigma z) = \sigma$
then by Theorem 2.6.8 the probability density function of $Z$ is
$g(z) = f\left(h^{-1}(z)\right)\left|\dfrac{d}{dz}h^{-1}(z)\right| = \dfrac{1}{\sqrt{2\pi}}e^{-z^2/2}$ for $z \in \Re$
which is the probability density function of an $N(0, 1)$ random variable. Therefore if $X \sim N(\mu, \sigma^2)$ then $Z = \dfrac{X - \mu}{\sigma} \sim N(0, 1)$.
2.6.10 Example
Use Theorem 2.6.8 to prove the following relationship between the Pareto distribution and the Exponential distribution:
If $X \sim$ Pareto$(1, \beta)$ then $Y = \log(X) \sim$ Exponential$\left(\dfrac{1}{\beta}\right)$

Solution
If $X \sim$ Pareto$(1, \beta)$ then the probability density function of $X$ is
$f(x) = \dfrac{\beta}{x^{\beta+1}}$ for $x \ge 1$, $\beta > 0$
$Y = \log(X) = h(X)$ is an increasing function with inverse function $X = e^Y = h^{-1}(Y)$. Since the support set of $X$ is $A = \{x : x \ge 1\}$, the support set of $Y$ is $B = \{y : y > 0\}$.
Since
$\dfrac{d}{dy}h^{-1}(y) = \dfrac{d}{dy}e^y = e^y$
then by Theorem 2.6.8 the probability density function of $Y$ is
$g(y) = f\left(h^{-1}(y)\right)\left|\dfrac{d}{dy}h^{-1}(y)\right| = \dfrac{\beta}{(e^y)^{\beta+1}}\,e^y = \beta e^{-\beta y}$ for $y > 0$
which is the probability density function of an Exponential$\left(\dfrac{1}{\beta}\right)$ random variable, as required.
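A simulation sketch of this Pareto-to-Exponential relationship ($\beta = 3$ is arbitrary):

import numpy as np

rng = np.random.default_rng(2)
beta = 3.0

# Pareto(1, beta) observations via the inverse transform: F(x) = 1 - x^(-beta) for x >= 1
u = rng.uniform(size=100_000)
x = (1 - u) ** (-1 / beta)

y = np.log(x)
print(np.mean(y), 1 / beta)     # sample mean of log(X) should be close to 1/beta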
2.6.11 Exercise
Use Theorem 2.6.8 to prove the following relationship between the Exponential distribution and the Weibull distribution:
If $X \sim$ Exponential$(1)$ then $Y = \theta X^{1/\beta} \sim$ Weibull$(\beta, \theta)$ for $\theta > 0$, $\beta > 0$

2.6.12 Exercise
Suppose $X$ is a random variable with probability density function
$f(x) = \theta x^{\theta-1}$ for $0 < x < 1$, $\theta > 0$
and 0 otherwise.
Use Theorem 2.6.8 to prove that $Y = -\log X \sim$ Exponential$\left(\dfrac{1}{\theta}\right)$.
2.7 Expectation
In this section we define the expectation operator $E$ which maps random variables to real numbers. These numbers have an interpretation in terms of long run averages for repeated independent trials of an experiment associated with the random variable. Much of this section is a review of material covered in a previous probability course.

2.7.1 Definition - Expectation
Suppose $h(x)$ is a real-valued function.
If $X$ is a discrete random variable with probability function $f(x)$ and support set $A$ then the expectation of the random variable $h(X)$ is defined by
$E[h(X)] = \sum_{x \in A} h(x)f(x)$
provided the sum converges absolutely, that is, provided
$E(|h(X)|) = \sum_{x \in A} |h(x)|f(x) < \infty$
If $X$ is a continuous random variable with probability density function $f(x)$ then the expectation of the random variable $h(X)$ is defined by
$E[h(X)] = \int_{-\infty}^{\infty} h(x)f(x)\,dx$
provided the integral converges absolutely, that is, provided
$E(|h(X)|) = \int_{-\infty}^{\infty} |h(x)|f(x)\,dx < \infty$
If $E(|h(X)|) = \infty$ then we say that $E[h(X)]$ does not exist.
$E[h(X)]$ is also called the expected value of the random variable $h(X)$.

2.7.2 Example
Find $E(X)$ if $X \sim$ Geometric$(p)$.

Solution
If $X \sim$ Geometric$(p)$ then
$f(x) = pq^x$ for $x = 0, 1, \ldots$; $q = 1 - p$; $0 < p < 1$
and
$E(X) = \sum_{x \in A} xf(x) = \sum_{x=0}^{\infty} xpq^x = \sum_{x=1}^{\infty} xpq^x = pq\sum_{x=1}^{\infty} xq^{x-1}$ which converges for $0 < q < 1$
$= \dfrac{pq}{(1 - q)^2}$ by 2.11.2(2)
$= \dfrac{q}{p}$ for $0 < q < 1$
2.7.3 Example
Suppose $X \sim$ Pareto$(1, \beta)$ with probability density function
$f(x) = \dfrac{\beta}{x^{\beta+1}}$ for $x \ge 1$, $\beta > 0$
and 0 otherwise. Find $E(X)$. For what values of $\beta$ does $E(X)$ exist?

Solution
$E(X) = \int_{1}^{\infty} xf(x)\,dx = \int_{1}^{\infty} x\,\dfrac{\beta}{x^{\beta+1}}\,dx = \int_{1}^{\infty}\dfrac{\beta}{x^{\beta}}\,dx$ which converges for $\beta > 1$ by (2.8)
$= \lim_{b \to \infty}\int_{1}^{b}\beta x^{-\beta}\,dx = \dfrac{\beta}{-\beta + 1}\lim_{b \to \infty}\left[x^{-\beta+1}\right]_{1}^{b} = \dfrac{\beta}{\beta - 1}\left(1 - \lim_{b \to \infty}\dfrac{1}{b^{\beta-1}}\right) = \dfrac{\beta}{\beta - 1}$ for $\beta > 1$
Therefore $E(X) = \dfrac{\beta}{\beta - 1}$ and the mean exists only for $\beta > 1$.

2.7.4 Exercise
Suppose $X$ is a nonnegative continuous random variable with cumulative distribution function $F(x)$ and $E(X) < \infty$. Show that
$E(X) = \int_{0}^{\infty}[1 - F(x)]\,dx$
Hint: Use integration by parts with $u = 1 - F(x)$.
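A quick numerical check of the Pareto mean in Example 2.7.3 ($\beta = 3$ is arbitrary):

import numpy as np
from scipy.integrate import quad

beta = 3.0
mean, _ = quad(lambda x: x * beta / x ** (beta + 1), 1, np.inf)
print(mean, beta / (beta - 1))      # both equal 1.5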
2.7.5 Theorem - Expectation is a Linear Operator
Suppose $X$ is a random variable with probability (density) function $f(x)$, $a$ and $b$ are real constants, and $g(x)$ and $h(x)$ are real-valued functions. Then
$E(aX + b) = aE(X) + b$
$E[ag(X) + bh(X)] = aE[g(X)] + bE[h(X)]$

Proof (Continuous Case)
$E(aX + b) = \int_{-\infty}^{\infty}(ax + b)f(x)\,dx = a\int_{-\infty}^{\infty} xf(x)\,dx + b\int_{-\infty}^{\infty} f(x)\,dx$ by properties of integrals
$= aE(X) + b(1)$ by Definition 2.7.1 and Property 2.4.4
$= aE(X) + b$
$E[ag(X) + bh(X)] = \int_{-\infty}^{\infty}[ag(x) + bh(x)]f(x)\,dx = a\int_{-\infty}^{\infty} g(x)f(x)\,dx + b\int_{-\infty}^{\infty} h(x)f(x)\,dx$ by properties of integrals
$= aE[g(X)] + bE[h(X)]$ by Definition 2.7.1
as required.
The following named expectations are used frequently.

2.7.6 Special Expectations
(1) The mean of a random variable: $E(X) = \mu$
(2) The $k$th moment (about the origin) of a random variable: $E(X^k)$
(3) The $k$th moment about the mean of a random variable: $E\left[(X - \mu)^k\right]$
(4) The $k$th factorial moment of a random variable: $E\left[X^{(k)}\right] = E[X(X - 1)\cdots(X - k + 1)]$
(5) The variance of a random variable: $Var(X) = E[(X - \mu)^2] = \sigma^2$ where $\mu = E(X)$
2.7.7 Theorem - Properties of Variance
$\sigma^2 = Var(X) = E(X^2) - \mu^2 = E[X(X - 1)] + \mu - \mu^2$
$Var(aX + b) = a^2 Var(X)$
and
$E(X^2) = \sigma^2 + \mu^2$

Proof (Continuous Case)
$Var(X) = E[(X - \mu)^2] = E\left(X^2 - 2\mu X + \mu^2\right) = E\left(X^2\right) - 2\mu E(X) + \mu^2$ by Theorem 2.7.5
$= E\left(X^2\right) - 2\mu^2 + \mu^2 = E\left(X^2\right) - \mu^2$
Also
$Var(X) = E\left(X^2\right) - \mu^2 = E[X(X - 1) + X] - \mu^2 = E[X(X - 1)] + E(X) - \mu^2$ by Theorem 2.7.5
$= E[X(X - 1)] + \mu - \mu^2$
$Var(aX + b) = E\left\{[aX + b - (a\mu + b)]^2\right\} = E\left[(aX - a\mu)^2\right] = E\left[a^2(X - \mu)^2\right] = a^2 E[(X - \mu)^2]$ by Theorem 2.7.5
$= a^2 Var(X)$ by definition
Rearranging $\sigma^2 = E(X^2) - \mu^2$ gives
$E(X^2) = \sigma^2 + \mu^2$
2.7.8 Example
If $X \sim$ Binomial$(n, p)$ then show
$E(X^{(k)}) = n^{(k)}p^k$ for $k = 1, 2, \ldots$
and thus find $E(X)$ and $Var(X)$.

Solution
$E(X^{(k)}) = \sum_{x=k}^{n} x^{(k)}\binom{n}{x}p^x(1-p)^{n-x}$
$= \sum_{x=k}^{n} n^{(k)}\binom{n-k}{x-k}p^x(1-p)^{n-x}$ by 2.11.4(1)
$= n^{(k)}\sum_{y=0}^{n-k}\binom{n-k}{y}p^{y+k}(1-p)^{n-y-k}$   (let $y = x - k$)
$= n^{(k)}p^k\sum_{y=0}^{n-k}\binom{n-k}{y}p^y(1-p)^{(n-k)-y}$
$= n^{(k)}p^k(p + 1 - p)^{n-k}$ by 2.11.3(1)
$= n^{(k)}p^k$ for $k = 1, 2, \ldots$
For $k = 1$ we obtain
$E\left(X^{(1)}\right) = E(X) = n^{(1)}p^1 = np$
For $k = 2$ we obtain
$E\left(X^{(2)}\right) = E[X(X - 1)] = n^{(2)}p^2 = n(n - 1)p^2$
Therefore
$Var(X) = E[X(X - 1)] + \mu - \mu^2 = n(n - 1)p^2 + np - n^2p^2 = -np^2 + np = np(1 - p)$
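A simulation sanity check of these formulas ($n = 10$ and $p = 0.3$ are arbitrary):

import numpy as np

rng = np.random.default_rng(3)
n, p = 10, 0.3
x = rng.binomial(n, p, size=200_000)

print(np.mean(x * (x - 1)), n * (n - 1) * p ** 2)   # second factorial moment
print(np.var(x), n * p * (1 - p))                   # variance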
2.7.9 Exercise
Show the following:
(a) If $X \sim$ Poisson$(\lambda)$ then $E(X^{(k)}) = \lambda^k$ for $k = 1, 2, \ldots$
(b) If $X \sim$ Negative Binomial$(k, p)$ then $E(X^{(j)}) = (-k)^{(j)}\left(\dfrac{p - 1}{p}\right)^j$ for $j = 1, 2, \ldots$
(c) If $X \sim$ Gamma$(\alpha, \beta)$ then $E(X^p) = \beta^p\,\Gamma(\alpha + p)/\Gamma(\alpha)$ for $p > -\alpha$.
(d) If $X \sim$ Weibull$(\beta, \theta)$ then $E(X^k) = \theta^k\,\Gamma\left(\dfrac{k}{\beta} + 1\right)$ for $k = 1, 2, \ldots$
In each case find $E(X)$ and $Var(X)$.
Table 2.1 summarizes the differences between the properties of a discrete and a continuous random variable.

Table 2.1: Properties of discrete versus continuous random variables

c.d.f.: Discrete: $F(x) = P(X \le x) = \sum_{t \le x} P(X = t)$; $F$ is a right-continuous function for all $x \in \Re$. Continuous: $F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$; $F$ is a continuous function for all $x \in \Re$.

p.f. / p.d.f.: Discrete: $f(x) = P(X = x)$. Continuous: $f(x) = F'(x) \ne P(X = x) = 0$.

Probability of an event: Discrete: $P(X \in E) = \sum_{x \in E} P(X = x) = \sum_{x \in E} f(x)$. Continuous: $P(a < X \le b) = F(b) - F(a) = \int_{a}^{b} f(x)\,dx$.

Total probability: Discrete: $\sum_{x \in A} P(X = x) = \sum_{x \in A} f(x) = 1$ where $A$ is the support set of $X$. Continuous: $\int_{-\infty}^{\infty} f(x)\,dx = 1$.

Expectation: Discrete: $E[g(X)] = \sum_{x \in A} g(x)f(x)$ where $A$ is the support set of $X$. Continuous: $E[g(X)] = \int_{-\infty}^{\infty} g(x)f(x)\,dx$.
2.8 Inequalities
In Chapter 5 we consider limiting distributions of a sequence of random variables. The following inequalities, which involve the moments of a distribution, are useful for proving limit theorems.

2.8.1 Markov's Inequality
$P(|X| \ge c) \le \dfrac{E(|X|^k)}{c^k}$ for all $k, c > 0$

Proof (Continuous Case)
Suppose $X$ is a continuous random variable with probability density function $f(x)$. Let
$A = \left\{x : \left|\dfrac{x}{c}\right|^k \ge 1\right\} = \{x : |x| \ge c\}$ since $c > 0$
Then
$\dfrac{E\left(|X|^k\right)}{c^k} = E\left(\left|\dfrac{X}{c}\right|^k\right) = \int_{-\infty}^{\infty}\left|\dfrac{x}{c}\right|^k f(x)\,dx = \int_{A}\left|\dfrac{x}{c}\right|^k f(x)\,dx + \int_{\bar{A}}\left|\dfrac{x}{c}\right|^k f(x)\,dx$
$\ge \int_{A}\left|\dfrac{x}{c}\right|^k f(x)\,dx$ since $\int_{\bar{A}}\left|\dfrac{x}{c}\right|^k f(x)\,dx \ge 0$
$\ge \int_{A} f(x)\,dx$ since $\left|\dfrac{x}{c}\right|^k \ge 1$ for $x \in A$
$= P(|X| \ge c)$
as required. (The proof of the discrete case follows by replacing integrals with sums.)

2.8.2 Chebyshev's Inequality
Suppose $X$ is a random variable with finite mean $\mu$ and finite variance $\sigma^2$. Then for any $k > 0$
$P(|X - \mu| \ge k\sigma) \le \dfrac{1}{k^2}$
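Chebyshev's Inequality is easy to illustrate by simulation; any distribution with finite variance will do, and an Exponential sample is used here purely for illustration.

import numpy as np

rng = np.random.default_rng(4)
x = rng.exponential(scale=2.0, size=500_000)
mu, sigma = x.mean(), x.std()

for k in [1.5, 2.0, 3.0]:
    empirical = np.mean(np.abs(x - mu) >= k * sigma)
    print(k, empirical, 1 / k ** 2)      # empirical probability is at most 1/k^2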
2.8.3 Exercise
Use Markov’s Inequality to prove Chebyshev’s Inequality.
2.9 Variance Stabilizing Transformation
In Chapter 6 we look at methods for constructing a confidence interval for an unknown parameter $\theta$ based on data $X$. To do this it is often useful to find a transformation, $g(X)$, of the data $X$ whose variance is approximately constant with respect to $\theta$.
Suppose $X$ is a random variable with finite mean $E(X) = \theta$. Suppose also that $X$ has finite variance $Var(X) = \sigma^2(\theta)$ and standard deviation $\sqrt{Var(X)} = \sigma(\theta)$, also depending on $\theta$. Let $Y = g(X)$ where $g$ is a differentiable function. By the linear approximation
$Y = g(X) \approx g(\theta) + g'(\theta)(X - \theta)$
Therefore
$E(Y) \approx E\left[g(\theta) + g'(\theta)(X - \theta)\right] = g(\theta)$
since
$E\left[g'(\theta)(X - \theta)\right] = g'(\theta)E[(X - \theta)] = 0$
Also
$Var(Y) \approx Var\left[g'(\theta)(X - \theta)\right] = \left[g'(\theta)\right]^2 Var(X) = \left[g'(\theta)\sigma(\theta)\right]^2$   (2.5)
If we want $Var(Y)$ to be constant with respect to $\theta$ then we should choose $g$ such that
$\left[g'(\theta)\right]^2 Var(X) = \left[g'(\theta)\sigma(\theta)\right]^2 = \text{constant}$
In other words we need to solve the differential equation
$\dfrac{dg}{d\theta} = \dfrac{k}{\sigma(\theta)}$
where $k$ is a conveniently chosen constant.

2.9.1 Example
If $X \sim$ Poisson$(\theta)$ then show that the random variable $Y = g(X) = \sqrt{X}$ has approximately constant variance.

Solution
If $X \sim$ Poisson$(\theta)$ then $\sqrt{Var(X)} = \sigma(\theta) = \sqrt{\theta}$. For $g(X) = \sqrt{X}$, $g'(X) = \frac{1}{2}X^{-1/2}$.
Therefore by (2.5), the variance of $Y = g(X) = \sqrt{X}$ is approximately
$\left[g'(\theta)\sigma(\theta)\right]^2 = \left(\dfrac{1}{2}\theta^{-1/2}\sqrt{\theta}\right)^2 = \dfrac{1}{4}$
which is a constant.
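A simulation sketch of the stabilizing effect (the $\theta$ values are arbitrary): the raw variance grows with $\theta$ while the variance of $\sqrt{X}$ stays near $1/4$.

import numpy as np

rng = np.random.default_rng(5)
for theta in [2.0, 5.0, 20.0, 100.0]:
    x = rng.poisson(theta, size=200_000)
    # Var(X) is approximately theta; Var(sqrt(X)) is approximately 0.25
    print(theta, np.var(x), np.var(np.sqrt(x)))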
2.9.2 Exercise
If $X \sim$ Exponential$(\theta)$ then show that the random variable $Y = g(X) = \log X$ has approximately constant variance.
2.10. MOMENT GENERATING FUNCTIONS 45
2.10 Moment Generating Functions
If we are given the probability (density) function of a random variable X or the cumulative
distribution function of the random variable X then we can determine everything there is to
know about the distribution of X. There is a third type of function, the moment generating
function, which also uniquely determines a distribution. The moment generating function is
closely related to other transforms used in mathematics, the Laplace and Fourier transforms.
Moment generating functions are a powerful tool for determining the distributions of
functions of random variables (Chapter 4), particularly sums, as well as determining the
limiting distribution of a sequence of random variables (Chapter 5).
2.10.1 De…nition - Moment Generating Function
If X is a random variable then M(t) = E(etX) is called the moment generating function
(m.g.f.) of X if this expectation exists for all t 2 (h; h) for some h > 0.
Important:When determining the moment generating functionM(t) of a random variable
the values of t for which the expectation exists should always be stated.
2.10.2 Example
(a) Find the moment generating function of the random variable X Gamma(; ).
(b) Find the moment generating function of the random variableX Negative Binomial(k; p).
Solution
(a) If $X \sim \text{Gamma}(\alpha, \beta)$ then
$$M(t) = \int_{-\infty}^{\infty} e^{tx}f(x)\,dx = \int_0^{\infty} e^{tx}\,\frac{x^{\alpha-1}}{\beta^{\alpha}\Gamma(\alpha)}\,e^{-x/\beta}\,dx$$
$$= \int_0^{\infty} \frac{1}{\beta^{\alpha}\Gamma(\alpha)}\,x^{\alpha-1}e^{-x\left(\frac{1}{\beta}-t\right)}\,dx \quad \text{which converges for } t < \frac{1}{\beta}$$
$$= \frac{1}{\beta^{\alpha}\Gamma(\alpha)}\int_0^{\infty} x^{\alpha-1}e^{-x\left(\frac{1-\beta t}{\beta}\right)}\,dx \quad \text{let } y = \left(\frac{1-\beta t}{\beta}\right)x$$
$$= \frac{1}{\beta^{\alpha}\Gamma(\alpha)}\int_0^{\infty}\left(\frac{\beta y}{1-\beta t}\right)^{\alpha-1}e^{-y}\left(\frac{\beta}{1-\beta t}\right)dy$$
$$= \frac{1}{\beta^{\alpha}\Gamma(\alpha)}\left(\frac{\beta}{1-\beta t}\right)^{\alpha}\int_0^{\infty} y^{\alpha-1}e^{-y}\,dy = \frac{\Gamma(\alpha)}{\Gamma(\alpha)}\left(\frac{1}{1-\beta t}\right)^{\alpha}$$
$$= \left(\frac{1}{1-\beta t}\right)^{\alpha} \quad \text{for } t < \frac{1}{\beta}$$
(b) If $X \sim \text{Negative Binomial}(k, p)$ then
$$M(t) = \sum_{x=0}^{\infty} e^{tx}\binom{-k}{x}p^k(-q)^x \quad \text{where } q = 1 - p$$
$$= p^k\sum_{x=0}^{\infty}\binom{-k}{x}\left(-qe^t\right)^x \quad \text{which converges for } qe^t < 1$$
$$= p^k\left(1 - qe^t\right)^{-k} \quad \text{by 2.11.3(2) for } e^t < q^{-1}$$
$$= \left(\frac{p}{1-qe^t}\right)^k \quad \text{for } t < -\log(q)$$
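A rough Monte Carlo check of the Gamma moment generating function derived in (a) is sketched below. It assumes the scale parameterization used in these notes, $E(X) = \alpha\beta$; the parameter values, seed and sample size are arbitrary, and the chosen $t$ values all satisfy $t < 1/\beta$.

import numpy as np

# Compare the sample average of exp(t*X) with the closed form (1 - beta*t)^(-alpha).
rng = np.random.default_rng(1)
alpha, beta = 3.0, 2.0
x = rng.gamma(shape=alpha, scale=beta, size=2_000_000)

for t in (-0.5, 0.1, 0.2):            # all satisfy t < 1/beta = 0.5
    empirical = np.exp(t * x).mean()
    exact = (1 - beta * t) ** (-alpha)
    print(f"t = {t:5.2f}: empirical = {empirical:8.4f}, exact = {exact:8.4f}")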
2.10.3 Exercise
(a) Show that the moment generating function of the random variable $X \sim \text{Binomial}(n, p)$ is $M(t) = \left(q + pe^t\right)^n$ for $t \in \Re$.
(b) Show that the moment generating function of the random variable $X \sim \text{Poisson}(\theta)$ is $M(t) = e^{\theta(e^t - 1)}$ for $t \in \Re$.
If the moment generating function of random variable X exists then the following theo-
rem gives us a method for determining the distribution of the random variable Y = aX + b
which is a linear function of X.
2.10.4 Theorem - Moment Generating Function of a Linear Function
Suppose the random variable $X$ has moment generating function $M_X(t)$ defined for $t \in (-h, h)$ for some $h > 0$. Let $Y = aX + b$ where $a, b \in \Re$ and $a \ne 0$. Then the moment generating function of $Y$ is
$$M_Y(t) = e^{bt}M_X(at) \quad \text{for } |t| < \frac{h}{|a|}$$
Proof
$$M_Y(t) = E\left(e^{tY}\right) = E\left(e^{t(aX+b)}\right) = e^{bt}E\left(e^{atX}\right) \quad \text{which exists for } |at| < h$$
$$= e^{bt}M_X(at) \quad \text{for } |t| < \frac{h}{|a|}$$
as required.
2.10.5 Example
(a) Find the moment generating function of $Z \sim \text{N}(0, 1)$.
(b) Use (a) and the fact that $X = \mu + \sigma Z \sim \text{N}(\mu, \sigma^2)$ to find the moment generating function of a $\text{N}(\mu, \sigma^2)$ random variable.
Solution
(a) The moment generating function of $Z$ is
$$M_Z(t) = \int_{-\infty}^{\infty} e^{tz}\,\frac{1}{\sqrt{2\pi}}\,e^{-z^2/2}\,dz = \int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}}\,e^{-(z^2 - 2tz)/2}\,dz$$
$$= e^{t^2/2}\int_{-\infty}^{\infty}\frac{1}{\sqrt{2\pi}}\,e^{-(z-t)^2/2}\,dz = e^{t^2/2} \quad \text{for } t \in \Re$$
by 2.4.4(2) since $\frac{1}{\sqrt{2\pi}}e^{-(z-t)^2/2}$ is the probability density function of a $\text{N}(t, 1)$ random variable.
(b) By Theorem 2.10.4 the moment generating function of $X = \mu + \sigma Z$ is
$$M_X(t) = e^{\mu t}M_Z(\sigma t) = e^{\mu t}e^{(\sigma t)^2/2} = e^{\mu t + \sigma^2 t^2/2} \quad \text{for } t \in \Re$$
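A quick Monte Carlo sanity check (an illustrative sketch only, with arbitrary seed and sample size) of the result $M_Z(t) = e^{t^2/2}$ for $Z \sim \text{N}(0,1)$:

import numpy as np

# Compare the sample average of exp(t*Z) with exp(t^2/2).
rng = np.random.default_rng(7)
z = rng.standard_normal(2_000_000)
for t in (-1.0, 0.5, 1.5):
    print(f"t = {t:4.1f}: empirical = {np.exp(t * z).mean():.4f}, exact = {np.exp(t**2 / 2):.4f}")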
2.10.6 Exercise
If $X \sim \text{Negative Binomial}(k, p)$ then find the moment generating function of $Y = X + k$, $k = 1, 2, \ldots$
2.10.7 Theorem - Moments from Moment Generating Function
Suppose the random variable $X$ has moment generating function $M(t)$ defined for $t \in (-h, h)$ for some $h > 0$. Then $M(0) = 1$ and
$$M^{(k)}(0) = E(X^k) \quad \text{for } k = 1, 2, \ldots$$
where
$$M^{(k)}(t) = \frac{d^k}{dt^k}M(t)$$
is the $k$th derivative of $M(t)$.
Proof (Continuous Case)
Note that
$$M(0) = E(X^0) = E(1) = 1$$
and also that
$$\frac{d^k}{dt^k}e^{tx} = x^k e^{tx} \quad k = 1, 2, \ldots \qquad (2.6)$$
The result (2.6) can be proved by induction.
Now if $X$ is a continuous random variable with moment generating function $M(t)$ defined for $t \in (-h, h)$ for some $h > 0$ then
$$M^{(k)}(t) = \frac{d^k}{dt^k}E\left(e^{tX}\right) = \frac{d^k}{dt^k}\int_{-\infty}^{\infty} e^{tx}f(x)\,dx = \int_{-\infty}^{\infty}\frac{d^k}{dt^k}e^{tx}f(x)\,dx \quad k = 1, 2, \ldots$$
assuming the operations of differentiation and integration can be exchanged. (This interchange of operations cannot always be done but for the moment generating functions of interest in this course the result does hold.)
Using (2.6) we have
$$M^{(k)}(t) = \int_{-\infty}^{\infty} x^k e^{tx}f(x)\,dx = E\left(X^k e^{tX}\right) \quad t \in (-h, h) \text{ for some } h > 0$$
Letting $t = 0$ we obtain
$$M^{(k)}(0) = E\left(X^k\right) \quad k = 1, 2, \ldots$$
as required.
2.10.8 Example
If $X \sim \text{Gamma}(\alpha, \beta)$ then $M(t) = (1 - \beta t)^{-\alpha}$, $t < 1/\beta$. Find $E\left(X^k\right)$, $k = 1, 2, \ldots$ using Theorem 2.10.7.
Solution
$$\frac{d}{dt}M(t) = M'(t) = \frac{d}{dt}(1-\beta t)^{-\alpha} = \alpha\beta(1-\beta t)^{-\alpha-1}$$
so
$$E(X) = M'(0) = \alpha\beta$$
$$\frac{d^2}{dt^2}M(t) = M''(t) = \frac{d^2}{dt^2}(1-\beta t)^{-\alpha} = \alpha(\alpha+1)\beta^2(1-\beta t)^{-\alpha-2}$$
so
$$E\left(X^2\right) = M''(0) = \alpha(\alpha+1)\beta^2$$
Continuing in this manner we have
$$\frac{d^k}{dt^k}M(t) = M^{(k)}(t) = \alpha(\alpha+1)\cdots(\alpha+k-1)\beta^k(1-\beta t)^{-\alpha-k} \quad \text{for } k = 1, 2, \ldots$$
so
$$E\left(X^k\right) = M^{(k)}(0) = \alpha(\alpha+1)\cdots(\alpha+k-1)\beta^k = (\alpha+k-1)^{(k)}\beta^k \quad \text{for } k = 1, 2, \ldots$$
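The differentiation in this example can also be checked symbolically. The following sketch (assuming the sympy package is available) differentiates $M(t) = (1-\beta t)^{-\alpha}$ $k$ times at $t = 0$ and prints the result next to the rising-factorial form $\alpha(\alpha+1)\cdots(\alpha+k-1)\beta^k$ obtained above.

import sympy as sp

t, alpha, beta = sp.symbols('t alpha beta', positive=True)
M = (1 - beta * t) ** (-alpha)

for k in range(1, 5):
    moment = sp.simplify(sp.diff(M, t, k).subs(t, 0))      # M^(k)(0)
    closed_form = sp.rf(alpha, k) * beta ** k               # rising factorial alpha*(alpha+1)*...*(alpha+k-1)
    print(k, moment, sp.simplify(moment - closed_form))     # difference should simplify to 0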
2.10.9 Important Idea
Suppose $M^{(k)}(t)$, $k = 1, 2, \ldots$ exists for $t \in (-h, h)$ for some $h > 0$. Then $M(t)$ has a Maclaurin series given by
$$\sum_{k=0}^{\infty}\frac{M^{(k)}(0)}{k!}\,t^k$$
where
$$M^{(0)}(0) = M(0) = 1$$
The coefficient of $t^k$ in this power series is equal to
$$\frac{M^{(k)}(0)}{k!} = \frac{E(X^k)}{k!}$$
Therefore if we can obtain a Maclaurin series for $M(t)$, for example, by using the Binomial series or the Exponential series, then we can find $E(X^k)$ by using
$$E(X^k) = k! \times \left[\text{coefficient of } t^k \text{ in the Maclaurin series for } M(t)\right] \qquad (2.7)$$
2.10.10 Example
Suppose $X \sim \text{Gamma}(\alpha, \beta)$. Find $E\left(X^k\right)$ by using the Binomial series expansion for $M(t) = (1 - \beta t)^{-\alpha}$, $t < 1/\beta$.
Solution
$$M(t) = (1-\beta t)^{-\alpha} = \sum_{k=0}^{\infty}\binom{-\alpha}{k}(-\beta t)^k \quad \text{for } |\beta t| < 1 \text{ by 2.11.3(2)}$$
$$= \sum_{k=0}^{\infty}\binom{-\alpha}{k}(-\beta)^k t^k \quad \text{for } |t| < \frac{1}{\beta}$$
The coefficient of $t^k$ in $M(t)$ is $\binom{-\alpha}{k}(-\beta)^k$ for $k = 1, 2, \ldots$
Therefore
$$E(X^k) = k! \times \left[\text{coefficient of } t^k \text{ in the Maclaurin series for } M(t)\right] = k!\binom{-\alpha}{k}(-\beta)^k$$
$$= k!\,\frac{(-\alpha)^{(k)}}{k!}(-\beta)^k = (-\alpha)(-\alpha-1)\cdots(-\alpha-k+2)(-\alpha-k+1)(-\beta)^k$$
$$= (\alpha+k-1)(\alpha+k-2)\cdots(\alpha+1)\alpha\,\beta^k = (\alpha+k-1)^{(k)}\beta^k \quad \text{for } k = 1, 2, \ldots$$
which is the same result as obtained in Example 2.10.8.
Moment generating functions are particularly useful for finding distributions of sums of
independent random variables. The following theorem plays an important role in this
technique.
2.10.11 Uniqueness Theorem for Moment Generating Functions
Suppose the random variable $X$ has moment generating function $M_X(t)$ and the random variable $Y$ has moment generating function $M_Y(t)$. Suppose also that $M_X(t) = M_Y(t)$ for all $t \in (-h, h)$ for some $h > 0$. Then $X$ and $Y$ have the same distribution, that is,
$$P(X \le s) = F_X(s) = F_Y(s) = P(Y \le s) \quad \text{for all } s \in \Re.$$
Proof
See Problem 18 for the proof of this result in the discrete case.
2.10.12 Example
If $X \sim \text{Exponential}(1)$ then find the distribution of $Y = \mu + \theta X$ where $\theta > 0$ and $\mu \in \Re$.
Solution
From Example 2.4.10 we know that if $X \sim \text{Exponential}(1)$ then $X \sim \text{Gamma}(1, 1)$ so
$$M_X(t) = \frac{1}{1-t} \quad \text{for } t < 1$$
By Theorem 2.10.4
$$M_Y(t) = e^{\mu t}M_X(\theta t) = \frac{e^{\mu t}}{1 - \theta t} \quad \text{for } t < \frac{1}{\theta}$$
By examining the list of moment generating functions in Chapter 11 we see that this is the moment generating function of a Two Parameter Exponential$(\theta, \mu)$ random variable. Therefore by the Uniqueness Theorem for Moment Generating Functions, $Y$ has a Two Parameter Exponential$(\theta, \mu)$ distribution.
2.10.13 Example
If $X \sim \text{Gamma}(\alpha, \beta)$, where $\alpha$ is a positive integer, then show $2X/\beta \sim \chi^2(2\alpha)$.
Solution
From Example 2.10.2 the moment generating function of $X$ is
$$M(t) = \left(\frac{1}{1-\beta t}\right)^{\alpha} \quad \text{for } t < \frac{1}{\beta}$$
By Theorem 2.10.4 the moment generating function of $Y = 2X/\beta$ is
$$M_Y(t) = M_X\left(\frac{2}{\beta}t\right) \quad \text{for } \left|\frac{2}{\beta}t\right| < \frac{1}{\beta}$$
$$= \left[\frac{1}{1 - \beta\left(\frac{2}{\beta}\right)t}\right]^{\alpha} \quad \text{for } |t| < \frac{1}{2}$$
$$= \left(\frac{1}{1-2t}\right)^{\alpha} \quad \text{for } |t| < \frac{1}{2}$$
By examining the list of moment generating functions in Chapter 11 we see that this is the moment generating function of a $\chi^2(2\alpha)$ random variable if $\alpha$ is a positive integer. Therefore by the Uniqueness Theorem for Moment Generating Functions, $Y$ has a $\chi^2(2\alpha)$ distribution if $\alpha$ is a positive integer.
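The conclusion can also be illustrated by simulation. The sketch below (using numpy and scipy, with arbitrary parameter choices) compares a few sample quantiles of $2X/\beta$ with the corresponding $\chi^2(2\alpha)$ quantiles.

import numpy as np
from scipy import stats

# If X ~ Gamma(alpha, beta) with alpha a positive integer, 2X/beta should behave like chi-squared(2*alpha).
rng = np.random.default_rng(42)
alpha, beta = 4, 1.5
y = 2 * rng.gamma(shape=alpha, scale=beta, size=200_000) / beta

for p in (0.25, 0.5, 0.9, 0.99):
    print(f"p = {p}: sample quantile = {np.quantile(y, p):6.3f}, "
          f"chi2(2*alpha) quantile = {stats.chi2.ppf(p, df=2 * alpha):6.3f}")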
2.10.14 Exercise
If $X \sim \text{Gamma}(\alpha, \beta)$, where $\alpha$ is a positive integer and $\beta > 0$, then show
$$\frac{2X}{\beta} \sim \chi^2(2\alpha)$$
2.10.15 Exercise
Suppose the random variable X has moment generating function
$$M(t) = e^{t^2/2} \quad \text{for } t \in \Re$$
(a) Use (2.7) and the Exponential series 2.11.7 to find $E(X)$ and $Var(X)$.
(b) Find the moment generating function of $Y = 2X - 1$. What is the distribution of $Y$?
2.11 Calculus Review
2.11.1 Geometric Series
$$\sum_{x=0}^{\infty} at^x = a + at + at^2 + \cdots = \frac{a}{1-t} \quad \text{for } |t| < 1$$
2.11.2 Useful Results
(1) $\displaystyle\sum_{x=0}^{\infty} t^x = \frac{1}{1-t}$ for $|t| < 1$
(2) $\displaystyle\sum_{x=1}^{\infty} xt^{x-1} = \frac{1}{(1-t)^2}$ for $|t| < 1$
2.11.3 Binomial Series
(1) For $n \in \mathbb{Z}^+$ (the positive integers)
$$(a+b)^n = \sum_{x=0}^{n}\binom{n}{x}a^x b^{n-x}$$
where
$$\binom{n}{x} = \frac{n!}{x!(n-x)!} = \frac{n^{(x)}}{x!}$$
(2) For $n \in \mathbb{Q}$ (the rational numbers) and $|t| < 1$
$$(1+t)^n = \sum_{x=0}^{\infty}\binom{n}{x}t^x$$
where
$$\binom{n}{x} = \frac{n^{(x)}}{x!} = \frac{n(n-1)\cdots(n-x+1)}{x!}$$
2.11.4 Important Identities
(1) $\displaystyle x^{(k)}\binom{n}{x} = n^{(k)}\binom{n-k}{x-k}$
(2) $\displaystyle\binom{x+k-1}{x} = \binom{x+k-1}{k-1} = (-1)^x\binom{-k}{x}$
2.11.5 Multinomial Theorem
If $n$ is a positive integer and $a_1, a_2, \ldots, a_k$ are real numbers, then
$$(a_1 + a_2 + \cdots + a_k)^n = \sum\sum\cdots\sum \frac{n!}{x_1!x_2!\cdots x_k!}\,a_1^{x_1}a_2^{x_2}\cdots a_k^{x_k}$$
where the summation extends over all non-negative integers $x_1, x_2, \ldots, x_k$ with $x_1 + x_2 + \cdots + x_k = n$.
2.11.6 Hypergeometric Identity
$$\sum_{x=0}^{\infty}\binom{a}{x}\binom{b}{n-x} = \binom{a+b}{n}$$
2.11.7 Exponential Series
$$e^x = 1 + \frac{x}{1!} + \frac{x^2}{2!} + \cdots = \sum_{n=0}^{\infty}\frac{x^n}{n!} \quad \text{for } x \in \Re$$
2.11.8 Logarithmic Series
$$\ln(1+x) = \log(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots \quad \text{for } -1 < x \le 1$$
2.11.9 First Fundamental Theorem of Calculus (FTC1)
If $f$ is continuous on $[a, b]$ then the function $g$ defined by
$$g(x) = \int_a^x f(t)\,dt \quad \text{for } a \le x \le b$$
is continuous on $[a, b]$, differentiable on $(a, b)$ and $g'(x) = f(x)$.
2.11.10 Fundamental Theorem of Calculus and the Chain Rule
Suppose we want the derivative with respect to $x$ of $G(x)$ where
$$G(x) = \int_a^{h(x)} f(t)\,dt \quad \text{for } a \le x \le b$$
and $h(x)$ is a differentiable function on $[a, b]$. If we define
$$g(u) = \int_a^u f(t)\,dt$$
then $G(x) = g(h(x))$. Then by the Chain Rule
$$G'(x) = g'(h(x))h'(x) = f(h(x))h'(x) \quad \text{for } a < x < b$$
2.11.11 Improper Integrals
(a) If $\int_a^b f(x)\,dx$ exists for every number $b \ge a$ then
$$\int_a^{\infty} f(x)\,dx = \lim_{b \to \infty}\int_a^b f(x)\,dx$$
provided this limit exists. If the limit exists we say the improper integral converges; otherwise we say the improper integral diverges.
(b) If $\int_a^b f(x)\,dx$ exists for every number $a \le b$ then
$$\int_{-\infty}^{b} f(x)\,dx = \lim_{a \to -\infty}\int_a^b f(x)\,dx$$
provided this limit exists.
(c) If both $\int_a^{\infty} f(x)\,dx$ and $\int_{-\infty}^{a} f(x)\,dx$ are convergent then we define
$$\int_{-\infty}^{\infty} f(x)\,dx = \int_{-\infty}^{a} f(x)\,dx + \int_a^{\infty} f(x)\,dx$$
where $a$ is any real number.
2.11.12 Comparison Test for Improper Integrals
Suppose that $f$ and $g$ are continuous functions with $f(x) \ge g(x) \ge 0$ for $x \ge a$.
(a) If $\int_a^{\infty} f(x)\,dx$ is convergent then $\int_a^{\infty} g(x)\,dx$ is convergent.
(b) If $\int_a^{\infty} g(x)\,dx$ is divergent then $\int_a^{\infty} f(x)\,dx$ is divergent.
2.11.13 Useful Result for Comparison Test
$$\int_1^{\infty}\frac{1}{x^p}\,dx \text{ converges if and only if } p > 1 \qquad (2.8)$$
2.11.14 Useful Inequalities
$$\frac{1}{1+y^p} \le \frac{1}{y^p} \quad \text{for } y \ge 1,\ p > 0$$
$$\frac{1}{1+y^p} \ge \frac{1}{y^p + y^p} = \frac{1}{2y^p} \quad \text{for } y \ge 1,\ p > 0$$
2.11.15 Taylor’s Theorem
Suppose $f$ is a real-valued function such that the derivatives $f^{(1)}, f^{(2)}, \ldots, f^{(n+1)}$ all exist on an open interval containing the point $x = a$. Then
$$f(x) = f(a) + f^{(1)}(a)(x-a) + \frac{f^{(2)}(a)}{2!}(x-a)^2 + \cdots + \frac{f^{(n)}(a)}{n!}(x-a)^n + \frac{f^{(n+1)}(c)}{(n+1)!}(x-a)^{n+1}$$
for some $c$ between $a$ and $x$.
2.12 Chapter 2 Problems
1. Consider the following functions:
(a) $f(x) = k\theta^x/x$ for $x = 1, 2, \ldots$; $0 < \theta < 1$
(b) $f(x) = k\left[1 + (x/\theta)^2\right]^{-1}$ for $x \in \Re$, $\theta > 0$
(c) $f(x) = ke^{-|x-\theta|}$ for $x \in \Re$, $\theta \in \Re$
(d) $f(x) = k(1-x)^{\theta}$ for $0 < x < 1$, $\theta > 0$
(e) $f(x) = kx^2e^{-x/\theta}$ for $x > 0$, $\theta > 0$
(f) $f(x) = kx^{-(\theta+1)}$ for $x \ge 1$, $\theta > 0$
(g) $f(x) = ke^{-x/\theta}\left(1 + e^{-x/\theta}\right)^{-2}$ for $x \in \Re$, $\theta > 0$
(h) $f(x) = kx^{-3}e^{-1/(\theta x)}$ for $x > 0$, $\theta > 0$
(i) $f(x) = k(1+x)^{-\theta}$ for $x > 0$, $\theta > 1$
(j) $f(x) = k(1-x)x^{\theta-1}$ for $0 < x < 1$, $\theta > 0$
In each case:
(1) Determine $k$ so that $f(x)$ is a probability (density) function and sketch $f(x)$.
(2) Let $X$ be a random variable with probability (density) function $f(x)$. Find the cumulative distribution function of $X$.
(3) Find $E(X)$ and $Var(X)$ using the probability (density) function. Indicate the values of $\theta$ for which $E(X)$ and $Var(X)$ exist.
(4) Find $P(0.5 < X \le 2)$ and $P(X > 0.5 \mid X \le 2)$.
In (a) use $\theta = 0.3$, in (b) use $\theta = 1$, in (c) use $\theta = 0$, in (d) use $\theta = 5$, in (e) use $\theta = 1$, in (f) use $\theta = 1$, in (g) use $\theta = 2$, in (h) use $\theta = 1$, in (i) use $\theta = 2$, in (j) use $\theta = 3$.
2. Determine if $\theta$ is a location parameter, a scale parameter, or neither for the distributions in (b)-(j) of Problem 1.
3. (a) If $X \sim \text{Weibull}(2, \theta)$ then show $\theta$ is a scale parameter for this distribution.
(b) If $X \sim \text{Uniform}(0, \theta)$ then show $\theta$ is a scale parameter for this distribution.
4. Suppose X is a continuous random variable with probability density function
$$f(x) = \begin{cases} ke^{-(x-\mu)^2/2} & |x-\mu| \le c \\ ke^{-c|x-\mu|+c^2/2} & |x-\mu| > c \end{cases}$$
(a) Show that
$$\frac{1}{k} = \frac{2}{c}e^{-c^2/2} + \sqrt{2\pi}\,[2\Phi(c) - 1]$$
where $\Phi$ is the $\text{N}(0, 1)$ cumulative distribution function.
(b) Find the cumulative distribution function of $X$, $E(X)$ and $Var(X)$.
(c) Show that $\mu$ is a location parameter for this distribution.
(d) On the same graph sketch $f(x)$ for $c = 1$, $\mu = 0$, $f(x)$ for $c = 2$, $\mu = 0$ and the $\text{N}(0, 1)$ probability density function. What do you notice?
5. The Geometric and Exponential distributions both have a property referred to as the
memoryless property.
(a) Suppose $X \sim \text{Geometric}(p)$. Show that $P(X \ge k + j \mid X \ge k) = P(X \ge j)$ where $k$ and $j$ are nonnegative integers. Explain why this is called the memoryless property.
(b) Show that if $Y \sim \text{Exponential}(\theta)$ then $P(Y \ge a + b \mid Y \ge a) = P(Y \ge b)$ where $a > 0$ and $b > 0$.
6. Suppose that $f_1(x), f_2(x), \ldots, f_k(x)$ are probability density functions with support sets $A_1, A_2, \ldots, A_k$, means $\mu_1, \mu_2, \ldots, \mu_k$, and finite variances $\sigma_1^2, \sigma_2^2, \ldots, \sigma_k^2$ respectively. Suppose that $0 < p_1, p_2, \ldots, p_k < 1$ and $\sum_{i=1}^{k} p_i = 1$.
(a) Show that $g(x) = \sum_{i=1}^{k} p_if_i(x)$ is a probability density function.
(b) Let $X$ be a random variable with probability density function $g(x)$. Find the support set of $X$, $E(X)$ and $Var(X)$.
7.
(a) If $X \sim \text{Gamma}(\alpha, \beta)$ then find the probability density function of $Y = e^X$.
(b) If $X \sim \text{Gamma}(\alpha, \beta)$ then show $Y = 1/X \sim \text{Inverse Gamma}(\alpha, \beta)$.
(c) If $X \sim \text{Gamma}(k, \theta)$ then show $Y = 2X/\theta \sim \chi^2(2k)$ for $k = 1, 2, \ldots$.
(d) If $X \sim \text{N}(\mu, \sigma^2)$ then find the probability density function of $Y = e^X$.
(e) If $X \sim \text{N}(\mu, \sigma^2)$ then find the probability density function of $Y = X^{-1}$.
(f) If $X \sim \text{Uniform}(-\frac{\pi}{2}, \frac{\pi}{2})$ then show that $Y = \tan X \sim \text{Cauchy}(1, 0)$.
(g) If $X \sim \text{Pareto}(\alpha, \beta)$ then show that $Y = \log(X/\alpha) \sim \text{Exponential}(1/\beta)$.
(h) If $X \sim \text{Weibull}(2, \theta)$ then show that $Y = X^2 \sim \text{Exponential}(\theta^2)$.
(i) If $X \sim \text{Double Exponential}(0, 1)$ then find the probability density function of $Y = X^2$.
(j) If $X \sim \text{t}(k)$ then show that $Y = X^2 \sim \text{F}(1, k)$.
8. Suppose $T \sim \text{t}(n)$.
(a) Show that $E(T) = 0$ if $n > 1$.
(b) Show that $Var(T) = n/(n-2)$ if $n > 2$.
9. Suppose $X \sim \text{Beta}(a, b)$.
(a) Find $E\left(X^k\right)$ for $k = 1, 2, \ldots$. Use this result to find $E(X)$ and $Var(X)$.
(b) Graph the probability density function for (i) $a = 0.7$, $b = 0.7$, (ii) $a = 1$, $b = 3$, (iii) $a = 2$, $b = 2$, (iv) $a = 2$, $b = 4$, and (v) $a = 3$, $b = 1$ on the same graph.
(c) What special probability density function is obtained for $a = b = 1$?
10. If $E(|X|^k)$ exists for some integer $k > 1$, then show that $E(|X|^j)$ exists for $j = 1, 2, \ldots, k-1$.
11. If $X \sim \text{Binomial}(n, \theta)$, find the variance stabilizing transformation $g(X)$ such that $Var[g(X)]$ is approximately constant.
12. Prove that for any random variable $X$,
$$E\left(X^4\right) \ge \frac{1}{4}\,P\left(X^2 \ge \frac{1}{2}\right)$$
13. For each of the following probability (density) functions derive the moment generating function $M(t)$. State the values for which $M(t)$ exists and use the moment generating function to find the mean and variance.
(a) $f(x) = \binom{n}{x}p^x(1-p)^{n-x}$ for $x = 0, 1, \ldots, n$; $0 < p < 1$
(b) $f(x) = \theta^xe^{-\theta}/x!$ for $x = 0, 1, \ldots$; $\theta > 0$
(c) $f(x) = \frac{1}{\theta}e^{-(x-\mu)/\theta}$ for $x > \mu$; $\mu \in \Re$, $\theta > 0$
(d) $f(x) = \frac{1}{2}e^{-|x-\mu|}$ for $x \in \Re$; $\mu \in \Re$
(e) $f(x) = 2x$ for $0 < x < 1$
(f) $f(x) = \begin{cases} x & 0 \le x \le 1 \\ 2 - x & 1 < x \le 2 \\ 0 & \text{otherwise} \end{cases}$
14. Suppose $X$ is a random variable with moment generating function $M(t) = E(e^{tX})$ which exists for $t \in (-h, h)$ for some $h > 0$. Then $K(t) = \log M(t)$ is called the cumulant generating function of $X$.
(a) Show that $E(X) = K'(0)$ and $Var(X) = K''(0)$.
(b) If $X \sim \text{Negative Binomial}(k, p)$ then use (a) to find $E(X)$ and $Var(X)$.
15. For each of the following find the Maclaurin series for $M(t)$ using known series. Thus determine all the moments of $X$ if $X$ is a random variable with moment generating function $M(t)$:
(a) $M(t) = (1-t)^{-3}$ for $|t| < 1$
(b) $M(t) = (1+t)/(1-t)$ for $|t| < 1$
(c) $M(t) = e^t/(1-t^2)$ for $|t| < 1$
16. Suppose $Z \sim \text{N}(0, 1)$ and $Y = |Z|$.
(a) Show that $M_Y(t) = 2\Phi(t)e^{t^2/2}$ for $t \in \Re$ where $\Phi(t)$ is the cumulative distribution function of a $\text{N}(0, 1)$ random variable.
(b) Use (a) to find $E(|Z|)$ and $Var(|Z|)$.
17. Suppose $X \sim \chi^2(1)$ and $Z \sim \text{N}(0, 1)$. Use the properties of moment generating functions to compute $E\left(X^k\right)$ and $E\left(Z^k\right)$ for $k = 1, 2, \ldots$ How are these two related? Is this what you expected?
18. Suppose $X$ and $Y$ are discrete random variables such that $P(X = j) = p_j$ and $P(Y = j) = q_j$ for $j = 0, 1, \ldots$. Suppose also that $M_X(t) = M_Y(t)$ for $t \in (-h, h)$, $h > 0$. Show that $X$ and $Y$ have the same distribution. (Hint: Compare $M_X(\log s)$ and $M_Y(\log s)$ and recall that if two power series are equal then their coefficients are equal.)
19. Suppose $X$ is a random variable with moment generating function $M(t) = e^t/(1-t^2)$ for $|t| < 1$.
(a) Find the moment generating function of $Y = (X - 1)/2$.
(b) Use the moment generating function of $Y$ to find $E(Y)$ and $Var(Y)$.
(c) What is the distribution of $Y$?
3. Multivariate Random Variables
Models for real phenomena usually involve more than a single random variable. When there
are multiple random variables associated with an experiment or process we usually denote
them as $X, Y, \ldots$ or as $X_1, X_2, \ldots$. For example, your final mark in a course might involve
X1 = your assignment mark, X2 = your midterm test mark, and X3 = your exam mark.
We need to extend the ideas introduced in Chapter 2 for univariate random variables to
deal with multivariate random variables.
In Section 3.1 we begin by defining the joint and marginal cumulative distribution functions since these definitions hold regardless of what type of random variable we have. We define these functions in the case of two random variables X and Y. More than two random variables will be considered in specific examples in later sections. In Section 3.2 we briefly review discrete joint probability functions and marginal probability functions that were introduced in a previous probability course. In Section 3.3 we introduce the ideas needed for two continuous random variables and look at detailed examples since this is new material. In Section 3.4 we define independence for two random variables and show how the Factorization Theorem for Independence can be used. When two random variables are not independent then we are interested in conditional distributions. In Section 3.5 we review the definition of a conditional probability function for discrete random variables and define a conditional probability density function for continuous random variables which is new material. In Section 3.6 we review expectations of functions of discrete random variables. We also define expectations of functions of continuous random variables which is new material except for the case of Normal random variables. In Section 3.7 we define conditional expectations which arise from the conditional distributions discussed in Section 3.5. In Section 3.8 we discuss moment generating functions for two or more random variables, and show how the Factorization Theorem for Moment Generating Functions can be used to prove that random variables are independent. In Section 3.9 we review the Multinomial distribution and its properties. In Section 3.10 we introduce the very important Bivariate Normal distribution and its properties. Section 3.11 contains some useful results related to evaluating double integrals.
3.1 Joint and Marginal Cumulative Distribution Functions
We begin with the definitions and properties of the cumulative distribution functions associated with two random variables.
3.1.1 Definition - Joint Cumulative Distribution Function
Suppose $X$ and $Y$ are random variables defined on a sample space $S$. The joint cumulative distribution function of $X$ and $Y$ is given by
$$F(x, y) = P(X \le x, Y \le y) \quad \text{for } (x, y) \in \Re^2$$
3.1.2 Properties - Joint Cumulative Distribution Function
(1) $F$ is non-decreasing in $x$ for fixed $y$
(2) $F$ is non-decreasing in $y$ for fixed $x$
(3) $\lim\limits_{x \to -\infty} F(x, y) = 0$ and $\lim\limits_{y \to -\infty} F(x, y) = 0$
(4) $\lim\limits_{(x,y) \to (-\infty,-\infty)} F(x, y) = 0$ and $\lim\limits_{(x,y) \to (\infty,\infty)} F(x, y) = 1$
3.1.3 Definition - Marginal Distribution Function
The marginal cumulative distribution function of $X$ is given by
$$F_1(x) = \lim_{y \to \infty} F(x, y) = P(X \le x) \quad \text{for } x \in \Re$$
The marginal cumulative distribution function of $Y$ is given by
$$F_2(y) = \lim_{x \to \infty} F(x, y) = P(Y \le y) \quad \text{for } y \in \Re$$
Note: The definitions and properties of the joint cumulative distribution function and the marginal cumulative distribution functions hold for both $(X, Y)$ discrete random variables and for $(X, Y)$ continuous random variables.
Joint and marginal cumulative distribution functions for discrete random variables are not very convenient for determining probabilities. Joint and marginal probability functions, which are defined in Section 3.2, are more frequently used for discrete random variables. In Section 3.3 we look at specific examples of joint and marginal cumulative distribution functions for continuous random variables. In Chapter 5 we will see the important role of cumulative distribution functions in determining asymptotic distributions.
3.2 Bivariate Discrete Distributions
Suppose $X$ and $Y$ are random variables defined on a sample space $S$. If there is a countable subset $A \subseteq \Re^2$ such that $P[(X, Y) \in A] = 1$, then $X$ and $Y$ are discrete random variables.
Probabilities for discrete random variables are most easily handled in terms of joint probability functions.
3.2.1 Definition - Joint Probability Function
Suppose $X$ and $Y$ are discrete random variables.
The joint probability function of $X$ and $Y$ is given by
$$f(x, y) = P(X = x, Y = y) \quad \text{for } (x, y) \in \Re^2$$
The set $A = \{(x, y) : f(x, y) > 0\}$ is called the support set of $(X, Y)$.
3.2.2 Properties of Joint Probability Function
(1) $f(x, y) \ge 0$ for $(x, y) \in \Re^2$
(2) $\displaystyle\sum_{(x,y) \in A} f(x, y) = 1$
(3) For any set $R \subseteq \Re^2$
$$P[(X, Y) \in R] = \sum_{(x,y) \in R} f(x, y)$$
3.2.3 Definition - Marginal Probability Function
Suppose $X$ and $Y$ are discrete random variables with joint probability function $f(x, y)$.
The marginal probability function of $X$ is given by
$$f_1(x) = P(X = x) = \sum_{\text{all } y} f(x, y) \quad \text{for } x \in \Re$$
and the marginal probability function of $Y$ is given by
$$f_2(y) = P(Y = y) = \sum_{\text{all } x} f(x, y) \quad \text{for } y \in \Re$$
3.2.4 Example
In a fourth year statistics course there are 10 actuarial science students, 9 statistics students
and 6 math business students. Five students are selected at random without replacement.
Let X be the number of actuarial science students selected, and let Y be the number of
statistics students selected.
Find
(a) the joint probability function of X and Y
(b) the marginal probability function of X
(c) the marginal probability function of Y
(d) P (X > Y )
Solution
(a) The joint probability function of $X$ and $Y$ is
$$f(x, y) = P(X = x, Y = y) = \frac{\binom{10}{x}\binom{9}{y}\binom{6}{5-x-y}}{\binom{25}{5}} \quad \text{for } x = 0, 1, \ldots, 5,\ y = 0, 1, \ldots, 5,\ x + y \le 5$$
(b) The marginal probability function of $X$ is
$$f_1(x) = P(X = x) = \sum_{y=0}^{\infty}\frac{\binom{10}{x}\binom{9}{y}\binom{6}{5-x-y}}{\binom{25}{5}} = \frac{\binom{10}{x}\binom{15}{5-x}}{\binom{25}{5}}\sum_{y=0}^{\infty}\frac{\binom{9}{y}\binom{6}{5-x-y}}{\binom{15}{5-x}}$$
$$= \frac{\binom{10}{x}\binom{15}{5-x}}{\binom{25}{5}} \quad \text{for } x = 0, 1, \ldots, 5$$
by the Hypergeometric identity 2.11.6. Note that the marginal probability function of $X$ is Hypergeometric$(25, 10, 5)$. This makes sense because, when we are only interested in the number of actuarial science students, we only have two types of objects (actuarial science students and non-actuarial science students) and we are sampling without replacement which gives us the familiar Hypergeometric probability function.
(c) The marginal probability function of $Y$ is
$$f_2(y) = P(Y = y) = \sum_{x=0}^{\infty}\frac{\binom{10}{x}\binom{9}{y}\binom{6}{5-x-y}}{\binom{25}{5}} = \frac{\binom{9}{y}\binom{16}{5-y}}{\binom{25}{5}}\sum_{x=0}^{\infty}\frac{\binom{10}{x}\binom{6}{5-x-y}}{\binom{16}{5-y}}$$
$$= \frac{\binom{9}{y}\binom{16}{5-y}}{\binom{25}{5}} \quad \text{for } y = 0, 1, \ldots, 5$$
by the Hypergeometric identity 2.11.6. The marginal probability function of $Y$ is Hypergeometric$(25, 9, 5)$.
(d)
$$P(X > Y) = \sum_{(x,y):\,x > y} f(x, y) = f(1,0) + f(2,0) + f(3,0) + f(4,0) + f(5,0) + f(2,1) + f(3,1) + f(4,1) + f(3,2)$$
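For a quick numerical companion to part (d), the short Python sketch below enumerates the support set and adds up $f(x, y)$ over the pairs with $x > y$; only the standard library is used.

from math import comb

def f(x, y):
    # joint probability function from part (a)
    return comb(10, x) * comb(9, y) * comb(6, 5 - x - y) / comb(25, 5)

p = sum(f(x, y) for x in range(6) for y in range(6 - x) if x > y)
print(f"P(X > Y) = {p:.4f}")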
3.2.5 Exercise
The Hardy-Weinberg law of genetics states that, under certain conditions, the relative frequencies with which three genotypes $AA$, $Aa$ and $aa$ occur in the population will be $\theta^2$, $2\theta(1-\theta)$ and $(1-\theta)^2$ respectively where $0 < \theta < 1$. Suppose $n$ members of a very large population are selected at random.
Let $X$ be the number of $AA$ types selected and let $Y$ be the number of $Aa$ types selected.
Find
(a) the joint probability function of $X$ and $Y$
(b) the marginal probability function of $X$
(c) the marginal probability function of $Y$
(d) $P(X + Y = t)$ for $t = 0, 1, \ldots$.
3.3 Bivariate Continuous Distributions
Probabilities for continuous random variables can also be specified in terms of joint probability density functions.
3.3.1 Definition - Joint Probability Density Function
Suppose that $F(x, y)$ is a continuous function and that
$$f(x, y) = \frac{\partial^2}{\partial x\,\partial y}F(x, y)$$
exists and is a continuous function except possibly along a finite number of curves. Suppose also that
$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1$$
Then $X$ and $Y$ are said to be continuous random variables with joint probability density function $f$. The set $A = \{(x, y) : f(x, y) > 0\}$ is called the support set of $(X, Y)$.
Note: We will arbitrarily define $f(x, y)$ to be equal to $0$ when $\frac{\partial^2}{\partial x\,\partial y}F(x, y)$ does not exist although we could define it to be any real number.
3.3.2 Properties - Joint Probability Density Function
(1) $f(x, y) \ge 0$ for all $(x, y) \in \Re^2$
(2)
$$P[(X, Y) \in R] = \iint_{R} f(x, y)\,dx\,dy \quad \text{for } R \subseteq \Re^2$$
= the volume under the surface $z = f(x, y)$ and above the region $R$ in the $xy$ plane
3.3.3 Example
Suppose X and Y are continuous random variables with joint probability density function
$$f(x, y) = x + y \quad \text{for } 0 < x < 1,\ 0 < y < 1$$
and $0$ otherwise. The support set of $(X, Y)$ is $A = \{(x, y) : 0 < x < 1,\ 0 < y < 1\}$.
3.3. BIVARIATE CONTINUOUS DISTRIBUTIONS 67
The joint probability density function for $(x, y) \in A$ is graphed in Figure 3.1. We notice that the surface is the portion of the plane $z = x + y$ lying above the region $A$.
[Figure 3.1: Graph of joint probability density function for Example 3.3.3]
(a) Show that
$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1$$
(b) Find
(i) $P\left(X \le \frac{1}{3},\ Y \le \frac{1}{2}\right)$
(ii) $P(X \le Y)$
(iii) $P\left(X + Y \le \frac{1}{2}\right)$
(iv) $P\left(XY \le \frac{1}{2}\right)$.
Solution
(a) A graph of the support set for $(X, Y)$ is given in Figure 3.2. Such a graph is useful for determining the limits of integration of the double integral.
[Figure 3.2: Graph of the support set of (X, Y) for Example 3.3.3]
$$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy = \iint_{(x,y) \in A}(x+y)\,dx\,dy = \int_0^1\int_0^1(x+y)\,dx\,dy$$
$$= \int_0^1\left[\left(\tfrac{1}{2}x^2 + xy\right)\Big|_0^1\right]dy = \int_0^1\left(\tfrac{1}{2} + y\right)dy = \left(\tfrac{1}{2}y + \tfrac{1}{2}y^2\right)\Big|_0^1 = \tfrac{1}{2} + \tfrac{1}{2} = 1$$
(b) (i) A graph of the region of integration is given in Figure 3.3.
[Figure 3.3: Graph of the integration region for Example 3.3.3(b)(i)]
$$P\left(X \le \tfrac{1}{3},\ Y \le \tfrac{1}{2}\right) = \iint_{(x,y) \in B}(x+y)\,dx\,dy = \int_0^{1/2}\int_0^{1/3}(x+y)\,dx\,dy$$
$$= \int_0^{1/2}\left[\left(\tfrac{1}{2}x^2 + xy\right)\Big|_0^{1/3}\right]dy = \int_0^{1/2}\left[\tfrac{1}{2}\left(\tfrac{1}{3}\right)^2 + \tfrac{1}{3}y\right]dy$$
$$= \left(\tfrac{1}{18}y + \tfrac{1}{6}y^2\right)\Big|_0^{1/2} = \tfrac{1}{18}\left(\tfrac{1}{2}\right) + \tfrac{1}{6}\left(\tfrac{1}{2}\right)^2 = \tfrac{5}{72}$$
(ii) A graph of the region of integration is given in Figure 3.4. Note that when the region is not rectangular then care must be taken with the limits of integration.
[Figure 3.4: Graph of the integration region for Example 3.3.3(b)(ii)]
$$P(X \le Y) = \iint_{(x,y) \in C}(x+y)\,dx\,dy = \int_{y=0}^{1}\int_{x=0}^{y}(x+y)\,dx\,dy = \int_0^1\left[\left(\tfrac{1}{2}x^2 + xy\right)\Big|_0^y\right]dy$$
$$= \int_0^1\left(\tfrac{1}{2}y^2 + y^2\right)dy = \int_0^1\tfrac{3}{2}y^2\,dy = \tfrac{1}{2}y^3\Big|_0^1 = \tfrac{1}{2}$$
Alternatively
$$P(X \le Y) = \int_{x=0}^{1}\int_{y=x}^{1}(x+y)\,dy\,dx$$
Why does the answer of $1/2$ make sense when you look at Figure 3.1?
(iii) A graph of the region of integration is given in Figure 3.5.
[Figure 3.5: Graph of the region of integration for Example 3.3.3(b)(iii)]
$$P\left(X + Y \le \tfrac{1}{2}\right) = \iint_{(x,y) \in D}(x+y)\,dx\,dy = \int_{x=0}^{1/2}\int_{y=0}^{\frac{1}{2}-x}(x+y)\,dy\,dx$$
$$= \int_0^{1/2}\left[\left(xy + \tfrac{1}{2}y^2\right)\Big|_0^{\frac{1}{2}-x}\right]dx = \int_0^{1/2}\left[x\left(\tfrac{1}{2}-x\right) + \tfrac{1}{2}\left(\tfrac{1}{2}-x\right)^2\right]dx$$
$$= \int_0^{1/2}\left(-\tfrac{x^2}{2} + \tfrac{1}{8}\right)dx = \left(-\tfrac{x^3}{6} + \tfrac{1}{8}x\right)\Big|_0^{1/2} = \tfrac{2}{48} = \tfrac{1}{24}$$
Alternatively
$$P\left(X + Y \le \tfrac{1}{2}\right) = \int_{y=0}^{1/2}\int_{x=0}^{\frac{1}{2}-y}(x+y)\,dx\,dy$$
Why does this small probability make sense when you look at Figure 3.1?
(iv) A graph of the region of integration $E$ is given in Figure 3.6. In this example the integration can be done more easily by integrating over the region $F$.
[Figure 3.6: Graph of the region of integration for Example 3.3.3(b)(iv)]
$$P\left(XY \le \tfrac{1}{2}\right) = \iint_{(x,y) \in E}(x+y)\,dx\,dy = 1 - \iint_{(x,y) \in F}(x+y)\,dx\,dy$$
$$= 1 - \int_{x=\frac{1}{2}}^{1}\int_{y=\frac{1}{2x}}^{1}(x+y)\,dy\,dx = 1 - \int_{1/2}^{1}\left[\left(xy + \tfrac{1}{2}y^2\right)\Big|_{\frac{1}{2x}}^{1}\right]dx$$
$$= 1 - \int_{1/2}^{1}\left\{x + \tfrac{1}{2} - \left[x\left(\tfrac{1}{2x}\right) + \tfrac{1}{2}\left(\tfrac{1}{2x}\right)^2\right]\right\}dx = 1 - \int_{1/2}^{1}\left(x - \tfrac{1}{8x^2}\right)dx$$
$$= 1 - \left[\left(\tfrac{1}{2}x^2 + \tfrac{1}{8x}\right)\Big|_{1/2}^{1}\right] = \tfrac{3}{4}$$
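The four probabilities above can be checked numerically. The sketch below (assuming scipy is available) uses scipy.integrate.dblquad, whose integrand takes its arguments in the order $(y, x)$ with $x$ as the outer variable.

import numpy as np
from scipy.integrate import dblquad

f = lambda y, x: x + y                                   # joint density on the unit square

total, _ = dblquad(f, 0, 1, 0, 1)                        # should be 1
p_i, _   = dblquad(f, 0, 1/3, 0, 1/2)                    # P(X <= 1/3, Y <= 1/2)
p_ii, _  = dblquad(f, 0, 1, lambda x: x, 1)              # P(X <= Y): y runs from x to 1
p_iii, _ = dblquad(f, 0, 0.5, 0, lambda x: 0.5 - x)      # P(X + Y <= 1/2)
p_iv = 1 - dblquad(f, 0.5, 1, lambda x: 1/(2*x), 1)[0]   # P(XY <= 1/2)

print(np.round([total, p_i, p_ii, p_iii, p_iv], 4))      # expect [1, 0.0694, 0.5, 0.0417, 0.75]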
3.3.4 Definition of Marginal Probability Density Function
Suppose $X$ and $Y$ are continuous random variables with joint probability density function $f(x, y)$.
The marginal probability density function of $X$ is given by
$$f_1(x) = \int_{-\infty}^{\infty} f(x, y)\,dy \quad \text{for } x \in \Re$$
and the marginal probability density function of $Y$ is given by
$$f_2(y) = \int_{-\infty}^{\infty} f(x, y)\,dx \quad \text{for } y \in \Re$$
3.3.5 Example
For the joint probability density function in Example 3.3.3 determine:
(a) the marginal probability density function of X and the marginal probability density
function of Y
(b) the joint cumulative distribution function of X and Y
(c) the marginal cumulative distribution function of X and the marginal cumulative distri-
bution function of Y
Solution
(a) The marginal probability density function of $X$ is
$$f_1(x) = \int_{-\infty}^{\infty} f(x, y)\,dy = \int_0^1(x+y)\,dy = \left(xy + \tfrac{1}{2}y^2\right)\Big|_0^1 = x + \tfrac{1}{2} \quad \text{for } 0 < x < 1$$
and $0$ otherwise.
Since both the joint probability density function $f(x, y)$ and the support set $A$ are symmetric in $x$ and $y$ then by symmetry the marginal probability density function of $Y$ is
$$f_2(y) = y + \tfrac{1}{2} \quad \text{for } 0 < y < 1$$
and $0$ otherwise.
(b) Since
$$P(X \le x, Y \le y) = \int_0^y\int_0^x(s+t)\,ds\,dt = \int_0^y\left[\left(\tfrac{1}{2}s^2 + st\right)\Big|_0^x\right]dt = \int_0^y\left(\tfrac{1}{2}x^2 + xt\right)dt$$
$$= \left(\tfrac{1}{2}x^2t + \tfrac{1}{2}xt^2\right)\Big|_0^y = \tfrac{1}{2}\left(x^2y + xy^2\right) \quad \text{for } 0 < x < 1,\ 0 < y < 1$$
$$P(X \le x, Y \le y) = \int_0^x\int_0^1(s+t)\,dt\,ds = \tfrac{1}{2}\left(x^2 + x\right) \quad \text{for } 0 < x < 1,\ y \ge 1$$
$$P(X \le x, Y \le y) = \int_0^y\int_0^1(s+t)\,ds\,dt = \tfrac{1}{2}\left(y^2 + y\right) \quad \text{for } x \ge 1,\ 0 < y < 1$$
the joint cumulative distribution function of $X$ and $Y$ is
$$F(x, y) = P(X \le x, Y \le y) = \begin{cases} 0 & x \le 0 \text{ or } y \le 0 \\ \tfrac{1}{2}\left(x^2y + xy^2\right) & 0 < x < 1,\ 0 < y < 1 \\ \tfrac{1}{2}\left(x^2 + x\right) & 0 < x < 1,\ y \ge 1 \\ \tfrac{1}{2}\left(y^2 + y\right) & x \ge 1,\ 0 < y < 1 \\ 1 & x \ge 1,\ y \ge 1 \end{cases}$$
(c) Since the support set of $(X, Y)$ is $A = \{(x, y) : 0 < x < 1,\ 0 < y < 1\}$ then
$$F_1(x) = P(X \le x) = \lim_{y \to \infty} F(x, y) = \lim_{y \to \infty} P(X \le x, Y \le y) = F(x, 1) = \tfrac{1}{2}\left(x^2 + x\right) \quad \text{for } 0 < x < 1$$
Alternatively
$$F_1(x) = P(X \le x) = \int_{-\infty}^{x} f_1(s)\,ds = \int_0^x\left(s + \tfrac{1}{2}\right)ds = \tfrac{1}{2}\left(x^2 + x\right) \quad \text{for } 0 < x < 1$$
In either case the marginal cumulative distribution function of $X$ is
$$F_1(x) = \begin{cases} 0 & x \le 0 \\ \tfrac{1}{2}\left(x^2 + x\right) & 0 < x < 1 \\ 1 & x \ge 1 \end{cases}$$
By symmetry the marginal cumulative distribution function of $Y$ is
$$F_2(y) = \begin{cases} 0 & y \le 0 \\ \tfrac{1}{2}\left(y^2 + y\right) & 0 < y < 1 \\ 1 & y \ge 1 \end{cases}$$
3.3.6 Exercise
Suppose $X$ and $Y$ are continuous random variables with joint probability density function
$$f(x, y) = \frac{k}{(1 + x + y)^3} \quad \text{for } 0 < x < \infty,\ 0 < y < \infty$$
and $0$ otherwise.
(a) Determine $k$ and sketch $f(x, y)$.
(b) Find
(i) $P(X \le 1,\ Y \le 2)$
(ii) $P(X \le Y)$
(iii) $P(X + Y \le 1)$
(c) Determine the marginal probability density function of $X$ and the marginal probability density function of $Y$.
(d) Determine the joint cumulative distribution function of $X$ and $Y$.
(e) Determine the marginal cumulative distribution function of $X$ and the marginal cumulative distribution function of $Y$.
3.3.7 Exercise
Suppose $X$ and $Y$ are continuous random variables with joint probability density function
$$f(x, y) = ke^{-x-y} \quad \text{for } 0 < x < y < \infty$$
and $0$ otherwise.
(a) Determine $k$ and sketch $f(x, y)$.
(b) Find
(i) $P(X \le 1,\ Y \le 2)$
(ii) $P(X \le Y)$
(iii) $P(X + Y \le 1)$
(c) Determine the marginal probability density function of $X$ and the marginal probability density function of $Y$.
(d) Determine the joint cumulative distribution function of $X$ and $Y$.
(e) Determine the marginal cumulative distribution function of $X$ and the marginal cumulative distribution function of $Y$.
3.4 Independent Random Variables
Suppose we are modeling a phenomenon involving two random variables. For example suppose $X$ is your final mark in this course and $Y$ is the time you spent doing practice problems. We would be interested in whether the distribution of one random variable affects the distribution of the other. The following definition defines this idea precisely. The definition should remind you of the definition of independent events (see Definition 2.1.8).
3.4.1 Definition - Independent Random Variables
Two random variables $X$ and $Y$ are called independent random variables if and only if
$$P(X \in A \text{ and } Y \in B) = P(X \in A)P(Y \in B)$$
for all sets $A$ and $B$ of real numbers.
Definition 3.4.1 is not very convenient for determining the independence of two random variables. The following theorem shows how to use the marginal and joint cumulative distribution functions or the marginal and joint probability (density) functions to determine if two random variables are independent.
3.4.2 Theorem - Independent Random Variables
(1) Suppose $X$ and $Y$ are random variables with joint cumulative distribution function $F(x, y)$. Suppose also that $F_1(x)$ is the marginal cumulative distribution function of $X$ and $F_2(y)$ is the marginal cumulative distribution function of $Y$. Then $X$ and $Y$ are independent random variables if and only if
$$F(x, y) = F_1(x)F_2(y) \quad \text{for all } (x, y) \in \Re^2$$
(2) Suppose $X$ and $Y$ are random variables with joint probability (density) function $f(x, y)$. Suppose also that $f_1(x)$ is the marginal probability (density) function of $X$ with support set $A_1 = \{x : f_1(x) > 0\}$ and $f_2(y)$ is the marginal probability (density) function of $Y$ with support set $A_2 = \{y : f_2(y) > 0\}$. Then $X$ and $Y$ are independent random variables if and only if
$$f(x, y) = f_1(x)f_2(y) \quad \text{for all } (x, y) \in A_1 \times A_2$$
where $A_1 \times A_2 = \{(x, y) : x \in A_1,\ y \in A_2\}$.
Proof
(1) For given $(x, y)$, let $A_x = \{s : s \le x\}$ and let $B_y = \{t : t \le y\}$. Then by Definition 3.4.1 $X$ and $Y$ are independent random variables if and only if
$$P(X \in A_x \text{ and } Y \in B_y) = P(X \in A_x)P(Y \in B_y)$$
for all $(x, y) \in \Re^2$. But
$$P(X \in A_x \text{ and } Y \in B_y) = P(X \le x, Y \le y) = F(x, y)$$
$$P(X \in A_x) = P(X \le x) = F_1(x) \quad \text{and} \quad P(Y \in B_y) = F_2(y)$$
Therefore $X$ and $Y$ are independent random variables if and only if
$$F(x, y) = F_1(x)F_2(y) \quad \text{for all } (x, y) \in \Re^2$$
as required.
(2) (Continuous Case) From (1) we have $X$ and $Y$ are independent random variables if and only if
$$F(x, y) = F_1(x)F_2(y) \qquad (3.1)$$
for all $(x, y) \in \Re^2$. Now $\frac{\partial}{\partial x}F_1(x)$ exists for $x \in A_1$ and $\frac{\partial}{\partial y}F_2(y)$ exists for $y \in A_2$. Taking the partial derivative $\frac{\partial^2}{\partial x\,\partial y}$ of both sides of (3.1) where the partial derivative exists implies that $X$ and $Y$ are independent random variables if and only if
$$\frac{\partial^2}{\partial x\,\partial y}F(x, y) = \frac{\partial}{\partial x}F_1(x)\,\frac{\partial}{\partial y}F_2(y) \quad \text{for all } (x, y) \in A_1 \times A_2$$
or
$$f(x, y) = f_1(x)f_2(y) \quad \text{for all } (x, y) \in A_1 \times A_2$$
as required.
Note: The discrete case can be proved using an argument similar to the one used for (1).
3.4.3 Example
(a) In Example 3.2.4 determine if X and Y are independent random variables.
(b) In Example 3.3.3 determine if X and Y are independent random variables.
Solution
(a) Since the total number of students is fixed, a larger number of actuarial science students would imply a smaller number of statistics students and we would guess that the random variables are not independent. To show this we only need to find one pair of values $(x, y)$ for which $P(X = x, Y = y) \ne P(X = x)P(Y = y)$. Since
$$P(X = 0, Y = 0) = \frac{\binom{10}{0}\binom{9}{0}\binom{6}{5}}{\binom{25}{5}} = \frac{\binom{6}{5}}{\binom{25}{5}}$$
$$P(X = 0) = \frac{\binom{10}{0}\binom{15}{5}}{\binom{25}{5}} = \frac{\binom{15}{5}}{\binom{25}{5}} \qquad P(Y = 0) = \frac{\binom{9}{0}\binom{16}{5}}{\binom{25}{5}} = \frac{\binom{16}{5}}{\binom{25}{5}}$$
and
$$P(X = 0, Y = 0) = \frac{\binom{6}{5}}{\binom{25}{5}} \ne P(X = 0)P(Y = 0) = \frac{\binom{15}{5}}{\binom{25}{5}}\cdot\frac{\binom{16}{5}}{\binom{25}{5}}$$
therefore by Theorem 3.4.2, $X$ and $Y$ are not independent random variables.
(b) Since
$$f(x, y) = x + y \quad \text{for } 0 < x < 1,\ 0 < y < 1$$
$$f_1(x) = x + \tfrac{1}{2} \quad \text{for } 0 < x < 1, \qquad f_2(y) = y + \tfrac{1}{2} \quad \text{for } 0 < y < 1$$
it would appear that $X$ and $Y$ are not independent random variables. To show this we only need to find one pair of values $(x, y)$ for which $f(x, y) \ne f_1(x)f_2(y)$. Since
$$f\left(\tfrac{2}{3}, \tfrac{1}{3}\right) = 1 \ne f_1\left(\tfrac{2}{3}\right)f_2\left(\tfrac{1}{3}\right) = \left(\tfrac{7}{6}\right)\left(\tfrac{5}{6}\right)$$
therefore by Theorem 3.4.2, $X$ and $Y$ are not independent random variables.
3.4.4 Exercise
In Exercises 3.2.5, 3.3.6 and 3.3.7 determine if X and Y are independent random variables.
In the previous examples we determined whether the random variables were independent
using the joint probability (density) function and the marginal probability (density) func-
tions. The following very useful theorem does not require us to determine the marginal
probability (density) functions.
3.4.5 Factorization Theorem for Independence
Suppose $X$ and $Y$ are random variables with joint probability (density) function $f(x, y)$. Suppose also that $A$ is the support set of $(X, Y)$, $A_1$ is the support set of $X$, and $A_2$ is the support set of $Y$. Then $X$ and $Y$ are independent random variables if and only if there exist non-negative functions $g(x)$ and $h(y)$ such that
$$f(x, y) = g(x)h(y) \quad \text{for all } (x, y) \in A_1 \times A_2$$
Notes:
(1) If the Factorization Theorem for Independence holds then the marginal probability
(density) function of X will be proportional to g and the marginal probability (density)
function of Y will be proportional to h.
(2) Whenever the support set A is not rectangular the random variables will not be inde-
pendent. The reason for this is that when the support set is not rectangular it will always
be possible to find a point $(x, y)$ such that $x \in A_1$ with $f_1(x) > 0$, and $y \in A_2$ with $f_2(y) > 0$ so that $f_1(x)f_2(y) > 0$, but $(x, y) \notin A$ so $f(x, y) = 0$. This means there is a point $(x, y)$ such that $f(x, y) \ne f_1(x)f_2(y)$ and therefore $X$ and $Y$ are not independent random variables.
(3) The above definitions and theorems can easily be extended to the random vector $(X_1, X_2, \ldots, X_n)$.
Proof (Continuous Case)
If X and Y are independent random variables then by Theorem 3.4.2
$$f(x, y) = f_1(x)f_2(y) \quad \text{for all } (x, y) \in A_1 \times A_2$$
Letting $g(x) = f_1(x)$ and $h(y) = f_2(y)$ proves there exist $g(x)$ and $h(y)$ such that
$$f(x, y) = g(x)h(y) \quad \text{for all } (x, y) \in A_1 \times A_2$$
If there exist non-negative functions $g(x)$ and $h(y)$ such that
$$f(x, y) = g(x)h(y) \quad \text{for all } (x, y) \in A_1 \times A_2$$
then
$$f_1(x) = \int_{-\infty}^{\infty} f(x, y)\,dy = \int_{y \in A_2} g(x)h(y)\,dy = cg(x) \quad \text{for } x \in A_1$$
and
$$f_2(y) = \int_{-\infty}^{\infty} f(x, y)\,dx = \int_{x \in A_1} g(x)h(y)\,dx = kh(y) \quad \text{for } y \in A_2$$
Now
$$1 = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_1(x)f_2(y)\,dx\,dy = \int_{y \in A_2}\int_{x \in A_1} cg(x)\,kh(y)\,dx\,dy = ck\int_{y \in A_2}\int_{x \in A_1} f(x, y)\,dx\,dy = ck$$
Since $ck = 1$
$$f(x, y) = g(x)h(y) = ck\,g(x)h(y) = [cg(x)][kh(y)] = f_1(x)f_2(y) \quad \text{for all } (x, y) \in A_1 \times A_2$$
and by Theorem 3.4.2 $X$ and $Y$ are independent random variables.
3.4.6 Example
Suppose $X$ and $Y$ are discrete random variables with joint probability function
$$f(x, y) = \frac{\theta^{x+y}e^{-2\theta}}{x!\,y!} \quad \text{for } x = 0, 1, \ldots;\ y = 0, 1, \ldots$$
(a) Determine if $X$ and $Y$ are independent random variables.
(b) Determine the marginal probability function of $X$ and the marginal probability function of $Y$.
Solution
(a) The support set of $(X, Y)$ is $A = \{(x, y) : x = 0, 1, \ldots;\ y = 0, 1, \ldots\}$ which is rectangular. The support set of $X$ is $A_1 = \{x : x = 0, 1, \ldots\}$, and the support set of $Y$ is $A_2 = \{y : y = 0, 1, \ldots\}$.
Let
$$g(x) = \frac{\theta^xe^{-\theta}}{x!} \quad \text{and} \quad h(y) = \frac{\theta^ye^{-\theta}}{y!}$$
Then $f(x, y) = g(x)h(y)$ for all $(x, y) \in A_1 \times A_2$. Therefore by the Factorization Theorem for Independence $X$ and $Y$ are independent random variables.
(b) By inspection we can see that $g(x)$ is the probability function for a Poisson$(\theta)$ random variable. Therefore the marginal probability function of $X$ is
$$f_1(x) = \frac{\theta^xe^{-\theta}}{x!} \quad \text{for } x = 0, 1, \ldots$$
Similarly the marginal probability function of $Y$ is
$$f_2(y) = \frac{\theta^ye^{-\theta}}{y!} \quad \text{for } y = 0, 1, \ldots$$
and $Y \sim \text{Poisson}(\theta)$.
3.4.7 Example
Suppose $X$ and $Y$ are continuous random variables with joint probability density function
$$f(x, y) = \frac{3}{2}y\left(1 - x^2\right) \quad \text{for } -1 < x < 1,\ 0 < y < 1$$
and $0$ otherwise.
(a) Determine if $X$ and $Y$ are independent random variables.
(b) Determine the marginal probability density function of $X$ and the marginal probability density function of $Y$.
Solution
(a) The support set of $(X, Y)$ is $A = \{(x, y) : -1 < x < 1,\ 0 < y < 1\}$ which is rectangular. The support set of $X$ is $A_1 = \{x : -1 < x < 1\}$, and the support set of $Y$ is $A_2 = \{y : 0 < y < 1\}$.
Let
$$g(x) = 1 - x^2 \quad \text{and} \quad h(y) = \frac{3}{2}y$$
Then $f(x, y) = g(x)h(y)$ for all $(x, y) \in A_1 \times A_2$. Therefore by the Factorization Theorem for Independence $X$ and $Y$ are independent random variables.
(b) Since the marginal probability density function of $Y$ is proportional to $h(y)$ we know $f_2(y) = kh(y)$ for $0 < y < 1$ where $k$ is determined by
$$1 = k\int_0^1\frac{3}{2}y\,dy = \frac{3k}{4}y^2\Big|_0^1 = \frac{3k}{4}$$
Therefore $k = \frac{4}{3}$ and
$$f_2(y) = 2y \quad \text{for } 0 < y < 1$$
and $0$ otherwise.
Since $X$ and $Y$ are independent random variables $f(x, y) = f_1(x)f_2(y)$ or $f_1(x) = f(x, y)/f_2(y)$ for $x \in A_1$. Therefore the marginal probability density function of $X$ is
$$f_1(x) = \frac{f(x, y)}{f_2(y)} = \frac{\frac{3}{2}y\left(1-x^2\right)}{2y} = \frac{3}{4}\left(1 - x^2\right) \quad \text{for } -1 < x < 1$$
and $0$ otherwise.
3.4.8 Example
Suppose $X$ and $Y$ are continuous random variables with joint probability density function
$$f(x, y) = \frac{2}{\pi} \quad \text{for } 0 < x < \sqrt{1-y^2},\ -1 < y < 1$$
and $0$ otherwise.
(a) Determine if $X$ and $Y$ are independent random variables.
(b) Determine the marginal probability density function of $X$ and the marginal probability density function of $Y$.
Solution
(a) The support set of $(X, Y)$ which is
$$A = \left\{(x, y) : 0 < x < \sqrt{1-y^2},\ -1 < y < 1\right\}$$
is graphed in Figure 3.7.
[Figure 3.7: Graph of the support set of (X, Y) for Example 3.4.8]
The set $A$ can also be described as
$$A = \left\{(x, y) : 0 < x < 1,\ -\sqrt{1-x^2} < y < \sqrt{1-x^2}\right\}$$
The support set $A$ is not rectangular. Note that the surface has constant height on the region $A$.
The support set for $X$ is
$$A_1 = \{x : 0 < x < 1\}$$
and the support set for $Y$ is
$$A_2 = \{y : -1 < y < 1\}$$
If we choose the point $(0.9, 0.9) \in A_1 \times A_2$, $f(0.9, 0.9) = 0$ but $f_1(0.9) > 0$ and $f_2(0.9) > 0$ so $f(0.9, 0.9) \ne f_1(0.9)f_2(0.9)$ and therefore $X$ and $Y$ are not independent random variables.
(b) When the support set is not rectangular care must be taken to determine the marginal probability density functions.
To find the marginal probability density function of $X$ we use the description of the support set in which the range of $X$ does not depend on $y$ which is
$$A = \left\{(x, y) : 0 < x < 1,\ -\sqrt{1-x^2} < y < \sqrt{1-x^2}\right\}$$
The marginal probability density function of $X$ is
$$f_1(x) = \int_{-\infty}^{\infty} f(x, y)\,dy = \int_{-\sqrt{1-x^2}}^{\sqrt{1-x^2}}\frac{2}{\pi}\,dy = \frac{4}{\pi}\sqrt{1-x^2} \quad \text{for } 0 < x < 1$$
and $0$ otherwise.
To find the marginal probability density function of $Y$ we use the description of the support set in which the range of $Y$ does not depend on $x$ which is
$$A = \left\{(x, y) : 0 < x < \sqrt{1-y^2},\ -1 < y < 1\right\}$$
The marginal probability density function of $Y$ is
$$f_2(y) = \int_{-\infty}^{\infty} f(x, y)\,dx = \int_0^{\sqrt{1-y^2}}\frac{2}{\pi}\,dx = \frac{2}{\pi}\sqrt{1-y^2} \quad \text{for } -1 < y < 1$$
and $0$ otherwise.
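A brief numerical companion (an illustrative sketch, assuming scipy is available): check that the two marginal densities above integrate to 1, and exhibit the point used in part (a) where the product of the marginals is positive while the joint density is zero.

import numpy as np
from scipy.integrate import quad

f1 = lambda x: 4 / np.pi * np.sqrt(1 - x**2)      # 0 < x < 1
f2 = lambda y: 2 / np.pi * np.sqrt(1 - y**2)      # -1 < y < 1

print(quad(f1, 0, 1)[0], quad(f2, -1, 1)[0])      # both should be 1 (up to numerical error)

x0, y0 = 0.9, 0.9                                 # (0.9, 0.9) lies outside the support set A
print("f(x0, y0) = 0, but f1(x0)*f2(y0) =", f1(x0) * f2(y0))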
3.5 Conditional Distributions
In Section 2.1 we defined the conditional probability of event $A$ given event $B$ as
$$P(A|B) = \frac{P(A \cap B)}{P(B)} \quad \text{provided } P(B) > 0$$
The concept of conditional probability can also be extended to random variables.
3.5.1 Definition - Conditional Probability (Density) Function
Suppose $X$ and $Y$ are random variables with joint probability (density) function $f(x, y)$, and marginal probability (density) functions $f_1(x)$ and $f_2(y)$ respectively. Suppose also that the support set of $(X, Y)$ is $A = \{(x, y) : f(x, y) > 0\}$.
The conditional probability (density) function of $X$ given $Y = y$ is given by
$$f_1(x|y) = \frac{f(x, y)}{f_2(y)} \qquad (3.2)$$
for $(x, y) \in A$ provided $f_2(y) \ne 0$.
The conditional probability (density) function of $Y$ given $X = x$ is given by
$$f_2(y|x) = \frac{f(x, y)}{f_1(x)} \qquad (3.3)$$
for $(x, y) \in A$ provided $f_1(x) \ne 0$.
Notes:
(1) If $X$ and $Y$ are discrete random variables then
$$f_1(x|y) = P(X = x|Y = y) = \frac{P(X = x, Y = y)}{P(Y = y)} = \frac{f(x, y)}{f_2(y)}$$
and
$$\sum_x f_1(x|y) = \sum_x\frac{f(x, y)}{f_2(y)} = \frac{1}{f_2(y)}\sum_x f(x, y) = \frac{f_2(y)}{f_2(y)} = 1$$
Similarly for $f_2(y|x)$.
(2) If $X$ and $Y$ are continuous random variables
$$\int_{-\infty}^{\infty} f_1(x|y)\,dx = \int_{-\infty}^{\infty}\frac{f(x, y)}{f_2(y)}\,dx = \frac{1}{f_2(y)}\int_{-\infty}^{\infty} f(x, y)\,dx = \frac{f_2(y)}{f_2(y)} = 1$$
Similarly for $f_2(y|x)$.
(3) If $X$ is a continuous random variable then $f_1(x) \ne P(X = x)$ and $P(X = x) = 0$ for all $x$. Therefore to justify the definition of the conditional probability density function of $Y$ given $X = x$ when $X$ and $Y$ are continuous random variables we consider $P(Y \le y|X = x)$ as a limit
$$P(Y \le y|X = x) = \lim_{h \to 0} P(Y \le y \mid x \le X \le x + h) = \lim_{h \to 0}\frac{\int_x^{x+h}\int_{-\infty}^{y} f(u, v)\,dv\,du}{\int_x^{x+h} f_1(u)\,du}$$
$$= \lim_{h \to 0}\frac{\frac{d}{dh}\int_x^{x+h}\int_{-\infty}^{y} f(u, v)\,dv\,du}{\frac{d}{dh}\int_x^{x+h} f_1(u)\,du} \quad \text{by L'Hôpital's Rule}$$
$$= \lim_{h \to 0}\frac{\int_{-\infty}^{y} f(x + h, v)\,dv}{f_1(x + h)} \quad \text{by the Fundamental Theorem of Calculus}$$
$$= \frac{\lim_{h \to 0}\int_{-\infty}^{y} f(x + h, v)\,dv}{\lim_{h \to 0} f_1(x + h)} = \frac{\int_{-\infty}^{y} f(x, v)\,dv}{f_1(x)}$$
assuming that the limits exist and that integration and the limit operation can be interchanged. If we differentiate the last term with respect to $y$ using the Fundamental Theorem of Calculus we have
$$\frac{d}{dy}P(Y \le y|X = x) = \frac{f(x, y)}{f_1(x)}$$
which gives us a justification for using
$$f_2(y|x) = \frac{f(x, y)}{f_1(x)}$$
as the conditional probability density function of $Y$ given $X = x$.
(4) For a given value of $x$, call it $x_0$, we can think of obtaining the conditional probability density function of $Y$ given $X = x_0$ geometrically in the following way. Think of the curve of intersection which is obtained by cutting through the surface $z = f(x, y)$ with the plane $x = x_0$ which is parallel to the $yz$ plane. The curve of intersection is $z = f(x_0, y)$ which is a curve lying in the plane $x = x_0$. The area under the curve $z = f(x_0, y)$ and lying above the $xy$ plane is not necessarily equal to 1 and therefore $z = f(x_0, y)$ is not a proper probability density function. However if we consider the curve $z = f(x_0, y)/f_1(x_0)$, which is just a rescaled version of the curve $z = f(x_0, y)$, then the area lying under the curve $z = f(x_0, y)/f_1(x_0)$ and lying above the $xy$ plane is equal to 1. This is the probability density function we want.
3.5.2 Example
In Example 3.4.8 determine the conditional probability density function of X given Y = y
and the conditional probability density function of Y given X = x.
Solution
The conditional probability density function of $X$ given $Y = y$ is
$$f_1(x|y) = \frac{f(x, y)}{f_2(y)} = \frac{\frac{2}{\pi}}{\frac{2}{\pi}\sqrt{1-y^2}} = \frac{1}{\sqrt{1-y^2}} \quad \text{for } 0 < x < \sqrt{1-y^2},\ -1 < y < 1$$
Note that for each $y \in (-1, 1)$, the conditional probability density function of $X$ given $Y = y$ is Uniform$\left(0, \sqrt{1-y^2}\right)$. This makes sense because the joint probability density function is constant on its support set.
The conditional probability density function of $Y$ given $X = x$ is
$$f_2(y|x) = \frac{f(x, y)}{f_1(x)} = \frac{\frac{2}{\pi}}{\frac{4}{\pi}\sqrt{1-x^2}} = \frac{1}{2\sqrt{1-x^2}} \quad \text{for } -\sqrt{1-x^2} < y < \sqrt{1-x^2},\ 0 < x < 1$$
Note that for each $x \in (0, 1)$, the conditional probability density function of $Y$ given $X = x$ is Uniform$\left(-\sqrt{1-x^2}, \sqrt{1-x^2}\right)$. This again makes sense because the joint probability density function is constant on its support set.
3.5.3 Exercise
In Exercise 3.2.5 show that the conditional probability function of $Y$ given $X = x$ is
$$\text{Binomial}\left(n - x,\ \frac{2\theta(1-\theta)}{1-\theta^2}\right)$$
Why does this make sense?
3.5.4 Exercise
In Example 3.3.3 and Exercises 3.3.6 and 3.3.7 determine the conditional probability density
function of X given Y = y and the conditional probability density function of Y given
X = x. Be sure to check that
$$\int_{-\infty}^{\infty} f_1(x|y)\,dx = 1 \quad \text{and} \quad \int_{-\infty}^{\infty} f_2(y|x)\,dy = 1$$
When choosing a model for bivariate data it is sometimes easier to specify a conditional
probability (density) function and a marginal probability (density) function. The joint
probability (density) function can then be determined using the Product Rule which is
obtained by rewriting (3.2) and (3.3).
3.5.5 Product Rule
Suppose $X$ and $Y$ are random variables with joint probability (density) function $f(x, y)$, marginal probability (density) functions $f_1(x)$ and $f_2(y)$ respectively and conditional probability (density) functions $f_1(x|y)$ and $f_2(y|x)$. Then
$$f(x, y) = f_1(x|y)f_2(y) = f_2(y|x)f_1(x)$$
3.5.6 Example
In modeling survival in a certain insect population it is assumed that the number of eggs laid by a single female follows a Poisson$(\theta)$ distribution. It is also assumed that each egg has probability $p$ of surviving independently of any other egg. Determine the probability function of the number of eggs that survive.
Solution
Let $Y$ = number of eggs laid and let $X$ = number of eggs that survive. Then $Y \sim \text{Poisson}(\theta)$ and $X|Y = y \sim \text{Binomial}(y, p)$. We want to determine the marginal probability function of $X$.
By the Product Rule the joint probability function of $X$ and $Y$ is
$$f(x, y) = f_1(x|y)f_2(y) = \binom{y}{x}p^x(1-p)^{y-x}\,\frac{\theta^ye^{-\theta}}{y!} = \frac{p^xe^{-\theta}}{x!}\,\frac{(1-p)^{y-x}\theta^y}{(y-x)!}$$
with support set
$$A = \{(x, y) : x = 0, 1, \ldots, y;\ y = 0, 1, \ldots\}$$
which can also be written as
$$A = \{(x, y) : y = x, x + 1, \ldots;\ x = 0, 1, \ldots\} \qquad (3.4)$$
The marginal probability function of $X$ can be obtained using
$$f_1(x) = \sum_{\text{all } y} f(x, y)$$
Since we are summing over $y$ we need to use the second description of the support set given in (3.4). So
$$f_1(x) = \sum_{y=x}^{\infty}\frac{p^xe^{-\theta}}{x!}\,\frac{(1-p)^{y-x}\theta^y}{(y-x)!} = \frac{p^xe^{-\theta}\theta^x}{x!}\sum_{y=x}^{\infty}\frac{[\theta(1-p)]^{y-x}}{(y-x)!} \quad \text{let } u = y - x$$
$$= \frac{p^xe^{-\theta}\theta^x}{x!}\sum_{u=0}^{\infty}\frac{[\theta(1-p)]^u}{u!} = \frac{p^xe^{-\theta}\theta^x}{x!}\,e^{\theta(1-p)} \quad \text{by the Exponential series 2.11.7}$$
$$= \frac{(p\theta)^xe^{-p\theta}}{x!} \quad \text{for } x = 0, 1, \ldots$$
which we recognize as a Poisson$(p\theta)$ probability function.
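This "thinning" result lends itself to a simple simulation check. The sketch below (using numpy and scipy, with arbitrary parameter values, seed and sample size) generates the hierarchy and compares the empirical distribution of $X$ with the Poisson$(p\theta)$ probability function.

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
theta, p, n = 6.0, 0.4, 500_000
y = rng.poisson(theta, size=n)
x = rng.binomial(y, p)                     # X | Y = y ~ Binomial(y, p)

print("sample mean and variance of X:", x.mean(), x.var())   # both should be near p*theta = 2.4
for k in range(5):
    print(k, np.mean(x == k), stats.poisson.pmf(k, p * theta))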
3.5.7 Example
Determine the marginal probability function of $X$ if $Y \sim \text{Gamma}(\alpha, \frac{1}{\beta})$ and the conditional distribution of $X$ given $Y = y$ is Weibull$(p, y^{-1/p})$.
Solution
Since $Y \sim \text{Gamma}(\alpha, \frac{1}{\beta})$
$$f_2(y) = \frac{\beta^{\alpha}y^{\alpha-1}e^{-\beta y}}{\Gamma(\alpha)} \quad \text{for } y > 0$$
and $0$ otherwise.
Since the conditional distribution of $X$ given $Y = y$ is Weibull$(p, y^{-1/p})$
$$f_1(x|y) = pyx^{p-1}e^{-yx^p} \quad \text{for } x > 0$$
By the Product Rule the joint probability density function of $X$ and $Y$ is
$$f(x, y) = f_1(x|y)f_2(y) = pyx^{p-1}e^{-yx^p}\,\frac{\beta^{\alpha}y^{\alpha-1}e^{-\beta y}}{\Gamma(\alpha)} = \frac{p\beta^{\alpha}x^{p-1}}{\Gamma(\alpha)}\,y^{\alpha}e^{-y(\beta + x^p)}$$
The support set is
$$A = \{(x, y) : x > 0,\ y > 0\}$$
which is a rectangular region.
The marginal probability density function of $X$ is
$$f_1(x) = \int_{-\infty}^{\infty} f(x, y)\,dy = \frac{p\beta^{\alpha}x^{p-1}}{\Gamma(\alpha)}\int_0^{\infty} y^{\alpha}e^{-y(\beta + x^p)}\,dy \quad \text{let } u = y(\beta + x^p)$$
$$= \frac{p\beta^{\alpha}x^{p-1}}{\Gamma(\alpha)}\int_0^{\infty}\left(\frac{u}{\beta + x^p}\right)^{\alpha}e^{-u}\left(\frac{1}{\beta + x^p}\right)du = \frac{p\beta^{\alpha}x^{p-1}}{\Gamma(\alpha)}\left(\frac{1}{\beta + x^p}\right)^{\alpha+1}\int_0^{\infty} u^{\alpha}e^{-u}\,du$$
$$= \frac{p\beta^{\alpha}x^{p-1}}{\Gamma(\alpha)}\left(\frac{1}{\beta + x^p}\right)^{\alpha+1}\Gamma(\alpha+1) \quad \text{by 2.4.8}$$
$$= \frac{\alpha p\beta^{\alpha}x^{p-1}}{(\beta + x^p)^{\alpha+1}} \quad \text{since } \Gamma(\alpha+1) = \alpha\Gamma(\alpha)$$
for $x > 0$ and $0$ otherwise. This distribution is a member of the Burr family of distributions which is frequently used by actuaries for modeling household income, crop prices, insurance risk, and many other financial variables.
The following theorem gives us one more method for determining whether two random
variables are independent.
3.5.8 Theorem
Suppose $X$ and $Y$ are random variables with marginal probability (density) functions $f_1(x)$ and $f_2(y)$ respectively and conditional probability (density) functions $f_1(x|y)$ and $f_2(y|x)$. Suppose also that $A_1$ is the support set of $X$, and $A_2$ is the support set of $Y$. Then $X$ and $Y$ are independent random variables if and only if either of the following holds
$$f_1(x|y) = f_1(x) \quad \text{for all } x \in A_1$$
or
$$f_2(y|x) = f_2(y) \quad \text{for all } y \in A_2$$
3.5.9 Example
Suppose the conditional distribution of $X$ given $Y = y$ is
$$f_1(x|y) = \frac{e^{-x}}{1 - e^{-y}} \quad \text{for } 0 < x < y$$
and $0$ otherwise. Are $X$ and $Y$ independent random variables?
Solution
Since the conditional distribution of $X$ given $Y = y$ depends on $y$ then $f_1(x|y) = f_1(x)$ cannot hold for all $x$ in the support set of $X$ and therefore $X$ and $Y$ are not independent random variables.
3.6 Joint Expectations
As with univariate random variables we define the expectation operator for bivariate random
variables. The discrete case is a review of material you would have seen in a previous
probability course.
3.6.1 Definition - Joint Expectation
Suppose $h(x, y)$ is a real-valued function.
If $X$ and $Y$ are discrete random variables with joint probability function $f(x, y)$ and support set $A$ then
$$E[h(X, Y)] = \sum_{(x,y) \in A} h(x, y)f(x, y)$$
provided the joint sum converges absolutely.
If $X$ and $Y$ are continuous random variables with joint probability density function $f(x, y)$ then
$$E[h(X, Y)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(x, y)f(x, y)\,dx\,dy$$
provided the joint integral converges absolutely.
3.6.2 Theorem
Suppose $X$ and $Y$ are random variables with joint probability (density) function $f(x, y)$, $a$ and $b$ are real constants, and $g(x, y)$ and $h(x, y)$ are real-valued functions. Then
$$E[ag(X, Y) + bh(X, Y)] = aE[g(X, Y)] + bE[h(X, Y)]$$
Proof (Continuous Case)
$$E[ag(X, Y) + bh(X, Y)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}[ag(x, y) + bh(x, y)]f(x, y)\,dx\,dy$$
$$= a\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x, y)f(x, y)\,dx\,dy + b\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(x, y)f(x, y)\,dx\,dy \quad \text{by properties of double integrals}$$
$$= aE[g(X, Y)] + bE[h(X, Y)] \quad \text{by Definition 3.6.1}$$
3.6.3 Corollary
(1)
$$E(aX + bY) = aE(X) + bE(Y) = a\mu_X + b\mu_Y$$
where $\mu_X = E(X)$ and $\mu_Y = E(Y)$.
(2) If $X_1, X_2, \ldots, X_n$ are random variables and $a_1, a_2, \ldots, a_n$ are real constants then
$$E\left(\sum_{i=1}^{n} a_iX_i\right) = \sum_{i=1}^{n} a_iE(X_i) = \sum_{i=1}^{n} a_i\mu_i$$
where $\mu_i = E(X_i)$.
(3) If $X_1, X_2, \ldots, X_n$ are random variables with $E(X_i) = \mu$, $i = 1, 2, \ldots, n$ then
$$E\left(\bar{X}\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n}\sum_{i=1}^{n}\mu = \frac{n\mu}{n} = \mu$$
Proof of (1) (Continuous Case)
$$E(aX + bY) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}(ax + by)f(x, y)\,dx\,dy$$
$$= a\int_{-\infty}^{\infty} x\left[\int_{-\infty}^{\infty} f(x, y)\,dy\right]dx + b\int_{-\infty}^{\infty} y\left[\int_{-\infty}^{\infty} f(x, y)\,dx\right]dy$$
$$= a\int_{-\infty}^{\infty} xf_1(x)\,dx + b\int_{-\infty}^{\infty} yf_2(y)\,dy = aE(X) + bE(Y) = a\mu_X + b\mu_Y$$
3.6.4 Theorem - Expectation and Independence
(1) If $X$ and $Y$ are independent random variables and $g(x)$ and $h(y)$ are real-valued functions then
$$E[g(X)h(Y)] = E[g(X)]E[h(Y)]$$
(2) More generally if $X_1, X_2, \ldots, X_n$ are independent random variables and $h_1, h_2, \ldots, h_n$ are real-valued functions then
$$E\left[\prod_{i=1}^{n} h_i(X_i)\right] = \prod_{i=1}^{n} E[h_i(X_i)]$$
Proof of (1) (Continuous Case)
Since $X$ and $Y$ are independent random variables then by Theorem 3.4.2
$$f(x, y) = f_1(x)f_2(y) \quad \text{for all } (x, y) \in A$$
where $A$ is the support set of $(X, Y)$. Therefore
$$E[g(X)h(Y)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x)h(y)f_1(x)f_2(y)\,dx\,dy = \int_{-\infty}^{\infty} h(y)f_2(y)\left[\int_{-\infty}^{\infty} g(x)f_1(x)\,dx\right]dy$$
$$= E[g(X)]\int_{-\infty}^{\infty} h(y)f_2(y)\,dy = E[g(X)]E[h(Y)]$$
3.6.5 Definition - Covariance
The covariance of random variables $X$ and $Y$ is defined by
$$Cov(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$$
If $Cov(X, Y) = 0$ then $X$ and $Y$ are called uncorrelated random variables.
3.6.6 Theorem - Covariance and Independence
If $X$ and $Y$ are random variables then
$$Cov(X, Y) = E(XY) - \mu_X\mu_Y$$
If $X$ and $Y$ are independent random variables then $Cov(X, Y) = 0$.
Proof
$$Cov(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = E(XY - \mu_XY - \mu_YX + \mu_X\mu_Y)$$
$$= E(XY) - \mu_XE(Y) - \mu_YE(X) + \mu_X\mu_Y = E(XY) - E(X)E(Y)$$
Now if $X$ and $Y$ are independent random variables then by Theorem 3.6.4 $E(XY) = E(X)E(Y)$ and therefore $Cov(X, Y) = 0$.
3.6.7 Theorem - Variance of a Linear Combination
(1) Suppose $X$ and $Y$ are random variables and $a$ and $b$ are real constants then
$$Var(aX + bY) = a^2Var(X) + b^2Var(Y) + 2abCov(X, Y) = a^2\sigma_X^2 + b^2\sigma_Y^2 + 2abCov(X, Y)$$
(2) Suppose $X_1, X_2, \ldots, X_n$ are random variables with $Var(X_i) = \sigma_i^2$ and $a_1, a_2, \ldots, a_n$ are real constants then
$$Var\left(\sum_{i=1}^{n} a_iX_i\right) = \sum_{i=1}^{n} a_i^2\sigma_i^2 + 2\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} a_ia_jCov(X_i, X_j)$$
(3) If $X_1, X_2, \ldots, X_n$ are independent random variables and $a_1, a_2, \ldots, a_n$ are real constants then
$$Var\left(\sum_{i=1}^{n} a_iX_i\right) = \sum_{i=1}^{n} a_i^2\sigma_i^2$$
(4) If $X_1, X_2, \ldots, X_n$ are independent random variables with $Var(X_i) = \sigma^2$, $i = 1, 2, \ldots, n$ then
$$Var\left(\bar{X}\right) = Var\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \left(\frac{1}{n}\right)^2\sum_{i=1}^{n} Var(X_i) = \frac{1}{n^2}\sum_{i=1}^{n}\sigma^2 = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$$
Proof of (1)
$$Var(aX + bY) = E\left[(aX + bY - a\mu_X - b\mu_Y)^2\right] = E\left\{[a(X - \mu_X) + b(Y - \mu_Y)]^2\right\}$$
$$= E\left[a^2(X - \mu_X)^2 + b^2(Y - \mu_Y)^2 + 2ab(X - \mu_X)(Y - \mu_Y)\right]$$
$$= a^2E\left[(X - \mu_X)^2\right] + b^2E\left[(Y - \mu_Y)^2\right] + 2abE[(X - \mu_X)(Y - \mu_Y)]$$
$$= a^2\sigma_X^2 + b^2\sigma_Y^2 + 2abCov(X, Y)$$
3.6.8 Definition - Correlation Coefficient
The correlation coefficient of random variables $X$ and $Y$ is defined by
$$\rho(X, Y) = \frac{Cov(X, Y)}{\sigma_X\sigma_Y}$$
3.6.9 Example
For the joint probability density function in Example 3.3.3 find $\rho(X, Y)$.
Solution
$$E(XY) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xyf(x, y)\,dx\,dy = \int_0^1\int_0^1 xy(x+y)\,dx\,dy = \int_0^1\int_0^1\left(x^2y + xy^2\right)dx\,dy$$
$$= \int_0^1\left[\left(\tfrac{1}{3}x^3y + \tfrac{1}{2}x^2y^2\right)\Big|_0^1\right]dy = \int_0^1\left(\tfrac{1}{3}y + \tfrac{1}{2}y^2\right)dy = \left(\tfrac{1}{6}y^2 + \tfrac{1}{6}y^3\right)\Big|_0^1 = \tfrac{1}{6} + \tfrac{1}{6} = \tfrac{1}{3}$$
$$E(X) = \int_{-\infty}^{\infty} xf_1(x)\,dx = \int_0^1 x\left(x + \tfrac{1}{2}\right)dx = \int_0^1\left(x^2 + \tfrac{1}{2}x\right)dx = \left(\tfrac{1}{3}x^3 + \tfrac{1}{4}x^2\right)\Big|_0^1 = \tfrac{1}{3} + \tfrac{1}{4} = \tfrac{7}{12}$$
$$E\left(X^2\right) = \int_{-\infty}^{\infty} x^2f_1(x)\,dx = \int_0^1 x^2\left(x + \tfrac{1}{2}\right)dx = \int_0^1\left(x^3 + \tfrac{1}{2}x^2\right)dx = \left(\tfrac{1}{4}x^4 + \tfrac{1}{6}x^3\right)\Big|_0^1 = \tfrac{1}{4} + \tfrac{1}{6} = \tfrac{5}{12}$$
$$Var(X) = E\left(X^2\right) - [E(X)]^2 = \tfrac{5}{12} - \left(\tfrac{7}{12}\right)^2 = \tfrac{60 - 49}{144} = \tfrac{11}{144}$$
By symmetry
$$E(Y) = \tfrac{7}{12} \quad \text{and} \quad Var(Y) = \tfrac{11}{144}$$
Therefore
$$Cov(X, Y) = E(XY) - E(X)E(Y) = \tfrac{1}{3} - \left(\tfrac{7}{12}\right)\left(\tfrac{7}{12}\right) = \tfrac{48 - 49}{144} = -\tfrac{1}{144}$$
and
$$\rho(X, Y) = \frac{Cov(X, Y)}{\sigma_X\sigma_Y} = \frac{-\tfrac{1}{144}}{\sqrt{\tfrac{11}{144}\cdot\tfrac{11}{144}}} = -\tfrac{1}{144}\left(\tfrac{144}{11}\right) = -\frac{1}{11}$$
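The value $\rho(X, Y) = -1/11$ can be confirmed by numerical integration. The sketch below (assuming scipy is available) computes the required moments with scipy.integrate.dblquad, whose integrand takes its arguments in the order $(y, x)$.

import numpy as np
from scipy.integrate import dblquad

fxy = lambda x, y: x + y                      # joint probability density function on the unit square

def E(g):
    # expectation of g(X, Y) over the unit square
    return dblquad(lambda y, x: g(x, y) * fxy(x, y), 0, 1, 0, 1)[0]

EX, EY, EXY = E(lambda x, y: x), E(lambda x, y: y), E(lambda x, y: x * y)
VarX = E(lambda x, y: x**2) - EX**2
VarY = E(lambda x, y: y**2) - EY**2
rho = (EXY - EX * EY) / np.sqrt(VarX * VarY)
print(round(rho, 6), round(-1/11, 6))         # both approximately -0.090909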
3.6.10 Exercise
For the joint probability density function in Exercise 3.3.7 find $\rho(X, Y)$.
3.6.11 Theorem
If $\rho(X, Y)$ is the correlation coefficient of random variables $X$ and $Y$ then
$$-1 \le \rho(X, Y) \le 1$$
$\rho(X, Y) = 1$ if and only if $Y = aX + b$ for some $a > 0$ and $\rho(X, Y) = -1$ if and only if $Y = aX + b$ for some $a < 0$.
Proof
Let $S = X + tY$, where $t \in \Re$. Then $E(S) = \mu_S$ and
$$Var(S) = E\left[(S - \mu_S)^2\right] = E\left\{[(X + tY) - (\mu_X + t\mu_Y)]^2\right\} = E\left\{[(X - \mu_X) + t(Y - \mu_Y)]^2\right\}$$
$$= E\left[(X - \mu_X)^2 + 2t(X - \mu_X)(Y - \mu_Y) + t^2(Y - \mu_Y)^2\right] = t^2\sigma_Y^2 + 2Cov(X, Y)t + \sigma_X^2$$
Now $Var(S) \ge 0$ for any $t \in \Re$ implies that the quadratic equation $Var(S) = t^2\sigma_Y^2 + 2Cov(X, Y)t + \sigma_X^2$ in the variable $t$ must have at most one real root. To have at most one real root the discriminant of this quadratic equation must be less than or equal to zero. Therefore
$$[2Cov(X, Y)]^2 - 4\sigma_X^2\sigma_Y^2 \le 0$$
or
$$[Cov(X, Y)]^2 \le \sigma_X^2\sigma_Y^2$$
or
$$|\rho(X, Y)| = \left|\frac{Cov(X, Y)}{\sigma_X\sigma_Y}\right| \le 1$$
and therefore
$$-1 \le \rho(X, Y) \le 1$$
To see that $\rho(X, Y) = \pm 1$ corresponds to a linear relationship between $X$ and $Y$, note that $\rho(X, Y) = \pm 1$ implies
$$|Cov(X, Y)| = \sigma_X\sigma_Y$$
and therefore
$$[2Cov(X, Y)]^2 - 4\sigma_X^2\sigma_Y^2 = 0$$
which corresponds to a zero discriminant in the quadratic equation. This means that there exists one real number $t$ for which
$$Var(S) = Var(X + tY) = 0$$
But $Var(X + tY) = 0$ implies $X + tY$ must equal a constant, that is, $X + tY = c$. Thus $X$ and $Y$ satisfy a linear relationship.
3.7 Conditional Expectation
Since conditional probability (density) functions are also probability (density) functions, expectations can be defined in terms of these conditional probability (density) functions as in the following definition.
3.7.1 Definition - Conditional Expectation
The conditional expectation of $g(X)$ given $Y = y$ is given by
$$E[g(X)|y] = \sum_x g(x)f_1(x|y)$$
if $X$ is a discrete random variable and
$$E[g(X)|y] = \int_{-\infty}^{\infty} g(x)f_1(x|y)\,dx$$
if $X$ is a continuous random variable, provided the sum/integral converges absolutely.
The conditional expectation of $h(Y)$ given $X = x$ is defined in a similar manner.
3.7.2 Special Cases
(1) The conditional mean of $X$ given $Y = y$ is denoted by $E(X|y)$.
(2) The conditional variance of $X$ given $Y = y$ is denoted by $Var(X|y)$ and is given by
\[
Var(X|y) = E\left\{[X - E(X|y)]^2\,\big|\,y\right\} = E\left(X^2|y\right) - [E(X|y)]^2
\]
3.7.3 Example
For the joint probability density function in Example 3.4.8 find $E(Y|x)$, the conditional mean of $Y$ given $X = x$, and $Var(Y|x)$, the conditional variance of $Y$ given $X = x$.

Solution
From Example 3.5.2 we have
\[
f_2(y|x) = \frac{1}{2\sqrt{1-x^2}} \quad\text{for } -\sqrt{1-x^2} < y < \sqrt{1-x^2},\ 0 < x < 1
\]
Therefore
\begin{align*}
E(Y|x) &= \int_{-\infty}^{\infty} y f_2(y|x)\,dy = \int_{-\sqrt{1-x^2}}^{\sqrt{1-x^2}} y\,\frac{1}{2\sqrt{1-x^2}}\,dy \\
&= \frac{1}{2\sqrt{1-x^2}}\int_{-\sqrt{1-x^2}}^{\sqrt{1-x^2}} y\,dy = \frac{1}{4\sqrt{1-x^2}}\left[y^2\right]_{-\sqrt{1-x^2}}^{\sqrt{1-x^2}} = 0
\end{align*}
Since $E(Y|x) = 0$
\begin{align*}
Var(Y|x) &= E\left(Y^2|x\right) = \int_{-\infty}^{\infty} y^2 f_2(y|x)\,dy = \int_{-\sqrt{1-x^2}}^{\sqrt{1-x^2}} y^2\,\frac{1}{2\sqrt{1-x^2}}\,dy \\
&= \frac{1}{2\sqrt{1-x^2}}\int_{-\sqrt{1-x^2}}^{\sqrt{1-x^2}} y^2\,dy = \frac{1}{6\sqrt{1-x^2}}\left[y^3\right]_{-\sqrt{1-x^2}}^{\sqrt{1-x^2}} = \frac{1}{3}\left(1 - x^2\right)
\end{align*}
Recall that the conditional distribution of $Y$ given $X = x$ is Uniform$\left(-\sqrt{1-x^2},\ \sqrt{1-x^2}\right)$. The results above can be verified by noting that if $U \sim$ Uniform$(a, b)$ then $E(U) = \frac{a+b}{2}$ and $Var(U) = \frac{(b-a)^2}{12}$.
3.7.4 Exercise
In Exercises 3.5.3 and 3.3.7 find $E(Y|x)$, $Var(Y|x)$, $E(X|y)$ and $Var(X|y)$.

3.7.5 Theorem
If $X$ and $Y$ are independent random variables then $E[g(X)|y] = E[g(X)]$ and $E[h(Y)|x] = E[h(Y)]$.
Proof (Continuous Case)
\begin{align*}
E[g(X)|y] &= \int_{-\infty}^{\infty} g(x) f_1(x|y)\,dx \\
&= \int_{-\infty}^{\infty} g(x) f_1(x)\,dx \quad\text{by Theorem 3.5.8} \\
&= E[g(X)]
\end{align*}
as required. $E[h(Y)|x] = E[h(Y)]$ follows in a similar manner.
3.7.6 Definition
$E[g(X)|Y]$ is the function of the random variable $Y$ whose value is $E[g(X)|y]$ when $Y = y$. This means of course that $E[g(X)|Y]$ is a random variable.
3.7.7 Example
In Example 3.5.6 the joint model was specified by $Y \sim$ Poisson$(\mu)$ and $X|Y = y \sim$ Binomial$(y, p)$, and we showed that $X \sim$ Poisson$(\mu p)$. Determine $E(X|Y = y)$, $E(X|Y)$, $E[E(X|Y)]$, and $E(X)$. What do you notice about $E[E(X|Y)]$ and $E(X)$?

Solution
Since $X|Y = y \sim$ Binomial$(y, p)$
\[
E(X|Y = y) = py
\]
and
\[
E(X|Y) = pY
\]
which is a random variable. Since $Y \sim$ Poisson$(\mu)$ and $E(Y) = \mu$
\[
E[E(X|Y)] = E(pY) = pE(Y) = p\mu
\]
Now since $X \sim$ Poisson$(\mu p)$ then $E(X) = \mu p$.
We notice that
\[
E[E(X|Y)] = E(X)
\]
The following theorem indicates that this result holds generally.
3.7.8 Theorem
Suppose $X$ and $Y$ are random variables. Then
\[
E\{E[g(X)|Y]\} = E[g(X)]
\]
Proof (Continuous Case)
\begin{align*}
E\{E[g(X)|Y]\} &= E\left[\int_{-\infty}^{\infty} g(x) f_1(x|y)\,dx\right] \\
&= \int_{-\infty}^{\infty}\left[\int_{-\infty}^{\infty} g(x) f_1(x|y)\,dx\right] f_2(y)\,dy \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x) f_1(x|y) f_2(y)\,dx\,dy \\
&= \int_{-\infty}^{\infty} g(x)\left[\int_{-\infty}^{\infty} f(x, y)\,dy\right] dx \\
&= \int_{-\infty}^{\infty} g(x) f_1(x)\,dx \\
&= E[g(X)]
\end{align*}
3.7.9 Corollary - Law of Total Expectation
Suppose $X$ and $Y$ are random variables. Then
\[
E[E(X|Y)] = E(X)
\]
Proof
Let $g(X) = X$ in Theorem 3.7.8 and the result follows.

3.7.10 Theorem - Law of Total Variance
Suppose $X$ and $Y$ are random variables. Then
\[
Var(X) = E[Var(X|Y)] + Var[E(X|Y)]
\]
Proof
\begin{align*}
Var(X) &= E\left(X^2\right) - [E(X)]^2 \\
&= E\left[E\left(X^2|Y\right)\right] - \{E[E(X|Y)]\}^2 \quad\text{by Theorem 3.7.8} \\
&= E\left[E\left(X^2|Y\right)\right] - E\left\{[E(X|Y)]^2\right\} + E\left\{[E(X|Y)]^2\right\} - \{E[E(X|Y)]\}^2 \\
&= E[Var(X|Y)] + Var[E(X|Y)]
\end{align*}
When the joint model is specified in terms of a conditional distribution $X|Y = y$ and a marginal distribution for $Y$ then Theorems 3.7.8 and 3.7.10 give a method for finding expectations for functions of $X$ without having to determine the marginal distribution for $X$.
3.7.11 Example
Suppose $P \sim$ Uniform$(0, 0.1)$ and $Y|P = p \sim$ Binomial$(10, p)$. Find $E(Y)$ and $Var(Y)$.

Solution
Since $P \sim$ Uniform$(0, 0.1)$
\[
E(P) = \frac{0 + 0.1}{2} = \frac{1}{20}, \qquad Var(P) = \frac{(0.1 - 0)^2}{12} = \frac{1}{1200}
\]
and
\[
E\left(P^2\right) = Var(P) + [E(P)]^2 = \frac{1}{1200} + \left(\frac{1}{20}\right)^2 = \frac{1}{1200} + \frac{1}{400} = \frac{1 + 3}{1200} = \frac{4}{1200} = \frac{1}{300}
\]
Since $Y|P = p \sim$ Binomial$(10, p)$
\[
E(Y|p) = 10p, \qquad E(Y|P) = 10P
\]
and
\[
Var(Y|p) = 10p(1 - p), \qquad Var(Y|P) = 10P(1 - P) = 10\left(P - P^2\right)
\]
Therefore
\[
E(Y) = E[E(Y|P)] = E(10P) = 10E(P) = 10\left(\frac{1}{20}\right) = \frac{1}{2}
\]
and
\begin{align*}
Var(Y) &= E[Var(Y|P)] + Var[E(Y|P)] \\
&= E\left[10\left(P - P^2\right)\right] + Var(10P) \\
&= 10\left[E(P) - E\left(P^2\right)\right] + 100\,Var(P) \\
&= 10\left(\frac{1}{20} - \frac{1}{300}\right) + 100\left(\frac{1}{1200}\right) \\
&= \frac{11}{20}
\end{align*}
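As an illustration, the answers $E(Y) = 1/2$ and $Var(Y) = 11/20$ in Example 3.7.11 can be checked with a short simulation. The sketch below assumes only numpy; it simply samples the hierarchy directly.

```python
import numpy as np

# Simulation sketch of Example 3.7.11: draw P ~ Uniform(0, 0.1), then
# Y | P = p ~ Binomial(10, p), and compare the sample mean and variance of Y
# with the exact values 1/2 and 11/20 from the Laws of Total Expectation/Variance.
rng = np.random.default_rng(2021)
n = 1_000_000
p = rng.uniform(0.0, 0.1, size=n)
y = rng.binomial(10, p)
print(y.mean(), 0.5)       # ~0.500
print(y.var(), 11 / 20)    # ~0.550
```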
3.7.12 Exercise
In Example 3.5.7 find $E(X)$ and $Var(X)$ using Corollary 3.7.9 and Theorem 3.7.10.

3.7.13 Exercise
Suppose $P \sim$ Beta$(a, b)$ and $Y|P = p \sim$ Geometric$(p)$. Find $E(Y)$ and $Var(Y)$.
3.8 Joint Moment Generating Functions

Moment generating functions can also be defined for bivariate and multivariate random variables. As mentioned previously, moment generating functions are a powerful tool for determining the distributions of functions of random variables (Chapter 4), particularly sums, as well as determining the limiting distribution of a sequence of random variables (Chapter 5).

3.8.1 Definition - Joint Moment Generating Function
If $X$ and $Y$ are random variables then
\[
M(t_1, t_2) = E\left(e^{t_1 X + t_2 Y}\right)
\]
is called the joint moment generating function of $X$ and $Y$ if this expectation exists (the joint sum/integral converges absolutely) for all $t_1 \in (-h_1, h_1)$ for some $h_1 > 0$, and all $t_2 \in (-h_2, h_2)$ for some $h_2 > 0$.
More generally, if $X_1, X_2, \ldots, X_n$ are random variables then
\[
M(t_1, t_2, \ldots, t_n) = E\left[\exp\left(\sum_{i=1}^n t_i X_i\right)\right]
\]
is called the joint moment generating function of $X_1, X_2, \ldots, X_n$ if this expectation exists for all $t_i \in (-h_i, h_i)$ for some $h_i > 0$, $i = 1, 2, \ldots, n$.
If the joint moment generating function is known then it is straightforward to obtain the moment generating functions of the marginal distributions.
3.8.2 Important Note
If $M(t_1, t_2)$ exists for all $t_1 \in (-h_1, h_1)$ for some $h_1 > 0$, and all $t_2 \in (-h_2, h_2)$ for some $h_2 > 0$, then the moment generating function of $X$ is given by
\[
M_X(t) = E\left(e^{tX}\right) = M(t, 0) \quad\text{for } t \in (-h_1, h_1)
\]
and the moment generating function of $Y$ is given by
\[
M_Y(t) = E\left(e^{tY}\right) = M(0, t) \quad\text{for } t \in (-h_2, h_2)
\]
3.8.3 Example
Suppose $X$ and $Y$ are continuous random variables with joint probability density function
\[
f(x, y) = e^{-y} \quad\text{for } 0 < x < y < \infty
\]
and 0 otherwise.
(a) Find the joint moment generating function of $X$ and $Y$.
(b) What is the moment generating function of $X$ and what is the marginal distribution of $X$?
(c) What is the moment generating function of $Y$ and what is the marginal distribution of $Y$?
Solution
(a) The joint moment generating function is
\begin{align*}
M(t_1, t_2) &= E\left(e^{t_1 X + t_2 Y}\right) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{t_1 x + t_2 y} f(x, y)\,dx\,dy \\
&= \int_{y=0}^{\infty}\int_{x=0}^{y} e^{t_1 x + t_2 y} e^{-y}\,dx\,dy \\
&= \int_0^{\infty} e^{(t_2 - 1)y}\left(\int_0^y e^{t_1 x}\,dx\right) dy \\
&= \int_0^{\infty} e^{(t_2 - 1)y}\left(\frac{1}{t_1}\,e^{t_1 x}\Big|_0^y\right) dy \\
&= \frac{1}{t_1}\int_0^{\infty} e^{(t_2 - 1)y}\left(e^{t_1 y} - 1\right) dy \\
&= \frac{1}{t_1}\int_0^{\infty}\left[e^{-(1 - t_1 - t_2)y} - e^{-(1 - t_2)y}\right] dy
\end{align*}
which converges for $t_1 + t_2 < 1$ and $t_2 < 1$. Therefore
\begin{align*}
M(t_1, t_2) &= \frac{1}{t_1}\lim_{b \to \infty}\left[-\frac{1}{1 - t_1 - t_2}\,e^{-(1 - t_1 - t_2)y}\Big|_0^b + \frac{1}{1 - t_2}\,e^{-(1 - t_2)y}\Big|_0^b\right] \\
&= \frac{1}{t_1}\lim_{b \to \infty}\left\{-\frac{1}{1 - t_1 - t_2}\left[e^{-(1 - t_1 - t_2)b} - 1\right] + \frac{1}{1 - t_2}\left[e^{-(1 - t_2)b} - 1\right]\right\} \\
&= \frac{1}{t_1}\left(\frac{1}{1 - t_1 - t_2} - \frac{1}{1 - t_2}\right) \\
&= \frac{1}{t_1}\cdot\frac{(1 - t_2) - (1 - t_1 - t_2)}{(1 - t_1 - t_2)(1 - t_2)} \\
&= \frac{1}{(1 - t_1 - t_2)(1 - t_2)} \quad\text{for } t_1 + t_2 < 1 \text{ and } t_2 < 1
\end{align*}
(b) The moment generating function of $X$ is
\[
M_X(t) = E\left(e^{tX}\right) = M(t, 0) = \frac{1}{(1 - t - 0)(1 - 0)} = \frac{1}{1 - t} \quad\text{for } t < 1
\]
By examining the list of moment generating functions in Chapter 11 we see that this is the moment generating function of an Exponential$(1)$ random variable. Therefore by the Uniqueness Theorem for Moment Generating Functions, $X$ has an Exponential$(1)$ distribution.

(c) The moment generating function of $Y$ is
\[
M_Y(t) = E\left(e^{tY}\right) = M(0, t) = \frac{1}{(1 - 0 - t)(1 - t)} = \frac{1}{(1 - t)^2} \quad\text{for } t < 1
\]
By examining the list of moment generating functions in Chapter 11 we see that this is the moment generating function of a Gamma$(2, 1)$ random variable. Therefore by the Uniqueness Theorem for Moment Generating Functions, $Y$ has a Gamma$(2, 1)$ distribution.
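For illustration, the joint moment generating function in Example 3.8.3 can be checked empirically. The sketch below (assuming only numpy) uses the fact, derived in Section 3.5, that under $f(x,y) = e^{-y}$ the marginal of $Y$ is Gamma$(2,1)$ and $X|Y = y$ is Uniform$(0, y)$; the evaluation point $(t_1, t_2) = (0.2, 0.3)$ is an arbitrary choice inside the region of convergence.

```python
import numpy as np

# Numerical check of Example 3.8.3: under f(x, y) = exp(-y) on 0 < x < y,
# Y ~ Gamma(2, 1) and X | Y = y ~ Uniform(0, y), so we can sample (X, Y) that way
# and compare the empirical E[exp(t1*X + t2*Y)] with 1 / ((1 - t1 - t2)(1 - t2)).
rng = np.random.default_rng(7)
n = 2_000_000
y = rng.gamma(shape=2.0, scale=1.0, size=n)
x = rng.uniform(0.0, y)

t1, t2 = 0.2, 0.3                       # any point with t1 + t2 < 1 and t2 < 1
empirical = np.exp(t1 * x + t2 * y).mean()
exact = 1 / ((1 - t1 - t2) * (1 - t2))
print(empirical, exact)                 # both approximately 2.857
```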
3.8.4 Example
Suppose $X$ and $Y$ are continuous random variables with joint probability density function
\[
f(x, y) = e^{-x-y} \quad\text{for } x > 0,\ y > 0
\]
and 0 otherwise.
(a) Find the joint moment generating function of $X$ and $Y$.
(b) What is the moment generating function of $X$ and what is the marginal distribution of $X$?
(c) What is the moment generating function of $Y$ and what is the marginal distribution of $Y$?
Solution
(a) The joint moment generating function is
\begin{align*}
M(t_1, t_2) &= E\left(e^{t_1 X + t_2 Y}\right) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{t_1 x + t_2 y} f(x, y)\,dx\,dy \\
&= \int_0^{\infty}\int_0^{\infty} e^{t_1 x + t_2 y} e^{-x-y}\,dx\,dy \\
&= \left(\int_0^{\infty} e^{-y(1 - t_2)}\,dy\right)\left(\int_0^{\infty} e^{-x(1 - t_1)}\,dx\right) \quad\text{which converges for } t_1 < 1,\ t_2 < 1 \\
&= \lim_{b \to \infty}\left(-\frac{e^{-y(1 - t_2)}}{1 - t_2}\Big|_0^b\right)\lim_{b \to \infty}\left(-\frac{e^{-x(1 - t_1)}}{1 - t_1}\Big|_0^b\right) \\
&= \left(\frac{1}{1 - t_1}\right)\left(\frac{1}{1 - t_2}\right) \quad\text{for } t_1 < 1,\ t_2 < 1
\end{align*}
(b) The moment generating function of $X$ is
\[
M_X(t) = E\left(e^{tX}\right) = M(t, 0) = \frac{1}{(1 - t)(1 - 0)} = \frac{1}{1 - t} \quad\text{for } t < 1
\]
By examining the list of moment generating functions in Chapter 11 we see that this is the moment generating function of an Exponential$(1)$ random variable. Therefore by the Uniqueness Theorem for Moment Generating Functions, $X$ has an Exponential$(1)$ distribution.
(c) The moment generating function of $Y$ is
\[
M_Y(t) = E\left(e^{tY}\right) = M(0, t) = \frac{1}{(1 - 0)(1 - t)} = \frac{1}{1 - t} \quad\text{for } t < 1
\]
By examining the list of moment generating functions in Chapter 11 we see that this is the moment generating function of an Exponential$(1)$ random variable. Therefore by the Uniqueness Theorem for Moment Generating Functions, $Y$ has an Exponential$(1)$ distribution.
3.8.5 Theorem
If $X$ and $Y$ are random variables with joint moment generating function $M(t_1, t_2)$ which exists for all $t_1 \in (-h_1, h_1)$ for some $h_1 > 0$, and all $t_2 \in (-h_2, h_2)$ for some $h_2 > 0$, then
\[
E\left(X^j Y^k\right) = \frac{\partial^{j+k}}{\partial t_1^j \partial t_2^k} M(t_1, t_2)\Big|_{(t_1, t_2) = (0, 0)}
\]
Proof
See Problem 11(a).
3.8.6 Independence Theorem for Moment Generating Functions
Suppose $X$ and $Y$ are random variables with joint moment generating function $M(t_1, t_2)$ which exists for all $t_1 \in (-h_1, h_1)$ for some $h_1 > 0$, and all $t_2 \in (-h_2, h_2)$ for some $h_2 > 0$. Then $X$ and $Y$ are independent random variables if and only if
\[
M(t_1, t_2) = M_X(t_1) M_Y(t_2)
\]
for all $t_1 \in (-h_1, h_1)$ and $t_2 \in (-h_2, h_2)$, where $M_X(t_1) = M(t_1, 0)$ and $M_Y(t_2) = M(0, t_2)$.
Proof
See Problem 11(b).
3.8.7 Example
Use Theorem 3.8.6 to determine if $X$ and $Y$ are independent random variables in Examples 3.8.3 and 3.8.4.

Solution
For Example 3.8.3
\begin{align*}
M(t_1, t_2) &= \frac{1}{(1 - t_1 - t_2)(1 - t_2)} \quad\text{for } t_1 + t_2 < 1 \text{ and } t_2 < 1 \\
M_X(t_1) &= \frac{1}{1 - t_1} \quad\text{for } t_1 < 1 \\
M_Y(t_2) &= \frac{1}{(1 - t_2)^2} \quad\text{for } t_2 < 1
\end{align*}
Since
\[
M\left(\tfrac{1}{4}, \tfrac{1}{4}\right) = \frac{1}{\left(1 - \tfrac{1}{4} - \tfrac{1}{4}\right)\left(1 - \tfrac{1}{4}\right)} = \frac{8}{3}
\ne M_X\left(\tfrac{1}{4}\right) M_Y\left(\tfrac{1}{4}\right) = \frac{1}{1 - \tfrac{1}{4}}\cdot\frac{1}{\left(1 - \tfrac{1}{4}\right)^2} = \left(\frac{4}{3}\right)^3
\]
then by Theorem 3.8.6 $X$ and $Y$ are not independent random variables.

For Example 3.8.4
\begin{align*}
M(t_1, t_2) &= \left(\frac{1}{1 - t_1}\right)\left(\frac{1}{1 - t_2}\right) \quad\text{for } t_1 < 1,\ t_2 < 1 \\
M_X(t_1) &= \frac{1}{1 - t_1} \quad\text{for } t_1 < 1 \\
M_Y(t_2) &= \frac{1}{1 - t_2} \quad\text{for } t_2 < 1
\end{align*}
Since
\[
M(t_1, t_2) = \left(\frac{1}{1 - t_1}\right)\left(\frac{1}{1 - t_2}\right) = M_X(t_1) M_Y(t_2) \quad\text{for all } t_1 < 1,\ t_2 < 1
\]
then by Theorem 3.8.6 $X$ and $Y$ are independent random variables.
3.8.8 Example
Suppose $X_1, X_2, \ldots, X_n$ are independent and identically distributed random variables each with moment generating function $M(t)$, $t \in (-h, h)$ for some $h > 0$. Find $M(t_1, t_2, \ldots, t_n)$, the joint moment generating function of $X_1, X_2, \ldots, X_n$. Find the moment generating function of $T = \sum\limits_{i=1}^n X_i$.

Solution
Since the $X_i$'s are independent random variables each with moment generating function $M(t)$, $t \in (-h, h)$ for some $h > 0$, the joint moment generating function of $X_1, X_2, \ldots, X_n$ is
\begin{align*}
M(t_1, t_2, \ldots, t_n) &= E\left[\exp\left(\sum_{i=1}^n t_i X_i\right)\right] = E\left(\prod_{i=1}^n e^{t_i X_i}\right) \\
&= \prod_{i=1}^n E\left(e^{t_i X_i}\right) \\
&= \prod_{i=1}^n M(t_i) \quad\text{for } t_i \in (-h, h),\ i = 1, 2, \ldots, n,\ \text{for some } h > 0
\end{align*}
The moment generating function of $T = \sum\limits_{i=1}^n X_i$ is
\begin{align*}
M_T(t) &= E\left(e^{tT}\right) = E\left[\exp\left(\sum_{i=1}^n t X_i\right)\right] = M(t, t, \ldots, t) \\
&= \prod_{i=1}^n M(t) = [M(t)]^n \quad\text{for } t \in (-h, h) \text{ for some } h > 0
\end{align*}
3.9 Multinomial Distribution

The discrete multivariate distribution which is the most widely used is the Multinomial distribution, which was introduced in a previous probability course. We give its joint probability function and its important properties here.
3.9.1 Definition - Multinomial Distribution
Suppose $(X_1, X_2, \ldots, X_k)$ are discrete random variables with joint probability function
\[
f(x_1, x_2, \ldots, x_k) = \frac{n!}{x_1! x_2! \cdots x_k!}\, p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}
\]
for $x_i = 0, 1, \ldots, n$, $i = 1, 2, \ldots, k$, $\sum\limits_{i=1}^k x_i = n$; $0 \le p_i \le 1$, $i = 1, 2, \ldots, k$, $\sum\limits_{i=1}^k p_i = 1$.
Then $(X_1, X_2, \ldots, X_k)$ is said to have a Multinomial distribution. We write $(X_1, X_2, \ldots, X_k) \sim$ Multinomial$(n, p_1, p_2, \ldots, p_k)$.

Notes:
(1) Since $\sum\limits_{i=1}^k X_i = n$, the Multinomial distribution is actually a joint distribution for $k - 1$ random variables which can be written as
\[
f(x_1, x_2, \ldots, x_{k-1}) = \frac{n!}{x_1! x_2! \cdots x_{k-1}!\left(n - \sum\limits_{i=1}^{k-1} x_i\right)!}\, p_1^{x_1} p_2^{x_2} \cdots p_{k-1}^{x_{k-1}}\left(1 - \sum_{i=1}^{k-1} p_i\right)^{n - \sum\limits_{i=1}^{k-1} x_i}
\]
for $x_i = 0, 1, \ldots, n$, $i = 1, 2, \ldots, k-1$, $\sum\limits_{i=1}^{k-1} x_i \le n$; $0 \le p_i \le 1$, $i = 1, 2, \ldots, k-1$, $\sum\limits_{i=1}^{k-1} p_i \le 1$.
(2) If $k = 2$ we obtain the familiar Binomial distribution
\[
f(x_1) = \frac{n!}{x_1!(n - x_1)!}\, p_1^{x_1}(1 - p_1)^{n - x_1} = \binom{n}{x_1} p_1^{x_1}(1 - p_1)^{n - x_1}
\quad\text{for } x_1 = 0, 1, \ldots, n;\ 0 \le p_1 \le 1
\]
(3) If $k = 3$ we obtain the Trinomial distribution
\[
f(x_1, x_2) = \frac{n!}{x_1! x_2!(n - x_1 - x_2)!}\, p_1^{x_1} p_2^{x_2}(1 - p_1 - p_2)^{n - x_1 - x_2}
\]
for $x_i = 0, 1, \ldots, n$, $i = 1, 2$, $x_1 + x_2 \le n$ and $0 \le p_i \le 1$, $i = 1, 2$, $p_1 + p_2 \le 1$.
3.9.2 Theorem - Properties of the Multinomial Distribution
If $(X_1, X_2, \ldots, X_k) \sim$ Multinomial$(n, p_1, p_2, \ldots, p_k)$, then
(1) $(X_1, X_2, \ldots, X_{k-1})$ has joint moment generating function
\[
M(t_1, t_2, \ldots, t_{k-1}) = E\left(e^{t_1 X_1 + t_2 X_2 + \cdots + t_{k-1} X_{k-1}}\right)
= \left(p_1 e^{t_1} + p_2 e^{t_2} + \cdots + p_{k-1} e^{t_{k-1}} + p_k\right)^n \tag{3.5}
\]
for $(t_1, t_2, \ldots, t_{k-1}) \in \Re^{k-1}$.
(2) Any subset of $X_1, X_2, \ldots, X_k$ also has a Multinomial distribution. In particular
\[
X_i \sim \text{Binomial}(n, p_i) \quad\text{for } i = 1, 2, \ldots, k
\]
(3) If $T = X_i + X_j$, $i \ne j$, then
\[
T \sim \text{Binomial}(n, p_i + p_j)
\]
(4)
\[
Cov(X_i, X_j) = -np_ip_j \quad\text{for } i \ne j
\]
(5) The conditional distribution of any subset of $(X_1, X_2, \ldots, X_k)$ given the remaining coordinates is a Multinomial distribution. In particular the conditional probability function of $X_i$ given $X_j = x_j$, $i \ne j$, is
\[
X_i | X_j = x_j \sim \text{Binomial}\left(n - x_j,\ \frac{p_i}{1 - p_j}\right)
\]
(6) The conditional distribution of $X_i$ given $T = X_i + X_j = t$, $i \ne j$, is
\[
X_i | X_i + X_j = t \sim \text{Binomial}\left(t,\ \frac{p_i}{p_i + p_j}\right)
\]
3.9.3 Example
Suppose $(X_1, X_2, \ldots, X_k) \sim$ Multinomial$(n, p_1, p_2, \ldots, p_k)$.
(a) Prove $(X_1, X_2, \ldots, X_{k-1})$ has joint moment generating function
\[
M(t_1, t_2, \ldots, t_{k-1}) = \left(p_1 e^{t_1} + p_2 e^{t_2} + \cdots + p_{k-1} e^{t_{k-1}} + p_k\right)^n
\quad\text{for } (t_1, t_2, \ldots, t_{k-1}) \in \Re^{k-1}
\]
(b) Prove $(X_1, X_2) \sim$ Multinomial$(n, p_1, p_2, 1 - p_1 - p_2)$.
(c) Prove $X_i \sim$ Binomial$(n, p_i)$ for $i = 1, 2, \ldots, k$.
(d) Prove $T = X_1 + X_2 \sim$ Binomial$(n, p_1 + p_2)$.
Solution
(a) Let $A = \left\{(x_1, x_2, \ldots, x_k) : x_i = 0, 1, \ldots, n,\ i = 1, 2, \ldots, k,\ \sum_{i=1}^k x_i = n\right\}$. Then
\begin{align*}
M(t_1, t_2, \ldots, t_{k-1}) &= E\left(e^{t_1 X_1 + t_2 X_2 + \cdots + t_{k-1} X_{k-1}}\right) \\
&= \sum_{(x_1, x_2, \ldots, x_k)\,\in\,A} e^{t_1 x_1 + t_2 x_2 + \cdots + t_{k-1} x_{k-1}}\,\frac{n!}{x_1! x_2! \cdots x_k!}\,p_1^{x_1} p_2^{x_2} \cdots p_{k-1}^{x_{k-1}} p_k^{x_k} \\
&= \sum_{(x_1, x_2, \ldots, x_k)\,\in\,A} \frac{n!}{x_1! x_2! \cdots x_k!}\left(p_1 e^{t_1}\right)^{x_1}\left(p_2 e^{t_2}\right)^{x_2} \cdots \left(p_{k-1} e^{t_{k-1}}\right)^{x_{k-1}} p_k^{x_k} \\
&= \left(p_1 e^{t_1} + p_2 e^{t_2} + \cdots + p_{k-1} e^{t_{k-1}} + p_k\right)^n \quad\text{for } (t_1, t_2, \ldots, t_{k-1}) \in \Re^{k-1}
\end{align*}
by the Multinomial Theorem 2.11.5.
(b) The joint moment generating function of $(X_1, X_2)$ is
\[
M(t_1, t_2, 0, \ldots, 0) = \left[p_1 e^{t_1} + p_2 e^{t_2} + (1 - p_1 - p_2)\right]^n \quad\text{for } (t_1, t_2) \in \Re^2
\]
which is of the form (3.5), so by the Uniqueness Theorem for Moment Generating Functions, $(X_1, X_2) \sim$ Multinomial$(n, p_1, p_2, 1 - p_1 - p_2)$.
(c) The moment generating function of $X_i$ is
\[
M(0, 0, \ldots, t_i, 0, \ldots, 0) = \left[p_i e^{t_i} + (1 - p_i)\right]^n \quad\text{for } t_i \in \Re
\]
for $i = 1, 2, \ldots, k$, which is the moment generating function of a Binomial$(n, p_i)$ random variable. By the Uniqueness Theorem for Moment Generating Functions, $X_i \sim$ Binomial$(n, p_i)$ for $i = 1, 2, \ldots, k$.
(d) The moment generating function of $T = X_1 + X_2$ is
\begin{align*}
M_T(t) &= E\left(e^{tT}\right) = E\left(e^{t(X_1 + X_2)}\right) = E\left(e^{tX_1 + tX_2}\right) = M(t, t, 0, 0, \ldots, 0) \\
&= \left[p_1 e^{t} + p_2 e^{t} + (1 - p_1 - p_2)\right]^n \\
&= \left[(p_1 + p_2)e^{t} + (1 - p_1 - p_2)\right]^n \quad\text{for } t \in \Re
\end{align*}
which is the moment generating function of a Binomial$(n, p_1 + p_2)$ random variable. By the Uniqueness Theorem for Moment Generating Functions, $T \sim$ Binomial$(n, p_1 + p_2)$.
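For illustration, properties (3) and (4) of Theorem 3.9.2 are easy to check by simulation. The sketch below assumes only numpy; the values $n = 20$ and $p = (0.2, 0.3, 0.5)$ are arbitrary example choices.

```python
import numpy as np

# Checking Theorem 3.9.2 numerically with numpy's multinomial sampler:
# property (4) gives Cov(X1, X2) = -n*p1*p2 = -1.2, and property (3) gives
# X1 + X2 ~ Binomial(20, 0.5).
rng = np.random.default_rng(42)
n, p = 20, np.array([0.2, 0.3, 0.5])
x = rng.multinomial(n, p, size=500_000)

print(np.cov(x[:, 0], x[:, 1])[0, 1], -n * p[0] * p[1])     # ~ -1.2
t = x[:, 0] + x[:, 1]
print(t.mean(), n * (p[0] + p[1]))                           # ~ 10, Binomial mean
print(t.var(), n * (p[0] + p[1]) * (1 - p[0] - p[1]))        # ~ 5, Binomial variance
```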
3.9.4 Exercise
Prove property (3) in Theorem 3.9.2.

3.10 Bivariate Normal Distribution

The best known bivariate continuous distribution is the Bivariate Normal distribution. We give its joint probability density function written in vector notation so we can easily introduce the multivariate version of this distribution, called the Multivariate Normal distribution, in Chapter 7.
3.10.1 Definition - Bivariate Normal Distribution (BVN)
Suppose $X_1$ and $X_2$ are random variables with joint probability density function
\[
f(x_1, x_2) = \frac{1}{2\pi|\Sigma|^{1/2}}\exp\left[-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})\Sigma^{-1}(\mathbf{x} - \boldsymbol{\mu})^T\right] \quad\text{for } (x_1, x_2) \in \Re^2
\]
where
\[
\mathbf{x} = \begin{bmatrix} x_1 & x_2 \end{bmatrix}, \qquad
\boldsymbol{\mu} = \begin{bmatrix} \mu_1 & \mu_2 \end{bmatrix}, \qquad
\Sigma = \begin{bmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix}
\]
and $\Sigma$ is a nonsingular matrix. Then $\mathbf{X} = (X_1, X_2)$ is said to have a Bivariate Normal distribution. We write $\mathbf{X} \sim$ BVN$(\boldsymbol{\mu}, \Sigma)$.

The Bivariate Normal distribution has many special properties.
3.10.2 Theorem - Properties of the BVN Distribution
If $\mathbf{X} \sim$ BVN$(\boldsymbol{\mu}, \Sigma)$, then
(1) $(X_1, X_2)$ has joint moment generating function
\[
M(t_1, t_2) = E\left(e^{t_1 X_1 + t_2 X_2}\right) = E\left[\exp\left(\mathbf{X}\mathbf{t}^T\right)\right]
= \exp\left(\boldsymbol{\mu}\mathbf{t}^T + \frac{1}{2}\mathbf{t}\Sigma\mathbf{t}^T\right) \quad\text{for all } \mathbf{t} = (t_1, t_2) \in \Re^2
\]
(2) $X_1 \sim$ N$(\mu_1, \sigma_1^2)$ and $X_2 \sim$ N$(\mu_2, \sigma_2^2)$.
(3) $Cov(X_1, X_2) = \rho\sigma_1\sigma_2$ and $Cor(X_1, X_2) = \rho$ where $-1 \le \rho \le 1$.
(4) $X_1$ and $X_2$ are independent random variables if and only if $\rho = 0$.
(5) If $\mathbf{c} = (c_1, c_2)$ is a nonzero vector of constants then
\[
c_1 X_1 + c_2 X_2 \sim \text{N}\left(\boldsymbol{\mu}\mathbf{c}^T,\ \mathbf{c}\Sigma\mathbf{c}^T\right)
\]
(6) If $A$ is a $2 \times 2$ nonsingular matrix and $\mathbf{b}$ is a $1 \times 2$ vector then
\[
\mathbf{X}A + \mathbf{b} \sim \text{BVN}\left(\boldsymbol{\mu}A + \mathbf{b},\ A^T\Sigma A\right)
\]
(7)
\[
X_2 | X_1 = x_1 \sim \text{N}\left(\mu_2 + \rho\sigma_2(x_1 - \mu_1)/\sigma_1,\ \sigma_2^2(1 - \rho^2)\right)
\]
and
\[
X_1 | X_2 = x_2 \sim \text{N}\left(\mu_1 + \rho\sigma_1(x_2 - \mu_2)/\sigma_2,\ \sigma_1^2(1 - \rho^2)\right)
\]
(8) $(\mathbf{X} - \boldsymbol{\mu})\Sigma^{-1}(\mathbf{X} - \boldsymbol{\mu})^T \sim \chi^2(2)$
Proof
For proofs of properties (1)-(4) and (6)-(7) see Problem 13.
(5) The moment generating function of $c_1 X_1 + c_2 X_2$ is
\begin{align*}
E\left[e^{t(c_1 X_1 + c_2 X_2)}\right] &= E\left[e^{(c_1 t)X_1 + (c_2 t)X_2}\right] \\
&= \exp\left(\begin{bmatrix} \mu_1 & \mu_2 \end{bmatrix}\begin{bmatrix} c_1 t \\ c_2 t \end{bmatrix}
+ \frac{1}{2}\begin{bmatrix} c_1 t & c_2 t \end{bmatrix}\Sigma\begin{bmatrix} c_1 t \\ c_2 t \end{bmatrix}\right) \\
&= \exp\left(\begin{bmatrix} \mu_1 & \mu_2 \end{bmatrix}\begin{bmatrix} c_1 \\ c_2 \end{bmatrix}t
+ \frac{1}{2}\begin{bmatrix} c_1 & c_2 \end{bmatrix}\Sigma\begin{bmatrix} c_1 \\ c_2 \end{bmatrix}t^2\right) \\
&= \exp\left[\left(\boldsymbol{\mu}\mathbf{c}^T\right)t + \frac{1}{2}\left(\mathbf{c}\Sigma\mathbf{c}^T\right)t^2\right] \quad\text{for } t \in \Re \text{ where } \mathbf{c} = (c_1, c_2)
\end{align*}
which is the moment generating function of a N$\left(\boldsymbol{\mu}\mathbf{c}^T, \mathbf{c}\Sigma\mathbf{c}^T\right)$ random variable. Therefore by the Uniqueness Theorem for Moment Generating Functions, $c_1 X_1 + c_2 X_2 \sim$ N$\left(\boldsymbol{\mu}\mathbf{c}^T, \mathbf{c}\Sigma\mathbf{c}^T\right)$.
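For illustration, properties (3) and (5) can be checked by simulation. The sketch below assumes only numpy; the mean vector, covariance matrix, and constants $\mathbf{c}$ are arbitrary example values.

```python
import numpy as np

# Numerical check of BVN properties (3) and (5) using numpy's multivariate
# normal sampler; mu, Sigma, and c below are example values only.
rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
rho, s1, s2 = 0.5, 1.0, 2.0
Sigma = np.array([[s1**2, rho * s1 * s2],
                  [rho * s1 * s2, s2**2]])
x = rng.multivariate_normal(mu, Sigma, size=1_000_000)

# (3): Cov(X1, X2) = rho * sigma1 * sigma2
print(np.cov(x[:, 0], x[:, 1])[0, 1], rho * s1 * s2)

# (5): c1*X1 + c2*X2 ~ N(mu c^T, c Sigma c^T)
c = np.array([2.0, -1.0])
lin = x @ c
print(lin.mean(), mu @ c)            # means agree
print(lin.var(), c @ Sigma @ c)      # variances agree
```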
The BVN joint probability density function is graphed in Figures 3.8 - 3.10.

[Figure 3.8: Graph of the BVN p.d.f. $f(x, y)$ with $\boldsymbol{\mu} = \begin{bmatrix} 0 & 0 \end{bmatrix}$ and $\Sigma = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$]
The graphs all have the same mean vector $\boldsymbol{\mu} = \begin{bmatrix} 0 & 0 \end{bmatrix}$ but different variance/covariance matrices $\Sigma$. The axes all have the same scale.
[Figure 3.9: Graph of the BVN p.d.f. $f(x, y)$ with $\boldsymbol{\mu} = \begin{bmatrix} 0 & 0 \end{bmatrix}$ and $\Sigma = \begin{bmatrix} 0.6 & 0.5 \\ 0.5 & 1 \end{bmatrix}$]
[Figure 3.10: Graph of the BVN p.d.f. $f(x, y)$ with $\boldsymbol{\mu} = \begin{bmatrix} 0 & 0 \end{bmatrix}$ and $\Sigma = \begin{bmatrix} 0.6 & -0.5 \\ -0.5 & 1 \end{bmatrix}$]
3.11 Calculus Review

Consider the region $R$ in the $xy$-plane in Figure 3.11.

[Figure 3.11: Region 1 - the region $R$ bounded below by $y = g(x)$, above by $y = h(x)$, and between $x = a$ and $x = b$]

Suppose $f(x, y) \ge 0$ for all $(x, y) \in \Re^2$. The graph of $z = f(x, y)$ is a surface in 3-space lying above or touching the $xy$-plane. The volume of the solid bounded by the surface $z = f(x, y)$ and the $xy$-plane above the region $R$ is given by
\[
\text{Volume} = \int_{x=a}^{b}\int_{y=g(x)}^{h(x)} f(x, y)\,dy\,dx
\]

[Figure 3.12: Region 2 - the region $R$ bounded on the left by $x = g(y)$, on the right by $x = h(y)$, and between $y = c$ and $y = d$]
If $R$ is the region in Figure 3.12 then the volume is given by
\[
\text{Volume} = \int_{y=c}^{d}\int_{x=g(y)}^{h(y)} f(x, y)\,dx\,dy
\]
Give an expression for the volume of the solid bounded by the surface $z = f(x, y)$ and the $xy$-plane above the region $R = R_1 \cup R_2$ in Figure 3.13.

[Figure 3.13: Region 3 - the region $R = R_1 \cup R_2$, with $R_1$ bounded by the curve $y = g(x)$ and $R_2$ bounded by the curve $x = h(y)$, over the points $a_1, a_2, a_3$ on the $x$-axis and $b_1, b_2, b_3$ on the $y$-axis]
3.12 Chapter 3 Problems

1. Suppose $X$ and $Y$ are discrete random variables with joint probability function
\[
f(x, y) = kq^2 p^{x+y} \quad\text{for } x = 0, 1, \ldots;\ y = 0, 1, \ldots;\ 0 < p < 1,\ q = 1 - p
\]
(a) Determine the value of $k$.
(b) Find the marginal probability function of $X$ and the marginal probability function of $Y$. Are $X$ and $Y$ independent random variables?
(c) Find $P(X = x\,|\,X + Y = t)$.
2. Suppose $X$ and $Y$ are discrete random variables with joint probability function
\[
f(x, y) = \frac{e^{-2}}{x!(y - x)!} \quad\text{for } x = 0, 1, \ldots, y;\ y = 0, 1, \ldots
\]
(a) Find the marginal probability function of $X$ and the marginal probability function of $Y$.
(b) Are $X$ and $Y$ independent random variables?
3. Suppose $X$ and $Y$ are continuous random variables with joint probability density function
\[
f(x, y) = k(x^2 + y) \quad\text{for } 0 < y < 1 - x^2,\ -1 < x < 1
\]
(a) Determine $k$.
(b) Find the marginal probability density function of $X$ and the marginal probability density function of $Y$.
(c) Are $X$ and $Y$ independent random variables?
(d) Find $P(Y \ge X + 1)$.
4. Suppose $X$ and $Y$ are continuous random variables with joint probability density function
\[
f(x, y) = kx^2 y \quad\text{for } x^2 < y < 1
\]
(a) Determine $k$.
(b) Find the marginal probability density function of $X$ and the marginal probability density function of $Y$.
(c) Are $X$ and $Y$ independent random variables?
(d) Find $P(X \ge Y)$.
(e) Find the conditional probability density function of $X$ given $Y = y$ and the conditional probability density function of $Y$ given $X = x$.
5. Suppose $X$ and $Y$ are continuous random variables with joint probability density function
\[
f(x, y) = kxe^{-y} \quad\text{for } 0 < x < 1,\ 0 < y < \infty
\]
(a) Determine $k$.
(b) Find the marginal probability density function of $X$ and the marginal probability density function of $Y$.
(c) Are $X$ and $Y$ independent random variables?
(d) Find $P(X + Y \le t)$.
6. Suppose each of the following functions is a joint probability density function for continuous random variables $X$ and $Y$.
(a) $f(x, y) = k$ for $0 < x < y < 1$
(b) $f(x, y) = kx$ for $0 < x^2 < y < 1$
(c) $f(x, y) = kxy$ for $0 < y < x < 1$
(d) $f(x, y) = k(x + y)$ for $0 < x < y < 1$
(e) $f(x, y) = kx$ for $0 < y < x < 1$
(f) $f(x, y) = kx^2 y$ for $0 < x < 1$, $0 < y < 1$, $0 < x + y < 1$
(g) $f(x, y) = ke^{-x-2y}$ for $0 < y < x < \infty$
In each case:
(i) Determine $k$.
(ii) Find the marginal probability density function of $X$ and the marginal probability density function of $Y$.
(iii) Find the conditional probability density function of $X$ given $Y = y$ and the conditional probability density function of $Y$ given $X = x$.
(iv) Find $E(X|y)$ and $E(Y|x)$.
7. Suppose $X \sim$ Uniform$(0, 1)$ and the conditional probability density function of $Y$ given $X = x$ is
\[
f_2(y|x) = \frac{1}{1 - x} \quad\text{for } 0 < x < y < 1
\]
Determine:
(a) the joint probability density function of $X$ and $Y$
(b) the marginal probability density function of $Y$
(c) the conditional probability density function of $X$ given $Y = y$.
8. Suppose $X$ and $Y$ are continuous random variables. Suppose also that the marginal probability density function of $X$ is
\[
f_1(x) = \frac{1}{3}(1 + 4x) \quad\text{for } 0 < x < 1
\]
and the conditional probability density function of $Y$ given $X = x$ is
\[
f_2(y|x) = \frac{2y + 4x}{1 + 4x} \quad\text{for } 0 < x < 1,\ 0 < y < 1
\]
Determine:
(a) the joint probability density function of $X$ and $Y$
(b) the marginal probability density function of $Y$
(c) the conditional probability density function of $X$ given $Y = y$.
9. Suppose that $\theta \sim$ Beta$(a, b)$ and $Y|\theta \sim$ Binomial$(n, \theta)$. Find $E(Y)$ and $Var(Y)$.

10. Assume that $Y$ denotes the number of bacteria in a cubic centimeter of liquid and that $Y|\theta \sim$ Poisson$(\theta)$. Further assume that $\theta$ varies from location to location and $\theta \sim$ Gamma$(\alpha, \beta)$.
(a) Find $E(Y)$ and $Var(Y)$.
(b) If $\alpha$ is a positive integer then show that the marginal probability function of $Y$ is Negative Binomial.
11. Suppose $X$ and $Y$ are random variables with joint moment generating function $M(t_1, t_2)$ which exists for all $|t_1| < h_1$ and $|t_2| < h_2$ for some $h_1, h_2 > 0$.
(a) Show that
\[
E\left(X^j Y^k\right) = \frac{\partial^{j+k}}{\partial t_1^j \partial t_2^k} M(t_1, t_2)\Big|_{(t_1, t_2) = (0, 0)}
\]
(b) Prove that $X$ and $Y$ are independent random variables if and only if $M(t_1, t_2) = M_X(t_1)M_Y(t_2)$.
(c) If $(X, Y) \sim$ Multinomial$(n, p_1, p_2, 1 - p_1 - p_2)$ find $Cov(X, Y)$.

12. Suppose $X$ and $Y$ are discrete random variables with joint probability function
\[
f(x, y) = \frac{e^{-2}}{x!(y - x)!} \quad\text{for } x = 0, 1, \ldots, y;\ y = 0, 1, \ldots
\]
(a) Find the joint moment generating function of $X$ and $Y$.
(b) Find $Cov(X, Y)$.
13. Suppose $\mathbf{X} = (X_1, X_2) \sim$ BVN$(\boldsymbol{\mu}, \Sigma)$.
(a) Let $\mathbf{t} = (t_1, t_2)$. Use matrix multiplication to verify that
\[
(\mathbf{x} - \boldsymbol{\mu})\Sigma^{-1}(\mathbf{x} - \boldsymbol{\mu})^T - 2\mathbf{x}\mathbf{t}^T
= [\mathbf{x} - (\boldsymbol{\mu} + \mathbf{t}\Sigma)]\Sigma^{-1}[\mathbf{x} - (\boldsymbol{\mu} + \mathbf{t}\Sigma)]^T - 2\boldsymbol{\mu}\mathbf{t}^T - \mathbf{t}\Sigma\mathbf{t}^T
\]
Use this identity to show that the joint moment generating function of $X_1$ and $X_2$ is
\[
M(t_1, t_2) = E\left(e^{t_1 X_1 + t_2 X_2}\right) = E\left[\exp\left(\mathbf{X}\mathbf{t}^T\right)\right]
= \exp\left(\boldsymbol{\mu}\mathbf{t}^T + \frac{1}{2}\mathbf{t}\Sigma\mathbf{t}^T\right) \quad\text{for all } \mathbf{t} = (t_1, t_2) \in \Re^2
\]
(b) Use moment generating functions to show $X_1 \sim$ N$(\mu_1, \sigma_1^2)$ and $X_2 \sim$ N$(\mu_2, \sigma_2^2)$.
(c) Use moment generating functions to show $Cov(X_1, X_2) = \rho\sigma_1\sigma_2$. Hint: Use the result in Problem 11(a).
(d) Use moment generating functions to show that $X_1$ and $X_2$ are independent random variables if and only if $\rho = 0$.
(e) Let $A$ be a $2 \times 2$ nonsingular matrix and $\mathbf{b}$ be a $1 \times 2$ vector. Use the moment generating function to show that
\[
\mathbf{X}A + \mathbf{b} \sim \text{BVN}\left(\boldsymbol{\mu}A + \mathbf{b},\ A^T\Sigma A\right)
\]
(f) Verify that
\[
(\mathbf{x} - \boldsymbol{\mu})\Sigma^{-1}(\mathbf{x} - \boldsymbol{\mu})^T - \left(\frac{x_1 - \mu_1}{\sigma_1}\right)^2
= \frac{1}{\sigma_2^2(1 - \rho^2)}\left[x_2 - \left(\mu_2 + \frac{\rho\sigma_2}{\sigma_1}(x_1 - \mu_1)\right)\right]^2
\]
and thus show that the conditional distribution of $X_2$ given $X_1 = x_1$ is N$(\mu_2 + \rho\sigma_2(x_1 - \mu_1)/\sigma_1,\ \sigma_2^2(1 - \rho^2))$. Note that by symmetry the conditional distribution of $X_1$ given $X_2 = x_2$ is N$(\mu_1 + \rho\sigma_1(x_2 - \mu_2)/\sigma_2,\ \sigma_1^2(1 - \rho^2))$.
14. Suppose $X$ and $Y$ are continuous random variables with joint probability density function
\[
f(x, y) = 2e^{-x-y} \quad\text{for } 0 < x < y < \infty
\]
(a) Find the joint moment generating function of $X$ and $Y$.
(b) Determine the marginal distributions of $X$ and $Y$.
(c) Find $Cov(X, Y)$.
4. Functions of Two or More Random Variables

In this chapter we look at techniques for determining the distributions of functions of two or more random variables. These techniques are extremely important for determining the distributions of estimators such as maximum likelihood estimators, the distributions of pivotal quantities for constructing confidence intervals, and the distributions of test statistics for testing hypotheses.

In Section 4.1 we extend the cumulative distribution function technique introduced in Section 2.6 to a function of two or more random variables. In Section 4.2 we look at a method for determining the distribution of a one-to-one transformation of two or more random variables which is an extension of the result in Theorem 2.6.8. In particular we show how the t distribution, which you would have used in your previous statistics course, arises as the ratio of a N$(0,1)$ random variable to the square root of an independent Chi-squared random variable divided by its degrees of freedom. In Section 4.3 we see how moment generating functions can be used for determining the distribution of a sum of random variables, which is an extension of Theorem 2.10.4. In particular we prove that a linear combination of independent Normal random variables has a Normal distribution. This is a result which was used extensively in previous probability and statistics courses.
4.1 Cumulative Distribution Function Technique

Suppose $X_1, X_2, \ldots, X_n$ are continuous random variables with joint probability density function $f(x_1, x_2, \ldots, x_n)$. The probability density function of $Y = h(X_1, X_2, \ldots, X_n)$ can be determined using the cumulative distribution function technique that was used in Section 2.6 for the case $n = 1$.

4.1.1 Example
Suppose $X$ and $Y$ are continuous random variables with joint probability density function
\[
f(x, y) = 3y \quad\text{for } 0 < x < y < 1
\]
and 0 otherwise. Determine the probability density function of $T = XY$.
Solution
The support set of $(X, Y)$ is $A = \{(x, y) : 0 < x < y < 1\}$ which is the union of the regions $E$ and $F$ shown in Figure 4.1.

[Figure 4.1: Support set for Example 4.1.1, showing the line $x = y$, the curve $x = t/y$, the point $y = \sqrt{t}$, and the regions $E$ (where $xy \le t$) and $F$]
For $0 < t < 1$
\[
G(t) = P(T \le t) = P(XY \le t) = \iint_{(x,y)\,\in\,E} 3y\,dx\,dy
\]
Due to the shape of the region $E$, the double integral over the region $E$ would have to be written as the sum of two double integrals. It is easier to find $G(t)$ using
\begin{align*}
G(t) &= \iint_{(x,y)\,\in\,E} 3y\,dx\,dy = 1 - \iint_{(x,y)\,\in\,F} 3y\,dx\,dy \\
&= 1 - \int_{y=\sqrt{t}}^{1}\int_{x=t/y}^{y} 3y\,dx\,dy = 1 - \int_{\sqrt{t}}^{1} 3y\left(x\Big|_{t/y}^{y}\right) dy \\
&= 1 - \int_{\sqrt{t}}^{1} 3y\left(y - \frac{t}{y}\right) dy = 1 - \int_{\sqrt{t}}^{1}\left(3y^2 - 3t\right) dy \\
&= 1 - \left(y^3 - 3ty\right)\Big|_{\sqrt{t}}^{1} = 1 - \left(1 - 3t - t^{3/2} + 3t^{3/2}\right) \\
&= 3t - 2t^{3/2} \quad\text{for } 0 < t < 1
\end{align*}
The cumulative distribution function of $T$ is
\[
G(t) = \begin{cases} 0 & t \le 0 \\ 3t - 2t^{3/2} & 0 < t < 1 \\ 1 & t \ge 1 \end{cases}
\]
Now a cumulative distribution function must be a continuous function for all real values. Therefore as a check we note that
\[
\lim_{t \to 0^+}\left(3t - 2t^{3/2}\right) = 0 = G(0)
\quad\text{and}\quad
\lim_{t \to 1^-}\left(3t - 2t^{3/2}\right) = 1 = G(1)
\]
so indeed $G(t)$ is a continuous function for all $t \in \Re$.
Since $\frac{d}{dt}G(t) = 0$ for $t < 0$ and $t > 1$, and
\[
\frac{d}{dt}G(t) = \frac{d}{dt}\left(3t - 2t^{3/2}\right) = 3 - 3t^{1/2} \quad\text{for } 0 < t < 1
\]
the probability density function of $T$ is
\[
g(t) = 3 - 3t^{1/2} \quad\text{for } 0 < t < 1
\]
and 0 otherwise.
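For illustration, the cumulative distribution function $G(t) = 3t - 2t^{3/2}$ can be checked by simulation. The sketch below assumes only numpy and samples $(X, Y)$ from $f(x, y) = 3y$ on the triangle $0 < x < y < 1$ by rejection.

```python
import numpy as np

# Monte Carlo check of Example 4.1.1: sample (X, Y) from f(x, y) = 3y on
# 0 < x < y < 1, then compare the empirical cdf of T = XY with 3t - 2*t**1.5.
rng = np.random.default_rng(3)

def sample_xy(n):
    # propose uniformly on the triangle 0 < x < y < 1 (min/max of two uniforms),
    # then accept with probability proportional to f, i.e. with probability y
    pts = np.empty((0, 2))
    while len(pts) < n:
        x, y = rng.uniform(size=(2, n))
        x, y = np.minimum(x, y), np.maximum(x, y)
        keep = rng.uniform(size=n) < y
        pts = np.vstack([pts, np.column_stack([x, y])[keep]])
    return pts[:n]

x, y = sample_xy(1_000_000).T
t = x * y
for t0 in (0.1, 0.25, 0.5):
    print((t <= t0).mean(), 3 * t0 - 2 * t0 ** 1.5)   # empirical vs exact G(t0)
```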
4.1.2 Exercise
Suppose $X$ and $Y$ are continuous random variables with joint probability density function
\[
f(x, y) = 3y \quad\text{for } 0 \le x \le y \le 1
\]
and 0 otherwise. Find the probability density function of $S = Y/X$.
4.1.3 Example
Suppose $X_1, X_2, \ldots, X_n$ are independent and identically distributed continuous random variables each with probability density function $f(x)$ and cumulative distribution function $F(x)$. Find the probability density function of $U = \max(X_1, X_2, \ldots, X_n) = X_{(n)}$ and $V = \min(X_1, X_2, \ldots, X_n) = X_{(1)}$.
Solution
For $u \in \Re$, the cumulative distribution function of $U$ is
\begin{align*}
G(u) &= P(U \le u) = P[\max(X_1, X_2, \ldots, X_n) \le u] \\
&= P(X_1 \le u, X_2 \le u, \ldots, X_n \le u) \\
&= P(X_1 \le u)P(X_2 \le u)\cdots P(X_n \le u) \quad\text{since } X_1, X_2, \ldots, X_n \text{ are independent random variables} \\
&= \prod_{i=1}^n P(X_i \le u) \\
&= \prod_{i=1}^n F(u) \quad\text{since } X_1, X_2, \ldots, X_n \text{ are identically distributed} \\
&= [F(u)]^n
\end{align*}
Suppose $A$ is the support set of $X_i$, $i = 1, 2, \ldots, n$. The probability density function of $U$ is
\[
g(u) = \frac{d}{du}G(u) = \frac{d}{du}[F(u)]^n = n[F(u)]^{n-1}f(u) \quad\text{for } u \in A
\]
and 0 otherwise.
For $v \in \Re$, the cumulative distribution function of $V$ is
\begin{align*}
H(v) &= P(V \le v) = P[\min(X_1, X_2, \ldots, X_n) \le v] \\
&= 1 - P[\min(X_1, X_2, \ldots, X_n) > v] \\
&= 1 - P(X_1 > v, X_2 > v, \ldots, X_n > v) \\
&= 1 - P(X_1 > v)P(X_2 > v)\cdots P(X_n > v) \quad\text{since } X_1, X_2, \ldots, X_n \text{ are independent random variables} \\
&= 1 - \prod_{i=1}^n P(X_i > v) \\
&= 1 - \prod_{i=1}^n [1 - F(v)] \quad\text{since } X_1, X_2, \ldots, X_n \text{ are identically distributed} \\
&= 1 - [1 - F(v)]^n
\end{align*}
Suppose $A$ is the support set of $X_i$, $i = 1, 2, \ldots, n$. The probability density function of $V$ is
\[
h(v) = \frac{d}{dv}H(v) = \frac{d}{dv}\{1 - [1 - F(v)]^n\} = n[1 - F(v)]^{n-1}f(v) \quad\text{for } v \in A
\]
and 0 otherwise.
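As a quick illustration, the cdf of the maximum can be checked for Uniform$(0,1)$ random variables, where $G(u) = u^n$. The sketch below assumes only numpy; $n = 5$ is an arbitrary example.

```python
import numpy as np

# Comparing the simulated maximum of n = 5 Uniform(0, 1) variables with the
# derived cdf G(u) = [F(u)]^n = u**n.
rng = np.random.default_rng(5)
n, reps = 5, 1_000_000
u = rng.uniform(size=(reps, n)).max(axis=1)
for u0 in (0.5, 0.8, 0.95):
    print((u <= u0).mean(), u0 ** n)   # empirical vs exact cdf of the maximum
```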
4.2 One-to-One Transformations

In this section we look at how to determine the joint distribution of a one-to-one transformation of two or more random variables. We concentrate on the bivariate case for ease of presentation. The method does extend to more than two random variables. See Problems 12 and 13 at the end of this chapter for examples of one-to-one transformations of three random variables.

We begin with some notation and a theorem which gives sufficient conditions for determining whether a transformation is one-to-one in the bivariate case, followed by the theorem which gives the joint probability density function for the two new random variables.

Suppose the transformation $S$ defined by
\begin{align*}
u &= h_1(x, y) \\
v &= h_2(x, y)
\end{align*}
is a one-to-one transformation for all $(x, y) \in R_{XY}$ and that $S$ maps the region $R_{XY}$ into the region $R_{UV}$ in the $uv$ plane. Since $S : (x, y) \to (u, v)$ is a one-to-one transformation there exists an inverse transformation $T$ defined by
\begin{align*}
x &= w_1(u, v) \\
y &= w_2(u, v)
\end{align*}
such that $T = S^{-1} : (u, v) \to (x, y)$ for all $(u, v) \in R_{UV}$. The Jacobian of the transformation $T$ is
\[
\frac{\partial(x, y)}{\partial(u, v)} = \begin{vmatrix} \dfrac{\partial x}{\partial u} & \dfrac{\partial x}{\partial v} \\[1ex] \dfrac{\partial y}{\partial u} & \dfrac{\partial y}{\partial v} \end{vmatrix} = \left[\frac{\partial(u, v)}{\partial(x, y)}\right]^{-1}
\]
where $\frac{\partial(u, v)}{\partial(x, y)}$ is the Jacobian of the transformation $S$.
4.2.1 Inverse Mapping Theorem
Consider the transformation $S$ defined by
\begin{align*}
u &= h_1(x, y) \\
v &= h_2(x, y)
\end{align*}
If $\frac{\partial u}{\partial x}$, $\frac{\partial u}{\partial y}$, $\frac{\partial v}{\partial x}$ and $\frac{\partial v}{\partial y}$ are continuous functions and $\frac{\partial(u, v)}{\partial(x, y)} \ne 0$ for all $(x, y) \in R$, then $S$ is one-to-one on $R$ and $S^{-1}$ exists.
Note: These are sufficient but not necessary conditions for the inverse to exist.
4.2.2 Theorem - One-to-One Bivariate Transformations
Let $X$ and $Y$ be continuous random variables with joint probability density function $f(x, y)$ and let $R_{XY} = \{(x, y) : f(x, y) > 0\}$ be the support set of $(X, Y)$. Suppose the transformation $S$ defined by
\begin{align*}
U &= h_1(X, Y) \\
V &= h_2(X, Y)
\end{align*}
is a one-to-one transformation with inverse transformation
\begin{align*}
X &= w_1(U, V) \\
Y &= w_2(U, V)
\end{align*}
Suppose also that $S$ maps $R_{XY}$ into $R_{UV}$. Then $g(u, v)$, the joint probability density function of $U$ and $V$, is given by
\[
g(u, v) = f(w_1(u, v), w_2(u, v))\left|\frac{\partial(x, y)}{\partial(u, v)}\right|
\]
for all $(u, v) \in R_{UV}$. (Compare Theorem 2.6.8 for univariate random variables.)
4.2.3 Proof
We want to find $g(u, v)$, the joint probability density function of the random variables $U$ and $V$. Suppose $S^{-1}$ maps the region $B \subseteq R_{UV}$ into the region $A \subseteq R_{XY}$. Then
\begin{align}
P[(U, V) \in B] &= \iint_B g(u, v)\,du\,dv \tag{4.1} \\
&= P[(X, Y) \in A] = \iint_A f(x, y)\,dx\,dy \nonumber \\
&= \iint_B f(w_1(u, v), w_2(u, v))\left|\frac{\partial(x, y)}{\partial(u, v)}\right| du\,dv \tag{4.2}
\end{align}
where the last line follows by the Change of Variable Theorem. Since this is true for all $B \subseteq R_{UV}$ we have, by comparing (4.1) and (4.2), that the joint probability density function of $U$ and $V$ is given by
\[
g(u, v) = f(w_1(u, v), w_2(u, v))\left|\frac{\partial(x, y)}{\partial(u, v)}\right|
\]
for all $(u, v) \in R_{UV}$.

In the following example we see how Theorem 4.2.2 can be used to show that the sum of two independent Exponential$(1)$ random variables is a Gamma random variable.
4.2.4 Example
Suppose $X \sim$ Exponential$(1)$ and $Y \sim$ Exponential$(1)$ independently. Find the joint probability density function of $U = X + Y$ and $V = X$. Show that $U \sim$ Gamma$(2, 1)$.

Solution
Since $X \sim$ Exponential$(1)$ and $Y \sim$ Exponential$(1)$ independently, the joint probability density function of $X$ and $Y$ is
\[
f(x, y) = f_1(x)f_2(y) = e^{-x}e^{-y} = e^{-x-y}
\]
with support set $R_{XY} = \{(x, y) : x > 0,\ y > 0\}$ which is shown in Figure 4.2.

[Figure 4.2: Support set $R_{XY}$ for Example 4.2.6]

The transformation
\[
S : U = X + Y,\quad V = X
\]
has inverse transformation
\[
X = V,\quad Y = U - V
\]
Under $S$ the boundaries of $R_{XY}$ are mapped as
\begin{align*}
(k, 0) &\to (k, k) \quad\text{for } k \ge 0 \\
(0, k) &\to (k, 0) \quad\text{for } k \ge 0
\end{align*}
and the point $(1, 2)$ is mapped to the point $(3, 1)$. Thus $S$ maps $R_{XY}$ into
\[
R_{UV} = \{(u, v) : 0 < v < u\}
\]
[Figure 4.3: Support set $R_{UV}$ for Example 4.2.4, the region between the $u$-axis and the line $v = u$]

as shown in Figure 4.3.
The Jacobian of the inverse transformation is
\[
\frac{\partial(x, y)}{\partial(u, v)} = \begin{vmatrix} \dfrac{\partial x}{\partial u} & \dfrac{\partial x}{\partial v} \\[1ex] \dfrac{\partial y}{\partial u} & \dfrac{\partial y}{\partial v} \end{vmatrix} = \begin{vmatrix} 0 & 1 \\ 1 & -1 \end{vmatrix} = -1
\]
Note that the transformation $S$ is a linear transformation and so we would expect the Jacobian of the transformation to be a constant.
The joint probability density function of $U$ and $V$ is given by
\begin{align*}
g(u, v) &= f(w_1(u, v), w_2(u, v))\left|\frac{\partial(x, y)}{\partial(u, v)}\right| \\
&= f(v, u - v)\,|-1| \\
&= e^{-u} \quad\text{for } (u, v) \in R_{UV}
\end{align*}
and 0 otherwise.
To find the marginal probability density function of $U$ we note that the support set $R_{UV}$ is not rectangular and the range of integration for $v$ will depend on $u$. The marginal probability density function of $U$ is
\[
g_1(u) = \int_{-\infty}^{\infty} g(u, v)\,dv = e^{-u}\int_{v=0}^{u} dv = ue^{-u} \quad\text{for } u > 0
\]
and 0 otherwise, which is the probability density function of a Gamma$(2, 1)$ random variable. Therefore $U \sim$ Gamma$(2, 1)$.
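For illustration, the conclusion of Example 4.2.4 can be checked numerically. The sketch below assumes numpy and scipy; the Kolmogorov-Smirnov comparison is one of several possible checks.

```python
import numpy as np
from scipy import stats

# Checking that the sum of two independent Exponential(1) variables behaves
# like Gamma(2, 1): moments and a Kolmogorov-Smirnov comparison.
rng = np.random.default_rng(11)
u = rng.exponential(size=1_000_000) + rng.exponential(size=1_000_000)
print(u.mean(), u.var())                                      # ~2 and ~2, Gamma(2,1) moments
print(stats.kstest(u[:5000], stats.gamma(a=2).cdf).pvalue)    # typically well above 0.05
```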
In the following exercise we see how the sum and difference of two independent Exponential$(1)$ random variables give a Gamma random variable and a Double Exponential random variable respectively.

4.2.5 Exercise
Suppose $X \sim$ Exponential$(1)$ and $Y \sim$ Exponential$(1)$ independently. Find the joint probability density function of $U = X + Y$ and $V = X - Y$. Show that $U \sim$ Gamma$(2, 1)$ and $V \sim$ Double Exponential$(0, 1)$.

In the following example we see how the Gamma and Beta distributions are related.

4.2.6 Example
Suppose $X \sim$ Gamma$(a, 1)$ and $Y \sim$ Gamma$(b, 1)$ independently. Find the joint probability density function of $U = X + Y$ and $V = \frac{X}{X+Y}$. Show that $U \sim$ Gamma$(a + b, 1)$ and $V \sim$ Beta$(a, b)$ independently. Find $E(V)$ by finding $E\left(\frac{X}{X+Y}\right)$.
Solution
Since $X \sim$ Gamma$(a, 1)$ and $Y \sim$ Gamma$(b, 1)$ independently, the joint probability density function of $X$ and $Y$ is
\[
f(x, y) = f_1(x)f_2(y) = \frac{x^{a-1}e^{-x}}{\Gamma(a)}\cdot\frac{y^{b-1}e^{-y}}{\Gamma(b)} = \frac{x^{a-1}y^{b-1}e^{-x-y}}{\Gamma(a)\Gamma(b)}
\]
with support set $R_{XY} = \{(x, y) : x > 0,\ y > 0\}$ which is the same support set as shown in Figure 4.2.
The transformation
\[
S : U = X + Y,\quad V = \frac{X}{X + Y}
\]
has inverse transformation
\[
X = UV,\quad Y = U(1 - V)
\]
Under $S$ the boundaries of $R_{XY}$ are mapped as
\begin{align*}
(k, 0) &\to (k, 1) \quad\text{for } k \ge 0 \\
(0, k) &\to (k, 0) \quad\text{for } k \ge 0
\end{align*}
and the point $(1, 2)$ is mapped to the point $\left(3, \frac{1}{3}\right)$. Thus $S$ maps $R_{XY}$ into
\[
R_{UV} = \{(u, v) : u > 0,\ 0 < v < 1\}
\]
as shown in Figure 4.4.
[Figure 4.4: Support set $R_{UV}$ for Example 4.2.6, the horizontal strip $u > 0$, $0 < v < 1$]

The Jacobian of the inverse transformation is
\[
\frac{\partial(x, y)}{\partial(u, v)} = \begin{vmatrix} v & u \\ 1 - v & -u \end{vmatrix} = -uv - u + uv = -u
\]
The joint probability density function of $U$ and $V$ is given by
\begin{align*}
g(u, v) &= f(uv, u(1 - v))\,|-u| \\
&= \frac{(uv)^{a-1}[u(1 - v)]^{b-1}e^{-u}}{\Gamma(a)\Gamma(b)}\,|u| \\
&= u^{a+b-1}e^{-u}\,\frac{v^{a-1}(1 - v)^{b-1}}{\Gamma(a)\Gamma(b)} \quad\text{for } (u, v) \in R_{UV}
\end{align*}
and 0 otherwise.
To find the marginal probability density functions of $U$ and $V$ we note that the support set of $U$ is $B_1 = \{u : u > 0\}$ and the support set of $V$ is $B_2 = \{v : 0 < v < 1\}$. Since
\[
g(u, v) = \underbrace{u^{a+b-1}e^{-u}}_{h_1(u)}\ \underbrace{\frac{v^{a-1}(1 - v)^{b-1}}{\Gamma(a)\Gamma(b)}}_{h_2(v)}
\]
for all $(u, v) \in B_1 \times B_2$ then, by the Factorization Theorem for Independence, $U$ and $V$ are independent random variables. Also by the Factorization Theorem for Independence the probability density function of $U$ must be proportional to $h_1(u)$. By writing
\[
g(u, v) = \left[\frac{u^{a+b-1}e^{-u}}{\Gamma(a + b)}\right]\left[\frac{\Gamma(a + b)}{\Gamma(a)\Gamma(b)}\,v^{a-1}(1 - v)^{b-1}\right]
\]
we note that the function in the first square bracket is the probability density function of a Gamma$(a + b, 1)$ random variable and therefore $U \sim$ Gamma$(a + b, 1)$. It follows that the function in the second square bracket must be the probability density function of $V$, which is a Beta$(a, b)$ probability density function. Therefore $U \sim$ Gamma$(a + b, 1)$ independently of $V \sim$ Beta$(a, b)$.

In Chapter 2, Problem 9 the moments of a Beta random variable were found by integration. Here is a rather clever way of finding $E(V)$ using the mean of a Gamma random variable. In Exercise 2.7.9 it was shown that the mean of a Gamma$(\alpha, \beta)$ random variable is $\alpha\beta$.
Now
\[
E(UV) = E\left[(X + Y)\,\frac{X}{X + Y}\right] = E(X) = (a)(1) = a
\]
since $X \sim$ Gamma$(a, 1)$. But $U$ and $V$ are independent random variables so
\[
a = E(UV) = E(U)E(V)
\]
But since $U \sim$ Gamma$(a + b, 1)$ we know $E(U) = a + b$ so
\[
a = E(U)E(V) = (a + b)E(V)
\]
Solving for $E(V)$ gives
\[
E(V) = \frac{a}{a + b}
\]
Higher moments can be found in a similar manner using the higher moments of a Gamma random variable.
4.2.7 Exercise
Suppose $X \sim$ Beta$(a, b)$ and $Y \sim$ Beta$(a + b, c)$ independently. Find the joint probability density function of $U = XY$ and $V = X$. Show that $U \sim$ Beta$(a, b + c)$.

In the following example we see how a rather unusual transformation can be used to transform two independent Uniform$(0, 1)$ random variables into two independent N$(0, 1)$ random variables. This transformation is referred to as the Box-Muller Transformation after the two statisticians George E. P. Box and Mervin Edgar Muller who published this result in 1958.

4.2.8 Example - Box-Muller Transformation
Suppose $X \sim$ Uniform$(0, 1)$ and $Y \sim$ Uniform$(0, 1)$ independently. Find the joint probability density function of
\begin{align*}
U &= (-2\log X)^{1/2}\cos(2\pi Y) \\
V &= (-2\log X)^{1/2}\sin(2\pi Y)
\end{align*}
Show that $U \sim$ N$(0, 1)$ and $V \sim$ N$(0, 1)$ independently. Explain how you could use this result to generate independent observations from a N$(0, 1)$ distribution.
Solution
Since $X \sim$ Uniform$(0, 1)$ and $Y \sim$ Uniform$(0, 1)$ independently, the joint probability density function of $X$ and $Y$ is
\[
f(x, y) = f_1(x)f_2(y) = (1)(1) = 1
\]
with support set $R_{XY} = \{(x, y) : 0 < x < 1,\ 0 < y < 1\}$.
Consider the transformation
\[
S : U = (-2\log X)^{1/2}\cos(2\pi Y),\quad V = (-2\log X)^{1/2}\sin(2\pi Y)
\]
To determine the support set of $(U, V)$ we note that $0 < y < 1$ implies $-1 < \cos(2\pi y) < 1$. Also $0 < x < 1$ implies $0 < (-2\log x)^{1/2} < \infty$. Therefore $u = (-2\log x)^{1/2}\cos(2\pi y)$ takes on values in the interval $(-\infty, \infty)$. By a similar argument $v = (-2\log x)^{1/2}\sin(2\pi y)$ also takes on values in the interval $(-\infty, \infty)$. Therefore the support set of $(U, V)$ is $R_{UV} = \Re^2$.
The inverse of the transformation $S$ can be determined. In particular we note that since
\begin{align*}
U^2 + V^2 &= \left[(-2\log X)^{1/2}\cos(2\pi Y)\right]^2 + \left[(-2\log X)^{1/2}\sin(2\pi Y)\right]^2 \\
&= (-2\log X)\left[\cos^2(2\pi Y) + \sin^2(2\pi Y)\right] \\
&= -2\log X
\end{align*}
we have
\[
X = e^{-\frac{1}{2}(U^2 + V^2)}
\]
The expression for $Y$ is not as simple and it turns out that we can find the joint probability density function of $U$ and $V$ without it. Note that $f(x, y)$ does not depend on $x$ and $y$.
To determine the Jacobian of the inverse transformation we use
\[
\frac{\partial(x, y)}{\partial(u, v)} = \left[\frac{\partial(u, v)}{\partial(x, y)}\right]^{-1} = \begin{vmatrix} \dfrac{\partial u}{\partial x} & \dfrac{\partial u}{\partial y} \\[1ex] \dfrac{\partial v}{\partial x} & \dfrac{\partial v}{\partial y} \end{vmatrix}^{-1}
\]
Now
\begin{align*}
\begin{vmatrix} \dfrac{\partial u}{\partial x} & \dfrac{\partial u}{\partial y} \\[1ex] \dfrac{\partial v}{\partial x} & \dfrac{\partial v}{\partial y} \end{vmatrix}
&= \begin{vmatrix} -\frac{1}{x}(-2\log x)^{-1/2}\cos(2\pi y) & -2\pi(-2\log x)^{1/2}\sin(2\pi y) \\[1ex] -\frac{1}{x}(-2\log x)^{-1/2}\sin(2\pi y) & 2\pi(-2\log x)^{1/2}\cos(2\pi y) \end{vmatrix} \\
&= -\frac{2\pi}{x}\left[\cos^2(2\pi y) + \sin^2(2\pi y)\right] = -\frac{2\pi}{x}
\end{align*}
Therefore
\[
\frac{\partial(x, y)}{\partial(u, v)} = \left[\frac{\partial(u, v)}{\partial(x, y)}\right]^{-1} = \left(-\frac{2\pi}{x}\right)^{-1} = -\frac{x}{2\pi} = -\frac{1}{2\pi}\,e^{-\frac{1}{2}(u^2 + v^2)}
\]
Since we have not determined the inverse transformation completely we should verify that it exists. Since the derivatives $\frac{\partial u}{\partial x}$, $\frac{\partial u}{\partial y}$, $\frac{\partial v}{\partial x}$, $\frac{\partial v}{\partial y}$ are all continuous functions of $(x, y)$ on the support set $R_{XY}$ and $\frac{\partial(u, v)}{\partial(x, y)} = -\frac{2\pi}{x} \ne 0$ on the support set $R_{XY}$, then by the Inverse Mapping Theorem the inverse transformation does exist.
The joint probability density function of $U$ and $V$ is
\begin{align*}
g(u, v) &= f(w_1(u, v), w_2(u, v))\left|\frac{\partial(x, y)}{\partial(u, v)}\right| \\
&= (1)\left|-\frac{1}{2\pi}\,e^{-\frac{1}{2}(u^2 + v^2)}\right| \\
&= \frac{1}{2\pi}\,e^{-\frac{1}{2}(u^2 + v^2)} \quad\text{for } (u, v) \in \Re^2
\end{align*}
The support set of $U$ is $\Re$ and the support set of $V$ is $\Re$. Since $g(u, v)$ can be written as
\[
g(u, v) = \left(\frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}u^2}\right)\left(\frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}v^2}\right)
\]
for all $(u, v) \in \Re \times \Re = \Re^2$, therefore by the Factorization Theorem for Independence, $U$ and $V$ are independent random variables. We also note that the joint probability density function is the product of two N$(0, 1)$ probability density functions. Therefore $U \sim$ N$(0, 1)$ and $V \sim$ N$(0, 1)$ independently.
Let $x$ and $y$ be two independent Uniform$(0, 1)$ observations which have been generated using a random number generator. Then from the result above we have that
\begin{align*}
u &= (-2\log x)^{1/2}\cos(2\pi y) \\
v &= (-2\log x)^{1/2}\sin(2\pi y)
\end{align*}
are two independent N$(0, 1)$ observations.
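For illustration, the Box-Muller transformation described above is straightforward to code directly. The sketch below assumes only numpy and checks the sample moments and correlation of the generated pairs.

```python
import numpy as np

# A minimal implementation of the Box-Muller transformation: two independent
# Uniform(0, 1) samples are mapped to two independent N(0, 1) samples.
rng = np.random.default_rng(1958)
n = 1_000_000
x = rng.uniform(size=n)
y = rng.uniform(size=n)
r = np.sqrt(-2.0 * np.log(x))          # radius (-2 log X)^(1/2)
u = r * np.cos(2 * np.pi * y)
v = r * np.sin(2 * np.pi * y)

# sample means ~0, variances ~1, and correlation ~0, as derived in the example
print(u.mean(), u.var(), v.mean(), v.var(), np.corrcoef(u, v)[0, 1])
```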
The result in the following theorem is one that was used (without proof) in a previous statistics course such as STAT 221/231/241 to construct confidence intervals and test hypotheses regarding the mean in a N$(\mu, \sigma^2)$ model when the variance $\sigma^2$ is unknown.
4.2.9 Theorem - t Distribution
If $X \sim \chi^2(n)$ independently of $Z \sim$ N$(0, 1)$ then
\[
T = \frac{Z}{\sqrt{X/n}} \sim t(n)
\]
Proof
The transformation $T = \frac{Z}{\sqrt{X/n}}$ is not a one-to-one transformation. However if we add the variable $U = X$ to complete the transformation and consider the transformation
\[
S : T = \frac{Z}{\sqrt{X/n}},\quad U = X
\]
then this transformation has inverse transformation
\[
X = U,\quad Z = T\left(\frac{U}{n}\right)^{1/2}
\]
and therefore is a one-to-one transformation.
Since $X \sim \chi^2(n)$ independently of $Z \sim$ N$(0, 1)$ the joint probability density function of $X$ and $Z$ is
\begin{align*}
f(x, z) &= f_1(x)f_2(z) = \frac{1}{2^{n/2}\Gamma(n/2)}\,x^{n/2 - 1}e^{-x/2}\cdot\frac{1}{\sqrt{2\pi}}\,e^{-z^2/2} \\
&= \frac{1}{2^{(n+1)/2}\Gamma(n/2)\sqrt{\pi}}\,x^{n/2 - 1}e^{-x/2}e^{-z^2/2}
\end{align*}
with support set $R_{XZ} = \{(x, z) : x > 0,\ z \in \Re\}$. The transformation $S$ maps $R_{XZ}$ into $R_{TU} = \{(t, u) : t \in \Re,\ u > 0\}$.
The Jacobian of the inverse transformation is
\[
\frac{\partial(x, z)}{\partial(t, u)} = \begin{vmatrix} \dfrac{\partial x}{\partial t} & \dfrac{\partial x}{\partial u} \\[1ex] \dfrac{\partial z}{\partial t} & \dfrac{\partial z}{\partial u} \end{vmatrix} = \begin{vmatrix} 0 & 1 \\ \left(\dfrac{u}{n}\right)^{1/2} & \dfrac{\partial z}{\partial u} \end{vmatrix} = -\left(\frac{u}{n}\right)^{1/2}
\]
The joint probability density function of $T$ and $U$ is given by
\begin{align*}
g(t, u) &= f\left(u,\ t\left(\frac{u}{n}\right)^{1/2}\right)\left|-\left(\frac{u}{n}\right)^{1/2}\right| \\
&= \frac{1}{2^{(n+1)/2}\Gamma(n/2)\sqrt{\pi}}\,u^{n/2 - 1}e^{-u/2}e^{-t^2 u/(2n)}\left(\frac{u}{n}\right)^{1/2} \\
&= \frac{1}{2^{(n+1)/2}\Gamma(n/2)\sqrt{\pi n}}\,u^{(n+1)/2 - 1}e^{-u(1 + t^2/n)/2} \quad\text{for } (t, u) \in R_{TU}
\end{align*}
and 0 otherwise.
To determine the distribution of $T$ we need to find the marginal probability density function of $T$:
\begin{align*}
g_1(t) &= \int_{-\infty}^{\infty} g(t, u)\,du \\
&= \frac{1}{2^{(n+1)/2}\Gamma(n/2)\sqrt{\pi n}}\int_0^{\infty} u^{(n+1)/2 - 1}e^{-u(1 + t^2/n)/2}\,du
\end{align*}
Let $y = \frac{u}{2}\left(1 + \frac{t^2}{n}\right)$ so that $u = 2y\left(1 + \frac{t^2}{n}\right)^{-1}$ and $du = 2\left(1 + \frac{t^2}{n}\right)^{-1} dy$. Note that when $u = 0$ then $y = 0$, and when $u \to \infty$ then $y \to \infty$. Therefore
\begin{align*}
g_1(t) &= \frac{1}{2^{(n+1)/2}\Gamma(n/2)\sqrt{\pi n}}\int_0^{\infty}\left[2y\left(1 + \frac{t^2}{n}\right)^{-1}\right]^{(n+1)/2 - 1} e^{-y}\left[2\left(1 + \frac{t^2}{n}\right)^{-1}\right] dy \\
&= \frac{1}{2^{(n+1)/2}\Gamma(n/2)\sqrt{\pi n}}\,2^{(n+1)/2}\left(1 + \frac{t^2}{n}\right)^{-(n+1)/2}\int_0^{\infty} y^{(n+1)/2 - 1}e^{-y}\,dy \\
&= \frac{1}{\Gamma(n/2)\sqrt{\pi n}}\left(1 + \frac{t^2}{n}\right)^{-(n+1)/2}\Gamma\left(\frac{n + 1}{2}\right) \\
&= \frac{\Gamma\left(\frac{n+1}{2}\right)}{\Gamma(n/2)\sqrt{\pi n}}\left(1 + \frac{t^2}{n}\right)^{-(n+1)/2} \quad\text{for } t \in \Re
\end{align*}
which is the probability density function of a random variable with a $t(n)$ distribution. Therefore
\[
T = \frac{Z}{\sqrt{X/n}} \sim t(n)
\]
as required.
4.2.10 Example
Use Theorem 4.2.9 to find $E(T)$ and $Var(T)$ if $T \sim t(n)$.

Solution
If $X \sim \chi^2(n)$ independently of $Z \sim$ N$(0, 1)$ then we know from the previous theorem that
\[
T = \frac{Z}{\sqrt{X/n}} \sim t(n)
\]
Now
\[
E(T) = E\left(\frac{Z}{\sqrt{X/n}}\right) = \sqrt{n}\,E(Z)\,E\left(X^{-1/2}\right)
\]
since $X$ and $Z$ are independent random variables. Since $E(Z) = 0$ it follows that $E(T) = 0$ as long as $E\left(X^{-1/2}\right)$ exists. Since $X \sim \chi^2(n)$
\begin{align*}
E\left(X^k\right) &= \int_0^{\infty} x^k\,\frac{1}{2^{n/2}\Gamma(n/2)}\,x^{n/2 - 1}e^{-x/2}\,dx \\
&= \frac{1}{2^{n/2}\Gamma(n/2)}\int_0^{\infty} x^{k + n/2 - 1}e^{-x/2}\,dx \qquad\text{let } y = x/2 \\
&= \frac{1}{2^{n/2}\Gamma(n/2)}\int_0^{\infty} (2y)^{k + n/2 - 1}e^{-y}\,(2)\,dy \\
&= \frac{2^{k + n/2}}{2^{n/2}\Gamma(n/2)}\int_0^{\infty} y^{k + n/2 - 1}e^{-y}\,dy \\
&= \frac{2^k\,\Gamma(n/2 + k)}{\Gamma(n/2)} \tag{4.3}
\end{align*}
which exists for $n/2 + k > 0$. If $k = -1/2$ then the integral exists for $n/2 > 1/2$ or $n > 1$. Therefore
\[
E(T) = 0 \quad\text{for } n > 1
\]
Now
\[
Var(T) = E\left(T^2\right) - [E(T)]^2 = E\left(T^2\right) \quad\text{since } E(T) = 0
\]
and
\[
E\left(T^2\right) = E\left(\frac{Z^2}{X/n}\right) = n\,E\left(Z^2\right)E\left(X^{-1}\right)
\]
Since $Z \sim$ N$(0, 1)$ then
\[
E\left(Z^2\right) = Var(Z) + [E(Z)]^2 = 1 + 0^2 = 1
\]
Also by (4.3)
\[
E\left(X^{-1}\right) = \frac{2^{-1}\,\Gamma(n/2 - 1)}{\Gamma(n/2)} = \frac{1}{2(n/2 - 1)} = \frac{1}{n - 2}
\]
which exists for $n > 2$. Therefore
\[
Var(T) = E\left(T^2\right) = n\,E\left(Z^2\right)E\left(X^{-1}\right) = n(1)\left(\frac{1}{n - 2}\right) = \frac{n}{n - 2} \quad\text{for } n > 2
\]
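For illustration, the moments of $T$ can be checked by simulating the construction in Theorem 4.2.9. The sketch below assumes only numpy; $n = 8$ is an arbitrary example value.

```python
import numpy as np

# Simulation check of Example 4.2.10: build T = Z / sqrt(X/n) from independent
# N(0,1) and chi-squared(n) draws and compare Var(T) with n/(n - 2).
rng = np.random.default_rng(9)
n, reps = 8, 2_000_000
z = rng.normal(size=reps)
x = rng.chisquare(df=n, size=reps)
t = z / np.sqrt(x / n)
print(t.mean(), 0)               # ~0 for n > 1
print(t.var(), n / (n - 2))      # ~1.333 for n = 8
```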
4.2.11 Theorem - F Distribution
If $X \sim \chi^2(n)$ independently of $Y \sim \chi^2(m)$ then
\[
U = \frac{X/n}{Y/m} \sim \text{F}(n, m)
\]
4.2.12 Exercise
(a) Prove Theorem 4.2.11. Hint: Complete the transformation with $V = Y$.
(b) Find $E(U)$ and $Var(U)$ and note for what values of $n$ and $m$ they exist. Hint: Use the technique and results of Example 4.2.10.
4.3 Moment Generating Function Technique

The moment generating function technique is particularly useful in determining the distribution of a sum of two or more independent random variables if the moment generating functions of the random variables exist.
4.3.1 Theorem
Suppose $X_1, X_2, \ldots, X_n$ are independent random variables and $X_i$ has moment generating function $M_i(t)$ which exists for $t \in (-h, h)$ for some $h > 0$. The moment generating function of $Y = \sum\limits_{i=1}^n X_i$ is given by
\[
M_Y(t) = \prod_{i=1}^n M_i(t) \quad\text{for } t \in (-h, h)
\]
If the $X_i$'s are independent and identically distributed random variables each with moment generating function $M(t)$ then $Y = \sum\limits_{i=1}^n X_i$ has moment generating function
\[
M_Y(t) = [M(t)]^n \quad\text{for } t \in (-h, h)
\]
Proof
The moment generating function of $Y = \sum\limits_{i=1}^n X_i$ is
\begin{align*}
M_Y(t) &= E\left(e^{tY}\right) = E\left[\exp\left(t\sum_{i=1}^n X_i\right)\right] \\
&= \prod_{i=1}^n E\left(e^{tX_i}\right) \quad\text{since } X_1, X_2, \ldots, X_n \text{ are independent random variables} \\
&= \prod_{i=1}^n M_i(t) \quad\text{for } t \in (-h, h)
\end{align*}
If $X_1, X_2, \ldots, X_n$ are identically distributed each with moment generating function $M(t)$ then
\[
M_Y(t) = \prod_{i=1}^n M(t) = [M(t)]^n \quad\text{for } t \in (-h, h)
\]
as required.

Note: This theorem in conjunction with the Uniqueness Theorem for Moment Generating Functions can be used to find the distribution of $Y$.

Here is a summary of results about sums of random variables for the named distributions.
4.3.2 Special Results
(1) If $X_i \sim$ Binomial$(n_i, p)$, $i = 1, 2, \ldots, n$ independently, then $\sum\limits_{i=1}^n X_i \sim$ Binomial$\left(\sum\limits_{i=1}^n n_i,\ p\right)$.
(2) If $X_i \sim$ Poisson$(\mu_i)$, $i = 1, 2, \ldots, n$ independently, then $\sum\limits_{i=1}^n X_i \sim$ Poisson$\left(\sum\limits_{i=1}^n \mu_i\right)$.
(3) If $X_i \sim$ Negative Binomial$(k_i, p)$, $i = 1, 2, \ldots, n$ independently, then $\sum\limits_{i=1}^n X_i \sim$ Negative Binomial$\left(\sum\limits_{i=1}^n k_i,\ p\right)$.
(4) If $X_i \sim$ Exponential$(\theta)$, $i = 1, 2, \ldots, n$ independently, then $\sum\limits_{i=1}^n X_i \sim$ Gamma$(n, \theta)$.
(5) If $X_i \sim$ Gamma$(\alpha_i, \theta)$, $i = 1, 2, \ldots, n$ independently, then $\sum\limits_{i=1}^n X_i \sim$ Gamma$\left(\sum\limits_{i=1}^n \alpha_i,\ \theta\right)$.
(6) If $X_i \sim \chi^2(k_i)$, $i = 1, 2, \ldots, n$ independently, then $\sum\limits_{i=1}^n X_i \sim \chi^2\left(\sum\limits_{i=1}^n k_i\right)$.
(7) If $X_i \sim$ N$(\mu, \sigma^2)$, $i = 1, 2, \ldots, n$ independently, then
\[
\sum_{i=1}^n \left(\frac{X_i - \mu}{\sigma}\right)^2 \sim \chi^2(n)
\]
Proof
(1) Suppose $X_i \sim$ Binomial$(n_i, p)$, $i = 1, 2, \ldots, n$ independently. The moment generating function of $X_i$ is
\[
M_i(t) = \left(pe^t + q\right)^{n_i} \quad\text{for } t \in \Re
\]
for $i = 1, 2, \ldots, n$. By Theorem 4.3.1 the moment generating function of $Y = \sum\limits_{i=1}^n X_i$ is
\[
M_Y(t) = \prod_{i=1}^n M_i(t) = \prod_{i=1}^n \left(pe^t + q\right)^{n_i} = \left(pe^t + q\right)^{\sum_{i=1}^n n_i} \quad\text{for } t \in \Re
\]
which is the moment generating function of a Binomial$\left(\sum\limits_{i=1}^n n_i, p\right)$ random variable. Therefore by the Uniqueness Theorem for Moment Generating Functions, $\sum\limits_{i=1}^n X_i \sim$ Binomial$\left(\sum\limits_{i=1}^n n_i, p\right)$.

(2) Suppose $X_i \sim$ Poisson$(\mu_i)$, $i = 1, 2, \ldots, n$ independently. The moment generating function of $X_i$ is
\[
M_i(t) = e^{\mu_i(e^t - 1)} \quad\text{for } t \in \Re
\]
for $i = 1, 2, \ldots, n$. By Theorem 4.3.1 the moment generating function of $Y = \sum\limits_{i=1}^n X_i$ is
\[
M_Y(t) = \prod_{i=1}^n M_i(t) = \prod_{i=1}^n e^{\mu_i(e^t - 1)} = e^{\left(\sum_{i=1}^n \mu_i\right)(e^t - 1)} \quad\text{for } t \in \Re
\]
which is the moment generating function of a Poisson$\left(\sum\limits_{i=1}^n \mu_i\right)$ random variable. Therefore by the Uniqueness Theorem for Moment Generating Functions, $\sum\limits_{i=1}^n X_i \sim$ Poisson$\left(\sum\limits_{i=1}^n \mu_i\right)$.
(3) Suppose $X_i \sim$ Negative Binomial$(k_i, p)$, $i = 1, 2, \ldots, n$ independently. The moment generating function of $X_i$ is
\[
M_i(t) = \left(\frac{p}{1 - qe^t}\right)^{k_i} \quad\text{for } t < -\log(q)
\]
for $i = 1, 2, \ldots, n$. By Theorem 4.3.1 the moment generating function of $Y = \sum\limits_{i=1}^n X_i$ is
\[
M_Y(t) = \prod_{i=1}^n M_i(t) = \prod_{i=1}^n \left(\frac{p}{1 - qe^t}\right)^{k_i} = \left(\frac{p}{1 - qe^t}\right)^{\sum_{i=1}^n k_i} \quad\text{for } t < -\log(q)
\]
which is the moment generating function of a Negative Binomial$\left(\sum\limits_{i=1}^n k_i, p\right)$ random variable. Therefore by the Uniqueness Theorem for Moment Generating Functions, $\sum\limits_{i=1}^n X_i \sim$ Negative Binomial$\left(\sum\limits_{i=1}^n k_i, p\right)$.

(4) Suppose $X_i \sim$ Exponential$(\theta)$, $i = 1, 2, \ldots, n$ independently. The moment generating function of each $X_i$ is
\[
M(t) = \frac{1}{1 - \theta t} \quad\text{for } t < \frac{1}{\theta}
\]
for $i = 1, 2, \ldots, n$. By Theorem 4.3.1 the moment generating function of $Y = \sum\limits_{i=1}^n X_i$ is
\[
M_Y(t) = [M(t)]^n = \left(\frac{1}{1 - \theta t}\right)^n \quad\text{for } t < \frac{1}{\theta}
\]
which is the moment generating function of a Gamma$(n, \theta)$ random variable. Therefore by the Uniqueness Theorem for Moment Generating Functions, $\sum\limits_{i=1}^n X_i \sim$ Gamma$(n, \theta)$.

(5) Suppose $X_i \sim$ Gamma$(\alpha_i, \theta)$, $i = 1, 2, \ldots, n$ independently. The moment generating function of $X_i$ is
\[
M_i(t) = \left(\frac{1}{1 - \theta t}\right)^{\alpha_i} \quad\text{for } t < \frac{1}{\theta}
\]
for $i = 1, 2, \ldots, n$. By Theorem 4.3.1 the moment generating function of $Y = \sum\limits_{i=1}^n X_i$ is
\[
M_Y(t) = \prod_{i=1}^n M_i(t) = \prod_{i=1}^n \left(\frac{1}{1 - \theta t}\right)^{\alpha_i} = \left(\frac{1}{1 - \theta t}\right)^{\sum_{i=1}^n \alpha_i} \quad\text{for } t < \frac{1}{\theta}
\]
which is the moment generating function of a Gamma$\left(\sum\limits_{i=1}^n \alpha_i, \theta\right)$ random variable. Therefore by the Uniqueness Theorem for Moment Generating Functions, $\sum\limits_{i=1}^n X_i \sim$ Gamma$\left(\sum\limits_{i=1}^n \alpha_i, \theta\right)$.
(6) Suppose $X_i \sim \chi^2(k_i)$, $i = 1, 2, \ldots, n$ independently. The moment generating function of $X_i$ is
\[
M_i(t) = \left(\frac{1}{1 - 2t}\right)^{k_i/2} \quad\text{for } t < \frac{1}{2}
\]
for $i = 1, 2, \ldots, n$. By Theorem 4.3.1 the moment generating function of $Y = \sum\limits_{i=1}^n X_i$ is
\[
M_Y(t) = \prod_{i=1}^n M_i(t) = \prod_{i=1}^n \left(\frac{1}{1 - 2t}\right)^{k_i/2} = \left(\frac{1}{1 - 2t}\right)^{\frac{1}{2}\sum_{i=1}^n k_i} \quad\text{for } t < \frac{1}{2}
\]
which is the moment generating function of a $\chi^2\left(\sum\limits_{i=1}^n k_i\right)$ random variable. Therefore by the Uniqueness Theorem for Moment Generating Functions, $\sum\limits_{i=1}^n X_i \sim \chi^2\left(\sum\limits_{i=1}^n k_i\right)$.

(7) Suppose $X_i \sim$ N$(\mu, \sigma^2)$, $i = 1, 2, \ldots, n$ independently. Then by Example 2.6.9 and Theorem 2.6.3
\[
\left(\frac{X_i - \mu}{\sigma}\right)^2 \sim \chi^2(1) \quad\text{for } i = 1, 2, \ldots, n
\]
and by (6)
\[
\sum_{i=1}^n \left(\frac{X_i - \mu}{\sigma}\right)^2 \sim \chi^2(n)
\]
4.3.3 Exercise
Suppose $X_1, X_2, \ldots, X_n$ are independent and identically distributed random variables with moment generating function $M(t)$, $E(X_i) = \mu$, and $Var(X_i) = \sigma^2 < \infty$. Give an expression for the moment generating function of $Z = \sqrt{n}\left(\bar{X} - \mu\right)/\sigma$, where $\bar{X} = \frac{1}{n}\sum\limits_{i=1}^n X_i$, in terms of $M(t)$.

The following theorem is one that was used in your previous probability and statistics courses without proof. The method of moment generating functions now allows us to easily prove this result.
4.3.4 Theorem - Linear Combination of Independent Normal Random Variables
If $X_i \sim$ N$(\mu_i, \sigma_i^2)$, $i = 1, 2, \ldots, n$ independently, then
\[
\sum_{i=1}^n a_i X_i \sim \text{N}\left(\sum_{i=1}^n a_i\mu_i,\ \sum_{i=1}^n a_i^2\sigma_i^2\right)
\]
Proof
Suppose $X_i \sim$ N$(\mu_i, \sigma_i^2)$, $i = 1, 2, \ldots, n$ independently. The moment generating function of $X_i$ is
\[
M_i(t) = e^{\mu_i t + \sigma_i^2 t^2/2} \quad\text{for } t \in \Re
\]
for $i = 1, 2, \ldots, n$. The moment generating function of $Y = \sum\limits_{i=1}^n a_i X_i$ is
\begin{align*}
M_Y(t) &= E\left(e^{tY}\right) = E\left[\exp\left(t\sum_{i=1}^n a_i X_i\right)\right] \\
&= \prod_{i=1}^n E\left[e^{(a_i t)X_i}\right] \quad\text{since } X_1, X_2, \ldots, X_n \text{ are independent random variables} \\
&= \prod_{i=1}^n M_i(a_i t) \\
&= \prod_{i=1}^n e^{\mu_i a_i t + \sigma_i^2 a_i^2 t^2/2} \\
&= \exp\left[\left(\sum_{i=1}^n a_i\mu_i\right)t + \left(\sum_{i=1}^n a_i^2\sigma_i^2\right)t^2/2\right] \quad\text{for } t \in \Re
\end{align*}
which is the moment generating function of a N$\left(\sum\limits_{i=1}^n a_i\mu_i,\ \sum\limits_{i=1}^n a_i^2\sigma_i^2\right)$ random variable. Therefore by the Uniqueness Theorem for Moment Generating Functions
\[
\sum_{i=1}^n a_i X_i \sim \text{N}\left(\sum_{i=1}^n a_i\mu_i,\ \sum_{i=1}^n a_i^2\sigma_i^2\right)
\]
4.3.5 Corollary
If $X_i \sim$ N$(\mu, \sigma^2)$, $i = 1, 2, \ldots, n$ independently, then
\[
\sum_{i=1}^n X_i \sim \text{N}\left(n\mu,\ n\sigma^2\right)
\quad\text{and}\quad
\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i \sim \text{N}\left(\mu,\ \frac{\sigma^2}{n}\right)
\]
Proof
To prove that $\sum\limits_{i=1}^n X_i \sim$ N$\left(n\mu, n\sigma^2\right)$ let $a_i = 1$, $\mu_i = \mu$, and $\sigma_i^2 = \sigma^2$ in Theorem 4.3.4 to obtain
\[
\sum_{i=1}^n X_i \sim \text{N}\left(\sum_{i=1}^n \mu,\ \sum_{i=1}^n \sigma^2\right)
\quad\text{or}\quad
\sum_{i=1}^n X_i \sim \text{N}\left(n\mu,\ n\sigma^2\right)
\]
To prove that
\[
\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i \sim \text{N}\left(\mu,\ \frac{\sigma^2}{n}\right)
\]
we note that
\[
\bar{X} = \sum_{i=1}^n \left(\frac{1}{n}\right)X_i
\]
Let $a_i = \frac{1}{n}$, $\mu_i = \mu$, and $\sigma_i^2 = \sigma^2$ in Theorem 4.3.4 to obtain
\[
\bar{X} = \sum_{i=1}^n \left(\frac{1}{n}\right)X_i \sim \text{N}\left(\sum_{i=1}^n \left(\frac{1}{n}\right)\mu,\ \sum_{i=1}^n \left(\frac{1}{n}\right)^2\sigma^2\right)
\quad\text{or}\quad
\bar{X} \sim \text{N}\left(\mu,\ \frac{\sigma^2}{n}\right)
\]
The following identity will be used in proving Theorem 4.3.8.

4.3.6 Useful Identity
\[
\sum_{i=1}^n (X_i - \mu)^2 = \sum_{i=1}^n \left(X_i - \bar{X}\right)^2 + n\left(\bar{X} - \mu\right)^2
\]
4.3.7 Exercise
Prove the identity 4.3.6.

As mentioned previously the t distribution is used to construct confidence intervals and test hypotheses regarding the mean in a N$(\mu, \sigma^2)$ model. We are now able to prove the theorem on which these results are based.
4.3.8 Theorem
If $X_i \sim$ N$(\mu, \sigma^2)$, $i = 1, 2, \ldots, n$ independently, then
\[
\bar{X} \sim \text{N}\left(\mu,\ \frac{\sigma^2}{n}\right)
\]
independently of
\[
\frac{(n - 1)S^2}{\sigma^2} = \frac{\sum\limits_{i=1}^n \left(X_i - \bar{X}\right)^2}{\sigma^2} \sim \chi^2(n - 1)
\]
where
\[
S^2 = \frac{\sum\limits_{i=1}^n \left(X_i - \bar{X}\right)^2}{n - 1}
\]
For a proof that X and S2 are independent random variables please see Problem 16.
By identity 4.3.6
nP
i=1
(Xi )2 =
nP
i=1

Xi X
2
+ n

X 2
Dividing both sides by 2 gives
nP
i=1

Xi

2
| {z }
Y
=
(n 1)S2
2| {z }
U
+
X
=
p
n
2
| {z }
V
Since X and S2 are independent random variables, it follows that U and V are independent
random variables.
By 4.3.2(7)
Y =
nP
i=1

Xi

2
2 (n)
with moment generating function
MY (t) = (1 2t)n=2 for t < 1
2
(4.4)
X N

;
2
n

was proved in Corollary 4.3.5. By Example 2.6.9
X
=
p
n
N (0; 1)
and by Theorem 2.6.3
V =
X
=
p
n
2
2 (1)
4.3. MOMENT GENERATING FUNCTION TECHNIQUE 145
with moment generating function
MV (t) = (1 2t)1=2 for t < 1
2
(4.5)
Since U and V are independent random variables and Y = U + V then
MY (t) = E

etY

= E

et(U+V )

= E

etU

E

etV

= MU (t)MV (t) (4.6)
Substituting (4.4) and (4.5) into (4.6) gives
(1 2t)n=2 = MU (t) (1 2t)1=2 for t < 1
2
or
MU (t) = (1 2t)(n1)=2 for t < 1
2
which is the moment generating function of a 2 (n 1) random variable. Therefore by the
Uniqueness Theorem for Moment Generating Functions
U =
(n 1)S2
2
2 (n 1)
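For illustration, both conclusions of Theorem 4.3.8 can be examined by simulation. The sketch below assumes only numpy; the sample size and parameter values are arbitrary examples, and the (near-zero) correlation is of course only consistent with, not a proof of, independence.

```python
import numpy as np

# Simulation check of Theorem 4.3.8: (n-1)S^2/sigma^2 should have mean n-1 and
# variance 2(n-1) (the chi-squared(n-1) moments), and Xbar should be essentially
# uncorrelated with S^2.
rng = np.random.default_rng(21)
n, reps, mu, sigma = 10, 500_000, 5.0, 2.0
x = rng.normal(mu, sigma, size=(reps, n))
xbar = x.mean(axis=1)
s2 = x.var(axis=1, ddof=1)
u = (n - 1) * s2 / sigma**2
print(u.mean(), n - 1)                 # ~9
print(u.var(), 2 * (n - 1))            # ~18
print(np.corrcoef(xbar, s2)[0, 1])     # ~0, consistent with independence
```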
4.3.9 Theorem
If $X_i \sim$ N$(\mu, \sigma^2)$, $i = 1, 2, \ldots, n$ independently, then
\[
\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t(n - 1)
\]
Proof
\[
\frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}}{\sqrt{\dfrac{(n-1)S^2/\sigma^2}{n - 1}}} = \frac{Z}{\sqrt{\dfrac{U}{n - 1}}}
\]
where
\[
Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim \text{N}(0, 1)
\]
independently of
\[
U = \frac{(n - 1)S^2}{\sigma^2} \sim \chi^2(n - 1)
\]
Therefore by Theorem 4.2.9
\[
\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t(n - 1)
\]
The following theorem is useful for testing the equality of variances in a two sample Normal model.
4.3.10 Theorem
Suppose $X_1, X_2, \ldots, X_n$ are independent N$(\mu_1, \sigma_1^2)$ random variables, and independently $Y_1, Y_2, \ldots, Y_m$ are independent N$(\mu_2, \sigma_2^2)$ random variables. Let
\[
S_1^2 = \frac{\sum\limits_{i=1}^n \left(X_i - \bar{X}\right)^2}{n - 1}
\quad\text{and}\quad
S_2^2 = \frac{\sum\limits_{i=1}^m \left(Y_i - \bar{Y}\right)^2}{m - 1}
\]
Then
\[
\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim \text{F}(n - 1, m - 1)
\]
4.3.11 Exercise
Prove Theorem 4.3.10. Hint: Use Theorems 4.3.8 and 4.2.11.
4.4 Chapter 4 Problems

1. Show that if $X$ and $Y$ are independent random variables then $U = h(X)$ and $V = g(Y)$ are also independent random variables, where $h$ and $g$ are real-valued functions.

2. Suppose $X$ and $Y$ are continuous random variables with joint probability density function
\[
f(x, y) = 24xy \quad\text{for } 0 < x + y < 1,\ 0 < x < 1,\ 0 < y < 1
\]
and 0 otherwise.
(a) Find the joint probability density function of $U = X + Y$ and $V = X$. Be sure to specify the support set of $(U, V)$.
(b) Find the marginal probability density function of $U$ and the marginal probability density function of $V$. Be sure to specify their support sets.

3. Suppose $X$ and $Y$ are continuous random variables with joint probability density function
\[
f(x, y) = e^{-y} \quad\text{for } 0 < x < y < \infty
\]
and 0 otherwise.
(a) Find the joint probability density function of $U = X + Y$ and $V = X$. Be sure to specify the support set of $(U, V)$.
(b) Find the marginal probability density function of $U$ and the marginal probability density function of $V$. Be sure to specify their support sets.

4. Suppose $X$ and $Y$ are nonnegative continuous random variables with joint probability density function $f(x, y)$. Show that the probability density function of $U = X + Y$ is given by
\[
g(u) = \int_0^{\infty} f(v, u - v)\,dv
\]
Hint: Consider the transformation $U = X + Y$ and $V = X$.

5. Suppose $X$ and $Y$ are continuous random variables with joint probability density function
\[
f(x, y) = 2(x + y) \quad\text{for } 0 < x < y < 1
\]
and 0 otherwise.
(a) Find the joint probability density function of $U = X$ and $V = XY$. Be sure to specify the support set of $(U, V)$.
(b) Are $U$ and $V$ independent random variables?
(c) Find the marginal probability density functions of $U$ and $V$. Be sure to specify their support sets.
6. Suppose X and Y are continuous random variables with joint probability density
function
f(x; y) = 4xy for 0 < x < 1; 0 < y < 1
and 0 otherwise.
(a) Find the probability density function of T = X + Y using the cumulative distri-
bution function technique.
(b) Find the joint probability density function of S = X and T = X + Y . Find the
marginal probability density function of T and compare your answer to the one
you obtained in (a).
(c) Find the joint probability density function of U = X2 and V = XY . Be sure to
specify the support set of (U; V ).
(d) Find the marginal probability density function’s of U and V:
(e) Find E(V 3): (Hint: Are X and Y independent random variables?)
7. Suppose X and Y are continuous random variables with joint probability density
function
f(x; y) = 4xy for 0 < x < 1; 0 < y < 1
and 0 otherwise.
(a) Find the joint probability density function of $U = X/Y$ and $V = XY$. Be sure to specify the support set of (U, V).
(b) Are U and V independent random variables?
(c) Find the marginal probability density functions of U and V. Be sure to specify their support sets.
8. Suppose X and Y are independent Uniform$(0, \theta)$ random variables. Find the probability density function of $U = X - Y$.
(Hint: Complete the transformation with $V = X + Y$.)
9. Suppose $Z_1 \sim N(0,1)$ and $Z_2 \sim N(0,1)$ independently. Let
$$X_1 = \mu_1 + \sigma_1 Z_1, \qquad X_2 = \mu_2 + \sigma_2\left[\rho Z_1 + \left(1 - \rho^2\right)^{1/2} Z_2\right]$$
where $-\infty < \mu_1, \mu_2 < \infty$, $\sigma_1, \sigma_2 > 0$ and $-1 < \rho < 1$.
(a) Show that $(X_1, X_2)^T \sim \mathrm{BVN}(\mu, \Sigma)$.
(b) Show that $(X - \mu)^T \Sigma^{-1}(X - \mu) \sim \chi^2(2)$. Hint: Show $(X - \mu)^T \Sigma^{-1}(X - \mu) = Z^T Z$ where $Z = (Z_1, Z_2)^T$.
10. Suppose $X \sim N(\mu, \sigma^2)$ and $Y \sim N(\mu, \sigma^2)$ independently. Let $U = X + Y$ and $V = X - Y$.
(a) Find the joint moment generating function of U and V.
(b) Use (a) to show that U and V are independent random variables.
11. Let X and Y be independent $N(0,1)$ random variables and let $U = X/Y$.
(a) Show that $U \sim$ Cauchy$(1, 0)$.
(b) Show that the Cauchy$(1, 0)$ probability density function is the same as the $t(1)$ probability density function.
12. Let $X_1, X_2, X_3$ be independent Exponential(1) random variables. Let the random variables $Y_1, Y_2, Y_3$ be defined by
$$Y_1 = \frac{X_1}{X_1 + X_2}, \qquad Y_2 = \frac{X_1 + X_2}{X_1 + X_2 + X_3}, \qquad Y_3 = X_1 + X_2 + X_3$$
Show that $Y_1, Y_2, Y_3$ are independent random variables and find their marginal probability density functions.
13. Let $X_1, X_2, X_3$ be independent $N(0,1)$ random variables. Let the random variables $Y_1, Y_2, Y_3$ be defined by
$$X_1 = Y_1 \cos Y_2 \sin Y_3, \qquad X_2 = Y_1 \sin Y_2 \sin Y_3, \qquad X_3 = Y_1 \cos Y_3$$
for $0 \le y_1 < \infty$, $0 \le y_2 < 2\pi$, $0 \le y_3 < \pi$.
Show that $Y_1, Y_2, Y_3$ are independent random variables and find their marginal probability density functions.
14. Suppose $X_1, X_2, \ldots, X_n$ is a random sample from the Poisson$(\theta)$ distribution. Find the conditional probability function of $X_1, X_2, \ldots, X_n$ given $T = \sum_{i=1}^{n} X_i = t$.
15. Suppose $X \sim \chi^2(n)$, $X + Y \sim \chi^2(m)$ with $m > n$, and X and Y are independent random variables. Use the properties of moment generating functions to show that $Y \sim \chi^2(m - n)$.
16. Suppose $X_1, X_2, \ldots, X_n$ is a random sample from the $N(\mu, \sigma^2)$ distribution. In this problem we wish to show that $\bar{X}$ and $\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$ are independent random variables. Note that this implies that $\bar{X}$ and $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$ are also independent random variables.
Let $U = (U_1, U_2, \ldots, U_n)$ where $U_i = X_i - \bar{X}$, $i = 1, 2, \ldots, n$, and let
$$M(s_1, s_2, \ldots, s_n; s) = E\!\left[\exp\!\left(\sum_{i=1}^{n} s_i U_i + s\bar{X}\right)\right]$$
be the joint moment generating function of U and $\bar{X}$.
(a) Let $t_i = s_i - \bar{s} + s/n$, $i = 1, 2, \ldots, n$, where $\bar{s} = \frac{1}{n}\sum_{i=1}^{n} s_i$. Show that
$$E\!\left[\exp\!\left(\sum_{i=1}^{n} s_i U_i + s\bar{X}\right)\right] = E\!\left[\exp\!\left(\sum_{i=1}^{n} t_i X_i\right)\right] = \exp\!\left(\mu\sum_{i=1}^{n} t_i + \sigma^2\sum_{i=1}^{n} t_i^2/2\right)$$
Hint: Since $X_i \sim N(\mu, \sigma^2)$,
$$E[\exp(t_i X_i)] = \exp\!\left(\mu t_i + \sigma^2 t_i^2/2\right)$$
(b) Verify that $\sum_{i=1}^{n} t_i = s$ and $\sum_{i=1}^{n} t_i^2 = \sum_{i=1}^{n}\left(s_i - \bar{s}\right)^2 + s^2/n$.
(c) Use (a) and (b) to show that
$$M(s_1, s_2, \ldots, s_n; s) = \exp\!\left[\mu s + (\sigma^2/n)(s^2/2)\right]\exp\!\left[\sigma^2\sum_{i=1}^{n}\left(s_i - \bar{s}\right)^2/2\right]$$
(d) Show that the random variable $\bar{X}$ is independent of the random vector U and thus $\bar{X}$ and $\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$ are independent. Hint: $M_{\bar{X}}(s) = M(0, 0, \ldots, 0; s)$ and $M_U(s_1, s_2, \ldots, s_n) = M(s_1, s_2, \ldots, s_n; 0)$.
5. Limiting or Asymptotic
Distributions
In a previous probability course the Poisson approximation to the Binomial distribution,
$$\binom{n}{x} p^x (1-p)^{n-x} \approx \frac{(np)^x e^{-np}}{x!} \quad \text{for } x = 0, 1, \ldots, n$$
if n is large and p is small, was used.
As well, the Normal approximation to the Binomial distribution was used: if $X_n \sim$ Binomial$(n, p)$ with probability function $\binom{n}{x} p^x (1-p)^{n-x}$ for $x = 0, 1, \ldots, n$, then
$$P(X_n \le x) \approx P\!\left(Z \le \frac{x - np}{\sqrt{np(1-p)}}\right) \quad \text{where } Z \sim N(0,1)$$
if n is large and p is close to $1/2$ (a special case of the very important Central Limit Theorem). These are examples of what we will call limiting or asymptotic distributions.
In this chapter we consider a sequence of random variables $X_1, X_2, \ldots, X_n, \ldots$ and look at the definitions and theorems related to determining the limiting distribution of such a sequence. In Section 5.1 we define convergence in distribution and look at several examples to illustrate its meaning. In Section 5.2 we define convergence in probability and examine its relationship to convergence in distribution. In Section 5.3 we look at the Weak Law of Large Numbers which is an important theorem when examining the behaviour of estimators of unknown parameters (Chapter 6). In Section 5.4 we use the moment generating function to find limiting distributions including a proof of the Central Limit Theorem. The Central Limit Theorem was used in STAT 221/231/241 to construct an approximate confidence interval for an unknown parameter. In Section 5.5 additional limit theorems for finding limiting distributions are introduced. These additional theorems allow us to determine new limiting distributions by combining the limiting distributions which have been determined from definitions, the Weak Law of Large Numbers, and/or the Central Limit Theorem.
5.1 Convergence in Distribution
In calculus you studied sequences of real numbers $a_1, a_2, \ldots, a_n, \ldots$ and learned theorems which allowed you to evaluate limits such as $\lim_{n\to\infty} a_n$. In this course we are interested in a sequence of random variables $X_1, X_2, \ldots, X_n, \ldots$ and what happens to the distribution of $X_n$ as $n \to \infty$. We do this by examining what happens to $F_n(x) = P(X_n \le x)$, the cumulative distribution function of $X_n$, as $n \to \infty$. Note that for a fixed value of x, the sequence $F_1(x), F_2(x), \ldots, F_n(x), \ldots$ is a sequence of real numbers. In general we will obtain a different sequence of real numbers for each different value of x. Since we have a sequence of real numbers we will be able to use limit theorems you have used in your previous calculus courses to evaluate $\lim_{n\to\infty} F_n(x)$. We will need to take care in determining how $F_n(x)$ behaves as $n \to \infty$ for all real values of x. To formalize these ideas we give the following definition for convergence in distribution of a sequence of random variables.
5.1.1 Definition - Convergence in Distribution
Let $X_1, X_2, \ldots, X_n, \ldots$ be a sequence of random variables and let $F_1(x), F_2(x), \ldots, F_n(x), \ldots$ be the corresponding sequence of cumulative distribution functions, that is, $X_n$ has cumulative distribution function $F_n(x) = P(X_n \le x)$. Let X be a random variable with cumulative distribution function $F(x) = P(X \le x)$. We say $X_n$ converges in distribution to X and write
$$X_n \to_D X$$
if
$$\lim_{n\to\infty} F_n(x) = F(x)$$
at all points x at which F(x) is continuous. We call F the limiting or asymptotic distribution of $X_n$.
Note:
(1) Although we say the random variable $X_n$ converges in distribution to the random variable X, the definition of convergence in distribution is defined in terms of the pointwise convergence of the corresponding sequence of cumulative distribution functions.
(2) This definition holds for both discrete and continuous random variables.
(3) One way to think about convergence in distribution is that, if $X_n \to_D X$, then for large n
$$F_n(x) = P(X_n \le x) \approx F(x) = P(X \le x)$$
if x is a point of continuity of F(x). How good the approximation is will depend on the values of n and x.
The following theorem and corollary will be useful in determining limiting distributions.
5.1.2 Theorem - $e$ Limit
If b and c are real constants and $\lim_{n\to\infty} \psi(n) = 0$ then
$$\lim_{n\to\infty}\left[1 + \frac{b}{n} + \frac{\psi(n)}{n}\right]^{cn} = e^{bc}$$
5.1.3 Corollary
If b and c are real constants then
$$\lim_{n\to\infty}\left(1 + \frac{b}{n}\right)^{cn} = e^{bc}$$
5.1.4 Example
Let $Y_i \sim$ Exponential(1), $i = 1, 2, \ldots$ independently. Consider the sequence of random variables $X_1, X_2, \ldots, X_n, \ldots$ where $X_n = \max(Y_1, Y_2, \ldots, Y_n) - \log n$. Find the limiting distribution of $X_n$.
Solution
Since $Y_i \sim$ Exponential(1),
$$P(Y_i \le y) = \begin{cases} 0 & y \le 0 \\ 1 - e^{-y} & y > 0 \end{cases}$$
for $i = 1, 2, \ldots$. Since the $Y_i$'s are independent random variables
$$F_n(x) = P(X_n \le x) = P(\max(Y_1, Y_2, \ldots, Y_n) - \log n \le x) = P(\max(Y_1, Y_2, \ldots, Y_n) \le x + \log n)$$
$$= P(Y_1 \le x + \log n, Y_2 \le x + \log n, \ldots, Y_n \le x + \log n) = \prod_{i=1}^{n} P(Y_i \le x + \log n)$$
$$= \prod_{i=1}^{n}\left(1 - e^{-(x + \log n)}\right) \quad \text{for } x + \log n > 0$$
$$= \left(1 - \frac{e^{-x}}{n}\right)^{n} \quad \text{for } x > -\log n$$
As $n \to \infty$, $-\log n \to -\infty$ so
$$\lim_{n\to\infty} F_n(x) = \lim_{n\to\infty}\left[1 + \frac{\left(-e^{-x}\right)}{n}\right]^{n} = e^{-e^{-x}} \quad \text{for } x \in \Re$$
by 5.1.3.
Consider the function
$$F(x) = e^{-e^{-x}} \quad \text{for } x \in \Re$$
Since $F'(x) = e^{-x}e^{-e^{-x}} > 0$ for all $x \in \Re$, $F(x)$ is a continuous, increasing function for $x \in \Re$. Also $\lim_{x\to-\infty} F(x) = 0$ and $\lim_{x\to\infty} F(x) = 1$. Therefore F(x) is a cumulative distribution function for a continuous random variable.
Let X be a random variable with cumulative distribution function F(x). Since
$$\lim_{n\to\infty} F_n(x) = F(x)$$
for all $x \in \Re$, that is, at all points x at which F(x) is continuous, therefore
$$X_n \to_D X$$
In Figure 5.1 you can see how quickly the curves $F_n(x)$ approach the limiting curve F(x).
[Figure 5.1: Graphs of $F_n(x) = \left(1 - e^{-x}/n\right)^n$ for $n = 1, 2, 5, 10, \infty$.]
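The following Python sketch is not part of the original notes; it assumes numpy is available, and the seed, evaluation point and simulation size are arbitrary choices. It illustrates the convergence in Example 5.1.4 by comparing a Monte Carlo estimate of $F_n(x)$ with the limiting value $F(x) = e^{-e^{-x}}$ at $x = 1$.

```python
# Illustrative simulation for Example 5.1.4 (a sketch, not from the notes).
import numpy as np

rng = np.random.default_rng(2021)          # arbitrary seed for reproducibility

def compare_at(x, n, nsim=50_000):
    # simulate nsim copies of X_n = max(Y_1,...,Y_n) - log(n) with Y_i ~ Exponential(1)
    y = rng.exponential(scale=1.0, size=(nsim, n))
    xn = y.max(axis=1) - np.log(n)
    return np.mean(xn <= x), np.exp(-np.exp(-x))   # estimated F_n(x) and limiting F(x)

for n in [1, 2, 5, 10, 100]:
    fn, f = compare_at(1.0, n)
    print(f"n = {n:3d}: F_n(1) approx {fn:.4f}, F(1) = {f:.4f}")
```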
5.1.5 Example
Let $Y_i \sim$ Uniform$(0, \theta)$, $i = 1, 2, \ldots$ independently. Consider the sequence of random variables $X_1, X_2, \ldots, X_n, \ldots$ where $X_n = \max(Y_1, Y_2, \ldots, Y_n)$. Find the limiting distribution of $X_n$.
Solution
Since $Y_i \sim$ Uniform$(0, \theta)$,
$$P(Y_i \le y) = \begin{cases} 0 & y \le 0 \\ y/\theta & 0 < y < \theta \\ 1 & y \ge \theta \end{cases}$$
for $i = 1, 2, \ldots$. Since the $Y_i$'s are independent random variables
$$F_n(x) = P(X_n \le x) = P(\max(Y_1, Y_2, \ldots, Y_n) \le x) = P(Y_1 \le x, Y_2 \le x, \ldots, Y_n \le x) = \prod_{i=1}^{n} P(Y_i \le x)$$
$$= \begin{cases} 0 & x \le 0 \\ (x/\theta)^n & 0 < x < \theta \\ 1 & x \ge \theta \end{cases}$$
Therefore
$$\lim_{n\to\infty} F_n(x) = \begin{cases} 0 & x < \theta \\ 1 & x \ge \theta \end{cases} = F(x)$$
In Figure 5.2 you can see how quickly the curves $F_n(x)$ approach the limiting curve F(x).
[Figure 5.2: Graphs of $F_n(x) = (x/\theta)^n$ for $\theta = 2$ and $n = 1, 2, 5, 10, 100, \infty$.]
It is straightforward to check that F(x) is a cumulative distribution function for the discrete random variable X with probability function
$$f(x) = \begin{cases} 1 & x = \theta \\ 0 & \text{otherwise} \end{cases}$$
Therefore
$$X_n \to_D X$$
Since X only takes on one value with probability one, X is called a degenerate random variable. When $X_n$ converges in distribution to a degenerate random variable we also call this convergence in probability to a constant, as defined in the next section.
5.1.6 Comment
Suppose $X_1, X_2, \ldots, X_n, \ldots$ is a sequence of random variables such that $X_n \to_D X$. Then for large n we can use the approximation
$$P(X_n \le x) \approx P(X \le x)$$
If X is degenerate at b then $P(X = b) = 1$ and this approximation is not very useful. However, if the limiting distribution is degenerate then we could use this result in another way. In Example 5.1.5 we showed that if $Y_i \sim$ Uniform$(0, \theta)$, $i = 1, 2, \ldots, n$ independently, then $X_n = \max(Y_1, Y_2, \ldots, Y_n)$ converges in distribution to a degenerate random variable X with $P(X = \theta) = 1$. This result is rather useful since, if we have observed data $y_1, y_2, \ldots, y_n$ from a Uniform$(0, \theta)$ distribution and $\theta$ is unknown, then this suggests using $y_{(n)} = \max(y_1, y_2, \ldots, y_n)$ as an estimate of $\theta$ if n is reasonably large. We will discuss this idea in more detail in Chapter 6.
5.2 Convergence in Probability
Definition 5.1.1 is useful for finding the limiting distribution of a sequence of random variables $X_1, X_2, \ldots, X_n, \ldots$ when the sequence of corresponding cumulative distribution functions $F_1, F_2, \ldots, F_n, \ldots$ can be obtained. In other cases we may use the following definition to determine the limiting distribution.
5.2.1 Definition - Convergence in Probability
A sequence of random variables $X_1, X_2, \ldots, X_n, \ldots$ converges in probability to a random variable X if, for all $\varepsilon > 0$,
$$\lim_{n\to\infty} P(|X_n - X| \ge \varepsilon) = 0$$
or equivalently
$$\lim_{n\to\infty} P(|X_n - X| < \varepsilon) = 1$$
We write
$$X_n \to_p X$$
Convergence in probability is a stronger form of convergence than convergence in distribution in the sense that convergence in probability implies convergence in distribution, as stated in the following theorem. However, if $X_n$ converges in distribution to X, then $X_n$ may or may not converge in probability to X.
5.2.2 Theorem - Convergence in Probability Implies Convergence in Distribution
If $X_n \to_p X$ then $X_n \to_D X$.
In Example 5.1.5 the limiting distribution was degenerate. When the limiting distribution is degenerate we say $X_n$ converges in probability to a constant. The following definition, which follows from Definition 5.2.1, can be used for proving convergence in probability.
5.2.3 Definition - Convergence in Probability to a Constant
A sequence of random variables $X_1, X_2, \ldots, X_n, \ldots$ converges in probability to a constant b if, for all $\varepsilon > 0$,
$$\lim_{n\to\infty} P(|X_n - b| \ge \varepsilon) = 0$$
or equivalently
$$\lim_{n\to\infty} P(|X_n - b| < \varepsilon) = 1$$
We write
$$X_n \to_p b$$
5.2.4 Example
Suppose $X_1, X_2, \ldots, X_n, \ldots$ is a sequence of random variables with $E(X_n) = \mu_n$ and $Var(X_n) = \sigma_n^2$. If $\lim_{n\to\infty} \mu_n = a$ and $\lim_{n\to\infty} \sigma_n^2 = 0$ then show $X_n \to_p a$.
Solution
To show $X_n \to_p a$ we need to show that for all $\varepsilon > 0$
$$\lim_{n\to\infty} P(|X_n - a| < \varepsilon) = 1$$
or equivalently
$$\lim_{n\to\infty} P(|X_n - a| \ge \varepsilon) = 0$$
Recall Markov's Inequality. For all $k, c > 0$
$$P(|X| \ge c) \le \frac{E\left(|X|^k\right)}{c^k}$$
Therefore by Markov's Inequality with $k = 2$ and $c = \varepsilon$ we have
$$0 \le \lim_{n\to\infty} P(|X_n - a| \ge \varepsilon) \le \lim_{n\to\infty} \frac{E\left[|X_n - a|^2\right]}{\varepsilon^2} \tag{5.1}$$
Now
$$E\left[|X_n - a|^2\right] = E\left[(X_n - a)^2\right] = E\left[(X_n - \mu_n)^2\right] + 2(\mu_n - a)E(X_n - \mu_n) + (\mu_n - a)^2$$
Since
$$\lim_{n\to\infty} E\left[(X_n - \mu_n)^2\right] = \lim_{n\to\infty} Var(X_n) = \lim_{n\to\infty} \sigma_n^2 = 0$$
and
$$\lim_{n\to\infty}(\mu_n - a) = 0 \quad \text{since } \lim_{n\to\infty} \mu_n = a$$
therefore
$$\lim_{n\to\infty} E\left[|X_n - a|^2\right] = \lim_{n\to\infty} E\left[(X_n - a)^2\right] = 0$$
which also implies
$$\lim_{n\to\infty} \frac{E\left[|X_n - a|^2\right]}{\varepsilon^2} = 0 \tag{5.2}$$
Thus by (5.1), (5.2) and the Squeeze Theorem
$$\lim_{n\to\infty} P(|X_n - a| \ge \varepsilon) = 0$$
for all $\varepsilon > 0$ as required.
The proof in Example 5.2.4 used Definition 5.2.3 to prove convergence in probability. The reason for this is that the distribution of the $X_i$'s was not specified. Only conditions on $E(X_n)$ and $Var(X_n)$ were specified. This means that the result in Example 5.2.4 holds for any sequence of random variables $X_1, X_2, \ldots, X_n, \ldots$ satisfying the given conditions.
If the sequence of corresponding cumulative distribution functions $F_1, F_2, \ldots, F_n, \ldots$ can be obtained then the following theorem can also be used to prove convergence in probability to a constant.
5.2.5 Theorem
Suppose $X_1, X_2, \ldots, X_n, \ldots$ is a sequence of random variables such that $X_n$ has cumulative distribution function $F_n(x)$. If
$$\lim_{n\to\infty} F_n(x) = \lim_{n\to\infty} P(X_n \le x) = \begin{cases} 0 & x < b \\ 1 & x > b \end{cases}$$
then $X_n \to_p b$.
Note: We do not need to worry about whether $\lim_{n\to\infty} F_n(b)$ exists since $x = b$ is a point of discontinuity of the limiting distribution (see Definition 5.1.1).
5.2.6 Example
Let $Y_i \sim$ Exponential$(\theta, 1)$, $i = 1, 2, \ldots$ independently. Consider the sequence of random variables $X_1, X_2, \ldots, X_n, \ldots$ where $X_n = \min(Y_1, Y_2, \ldots, Y_n)$. Show that $X_n \to_p \theta$.
Solution
Since $Y_i \sim$ Exponential$(\theta, 1)$,
$$P(Y_i > y) = \begin{cases} e^{-(y - \theta)} & y > \theta \\ 1 & y \le \theta \end{cases}$$
for $i = 1, 2, \ldots$. Since $Y_1, Y_2, \ldots, Y_n$ are independent random variables
$$F_n(x) = 1 - P(X_n > x) = 1 - P(\min(Y_1, Y_2, \ldots, Y_n) > x) = 1 - P(Y_1 > x, Y_2 > x, \ldots, Y_n > x) = 1 - \prod_{i=1}^{n} P(Y_i > x)$$
$$= \begin{cases} 1 - \prod_{i=1}^{n} 1 & x \le \theta \\ 1 - \prod_{i=1}^{n} e^{-(x - \theta)} & x > \theta \end{cases} = \begin{cases} 0 & x \le \theta \\ 1 - e^{-n(x - \theta)} & x > \theta \end{cases}$$
Therefore
$$\lim_{n\to\infty} F_n(x) = \begin{cases} 0 & x \le \theta \\ 1 & x > \theta \end{cases}$$
which we note is not a cumulative distribution function since the function is not right-continuous at $x = \theta$. However
$$\lim_{n\to\infty} F_n(x) = \begin{cases} 0 & x < \theta \\ 1 & x > \theta \end{cases}$$
and therefore by Theorem 5.2.5, $X_n \to_p \theta$. In Figure 5.3 you can see how quickly the limit is approached.
5.3 Weak Law of Large Numbers
In this section we look at a very important result which we will use in Chapter 6 to show
that maximum likelihood estimators have good properties. This result is called the Weak
Law of Large Numbers. Needless to say there is another law called the Strong Law of Large
Numbers but we will not consider this law here.
Also in this section we will look at some simulations to illustrate the theoretical result
in the Weak Law of Large Numbers.
[Figure 5.3: Graphs of $F_n(x) = 1 - e^{-n(x - \theta)}$ for $\theta = 2$ and $n = 1, 2, 5, 10, 100, \infty$.]
5.3.1 Weak Law of Large Numbers
Suppose $X_1, X_2, \ldots$ are independent and identically distributed random variables with $E(X_i) = \mu$ and $Var(X_i) = \sigma^2 < \infty$. Consider the sequence of random variables $\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_n, \ldots$ where
$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$$
Then
$$\bar{X}_n \to_p \mu$$
Proof
Using Definition 5.2.3 we need to show
$$\lim_{n\to\infty} P\left(\left|\bar{X}_n - \mu\right| \ge \varepsilon\right) = 0 \quad \text{for all } \varepsilon > 0$$
We apply Chebyshev's Theorem (see 2.8.2) to the random variable $\bar{X}_n$, where $E\left(\bar{X}_n\right) = \mu$ (see Corollary 3.6.3(3)) and $Var\left(\bar{X}_n\right) = \sigma^2/n$ (see Theorem 3.6.7(4)), to obtain
$$P\left(\left|\bar{X}_n - \mu\right| \ge \frac{k\sigma}{\sqrt{n}}\right) \le \frac{1}{k^2} \quad \text{for all } k > 0$$
Let $k = \dfrac{\sqrt{n}\,\varepsilon}{\sigma}$. Then
$$0 \le P\left(\left|\bar{X}_n - \mu\right| \ge \varepsilon\right) \le \frac{\sigma^2}{n\varepsilon^2} \quad \text{for all } \varepsilon > 0$$
Since
$$\lim_{n\to\infty} \frac{\sigma^2}{n\varepsilon^2} = 0$$
therefore by the Squeeze Theorem
$$\lim_{n\to\infty} P\left(\left|\bar{X}_n - \mu\right| \ge \varepsilon\right) = 0 \quad \text{for all } \varepsilon > 0$$
as required.
Notes:
(1) The proof of the Weak Law of Large Numbers does not actually require that the random
variables be identically distributed, only that they all have the same mean and variance.
As well the proof does not require knowing the distribution of these random variables.
(2) In words the Weak Law of Large Numbers says that the sample mean $\bar{X}_n$ approaches the population mean $\mu$ as $n \to \infty$.
5.3.2 Example
If $X \sim$ Pareto$(1, \beta)$ then X has probability density function
$$f(x) = \frac{\beta}{x^{\beta + 1}} \quad \text{for } x \ge 1$$
and 0 otherwise. X has cumulative distribution function
$$F(x) = \begin{cases} 0 & \text{if } x < 1 \\ 1 - \dfrac{1}{x^{\beta}} & \text{for } x \ge 1 \end{cases}$$
and inverse cumulative distribution function
$$F^{-1}(x) = (1 - x)^{-1/\beta} \quad \text{for } 0 < x < 1$$
Also
$$E(X) = \begin{cases} \infty & \text{if } 0 < \beta \le 1 \\ \dfrac{\beta}{\beta - 1} & \text{if } \beta > 1 \end{cases}$$
and
$$Var(X) = \frac{\beta}{(\beta - 1)^2(\beta - 2)} \quad \text{for } \beta > 2$$
Suppose $X_1, X_2, \ldots, X_n$ are independent and identically distributed Pareto$(1, \beta)$ random variables. By the Weak Law of Large Numbers
$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i \to_p E(X) = \frac{\beta}{\beta - 1} \quad \text{for } \beta > 1$$
If $U_i \sim$ Uniform$(0, 1)$, $i = 1, 2, \ldots, n$ independently, then by Theorem 2.6.6
$$X_i = F^{-1}(U_i) = (1 - U_i)^{-1/\beta} \sim \text{Pareto}(1, \beta)$$
$i = 1, 2, \ldots, n$ independently. If we generate Uniform$(0, 1)$ observations $u_1, u_2, \ldots, u_n$ using a random number generator and then let $x_i = (1 - u_i)^{-1/\beta}$, then $x_1, x_2, \ldots, x_n$ are observations from the Pareto$(1, \beta)$ distribution.
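The following Python sketch is not part of the notes; numpy, the seed and the choice $n = 500$, $\beta = 5$ are assumptions made for illustration. It carries out the inverse cumulative distribution function simulation just described and computes the running sample mean that is plotted in the figures below.

```python
# Illustrative sketch of the Pareto(1, beta) simulation described above (not from the notes).
import numpy as np

rng = np.random.default_rng(330)         # arbitrary seed
beta, n = 5.0, 500
u = rng.uniform(size=n)                  # Uniform(0,1) observations
x = (1.0 - u) ** (-1.0 / beta)           # x_i = F^{-1}(u_i) = (1 - u_i)^(-1/beta)

running_mean = np.cumsum(x) / np.arange(1, n + 1)
print("sample mean after 500 observations:", running_mean[-1])
print("population mean beta/(beta - 1)   :", beta / (beta - 1))
# plotting running_mean against n reproduces the behaviour shown in Figures 5.5 and 5.7
```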
The points $(i, x_i)$, $i = 1, 2, \ldots, 500$ for one simulation of 500 observations from a Pareto$(1, 5)$ distribution are plotted in Figure 5.4.
[Figure 5.4: 500 observations from a Pareto(1, 5) distribution.]
Figure 5.5 shows a plot of the points $(n, \bar{x}_n)$, $n = 1, 2, \ldots, 500$ where $\bar{x}_n = \frac{1}{n}\sum_{i=1}^{n} x_i$ is the sample mean. We note that the sample mean $\bar{x}_n$ is approaching the population mean $\mu = E(X) = \frac{5}{5 - 1} = 1.25$ as n increases.
[Figure 5.5: Graph of $\bar{x}_n$ versus n for 500 Pareto(1, 5) observations.]
5.3. WEAK LAW OF LARGE NUMBERS 163
If we generate a further 1000 values of xi, and plot (i; xi) ; i = 1; 2; : : : ; 1500 we obtain the
graph in Figure 5.6.
0 500 1000 15000.5
1
1.5
2
2.5
3
3.5
4
4.5
5
5.5
i
xi
Figure 5.6: 1500 observations from a Pareto(1; 5) distribution
The corresponding plot of $(n, \bar{x}_n)$, $n = 1, 2, \ldots, 1500$ is shown in Figure 5.7. We note that the sample mean $\bar{x}_n$ stays very close to the population mean $\mu = 1.25$ for $n > 500$.
[Figure 5.7: Graph of $\bar{x}_n$ versus n for 1500 Pareto(1, 5) observations.]
Note that these figures correspond to only one set of simulated data. If we generated another set of data using a random number generator the actual data points would change. However, what would stay the same is that the sample mean for the new data set would still approach the mean value $E(X) = 1.25$ as n increases.
The points $(i, x_i)$, $i = 1, 2, \ldots, 500$ for one simulation of 500 observations from a Pareto$(1, 0.5)$ distribution are plotted in Figure 5.8. For this distribution
$$E(X) = \int_{1}^{\infty} x \cdot \frac{\beta}{x^{\beta + 1}}\,dx = \int_{1}^{\infty} \frac{\beta}{x^{\beta}}\,dx \quad \text{which diverges to } \infty \text{ when } \beta = 0.5$$
Note in Figure 5.8 that there are some very large observations. In particular there is one observation which is close to 40,000.
[Figure 5.8: 500 observations from a Pareto(1, 0.5) distribution.]
The corresponding plot of $(n, \bar{x}_n)$, $n = 1, 2, \ldots, 500$ for these data is given in Figure 5.9. Note that the mean $\bar{x}_n$ does not appear to be approaching a fixed value.
[Figure 5.9: Graph of $\bar{x}_n$ versus n for 500 Pareto(1, 0.5) observations.]
In Figure 5.10 the points $(n, \bar{x}_n)$ for a set of 50,000 observations generated from a Pareto$(1, 0.5)$ distribution are plotted. Note that the mean $\bar{x}_n$ does not approach a fixed value and in general is getting larger as n gets large. This is consistent with E(X) diverging to $\infty$.
[Figure 5.10: Graph of $\bar{x}_n$ versus n for 50,000 Pareto(1, 0.5) observations.]
5.3.3 Example
If $X \sim$ Cauchy$(0, 1)$ then
$$f(x) = \frac{1}{\pi\left(1 + x^2\right)} \quad \text{for } x \in \Re$$
The probability density function, shown in Figure 5.11, is symmetric about the y axis and the median of the distribution is equal to 0.
[Figure 5.11: Cauchy(0, 1) probability density function.]
The mean of a Cauchy$(0, 1)$ random variable does not exist since
$$\frac{1}{\pi}\int_{-\infty}^{0} \frac{x}{1 + x^2}\,dx \quad \text{diverges to } -\infty \tag{5.3}$$
and
$$\frac{1}{\pi}\int_{0}^{\infty} \frac{x}{1 + x^2}\,dx \quad \text{diverges to } \infty \tag{5.4}$$
The cumulative distribution function of a Cauchy$(0, 1)$ random variable is
$$F(x) = \frac{1}{\pi}\arctan x + \frac{1}{2} \quad \text{for } x \in \Re$$
and the inverse cumulative distribution function is
$$F^{-1}(x) = \tan\!\left[\pi\left(x - \frac{1}{2}\right)\right] \quad \text{for } 0 < x < 1$$
If we generate Uniform$(0, 1)$ observations $u_1, u_2, \ldots, u_N$ using a random number generator and then let $x_i = \tan\!\left[\pi\left(u_i - \frac{1}{2}\right)\right]$, then the $x_i$'s are observations from the Cauchy$(0, 1)$ distribution.
The points $(n, \bar{x}_n)$ for a set of 500,000 observations generated from a Cauchy$(0, 1)$ distribution are plotted in Figure 5.12. Note that $\bar{x}_n$ does not approach a fixed value. However, unlike the Pareto$(1, 0.5)$ example in which $\bar{x}_n$ was getting larger as n got larger, we see in Figure 5.12 that $\bar{x}_n$ drifts back and forth around the line $y = 0$. This behaviour, which is consistent with (5.3) and (5.4), continues even if more observations are generated.
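A minimal Python sketch of the Cauchy simulation just described (not from the notes; numpy, the seed and the simulation size are assumptions). The running mean wanders rather than converging, in contrast to the Weak Law of Large Numbers examples.

```python
# Illustrative sketch: Cauchy(0,1) observations via the inverse cdf method (not from the notes).
import numpy as np

rng = np.random.default_rng(1)                   # arbitrary seed
N = 500_000
u = rng.uniform(size=N)
x = np.tan(np.pi * (u - 0.5))                    # x_i = F^{-1}(u_i)
running_mean = np.cumsum(x) / np.arange(1, N + 1)
# print a few snapshots; the values drift instead of settling at a constant
for n in [1_000, 10_000, 100_000, 500_000]:
    print(n, running_mean[n - 1])
```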
[Figure 5.12: Graph of $\bar{x}_n$ versus n for 500,000 Cauchy(0, 1) observations.]
5.4 Moment Generating Function Technique for Limiting Distributions
We now look at the moment generating function technique for determining a limiting distribution. Suppose we have a sequence of random variables $X_1, X_2, \ldots, X_n, \ldots$ and $M_1(t), M_2(t), \ldots, M_n(t), \ldots$ is the corresponding sequence of moment generating functions. For a fixed value of t, the sequence $M_1(t), M_2(t), \ldots, M_n(t), \ldots$ is a sequence of real numbers. In general we will obtain a different sequence of real numbers for each different value of t. Since we have a sequence of real numbers we will be able to use limit theorems you have used in your previous calculus courses to evaluate $\lim_{n\to\infty} M_n(t)$. We will need to take care in determining how $M_n(t)$ behaves as $n \to \infty$ for an interval of values of t containing the value 0. Of course this technique only works if the moment generating function exists and is tractable.
5.4.1 Limit Theorem for Moment Generating Functions
Let $X_1, X_2, \ldots, X_n, \ldots$ be a sequence of random variables such that $X_n$ has moment generating function $M_n(t)$. Let X be a random variable with moment generating function M(t). If there exists an $h > 0$ such that
$$\lim_{n\to\infty} M_n(t) = M(t) \quad \text{for all } t \in (-h, h)$$
then
$$X_n \to_D X$$
Note:
(1) The sequence of random variables $X_1, X_2, \ldots, X_n, \ldots$ converges if the corresponding sequence of moment generating functions $M_1(t), M_2(t), \ldots, M_n(t), \ldots$ converges pointwise.
(2) This result holds for both discrete and continuous random variables.
Recall from Definition 5.1.1 that
$$X_n \to_D X$$
if
$$\lim_{n\to\infty} F_n(x) = F(x)$$
at all points x at which F(x) is continuous. If X is a discrete random variable then the cumulative distribution function is a right continuous function. The values of x of main interest for a discrete random variable are exactly the points at which F(x) is discontinuous. The following theorem indicates that $\lim_{n\to\infty} F_n(x) = F(x)$ holds for the values of x at which F(x) is discontinuous if $X_n$ and X are non-negative integer-valued random variables. The named discrete distributions Bernoulli, Binomial, Geometric, Negative Binomial, and Poisson are all non-negative integer-valued random variables.
5.4.2 Theorem
Suppose $X_n$ and X are non-negative integer-valued random variables. If $X_n \to_D X$ then $\lim_{n\to\infty} P(X_n \le x) = P(X \le x)$ holds for all x and in particular
$$\lim_{n\to\infty} P(X_n = x) = P(X = x) \quad \text{for } x = 0, 1, \ldots$$
5.4.3 Example
Consider the sequence of random variables $X_1, X_2, \ldots, X_k, \ldots$ where $X_k \sim$ Negative Binomial$(k, p)$. Use Theorem 5.4.1 to determine the limiting distribution of $X_k$ as $k \to \infty$, $p \to 1$ such that $kq/p = \mu$ remains constant, where $q = 1 - p$. Use this limiting distribution and Theorem 5.4.2 to give an approximation for $P(X_k = x)$.
Solution
If $X_k \sim$ Negative Binomial$(k, p)$ then
$$M_k(t) = E\left(e^{tX_k}\right) = \left(\frac{p}{1 - qe^t}\right)^{k} \quad \text{for } t < -\log q \tag{5.5}$$
If $\mu = kq/p$ then
$$p = \frac{k}{\mu + k} \quad \text{and} \quad q = \frac{\mu}{\mu + k} \tag{5.6}$$
Substituting (5.6) into (5.5) and simplifying gives
$$M_k(t) = \left(\frac{\frac{k}{\mu + k}}{1 - \frac{\mu}{\mu + k}e^t}\right)^{k} = \left(\frac{1}{1 + \frac{\mu}{k} - \frac{\mu}{k}e^t}\right)^{k} = \left[\frac{1}{1 - \frac{\mu(e^t - 1)}{k}}\right]^{k} = \left[1 - \frac{\mu\left(e^t - 1\right)}{k}\right]^{-k} \quad \text{for } t < \log\!\left(\frac{\mu + k}{\mu}\right)$$
Now
$$\lim_{k\to\infty}\left[1 - \frac{\mu\left(e^t - 1\right)}{k}\right]^{-k} = e^{\mu\left(e^t - 1\right)} \quad \text{for } t < \infty$$
by Corollary 5.1.3. Since $M(t) = e^{\mu\left(e^t - 1\right)}$ for $t \in \Re$ is the moment generating function of a Poisson$(\mu)$ random variable, then by Theorem 5.4.1, $X_k \to_D X \sim$ Poisson$(\mu)$.
By Theorem 5.4.2
$$P(X_k = x) = \binom{-k}{x} p^{k}(-q)^{x} \approx \frac{\left(\frac{kq}{p}\right)^{x} e^{-kq/p}}{x!} \quad \text{for } x = 0, 1, \ldots$$
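As a numerical illustration (not part of the notes), the sketch below compares Negative Binomial probabilities with the Poisson approximation for increasing k with $kq/p = \mu$ held fixed; it assumes scipy is available, whose nbinom pmf uses the same "number of failures" parameterization as here, and the values of $\mu$, k and x are arbitrary.

```python
# Illustrative check of the Poisson approximation in Example 5.4.3 (not from the notes).
from scipy import stats

mu = 3.0
for k in [5, 20, 100]:
    p = k / (mu + k)                        # chosen so that kq/p = mu
    for x in [0, 2, 5]:
        nb = stats.nbinom.pmf(x, k, p)      # C(x+k-1, x) p^k q^x
        po = stats.poisson.pmf(x, mu)
        print(f"k={k:4d} x={x}: NB={nb:.5f}  Poisson={po:.5f}")
```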
5.4.4 Exercise - Poisson Approximation to the Binomial Distribution
Consider the sequence of random variables $X_1, X_2, \ldots, X_n, \ldots$ where $X_n \sim$ Binomial$(n, p)$. Use Theorem 5.4.1 to determine the limiting distribution of $X_n$ as $n \to \infty$, $p \to 0$ such that $np = \mu$ remains constant. Use this limiting distribution and Theorem 5.4.2 to give an approximation for $P(X_n = x)$.
In your previous probability and statistics courses you would have used the Central Limit Theorem (without proof!) for approximating Binomial and Poisson probabilities as well as constructing approximate confidence intervals. We now give a proof of this theorem.
5.4.5 Central Limit Theorem
Suppose $X_1, X_2, \ldots$ are independent and identically distributed random variables with $E(X_i) = \mu$ and $Var(X_i) = \sigma^2 < \infty$. Consider the sequence of random variables $Z_1, Z_2, \ldots, Z_n, \ldots$ where $Z_n = \dfrac{\sqrt{n}\left(\bar{X}_n - \mu\right)}{\sigma}$ and $\bar{X}_n = \dfrac{1}{n}\sum_{i=1}^{n} X_i$. Then
$$Z_n = \frac{\sqrt{n}\left(\bar{X}_n - \mu\right)}{\sigma} \to_D Z \sim N(0, 1)$$
Proof
We can write $Z_n$ as
$$Z_n = \frac{1}{\sigma\sqrt{n}}\sum_{i=1}^{n}(X_i - \mu)$$
Suppose that for $i = 1, 2, \ldots$, $X_i$ has moment generating function $M_X(t)$, $t \in (-h, h)$ for some $h > 0$. Then for $i = 1, 2, \ldots$, $(X_i - \mu)$ has moment generating function $M(t) = e^{-\mu t}M_X(t)$, $t \in (-h, h)$ for some $h > 0$. Note that
$$M(0) = 1, \qquad M'(0) = E(X_i - \mu) = E(X_i) - \mu = 0$$
and
$$M''(0) = E\left[(X_i - \mu)^2\right] = Var(X_i) = \sigma^2$$
Also by Taylor's Theorem (see 2.11.15) for $n = 2$ we have
$$M(t) = M(0) + M'(0)t + \frac{1}{2}M''(c)t^2 = 1 + \frac{1}{2}M''(c)t^2 \tag{5.7}$$
for some c between 0 and t.
Since $X_1, X_2, \ldots, X_n$ are independent and identically distributed, the moment generating function of $Z_n$ is
$$M_n(t) = E\left(e^{tZ_n}\right) = E\left[\exp\!\left(\frac{t}{\sigma\sqrt{n}}\sum_{i=1}^{n}(X_i - \mu)\right)\right] = \left[M\!\left(\frac{t}{\sigma\sqrt{n}}\right)\right]^{n} \quad \text{for } \left|\frac{t}{\sigma\sqrt{n}}\right| < h \tag{5.8}$$
Using (5.7) in (5.8) gives
$$M_n(t) = \left[1 + \frac{1}{2}M''(c_n)\left(\frac{t}{\sigma\sqrt{n}}\right)^2\right]^{n} = \left\{1 + \frac{1}{2}\left(\frac{t}{\sigma\sqrt{n}}\right)^2\left[M''(c_n) - M''(0)\right] + \frac{1}{2}\left(\frac{t}{\sigma\sqrt{n}}\right)^2 M''(0)\right\}^{n}$$
for some $c_n$ between 0 and $\dfrac{t}{\sigma\sqrt{n}}$. But $M''(0) = \sigma^2$ so
$$M_n(t) = \left\{1 + \frac{\frac{1}{2}t^2}{n} + \frac{\frac{t^2}{2\sigma^2}\left[M''(c_n) - M''(0)\right]}{n}\right\}^{n}$$
for some $c_n$ between 0 and $\dfrac{t}{\sigma\sqrt{n}}$.
Since $c_n$ is between 0 and $\dfrac{t}{\sigma\sqrt{n}}$, $c_n \to 0$ as $n \to \infty$. Since $M''(t)$ is continuous on $(-h, h)$,
$$\lim_{n\to\infty} M''(c_n) = M''\!\left(\lim_{n\to\infty} c_n\right) = M''(0) = \sigma^2$$
and
$$\lim_{n\to\infty} \frac{t^2}{2\sigma^2}\left[M''(c_n) - M''(0)\right] = 0$$
Therefore by Theorem 5.1.2, with
$$\psi(n) = \frac{t^2}{2\sigma^2}\left[M''(c_n) - M''(0)\right]$$
we have
$$\lim_{n\to\infty} M_n(t) = \lim_{n\to\infty}\left\{1 + \frac{\frac{1}{2}t^2}{n} + \frac{\frac{t^2}{2\sigma^2}\left[M''(c_n) - M''(0)\right]}{n}\right\}^{n} = e^{\frac{1}{2}t^2} \quad \text{for } |t| < \infty$$
which is the moment generating function of a $N(0, 1)$ random variable. Therefore by Theorem 5.4.1
$$Z_n \to_D Z \sim N(0, 1)$$
as required.
Note: Although this proof assumes that the moment generating function of $X_i$, $i = 1, 2, \ldots$ exists, it does not make any assumptions about the form of the distribution of the $X_i$'s. There are other more general proofs of the Central Limit Theorem which only assume the existence of the variance $\sigma^2$ (which implies the existence of the mean $\mu$).
5.4.6 Example - Normal Approximation to the $\chi^2$ Distribution
Suppose $Y_n \sim \chi^2(n)$, $n = 1, 2, \ldots$. Consider the sequence of random variables $Z_1, Z_2, \ldots, Z_n, \ldots$ where $Z_n = (Y_n - n)/\sqrt{2n}$. Show that
$$Z_n = \frac{Y_n - n}{\sqrt{2n}} \to_D Z \sim N(0, 1)$$
Solution
Let $X_i \sim \chi^2(1)$, $i = 1, 2, \ldots$ independently. Since $X_1, X_2, \ldots$ are independent and identically distributed random variables with $E(X_i) = 1$ and $Var(X_i) = 2$, then by the Central Limit Theorem
$$\frac{\sqrt{n}\left(\bar{X}_n - 1\right)}{\sqrt{2}} \to_D Z \sim N(0, 1)$$
But $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$ so
$$\frac{\sqrt{n}\left(\frac{1}{n}\sum_{i=1}^{n} X_i - 1\right)}{\sqrt{2}} = \frac{S_n - n}{\sqrt{2n}} \quad \text{where } S_n = \sum_{i=1}^{n} X_i$$
Therefore
$$\frac{S_n - n}{\sqrt{2n}} \to_D Z \sim N(0, 1)$$
Now by 4.3.2(6), $S_n \sim \chi^2(n)$ and therefore $Y_n$ and $S_n$ have the same distribution. It follows that
$$Z_n = \frac{Y_n - n}{\sqrt{2n}} \to_D Z \sim N(0, 1)$$
5.4.7 Exercise - Normal Approximation to the Binomial Distribution
Suppose $Y_n \sim$ Binomial$(n, p)$, $n = 1, 2, \ldots$. Consider the sequence of random variables $Z_1, Z_2, \ldots, Z_n, \ldots$ where $Z_n = (Y_n - np)/\sqrt{np(1 - p)}$. Show that
$$Z_n = \frac{Y_n - np}{\sqrt{np(1 - p)}} \to_D Z \sim N(0, 1)$$
Hint: Let $X_i \sim$ Binomial$(1, p)$, $i = 1, 2, \ldots, n$ independently.
5.5 Additional Limit Theorems
Suppose we know the limiting distribution of one or more sequences of random variables by
using the definitions and/or theorems in the previous sections of this chapter. The theorems
in this section allow us to more easily determine the limiting distribution of a function of
these sequences.
5.5.1 Limit Theorems
(1) If $X_n \to_p a$ and g is continuous at $x = a$ then $g(X_n) \to_p g(a)$.
(2) If $X_n \to_p a$, $Y_n \to_p b$ and $g(x, y)$ is continuous at $(a, b)$ then $g(X_n, Y_n) \to_p g(a, b)$.
(3) (Slutsky's Theorem) If $X_n \to_p a$, $Y_n \to_D Y$ and $g(a, y)$ is continuous for all y in the support set of Y then $g(X_n, Y_n) \to_D g(a, Y)$.
Proof of (1)
Since g is continuous at $x = a$, for every $\varepsilon > 0$ there exists a $\delta > 0$ such that $|x - a| < \delta$ implies $|g(x) - g(a)| < \varepsilon$. By Example 2.1.4(d) this implies that
$$P(|g(X_n) - g(a)| < \varepsilon) \ge P(|X_n - a| < \delta)$$
Since $X_n \to_p a$ it follows that for every $\varepsilon > 0$ there exists a $\delta > 0$ such that
$$\lim_{n\to\infty} P(|g(X_n) - g(a)| < \varepsilon) \ge \lim_{n\to\infty} P(|X_n - a| < \delta) = 1$$
But
$$\lim_{n\to\infty} P(|g(X_n) - g(a)| < \varepsilon) \le 1$$
so by the Squeeze Theorem
$$\lim_{n\to\infty} P(|g(X_n) - g(a)| < \varepsilon) = 1$$
and therefore $g(X_n) \to_p g(a)$.
5.5.2 Example
If $X_n \to_p a > 0$, $Y_n \to_p b \ne 0$ and $Z_n \to_D Z \sim N(0, 1)$ then find the limiting distributions of each of the following:
(a) $\sqrt{X_n}$
(b) $X_n + Y_n$
(c) $Y_n + Z_n$
(d) $X_n Z_n$
(e) $Z_n^2$
Solution
(a) Let $g(x) = \sqrt{x}$ which is a continuous function for all $x \in \Re^{+}$. Since $X_n \to_p a$ then by 5.5.1(1), $\sqrt{X_n} = g(X_n) \to_p g(a) = \sqrt{a}$ or $\sqrt{X_n} \to_p \sqrt{a}$.
(b) Let $g(x, y) = x + y$ which is a continuous function for all $(x, y) \in \Re^2$. Since $X_n \to_p a$ and $Y_n \to_p b$ then by 5.5.1(2), $X_n + Y_n = g(X_n, Y_n) \to_p g(a, b) = a + b$ or $X_n + Y_n \to_p a + b$.
(c) Let $g(y, z) = y + z$ which is a continuous function for all $(y, z) \in \Re^2$. Since $Y_n \to_p b$ and $Z_n \to_D Z \sim N(0, 1)$ then by 5.5.1(3), $Y_n + Z_n = g(Y_n, Z_n) \to_D g(b, Z) = b + Z$ or $Y_n + Z_n \to_D b + Z$ where $Z \sim N(0, 1)$. Since $b + Z \sim N(b, 1)$, therefore $Y_n + Z_n \to_D b + Z \sim N(b, 1)$.
(d) Let $g(x, z) = xz$ which is a continuous function for all $(x, z) \in \Re^2$. Since $X_n \to_p a$ and $Z_n \to_D Z \sim N(0, 1)$ then by Slutsky's Theorem, $X_n Z_n = g(X_n, Z_n) \to_D g(a, Z) = aZ$ or $X_n Z_n \to_D aZ$ where $Z \sim N(0, 1)$. Since $aZ \sim N(0, a^2)$, therefore $X_n Z_n \to_D aZ \sim N(0, a^2)$.
(e) Let $g(x, z) = z^2$ which is a continuous function for all $(x, z) \in \Re^2$. Since $Z_n \to_D Z \sim N(0, 1)$ then by Slutsky's Theorem, $Z_n^2 = g(X_n, Z_n) \to_D g(a, Z) = Z^2$ or $Z_n^2 \to_D Z^2$ where $Z \sim N(0, 1)$. Since $Z^2 \sim \chi^2(1)$, therefore $Z_n^2 \to_D Z^2 \sim \chi^2(1)$.
5.5.3 Exercise
If $X_n \to_p a > 0$, $Y_n \to_p b \ne 0$ and $Z_n \to_D Z \sim N(0, 1)$ then find the limiting distributions of each of the following:
(a) $X_n^2$
(b) $X_n Y_n$
(c) $X_n/Y_n$
(d) $X_n - 2Z_n$
(e) $1/Z_n$
In Example 5.5.2 we identified the function g in each case. As with other limit theorems we tend not to explicitly identify the function g once we have a good idea of how the theorems work, as illustrated in the next example.
5.5.4 Example
Suppose $X_i \sim$ Poisson$(\theta)$, $i = 1, 2, \ldots$ independently. Consider the sequence of random variables $Z_1, Z_2, \ldots, Z_n, \ldots$ where
$$Z_n = \frac{\sqrt{n}\left(\bar{X}_n - \theta\right)}{\sqrt{\bar{X}_n}}$$
Find the limiting distribution of $Z_n$.
Solution
Since $X_1, X_2, \ldots$ are independent and identically distributed random variables with $E(X_i) = \theta$ and $Var(X_i) = \theta$, then by the Central Limit Theorem
$$W_n = \frac{\sqrt{n}\left(\bar{X}_n - \theta\right)}{\sqrt{\theta}} \to_D Z \sim N(0, 1) \tag{5.9}$$
and by the Weak Law of Large Numbers
$$\bar{X}_n \to_p \theta \tag{5.10}$$
By (5.10) and 5.5.1(1)
$$U_n = \sqrt{\frac{\bar{X}_n}{\theta}} \to_p \sqrt{\frac{\theta}{\theta}} = 1 \tag{5.11}$$
Now
$$Z_n = \frac{\sqrt{n}\left(\bar{X}_n - \theta\right)}{\sqrt{\bar{X}_n}} = \frac{\dfrac{\sqrt{n}\left(\bar{X}_n - \theta\right)}{\sqrt{\theta}}}{\sqrt{\dfrac{\bar{X}_n}{\theta}}} = \frac{W_n}{U_n}$$
By (5.9), (5.11), and Slutsky's Theorem
$$Z_n = \frac{W_n}{U_n} \to_D \frac{Z}{1} = Z \sim N(0, 1)$$
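A small simulation sketch (not part of the notes; numpy, the seed and the values of $\theta$, n and the number of replicates are arbitrary choices) suggesting that $Z_n$ in Example 5.5.4 behaves like a N(0, 1) random variable for moderately large n.

```python
# Illustrative simulation for Example 5.5.4 (not from the notes).
import numpy as np

rng = np.random.default_rng(7)              # arbitrary seed
theta, n, nsim = 4.0, 200, 20_000
x = rng.poisson(theta, size=(nsim, n))
xbar = x.mean(axis=1)
z = np.sqrt(n) * (xbar - theta) / np.sqrt(xbar)
print("mean of Z_n    :", z.mean())                     # near 0
print("variance of Z_n:", z.var())                      # near 1
print("P(|Z_n|<=1.96) :", np.mean(np.abs(z) <= 1.96))   # near 0.95
```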
5.5.5 Example
Suppose $X_i \sim$ Uniform$(0, 1)$, $i = 1, 2, \ldots$ independently. Consider the sequence of random variables $U_1, U_2, \ldots, U_n, \ldots$ where $U_n = \max(X_1, X_2, \ldots, X_n)$. Show that
(a) $U_n \to_p 1$
(b) $e^{U_n} \to_p e$
(c) $\sin(1 - U_n) \to_p 0$
(d) $V_n = n(1 - U_n) \to_D V \sim$ Exponential(1)
(e) $1 - e^{-V_n} \to_D 1 - e^{-V} \sim$ Uniform$(0, 1)$
(f) $(U_n + 1)^2\left[n(1 - U_n)\right] \to_D 4V \sim$ Exponential(4)
Solution
(a) Since $X_1, X_2, \ldots, X_n$ are Uniform$(0, 1)$ random variables then for $i = 1, 2, \ldots$
$$P(X_i \le x) = \begin{cases} 0 & x \le 0 \\ x & 0 < x < 1 \\ 1 & x \ge 1 \end{cases}$$
Since $X_1, X_2, \ldots, X_n$ are independent random variables
$$F_n(u) = P(U_n \le u) = P(\max(X_1, X_2, \ldots, X_n) \le u) = \prod_{i=1}^{n} P(X_i \le u) = \begin{cases} 0 & u \le 0 \\ u^n & 0 < u < 1 \\ 1 & u \ge 1 \end{cases}$$
Therefore
$$\lim_{n\to\infty} F_n(u) = \begin{cases} 0 & u < 1 \\ 1 & u \ge 1 \end{cases}$$
and by Theorem 5.2.5
$$U_n = \max(X_1, X_2, \ldots, X_n) \to_p 1 \tag{5.12}$$
(b) By (5.12) and 5.5.1(1)
$$e^{U_n} \to_p e^{1} = e$$
(c) By (5.12) and 5.5.1(1)
$$\sin(1 - U_n) \to_p \sin(1 - 1) = \sin(0) = 0$$
(d) The cumulative distribution function of $V_n = n(1 - U_n)$ is
$$G_n(v) = P(V_n \le v) = P(n(1 - \max(X_1, X_2, \ldots, X_n)) \le v) = P\!\left(\max(X_1, X_2, \ldots, X_n) \ge 1 - \frac{v}{n}\right)$$
$$= 1 - P\!\left(\max(X_1, X_2, \ldots, X_n) \le 1 - \frac{v}{n}\right) = 1 - \prod_{i=1}^{n} P\!\left(X_i \le 1 - \frac{v}{n}\right) = \begin{cases} 0 & v \le 0 \\ 1 - \left(1 - \frac{v}{n}\right)^{n} & v > 0 \end{cases}$$
Therefore
$$\lim_{n\to\infty} G_n(v) = \begin{cases} 0 & v \le 0 \\ 1 - e^{-v} & v > 0 \end{cases}$$
which is the cumulative distribution function of an Exponential(1) random variable. Therefore by Definition 5.1.1
$$V_n = n(1 - U_n) \to_D V \sim \text{Exponential}(1) \tag{5.13}$$
(e) By (5.13) and Slutsky's Theorem
$$1 - e^{-V_n} \to_D 1 - e^{-V} \quad \text{where } V \sim \text{Exponential}(1)$$
The probability density function of $W = 1 - e^{-V}$ is
$$g(w) = e^{\log(1 - w)}\left|\frac{d}{dw}\left[-\log(1 - w)\right]\right| = 1 \quad \text{for } 0 < w < 1$$
which is the probability density function of a Uniform$(0, 1)$ random variable. Therefore
$$1 - e^{-V_n} \to_D 1 - e^{-V} \sim \text{Uniform}(0, 1)$$
(f) By (5.12) and 5.5.1(1)
$$(U_n + 1)^2 \to_p (1 + 1)^2 = 4 \tag{5.14}$$
By (5.13), (5.14), and Slutsky's Theorem
$$(U_n + 1)^2\left[n(1 - U_n)\right] \to_D 4V$$
where $V \sim$ Exponential(1). If $V \sim$ Exponential(1) then $4V \sim$ Exponential(4) so
$$(U_n + 1)^2\left[n(1 - U_n)\right] \to_D 4V \sim \text{Exponential}(4)$$
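The sketch below (not from the notes; numpy, the seed and the chosen n and number of replicates are assumptions) compares the simulated distribution of $V_n = n(1 - U_n)$ in part (d) with the Exponential(1) cumulative distribution function.

```python
# Illustrative simulation for Example 5.5.5(d) (not from the notes).
import numpy as np

rng = np.random.default_rng(11)             # arbitrary seed
n, nsim = 100, 50_000
u = rng.uniform(size=(nsim, n)).max(axis=1)  # U_n = max of n Uniform(0,1) variables
v = n * (1.0 - u)
for t in [0.5, 1.0, 2.0]:
    print(f"P(V_n <= {t}) approx {np.mean(v <= t):.4f}  vs  1 - e^-t = {1 - np.exp(-t):.4f}")
```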
5.5.6 Delta Method
Let $X_1, X_2, \ldots, X_n, \ldots$ be a sequence of random variables such that
$$n^{b}(X_n - a) \to_D X \tag{5.15}$$
for some $b > 0$. Suppose the function $g(x)$ is differentiable at a and $g'(a) \ne 0$. Then
$$n^{b}\left[g(X_n) - g(a)\right] \to_D g'(a)X$$
Proof
By Taylor's Theorem (2.11.15) we have
$$g(X_n) = g(a) + g'(c_n)(X_n - a)$$
or
$$g(X_n) - g(a) = g'(c_n)(X_n - a) \tag{5.16}$$
where $c_n$ is between a and $X_n$.
From (5.15) it follows that $X_n \to_p a$. Since $c_n$ is between $X_n$ and a, therefore $c_n \to_p a$ and by 5.5.1(1)
$$g'(c_n) \to_p g'(a) \tag{5.17}$$
Multiplying (5.16) by $n^{b}$ gives
$$n^{b}\left[g(X_n) - g(a)\right] = g'(c_n)\,n^{b}(X_n - a) \tag{5.18}$$
Therefore by (5.15), (5.17), (5.18) and Slutsky's Theorem
$$n^{b}\left[g(X_n) - g(a)\right] \to_D g'(a)X$$
5.5.7 Example
Suppose $X_i \sim$ Exponential$(\theta)$, $i = 1, 2, \ldots$ independently. Find the limiting distributions of each of the following:
(a) $\bar{X}_n$
(b) $U_n = \sqrt{n}\left(\bar{X}_n - \theta\right)$
(c) $Z_n = \dfrac{\sqrt{n}\left(\bar{X}_n - \theta\right)}{\bar{X}_n}$
(d) $V_n = \sqrt{n}\left[\log\left(\bar{X}_n\right) - \log\theta\right]$
Solution
(a) Since $X_1, X_2, \ldots$ are independent and identically distributed random variables with $E(X_i) = \theta$ and $Var(X_i) = \theta^2$, then by the Weak Law of Large Numbers
$$\bar{X}_n \to_p \theta \tag{5.19}$$
(b) Since $X_1, X_2, \ldots$ are independent and identically distributed random variables with $E(X_i) = \theta$ and $Var(X_i) = \theta^2$, then by the Central Limit Theorem
$$W_n = \frac{\sqrt{n}\left(\bar{X}_n - \theta\right)}{\theta} \to_D Z \sim N(0, 1) \tag{5.20}$$
Therefore by Slutsky's Theorem
$$U_n = \sqrt{n}\left(\bar{X}_n - \theta\right) \to_D \theta Z \sim N\!\left(0, \theta^2\right) \tag{5.21}$$
(c) $Z_n$ can be written as
$$Z_n = \frac{\sqrt{n}\left(\bar{X}_n - \theta\right)}{\bar{X}_n} = \frac{\dfrac{\sqrt{n}\left(\bar{X}_n - \theta\right)}{\theta}}{\dfrac{\bar{X}_n}{\theta}}$$
By (5.19) and 5.5.1(1),
$$\frac{\bar{X}_n}{\theta} \to_p \frac{\theta}{\theta} = 1 \tag{5.22}$$
By (5.20), (5.22), and Slutsky's Theorem
$$Z_n \to_D \frac{Z}{1} = Z \sim N(0, 1)$$
(d) Let $g(x) = \log x$, $a = \theta$, and $b = 1/2$. Then $g'(x) = \frac{1}{x}$ and $g'(a) = g'(\theta) = \frac{1}{\theta}$. By (5.21) and the Delta Method
$$n^{1/2}\left[\log\left(\bar{X}_n\right) - \log\theta\right] \to_D \frac{1}{\theta}(\theta Z) = Z \sim N(0, 1)$$
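A numerical check of part (d) (not part of the notes; numpy and the chosen constants are assumptions): $\sqrt{n}\left[\log(\bar{X}_n) - \log\theta\right]$ should have mean near 0 and variance near 1.

```python
# Illustrative check of the Delta Method result in Example 5.5.7(d) (not from the notes).
import numpy as np

rng = np.random.default_rng(42)             # arbitrary seed
theta, n, nsim = 3.0, 400, 20_000
xbar = rng.exponential(scale=theta, size=(nsim, n)).mean(axis=1)
v = np.sqrt(n) * (np.log(xbar) - np.log(theta))
print("mean:", v.mean(), " variance:", v.var())   # both should be close to 0 and 1
```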
5.5.8 Exercise
Suppose $X_i \sim$ Poisson$(\theta)$, $i = 1, 2, \ldots$ independently. Show that
$$U_n = \sqrt{n}\left(\bar{X}_n - \theta\right) \to_D U \sim N(0, \theta)$$
and
$$V_n = \sqrt{n}\left(\sqrt{\bar{X}_n} - \sqrt{\theta}\right) \to_D V \sim N\!\left(0, \frac{1}{4}\right)$$
5.5.9 Theorem
Let $X_1, X_2, \ldots, X_n, \ldots$ be a sequence of random variables such that
$$\sqrt{n}(X_n - a) \to_D X \sim N\!\left(0, \sigma^2\right) \tag{5.23}$$
Suppose the function $g(x)$ is differentiable at a. Then
$$\sqrt{n}\left[g(X_n) - g(a)\right] \to_D W \sim N\!\left(0, \left[g'(a)\right]^2\sigma^2\right)$$
provided $g'(a) \ne 0$.
Proof
Suppose $g(x)$ is a differentiable function at a and $g'(a) \ne 0$. Let $b = 1/2$. Then by (5.23) and the Delta Method it follows that
$$\sqrt{n}\left[g(X_n) - g(a)\right] \to_D W \sim N\!\left(0, \left[g'(a)\right]^2\sigma^2\right)$$
5.6 Chapter 5 Problems
1. Suppose $Y_i \sim$ Exponential$(\theta, 1)$, $i = 1, 2, \ldots$ independently. Find the limiting distributions of
(a) $X_n = \min(Y_1, Y_2, \ldots, Y_n)$
(b) $U_n = X_n/\theta$
(c) $V_n = n(X_n - \theta)$
(d) $W_n = n^2(X_n - \theta)$
2. Suppose $X_1, X_2, \ldots, X_n$ are independent and identically distributed continuous random variables with cumulative distribution function $F(x)$ and probability density function $f(x)$. Let $Y_n = \max(X_1, X_2, \ldots, X_n)$. Show that
$$Z_n = n\left[1 - F(Y_n)\right] \to_D Z \sim \text{Exponential}(1)$$
3. Suppose $X_i \sim$ Poisson$(\theta)$, $i = 1, 2, \ldots$ independently. Find $M_n(t)$, the moment generating function of
$$Y_n = \frac{\sqrt{n}\left(\bar{X}_n - \theta\right)}{\sqrt{\theta}}$$
Show that
$$\lim_{n\to\infty} \log M_n(t) = \frac{1}{2}t^2$$
What is the limiting distribution of $Y_n$?
4. Suppose $X_i \sim$ Exponential$(\theta)$, $i = 1, 2, \ldots$ independently. Show that the moment generating function of
$$Z_n = \frac{\sum_{i=1}^{n} X_i - n\theta}{\theta\sqrt{n}}$$
is
$$M_n(t) = \left[\frac{e^{-t/\sqrt{n}}}{1 - t/\sqrt{n}}\right]^{n}$$
Find $\lim_{n\to\infty} M_n(t)$ and thus determine the limiting distribution of $Z_n$.
5. If $Z \sim N(0, 1)$ and $W_n \sim \chi^2(n)$ independently then we know
$$T_n = \frac{Z}{\sqrt{W_n/n}} \sim t(n)$$
Show that
$$T_n \to_D Y \sim N(0, 1)$$
6. Suppose $X_1, X_2, \ldots$ are independent and identically distributed random variables with $E(X_i) = \mu$, $Var(X_i) = \sigma^2 < \infty$, and $E(X_i^4) < \infty$. Let
$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad S_n^2 = \frac{1}{n - 1}\sum_{i=1}^{n}\left(X_i - \bar{X}_n\right)^2$$
and
$$T_n = \frac{\sqrt{n}\left(\bar{X}_n - \mu\right)}{S_n}$$
Show that
$$S_n \to_p \sigma$$
and
$$T_n \to_D Z \sim N(0, 1)$$
7. Let $X_n \sim$ Binomial$(n, \theta)$. Find the limiting distributions of
(a) $T_n = \dfrac{X_n}{n}$
(b) $U_n = \dfrac{X_n}{n}\left(1 - \dfrac{X_n}{n}\right)$
(c) $W_n = \sqrt{n}\left(\dfrac{X_n}{n} - \theta\right)$
(d) $Z_n = \dfrac{W_n}{\sqrt{U_n}}$
(e) $V_n = \sqrt{n}\left(\arcsin\sqrt{\dfrac{X_n}{n}} - \arcsin\sqrt{\theta}\right)$
(f) Compare the variances of the limiting distributions of $W_n$, $Z_n$ and $V_n$ and comment.
8. Suppose $X_i \sim$ Geometric$(\theta)$, $i = 1, 2, \ldots$ independently. Let
$$Y_n = \sum_{i=1}^{n} X_i$$
Find the limiting distributions of
(a) $\bar{X}_n = \dfrac{Y_n}{n}$
(b) $W_n = \sqrt{n}\left(\bar{X}_n - \dfrac{1 - \theta}{\theta}\right)$
(c) $V_n = \dfrac{1}{1 + \bar{X}_n}$
(d) $Z_n = \dfrac{\sqrt{n}\left(V_n - \theta\right)}{\sqrt{V_n^2\left(1 - V_n\right)}}$
9. Suppose $X_i \sim$ Gamma$(2, \theta)$, $i = 1, 2, \ldots$ independently. Let
$$Y_n = \sum_{i=1}^{n} X_i$$
Find the limiting distributions of
(a) $\bar{X}_n = \dfrac{Y_n}{n}$ and $\sqrt{2\bar{X}_n}/\sqrt{2\theta}$
(b) $W_n = \dfrac{\sqrt{n}\left(\bar{X}_n - 2\theta\right)}{\sqrt{2\theta^2}}$ and $V_n = \sqrt{n}\left(\bar{X}_n - 2\theta\right)$
(c) $Z_n = \dfrac{\sqrt{n}\left(\bar{X}_n - 2\theta\right)}{\bar{X}_n/\sqrt{2}}$
(d) $U_n = \sqrt{n}\left[\log\left(\bar{X}_n\right) - \log(2\theta)\right]$
(e) Compare the variances of the limiting distributions of $Z_n$ and $U_n$.
10. Suppose $X_1, X_2, \ldots, X_n, \ldots$ is a sequence of random variables with
$$E(X_n) = \theta$$
and
$$Var(X_n) = \frac{a}{n^{p}} \quad \text{for } p > 0$$
Show that
$$X_n \to_p \theta$$
6. Maximum Likelihood
Estimation - One Parameter
In this chapter we look at the method of maximum likelihood to obtain both point and interval estimates of one unknown parameter. Some of this material was introduced in a previous statistics course such as STAT 221/231/241.
In Section 6.2 we review the definitions needed for the method of maximum likelihood estimation, the derivations of the maximum likelihood estimates for the unknown parameter in the Binomial, Poisson and Exponential models, and the important invariance property of maximum likelihood estimates. You will notice that we pay more attention to verifying that the maximum likelihood estimate does correspond to a maximum using the first derivative test. Example 6.2.9 is new and illustrates how the maximum likelihood estimate is found when the support set of the random variable depends on the unknown parameter.
In Section 6.3 we define the score function, the information function, and the expected information function. These functions play an important role in the distribution of the maximum likelihood estimator. These functions are also used in Newton's Method which is a method for determining the maximum likelihood estimate in cases where there is no explicit solution. Although the maximum likelihood estimates in nearly all the examples you saw previously could be found explicitly, this is not true in general.
In Section 6.4 we review likelihood intervals. Likelihood intervals provide a way to summarize the uncertainty in an estimate. In Section 6.5 we give a theorem on the limiting distribution of the maximum likelihood estimator. This important theorem tells us why maximum likelihood estimators are good estimators.
In Section 6.6 we review how to find a confidence interval using a pivotal quantity. Confidence intervals also give us a way to summarize the uncertainty in an estimate. We also give a theorem on how to obtain a pivotal quantity using the maximum likelihood estimator if the parameter is either a scale or location parameter. In Section 6.7 we review how to find an approximate confidence interval using an asymptotic pivotal quantity. We then show how to use asymptotic pivotal quantities based on the limiting distribution of the maximum likelihood estimator to construct approximate confidence intervals.
6.1 Introduction
Suppose the random variable X (possibly a vector of random variables) has probability function/probability density function $f(x; \theta)$. Suppose also that $\theta$ is unknown and $\theta \in \Omega$ where $\Omega$ is the parameter space or the set of possible values of $\theta$. Let X be the potential data that is to be collected. In your previous statistics course you learned how numerical and graphical summaries as well as goodness of fit tests could be used to check whether the assumed model for an observed set of data x was reasonable. In this course we will assume that the fit of the model has been checked and that the main focus now is to use the model and the data to determine point and interval estimates of $\theta$.
6.1.1 Definition - Statistic
A statistic, $T = T(X)$, is a function of the data X which does not depend on any unknown parameters.
6.1.2 Example
Suppose $X = (X_1, X_2, \ldots, X_n)$ is a random sample, that is, $X_1, X_2, \ldots, X_n$ are independent and identically distributed random variables, from a distribution with $E(X_i) = \mu$ and $Var(X_i) = \sigma^2$ where $\mu$ and $\sigma^2$ are unknown.
The sample mean $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$, the sample variance $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$, and the sample minimum $X_{(1)} = \min(X_1, X_2, \ldots, X_n)$ are statistics.
The random variable
$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
is not a statistic since it is not only a function of the data X but also a function of the unknown parameters $\mu$ and $\sigma^2$.
6.1.3 Definition - Estimator and Estimate
A statistic $T = T(X)$ that is used to estimate $\tau(\theta)$, a function of $\theta$, is called an estimator of $\tau(\theta)$ and an observed value of the statistic $t = t(x)$ is called an estimate of $\tau(\theta)$.
6.1.4 Example
Suppose $X = (X_1, X_2, \ldots, X_n)$ are independent and identically distributed random variables from a distribution with $E(X_i) = \mu$ and $Var(X_i) = \sigma^2$, $i = 1, 2, \ldots, n$. Suppose $x = (x_1, x_2, \ldots, x_n)$ is an observed random sample from this distribution.
The random variable $\bar{X}$ is an estimator of $\mu$. The number $\bar{x}$ is an estimate of $\mu$.
The random variable S is an estimator of $\sigma$. The number s is an estimate of $\sigma$.
6.2 Maximum Likelihood Method
Suppose X is a discrete random variable with probability function $P(X = x; \theta) = f(x; \theta)$, $\theta \in \Omega$, where the scalar parameter $\theta$ is unknown. Suppose also that x is an observed value of the random variable X. Then the probability of observing this value is $P(X = x; \theta) = f(x; \theta)$. With the observed value of x substituted into $f(x; \theta)$ we have a function of the parameter $\theta$ only, referred to as the likelihood function and denoted $L(\theta; x)$ or $L(\theta)$. In the absence of any other information, it seems logical that we should estimate the parameter $\theta$ using a value most compatible with the data. For example we might choose the value of $\theta$ which maximizes the probability of the observed data or equivalently the value of $\theta$ which maximizes the likelihood function.
6.2.1 Definition - Likelihood Function: Discrete Case
Suppose X is a discrete random variable with probability function $f(x; \theta)$, where $\theta$ is a scalar and $\theta \in \Omega$, and x is an observed value of X. The likelihood function for $\theta$ based on the observed data x is
$$L(\theta) = L(\theta; x) = P(\text{observing the data } x; \theta) = P(X = x; \theta) = f(x; \theta) \quad \text{for } \theta \in \Omega$$
If $X = (X_1, X_2, \ldots, X_n)$ is a random sample from a distribution with probability function $f(x; \theta)$ and $x = (x_1, x_2, \ldots, x_n)$ are the observed data, then the likelihood function for $\theta$ based on the observed data x is
$$L(\theta) = L(\theta; x) = P(\text{observing the data } x; \theta) = P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n; \theta) = \prod_{i=1}^{n} f(x_i; \theta) \quad \text{for } \theta \in \Omega$$
6.2.2 Definition - Maximum Likelihood Estimate and Estimator
The value of $\theta$ that maximizes the likelihood function $L(\theta)$ is called the maximum likelihood estimate. The maximum likelihood estimate is a function of the observed data x and we write $\hat{\theta} = \hat{\theta}(x)$. The corresponding maximum likelihood estimator, which is a random variable, is denoted by $\tilde{\theta} = \tilde{\theta}(X)$.
The shape of the likelihood function and the value of $\theta$ at which it is maximized are not affected if $L(\theta)$ is multiplied by a constant. Indeed it is not the absolute value of the likelihood function that is important but the relative values at two different values of the parameter, for example, $L(\theta_1)/L(\theta_2)$. This ratio can be interpreted as how much more or less consistent the data are with the parameter $\theta_1$ as compared to $\theta_2$. The ratio $L(\theta_1)/L(\theta_2)$ is also unaffected if $L(\theta)$ is multiplied by a constant. In view of this the likelihood may be defined as $P(X = x; \theta)$ or as any constant multiple of it.
To find the maximum likelihood estimate we usually solve the equation $\frac{d}{d\theta}L(\theta) = 0$. If the value of $\theta$ which maximizes $L(\theta)$ occurs at an endpoint of $\Omega$, then of course solving this equation does not provide the value at which the likelihood is maximized. In Example 6.2.9 we see that solving $\frac{d}{d\theta}L(\theta) = 0$ does not give the maximum likelihood estimate. Most often, however, we will find $\hat{\theta}$ by solving $\frac{d}{d\theta}L(\theta) = 0$.
Since the log function (Note: $\log = \ln$) is an increasing function, the value of $\theta$ which maximizes the likelihood $L(\theta)$ also maximizes $\log L(\theta)$, the logarithm of the likelihood function. Since it is usually simpler to find the derivative of the sum of n terms rather than the product, it is often easier to determine the maximum likelihood estimate of $\theta$ by solving $\frac{d}{d\theta}\log L(\theta) = 0$.
It is important to verify that $\hat{\theta}$ is the value of $\theta$ which maximizes $L(\theta)$ or equivalently $l(\theta)$. This can be done using the first derivative test. Recall that the second derivative test only checks for a local extremum.
6.2.3 Definition - Log Likelihood Function
The log likelihood function is defined as
$$l(\theta) = l(\theta; x) = \log L(\theta) \quad \text{for } \theta \in \Omega$$
where x are the observed data and log is the natural logarithmic function.
6.2.4 Example
Suppose in a sequence of n Bernoulli trials the probability of success is equal to $\theta$ and we have observed x successes. Find the likelihood function, the log likelihood function, the maximum likelihood estimate of $\theta$ and the maximum likelihood estimator of $\theta$.
Solution
The likelihood function for $\theta$ based on x successes in n trials is
$$L(\theta) = P(X = x; \theta) = \binom{n}{x}\theta^{x}(1 - \theta)^{n - x} \quad \text{for } 0 \le \theta \le 1$$
or more simply
$$L(\theta) = \theta^{x}(1 - \theta)^{n - x} \quad \text{for } 0 \le \theta \le 1$$
Suppose $x \ne 0$ and $x \ne n$. The log likelihood function is
$$l(\theta) = x\log\theta + (n - x)\log(1 - \theta) \quad \text{for } 0 < \theta < 1$$
with derivative
$$\frac{d}{d\theta}l(\theta) = \frac{x}{\theta} - \frac{n - x}{1 - \theta} = \frac{x(1 - \theta) - \theta(n - x)}{\theta(1 - \theta)} = \frac{x - n\theta}{\theta(1 - \theta)} \quad \text{for } 0 < \theta < 1$$
The solution to $\frac{d}{d\theta}l(\theta) = 0$ is $\theta = x/n$ which is the sample proportion. Since $\frac{d}{d\theta}l(\theta) > 0$ if $0 < \theta < x/n$ and $\frac{d}{d\theta}l(\theta) < 0$ if $x/n < \theta < 1$ then, by the first derivative test, $l(\theta)$ has an absolute maximum at $\theta = x/n$.
If $x = 0$ then
$$L(\theta) = (1 - \theta)^{n} \quad \text{for } 0 \le \theta \le 1$$
which is a decreasing function of $\theta$ on the interval $[0, 1]$. $L(\theta)$ is maximized at the endpoint $\theta = 0 = 0/n$.
If $x = n$ then
$$L(\theta) = \theta^{n} \quad \text{for } 0 \le \theta \le 1$$
which is an increasing function of $\theta$ on the interval $[0, 1]$. $L(\theta)$ is maximized at the endpoint $\theta = 1 = n/n$.
In all cases the value of $\theta$ which maximizes the likelihood function is the sample proportion $\theta = x/n$. Therefore the maximum likelihood estimate of $\theta$ is $\hat{\theta} = x/n$ and the maximum likelihood estimator is $\tilde{\theta} = X/n$.
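A short numerical illustration (not from the notes; the values of n and x are arbitrary) confirming that maximizing the Binomial log likelihood over a fine grid of $\theta$ values recovers the closed form estimate $\hat{\theta} = x/n$.

```python
# Grid maximization of the Binomial log likelihood (illustrative sketch, not from the notes).
import numpy as np

n, x = 25, 9
theta = np.linspace(0.001, 0.999, 9999)
loglik = x * np.log(theta) + (n - x) * np.log(1 - theta)   # l(theta)
print("grid maximizer  :", theta[np.argmax(loglik)])
print("closed form x/n :", x / n)
```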
6.2.5 Example
Suppose $x_1, x_2, \ldots, x_n$ is an observed random sample from the Poisson$(\theta)$ distribution. Find the likelihood function, the log likelihood function, the maximum likelihood estimate of $\theta$ and the maximum likelihood estimator of $\theta$.
Solution
The likelihood function is
$$L(\theta) = \prod_{i=1}^{n} f(x_i; \theta) = \prod_{i=1}^{n} P(X_i = x_i; \theta) = \prod_{i=1}^{n} \frac{\theta^{x_i}e^{-\theta}}{x_i!} = \left(\prod_{i=1}^{n}\frac{1}{x_i!}\right)\theta^{\sum_{i=1}^{n} x_i}e^{-n\theta} \quad \text{for } \theta \ge 0$$
or more simply
$$L(\theta) = \theta^{n\bar{x}}e^{-n\theta} \quad \text{for } \theta \ge 0$$
Suppose $\bar{x} \ne 0$. The log likelihood function is
$$l(\theta) = n(\bar{x}\log\theta - \theta) \quad \text{for } \theta > 0$$
with derivative
$$\frac{d}{d\theta}l(\theta) = n\left(\frac{\bar{x}}{\theta} - 1\right) = \frac{n}{\theta}(\bar{x} - \theta) \quad \text{for } \theta > 0$$
The solution to $\frac{d}{d\theta}l(\theta) = 0$ is $\theta = \bar{x}$ which is the sample mean. Since $\frac{d}{d\theta}l(\theta) > 0$ if $0 < \theta < \bar{x}$ and $\frac{d}{d\theta}l(\theta) < 0$ if $\theta > \bar{x}$ then, by the first derivative test, $l(\theta)$ has an absolute maximum at $\theta = \bar{x}$.
If $\bar{x} = 0$ then
$$L(\theta) = e^{-n\theta} \quad \text{for } \theta \ge 0$$
which is a decreasing function of $\theta$ on the interval $[0, \infty)$. $L(\theta)$ is maximized at the endpoint $\theta = 0 = \bar{x}$.
In all cases the value of $\theta$ which maximizes the likelihood function is the sample mean $\theta = \bar{x}$. Therefore the maximum likelihood estimate of $\theta$ is $\hat{\theta} = \bar{x}$ and the maximum likelihood estimator is $\tilde{\theta} = \bar{X}$.
6.2.6 Likelihood Functions for Continuous Models
Suppose X is a continuous random variable with probability density function $f(x; \theta)$. For a continuous random variable, $P(X = x; \theta)$ is unsuitable as a definition of the likelihood function since this probability always equals zero.
For continuous data we usually observe only the value of X rounded to some degree of precision; for example, data on waiting times are rounded to the closest second or data on heights are rounded to the closest centimeter. The actual observation is really a discrete random variable. For example, suppose we observe X correct to one decimal place. Then
$$P(\text{we observe } 1.1; \theta) = \int_{1.05}^{1.15} f(x; \theta)\,dx \approx (0.1)f(1.1; \theta)$$
assuming the function $f(x; \theta)$ is reasonably smooth over the interval. More generally, suppose $x_1, x_2, \ldots, x_n$ are the observations from a random sample from the distribution with probability density function $f(x; \theta)$ which have been rounded to the nearest $\Delta$, which is assumed to be small. Then
$$P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n; \theta) \approx \prod_{i=1}^{n} \Delta f(x_i; \theta) = \Delta^{n}\prod_{i=1}^{n} f(x_i; \theta)$$
If we assume that the precision $\Delta$ does not depend on the unknown parameter $\theta$, then the term $\Delta^{n}$ can be ignored. This argument leads us to adopt the following definition of the likelihood function for a random sample from a continuous distribution.
6.2.7 Definition - Likelihood Function: Continuous Case
If $x = (x_1, x_2, \ldots, x_n)$ are the observed values of a random sample from a distribution with probability density function $f(x; \theta)$, then the likelihood function is defined as
$$L(\theta) = L(\theta; x) = \prod_{i=1}^{n} f(x_i; \theta) \quad \text{for } \theta \in \Omega$$
6.2.8 Example
Suppose $x_1, x_2, \ldots, x_n$ is an observed random sample from the Exponential$(\theta)$ distribution. Find the likelihood function, the log likelihood function, the maximum likelihood estimate of $\theta$ and the maximum likelihood estimator of $\theta$.
Solution
The likelihood function is
$$L(\theta) = \prod_{i=1}^{n} f(x_i; \theta) = \prod_{i=1}^{n} \frac{1}{\theta}e^{-x_i/\theta} = \frac{1}{\theta^{n}}\exp\!\left(-\sum_{i=1}^{n} x_i/\theta\right) = \theta^{-n}e^{-n\bar{x}/\theta} \quad \text{for } \theta > 0$$
The log likelihood function is
$$l(\theta) = -n\left(\log\theta + \frac{\bar{x}}{\theta}\right) \quad \text{for } \theta > 0$$
with derivative
$$\frac{d}{d\theta}l(\theta) = -n\left(\frac{1}{\theta} - \frac{\bar{x}}{\theta^2}\right) = \frac{n}{\theta^2}(\bar{x} - \theta)$$
Now $\frac{d}{d\theta}l(\theta) = 0$ for $\theta = \bar{x}$. Since $\frac{d}{d\theta}l(\theta) > 0$ if $0 < \theta < \bar{x}$ and $\frac{d}{d\theta}l(\theta) < 0$ if $\theta > \bar{x}$ then, by the first derivative test, $l(\theta)$ has an absolute maximum at $\theta = \bar{x}$. Therefore the maximum likelihood estimate of $\theta$ is $\hat{\theta} = \bar{x}$ and the maximum likelihood estimator is $\tilde{\theta} = \bar{X}$.
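Newton's Method, mentioned in the chapter introduction for cases with no explicit solution, can be illustrated on this example even though a closed form exists. The Python sketch below is not from the notes; the simulated data, seed and starting value are arbitrary choices. It iterates $\theta \leftarrow \theta - l'(\theta)/l''(\theta)$ using the first and second derivatives of the Exponential log likelihood and converges to $\hat{\theta} = \bar{x}$.

```python
# Newton's method sketch for the Exponential log likelihood (illustrative, not from the notes).
import numpy as np

rng = np.random.default_rng(3)                 # arbitrary seed
x = rng.exponential(scale=2.0, size=50)        # simulated data; true theta = 2 is an assumption
n, xbar = len(x), x.mean()

def dl(th):                                    # l'(theta) = -n(1/theta - xbar/theta^2)
    return -n * (1.0 / th - xbar / th**2)

def d2l(th):                                   # l''(theta) = -n(-1/theta^2 + 2*xbar/theta^3)
    return -n * (-1.0 / th**2 + 2.0 * xbar / th**3)

th = 1.0                                       # arbitrary starting value
for _ in range(10):
    th = th - dl(th) / d2l(th)                 # Newton update
print("Newton estimate :", th)
print("closed form xbar:", xbar)
```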
6.2.9 Example
Suppose $x_1, x_2, \ldots, x_n$ is an observed random sample from the Uniform$(0, \theta)$ distribution. Find the likelihood function, the maximum likelihood estimate of $\theta$ and the maximum likelihood estimator of $\theta$.
Solution
The probability density function of a Uniform$(0, \theta)$ random variable is
$$f(x; \theta) = \frac{1}{\theta} \quad \text{for } 0 \le x \le \theta$$
and zero otherwise. The support set of the random variable X is $[0, \theta]$ which depends on the unknown parameter $\theta$. In such examples care must be taken in determining the maximum likelihood estimate of $\theta$.
The likelihood function is
$$L(\theta) = \prod_{i=1}^{n} f(x_i; \theta) = \prod_{i=1}^{n} \frac{1}{\theta} = \frac{1}{\theta^{n}} \quad \text{if } 0 \le x_i \le \theta,\ i = 1, 2, \ldots, n,\text{ and } \theta > 0$$
To determine the value of $\theta$ which maximizes $L(\theta)$ we note that $L(\theta)$ can be written as
$$L(\theta) = \begin{cases} 0 & \text{if } 0 < \theta < x_{(n)} \\ \dfrac{1}{\theta^{n}} & \text{if } \theta \ge x_{(n)} \end{cases}$$
where $x_{(n)} = \max(x_1, x_2, \ldots, x_n)$ is the maximum of the sample. To see this remember that in order to observe the sample $x_1, x_2, \ldots, x_n$ the value of $\theta$ must be at least as large as all the observed $x_i$'s.
$L(\theta)$ is a decreasing function of $\theta$ on the interval $[x_{(n)}, \infty)$. Therefore $L(\theta)$ is maximized at $\theta = x_{(n)}$. The maximum likelihood estimate of $\theta$ is $\hat{\theta} = x_{(n)}$ and the maximum likelihood estimator is $\tilde{\theta} = X_{(n)}$.
Note: In this example there is no solution to $\frac{d}{d\theta}l(\theta) = \frac{d}{d\theta}(-n\log\theta) = 0$ and the maximum likelihood estimate of $\theta$ is not found by solving $\frac{d}{d\theta}l(\theta) = 0$.
One of the reasons the method of maximum likelihood is so widely used is the invariance property of the maximum likelihood estimate under one-to-one transformations.
6.2.10 Theorem - Invariance of the Maximum Likelihood Estimate
If $\hat{\theta}$ is the maximum likelihood estimate of $\theta$ then $g(\hat{\theta})$ is the maximum likelihood estimate of $g(\theta)$.
Note: The invariance property of the maximum likelihood estimate means that if we know the maximum likelihood estimate of $\theta$ then we know the maximum likelihood estimate of any function of $\theta$.
6.2.11 Example
In Example 6.2.8 find the maximum likelihood estimate of the median of the distribution and the maximum likelihood estimate of $Var(\tilde{\theta})$.
Solution
If X has an Exponential$(\theta)$ distribution then the median m is found by solving
$$0.5 = \int_{0}^{m} \frac{1}{\theta}e^{-x/\theta}\,dx$$
to obtain
$$m = -\theta\log(0.5)$$
By the Invariance of the Maximum Likelihood Estimate the maximum likelihood estimate of m is $\hat{m} = -\hat{\theta}\log(0.5) = -\bar{x}\log(0.5)$.
Since $X_i$ has an Exponential$(\theta)$ distribution with $Var(X_i) = \theta^2$, $i = 1, 2, \ldots, n$ independently, the variance of the maximum likelihood estimator $\tilde{\theta} = \bar{X}$ is
$$Var\left(\tilde{\theta}\right) = Var\left(\bar{X}\right) = \frac{\theta^2}{n}$$
By the Invariance of the Maximum Likelihood Estimate the maximum likelihood estimate of $Var(\tilde{\theta})$ is $(\hat{\theta})^2/n = (\bar{x})^2/n$.
6.3 Score and Information Functions
The derivative of the log likelihood function plays an important role in the method of maximum likelihood. This function is often called the score function.
6.3.1 Definition - Score Function
The score function is defined as
$$S(\theta) = S(\theta; x) = \frac{d}{d\theta}l(\theta) = \frac{d}{d\theta}\log L(\theta) \quad \text{for } \theta \in \Omega$$
where x are the observed data.
Another function which plays an important role in the method of maximum likelihood is the information function.
6.3.2 Definition - Information Function
The information function is defined as
$$I(\theta) = I(\theta; x) = -\frac{d^2}{d\theta^2}l(\theta) = -\frac{d^2}{d\theta^2}\log L(\theta) \quad \text{for } \theta \in \Omega$$
where x are the observed data. $I(\hat{\theta})$ is called the observed information.
In Section 6.7 we will see how the observed information $I(\hat{\theta})$ can be used to construct an approximate confidence interval for the unknown parameter $\theta$. $I(\hat{\theta})$ also tells us about the concavity of the log likelihood function $l(\theta)$.
6.3.3 Example
Find the observed information for Example 6.2.5. Suppose the maximum likelihood estimate of $\theta$ was $\hat{\theta} = 2$. Compare $I(\hat{\theta}) = I(2)$ if $n = 10$ and $n = 25$. Plot the function $r(\theta) = l(\theta) - l(\hat{\theta})$ for $n = 10$ and $n = 25$ on the same graph.
Solution
From Example 6.2.5, the score function is
$$S(\theta) = \frac{d}{d\theta}l(\theta) = \frac{n}{\theta}(\bar{x} - \theta) \quad \text{for } \theta > 0$$
Therefore the information function is
$$I(\theta) = -\frac{d^2}{d\theta^2}l(\theta) = -\frac{d}{d\theta}\left[n\left(\frac{\bar{x}}{\theta} - 1\right)\right] = -\left(-\frac{n\bar{x}}{\theta^2}\right) = \frac{n\bar{x}}{\theta^2}$$
and the observed information is
$$I(\hat{\theta}) = \frac{n\bar{x}}{\hat{\theta}^2} = \frac{n\hat{\theta}}{\hat{\theta}^2} = \frac{n}{\hat{\theta}}$$
If $n = 10$ then $I(\hat{\theta}) = I(2) = n/\hat{\theta} = 10/2 = 5$. If $n = 25$ then $I(\hat{\theta}) = I(2) = 25/2 = 12.5$.
See Figure 6.1. The function $r(\theta) = l(\theta) - l(\hat{\theta})$ is more concave and symmetric for $n = 25$ than for $n = 10$. As the number of observations increases we have more "information" about the unknown parameter $\theta$.
Although we view the likelihood, log likelihood, score and information functions as functions of $\theta$, they are also functions of the observed data x. When it is important to emphasize the dependence on the data x we will write $L(\theta; x)$, $S(\theta; x)$, and $I(\theta; x)$. When we wish to determine the sampling distribution of the corresponding random variables we will write $L(\theta; X)$, $S(\theta; X)$, and $I(\theta; X)$.
[Figure 6.1: Poisson log likelihoods $r(\theta) = l(\theta) - l(\hat{\theta})$ for $n = 10$ and $n = 25$.]
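A sketch (not part of the notes; matplotlib and numpy are assumed to be available, and $\bar{x} = \hat{\theta} = 2$ is taken as in the example) that reproduces the idea behind Figure 6.1 by plotting $r(\theta) = l(\theta) - l(\hat{\theta})$ for the Poisson model with n = 10 and n = 25.

```python
# Relative log likelihood plot for the Poisson example (illustrative, not from the notes).
import numpy as np
import matplotlib.pyplot as plt

theta_hat = 2.0                                   # xbar = theta_hat = 2, as in Example 6.3.3
theta = np.linspace(1.0, 3.5, 400)
for n in [10, 25]:
    # l(theta) = n(xbar*log(theta) - theta); r(theta) = l(theta) - l(theta_hat)
    r = n * (theta_hat * np.log(theta) - theta) - n * (theta_hat * np.log(theta_hat) - theta_hat)
    plt.plot(theta, r, label=f"n = {n}")
plt.xlabel("theta"); plt.ylabel("r(theta)"); plt.legend(); plt.show()
```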
Here is one more function which plays an important role in the method of maximum likelihood.

6.3.4 Definition - Expected Information Function
If $\theta$ is a scalar then the expected information function is given by
$$J(\theta) = E[I(\theta; X)] = E\left[-\frac{d^2}{d\theta^2} l(\theta; X)\right] \quad \text{for } \theta \in \Omega$$
where $X$ is the potential data.

Note:
If $X = (X_1, X_2, \ldots, X_n)$ is a random sample from $f(x; \theta)$ then
$$J(\theta) = E\left[-\frac{d^2}{d\theta^2} l(\theta; X)\right] = nE\left[-\frac{d^2}{d\theta^2} \log f(X; \theta)\right] \quad \text{for } \theta \in \Omega$$
where $X$ has probability density function $f(x; \theta)$.
6.3.5 Example
For each of the following find the observed information $I(\hat{\theta})$ and the expected information $J(\theta)$. Compare $I(\hat{\theta})$ and $J(\hat{\theta})$. Determine the mean and variance of the maximum likelihood estimator $\tilde{\theta}$. Compare the expected information with the variance of the maximum likelihood estimator.
(a) Example 6.2.4 (Binomial)
(b) Example 6.2.5 (Poisson)
(c) Example 6.2.8 (Exponential)
Solution
(a) From Example 6.2.4, the score function based on $x$ successes in $n$ Bernoulli trials is
$$S(\theta) = \frac{d}{d\theta} l(\theta) = \frac{x - n\theta}{\theta(1-\theta)} \quad \text{for } 0 < \theta < 1$$
Therefore the information function is
$$I(\theta) = -\frac{d^2}{d\theta^2} l(\theta) = -\frac{d}{d\theta}\left[\frac{x}{\theta} - \frac{n-x}{1-\theta}\right] = -\left[-\frac{x}{\theta^2} - \frac{n-x}{(1-\theta)^2}\right] = \frac{x}{\theta^2} + \frac{n-x}{(1-\theta)^2} \quad \text{for } 0 < \theta < 1$$
Since the maximum likelihood estimate is $\hat{\theta} = \frac{x}{n}$ the observed information is
$$I(\hat{\theta}) = \frac{x}{\hat{\theta}^2} + \frac{n-x}{(1-\hat{\theta})^2} = \frac{x}{\left(\frac{x}{n}\right)^2} + \frac{n-x}{\left(1-\frac{x}{n}\right)^2} = n\left[\frac{\frac{x}{n}}{\left(\frac{x}{n}\right)^2} + \frac{1-\frac{x}{n}}{\left(1-\frac{x}{n}\right)^2}\right] = n\left[\frac{1}{\frac{x}{n}} + \frac{1}{1-\frac{x}{n}}\right] = \frac{n}{\frac{x}{n}\left(1-\frac{x}{n}\right)} = \frac{n}{\hat{\theta}(1-\hat{\theta})}$$
If $X$ has a Binomial($n, \theta$) distribution then $E(X) = n\theta$ and $Var(X) = n\theta(1-\theta)$. Therefore the expected information is
$$J(\theta) = E[I(\theta; X)] = E\left[\frac{X}{\theta^2} + \frac{n-X}{(1-\theta)^2}\right] = \frac{n\theta}{\theta^2} + \frac{n(1-\theta)}{(1-\theta)^2} = \frac{n[(1-\theta) + \theta]}{\theta(1-\theta)} = \frac{n}{\theta(1-\theta)} \quad \text{for } 0 < \theta < 1$$
We note that $I(\hat{\theta}) = J(\hat{\theta})$.
The maximum likelihood estimator is $\tilde{\theta} = X/n$ with
$$E(\tilde{\theta}) = E\left(\frac{X}{n}\right) = \frac{1}{n} E(X) = \frac{n\theta}{n} = \theta$$
and
$$Var(\tilde{\theta}) = Var\left(\frac{X}{n}\right) = \frac{1}{n^2} Var(X) = \frac{n\theta(1-\theta)}{n^2} = \frac{\theta(1-\theta)}{n} = \frac{1}{J(\theta)}$$
(b) From Example 6.3.3 the information function based on Poisson data $x_1, x_2, \ldots, x_n$ is
$$I(\theta) = \frac{n\bar{x}}{\theta^2} \quad \text{for } \theta > 0$$
Since the maximum likelihood estimate is $\hat{\theta} = \bar{x}$ the observed information is
$$I(\hat{\theta}) = \frac{n\bar{x}}{\hat{\theta}^2} = \frac{n}{\hat{\theta}}$$
Since $X_i$ has a Poisson($\theta$) distribution with $E(X_i) = Var(X_i) = \theta$ then $E(\bar{X}) = \theta$ and $Var(\bar{X}) = \theta/n$. Therefore the expected information is
$$J(\theta) = E[I(\theta; X_1, X_2, \ldots, X_n)] = E\left[\frac{n\bar{X}}{\theta^2}\right] = \frac{n}{\theta^2}(\theta) = \frac{n}{\theta} \quad \text{for } \theta > 0$$
We note that $I(\hat{\theta}) = J(\hat{\theta})$.
The maximum likelihood estimator is $\tilde{\theta} = \bar{X}$ with
$$E(\tilde{\theta}) = E(\bar{X}) = \theta$$
and
$$Var(\tilde{\theta}) = Var(\bar{X}) = \frac{\theta}{n} = \frac{1}{J(\theta)}$$
(c) From Example 6.2.8 the score function based on Exponential data $x_1, x_2, \ldots, x_n$ is
$$S(\theta) = \frac{d}{d\theta} l(\theta) = n\left(-\frac{1}{\theta} + \frac{\bar{x}}{\theta^2}\right) \quad \text{for } \theta > 0$$
Therefore the information function is
$$I(\theta) = -\frac{d^2}{d\theta^2} l(\theta) = -n\frac{d}{d\theta}\left(-\frac{1}{\theta} + \frac{\bar{x}}{\theta^2}\right) = n\left(-\frac{1}{\theta^2} + \frac{2\bar{x}}{\theta^3}\right) \quad \text{for } \theta > 0$$
Since the maximum likelihood estimate is $\hat{\theta} = \bar{x}$ the observed information is
$$I(\hat{\theta}) = n\left(-\frac{1}{\hat{\theta}^2} + \frac{2\hat{\theta}}{\hat{\theta}^3}\right) = \frac{n}{\hat{\theta}^2}$$
Since $X_i$ has an Exponential($\theta$) distribution with $E(X_i) = \theta$ and $Var(X_i) = \theta^2$ then $E(\bar{X}) = \theta$ and $Var(\bar{X}) = \theta^2/n$. Therefore the expected information is
$$J(\theta) = E[I(\theta; X_1, X_2, \ldots, X_n)] = nE\left(-\frac{1}{\theta^2} + \frac{2\bar{X}}{\theta^3}\right) = n\left(-\frac{1}{\theta^2} + \frac{2\theta}{\theta^3}\right) = \frac{n}{\theta^2} \quad \text{for } \theta > 0$$
We note that $I(\hat{\theta}) = J(\hat{\theta})$.
The maximum likelihood estimator is $\tilde{\theta} = \bar{X}$ with
$$E(\tilde{\theta}) = E(\bar{X}) = \theta$$
and
$$Var(\tilde{\theta}) = Var(\bar{X}) = \frac{\theta^2}{n} = \frac{1}{J(\theta)}$$
In all three examples we have $I(\hat{\theta}) = J(\hat{\theta})$, $E(\tilde{\theta}) = \theta$, and $Var(\tilde{\theta}) = [J(\theta)]^{-1}$.
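These identities can also be checked by simulation. The R sketch below is not part of the notes' examples; it uses the Exponential case with assumed values $\theta = 2$ and $n = 25$ and compares the simulated mean and variance of $\tilde{\theta} = \bar{X}$ with $\theta$ and $1/J(\theta) = \theta^2/n$.

# simulation check of E(thetatilde) and Var(thetatilde) for Exponential(theta) data
set.seed(1)
theta <- 2; n <- 25; nsim <- 10000             # assumed values for illustration
thetatilde <- replicate(nsim, mean(rexp(n, rate = 1/theta)))
c(mean(thetatilde), theta)                     # simulated vs exact mean
c(var(thetatilde), theta^2/n)                  # simulated variance vs 1/J(theta)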
In the three previous examples we observed that $E(\tilde{\theta}) = \theta$ and therefore $\tilde{\theta}$ was an unbiased estimator of $\theta$. This is not always true for maximum likelihood estimators as we see in the next example. However, maximum likelihood estimators usually have other good properties. Suppose $\tilde{\theta}_n = \tilde{\theta}_n(X_1, X_2, \ldots, X_n)$ is the maximum likelihood estimator based on a sample of size $n$. If $\lim_{n \to \infty} E(\tilde{\theta}_n) = \theta$ then $\tilde{\theta}_n$ is an asymptotically unbiased estimator of $\theta$. If $\tilde{\theta}_n \to_p \theta$ then $\tilde{\theta}_n$ is called a consistent estimator of $\theta$.
6.3.6 Example
Suppose $x_1, x_2, \ldots, x_n$ is an observed random sample from the distribution with probability density function
$$f(x; \theta) = \theta x^{\theta-1} \quad \text{for } 0 \le x \le 1,\ \theta > 0 \tag{6.1}$$
(a) Find the score function, the maximum likelihood estimator, the information function, the observed information, and the expected information.
(b) Show that $T = -\sum_{i=1}^{n} \log X_i \sim \text{Gamma}\left(n, \frac{1}{\theta}\right)$.
(c) Use (b) and 2.7.9 to show that $\tilde{\theta}$ is not an unbiased estimator of $\theta$. Show however that $\tilde{\theta}$ is an asymptotically unbiased estimator of $\theta$.
(d) Show that $\tilde{\theta}$ is a consistent estimator of $\theta$.
(e) Use (b) and 2.7.9 to find $Var(\tilde{\theta})$. Compare $Var(\tilde{\theta})$ with the expected information.
Solution
(a) The likelihood function is
$$L(\theta) = \prod_{i=1}^{n} f(x_i; \theta) = \prod_{i=1}^{n} \theta x_i^{\theta-1} = \theta^n \left(\prod_{i=1}^{n} x_i\right)^{\theta-1} \quad \text{for } \theta > 0$$
or more simply
$$L(\theta) = \theta^n \left(\prod_{i=1}^{n} x_i\right)^{\theta} \quad \text{for } \theta > 0$$
The log likelihood function is
$$l(\theta) = n \log \theta + \theta \sum_{i=1}^{n} \log x_i = n \log \theta - \theta t \quad \text{for } \theta > 0$$
where $t = -\sum_{i=1}^{n} \log x_i$. The score function is
$$S(\theta) = \frac{d}{d\theta} l(\theta) = \frac{n}{\theta} - t = \frac{1}{\theta}(n - \theta t) \quad \text{for } \theta > 0$$
Now $\frac{d}{d\theta} l(\theta) = 0$ for $\theta = n/t$. Since $\frac{d}{d\theta} l(\theta) > 0$ if $0 < \theta < n/t$ and $\frac{d}{d\theta} l(\theta) < 0$ if $\theta > n/t$ then, by the first derivative test, $l(\theta)$ has an absolute maximum at $\theta = n/t$. Therefore the maximum likelihood estimate of $\theta$ is $\hat{\theta} = n/t$ and the maximum likelihood estimator is $\tilde{\theta} = n/T$ where $T = -\sum_{i=1}^{n} \log X_i$.
The information function is
$$I(\theta) = -\frac{d^2}{d\theta^2} l(\theta) = \frac{n}{\theta^2} \quad \text{for } \theta > 0$$
and the observed information is
$$I(\hat{\theta}) = \frac{n}{\hat{\theta}^2}$$
The expected information is
$$J(\theta) = E[I(\theta; X_1, X_2, \ldots, X_n)] = E\left[\frac{n}{\theta^2}\right] = \frac{n}{\theta^2} \quad \text{for } \theta > 0$$
(b) From Exercise 2.6.12 we have that if $X_i$ has the probability density function (6.1) then
$$Y_i = -\log X_i \sim \text{Exponential}\left(\frac{1}{\theta}\right) \quad \text{for } i = 1, 2, \ldots, n \tag{6.2}$$
From 4.3.2(4) we have that
$$T = -\sum_{i=1}^{n} \log X_i = \sum_{i=1}^{n} Y_i \sim \text{Gamma}\left(n, \frac{1}{\theta}\right)$$
(c) Since $T \sim \text{Gamma}\left(n, \frac{1}{\theta}\right)$ then from 2.7.9 we have
$$E(T^p) = \frac{\Gamma(n+p)}{\theta^p \Gamma(n)} \quad \text{for } p > -n$$
Assuming $n > 1$ we have
$$E\left(T^{-1}\right) = \frac{\Gamma(n-1)}{\theta^{-1} \Gamma(n)} = \frac{\theta}{n-1}$$
so
$$E(\tilde{\theta}) = E\left(\frac{n}{T}\right) = nE\left(\frac{1}{T}\right) = nE\left(T^{-1}\right) = n\left(\frac{\theta}{n-1}\right) = \frac{\theta}{1 - \frac{1}{n}} \ne \theta$$
and therefore $\tilde{\theta}$ is not an unbiased estimator of $\theta$.
Now $\tilde{\theta}$ is an estimator based on a sample of size $n$. Since
$$\lim_{n \to \infty} E(\tilde{\theta}) = \lim_{n \to \infty} \frac{\theta}{1 - \frac{1}{n}} = \theta$$
therefore $\tilde{\theta}$ is an asymptotically unbiased estimator of $\theta$.
(d) By (6.2), $Y_1, Y_2, \ldots, Y_n$ are independent and identically distributed random variables with $E(Y_i) = \frac{1}{\theta}$ and $Var(Y_i) = \frac{1}{\theta^2} < \infty$. Therefore by the Weak Law of Large Numbers
$$\frac{T}{n} = \frac{1}{n} \sum_{i=1}^{n} Y_i \to_p \frac{1}{\theta}$$
and by the Limit Theorems
$$\tilde{\theta} = \frac{n}{T} \to_p \theta$$
Thus $\tilde{\theta}$ is a consistent estimator of $\theta$.
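The bias found in (c) and the consistency shown in (d) can be examined by simulation. The R sketch below is not part of the example: the value $\theta = 3$ is assumed for illustration, and observations are generated as $X = U^{1/\theta}$ with $U \sim$ Uniform(0, 1), which has the density (6.1).

# simulation check of the bias of thetatilde = n/T for f(x;theta) = theta*x^(theta-1)
set.seed(1)
theta <- 3; nsim <- 10000                # assumed values for illustration
bias.check <- function(n) {
  thetatilde <- replicate(nsim, {x <- runif(n)^(1/theta); n/(-sum(log(x)))})
  c(simulated = mean(thetatilde), exact = theta/(1 - 1/n))
}
bias.check(10)    # noticeable bias for small n
bias.check(100)   # bias nearly gone for larger n (asymptotic unbiasedness)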
(e) To determine $Var(\tilde{\theta})$ we note that
$$Var(\tilde{\theta}) = Var\left(\frac{n}{T}\right) = n^2 Var\left(\frac{1}{T}\right) = n^2\left\{E\left[\left(\frac{1}{T}\right)^2\right] - \left[E\left(\frac{1}{T}\right)\right]^2\right\} = n^2\left\{E\left(T^{-2}\right) - \left[E\left(T^{-1}\right)\right]^2\right\}$$
Since
$$E\left(T^{-2}\right) = \frac{\Gamma(n-2)}{\theta^{-2} \Gamma(n)} = \frac{\theta^2}{(n-1)(n-2)}$$
then
$$Var(\tilde{\theta}) = n^2\left\{E\left(T^{-2}\right) - \left[E\left(T^{-1}\right)\right]^2\right\} = n^2\left[\frac{\theta^2}{(n-1)(n-2)} - \left(\frac{\theta}{n-1}\right)^2\right] = n^2\theta^2\left[\frac{(n-1) - (n-2)}{(n-1)^2(n-2)}\right] = \frac{\theta^2}{\left(1 - \frac{1}{n}\right)^2 (n-2)}$$
We note that $Var(\tilde{\theta}) \ne \frac{\theta^2}{n} = \frac{1}{J(\theta)}$, however for large $n$, $Var(\tilde{\theta}) \approx \frac{1}{J(\theta)}$.
6.3.7 Example
Suppose $x_1, x_2, \ldots, x_n$ is an observed random sample from the Weibull($\theta$, 1) distribution with probability density function
$$f(x; \theta) = \theta x^{\theta-1} e^{-x^{\theta}} \quad \text{for } x > 0,\ \theta > 0$$
Find the score function and the information function. How would you find the maximum likelihood estimate of $\theta$?

Solution
The likelihood function is
$$L(\theta) = \prod_{i=1}^{n} f(x_i; \theta) = \prod_{i=1}^{n} \theta x_i^{\theta-1} e^{-x_i^{\theta}} = \theta^n \left(\prod_{i=1}^{n} x_i\right)^{\theta-1} \exp\left(-\sum_{i=1}^{n} x_i^{\theta}\right) \quad \text{for } \theta > 0$$
or more simply
$$L(\theta) = \theta^n \left(\prod_{i=1}^{n} x_i\right)^{\theta} \exp\left(-\sum_{i=1}^{n} x_i^{\theta}\right) \quad \text{for } \theta > 0$$
The log likelihood function is
$$l(\theta) = n \log \theta + \theta \sum_{i=1}^{n} \log x_i - \sum_{i=1}^{n} x_i^{\theta} \quad \text{for } \theta > 0$$
The score function is
$$S(\theta) = \frac{d}{d\theta} l(\theta) = \frac{n}{\theta} + t - \sum_{i=1}^{n} x_i^{\theta} \log x_i \quad \text{for } \theta > 0$$
where $t = \sum_{i=1}^{n} \log x_i$.
Notice that $S(\theta) = 0$ cannot be solved explicitly. The maximum likelihood estimate can only be determined numerically for a given sample of data $x_1, x_2, \ldots, x_n$. Since
$$\frac{d}{d\theta} S(\theta) = -\left[\frac{n}{\theta^2} + \sum_{i=1}^{n} x_i^{\theta} (\log x_i)^2\right] \quad \text{for } \theta > 0$$
is negative for all values of $\theta > 0$ then we know that the function $S(\theta)$ is always decreasing and therefore there is only one solution to $S(\theta) = 0$. The solution to $S(\theta) = 0$ gives the maximum likelihood estimate.
The information function is
$$I(\theta) = -\frac{d^2}{d\theta^2} l(\theta) = \frac{n}{\theta^2} + \sum_{i=1}^{n} x_i^{\theta} (\log x_i)^2 \quad \text{for } \theta > 0$$
To illustrate how to find the maximum likelihood estimate for a given sample of data, we randomly generate 20 observations from the Weibull($\theta$, 1) distribution. To do this we use the result of Example 2.6.7 in which we showed that if $u$ is an observation from the Uniform(0, 1) distribution then $x = [-\log(1-u)]^{1/\theta}$ is an observation from the Weibull($\theta$, 1) distribution. The following R code generates the data, plots the likelihood function, finds $\hat{\theta}$ by solving $S(\theta) = 0$ using the R function uniroot, and determines $S(\hat{\theta})$ and the observed information $I(\hat{\theta})$.
# randomly generate 20 observations from a Weibull(theta,1)
# using a random theta value between 0.5 and 1.5
set.seed(20086689) # set the seed so results can be reproduced
truetheta<-runif(1,min=0.5,max=1.5)
# data are sorted and rounded to two decimal places for easier display
x<-sort(round((-log(1-runif(20)))^(1/truetheta),2))
x
#
# function for calculating Weibull likelihood for data x and theta=th
WBLF<-function(th,x)
{n<-length(x)
L<-th^n*prod(x)^th*exp(-sum(x^th))
return(L)}
#
# function for calculating Weibull score for data x and theta=th
WBSF<-function(th,x)
{n<-length(x)
t<-sum(log(x))
S<-(n/th)+t-sum(log(x)*x^th)
return(S)}
#
# function for calculating Weibull information for data x and theta=th
WBIF<-function(th,x)
{n<-length(x)
I<-(n/th^2)+sum(log(x)^2*x^th)
return(I)}
#
# plot the Weibull likelihood function
th<-seq(0.25,0.75,0.01)
L<-sapply(th,WBLF,x)
plot(th,L,"l",xlab=expression(theta),
ylab=expression(paste("L(",theta,")")),lwd=3)
#
# find thetahat using uniroot function
thetahat<-uniroot(function(th) WBSF(th,x),lower=0.4,upper=0.6)$root
cat("thetahat = ",thetahat) # display value of thetahat
# display value of Score function at thetahat
cat("S(thetahat) = ",WBSF(thetahat,x))
# calculate observed information
cat("Observed Information = ",WBIF(thetahat,x))
The generated data are
0.01 0.01 0.05 0.07 0.10 0.11 0.23 0.28 0.44 0.46
0.64 1.07 1.16 1.25 2.40 3.03 3.65 5.90 6.60 30.07
The likelihood function is graphed in Figure 6.2. The solution to $S(\theta) = 0$ determined by uniroot was $\hat{\theta} = 0.4951607$. $S(\hat{\theta}) = 2.468574 \times 10^{-5}$ which is close to zero, and the observed information was $I(\hat{\theta}) = 181.8069$ which is positive and indicates a local maximum. We have already shown in this example that the solution to $S(\theta) = 0$ is the unique maximum likelihood estimate.
Figure 6.2: Likelihood function for Example 6.3.7
Note that the interval [0.25, 0.75] used to graph the likelihood function was determined by trial and error. The values lower=0.4 and upper=0.6 used for uniroot were determined from the graph of the likelihood function. From the graph it is easy to see that the value of $\hat{\theta}$ lies in the interval [0.4, 0.6].
Newton's Method, a numerical method for finding the roots of an equation which is usually discussed in first year calculus, can also be used for finding the maximum likelihood estimate. Newton's Method usually works quite well for finding maximum likelihood estimates because the log likelihood function is often approximately quadratic in shape.
6.3.8 Newton's Method
Let $\theta^{(0)}$ be an initial estimate of $\theta$. The estimate $\theta^{(i)}$ can be updated using
$$\theta^{(i+1)} = \theta^{(i)} + \frac{S(\theta^{(i)})}{I(\theta^{(i)})} \quad \text{for } i = 0, 1, \ldots$$
Notes:
(1) The initial estimate, $\theta^{(0)}$, may be determined by graphing $L(\theta)$ or $l(\theta)$.
(2) The algorithm is usually run until the value of $\theta^{(i)}$ no longer changes to a reasonable number of decimal places. When the algorithm is stopped it is always important to check that the value of $\theta$ obtained does indeed maximize $L(\theta)$.
(3) This algorithm is also called the Newton-Raphson Method.
(4) $I(\theta)$ can be replaced by $J(\theta)$ for a similar algorithm which is called the Method of Scoring.
(5) If the support set of $X$ depends on $\theta$ (e.g. Uniform(0, $\theta$)) then $\hat{\theta}$ is not found by solving $S(\theta) = 0$.
6.3.9 Example
Use Newton's Method to find the maximum likelihood estimate in Example 6.3.7.
Solution
Here is R code for Newton’s Method for the Weibull Example
# Newton’s Method for Weibull Example
NewtonWB<-function(th,x)
{thold<-th
thnew<-th+0.1
while (abs(thold-thnew)>0.00001)
{thold<-thnew
thnew<-thold+WBSF(thold,x)/WBIF(thold,x)
print(thnew)}
return(thnew)}
#
thetahat<-NewtonWB(0.2,x)
Newton's Method converges after four iterations and the value of thetahat returned is $\hat{\theta} = 0.4951605$ which is the same value to six decimal places as was obtained above using the uniroot function.
6.4 Likelihood Intervals
In your previous statistics course, likelihood intervals were introduced as one approach to constructing an interval estimate for the unknown parameter $\theta$.

6.4.1 Definition - Relative Likelihood Function
The relative likelihood function $R(\theta)$ is defined by
$$R(\theta) = R(\theta; x) = \frac{L(\theta)}{L(\hat{\theta})} \quad \text{for } \theta \in \Omega$$
where $x$ are the observed data.

The relative likelihood function takes on values between 0 and 1 and can be used to rank parameter values according to their plausibilities in light of the observed data. If $R(\theta_1) = 0.1$, for example, then $\theta_1$ is a rather implausible parameter value because the data are ten times more probable when $\theta = \hat{\theta}$ than they are when $\theta = \theta_1$. However, if $R(\theta_1) = 0.5$, then $\theta_1$ is a fairly plausible value because it gives the data 50% of the maximum possible probability under the model.

6.4.2 Definition - Likelihood Interval
The set of values of $\theta$ for which $R(\theta) \ge p$ is called a 100p% likelihood interval for $\theta$.

Values of $\theta$ inside a 10% likelihood interval are referred to as plausible values in light of the observed data. Values of $\theta$ outside a 10% likelihood interval are referred to as implausible values given the observed data. Values of $\theta$ inside a 50% likelihood interval are very plausible and values of $\theta$ outside a 1% likelihood interval are very implausible in light of the data.

6.4.3 Definition - Log Relative Likelihood Function
The log relative likelihood function is the natural logarithm of the relative likelihood function:
$$r(\theta) = r(\theta; x) = \log[R(\theta)] \quad \text{for } \theta \in \Omega$$
where $x$ are the observed data.

Likelihood regions or intervals may be determined from a graph of $R(\theta)$ or $r(\theta)$. Alternatively, they can be found by solving $R(\theta) - p = 0$ or $r(\theta) - \log p = 0$. In most cases this must be done numerically.
6.4.4 Example
Plot the relative likelihood function for $\theta$ in Example 6.3.7. Find 10% and 50% likelihood intervals for $\theta$.

Solution
Here is R code to plot the relative likelihood function for the Weibull Example with lines for determining 10% and 50% likelihood intervals for $\theta$ as well as code to determine these intervals using uniroot.
The R function WBRLF uses the R function WBLF from Example 6.3.7.
# function for calculating Weibull relative likelihood function
WBRLF<-function(th,thetahat,x)
{R<-WBLF(th,x)/WBLF(thetahat,x)
return(R)}
#
# plot the Weibull relative likelihood function
th<-seq(0.25,0.75,0.01)
R<-sapply(th,WBRLF,thetahat,x)
plot(th,R,"l",xlab=expression(theta),
ylab=expression(paste("R(",theta,")")),lwd=3)
# add lines to determine 10% and 50% likelihood intervals
abline(a=0.10,b=0,col="red",lwd=2)
abline(a=0.50,b=0,col="blue",lwd=2)
#
# use uniroot to determine endpoints of 10%, 15%, and 50% likelihood intervals
uniroot(function(th) WBRLF(th,thetahat,x)-0.1,lower=0.3,upper=0.4)$root
uniroot(function(th) WBRLF(th,thetahat,x)-0.1,lower=0.6,upper=0.7)$root
uniroot(function(th) WBRLF(th,thetahat,x)-0.5,lower=0.35,upper=0.45)$root
uniroot(function(th) WBRLF(th,thetahat,x)-0.5,lower=0.55,upper=0.65)$root
uniroot(function(th) WBRLF(th,thetahat,x)-0.15,lower=0.3,upper=0.4)$root
uniroot(function(th) WBRLF(th,thetahat,x)-0.15,lower=0.6,upper=0.7)$root
The graph of the relative likelihood function is given in Figure 6.3.
The upper and lower values used for uniroot were determined using this graph.
The 10% likelihood interval for $\theta$ is [0.34, 0.65].
The 50% likelihood interval for $\theta$ is [0.41, 0.58].
For later reference the 15% likelihood interval for $\theta$ is [0.3550, 0.6401].
Figure 6.3: Relative likelihood function for Example 6.3.7
6.4.5 Example
Suppose $x_1, x_2, \ldots, x_n$ is an observed random sample from the Uniform(0, $\theta$) distribution. Plot the relative likelihood function for $\theta$ if $n = 10$ and $x_{(10)} = 0.5$. Find 10% and 50% likelihood intervals for $\theta$.

Solution
From Example 6.2.9 the likelihood function for $n = 10$ and $\hat{\theta} = x_{(10)} = 0.5$ is
$$L(\theta) = \begin{cases} 0 & \text{if } 0 < \theta < 0.5 \\ \left(\dfrac{1}{\theta}\right)^{10} & \text{if } \theta \ge 0.5 \end{cases}$$
The relative likelihood function
$$R(\theta) = \begin{cases} 0 & \text{if } 0 < \theta < 0.5 \\ \left(\dfrac{0.5}{\theta}\right)^{10} & \text{if } \theta \ge 0.5 \end{cases}$$
is graphed in Figure 6.4 along with lines for determining 10% and 50% likelihood intervals.
To determine the value of $\theta$ at which the horizontal line $R = p$ intersects the graph of $R(\theta)$ we solve $\left(\frac{0.5}{\theta}\right)^{10} = p$ to obtain $\theta = 0.5 p^{-1/10}$. Since $R(\theta) = 0$ if $0 < \theta < 0.5$ then a 100p% likelihood interval for $\theta$ is of the form $\left[0.5,\ 0.5 p^{-1/10}\right]$.
A 10% likelihood interval is
$$\left[0.5,\ 0.5 (0.1)^{-1/10}\right] = [0.5, 0.629]$$
A 50% likelihood interval is
$$\left[0.5,\ 0.5 (0.5)^{-1/10}\right] = [0.5, 0.536]$$
Figure 6.4: Relative likelihood function for Example 6.4.5
More generally, for an observed random sample $x_1, x_2, \ldots, x_n$ from the Uniform(0, $\theta$) distribution a 100p% likelihood interval for $\theta$ will be of the form $\left[x_{(n)},\ x_{(n)} p^{-1/n}\right]$.
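The endpoints of these intervals are straightforward to compute. The short R sketch below, which is not part of the original example, reproduces the 10% and 50% likelihood intervals for $n = 10$ and $x_{(10)} = 0.5$.

# 100p% likelihood interval [x(n), x(n)*p^(-1/n)] for the Uniform(0,theta) example
LI <- function(p, n, xmax) c(xmax, xmax*p^(-1/n))
LI(0.10, 10, 0.5)   # 10% likelihood interval, approximately [0.5, 0.629]
LI(0.50, 10, 0.5)   # 50% likelihood interval, approximately [0.5, 0.536]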
6.4.6 Exercise
Suppose $x_1, x_2, \ldots, x_n$ is an observed random sample from the Two Parameter Exponential(1, $\theta$) distribution. Plot the relative likelihood function for $\theta$ if $n = 12$ and $x_{(1)} = 2$. Find 10% and 50% likelihood intervals for $\theta$.
6.5 Limiting Distribution of Maximum Likelihood Estimator
If there is no explicit expression for the maximum likelihood estimator $\tilde{\theta}$, as in Example 6.3.7, then its sampling distribution can only be obtained by simulation. This makes it difficult to determine the properties of the maximum likelihood estimator. In particular, to determine how good an estimator is, we look at its mean and variance. The mean indicates whether the estimator is close, on average, to the true value of $\theta$, while the variance indicates the uncertainty in the estimator. The larger the variance the more uncertainty in our estimation. To determine how good an estimator is we also look at how the mean and variance behave as the sample size $n \to \infty$.
We saw in Example 6.3.5 that $E(\tilde{\theta}) = \theta$ and $Var(\tilde{\theta}) = [J(\theta)]^{-1} \to 0$ as $n \to \infty$ for the Binomial, Poisson, and Exponential models. For each of these models the maximum likelihood estimator is an unbiased estimator of $\theta$. In Example 6.3.6 we saw that $E(\tilde{\theta}) \to \theta$ and $Var(\tilde{\theta}) \approx [J(\theta)]^{-1} \to 0$ as $n \to \infty$. The maximum likelihood estimator $\tilde{\theta}$ is an asymptotically unbiased estimator. In Example 6.3.7 we are not able to determine $E(\tilde{\theta})$ and $Var(\tilde{\theta})$.
The following theorem gives the limiting distribution of the maximum likelihood estimator in general under certain restrictions.
6.5.1 Theorem - Limiting Distribution of Maximum Likelihood Estimator
Suppose $X_n = (X_1, X_2, \ldots, X_n)$ is a random sample from the probability (density) function $f(x; \theta)$ for $\theta \in \Omega$. Let $\tilde{\theta}_n = \tilde{\theta}_n(X_n)$ be the maximum likelihood estimator of $\theta$ based on $X_n$.
Then under certain (regularity) conditions
$$\tilde{\theta}_n \to_p \theta \tag{6.3}$$
$$(\tilde{\theta}_n - \theta)[J(\theta)]^{1/2} \to_D Z \sim N(0, 1) \tag{6.4}$$
$$-2 \log R(\theta; X_n) = -2 \log\left[\frac{L(\theta; X_n)}{L(\tilde{\theta}_n; X_n)}\right] \to_D W \sim \chi^2(1) \tag{6.5}$$
for each $\theta \in \Omega$.

The proof of this result, which depends on applying Taylor's Theorem to the score function, is beyond the scope of this course. The regularity conditions are a bit complicated but essentially they are a set of conditions which ensure that the error term in Taylor's Theorem goes to zero as $n \to \infty$. One of the conditions is that the support set of $f(x; \theta)$ does not depend on $\theta$. Therefore, for example, this theorem cannot be applied to the maximum likelihood estimator in the case of a random sample from the Uniform(0, $\theta$) distribution.
This is actually not a problem since the distribution of the maximum likelihood estimator
in this case can be determined exactly.
Since (6.3) holds, $\tilde{\theta}_n$ is called a consistent estimator of $\theta$.
Theorem 6.5.1 implies that for sufficiently large $n$, $\tilde{\theta}_n$ has an approximately N($\theta$, $1/J(\theta)$) distribution. Therefore for large $n$
$$E(\tilde{\theta}_n) \approx \theta$$
and the maximum likelihood estimator is an asymptotically unbiased estimator of $\theta$.
Since $\tilde{\theta}_n$ has an approximately N($\theta$, $1/J(\theta)$) distribution this also means that for sufficiently large $n$
$$Var(\tilde{\theta}_n) \approx \frac{1}{J(\theta)}$$
$1/J(\theta)$ is called the asymptotic variance of $\tilde{\theta}_n$. Of course $J(\theta)$ is unknown because $\theta$ is unknown. By (6.3), (6.4), and Slutsky's Theorem
$$(\tilde{\theta}_n - \theta)\sqrt{J(\tilde{\theta}_n)} \to_D Z \sim N(0, 1) \tag{6.6}$$
which implies that the asymptotic variance of $\tilde{\theta}_n$ can be estimated using $1/J(\hat{\theta}_n)$. Therefore for sufficiently large $n$ we have
$$Var(\tilde{\theta}_n) \approx \frac{1}{J(\hat{\theta}_n)}$$
By the Weak Law of Large Numbers
$$\frac{1}{n} I(\theta; X_n) = \frac{1}{n} \sum_{i=1}^{n} \left[-\frac{d^2}{d\theta^2} l(\theta; X_i)\right] \to_p E\left[-\frac{d^2}{d\theta^2} l(\theta; X_i)\right] \tag{6.7}$$
Therefore by (6.3), (6.4), (6.7) and the Limit Theorems it follows that
$$(\tilde{\theta}_n - \theta)\sqrt{I(\tilde{\theta}_n; X_n)} \to_D Z \sim N(0, 1) \tag{6.8}$$
which implies the asymptotic variance of $\tilde{\theta}_n$ can also be estimated using $1/I(\hat{\theta}_n)$ where $I(\hat{\theta}_n)$ is the observed information. Therefore for sufficiently large $n$ we have
$$Var(\tilde{\theta}_n) \approx \frac{1}{I(\hat{\theta}_n)}$$
Results (6.6), (6.8) and (6.5) can be used to construct approximate confidence intervals for $\theta$.
In Chapter 8 we will see how result (6.5) can be used in a test of hypothesis.
Although we will not prove Theorem 6.5.1 in general we can prove the results in a particular case. The following example illustrates how techniques and theorems from previous chapters can be used together to obtain the results of interest. It is also a good review of several ideas covered thus far in these Course Notes.
6.5.2 Example
(a) Suppose $X \sim$ Weibull(2, $\theta$). Show that $E(X^2) = \theta^2$ and $Var(X^2) = \theta^4$.
(b) Suppose $X_1, X_2, \ldots, X_n$ is a random sample from the Weibull(2, $\theta$) distribution. Find the maximum likelihood estimator $\tilde{\theta}$ of $\theta$, the information function $I(\theta)$, the observed information $I(\hat{\theta})$, and the expected information $J(\theta)$.
(c) Show that
$$\tilde{\theta}_n \to_p \theta$$
(d) Show that
$$(\tilde{\theta}_n - \theta)[J(\theta)]^{1/2} \to_D Z \sim N(0, 1)$$

Solution
(a) From Exercise 2.7.9 we have
$$E(X^k) = \theta^k \Gamma\left(\frac{k}{2} + 1\right) \quad \text{for } k = 1, 2, \ldots \tag{6.9}$$
if $X \sim$ Weibull(2, $\theta$). Therefore
$$E(X^2) = \theta^2 \Gamma\left(\frac{2}{2} + 1\right) = \theta^2$$
$$E(X^4) = \theta^4 \Gamma\left(\frac{4}{2} + 1\right) = 2\theta^4$$
and
$$Var(X^2) = E\left[\left(X^2\right)^2\right] - \left[E\left(X^2\right)\right]^2 = 2\theta^4 - \theta^4 = \theta^4$$
(b) The likelihood function is
$$L(\theta) = \prod_{i=1}^{n} f(x_i; \theta) = \prod_{i=1}^{n} \frac{2x_i}{\theta^2} e^{-(x_i/\theta)^2} = \theta^{-2n} \left(\prod_{i=1}^{n} 2x_i\right) \exp\left(-\frac{1}{\theta^2} \sum_{i=1}^{n} x_i^2\right) \quad \text{for } \theta > 0$$
or more simply
$$L(\theta) = \theta^{-2n} \exp\left(-\frac{t}{\theta^2}\right) \quad \text{for } \theta > 0$$
where $t = \sum_{i=1}^{n} x_i^2$. The log likelihood function is
$$l(\theta) = -2n \log \theta - \frac{t}{\theta^2} \quad \text{for } \theta > 0$$
The score function is
$$S(\theta) = \frac{d}{d\theta} l(\theta) = -\frac{2n}{\theta} + \frac{2t}{\theta^3} = \frac{2}{\theta^3}\left(t - n\theta^2\right) \quad \text{for } \theta > 0$$
Now $\frac{d}{d\theta} l(\theta) = 0$ for $\theta = \left(\frac{t}{n}\right)^{1/2}$. Since $\frac{d}{d\theta} l(\theta) > 0$ if $0 < \theta < \left(\frac{t}{n}\right)^{1/2}$ and $\frac{d}{d\theta} l(\theta) < 0$ if $\theta > \left(\frac{t}{n}\right)^{1/2}$ then, by the first derivative test, $l(\theta)$ has an absolute maximum at $\theta = \left(\frac{t}{n}\right)^{1/2}$. Therefore the maximum likelihood estimate of $\theta$ is $\hat{\theta} = \left(\frac{t}{n}\right)^{1/2}$. The maximum likelihood estimator is
$$\tilde{\theta} = \left(\frac{T}{n}\right)^{1/2} \quad \text{where } T = \sum_{i=1}^{n} X_i^2$$
The information function is
$$I(\theta) = -\frac{d^2}{d\theta^2} l(\theta) = \frac{6T}{\theta^4} - \frac{2n}{\theta^2} \quad \text{for } \theta > 0$$
and the observed information is
$$I(\hat{\theta}) = \frac{6t}{\hat{\theta}^4} - \frac{2n}{\hat{\theta}^2} = \frac{6(n\hat{\theta}^2)}{\hat{\theta}^4} - \frac{2n}{\hat{\theta}^2} = \frac{4n}{\hat{\theta}^2}$$
To find the expected information we note that, from (a), $E(X_i^2) = \theta^2$ and thus
$$E(T) = E\left(\sum_{i=1}^{n} X_i^2\right) = n\theta^2$$
Therefore the expected information is
$$J(\theta) = E\left[\frac{6T}{\theta^4} - \frac{2n}{\theta^2}\right] = \frac{6E(T)}{\theta^4} - \frac{2n}{\theta^2} = \frac{6n\theta^2}{\theta^4} - \frac{2n}{\theta^2} = \frac{4n}{\theta^2} \quad \text{for } \theta > 0$$
(c) To show that $\tilde{\theta}_n \to_p \theta$ we need to show that
$$\tilde{\theta} = \left(\frac{T}{n}\right)^{1/2} = \left(\frac{1}{n} \sum_{i=1}^{n} X_i^2\right)^{1/2} \to_p \theta$$
Since $X_1^2, X_2^2, \ldots, X_n^2$ are independent and identically distributed random variables with $E(X_i^2) = \theta^2$ and $Var(X_i^2) = \theta^4$ for $i = 1, 2, \ldots, n$ then by the Weak Law of Large Numbers
$$\frac{T}{n} = \frac{1}{n} \sum_{i=1}^{n} X_i^2 \to_p \theta^2$$
and by 5.5.1(1)
$$\tilde{\theta} = \left(\frac{T}{n}\right)^{1/2} \to_p \theta \tag{6.10}$$
as required.
(d) To show that
$$(\tilde{\theta}_n - \theta)[J(\theta)]^{1/2} \to_D Z \sim N(0, 1)$$
we need to show that
$$[J(\theta)]^{1/2}(\tilde{\theta}_n - \theta) = \frac{2\sqrt{n}}{\theta}\left[\left(\frac{T}{n}\right)^{1/2} - \theta\right] \to_D Z \sim N(0, 1)$$
Since $X_1^2, X_2^2, \ldots, X_n^2$ are independent and identically distributed random variables with $E(X_i^2) = \theta^2$ and $Var(X_i^2) = \theta^4$ for $i = 1, 2, \ldots, n$ then by the Central Limit Theorem
$$\frac{\sqrt{n}\left(\frac{T}{n} - \theta^2\right)}{\theta^2} \to_D Z \sim N(0, 1) \tag{6.11}$$
Let $g(x) = \sqrt{x}$ and $a = \theta^2$. Then $\frac{d}{dx} g(x) = \frac{1}{2\sqrt{x}}$ and $g'(a) = \frac{1}{2\theta}$. By (6.11) and the Delta Method we have
$$\frac{\sqrt{n}\left(\sqrt{\frac{T}{n}} - \sqrt{\theta^2}\right)}{\theta^2} \to_D \frac{1}{2\theta} Z \sim N\left(0, \frac{1}{4\theta^2}\right)$$
or
$$\frac{(2\theta)\sqrt{n}\left(\sqrt{\frac{T}{n}} - \theta\right)}{\theta^2} \tag{6.12}$$
$$= \frac{2\sqrt{n}}{\theta}\left[\left(\frac{T}{n}\right)^{1/2} - \theta\right] \to_D Z \sim N(0, 1) \tag{6.13}$$
as required.
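The limiting Normal result in (d) can also be examined by simulation. The R sketch below is not part of the example; it assumes $\theta = 2$ and $n = 50$ and uses R's rweibull, whose shape = 2 and scale = $\theta$ parametrization corresponds to the Weibull(2, $\theta$) density used here.

# simulation check that (2*sqrt(n)/theta)*(thetatilde - theta) is approximately N(0,1)
set.seed(1)
theta <- 2; n <- 50; nsim <- 10000        # assumed values for illustration
thetatilde <- replicate(nsim, sqrt(mean(rweibull(n, shape = 2, scale = theta)^2)))
z <- (2*sqrt(n)/theta)*(thetatilde - theta)
c(mean(z), var(z))                        # should be close to 0 and 1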
6.5.3 Example
Suppose $X_1, X_2, \ldots, X_n$ is a random sample from the Uniform(0, $\theta$) distribution. Since the support set of $X_i$ depends on $\theta$, Theorem 6.5.1 does not hold. Show however that the maximum likelihood estimator $\tilde{\theta}_n = X_{(n)}$ is a consistent estimator of $\theta$.

Solution
In Example 5.1.5 we showed that $\tilde{\theta}_n = \max(X_1, X_2, \ldots, X_n) \to_p \theta$ and therefore $\tilde{\theta}_n$ is a consistent estimator of $\theta$.
6.6 Confidence Intervals
In your previous statistics course a confidence interval was used to summarize the available information about an unknown parameter. Confidence intervals allow us to quantify the uncertainty in the unknown parameter.

6.6.1 Definition - Confidence Interval
Suppose $X$ is a random variable (possibly a vector) whose distribution depends on $\theta$, and $A(X)$ and $B(X)$ are statistics. If
$$P[A(X) \le \theta \le B(X)] = p \quad \text{for } 0 < p < 1$$
then $[a(x), b(x)]$ is called a 100p% confidence interval for $\theta$ where $x$ are the observed data.

Confidence intervals can be constructed in a straightforward manner if a pivotal quantity exists.

6.6.2 Definition - Pivotal Quantity
Suppose $X$ is a random variable (possibly a vector) whose distribution depends on $\theta$. The random variable $Q(X; \theta)$ is called a pivotal quantity if the distribution of $Q$ does not depend on $\theta$.

Pivotal quantities can be used for constructing confidence intervals in the following way. Since the distribution of $Q(X; \theta)$ is known we can write down a probability statement of the form
$$P(q_1 \le Q(X; \theta) \le q_2) = p$$
where $q_1$ and $q_2$ do not depend on $\theta$. If $Q$ is a monotone function of $\theta$ then this statement can be rewritten as
$$P[A(X) \le \theta \le B(X)] = p$$
and the interval $[a(x), b(x)]$ is a 100p% confidence interval.

6.6.3 Example
Suppose $X = (X_1, X_2, \ldots, X_n)$ is a random sample from the Exponential($\theta$) distribution. Determine the distribution of
$$Q(X; \theta) = \frac{2}{\theta} \sum_{i=1}^{n} X_i$$
and thus show $Q(X; \theta)$ is a pivotal quantity. Show how $Q(X; \theta)$ can be used to construct a 100p% equal tail confidence interval for $\theta$.
Solution
Since $X = (X_1, X_2, \ldots, X_n)$ is a random sample from the Exponential($\theta$) distribution then by 4.3.2(4)
$$Y = \sum_{i=1}^{n} X_i \sim \text{Gamma}(n, \theta)$$
and by Chapter 2, Problem 7(c)
$$Q(X; \theta) = \frac{2Y}{\theta} = \frac{2}{\theta} \sum_{i=1}^{n} X_i \sim \chi^2(2n)$$
and therefore $Q(X; \theta)$ is a pivotal quantity.
Find values $a$ and $b$ such that
$$P(W \le a) = \frac{1-p}{2} \quad \text{and} \quad P(W \ge b) = \frac{1-p}{2}$$
where $W \sim \chi^2(2n)$.
Since $P(a \le W \le b) = p$ then
$$P\left(a \le \frac{2}{\theta} \sum_{i=1}^{n} X_i \le b\right) = p$$
or
$$P\left(\frac{2 \sum_{i=1}^{n} X_i}{b} \le \theta \le \frac{2 \sum_{i=1}^{n} X_i}{a}\right) = p$$
Therefore
$$\left[\frac{2 \sum_{i=1}^{n} x_i}{b},\ \frac{2 \sum_{i=1}^{n} x_i}{a}\right]$$
is a 100p% equal tail confidence interval for $\theta$.
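For a given data set the interval follows directly from the $\chi^2(2n)$ quantiles. The R sketch below is only an illustration: the values of $n$, $\sum x_i$ and $p$ are hypothetical.

# 100p% equal tail confidence interval for theta based on 2*sum(x)/theta ~ chi-square(2n)
n <- 20; sumx <- 40; p <- 0.95            # hypothetical values for illustration
a <- qchisq((1 - p)/2, df = 2*n)          # lower chi-square quantile
b <- qchisq((1 + p)/2, df = 2*n)          # upper chi-square quantile
c(2*sumx/b, 2*sumx/a)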
The following theorem gives the pivotal quantity in the case in which $\theta$ is either a location or scale parameter.

6.6.4 Theorem
Let $X = (X_1, X_2, \ldots, X_n)$ be a random sample from $f(x; \theta)$ and let $\tilde{\theta} = \tilde{\theta}(X)$ be the maximum likelihood estimator of the scalar parameter $\theta$ based on $X$.
(1) If $\theta$ is a location parameter of the distribution then $Q(X; \theta) = \tilde{\theta} - \theta$ is a pivotal quantity.
(2) If $\theta$ is a scale parameter of the distribution then $Q(X; \theta) = \tilde{\theta}/\theta$ is a pivotal quantity.
6.6.5 Example
Suppose $X = (X_1, X_2, \ldots, X_n)$ is a random sample from the Weibull(2, $\theta$) distribution. Use Theorem 6.6.4 to find a pivotal quantity $Q(X; \theta)$. Show how the pivotal quantity can be used to construct a 100p% equal tail confidence interval for $\theta$.

Solution
From Chapter 2, Problem 3(a) we know that $\theta$ is a scale parameter for the Weibull(2, $\theta$) distribution. From Example 6.5.2 the maximum likelihood estimator of $\theta$ is
$$\tilde{\theta} = \left(\frac{T}{n}\right)^{1/2} = \left(\frac{1}{n} \sum_{i=1}^{n} X_i^2\right)^{1/2}$$
Therefore by Theorem 6.6.4
$$Q(X; \theta) = \frac{\tilde{\theta}}{\theta} = \frac{1}{\theta}\left(\frac{1}{n} \sum_{i=1}^{n} X_i^2\right)^{1/2}$$
is a pivotal quantity. To construct a confidence interval for $\theta$ we need to determine the distribution of $Q(X; \theta)$. This looks difficult at first until we notice that
$$\left(\frac{\tilde{\theta}}{\theta}\right)^2 = \frac{1}{n\theta^2} \sum_{i=1}^{n} X_i^2$$
This form suggests looking for the distribution of $\sum_{i=1}^{n} X_i^2$ which is a sum of independent and identically distributed random variables $X_1^2, X_2^2, \ldots, X_n^2$.
From Chapter 2, Problem 7(h) we have that if $X_i \sim$ Weibull(2, $\theta$), $i = 1, 2, \ldots, n$ then
$$X_i^2 \sim \text{Exponential}(\theta^2) \quad \text{for } i = 1, 2, \ldots, n$$
Therefore $\sum_{i=1}^{n} X_i^2$ is a sum of independent Exponential($\theta^2$) random variables. By 4.3.2(4)
$$\sum_{i=1}^{n} X_i^2 \sim \text{Gamma}\left(n, \theta^2\right)$$
and by Chapter 2, Problem 7(h)
$$Q_1(X; \theta) = \frac{2 \sum_{i=1}^{n} X_i^2}{\theta^2} \sim \chi^2(2n)$$
and therefore $Q_1(X; \theta)$ is a pivotal quantity.
Now
$$Q_1(X; \theta) = 2n\left(\frac{\tilde{\theta}}{\theta}\right)^2 = 2n[Q(X; \theta)]^2$$
is a one-to-one function of $Q(X; \theta)$. To see that $Q_1(X; \theta)$ and $Q(X; \theta)$ generate the same confidence intervals for $\theta$ we note that
$$P(a \le Q_1(X; \theta) \le b) = P\left[a \le 2n\left(\frac{\tilde{\theta}}{\theta}\right)^2 \le b\right] = P\left[\left(\frac{a}{2n}\right)^{1/2} \le \frac{\tilde{\theta}}{\theta} \le \left(\frac{b}{2n}\right)^{1/2}\right] = P\left[\left(\frac{a}{2n}\right)^{1/2} \le Q(X; \theta) \le \left(\frac{b}{2n}\right)^{1/2}\right]$$
To construct a 100p% equal tail confidence interval we choose $a$ and $b$ such that
$$P(W \le a) = \frac{1-p}{2} \quad \text{and} \quad P(W \ge b) = \frac{1-p}{2}$$
where $W \sim \chi^2(2n)$. Since
$$p = P\left[\left(\frac{a}{2n}\right)^{1/2} \le \frac{\tilde{\theta}}{\theta} \le \left(\frac{b}{2n}\right)^{1/2}\right] = P\left[\tilde{\theta}\left(\frac{2n}{b}\right)^{1/2} \le \theta \le \tilde{\theta}\left(\frac{2n}{a}\right)^{1/2}\right]$$
a 100p% equal tail confidence interval is
$$\left[\hat{\theta}\left(\frac{2n}{b}\right)^{1/2},\ \hat{\theta}\left(\frac{2n}{a}\right)^{1/2}\right] \quad \text{where } \hat{\theta} = \left(\frac{1}{n} \sum_{i=1}^{n} x_i^2\right)^{1/2}$$
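In R the interval is again a direct computation from the $\chi^2(2n)$ quantiles. In the sketch below the values of $n$ and $\hat{\theta}$ are hypothetical and used only for illustration.

# 100p% equal tail confidence interval for theta in the Weibull(2,theta) example
n <- 25; thetahat <- 1.8; p <- 0.95       # hypothetical values for illustration
a <- qchisq((1 - p)/2, df = 2*n)
b <- qchisq((1 + p)/2, df = 2*n)
thetahat*c(sqrt(2*n/b), sqrt(2*n/a))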
6.6.6 Example
Suppose $X = (X_1, X_2, \ldots, X_n)$ is a random sample from the Uniform(0, $\theta$) distribution. Use Theorem 6.6.4 to find a pivotal quantity $Q(X; \theta)$. Show how the pivotal quantity can be used to construct a 100p% confidence interval for $\theta$ of the form $[\hat{\theta}, a\hat{\theta}]$.
Solution
From Chapter 2, Problem 3(b) we know that $\theta$ is a scale parameter for the Uniform(0, $\theta$) distribution. From Example 6.2.9 the maximum likelihood estimator of $\theta$ is
$$\tilde{\theta} = X_{(n)}$$
Therefore by Theorem 6.6.4
$$Q(X; \theta) = \frac{\tilde{\theta}}{\theta} = \frac{X_{(n)}}{\theta}$$
is a pivotal quantity. To construct a confidence interval for $\theta$ we need to determine the distribution of $Q(X; \theta)$.
$$P(Q(X; \theta) \le q) = P\left(\frac{\tilde{\theta}}{\theta} \le q\right) = P\left(X_{(n)} \le q\theta\right) = \prod_{i=1}^{n} P(X_i \le q\theta) = \prod_{i=1}^{n} q \quad \left(\text{since } P(X_i \le x) = \frac{x}{\theta} \text{ for } 0 \le x \le \theta\right)$$
$$= q^n \quad \text{for } 0 \le q \le 1$$
To construct a 100p% confidence interval for $\theta$ of the form $[\hat{\theta}, a\hat{\theta}]$ we need to choose $a$ such that
$$p = P\left(\tilde{\theta} \le \theta \le a\tilde{\theta}\right) = P\left(\frac{1}{a} \le \frac{\tilde{\theta}}{\theta} \le 1\right) = P\left(\frac{1}{a} \le Q(X; \theta) \le 1\right) = P(Q(X; \theta) \le 1) - P\left(Q(X; \theta) \le \frac{1}{a}\right) = 1 - a^{-n}$$
or $a = (1-p)^{-1/n}$. The 100p% confidence interval for $\theta$ is
$$\left[\hat{\theta},\ (1-p)^{-1/n}\hat{\theta}\right] \quad \text{where } \hat{\theta} = x_{(n)}$$
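This interval is a one-line computation. The R sketch below is illustrative only; the values of $n$ and $x_{(n)}$ are hypothetical.

# 100p% confidence interval [thetahat, (1-p)^(-1/n)*thetahat] for Uniform(0,theta) data
n <- 20; thetahat <- 1.5; p <- 0.95       # hypothetical values for illustration
c(thetahat, (1 - p)^(-1/n)*thetahat)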
6.7 Approximate Confidence Intervals
A pivotal quantity does not always exist. For example there is no pivotal quantity for the Binomial or the Poisson distributions. In these cases we use an asymptotic pivotal quantity to construct approximate confidence intervals.

6.7.1 Definition - Asymptotic Pivotal Quantity
Suppose $X_n = (X_1, X_2, \ldots, X_n)$ is a random variable whose distribution depends on $\theta$. The random variable $Q(X_n; \theta)$ is called an asymptotic pivotal quantity if the limiting distribution of $Q(X_n; \theta)$ as $n \to \infty$ does not depend on $\theta$.

In your previous statistics course, approximate confidence intervals for the Binomial and Poisson distributions were justified using a Central Limit Theorem argument. We are now able to clearly justify the asymptotic pivotal quantity using the theorems of Chapter 5.

6.7.2 Example
Suppose $X_n = (X_1, X_2, \ldots, X_n)$ is a random sample from the Poisson($\theta$) distribution. Show that
$$Q(X_n; \theta) = \frac{\sqrt{n}\left(\bar{X}_n - \theta\right)}{\sqrt{\bar{X}_n}}$$
is an asymptotic pivotal quantity. Show how $Q(X_n; \theta)$ can be used to construct an approximate 100p% equal tail confidence interval for $\theta$.

Solution
In Example 5.5.4, the Weak Law of Large Numbers, the Central Limit Theorem and Slutsky's Theorem were all used to prove
$$Q(X_n; \theta) = \frac{\sqrt{n}\left(\bar{X}_n - \theta\right)}{\sqrt{\bar{X}_n}} \to_D Z \sim N(0, 1) \tag{6.14}$$
and thus $Q(X_n; \theta)$ is an asymptotic pivotal quantity.
Let $a$ be the value such that $P(Z \le a) = (1+p)/2$ where $Z \sim N(0, 1)$. Then by (6.14) we have
$$p \approx P\left(-a \le \frac{\sqrt{n}\left(\bar{X}_n - \theta\right)}{\sqrt{\bar{X}_n}} \le a\right) = P\left(\bar{X}_n - a\sqrt{\frac{\bar{X}_n}{n}} \le \theta \le \bar{X}_n + a\sqrt{\frac{\bar{X}_n}{n}}\right)$$
and an approximate 100p% equal tail confidence interval for $\theta$ is
$$\left[\bar{x}_n - a\sqrt{\frac{\bar{x}_n}{n}},\ \bar{x}_n + a\sqrt{\frac{\bar{x}_n}{n}}\right]$$
or
$$\left[\hat{\theta}_n - a\sqrt{\frac{\hat{\theta}_n}{n}},\ \hat{\theta}_n + a\sqrt{\frac{\hat{\theta}_n}{n}}\right] \tag{6.15}$$
since $\hat{\theta}_n = \bar{x}_n$.
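With $n$ and $\bar{x}_n$ given, the interval (6.15) is computed directly. The R sketch below uses hypothetical values $n = 20$ and $\bar{x}_n = 2$ chosen only for illustration.

# approximate 95% equal tail confidence interval for a Poisson mean based on (6.15)
n <- 20; thetahat <- 2                    # hypothetical values for illustration
a <- qnorm(0.975)                         # P(Z <= a) = (1 + p)/2 with p = 0.95
thetahat + c(-1, 1)*a*sqrt(thetahat/n)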
6.7.3 Exercise
Suppose $X_n \sim$ Binomial($n, \theta$). Show that
$$Q(X_n; \theta) = \frac{\sqrt{n}\left(\frac{X_n}{n} - \theta\right)}{\sqrt{\frac{X_n}{n}\left(1 - \frac{X_n}{n}\right)}}$$
is an asymptotic pivotal quantity. Show that an approximate 100p% equal tail confidence interval for $\theta$ based on $Q(X_n; \theta)$ is given by
$$\left[\hat{\theta}_n - a\sqrt{\frac{\hat{\theta}_n(1-\hat{\theta}_n)}{n}},\ \hat{\theta}_n + a\sqrt{\frac{\hat{\theta}_n(1-\hat{\theta}_n)}{n}}\right] \tag{6.16}$$
where $\hat{\theta}_n = \frac{x_n}{n}$.
6.7.4 Approximate Confidence Intervals and the Limiting Distribution of the Maximum Likelihood Estimator
The limiting distribution of the maximum likelihood estimator $\tilde{\theta}_n = \tilde{\theta}_n(X_1, X_2, \ldots, X_n)$ can also be used to construct approximate confidence intervals. This is particularly useful in cases in which the maximum likelihood estimate cannot be found explicitly.
Since
$$\left[J(\tilde{\theta}_n)\right]^{1/2}(\tilde{\theta}_n - \theta) \to_D Z \sim N(0, 1)$$
then $\left[J(\tilde{\theta}_n)\right]^{1/2}(\tilde{\theta}_n - \theta)$ is an asymptotic pivotal quantity. An approximate 100p% confidence interval based on this asymptotic pivotal quantity is given by
$$\hat{\theta}_n \pm a\sqrt{\frac{1}{J(\hat{\theta}_n)}} = \left[\hat{\theta}_n - a\sqrt{\frac{1}{J(\hat{\theta}_n)}},\ \hat{\theta}_n + a\sqrt{\frac{1}{J(\hat{\theta}_n)}}\right] \tag{6.17}$$
where $P(Z \le a) = \frac{1+p}{2}$ and $Z \sim N(0, 1)$.
Similarly since
$$[I(\tilde{\theta}_n; X)]^{1/2}(\tilde{\theta}_n - \theta) \to_D Z \sim N(0, 1)$$
then $[I(\tilde{\theta}_n; X)]^{1/2}(\tilde{\theta}_n - \theta)$ is an asymptotic pivotal quantity. An approximate 100p% confidence interval based on this asymptotic pivotal quantity is given by
$$\hat{\theta}_n \pm a\sqrt{\frac{1}{I(\hat{\theta}_n)}} = \left[\hat{\theta}_n - a\sqrt{\frac{1}{I(\hat{\theta}_n)}},\ \hat{\theta}_n + a\sqrt{\frac{1}{I(\hat{\theta}_n)}}\right] \tag{6.18}$$
where $I(\hat{\theta}_n)$ is the observed information.
Notes:
(1) One drawback of these intervals is that we don't know how large $n$ needs to be to obtain a good approximation.
(2) These approximate confidence intervals are both symmetric about $\hat{\theta}_n$ which may not be a reasonable summary of the plausible values of $\theta$ in light of the observed data. See likelihood intervals below.
(3) It is possible to obtain approximate confidence intervals based on a given data set which contain values which are not valid for $\theta$. For example, an interval may contain negative values although $\theta$ must be positive.

6.7.5 Example
Use the results from Example 6.3.5 to determine the approximate 100p% confidence intervals based on (6.17) and (6.18) in the case of Binomial data and Poisson data. Compare these intervals with the intervals in (6.16) and (6.15).
Solution
From Example 6.3.5 we have that for Binomial data
$$I(\hat{\theta}_n) = J(\hat{\theta}_n) = \frac{n}{\hat{\theta}_n(1-\hat{\theta}_n)}$$
so (6.17) and (6.18) both give the interval
$$\left[\hat{\theta}_n - a\sqrt{\frac{\hat{\theta}_n(1-\hat{\theta}_n)}{n}},\ \hat{\theta}_n + a\sqrt{\frac{\hat{\theta}_n(1-\hat{\theta}_n)}{n}}\right]$$
which is the same interval as in (6.16).
From Example 6.3.5 we have that for Poisson data
$$I(\hat{\theta}_n) = J(\hat{\theta}_n) = \frac{n}{\hat{\theta}_n}$$
so (6.17) and (6.18) both give the interval
$$\left[\hat{\theta}_n - a\sqrt{\frac{\hat{\theta}_n}{n}},\ \hat{\theta}_n + a\sqrt{\frac{\hat{\theta}_n}{n}}\right]$$
which is the same interval as in (6.15).
6.7.6 Example
For Example 6.3.7 construct an approximate 95% confidence interval based on (6.18). Compare this with the 15% likelihood interval determined in Example 6.4.4.

Solution
From Example 6.3.7 we have $\hat{\theta} = 0.4951605$ and $I(\hat{\theta}) = 181.8069$. Therefore an approximate 95% confidence interval based on (6.18) is
$$\hat{\theta} \pm 1.96\sqrt{\frac{1}{I(\hat{\theta})}} = 0.4951605 \pm 1.96\sqrt{\frac{1}{181.8069}} = 0.4951605 \pm 0.145362 = [0.3498, 0.6405]$$
From Example 6.4.4 the 15% likelihood interval is [0.3550, 0.6401] which is very similar. We expect this to happen since the relative likelihood function in Figure 6.3 is very symmetric.
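The interval in this example is a one-line computation in R, using the values of $\hat{\theta}$ and $I(\hat{\theta})$ obtained in Example 6.3.7.

# approximate 95% confidence interval based on the observed information, as in (6.18)
thetahat <- 0.4951605; Ihat <- 181.8069   # values from Example 6.3.7
thetahat + c(-1, 1)*1.96*sqrt(1/Ihat)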
6.7.7 Approximate Confidence Intervals and Likelihood Intervals
In your previous statistics course you learned that likelihood intervals are also approximate confidence intervals.

6.7.8 Theorem
If $a$ is a value such that $p = 2P(Z \le a) - 1$ where $Z \sim N(0, 1)$, then the likelihood interval $\left\{\theta : R(\theta) \ge e^{-a^2/2}\right\}$ is an approximate 100p% confidence interval.

Proof
By Theorem 6.5.1
$$-2 \log R(\theta; X_n) = -2 \log\left[\frac{L(\theta; X_n)}{L(\tilde{\theta}_n; X_n)}\right] \to_D W \sim \chi^2(1) \tag{6.19}$$
where $X_n = (X_1, X_2, \ldots, X_n)$. The confidence coefficient corresponding to the interval $\left\{\theta : R(\theta) \ge e^{-a^2/2}\right\}$ is
$$P\left[\frac{L(\theta; X_n)}{L(\tilde{\theta}_n; X_n)} \ge e^{-a^2/2}\right] = P\left[-2 \log R(\theta; X_n) \le a^2\right] \approx P\left(W \le a^2\right) \quad \text{where } W \sim \chi^2(1) \text{ by (6.19)}$$
$$= 2P(Z \le a) - 1 \quad \text{where } Z \sim N(0, 1)$$
$$= p$$
as required.
6.7.9 Example
Since
$$0.95 = 2P(Z \le 1.96) - 1 \quad \text{where } Z \sim N(0, 1)$$
and
$$e^{-(1.96)^2/2} = e^{-1.9208} \approx 0.1465 \approx 0.15$$
therefore a 15% likelihood interval for $\theta$ is also an approximate 95% confidence interval for $\theta$.

6.7.10 Exercise
(a) Show that a 1% likelihood interval is an approximate 99.8% confidence interval.
(b) Show that a 50% likelihood interval is an approximate 76% confidence interval.

Note that while the confidence intervals given by (6.17) or (6.18) are symmetric about the point estimate $\hat{\theta}_n$, this is not true in general for likelihood intervals.
6.7.11 Example
For Example 6.3.7 compare a 15% likelihood interval with the approximate 95% confidence interval in Example 6.7.6.

Solution
From Example 6.4.4 the 15% likelihood interval is
$$[0.3550, 0.6401]$$
and from Example 6.7.6 the approximate 95% confidence interval is
$$[0.3498, 0.6405]$$
These intervals are very close and agree to 2 decimal places. The reason for this is that the likelihood function (see Figure 6.3) is very symmetric about the maximum likelihood estimate. The approximate intervals (6.17) or (6.18) will be close to the corresponding likelihood interval whenever the likelihood function is reasonably symmetric about the maximum likelihood estimate.

6.7.12 Exercise
Suppose $x_1, x_2, \ldots, x_n$ is an observed random sample from the Logistic($\theta$, 1) distribution with probability density function
$$f(x; \theta) = \frac{e^{-(x-\theta)}}{\left[1 + e^{-(x-\theta)}\right]^2} \quad \text{for } x \in \Re,\ \theta \in \Re$$
(a) Find the likelihood function, the score function, and the information function. How would you find the maximum likelihood estimate of $\theta$?
(b) Show that if $u$ is an observation from the Uniform(0, 1) distribution then
$$x = \theta - \log\left(\frac{1}{u} - 1\right)$$
is an observation from the Logistic($\theta$, 1) distribution.
(c) Use the following R code to randomly generate 30 observations from a Logistic($\theta$, 1) distribution.
# randomly generate 30 observations from a Logistic(theta,1)
# using a random theta value between 2 and 3
set.seed(21086689) # set the seed so results can be reproduced
truetheta<-runif(1,min=2,max=3)
# data are sorted and rounded to two decimal places for easier display
x<-sort(round((truetheta-log(1/runif(30)-1)),2))
x
(d) Use R to plot the likelihood function for $\theta$ based on these data.
(e) Use Newton's Method and R to find $\hat{\theta}$.
(f) What are the values of $S(\hat{\theta})$ and $I(\hat{\theta})$?
(g) Use R to plot the relative likelihood function for $\theta$ based on these data.
(h) Compare the 15% likelihood interval with the approximate 95% confidence interval (6.18).
6.8 Chapter 6 Problems
1. Suppose $X \sim$ Binomial($n, \theta$). Plot the log relative likelihood function for $\theta$ if $x = 3$ is observed for $n = 100$. On the same graph plot the log relative likelihood function for $\theta$ if $x = 6$ is observed for $n = 200$. Compare the graphs as well as the 10% likelihood interval and 50% likelihood interval for $\theta$.

2. Suppose $x_1, x_2, \ldots, x_n$ is an observed random sample from the Discrete Uniform(1, $\theta$) distribution. Find the likelihood function, the maximum likelihood estimate of $\theta$ and the maximum likelihood estimator of $\theta$. If $n = 20$ and $x_{(20)} = 33$, find a 15% likelihood interval for $\theta$.

3. Suppose $x_1, x_2, \ldots, x_n$ is an observed random sample from the Geometric($\theta$) distribution.
(a) Find the score function and the maximum likelihood estimator of $\theta$.
(b) Find the observed information and the expected information.
(c) Find the maximum likelihood estimator of $E(X_i)$.
(d) If $n = 20$ and $\sum_{i=1}^{20} x_i = 40$ then find the maximum likelihood estimate of $\theta$ and a 15% likelihood interval for $\theta$. Is $\theta = 0.5$ a plausible value of $\theta$? Why?

4. Suppose $(X_1, X_2, X_3) \sim$ Multinomial($n; \theta^2, 2\theta(1-\theta), (1-\theta)^2$). Find the maximum likelihood estimator of $\theta$, the observed information and the expected information.

5. Suppose $x_1, x_2, \ldots, x_n$ is an observed random sample from the Pareto(1, $\theta$) distribution.
(a) Find the score function and the maximum likelihood estimator of $\theta$.
(b) Find the observed information and the expected information.
(c) Find the maximum likelihood estimator of $E(X_i)$.
(d) If $n = 20$ and $\sum_{i=1}^{20} \log x_i = 10$ find the maximum likelihood estimate of $\theta$ and a 15% likelihood interval for $\theta$. Is $\theta = 0.1$ a plausible value of $\theta$? Why?
(e) Show that
$$Q(X; \theta) = 2\theta \sum_{i=1}^{n} \log X_i$$
is a pivotal quantity. (Hint: What is the distribution of $\log X$ if $X \sim$ Pareto(1, $\theta$)?) Use this pivotal quantity to determine a 95% equal tail confidence interval for $\theta$ for the data in (d). Compare this interval with the 15% likelihood interval.
6. The following model is proposed for the distribution of family size in a large population:
$$P(k \text{ children in family}; \theta) = \theta^k \quad \text{for } k = 1, 2, \ldots$$
$$P(0 \text{ children in family}; \theta) = \frac{1 - 2\theta}{1 - \theta}$$
The parameter $\theta$ is unknown and $0 < \theta < \frac{1}{2}$. Fifty families were chosen at random from the population. The observed numbers of children are given in the following table:

No. of children:       0    1    2    3    4    Total
Frequency observed:   17   22    7    3    1       50

(a) Find the likelihood, log likelihood, score and information functions for $\theta$.
(b) Find the maximum likelihood estimate of $\theta$ and the observed information.
(c) Find a 15% likelihood interval for $\theta$.
(d) A large study done 20 years earlier indicated that $\theta = 0.45$. Is this value plausible for these data?
(e) Calculate estimated expected frequencies. Does the model give a reasonable fit to the data?

7. Suppose $x_1, x_2, \ldots, x_n$ is an observed random sample from the Two Parameter Exponential($\theta$, 1) distribution. Show that $\theta$ is a location parameter and $\tilde{\theta} = X_{(1)}$ is the maximum likelihood estimator of $\theta$. Show that
$$P(\tilde{\theta} - \theta \le q) = 1 - e^{-nq} \quad \text{for } q \ge 0$$
and thus show that
$$\left[\hat{\theta} + \frac{1}{n}\log(1-p),\ \hat{\theta}\right] \quad \text{and} \quad \left[\hat{\theta} + \frac{1}{n}\log\left(\frac{1-p}{2}\right),\ \hat{\theta} + \frac{1}{n}\log\left(\frac{1+p}{2}\right)\right]$$
are both 100p% confidence intervals for $\theta$. Which confidence interval seems more reasonable?

8. Suppose $X_1, X_2, \ldots, X_n$ is a random sample from the Gamma$\left(\frac{1}{2}, \frac{1}{\theta}\right)$ distribution.
(a) Find $\tilde{\theta}_n$, the maximum likelihood estimator of $\theta$.
(b) Justify the statement $\tilde{\theta}_n \to_p \theta$.
(c) Find the maximum likelihood estimator of $Var(X_i)$.
(d) Use moment generating functions to show that $Q = 2\theta \sum_{i=1}^{n} X_i \sim \chi^2(n)$. If $n = 20$ and $\sum_{i=1}^{20} x_i = 6$, use the pivotal quantity $Q$ to construct an exact 95% equal tail confidence interval for $\theta$. Is $\theta = 0.7$ a plausible value of $\theta$?
(e) Verify that $(\tilde{\theta}_n - \theta)\left[J(\tilde{\theta}_n)\right]^{1/2} \to_D Z \sim N(0, 1)$. Use this asymptotic pivotal quantity to construct an approximate 95% confidence interval for $\theta$. Compare this interval with the exact confidence interval from (d) and a 15% likelihood interval for $\theta$. What do the approximate confidence interval and the likelihood interval indicate about the plausibility of the value $\theta = 0.7$?

9. The number of coliform bacteria $X$ in a 10 cubic centimeter sample of water from a section of lake near a beach has a Poisson($\theta$) distribution.
(a) If a random sample of $n$ specimen samples is taken and $X_1, X_2, \ldots, X_n$ are the respective numbers of observed bacteria, find the likelihood function, the maximum likelihood estimator and the expected information for $\theta$.
(b) If $n = 20$ and $\sum_{i=1}^{20} x_i = 40$, obtain an approximate 95% confidence interval for $\theta$ and a 15% likelihood interval for $\theta$. Compare the intervals.
(c) Suppose there is a fast, simple test which can detect whether there are bacteria present, but not the exact number. If $Y$ is the number of samples out of $n$ which have bacteria, show that
$$P(Y = y) = \binom{n}{y}\left(1 - e^{-\theta}\right)^y \left(e^{-\theta}\right)^{n-y} \quad \text{for } y = 0, 1, \ldots, n$$
(d) If $n = 20$ and we found $y = 17$ of the samples contained bacteria, use the likelihood function from part (c) to get an approximate 95% confidence interval for $\theta$. Hint: Let $\phi = 1 - e^{-\theta}$ and use the likelihood function for $\phi$ to get an approximate confidence interval for $\phi$ and then transform this to an approximate confidence interval for $\theta = -\log(1 - \phi)$.
7. Maximum Likelihood Estimation - Multiparameter

In this chapter we look at the method of maximum likelihood to obtain both point and interval estimates for the case in which the unknown parameter is a vector of unknown parameters $\theta = (\theta_1, \theta_2, \ldots, \theta_k)$. In your previous statistics course you would have seen the N($\mu, \sigma^2$) model with two unknown parameters $\theta = (\mu, \sigma^2)$ and the simple linear regression model N($\alpha + \beta x, \sigma^2$) with three unknown parameters $\theta = (\alpha, \beta, \sigma^2)$.
Although the case of $k$ parameters is a natural extension of the one parameter case, the $k$ parameter case is more challenging. For example, the maximum likelihood estimates are usually found by solving $k$ nonlinear equations in $k$ unknowns $\theta_1, \theta_2, \ldots, \theta_k$. In most cases there are no explicit solutions and the maximum likelihood estimates must be found using a numerical method such as Newton's Method. Another challenging issue is how to summarize the uncertainty in the $k$ estimates. For one parameter it is straightforward to summarize the uncertainty using a likelihood interval or a confidence interval. For $k$ parameters these intervals become regions in $\Re^k$.
In Section 7.1 we give all the definitions related to finding the maximum likelihood estimates for $k$ unknown parameters. These definitions are analogous to the definitions which were given in Chapter 6 for one unknown parameter. We also give the extension of the invariance property of maximum likelihood estimates and Newton's Method for $k$ variables. In Section 7.2 we define likelihood regions and show how to find them for the case $k = 2$.
In Section 7.3 we introduce the Multivariate Normal distribution which is the natural extension of the Bivariate Normal distribution discussed in Section 3.10. We also give the limiting distribution of the maximum likelihood estimator of $\theta$ which is a natural extension of Theorem 6.5.1.
In Section 7.4 we show how to obtain approximate confidence regions for $\theta$ based on the limiting distribution of the maximum likelihood estimator of $\theta$ and show how to find them for the case $k = 2$. We also show how to find approximate confidence intervals for individual parameters and indicate that these intervals must be used with care.
7.1 Likelihood and Related Functions

7.1.1 Definition - Likelihood Function
Suppose $X$ is a (vector) random variable with probability (density) function $f(x; \theta)$, where $\theta = (\theta_1, \theta_2, \ldots, \theta_k) \in \Omega$ and $\Omega$ is the parameter space or set of possible values of $\theta$. Suppose also that $x$ is an observed value of $X$. The likelihood function for $\theta$ based on the observed data $x$ is
$$L(\theta) = L(\theta; x) = f(x; \theta) \quad \text{for } \theta \in \Omega \tag{7.1}$$
If $X = (X_1, X_2, \ldots, X_n)$ is a random sample from a distribution with probability function $f(x; \theta)$ and $x = (x_1, x_2, \ldots, x_n)$ are the observed data then the likelihood function for $\theta$ based on the observed data $x_1, x_2, \ldots, x_n$ is
$$L(\theta) = L(\theta; x) = \prod_{i=1}^{n} f(x_i; \theta) \quad \text{for } \theta \in \Omega$$
Note: If $X$ is a discrete random variable then $L(\theta) = P(\text{observing the data } x; \theta)$. If $X$ is a continuous random variable then an argument similar to the one in 6.2.6 can be made to justify the use of (7.1).

7.1.2 Definition - Maximum Likelihood Estimate and Estimator
The value of $\theta$ that maximizes the likelihood function $L(\theta)$ is called the maximum likelihood estimate. The maximum likelihood estimate is a function of the observed data $x$ and we write $\hat{\theta} = \hat{\theta}(x)$. The corresponding maximum likelihood estimator, which is a random vector, is denoted by $\tilde{\theta} = \tilde{\theta}(X)$.

As in the case of $k = 1$ it is frequently easier to work with the log likelihood function which is maximized at the same value of $\theta$ as the likelihood function.

7.1.3 Definition - Log Likelihood Function
The log likelihood function is defined as
$$l(\theta) = l(\theta; x) = \log L(\theta) \quad \text{for } \theta \in \Omega$$
where $x$ are the observed data and log is the natural logarithmic function.

The maximum likelihood estimate of $\theta$, $\hat{\theta} = (\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_k)$, is usually found by solving
$$\frac{\partial l(\theta)}{\partial \theta_j} = 0, \quad j = 1, 2, \ldots, k$$
simultaneously. See Chapter 7, Problem 1 for an example in which the maximum likelihood estimate is not found in this way.
7.1.4 Definition - Score Vector
If $\theta = (\theta_1, \theta_2, \ldots, \theta_k)$ then the score vector (function) is a $1 \times k$ vector of functions defined as
$$S(\theta) = S(\theta; x) = \left[\frac{\partial l(\theta)}{\partial \theta_1}, \frac{\partial l(\theta)}{\partial \theta_2}, \ldots, \frac{\partial l(\theta)}{\partial \theta_k}\right] \quad \text{for } \theta \in \Omega$$
where $x$ are the observed data.

We will see that, as in the case of one parameter, the information matrix provides information about the variance of the maximum likelihood estimator.

7.1.5 Definition - Information Matrix
If $\theta = (\theta_1, \theta_2, \ldots, \theta_k)$ then the information matrix (function) $I(\theta) = I(\theta; x)$ is a $k \times k$ symmetric matrix of functions whose $(i, j)$ entry is given by
$$-\frac{\partial^2 l(\theta)}{\partial \theta_i \partial \theta_j} \quad \text{for } \theta \in \Omega$$
where $x$ are the observed data. $I(\hat{\theta})$ is called the observed information matrix.

7.1.6 Definition - Expected Information Matrix
If $\theta = (\theta_1, \theta_2, \ldots, \theta_k)$ then the expected information matrix (function) $J(\theta)$ is a $k \times k$ symmetric matrix of functions whose $(i, j)$ entry is given by
$$E\left[-\frac{\partial^2 l(\theta; X)}{\partial \theta_i \partial \theta_j}\right] \quad \text{for } \theta \in \Omega$$

The invariance property of the maximum likelihood estimator also holds in the multiparameter case.

7.1.7 Theorem - Invariance of the Maximum Likelihood Estimate
If $\hat{\theta} = (\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_k)$ is the maximum likelihood estimate of $\theta = (\theta_1, \theta_2, \ldots, \theta_k)$ then $g(\hat{\theta})$ is the maximum likelihood estimate of $g(\theta)$.
7.1.8 Example
Suppose $x_1, x_2, \ldots, x_n$ is an observed random sample from the N($\mu, \sigma^2$) distribution. Find the score vector, the information matrix, the expected information matrix and the maximum likelihood estimator of $\theta = (\mu, \sigma^2)$. Find the observed information matrix $I(\hat{\mu}, \hat{\sigma}^2)$ and thus verify that $(\hat{\mu}, \hat{\sigma}^2)$ is the maximum likelihood estimate of $(\mu, \sigma^2)$. What is the maximum likelihood estimator of the parameter $g(\mu, \sigma^2) = \sigma/\mu$ which is called the coefficient of variation?
Solution
The likelihood function is
$$L(\mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2\sigma^2}(x_i - \mu)^2\right] = (2\pi)^{-n/2} \left(\sigma^2\right)^{-n/2} \exp\left[-\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2\right] \quad \text{for } \mu \in \Re,\ \sigma^2 > 0$$
or more simply
$$L(\mu, \sigma^2) = \left(\sigma^2\right)^{-n/2} \exp\left[-\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2\right] \quad \text{for } \mu \in \Re,\ \sigma^2 > 0$$
The log likelihood function is
$$l(\mu, \sigma^2) = -\frac{n}{2} \log\left(\sigma^2\right) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 = -\frac{n}{2} \log\left(\sigma^2\right) - \frac{1}{2}\left(\sigma^2\right)^{-1}\left[\sum_{i=1}^{n} (x_i - \bar{x})^2 + n(\bar{x} - \mu)^2\right]$$
$$= -\frac{n}{2} \log\left(\sigma^2\right) - \frac{1}{2}\left(\sigma^2\right)^{-1}\left[(n-1)s^2 + n(\bar{x} - \mu)^2\right] \quad \text{for } \mu \in \Re,\ \sigma^2 > 0$$
where
$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$
Now
$$\frac{\partial l}{\partial \mu} = \frac{n}{\sigma^2}(\bar{x} - \mu) = n\left(\sigma^2\right)^{-1}(\bar{x} - \mu)$$
and
$$\frac{\partial l}{\partial \sigma^2} = -\frac{n}{2}\left(\sigma^2\right)^{-1} + \frac{1}{2}\left(\sigma^2\right)^{-2}\left[(n-1)s^2 + n(\bar{x} - \mu)^2\right]$$
The equations
$$\frac{\partial l}{\partial \mu} = 0, \quad \frac{\partial l}{\partial \sigma^2} = 0$$
are solved simultaneously for
$$\mu = \bar{x} \quad \text{and} \quad \sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{(n-1)}{n} s^2$$
Since
$$-\frac{\partial^2 l}{\partial \mu^2} = \frac{n}{\sigma^2}, \quad -\frac{\partial^2 l}{\partial \sigma^2 \partial \mu} = \frac{n(\bar{x} - \mu)}{\sigma^4}$$
$$-\frac{\partial^2 l}{\partial (\sigma^2)^2} = -\frac{n}{2}\frac{1}{\sigma^4} + \frac{1}{\sigma^6}\left[(n-1)s^2 + n(\bar{x} - \mu)^2\right]$$
the information matrix is
$$I(\mu, \sigma^2) = \begin{bmatrix} \dfrac{n}{\sigma^2} & \dfrac{n(\bar{x} - \mu)}{\sigma^4} \\ \dfrac{n(\bar{x} - \mu)}{\sigma^4} & -\dfrac{n}{2}\dfrac{1}{\sigma^4} + \dfrac{1}{\sigma^6}\left[(n-1)s^2 + n(\bar{x} - \mu)^2\right] \end{bmatrix}$$
Since
$$I_{11}(\hat{\mu}, \hat{\sigma}^2) = \frac{n}{\hat{\sigma}^2} > 0 \quad \text{and} \quad \det I(\hat{\mu}, \hat{\sigma}^2) = \frac{n^2}{2\hat{\sigma}^6} > 0$$
then by the second derivative test the maximum likelihood estimates of $\mu$ and $\sigma^2$ are
$$\hat{\mu} = \bar{x} \quad \text{and} \quad \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{(n-1)}{n} s^2$$
and the maximum likelihood estimators are
$$\tilde{\mu} = \bar{X} \quad \text{and} \quad \tilde{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2 = \frac{(n-1)}{n} S^2$$
The observed information matrix is
$$I(\hat{\mu}, \hat{\sigma}^2) = \begin{bmatrix} \dfrac{n}{\hat{\sigma}^2} & 0 \\ 0 & \dfrac{n}{2\hat{\sigma}^4} \end{bmatrix}$$
Now
$$E\left(\frac{n}{\sigma^2}\right) = \frac{n}{\sigma^2}, \quad E\left[\frac{n\left(\bar{X} - \mu\right)}{\sigma^4}\right] = 0$$
Also
$$E\left\{-\frac{n}{2}\frac{1}{\sigma^4} + \frac{1}{\sigma^6}\left[(n-1)S^2 + n\left(\bar{X} - \mu\right)^2\right]\right\} = -\frac{n}{2}\frac{1}{\sigma^4} + \frac{1}{\sigma^6}\left\{(n-1)E(S^2) + nE\left[\left(\bar{X} - \mu\right)^2\right]\right\} = -\frac{n}{2}\frac{1}{\sigma^4} + \frac{1}{\sigma^6}\left[(n-1)\sigma^2 + \sigma^2\right] = \frac{n}{2\sigma^4}$$
since
$$E\left(\bar{X} - \mu\right) = 0, \quad E\left[\left(\bar{X} - \mu\right)^2\right] = Var\left(\bar{X}\right) = \frac{\sigma^2}{n} \quad \text{and} \quad E\left(S^2\right) = \sigma^2$$
Therefore the expected information matrix is
$$J(\mu, \sigma^2) = \begin{bmatrix} \dfrac{n}{\sigma^2} & 0 \\ 0 & \dfrac{n}{2\sigma^4} \end{bmatrix}$$
and the inverse of the expected information matrix is
$$\left[J(\mu, \sigma^2)\right]^{-1} = \begin{bmatrix} \dfrac{\sigma^2}{n} & 0 \\ 0 & \dfrac{2\sigma^4}{n} \end{bmatrix}$$
Note that
$$Var\left(\bar{X}\right) = \frac{\sigma^2}{n}$$
$$Var\left(\tilde{\sigma}^2\right) = Var\left[\frac{1}{n} \sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2\right] = \frac{2(n-1)\sigma^4}{n^2} \approx \frac{2\sigma^4}{n}$$
and
$$Cov\left(\bar{X}, \tilde{\sigma}^2\right) = \frac{1}{n} Cov\left[\bar{X}, \sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2\right] = 0$$
since $\bar{X}$ and $\sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2$ are independent random variables.
By the invariance property of maximum likelihood estimators the maximum likelihood estimator of the coefficient of variation $\sigma/\mu$ is $\hat{\sigma}/\hat{\mu}$.
Recall from your previous statistics course that inferences for $\mu$ and $\sigma^2$ are made using the independent pivotal quantities
$$\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t(n-1) \quad \text{and} \quad \frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1)$$
See Figure 7.1 for a graph of $R(\mu, \sigma^2)$ for $n = 50$, $\hat{\mu} = 5$ and $\hat{\sigma}^2 = 4$.
Figure 7.1: Normal Relative Likelihood Function for n = 50, $\hat{\mu}$ = 5 and $\hat{\sigma}^2$ = 4
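The maximum likelihood estimates and the observed information matrix are easy to compute directly in R. The sketch below is not part of the example; it simulates a sample with assumed true values $\mu = 5$, $\sigma^2 = 4$ and $n = 50$, matching the values used for Figure 7.1.

# MLEs and observed information matrix for a N(mu, sigma^2) sample
set.seed(1)
n <- 50
x <- rnorm(n, mean = 5, sd = 2)               # assumed true values mu = 5, sigma^2 = 4
muhat <- mean(x)
sig2hat <- mean((x - muhat)^2)                # equals (n-1)/n * s^2
Ihat <- diag(c(n/sig2hat, n/(2*sig2hat^2)))   # observed information matrix
muhat; sig2hat; Ihat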
7.1.9 Exercise
Suppose $Y_i \sim$ N($\alpha + \beta x_i, \sigma^2$), $i = 1, 2, \ldots, n$ independently where the $x_i$ are known constants. Show that the maximum likelihood estimators of $\alpha$, $\beta$ and $\sigma^2$ are given by
$$\tilde{\alpha} = \bar{Y} - \tilde{\beta}\bar{x}$$
$$\tilde{\beta} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})\left(Y_i - \bar{Y}\right)}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$
$$\tilde{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} \left(Y_i - \tilde{\alpha} - \tilde{\beta}x_i\right)^2$$
Note: $\tilde{\alpha}$ and $\tilde{\beta}$ are also the least squares estimators of $\alpha$ and $\beta$.
7.1.10 Example
Suppose $x_1, x_2, \ldots, x_n$ is an observed random sample from the Beta($a, b$) distribution with probability density function
$$f(x; a, b) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} x^{a-1} (1-x)^{b-1} \quad \text{for } 0 < x < 1,\ a > 0,\ b > 0$$
Find the likelihood function, the score vector, the information matrix and the expected information matrix. How would you find the maximum likelihood estimates of $a$ and $b$?

Solution
The likelihood function is
$$L(a, b) = \prod_{i=1}^{n} \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)} x_i^{a-1} (1-x_i)^{b-1} = \left[\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\right]^n \left(\prod_{i=1}^{n} x_i\right)^{a-1} \left[\prod_{i=1}^{n} (1-x_i)\right]^{b-1} \quad \text{for } a > 0,\ b > 0$$
or more simply
$$L(a, b) = \left[\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\right]^n \left(\prod_{i=1}^{n} x_i\right)^{a} \left[\prod_{i=1}^{n} (1-x_i)\right]^{b} \quad \text{for } a > 0,\ b > 0$$
The log likelihood function is
$$l(a, b) = n\left[\log \Gamma(a+b) - \log \Gamma(a) - \log \Gamma(b) + a t_1 + b t_2\right] \quad \text{for } a > 0,\ b > 0$$
where
$$t_1 = \frac{1}{n} \sum_{i=1}^{n} \log x_i \quad \text{and} \quad t_2 = \frac{1}{n} \sum_{i=1}^{n} \log(1 - x_i)$$
Let
$$\psi(z) = \frac{d \log \Gamma(z)}{dz} = \frac{\Gamma'(z)}{\Gamma(z)}$$
which is called the digamma function.
The score vector is
$$S(a, b) = \left[\frac{\partial l}{\partial a}, \frac{\partial l}{\partial b}\right] = n\left[\psi(a+b) - \psi(a) + t_1,\ \psi(a+b) - \psi(b) + t_2\right]$$
for $a > 0$, $b > 0$. $S(a, b) = (0, 0)$ must be solved numerically to find the maximum likelihood estimates of $a$ and $b$.
Let
$$\psi'(z) = \frac{d}{dz} \psi(z)$$
which is called the trigamma function.
The information matrix is
$$I(a, b) = n \begin{bmatrix} \psi'(a) - \psi'(a+b) & -\psi'(a+b) \\ -\psi'(a+b) & \psi'(b) - \psi'(a+b) \end{bmatrix} \quad \text{for } a > 0,\ b > 0$$
which is also the expected information matrix.

7.1.11 Exercise
Suppose $x_1, x_2, \ldots, x_n$ is an observed random sample from the Gamma($\alpha, \beta$) distribution. Find the likelihood function, the score vector, the information matrix and the expected information matrix. How would you find the maximum likelihood estimates of $\alpha$ and $\beta$?

Often $S(\theta_1, \theta_2, \ldots, \theta_k) = (0, 0, \ldots, 0)$ must be solved numerically using a method such as Newton's Method.

7.1.12 Newton's Method
Let $\theta^{(0)}$ be an initial estimate of $\theta = (\theta_1, \theta_2, \ldots, \theta_k)$. The estimate $\theta^{(i)}$ can be updated using
$$\theta^{(i+1)} = \theta^{(i)} + S(\theta^{(i)})\left[I(\theta^{(i)})\right]^{-1} \quad \text{for } i = 0, 1, \ldots$$
Note: The initial estimate, $\theta^{(0)}$, may be determined by calculating $L(\theta)$ for a grid of $\theta$ values to determine the region in which $L(\theta)$ attains a maximum.
7.1.13 Example
Use the following R code to randomly generate 35 observations from a Beta($a, b$) distribution.
# randomly generate 35 observations from a Beta(a,b)
set.seed(32086689) # set the seed so results can be reproduced
# use randomly generated a and b values
truea<-runif(1,min=2,max=3)
trueb<-runif(1,min=1,max=4)
# data are sorted and rounded to two decimal places for easier display
x<-sort(round(rbeta(35,truea,trueb),2))
x
Use Newton's Method and R to find $(\hat{a}, \hat{b})$.
What are the values of $S(\hat{a}, \hat{b})$ and $I(\hat{a}, \hat{b})$?
Solution
The generated data are
0.08 0.19 0.21 0.25 0.28 0.29 0.29 0.30 0.30 0.32
0.34 0.36 0.39 0.45 0.45 0.47 0.48 0.49 0.54 0.54
0.55 0.55 0.56 0.56 0.61 0.63 0.64 0.65 0.69 0.69
0.73 0.77 0.79 0.81 0.85
The maximum likelihood estimates of $a$ and $b$ can be found using Newton's Method given by
$$\begin{bmatrix} a^{(i+1)} & b^{(i+1)} \end{bmatrix} = \begin{bmatrix} a^{(i)} & b^{(i)} \end{bmatrix} + S(a^{(i)}, b^{(i)})\left[I(a^{(i)}, b^{(i)})\right]^{-1}$$
for $i = 0, 1, \ldots$ until convergence.
Here is R code for Newton's Method for the Beta Example.
# function for calculating Beta score for a and b and data x
BESF<-function(a,b,x)
{S<-length(x)*c(digamma(a+b)-digamma(a)+mean(log(x)),
digamma(a+b)-digamma(b)+mean(log(1-x)))
return(S)}
#
# function for calculating Beta information for a and b
BEIF<-function(a,b)
{I<-length(x)*cbind(c(trigamma(a)-trigamma(a+b),-trigamma(a+b)),
c(-trigamma(a+b),trigamma(b)-trigamma(a+b)))
return(I)}
# Newton’s Method for Beta Example
NewtonBE<-function(a,b,x)
{thold<-c(a,b)
thnew<-thold+0.1
while (sum(abs(thold-thnew))>0.0000001)
{thold<-thnew
thnew<-thold+BESF(thold[1],thold[2],x)%*%solve(BEIF(thold[1],thold[2]))
print(thnew)}
return(thnew)}
thetahat<-NewtonBE(2,2,x)
The maximum likelihood estimates are $\hat{a} = 2.824775$ and $\hat{b} = 2.97317$. The score vector evaluated at $(\hat{a}, \hat{b})$ is
$$S(\hat{a}, \hat{b}) = \begin{bmatrix} 3.108624 \times 10^{-14} & 7.771561 \times 10^{-15} \end{bmatrix}$$
which indicates we have obtained a local extremum. The observed information matrix is
$$I(\hat{a}, \hat{b}) = \begin{bmatrix} 8.249382 & -6.586959 \\ -6.586959 & 7.381967 \end{bmatrix}$$
Note that since
$$\det[I(\hat{a}, \hat{b})] = (8.249382)(7.381967) - (6.586959)^2 > 0$$
and
$$[I(\hat{a}, \hat{b})]_{11} = 8.249382 > 0$$
then by the second derivative test we have found the maximum likelihood estimates.
7.1.14 Exercise
Use the following R code to randomly generate 30 observations from a Gamma($\alpha, \beta$) distribution.
# randomly generate 30 observations from a Gamma(a,b)
set.seed(32067489) # set the seed so results can be reproduced
# use randomly generated a and b values
truea<-runif(1,min=1,max=3)
trueb<-runif(1,min=3,max=5)
# data are sorted and rounded to two decimal places for easier display
x<-sort(round(rgamma(30,truea,scale=trueb),2))
x
Use Newton's Method and R to find $(\hat{\alpha}, \hat{\beta})$. What are the values of $S(\hat{\alpha}, \hat{\beta})$ and $I(\hat{\alpha}, \hat{\beta})$?
7.2 Likelihood Regions
For one unknown parameter, likelihood intervals provide a way of summarizing the uncertainty in the maximum likelihood estimate by providing an interval of values which are plausible given the observed data. For $k$ unknown parameters summarizing the uncertainty is more challenging. We begin with the definition of a likelihood region which is the natural extension of a likelihood interval.

7.2.1 Definition - Likelihood Regions
The set of values of $\theta$ for which $R(\theta) \ge p$ is called a 100p% likelihood region for $\theta$.

A 100p% likelihood region for two unknown parameters $\theta = (\theta_1, \theta_2)$ is given by $\{(\theta_1, \theta_2) : R(\theta_1, \theta_2) \ge p\}$. These regions will be approximately elliptical in shape. To show this we note that for $(\theta_1, \theta_2)$ sufficiently close to $(\hat{\theta}_1, \hat{\theta}_2)$ we have
$$L(\theta_1, \theta_2) \approx L(\hat{\theta}_1, \hat{\theta}_2) + S(\hat{\theta}_1, \hat{\theta}_2)\begin{bmatrix} \hat{\theta}_1 - \theta_1 \\ \hat{\theta}_2 - \theta_2 \end{bmatrix} - \frac{1}{2}\begin{bmatrix} \hat{\theta}_1 - \theta_1 & \hat{\theta}_2 - \theta_2 \end{bmatrix} I(\hat{\theta}_1, \hat{\theta}_2) \begin{bmatrix} \hat{\theta}_1 - \theta_1 \\ \hat{\theta}_2 - \theta_2 \end{bmatrix}$$
$$= L(\hat{\theta}_1, \hat{\theta}_2) - \frac{1}{2}\begin{bmatrix} \hat{\theta}_1 - \theta_1 & \hat{\theta}_2 - \theta_2 \end{bmatrix} I(\hat{\theta}_1, \hat{\theta}_2) \begin{bmatrix} \hat{\theta}_1 - \theta_1 \\ \hat{\theta}_2 - \theta_2 \end{bmatrix} \quad \text{since } S(\hat{\theta}_1, \hat{\theta}_2) = (0, 0)$$
Therefore
$$R(\theta_1, \theta_2) = \frac{L(\theta_1, \theta_2)}{L(\hat{\theta}_1, \hat{\theta}_2)} \approx 1 - \frac{1}{2L(\hat{\theta}_1, \hat{\theta}_2)}\left[(\theta_1 - \hat{\theta}_1)^2 \hat{I}_{11} + 2(\theta_1 - \hat{\theta}_1)(\theta_2 - \hat{\theta}_2)\hat{I}_{12} + (\theta_2 - \hat{\theta}_2)^2 \hat{I}_{22}\right]$$
The set of points $(\theta_1, \theta_2)$ which satisfy $R(\theta_1, \theta_2) = p$ is approximately the set of points $(\theta_1, \theta_2)$ which satisfy
$$(\theta_1 - \hat{\theta}_1)^2 \hat{I}_{11} + 2(\theta_1 - \hat{\theta}_1)(\theta_2 - \hat{\theta}_2)\hat{I}_{12} + (\theta_2 - \hat{\theta}_2)^2 \hat{I}_{22} = 2(1-p)L(\hat{\theta}_1, \hat{\theta}_2)$$
which we recognize as the points on an ellipse centred at $(\hat{\theta}_1, \hat{\theta}_2)$. Therefore a 100p% likelihood region for two unknown parameters $\theta = (\theta_1, \theta_2)$ will be the set of points on and inside a region which will be approximately elliptical in shape.
A similar argument can be made to show that the likelihood regions for three unknown parameters $\theta = (\theta_1, \theta_2, \theta_3)$ will be approximate ellipsoids in $\Re^3$.
7.2.2 Example
(a) Use R to graph 1%, 5%, 10%, 50%, and 90% likelihood regions for the parameters $(a,b)$ in Example 7.1.13. Comment on the shapes of the regions.
(b) Is the value $(2.5, 3.5)$ a plausible value of $(a,b)$?
Solution
(a) The following R code generates the required likelihood regions.
# function for calculating Beta relative likelihood function
# for parameters a,b and data x
BERLF<-function(a,b,that,x)
{t1<-prod(x)
t2<-prod(1-x)
n<-length(x)
ah<-that[1]
bh<-that[2]
L<-((gamma(a+b)*gamma(ah)*gamma(bh))/
(gamma(a)*gamma(b)*gamma(ah+bh)))^n*t1^(a-ah)*t2^(b-bh)
return(L)}
#
a<-seq(0.5,5.5,0.01)
b<-seq(0.5,6,0.01)
R<-outer(a,b,FUN = BERLF,thetahat,x)
contour(a,b,R,levels=c(0.01,0.05,0.10,0.50,0.9),xlab="a",ylab="b",lwd=2)
The 1%, 5%, 10%, 50%, and 90% likelihood regions for $(a,b)$ are shown in Figure 7.2.
The likelihood contours are approximately elliptical but they are not symmetric about the maximum likelihood estimates $(\hat{a},\hat{b}) = (2.824775, 2.97317)$. The likelihood regions are more stretched for larger values of $a$ and $b$. The ellipses are also tilted relative to the $ab$ coordinate axes. The tilt of the likelihood contours relative to the $ab$ coordinate axes is determined by the value of $\hat{I}_{12}$; if $\hat{I}_{12}$ is close to zero the tilt will be small.
(b) Since $R(2.5, 3.5) = 0.082$ the point $(2.5, 3.5)$ lies outside a 10% likelihood region, so it is not a very plausible value of $(a,b)$.
Note however that $a = 2.5$ is a plausible value of $a$ for some values of $b$. For example, $(a,b) = (2.5, 2.5)$ lies inside a 50% likelihood region, so $(2.5, 2.5)$ is a plausible value of $(a,b)$. We see that when there is more than one parameter we need to determine whether a set of values is jointly plausible given the observed data.
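The plausibility of particular parameter values can also be checked numerically with the relative likelihood function defined above. This is only a sketch; it assumes x and thetahat are still in the workspace.
# relative likelihood at two candidate values of (a,b)
BERLF(2.5,3.5,thetahat,x)   # approximately 0.082, outside a 10% region
BERLF(2.5,2.5,thetahat,x)   # inside a 50% region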
Figure 7.2: Likelihood regions for Beta example
7.2.3 Exercise
(a) Use R to graph 1%, 5%, 10%, 50%, and 90% likelihood regions for the parameters $(\alpha, \beta)$ in Exercise 7.1.14. Comment on the shapes of the regions.
(b) Is the value $(3, 2.7)$ a plausible value of $(\alpha, \beta)$?
(c) Use the R code in Exercise 7.1.14 to generate 100 observations from the Gamma$(\alpha, \beta)$ distribution.
(d) Use R to graph 1%, 5%, 10%, 50%, and 90% likelihood regions for $(\alpha, \beta)$ for the data generated in (c). Comment on the shapes of these regions as compared to the regions in (a).
7.3 Limiting Distribution of Maximum Likelihood Estimator
To discuss the asymptotic properties of the maximum likelihood estimator in the multiparameter case we need to define convergence in probability and convergence in distribution of a sequence of random vectors.

7.3.1 Definition - Convergence of a Sequence of Random Vectors
Let $\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_n, \ldots$ be a sequence of random vectors where $\mathbf{X}_n = (X_{1n}, X_{2n}, \ldots, X_{kn})$. Let $\mathbf{X} = (X_1, X_2, \ldots, X_k)$ be a random vector and $\mathbf{x} = (x_1, x_2, \ldots, x_k)$.
(1) If $X_{in} \rightarrow_p X_i$ for $i = 1, 2, \ldots, k$, then
$$\mathbf{X}_n \rightarrow_p \mathbf{X}$$
(2) Let $F_n(\mathbf{x}) = P(X_{1n} \le x_1, X_{2n} \le x_2, \ldots, X_{kn} \le x_k)$ be the cumulative distribution function of $\mathbf{X}_n$. Let $F(\mathbf{x}) = P(X_1 \le x_1, X_2 \le x_2, \ldots, X_k \le x_k)$ be the cumulative distribution function of $\mathbf{X}$. If
$$\lim_{n \rightarrow \infty} F_n(\mathbf{x}) = F(\mathbf{x})$$
at all points of continuity of $F(\mathbf{x})$ then
$$\mathbf{X}_n \rightarrow_D \mathbf{X} = (X_1, X_2, \ldots, X_k)$$

To discuss the asymptotic properties of the maximum likelihood estimator in the multiparameter case we also need the definition and properties of the Multivariate Normal distribution. The Multivariate Normal distribution is the natural extension of the Bivariate Normal distribution which was discussed in Section 3.10.

7.3.2 Definition - Multivariate Normal Distribution (MVN)
Let $\mathbf{X} = (X_1, X_2, \ldots, X_k)$ be a $1 \times k$ random vector with $E(X_i) = \mu_i$ and $Cov(X_i, X_j) = \sigma_{ij}$, $i,j = 1, 2, \ldots, k$. (Note: $Cov(X_i, X_i) = \sigma_{ii} = Var(X_i) = \sigma_i^2$.) Let $\boldsymbol{\mu} = (\mu_1, \mu_2, \ldots, \mu_k)$ be the mean vector and $\Sigma$ be the $k \times k$ symmetric covariance matrix whose $(i,j)$ entry is $\sigma_{ij}$. Suppose also that the inverse matrix of $\Sigma$, $\Sigma^{-1}$, exists. If the joint probability density function of $(X_1, X_2, \ldots, X_k)$ is given by
$$f(x_1, x_2, \ldots, x_k) = \frac{1}{(2\pi)^{k/2}|\Sigma|^{1/2}}\exp\left[-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})^{T}\right]$$
for $\mathbf{x} = (x_1, x_2, \ldots, x_k) \in \Re^k$ then $\mathbf{X}$ is said to have a Multivariate Normal distribution. We write $\mathbf{X} \sim MVN(\boldsymbol{\mu}, \Sigma)$.

The following theorem gives some important properties of the Multivariate Normal distribution. These properties are a natural extension of the properties of the Bivariate Normal distribution found in Theorem 3.10.2.
7.3.3 Theorem - Properties of MVN Distribution
Suppose $\mathbf{X} = (X_1, X_2, \ldots, X_k) \sim MVN(\boldsymbol{\mu}, \Sigma)$. Then
(1) $\mathbf{X}$ has joint moment generating function
$$M(\mathbf{t}) = \exp\left(\boldsymbol{\mu}\mathbf{t}^{T} + \frac{1}{2}\mathbf{t}\Sigma\mathbf{t}^{T}\right) \quad \text{for } \mathbf{t} = (t_1, t_2, \ldots, t_k) \in \Re^k$$
(2) Any subset of $X_1, X_2, \ldots, X_k$ also has a MVN distribution and in particular
$$X_i \sim N(\mu_i, \sigma_i^2), \quad i = 1, 2, \ldots, k$$
(3)
$$(\mathbf{X}-\boldsymbol{\mu})\Sigma^{-1}(\mathbf{X}-\boldsymbol{\mu})^{T} \sim \chi^2(k)$$
(4) Let $\mathbf{c} = (c_1, c_2, \ldots, c_k)$ be a nonzero vector of constants. Then
$$\mathbf{X}\mathbf{c}^{T} = \sum_{i=1}^{k} c_iX_i \sim N(\boldsymbol{\mu}\mathbf{c}^{T},\ \mathbf{c}\Sigma\mathbf{c}^{T})$$
(5) Let $A$ be a $k \times p$ matrix of constants of rank $p \le k$. Then
$$\mathbf{X}A \sim MVN(\boldsymbol{\mu}A,\ A^{T}\Sigma A)$$
(6) The conditional distribution of any subset of $(X_1, X_2, \ldots, X_k)$ given the rest of the coordinates is a MVN distribution. In particular the conditional probability density function of $X_i$ given $X_j = x_j$, $i \ne j$, is
$$X_i | X_j = x_j \sim N\left(\mu_i + \rho_{ij}\sigma_i(x_j - \mu_j)/\sigma_j,\ (1-\rho_{ij}^2)\sigma_i^2\right)$$
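Property (4) can be illustrated with a short simulation. This is only a sketch: it assumes the add-on package mvtnorm (which provides rmvnorm) is installed, and the numerical values of the mean vector, covariance matrix, and coefficient vector are arbitrary.
# illustrate property (4): a linear combination of MVN components is Normal
library(mvtnorm)
mu<-c(1,2)
Sigma<-matrix(c(2,0.5,0.5,1),nrow=2)
cvec<-c(3,-1)
X<-rmvnorm(100000,mean=mu,sigma=Sigma)
Y<-X%*%cvec                       # X c^T for each simulated row
mean(Y); sum(mu*cvec)             # sample mean versus mu c^T
var(Y); t(cvec)%*%Sigma%*%cvec    # sample variance versus c Sigma c^T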
The following theorem gives the asymptotic distribution of the maximum likelihood esti-
mator in the multiparameter case. This theorem looks very similar to Theorem 6.5.1 with
the scalar quantities replaced by the appropriate vectors and matrices.
7.3.4 Theorem - Limiting Distribution of the Maximum Likelihood Estimator
Suppose $\mathbf{X}_n = (X_1, X_2, \ldots, X_n)$ is a random sample from $f(x; \boldsymbol{\theta})$ where $\boldsymbol{\theta} = (\theta_1, \theta_2, \ldots, \theta_k) \in \Omega$ and the dimension of $\Omega$ is $k$. Let $\tilde{\boldsymbol{\theta}}_n = \tilde{\boldsymbol{\theta}}_n(X_1, X_2, \ldots, X_n)$ be the maximum likelihood estimator of $\boldsymbol{\theta}$ based on $\mathbf{X}_n$. Let $\mathbf{0}_k$ be a $1 \times k$ vector of zeros, $I_k$ be the $k \times k$ identity matrix, and $[J(\boldsymbol{\theta})]^{1/2}$ be a matrix such that $[J(\boldsymbol{\theta})]^{1/2}[J(\boldsymbol{\theta})]^{1/2} = J(\boldsymbol{\theta})$. Then under certain (regularity) conditions
$$\tilde{\boldsymbol{\theta}}_n \rightarrow_p \boldsymbol{\theta} \qquad (7.2)$$
$$(\tilde{\boldsymbol{\theta}}_n - \boldsymbol{\theta})[J(\boldsymbol{\theta})]^{1/2} \rightarrow_D \mathbf{Z} \sim MVN(\mathbf{0}_k, I_k) \qquad (7.3)$$
$$-2\log R(\boldsymbol{\theta}; \mathbf{X}_n) = 2[l(\tilde{\boldsymbol{\theta}}_n; \mathbf{X}_n) - l(\boldsymbol{\theta}; \mathbf{X}_n)] \rightarrow_D W \sim \chi^2(k) \qquad (7.4)$$
for each $\boldsymbol{\theta} \in \Omega$.
Since $\tilde{\boldsymbol{\theta}}_n \rightarrow_p \boldsymbol{\theta}$, $\tilde{\boldsymbol{\theta}}_n$ is a consistent estimator of $\boldsymbol{\theta}$.
Theorem 7.3.4 implies that for sufficiently large $n$, $\tilde{\boldsymbol{\theta}}_n$ has an approximately $MVN(\boldsymbol{\theta}, [J(\boldsymbol{\theta})]^{-1})$ distribution. Therefore for sufficiently large $n$
$$E(\tilde{\boldsymbol{\theta}}_n) \approx \boldsymbol{\theta}$$
and therefore $\tilde{\boldsymbol{\theta}}_n$ is an asymptotically unbiased estimator of $\boldsymbol{\theta}$. Also
$$Var(\tilde{\boldsymbol{\theta}}_n) \approx [J(\boldsymbol{\theta})]^{-1}$$
where $[J(\boldsymbol{\theta})]^{-1}$ is the inverse matrix of the matrix $J(\boldsymbol{\theta})$. (Since $J(\boldsymbol{\theta})$ is a $k \times k$ symmetric matrix, $[J(\boldsymbol{\theta})]^{-1}$ is also a $k \times k$ symmetric matrix.) $[J(\boldsymbol{\theta})]^{-1}$ is called the asymptotic variance/covariance matrix of $\tilde{\boldsymbol{\theta}}_n$. Of course $J(\boldsymbol{\theta})$ is unknown because $\boldsymbol{\theta}$ is unknown. But (7.2), (7.3) and Slutsky's Theorem imply that
$$(\tilde{\boldsymbol{\theta}}_n - \boldsymbol{\theta})[J(\tilde{\boldsymbol{\theta}}_n)]^{1/2} \rightarrow_D \mathbf{Z} \sim MVN(\mathbf{0}_k, I_k)$$
Therefore for sufficiently large $n$ we have
$$Var(\tilde{\boldsymbol{\theta}}_n) \approx \left[J(\hat{\boldsymbol{\theta}}_n)\right]^{-1}$$
where $\left[J(\hat{\boldsymbol{\theta}}_n)\right]^{-1}$ is the inverse matrix of $J(\hat{\boldsymbol{\theta}}_n)$.
It is also possible to show that
$$(\tilde{\boldsymbol{\theta}}_n - \boldsymbol{\theta})[I(\tilde{\boldsymbol{\theta}}_n; \mathbf{X})]^{1/2} \rightarrow_D \mathbf{Z} \sim MVN(\mathbf{0}_k, I_k)$$
so that for sufficiently large $n$ we also have
$$Var(\tilde{\boldsymbol{\theta}}_n) \approx \left[I(\hat{\boldsymbol{\theta}}_n)\right]^{-1}$$
where $\left[I(\hat{\boldsymbol{\theta}}_n)\right]^{-1}$ is the inverse matrix of the observed information matrix $I(\hat{\boldsymbol{\theta}}_n)$.
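For the Beta example of Section 7.1, the estimated asymptotic variance/covariance matrix can be obtained by inverting the observed information matrix in R. This is a sketch; it assumes thetahat and the function BEIF defined earlier are available.
# estimated asymptotic variance/covariance matrix of the MLE
Vhat<-solve(BEIF(thetahat[1],thetahat[2]))
Vhat
sqrt(diag(Vhat))   # approximate standard errors of ahat and bhat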
These results can be used to construct approximate confidence regions for $\boldsymbol{\theta}$ as shown in the next section.
Note: These results do not hold if the support set of $X$ depends on $\boldsymbol{\theta}$.
7.4 Approximate Confidence Regions
7.4.1 Definition - Confidence Region
A $100p\%$ confidence region for $\boldsymbol{\theta} = (\theta_1, \theta_2, \ldots, \theta_k)$ based on $\mathbf{X}$ is a region $R(\mathbf{X})$ which satisfies
$$P[\boldsymbol{\theta} \in R(\mathbf{X})] = p$$
Exact confidence regions can only be obtained in a very few special cases such as Normal linear models. More generally we must rely on approximate confidence regions based on the results of Theorem 7.3.4.
7.4.2 Asymptotic Pivotal Quantities and Approximate Confidence Regions
The limiting distribution of $\tilde{\boldsymbol{\theta}}_n$ can be used to obtain approximate confidence regions for $\boldsymbol{\theta}$. Since
$$(\tilde{\boldsymbol{\theta}}_n - \boldsymbol{\theta})[J(\tilde{\boldsymbol{\theta}}_n)]^{1/2} \rightarrow_D \mathbf{Z} \sim MVN(\mathbf{0}_k, I_k)$$
it follows from Theorem 7.3.3(3) and Limit Theorems that
$$(\tilde{\boldsymbol{\theta}}_n - \boldsymbol{\theta})J(\tilde{\boldsymbol{\theta}}_n)(\tilde{\boldsymbol{\theta}}_n - \boldsymbol{\theta})^{T} \rightarrow_D W \sim \chi^2(k)$$
An approximate $100p\%$ confidence region for $\boldsymbol{\theta}$ based on the asymptotic pivotal quantity $(\tilde{\boldsymbol{\theta}}_n - \boldsymbol{\theta})J(\tilde{\boldsymbol{\theta}}_n)(\tilde{\boldsymbol{\theta}}_n - \boldsymbol{\theta})^{T}$ is the set of all vectors $\boldsymbol{\theta}$ in the set
$$\{\boldsymbol{\theta} : (\hat{\boldsymbol{\theta}}_n - \boldsymbol{\theta})J(\hat{\boldsymbol{\theta}}_n)(\hat{\boldsymbol{\theta}}_n - \boldsymbol{\theta})^{T} \le c\}$$
where $c$ is the value such that $P(W \le c) = p$ and $W \sim \chi^2(k)$.
Similarly since
$$(\tilde{\boldsymbol{\theta}}_n - \boldsymbol{\theta})[I(\tilde{\boldsymbol{\theta}}_n; \mathbf{X}_n)]^{1/2} \rightarrow_D \mathbf{Z} \sim MVN(\mathbf{0}_k, I_k)$$
it follows from Theorem 7.3.3(3) and Limit Theorems that
$$(\tilde{\boldsymbol{\theta}}_n - \boldsymbol{\theta})I(\tilde{\boldsymbol{\theta}}_n; \mathbf{X}_n)(\tilde{\boldsymbol{\theta}}_n - \boldsymbol{\theta})^{T} \rightarrow_D W \sim \chi^2(k)$$
An approximate $100p\%$ confidence region for $\boldsymbol{\theta}$ based on the asymptotic pivotal quantity $(\tilde{\boldsymbol{\theta}}_n - \boldsymbol{\theta})I(\tilde{\boldsymbol{\theta}}_n; \mathbf{X}_n)(\tilde{\boldsymbol{\theta}}_n - \boldsymbol{\theta})^{T}$ is the set of all vectors $\boldsymbol{\theta}$ in the set
$$\{\boldsymbol{\theta} : (\hat{\boldsymbol{\theta}}_n - \boldsymbol{\theta})I(\hat{\boldsymbol{\theta}}_n)(\hat{\boldsymbol{\theta}}_n - \boldsymbol{\theta})^{T} \le c\}$$
where $I(\hat{\boldsymbol{\theta}}_n)$ is the observed information.
Finally since
$$-2\log R(\boldsymbol{\theta}; \mathbf{X}_n) \rightarrow_D W \sim \chi^2(k)$$
an approximate $100p\%$ confidence region for $\boldsymbol{\theta}$ based on this asymptotic pivotal quantity is the set of all vectors $\boldsymbol{\theta}$ satisfying
$$\{\boldsymbol{\theta} : -2\log R(\boldsymbol{\theta}; \mathbf{x}) \le c\}$$
where $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ are the observed data. Since
$$\{\boldsymbol{\theta} : -2\log R(\boldsymbol{\theta}; \mathbf{x}) \le c\} = \{\boldsymbol{\theta} : R(\boldsymbol{\theta}; \mathbf{x}) \ge e^{-c/2}\}$$
we recognize that this region is actually a likelihood region.
7.4.3 Example
Use R and the results from Examples 7.1.10 and 7.1.13 to graph approximate 90%, 95%, and 99% confidence regions for $(a,b)$. Compare these approximate confidence regions with the likelihood regions in Example 7.2.2.
Solution
From Example 7.1.10 we have that for a random sample from the Beta$(a,b)$ distribution the information matrix and the expected information matrix are given by
$$I(a,b) = n\begin{bmatrix} \psi'(a) - \psi'(a+b) & -\psi'(a+b) \\ -\psi'(a+b) & \psi'(b) - \psi'(a+b) \end{bmatrix} = J(a,b)$$
Since
$$\begin{bmatrix} \tilde{a}-a & \tilde{b}-b \end{bmatrix} J(\tilde{a},\tilde{b}) \begin{bmatrix} \tilde{a}-a \\ \tilde{b}-b \end{bmatrix} \rightarrow_D W \sim \chi^2(2)$$
an approximate $100p\%$ confidence region for $(a,b)$ is given by
$$\left\{(a,b) : \begin{bmatrix} \hat{a}-a & \hat{b}-b \end{bmatrix} J(\hat{a},\hat{b}) \begin{bmatrix} \hat{a}-a \\ \hat{b}-b \end{bmatrix} \le c\right\}$$
where $P(W \le c) = p$. Since $\chi^2(2) = \text{Gamma}(1,2) = \text{Exponential}(2)$, $c$ can be determined using
$$p = P(W \le c) = \int_{0}^{c}\frac{1}{2}e^{-x/2}\,dx = 1 - e^{-c/2}$$
which gives
$$c = -2\log(1-p)$$
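In R the value of $c$ for a given $p$ can be obtained either from this formula or directly from the chi-square quantile function; for example, for $p = 0.95$:
# value of c for an approximate 95% confidence region (k = 2)
-2*log(1-0.95)
qchisq(0.95,df=2)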
For $p = 0.95$, $c = -2\log(0.05) = 5.99$, and an approximate 95% confidence region is given by
$$\left\{(a,b) : \begin{bmatrix} \hat{a}-a & \hat{b}-b \end{bmatrix} J(\hat{a},\hat{b}) \begin{bmatrix} \hat{a}-a \\ \hat{b}-b \end{bmatrix} \le 5.99\right\}$$
If we let
$$J(\hat{a},\hat{b}) = \begin{bmatrix} \hat{J}_{11} & \hat{J}_{12} \\ \hat{J}_{12} & \hat{J}_{22} \end{bmatrix}$$
then the approximate confidence region can be written as
$$\{(a,b) : (\hat{a}-a)^2\hat{J}_{11} + 2(\hat{a}-a)(\hat{b}-b)\hat{J}_{12} + (\hat{b}-b)^2\hat{J}_{22} \le 5.99\}$$
We note that the approximate confidence region is the set of points on and inside the ellipse
$$(\hat{a}-a)^2\hat{J}_{11} + 2(\hat{a}-a)(\hat{b}-b)\hat{J}_{12} + (\hat{b}-b)^2\hat{J}_{22} = 5.99$$
which is centred at $(\hat{a},\hat{b})$.
For the data in Example 7.1.13, $\hat{a} = 2.824775$, $\hat{b} = 2.97317$ and
$$I(\hat{a},\hat{b}) = J(\hat{a},\hat{b}) = \begin{bmatrix} 8.249382 & -6.586959 \\ -6.586959 & 7.381967 \end{bmatrix}$$
Approximate 90% ($-2\log(0.1) = 4.61$), 95% ($-2\log(0.05) = 5.99$), and 99% ($-2\log(0.01) = 9.21$) confidence regions are shown in Figure 7.3.
The following R code generates the required approximate confidence regions.
# function for calculating values for determining confidence regions
ConfRegion<-function(a,b,th,info)
{c<-(th[1]-a)^2*info[1,1]+2*(th[1]-a)*
(th[2]-b)*info[1,2]+(th[2]-b)^2*info[2,2]
return(c)}
#
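# Ithetahat below is the observed information matrix evaluated at the
# maximum likelihood estimates; for example (assuming BEIF and thetahat
# from the earlier code are still in the workspace) it could be computed as
Ithetahat<-BEIF(thetahat[1],thetahat[2])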
# graph approximate confidence regions
a<-seq(1,5.5,0.01)
b<-seq(1,6,0.01)
c<-outer(a,b,FUN = ConfRegion,thetahat,Ithetahat)
contour(a,b,c,levels=c(4.61,5.99,9.21),xlab="a",ylab="b",lwd=2)
#
Figure 7.3: Approximate confidence regions for the Beta(a, b) example
A 10% likelihood region for $(a,b)$ is given by $\{(a,b) : R(a,b;\mathbf{x}) \ge 0.1\}$. Since
$$-2\log R(a,b;\mathbf{X}_n) \rightarrow_D W \sim \chi^2(2) = \text{Exponential}(2)$$
we have
$$P[R(a,b;\mathbf{X}) \ge 0.1] = P[-2\log R(a,b;\mathbf{X}) \le -2\log(0.1)] \approx P(W \le -2\log(0.1)) = 1 - e^{-[-2\log(0.1)]/2} = 1 - 0.1 = 0.9$$
and therefore a 10% likelihood region corresponds to an approximate 90% confidence region. Similarly 1% and 5% likelihood regions correspond to approximate 99% and 95% confidence regions respectively.
If we compare the likelihood regions in Figure 7.2 with the approximate confidence regions shown in Figure 7.3 we notice that the confidence regions are exact ellipses centred at the maximum likelihood estimates whereas the likelihood regions are only approximately elliptical and are not centred at the maximum likelihood estimates. We also notice that there are values inside an approximate 99% confidence region which are outside a 1% likelihood region. The point $(a,b) = (1, 1.5)$ is an example. There were only 35 observations in this data set. The differences between the likelihood regions and the approximate confidence regions indicate that the Normal approximation might not be good. In this example the likelihood regions provide a better summary of the uncertainty in the estimates.
7.4.4 Exercise
Use R and the results from Exercises 7.1.11 and 7.1.14 to graph approximate 90%, 95%, and 99% confidence regions for $(\alpha, \beta)$. Compare these approximate confidence regions with the likelihood regions in Exercise 7.2.3.

Since likelihood regions and approximate confidence regions cannot be graphed or easily interpreted for more than two parameters, we often construct approximate confidence intervals for individual parameters. Such confidence intervals are often referred to as marginal confidence intervals. These confidence intervals must be used with care as we will see in Example 7.4.6.
Approximate confidence intervals can also be constructed for a linear combination of parameters. An illustration is given in Example 7.4.6.
7.4.5 Approximate Marginal Confidence Intervals
Let $\theta_i$ be the $i$th entry in the vector $\boldsymbol{\theta} = (\theta_1, \theta_2, \ldots, \theta_k)$. Since
$$(\tilde{\boldsymbol{\theta}}_n - \boldsymbol{\theta})[J(\boldsymbol{\theta})]^{1/2} \rightarrow_D \mathbf{Z} \sim MVN(\mathbf{0}_k, I_k)$$
it follows that an approximate $100p\%$ marginal confidence interval for $\theta_i$ is given by
$$\left[\hat{\theta}_i - a\sqrt{\hat{v}_{ii}},\ \hat{\theta}_i + a\sqrt{\hat{v}_{ii}}\right]$$
where $\hat{\theta}_i$ is the $i$th entry in the vector $\hat{\boldsymbol{\theta}}_n$, $\hat{v}_{ii}$ is the $(i,i)$ entry of the matrix $[J(\hat{\boldsymbol{\theta}}_n)]^{-1}$, and $a$ is the value such that $P(Z \le a) = \frac{1+p}{2}$ where $Z \sim N(0,1)$.
Similarly since
$$(\tilde{\boldsymbol{\theta}}_n - \boldsymbol{\theta})[I(\tilde{\boldsymbol{\theta}}_n; \mathbf{X}_n)]^{1/2} \rightarrow_D \mathbf{Z} \sim MVN(\mathbf{0}_k, I_k)$$
it follows that an approximate $100p\%$ confidence interval for $\theta_i$ is given by
$$\left[\hat{\theta}_i - a\sqrt{\hat{v}_{ii}},\ \hat{\theta}_i + a\sqrt{\hat{v}_{ii}}\right]$$
where $\hat{v}_{ii}$ is now the $(i,i)$ entry of the matrix $[I(\hat{\boldsymbol{\theta}}_n)]^{-1}$.
7.4.6 Example
Using the results from Examples 7.1.10 and 7.1.13 determine approximate 95% marginal confidence intervals for $a$ and $b$, and an approximate confidence interval for $a+b$.
Solution
Let
$$\left[J(\hat{a},\hat{b})\right]^{-1} = \begin{bmatrix} \hat{v}_{11} & \hat{v}_{12} \\ \hat{v}_{12} & \hat{v}_{22} \end{bmatrix}$$
Since
$$\begin{bmatrix} \tilde{a}-a & \tilde{b}-b \end{bmatrix}[J(\tilde{a},\tilde{b})]^{1/2} \rightarrow_D \mathbf{Z} \sim BVN\left(\begin{bmatrix} 0 & 0 \end{bmatrix},\ \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\right)$$
then for large $n$, $Var(\tilde{a}) \approx \hat{v}_{11}$, $Var(\tilde{b}) \approx \hat{v}_{22}$ and $Cov(\tilde{a},\tilde{b}) \approx \hat{v}_{12}$. Therefore an approximate 95% confidence interval for $a$ is given by
$$\left[\hat{a} - 1.96\sqrt{\hat{v}_{11}},\ \hat{a} + 1.96\sqrt{\hat{v}_{11}}\right]$$
and an approximate 95% confidence interval for $b$ is given by
$$\left[\hat{b} - 1.96\sqrt{\hat{v}_{22}},\ \hat{b} + 1.96\sqrt{\hat{v}_{22}}\right]$$
For the data in Example 7.1.13, $\hat{a} = 2.824775$, $\hat{b} = 2.97317$ and
$$\left[I(\hat{a},\hat{b})\right]^{-1} = \left[J(\hat{a},\hat{b})\right]^{-1} = \begin{bmatrix} 8.249382 & -6.586959 \\ -6.586959 & 7.381967 \end{bmatrix}^{-1} = \begin{bmatrix} 0.4216186 & 0.3762120 \\ 0.3762120 & 0.4711608 \end{bmatrix}$$
An approximate 95% marginal confidence interval for $a$ is
$$\left[2.824775 - 1.96\sqrt{0.4216186},\ 2.824775 + 1.96\sqrt{0.4216186}\right] = [1.5521,\ 4.0974]$$
and an approximate 95% marginal confidence interval for $b$ is
$$\left[2.97317 - 1.96\sqrt{0.4711608},\ 2.97317 + 1.96\sqrt{0.4711608}\right] = [1.6278,\ 4.3185]$$
Note that $a = 2.1$ is in the approximate 95% marginal confidence interval for $a$ and $b = 3.8$ is in the approximate 95% marginal confidence interval for $b$, and yet the point $(2.1, 3.8)$ is not in the approximate 95% joint confidence region for $(a,b)$. Clearly these marginal confidence intervals for $a$ and $b$ must be used with care.
To obtain an approximate 95% marginal confidence interval for $a+b$ we note that
$$Var(\tilde{a}+\tilde{b}) = Var(\tilde{a}) + Var(\tilde{b}) + 2Cov(\tilde{a},\tilde{b}) \approx \hat{v}_{11} + \hat{v}_{22} + 2\hat{v}_{12} = \hat{v}$$
so that an approximate 95% confidence interval for $a+b$ is given by
$$\left[\hat{a}+\hat{b} - 1.96\sqrt{\hat{v}},\ \hat{a}+\hat{b} + 1.96\sqrt{\hat{v}}\right]$$
For the data in Example 7.1.13
$$\hat{a}+\hat{b} = 2.824775 + 2.97317 = 5.797945$$
$$\hat{v} = \hat{v}_{11} + \hat{v}_{22} + 2\hat{v}_{12} = 0.4216186 + 0.4711608 + 2(0.3762120) = 1.645203$$
and an approximate 95% marginal confidence interval for $a+b$ is
$$\left[5.797945 - 1.96\sqrt{1.645203},\ 5.797945 + 1.96\sqrt{1.645203}\right] = [3.2839,\ 8.3119]$$
7.4.7 Exercise
Using the results from Exercises 7.1.11 and 7.1.14 determine approximate 95% marginal confidence intervals for $\alpha$ and $\beta$, and an approximate confidence interval for $\alpha+\beta$.
7.5 Chapter 7 Problems
1. Suppose $x_1, x_2, \ldots, x_n$ is an observed random sample from the distribution with cumulative distribution function
$$F(x; \theta_1, \theta_2) = 1 - \left(\frac{\theta_1}{x}\right)^{\theta_2} \quad \text{for } x \ge \theta_1,\ \theta_1 > 0,\ \theta_2 > 0$$
Find the maximum likelihood estimators of $\theta_1$ and $\theta_2$.
2. Suppose $(X_1, X_2, X_3) \sim \text{Multinomial}(n; \theta_1, \theta_2, \theta_3)$. Verify that the maximum likelihood estimators of $\theta_1$ and $\theta_2$ are $\tilde{\theta}_1 = X_1/n$ and $\tilde{\theta}_2 = X_2/n$. Find the expected information for $\theta_1$ and $\theta_2$.
3. Suppose $x_{11}, x_{12}, \ldots, x_{1n_1}$ is an observed random sample from the $N(\mu_1, \sigma^2)$ distribution and independently $x_{21}, x_{22}, \ldots, x_{2n_2}$ is an observed random sample from the $N(\mu_2, \sigma^2)$ distribution. Find the maximum likelihood estimators of $\mu_1$, $\mu_2$, and $\sigma^2$.
4. In a large population of males ages 40-50, the proportion who are regular smokers is $\alpha$ where $0 \le \alpha \le 1$ and the proportion who have hypertension (high blood pressure) is $\beta$ where $0 \le \beta \le 1$. Suppose that $n$ men are selected at random from this population and the observed data are
Category: $S \cap H$, $S \cap \bar{H}$, $\bar{S} \cap H$, $\bar{S} \cap \bar{H}$
Frequency: $x_{11}$, $x_{12}$, $x_{21}$, $x_{22}$
where $S$ is the event the male is a smoker and $H$ is the event the male has hypertension.
(a) Assuming the events $S$ and $H$ are independent, determine the likelihood function, the score vector, the maximum likelihood estimates, and the information matrix for $\alpha$ and $\beta$.
(b) Determine the expected information matrix and its inverse matrix. What do you notice regarding the diagonal entries of the inverse matrix?
5. Suppose $x_1, x_2, \ldots, x_n$ is an observed random sample from the Logistic$(\mu, \beta)$ distribution.
(a) Find the likelihood function, the score vector, and the information matrix for $\mu$ and $\beta$. How would you find the maximum likelihood estimates of $\mu$ and $\beta$?
(b) Show that if $u$ is an observation from the Uniform$(0,1)$ distribution then
$$x = \mu - \beta\log\left(\frac{1}{u} - 1\right)$$
is an observation from the Logistic$(\mu, \beta)$ distribution.
(c) Use the following R code to randomly generate 30 observations from a Logistic$(\mu, \beta)$ distribution.
# randomly generate 30 observations from a Logistic(mu,beta)
# using random mu and beta values
set.seed(21086689) # set the seed so results can be reproduced
truemu<-runif(1,min=2,max=3)
truebeta<-runif(1,min=3,max=4)
# data are sorted and rounded to two decimal places for easier display
x<-sort(round((truemu-truebeta*log(1/runif(30)-1)),2))
x
(d) Use Newton's Method and R to find $(\hat{\mu}, \hat{\beta})$. Determine $S(\hat{\mu}, \hat{\beta})$ and $I(\hat{\mu}, \hat{\beta})$.
(e) Use R to graph 1%, 5%, 10%, 50%, and 90% likelihood regions for $(\mu, \beta)$.
(f) Use R to graph approximate 90%, 95%, and 99% confidence regions for $(\mu, \beta)$. Compare these approximate confidence regions with the likelihood regions in (e).
(g) Determine approximate 95% marginal confidence intervals for $\mu$ and $\beta$, and an approximate confidence interval for $\mu+\beta$.
6. Suppose $x_1, x_2, \ldots, x_n$ is an observed random sample from the Weibull$(\alpha, \beta)$ distribution.
(a) Find the likelihood function, the score vector, and the information matrix for $\alpha$ and $\beta$. How would you find the maximum likelihood estimates of $\alpha$ and $\beta$?
(b) Show that if $u$ is an observation from the Uniform$(0,1)$ distribution then
$$x = \beta\left[-\log(1-u)\right]^{1/\alpha}$$
is an observation from the Weibull$(\alpha, \beta)$ distribution.
(c) Use the following R code to randomly generate 40 observations from a Weibull$(\alpha, \beta)$ distribution.
# randomly generate 40 observations from a Weibull(alpha,beta)
# using random values for alpha and beta
set.seed(21086689) # set the seed so results can be reproduced
truealpha<-runif(1,min=2,max=3)
truebeta<-runif(1,min=3,max=4)
# data are sorted and rounded to two decimal places for easier display
x<-sort(round(truebeta*(-log(1-runif(40)))^(1/truealpha),2))
x
(d) Use Newton's Method and R to find $(\hat{\alpha}, \hat{\beta})$. Determine $S(\hat{\alpha}, \hat{\beta})$ and $I(\hat{\alpha}, \hat{\beta})$.
(e) Use R to graph 1%, 5%, 10%, 50%, and 90% likelihood regions for $(\alpha, \beta)$.
(f) Use R to graph approximate 90%, 95%, and 99% confidence regions for $(\alpha, \beta)$. Compare these approximate confidence regions with the likelihood regions in (e).
(g) Determine approximate 95% marginal confidence intervals for $\alpha$ and $\beta$, and an approximate confidence interval for $\alpha+\beta$.
7. Suppose $Y_i \sim \text{Binomial}(1, p_i)$, $i = 1, 2, \ldots, n$ independently, where $p_i = \left[1 + e^{-(\alpha + \beta x_i)}\right]^{-1}$ and the $x_i$ are known constants.
(a) Determine the likelihood function, the score vector, and the expected information matrix for $\alpha$ and $\beta$.
(b) Explain how you would use Newton's Method to find the maximum likelihood estimates of $\alpha$ and $\beta$.
8. Suppose $x_1, x_2, \ldots, x_n$ is an observed random sample from the Three Parameter Burr distribution with probability density function
$$f(x; \alpha, \beta, \gamma) = \frac{\alpha\gamma(x/\beta)^{\alpha-1}}{\beta\left[1 + (x/\beta)^{\alpha}\right]^{\gamma+1}} \quad \text{for } x > 0,\ \alpha > 0,\ \beta > 0,\ \gamma > 0$$
(a) Find the likelihood function, the score vector, and the information matrix for $\alpha$, $\beta$, and $\gamma$. How would you find the maximum likelihood estimates of $\alpha$, $\beta$, and $\gamma$?
(b) Show that if $u$ is an observation from the Uniform$(0,1)$ distribution then
$$x = \beta\left[(1-u)^{-1/\gamma} - 1\right]^{1/\alpha}$$
is an observation from the Three Parameter Burr distribution.
(c) Use the following R code to randomly generate 60 observations from the Three Parameter Burr distribution.
# randomly generate 60 observations from the 3 Parameter Burr
# distribution using random values for alpha, beta and gamma
set.seed(21086689) # set the seed so results can be reproduced
truea<-runif(1,min=2,max=3)
trueb<-runif(1,min=3,max=4)
truec<-runif(1,min=3,max=4)
# data are sorted and rounded to 2 decimal places for easier display
x<-sort(round(trueb*((1-runif(60))^(-1/truec)-1)^(1/truea),2))
x
(d) Use Newton's Method and R to find $(\hat{\alpha}, \hat{\beta}, \hat{\gamma})$. Determine $S(\hat{\alpha}, \hat{\beta}, \hat{\gamma})$ and $I(\hat{\alpha}, \hat{\beta}, \hat{\gamma})$. Use the second derivative test to verify that $(\hat{\alpha}, \hat{\beta}, \hat{\gamma})$ are the maximum likelihood estimates.
(e) Determine approximate 95% marginal confidence intervals for $\alpha$, $\beta$, and $\gamma$, and an approximate confidence interval for $\alpha+\beta+\gamma$.
8. Hypothesis Testing
Point estimation is a useful statistical procedure for estimating unknown parameters in a model based on observed data. Interval estimation is a useful statistical procedure for quantifying the uncertainty in these estimates. Hypothesis testing is another important statistical procedure which is used for deciding whether a given statement is supported by the observed data.
In Section 8.1 we review the definitions and steps of a test of hypothesis. Much of this material was introduced in a previous statistics course such as STAT 221/231/241. In Section 8.2 we look at how the likelihood function can be used to construct a test of hypothesis when the model is completely specified by the hypothesis of interest. The material in this section is mostly a review of material covered in a previous statistics course. In Section 8.3 we look at how the likelihood function can be used to construct a test of hypothesis when the model is not completely specified by the hypothesis of interest. The material in this section is mostly new material.

8.1 Test of Hypothesis
In order to analyse a set of data $\mathbf{x}$ we often assume a model $f(\mathbf{x}; \theta)$ where $\theta \in \Omega$ and $\Omega$ is the parameter space or set of possible values of $\theta$. A test of hypothesis is a statistical procedure used for evaluating the strength of the evidence provided by the observed data against an hypothesis. An hypothesis is a statement about the model. In many cases the hypothesis can be formulated in terms of the parameter $\theta$ as
$$H_0: \theta \in \Omega_0$$
where $\Omega_0$ is some subset of $\Omega$. $H_0$ is called the null hypothesis. When conducting a test of hypothesis there is usually another statement of interest which reflects what might be true if $H_0$ is not supported by the observed data. This statement is called the alternative hypothesis and is denoted $H_A$ or $H_1$. In many cases $H_A$ may simply take the form
$$H_A: \theta \notin \Omega_0$$
In constructing a test of hypothesis it is useful to distinguish between simple and composite hypotheses.
8.1.1 Definition - Simple and Composite Hypotheses
If the hypothesis completely specifies the model, including any parameters in the model, then the hypothesis is simple; otherwise the hypothesis is composite.
8.1.2 Example
For each of the following indicate whether the null hypothesis is simple or composite. Specify $\Omega$ and $\Omega_0$ and determine the dimension of each.
(a) It is assumed that the observed data $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ represent a random sample from a Poisson$(\theta)$ distribution. The hypothesis of interest is $H_0: \theta = \theta_0$ where $\theta_0$ is a specified value of $\theta$.
(b) It is assumed that the observed data $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ represent a random sample from a Gamma$(\alpha, \beta)$ distribution. The hypothesis of interest is $H_0: \alpha = \alpha_0$ where $\alpha_0$ is a specified value of $\alpha$.
(c) It is assumed that the observed data $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ represent a random sample from an Exponential$(\theta_1)$ distribution and independently the observed data $\mathbf{y} = (y_1, y_2, \ldots, y_m)$ represent a random sample from an Exponential$(\theta_2)$ distribution. The hypothesis of interest is $H_0: \theta_1 = \theta_2$.
(d) It is assumed that the observed data $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ represent a random sample from a $N(\mu_1, \sigma_1^2)$ distribution and independently the observed data $\mathbf{y} = (y_1, y_2, \ldots, y_m)$ represent a random sample from a $N(\mu_2, \sigma_2^2)$ distribution. The hypothesis of interest is $H_0: \mu_1 = \mu_2,\ \sigma_1^2 = \sigma_2^2$.
Solution
(a) This is a simple hypothesis since the model and the unknown parameter are completely specified. $\Omega = \{\theta : \theta > 0\}$ which has dimension 1 and $\Omega_0 = \{\theta_0\}$ which has dimension 0.
(b) This is a composite hypothesis since $\beta$ is not specified by $H_0$. $\Omega = \{(\alpha, \beta) : \alpha > 0,\ \beta > 0\}$ which has dimension 2 and $\Omega_0 = \{(\alpha_0, \beta) : \beta > 0\}$ which has dimension 1.
(c) This is a composite hypothesis since $\theta_1$ and $\theta_2$ are not specified by $H_0$. $\Omega = \{(\theta_1, \theta_2) : \theta_1 > 0,\ \theta_2 > 0\}$ which has dimension 2 and $\Omega_0 = \{(\theta_1, \theta_2) : \theta_1 = \theta_2,\ \theta_1 > 0,\ \theta_2 > 0\}$ which has dimension 1.
(d) This is a composite hypothesis since $\mu_1$, $\sigma_1^2$, $\mu_2$, and $\sigma_2^2$ are not specified by $H_0$. $\Omega = \{(\mu_1, \sigma_1^2, \mu_2, \sigma_2^2) : \mu_1 \in \Re,\ \sigma_1^2 > 0,\ \mu_2 \in \Re,\ \sigma_2^2 > 0\}$ which has dimension 4 and $\Omega_0 = \{(\mu_1, \sigma_1^2, \mu_2, \sigma_2^2) : \mu_1 = \mu_2,\ \sigma_1^2 = \sigma_2^2,\ \mu_1 \in \Re,\ \sigma_1^2 > 0,\ \mu_2 \in \Re,\ \sigma_2^2 > 0\}$ which has dimension 2.
To measure the evidence against H0 based on the observed data we use a test statistic or
discrepancy measure.
8.1.3 Definition - Test Statistic or Discrepancy Measure
A test statistic or discrepancy measure $D$ is a function of the data $\mathbf{X}$ that is constructed to measure the degree of "agreement" between the data $\mathbf{X}$ and the null hypothesis $H_0$.
A test statistic is usually chosen so that a small observed value of the test statistic indicates close agreement between the observed data and the null hypothesis $H_0$, while a large observed value of the test statistic indicates poor agreement. The test statistic is chosen before the data are examined and the choice reflects the type of departure from the null hypothesis $H_0$ that we wish to detect, as specified by the alternative hypothesis $H_A$. A general method for constructing test statistics can be based on the likelihood function as we will see in the next two sections.
8.1.4 Example
For Example 8.1.2(a) suggest a test statistic which could be used if the alternative hypothesis is $H_A: \theta \ne \theta_0$. Suggest a test statistic which could be used if the alternative hypothesis is $H_A: \theta > \theta_0$ and if the alternative hypothesis is $H_A: \theta < \theta_0$.
Solution
If $H_0: \theta = \theta_0$ is true then $E(\bar{X}) = \theta_0$. If the alternative hypothesis is $H_A: \theta \ne \theta_0$ then a reasonable test statistic which could be used is $D = |\bar{X} - \theta_0|$.
If the alternative hypothesis is $H_A: \theta > \theta_0$ then a reasonable test statistic which could be used is $D = \bar{X} - \theta_0$.
If the alternative hypothesis is $H_A: \theta < \theta_0$ then a reasonable test statistic which could be used is $D = \theta_0 - \bar{X}$.
After the data have been collected the observed value of the test statistic is calculated. Assuming the null hypothesis $H_0$ is true, we compute the probability of observing a value of the test statistic at least as great as that observed. This probability is called the p-value of the data in relation to the null hypothesis $H_0$.
8.1.5 Definition - p-value
Suppose we use the test statistic $D = D(\mathbf{X})$ to test the null hypothesis $H_0$. Suppose also that $d = D(\mathbf{x})$ is the observed value of $D$. The p-value or observed significance level of the test of hypothesis $H_0$ using test statistic $D$ is
$$\text{p-value} = P(D \ge d; H_0)$$
The p-value is the probability of observing such poor agreement using test statistic $D$ between the null hypothesis $H_0$ and the data if the null hypothesis $H_0$ is true. If the p-value
is very small, then such poor agreement would occur very rarely if the null hypothesis H0 is
true, and we interpret this to mean that the observed data are providing evidence against
the null hypothesis H0. The smaller the p-value the stronger the evidence against the null
hypothesis H0 based on the observed data. A large p-value does not mean that the null
hypothesis H0 is true but only indicates a lack of evidence against the null hypothesis H0
based on the observed data and using the test statistic D.
The following table gives a rough guideline for interpreting p-values. These are only
guidelines. The interpretation of p-values must always be made in the context of a given
study.
Table 10.1: Guidelines for interpreting p-values
p-value $> 0.10$: No evidence against $H_0$ based on the observed data.
$0.05 <$ p-value $\le 0.10$: Weak evidence against $H_0$ based on the observed data.
$0.01 <$ p-value $\le 0.05$: Evidence against $H_0$ based on the observed data.
$0.001 <$ p-value $\le 0.01$: Strong evidence against $H_0$ based on the observed data.
p-value $\le 0.001$: Very strong evidence against $H_0$ based on the observed data.
8.1.6 Example
For Example 8.1.4 suppose $\bar{x} = 5.7$, $n = 25$ and $\theta_0 = 5$. Determine the p-value for both $H_A: \theta \ne \theta_0$ and $H_A: \theta > \theta_0$. Give a conclusion in each case.
Solution
For $\bar{x} = 5.7$, $n = 25$, $\theta_0 = 5$, and $H_A: \theta \ne 5$ the observed value of the test statistic is $d = |5.7 - 5| = 0.7$, and
$$\text{p-value} = P(|\bar{X} - 5| \ge 0.7; H_0: \theta = 5) = P(|T - 125| \ge 17.5) \quad \text{where } T = \sum_{i=1}^{25} X_i \sim \text{Poisson}(125)$$
$$= P(T \le 107.5) + P(T \ge 142.5) = P(T \le 107) + P(T \ge 143) = 0.05605429 + 0.06113746 = 0.1171917$$
calculated using R. Since p-value $> 0.1$ there is no evidence against $H_0: \theta = 5$ based on the data.
For $\bar{x} = 5.7$, $n = 25$, $\theta_0 = 5$, and $H_A: \theta > 5$ the observed value of the test statistic is
$d = 5.7 - 5 = 0.7$, and
$$\text{p-value} = P(\bar{X} - 5 \ge 0.7; H_0: \theta = 5) = P(T - 125 \ge 17.5) \quad \text{where } T = \sum_{i=1}^{25} X_i \sim \text{Poisson}(125)$$
$$= P(T \ge 143) = 0.06113746$$
Since $0.05 <$ p-value $\le 0.1$ there is weak evidence against $H_0: \theta = 5$ based on the data.
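The Poisson tail probabilities above can be computed in R with ppois; a minimal sketch of both calculations:
# two-sided p-value: P(T <= 107) + P(T >= 143) with T ~ Poisson(125)
ppois(107,125)+(1-ppois(142,125))
# one-sided p-value: P(T >= 143)
1-ppois(142,125)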
8.1.7 Exercise
Suppose in a Binomial experiment 42 successes have been observed in 100 trials and the hypothesis of interest is $H_0: \theta = 0.5$.
(a) If the alternative hypothesis is $H_A: \theta \ne 0.5$, suggest a suitable test statistic, calculate the p-value and give a conclusion.
(b) If the alternative hypothesis is $H_A: \theta < 0.5$, suggest a suitable test statistic, calculate the p-value and give a conclusion.
8.2 Likelihood Ratio Tests for Simple Hypotheses
In Examples 8.1.4 and 8.1.7 it was reasonably straightforward to suggest a test statistic which made sense. In this section we consider a general method for constructing a test statistic which has good properties in the case of a simple hypothesis. The test statistic we use is the likelihood ratio test statistic which was introduced in your previous statistics course.
Suppose $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ is a random sample from $f(x; \theta)$ where $\theta \in \Omega$ and the dimension of $\Omega$ is $k$. Suppose also that the hypothesis of interest is $H_0: \theta = \theta_0$ where the elements of $\theta_0$ are completely specified. $H_0$ can also be written as $H_0: \theta \in \Omega_0$ where $\Omega_0$ consists of the single point $\theta_0$. The dimension of $\Omega_0$ is zero. $H_0$ is a simple hypothesis since the model and all the parameters are completely specified. The likelihood ratio test statistic for this simple hypothesis is
$$\Lambda(\mathbf{X}; \theta_0) = -2\log R(\theta_0; \mathbf{X}) = -2\log\left[\frac{L(\theta_0; \mathbf{X})}{L(\tilde{\theta}; \mathbf{X})}\right] = 2\left[l(\tilde{\theta}; \mathbf{X}) - l(\theta_0; \mathbf{X})\right]$$
where $\tilde{\theta} = \tilde{\theta}(\mathbf{X})$ is the maximum likelihood estimator of $\theta$. Note that this test statistic implicitly assumes that the alternative hypothesis is $H_A: \theta \ne \theta_0$ or $H_A: \theta \notin \Omega_0$.
Let the observed value of the likelihood ratio test statistic be
$$\Lambda(\mathbf{x}; \theta_0) = 2\left[l(\hat{\theta}; \mathbf{x}) - l(\theta_0; \mathbf{x})\right]$$
where $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ are the observed data. The p-value is
$$\text{p-value} = P[\Lambda(\mathbf{X}; \theta_0) \ge \Lambda(\mathbf{x}; \theta_0); H_0]$$
Note that the p-value is calculated assuming $H_0: \theta = \theta_0$ is true. In general this p-value is difficult to determine exactly since the distribution of the random variable $\Lambda(\mathbf{X}; \theta_0)$ is usually intractable. We use the result from Theorem 7.3.4 which says that under certain (regularity) conditions
$$-2\log R(\theta; \mathbf{X}_n) = 2[l(\tilde{\theta}_n; \mathbf{X}_n) - l(\theta; \mathbf{X}_n)] \rightarrow_D W \sim \chi^2(k) \qquad (8.1)$$
for each $\theta \in \Omega$, where $\mathbf{X}_n = (X_1, X_2, \ldots, X_n)$ and $\tilde{\theta}_n = \tilde{\theta}_n(X_1, X_2, \ldots, X_n)$.
Therefore based on the asymptotic result (8.1) and assuming $H_0: \theta = \theta_0$ is true, the p-value for testing $H_0: \theta = \theta_0$ using the likelihood ratio test statistic can be approximated using
$$\text{p-value} \approx P[W \ge \Lambda(\mathbf{x}; \theta_0)] \quad \text{where } W \sim \chi^2(k)$$
8.2.1 Example
Suppose $X_1, X_2, \ldots, X_n$ is a random sample from the $N(\mu, \sigma^2)$ distribution where $\sigma^2$ is known. Show that, in this special case, the likelihood ratio test statistic for testing $H_0: \mu = \mu_0$ has exactly a $\chi^2(1)$ distribution.
Solution
From Example 7.1.8 we have that the likelihood function of $\mu$ is
$$L(\mu) = \sigma^{-n}\exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \bar{x})^2\right]\exp\left[-\frac{n(\bar{x}-\mu)^2}{2\sigma^2}\right] \quad \text{for } \mu \in \Re$$
or more simply
$$L(\mu) = \exp\left[-\frac{n(\bar{x}-\mu)^2}{2\sigma^2}\right] \quad \text{for } \mu \in \Re$$
The corresponding log likelihood function is
$$l(\mu) = -\frac{n(\bar{x}-\mu)^2}{2\sigma^2} \quad \text{for } \mu \in \Re$$
Solving
$$\frac{dl}{d\mu} = \frac{n(\bar{x}-\mu)}{\sigma^2} = 0$$
gives $\mu = \bar{x}$. Since $l(\mu)$ is a quadratic function which is concave down we know that $\hat{\mu} = \bar{x}$ is the maximum likelihood estimate. The corresponding maximum likelihood estimator of $\mu$ is
$$\tilde{\mu} = \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
Since $L(\hat{\mu}) = 1$, the relative likelihood function is
$$R(\mu) = \frac{L(\mu)}{L(\hat{\mu})} = \exp\left[-\frac{n(\bar{x}-\mu)^2}{2\sigma^2}\right] \quad \text{for } \mu \in \Re$$
The likelihood ratio test statistic for testing the hypothesis $H_0: \mu = \mu_0$ is
$$\Lambda(\mu_0; \mathbf{X}) = -2\log\left[\frac{L(\mu_0; \mathbf{X})}{L(\tilde{\mu}; \mathbf{X})}\right] = -2\log\left\{\exp\left[-\frac{n(\bar{X}-\mu_0)^2}{2\sigma^2}\right]\right\} \quad \text{since } \tilde{\mu} = \bar{X}$$
$$= \frac{n(\bar{X}-\mu_0)^2}{\sigma^2} = \left(\frac{\bar{X}-\mu_0}{\sigma/\sqrt{n}}\right)^2$$
If $H_0: \mu = \mu_0$ is true then $\bar{X} \sim N\left(\mu_0, \frac{\sigma^2}{n}\right)$ and
$$\left(\frac{\bar{X}-\mu_0}{\sigma/\sqrt{n}}\right)^2 \sim \chi^2(1)$$
as required.
8.2.2 Example
Suppose $X_1, X_2, \ldots, X_n$ is a random sample from the Poisson$(\theta)$ distribution.
(a) Find the likelihood ratio test statistic for testing $H_0: \theta = \theta_0$. Verify that the likelihood ratio statistic takes on large values if $\tilde{\theta} > \theta_0$ or $\tilde{\theta} < \theta_0$.
(b) Suppose $\bar{x} = 6$ and $n = 25$. Use the likelihood ratio test statistic to test $H_0: \theta = 5$. Compare this with the test in Example 8.1.6.
Solution
(a) From Example 6.2.5 we have the likelihood function
$$L(\theta) = \theta^{n\bar{x}}e^{-n\theta} \quad \text{for } \theta \ge 0$$
and maximum likelihood estimate $\hat{\theta} = \bar{x}$. The relative likelihood function can be written as
$$R(\theta) = \frac{L(\theta)}{L(\hat{\theta})} = \left(\frac{\theta}{\hat{\theta}}\right)^{n\hat{\theta}}e^{n(\hat{\theta}-\theta)} \quad \text{for } \theta \ge 0$$
The likelihood ratio test statistic for $H_0: \theta = \theta_0$ is
$$\Lambda(\theta_0; \mathbf{X}) = -2\log R(\theta_0; \mathbf{X}) = -2\log\left[\left(\frac{\theta_0}{\tilde{\theta}}\right)^{n\tilde{\theta}}e^{n(\tilde{\theta}-\theta_0)}\right]$$
$$= 2n\left[\tilde{\theta}\log\left(\frac{\tilde{\theta}}{\theta_0}\right) + (\theta_0 - \tilde{\theta})\right] = 2n\tilde{\theta}\left[\left(\frac{\theta_0}{\tilde{\theta}} - 1\right) - \log\left(\frac{\theta_0}{\tilde{\theta}}\right)\right] \qquad (8.2)$$
To verify that the likelihood ratio statistic takes on large values if $\tilde{\theta} > \theta_0$ or $\tilde{\theta} < \theta_0$, or equivalently if $\frac{\theta_0}{\tilde{\theta}} < 1$ or $\frac{\theta_0}{\tilde{\theta}} > 1$, consider the function
$$g(t) = a\left[(t-1) - \log(t)\right] \quad \text{for } t > 0 \text{ and } a > 0 \qquad (8.3)$$
We note that $g(t) \rightarrow \infty$ as $t \rightarrow 0^{+}$ and as $t \rightarrow \infty$. Now
$$g'(t) = a\left(1 - \frac{1}{t}\right) = a\left(\frac{t-1}{t}\right) \quad \text{for } t > 0 \text{ and } a > 0$$
Since $g'(t) < 0$ for $0 < t < 1$, and $g'(t) > 0$ for $t > 1$, we can conclude that the function $g(t)$ is a decreasing function for $0 < t < 1$ and an increasing function for $t > 1$ with an absolute minimum at $t = 1$. Since $g(1) = 0$, $g(t)$ is positive for all $t > 0$ with $t \ne 1$.
Therefore if we let $t = \frac{\theta_0}{\tilde{\theta}}$ in (8.2) then we see that $\Lambda(\theta_0; \mathbf{X})$ will be large for small values of $t = \frac{\theta_0}{\tilde{\theta}} < 1$ or large values of $t = \frac{\theta_0}{\tilde{\theta}} > 1$.
(b) If $\bar{x} = 6$, $n = 25$, and $H_0: \theta = 5$ then the observed value of the likelihood ratio test statistic is
$$\Lambda(5; \mathbf{x}) = -2\log R(5; \mathbf{x}) = -2\log\left[\left(\frac{5}{6}\right)^{25(6)}e^{25(6-5)}\right] = 4.6965$$
The parameter space is $\Omega = \{\theta : \theta > 0\}$ which has dimension 1 and thus $k = 1$. The approximate p-value is
$$\text{p-value} \approx P(W \ge 4.6965) \quad \text{where } W \sim \chi^2(1)$$
$$= 2\left[1 - P\left(Z \le \sqrt{4.6965}\right)\right] \quad \text{where } Z \sim N(0,1)$$
$$= 0.0302$$
calculated using R. Since $0.01 <$ p-value $\le 0.05$ there is evidence against $H_0: \theta = 5$ based on the data. Compared with the answer in Example 8.1.6 for $H_A: \theta \ne 5$ we note that the p-values are slightly different but the conclusion is the same.
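The observed value of the likelihood ratio statistic and the approximate p-value in (b) can be obtained in R as follows (a sketch using the closed-form expression (8.2)):
# Poisson likelihood ratio test of H0: theta = 5 with thetahat = 6, n = 25
n<-25; thetahat<-6; theta0<-5
lambda<-2*n*(thetahat*log(thetahat/theta0)+(theta0-thetahat))
lambda                    # 4.6965
1-pchisq(lambda,df=1)     # approximate p-value, 0.0302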
8.2.3 Example
Suppose $X_1, X_2, \ldots, X_n$ is a random sample from the Exponential$(\theta)$ distribution.
(a) Find the likelihood ratio test statistic for testing $H_0: \theta = \theta_0$. Verify that the likelihood ratio statistic takes on large values if $\tilde{\theta} > \theta_0$ or $\tilde{\theta} < \theta_0$.
(b) Suppose $\bar{x} = 6$ and $n = 25$. Use the likelihood ratio test statistic to test $H_0: \theta = 5$.
(c) From Example 6.6.3 we have that
$$Q(\mathbf{X}; \theta) = \frac{2\sum_{i=1}^{n} X_i}{\theta} = \frac{2n\tilde{\theta}}{\theta} \sim \chi^2(2n)$$
is a pivotal quantity. Explain how this pivotal quantity could be used to test $H_0: \theta = \theta_0$ if (i) $H_A: \theta < \theta_0$, (ii) $H_A: \theta > \theta_0$, and (iii) $H_A: \theta \ne \theta_0$.
(d) Suppose $\bar{x} = 6$ and $n = 25$. Use the test statistic from (c) for $H_A: \theta \ne \theta_0$ to test $H_0: \theta = 5$. Compare the answer with the answer in (b).
Solution
(a) From Example 6.2.8 we have the likelihood function
$$L(\theta) = \theta^{-n}e^{-n\bar{x}/\theta} \quad \text{for } \theta > 0$$
and maximum likelihood estimate $\hat{\theta} = \bar{x}$. The relative likelihood function can be written as
$$R(\theta) = \frac{L(\theta)}{L(\hat{\theta})} = \left(\frac{\hat{\theta}}{\theta}\right)^{n}e^{n(1-\hat{\theta}/\theta)} \quad \text{for } \theta \ge 0$$
The likelihood ratio test statistic for $H_0: \theta = \theta_0$ is
$$\Lambda(\theta_0; \mathbf{X}) = -2\log R(\theta_0; \mathbf{X}) = -2\log\left[\left(\frac{\tilde{\theta}}{\theta_0}\right)^{n}e^{n(1-\tilde{\theta}/\theta_0)}\right] = 2n\left[\left(\frac{\tilde{\theta}}{\theta_0} - 1\right) - \log\left(\frac{\tilde{\theta}}{\theta_0}\right)\right]$$
To verify that the likelihood ratio statistic takes on large values if $\tilde{\theta} > \theta_0$ or $\tilde{\theta} < \theta_0$, or equivalently if $\frac{\tilde{\theta}}{\theta_0} < 1$ or $\frac{\tilde{\theta}}{\theta_0} > 1$, we note that $\Lambda(\theta_0; \mathbf{X})$ is of the form (8.3), so an argument similar to Example 8.2.2(a) can be used with $t = \frac{\tilde{\theta}}{\theta_0}$.
(b) If $\bar{x} = 6$, $n = 25$, and $H_0: \theta = 5$ then the observed value of the likelihood ratio test statistic is
$$\Lambda(5; \mathbf{x}) = -2\log R(5; \mathbf{x}) = -2\log\left[\left(\frac{6}{5}\right)^{25}e^{25(1-6/5)}\right] = 0.8839222$$
The parameter space is $\Omega = \{\theta : \theta > 0\}$ which has dimension 1 and thus $k = 1$. The approximate p-value is
$$\text{p-value} \approx P(W \ge 0.8839222) \quad \text{where } W \sim \chi^2(1)$$
$$= 2\left[1 - P\left(Z \le \sqrt{0.8839222}\right)\right] \quad \text{where } Z \sim N(0,1)$$
$$= 0.3471$$
calculated using R. Since p-value $> 0.1$ there is no evidence against $H_0: \theta = 5$ based on the data.
(c) (i) If $H_A: \theta > \theta_0$ we could let $D = \frac{\tilde{\theta}}{\theta_0}$. If $H_0: \theta = \theta_0$ is true then since $E(\tilde{\theta}) = \theta_0$ we would expect observed values of $D = \frac{\tilde{\theta}}{\theta_0}$ to be close to 1. However if $H_A: \theta > \theta_0$ is true then $E(\tilde{\theta}) = \theta > \theta_0$ and we would expect observed values of $D = \frac{\tilde{\theta}}{\theta_0}$ to be larger than 1, and therefore large values of $D$ provide evidence against $H_0: \theta = \theta_0$. The corresponding p-value would be
$$\text{p-value} = P\left(\frac{\tilde{\theta}}{\theta_0} \ge \frac{\hat{\theta}}{\theta_0}; H_0\right) = P\left(W \ge \frac{2n\hat{\theta}}{\theta_0}\right) \quad \text{where } W \sim \chi^2(2n)$$
(ii) If $H_A: \theta < \theta_0$ we could still let $D = \frac{\tilde{\theta}}{\theta_0}$. If $H_0: \theta = \theta_0$ is true then since $E(\tilde{\theta}) = \theta_0$ we would expect observed values of $D = \frac{\tilde{\theta}}{\theta_0}$ to be close to 1. However if $H_A: \theta < \theta_0$ is true then $E(\tilde{\theta}) = \theta < \theta_0$ and we would expect observed values of $D = \frac{\tilde{\theta}}{\theta_0}$ to be smaller than 1, and therefore small values of $D$ provide evidence against $H_0: \theta = \theta_0$. The corresponding p-value would be
$$\text{p-value} = P\left(\frac{\tilde{\theta}}{\theta_0} \le \frac{\hat{\theta}}{\theta_0}; H_0\right) = P\left(W \le \frac{2n\hat{\theta}}{\theta_0}\right) \quad \text{where } W \sim \chi^2(2n)$$
(iii) If $H_A: \theta \ne \theta_0$ we could still let $D = \frac{\tilde{\theta}}{\theta_0}$. If $H_0: \theta = \theta_0$ is true then since $E(\tilde{\theta}) = \theta_0$ we would expect observed values of $D = \frac{\tilde{\theta}}{\theta_0}$ to be close to 1. However if $H_A: \theta \ne \theta_0$ is true then $E(\tilde{\theta}) = \theta \ne \theta_0$ and we would expect observed values of $D = \frac{\tilde{\theta}}{\theta_0}$ to be either larger or smaller than 1, and therefore both large and small values of $D$ provide evidence against $H_0: \theta = \theta_0$. If a large (small) value of $D$ is observed it is not simple to determine exactly which small (large) values should also be considered. Since we are not that concerned about the exact p-value, the p-value is usually calculated more simply as
$$\text{p-value} = \min\left[2P\left(W \le \frac{2n\hat{\theta}}{\theta_0}\right),\ 2P\left(W \ge \frac{2n\hat{\theta}}{\theta_0}\right)\right] \quad \text{where } W \sim \chi^2(2n)$$
(d) If $\bar{x} = 6$, $n = 25$, and $H_0: \theta = 5$ then the observed value of $D = \frac{\tilde{\theta}}{\theta_0}$ is $d = \frac{6}{5}$ with
$$\text{p-value} = \min\left[2P\left(W \le 50\left(\tfrac{6}{5}\right)\right),\ 2P\left(W \ge 50\left(\tfrac{6}{5}\right)\right)\right] \quad \text{where } W \sim \chi^2(50)$$
$$= \min(1.6855,\ 0.3145) = 0.3145$$
calculated using R. Since p-value $> 0.1$ there is no evidence against $H_0: \theta = 5$ based on the data.
We notice that the test statistic $D = \frac{\tilde{\theta}}{\theta_0}$ and $\Lambda(\theta_0; \mathbf{X}) = 2n\left[\left(\frac{\tilde{\theta}}{\theta_0} - 1\right) - \log\left(\frac{\tilde{\theta}}{\theta_0}\right)\right]$ are both functions of $\frac{\tilde{\theta}}{\theta_0}$. For this example the p-values are similar and the conclusions are the same.
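Both p-value calculations in this example can be reproduced in R (a sketch):
# (b) likelihood ratio test of H0: theta = 5 with thetahat = 6, n = 25
n<-25; thetahat<-6; theta0<-5
lambda<-2*n*((thetahat/theta0-1)-log(thetahat/theta0))
1-pchisq(lambda,df=1)                            # 0.3471
# (d) two-sided p-value based on the chi-square pivotal quantity
d<-2*n*thetahat/theta0                           # observed value of 2n*thetahat/theta0
min(2*pchisq(d,df=2*n),2*(1-pchisq(d,df=2*n)))   # 0.3145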
8.2.4 Example
The following table gives the observed frequencies of the six faces in 100 rolls of a die:
Face: j 1 2 3 4 5 6 Total
Observed Frequency: xj 16 15 14 20 22 13 100
Are these observations consistent with the hypothesis that the die is fair?
Solution
The model for these data is $(X_1, X_2, \ldots, X_6) \sim \text{Multinomial}(100; \theta_1, \theta_2, \ldots, \theta_6)$ and the hypothesis of interest is $H_0: \theta_1 = \theta_2 = \cdots = \theta_6 = \frac{1}{6}$. Since the model and parameters are completely specified this is a simple hypothesis. Since $\sum_{j=1}^{6}\theta_j = 1$ there are really only $k = 5$ parameters. The likelihood function for $(\theta_1, \theta_2, \ldots, \theta_5)$ is
$$L(\theta_1, \theta_2, \ldots, \theta_5) = \frac{n!}{x_1!x_2!\cdots x_5!x_6!}\theta_1^{x_1}\theta_2^{x_2}\cdots\theta_5^{x_5}(1-\theta_1-\theta_2-\cdots-\theta_5)^{x_6}$$
or more simply
$$L(\theta_1, \theta_2, \ldots, \theta_5) = \theta_1^{x_1}\theta_2^{x_2}\cdots\theta_5^{x_5}(1-\theta_1-\theta_2-\cdots-\theta_5)^{x_6}$$
for $0 \le \theta_j \le 1$, $j = 1, 2, \ldots, 5$, and $\sum_{j=1}^{5}\theta_j \le 1$. The log likelihood function is
$$l(\theta_1, \theta_2, \ldots, \theta_5) = \sum_{j=1}^{5} x_j\log\theta_j + x_6\log(1-\theta_1-\theta_2-\cdots-\theta_5)$$
Now
$$\frac{\partial l}{\partial\theta_j} = \frac{x_j}{\theta_j} - \frac{n - x_1 - x_2 - \cdots - x_5}{1-\theta_1-\theta_2-\cdots-\theta_5} = \frac{x_j(1-\theta_1-\theta_2-\cdots-\theta_5) - \theta_j(n - x_1 - x_2 - \cdots - x_5)}{\theta_j(1-\theta_1-\theta_2-\cdots-\theta_5)} \quad \text{for } j = 1, 2, \ldots, 5$$
since $x_6 = n - x_1 - x_2 - \cdots - x_5$. We could solve $\frac{\partial l}{\partial\theta_j} = 0$ for $j = 1, 2, \ldots, 5$ simultaneously. In the Binomial case we know $\hat{\theta} = \frac{x}{n}$. It seems reasonable that the maximum likelihood estimate of $\theta_j$ is $\hat{\theta}_j = \frac{x_j}{n}$ for $j = 1, 2, \ldots, 5$. To verify this is true we substitute $\theta_j = \frac{x_j}{n}$ for $j = 1, 2, \ldots, 5$ into the numerator of $\frac{\partial l}{\partial\theta_j}$ to obtain
$$x_j\left(1 - \frac{1}{n}\sum_{i=1}^{5}x_i\right) - \frac{x_j}{n}\left(n - \sum_{i=1}^{5}x_i\right) = x_j - \frac{x_j}{n}\sum_{i=1}^{5}x_i - x_j + \frac{x_j}{n}\sum_{i=1}^{5}x_i = 0$$
Therefore the maximum likelihood estimator of $\theta_j$ is $\tilde{\theta}_j = \frac{X_j}{n}$ for $j = 1, 2, \ldots, 5$. Note also that by the invariance property of maximum likelihood estimators $\tilde{\theta}_6 = 1 - \sum_{j=1}^{5}\tilde{\theta}_j = \frac{X_6}{n}$. Therefore we can write
$$l(\tilde{\theta}_1, \tilde{\theta}_2, \tilde{\theta}_3, \tilde{\theta}_4, \tilde{\theta}_5; \mathbf{X}) = \sum_{j=1}^{6} X_j\log\left(\frac{X_j}{n}\right)$$
Since the null hypothesis is $H_0: \theta_1 = \theta_2 = \cdots = \theta_6 = \frac{1}{6}$
$$l(\theta_0; \mathbf{X}) = \sum_{j=1}^{6} X_j\log\left(\frac{1}{6}\right)$$
so the likelihood ratio test statistic is
$$\Lambda(\mathbf{X}; \theta_0) = 2\left[l(\tilde{\theta}; \mathbf{X}) - l(\theta_0; \mathbf{X})\right] = 2\left[\sum_{j=1}^{6} X_j\log\left(\frac{X_j}{n}\right) - \sum_{j=1}^{6} X_j\log\left(\frac{1}{6}\right)\right] = 2\sum_{j=1}^{6} X_j\log\left(\frac{X_j}{E_j}\right)$$
where $E_j = n/6$ is the expected frequency for outcome $j$. This test statistic is the likelihood ratio Goodness of Fit test statistic introduced in your previous statistics course.
For these data the observed value of the likelihood ratio test statistic is
$$\Lambda(\mathbf{x}; \theta_0) = 2\sum_{j=1}^{6} x_j\log\left(\frac{x_j}{100/6}\right) = 2\left[16\log\left(\frac{16}{100/6}\right) + 15\log\left(\frac{15}{100/6}\right) + \cdots + 13\log\left(\frac{13}{100/6}\right)\right] = 3.699649$$
The approximate p-value is
$$\text{p-value} \approx P(W \ge 3.699649) \quad \text{where } W \sim \chi^2(5)$$
$$= 0.5934162$$
calculated using R. Since p-value $> 0.1$ there is no evidence based on the data against the hypothesis of a fair die.
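The calculation can be carried out in R as follows (a sketch):
# likelihood ratio goodness of fit test for the die data
x<-c(16,15,14,20,22,13)
E<-100/6                   # expected frequency under H0
lambda<-2*sum(x*log(x/E))
lambda                     # 3.699649
1-pchisq(lambda,df=5)      # approximate p-value, 0.5934162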
Note:
(1) In this example the data $(X_1, X_2, \ldots, X_6)$ are not a random sample. The conditions for (8.1) hold by thinking of the experiment as a sequence of $n$ independent trials with 6 possible outcomes on each trial.
(2) You may recall from your previous statistics course that the $\chi^2$ approximation is reasonable if the expected frequency $E_j$ in each category is at least 5.
8.2.5 Exercise
In a long-term study of heart disease in a large group of men, it was noted that 63 men who
had no previous record of heart problems died suddenly of heart attacks. The following
table gives the number of such deaths recorded on each day of the week:
Day of Week Mon. Tues. Wed. Thurs. Fri. Sat. Sun.
No. of Deaths 22 7 6 13 5 4 6
Test the hypothesis of interest that the deaths are equally likely to occur on any day of the
week.
8.3 Likelihood Ratio Tests for Composite Hypotheses
Suppose $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ is a random sample from $f(x; \theta)$ where $\theta \in \Omega$ and $\Omega$ has dimension $k$. Suppose we wish to test $H_0: \theta \in \Omega_0$ where $\Omega_0$ is a subset of $\Omega$ of dimension $q$ where $0 < q < k$. The hypothesis $H_0$ is a composite hypothesis since all the values of the unknown parameters are not specified. For testing composite hypotheses we use the likelihood ratio test statistic
$$\Lambda(\mathbf{X}; \Omega_0) = -2\log\left[\frac{\max_{\theta\in\Omega_0} L(\theta; \mathbf{X})}{\max_{\theta\in\Omega} L(\theta; \mathbf{X})}\right] = 2\left[l(\tilde{\theta}; \mathbf{X}) - \max_{\theta\in\Omega_0} l(\theta; \mathbf{X})\right]$$
where $\tilde{\theta} = \tilde{\theta}(X_1, X_2, \ldots, X_n)$ is the maximum likelihood estimator of $\theta$. Note that this test statistic implicitly assumes that the alternative hypothesis is $H_A: \theta \notin \Omega_0$.
Let the observed value of the likelihood ratio test statistic be
$$\Lambda(\mathbf{x}; \Omega_0) = 2\left[l(\hat{\theta}; \mathbf{x}) - \max_{\theta\in\Omega_0} l(\theta; \mathbf{x})\right]$$
where $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ are the observed data. The p-value is
$$\text{p-value} = P[\Lambda(\mathbf{X}; \Omega_0) \ge \Lambda(\mathbf{x}; \Omega_0); H_0]$$
Note that the p-value is calculated assuming $H_0: \theta \in \Omega_0$ is true. In general this p-value is difficult to determine exactly since the distribution of the random variable $\Lambda(\mathbf{X}; \Omega_0)$ is
usually intractable. Under certain (regularity) conditions it can be shown that, assuming the hypothesis $H_0: \theta \in \Omega_0$ is true,
$$\Lambda(\mathbf{X}; \Omega_0) = 2\left[l(\tilde{\theta}_n; \mathbf{X}_n) - \max_{\theta\in\Omega_0} l(\theta; \mathbf{X}_n)\right] \rightarrow_D W \sim \chi^2(k-q)$$
where $\mathbf{X}_n = (X_1, X_2, \ldots, X_n)$ and $\tilde{\theta}_n = \tilde{\theta}_n(X_1, X_2, \ldots, X_n)$. The approximate p-value is given by
$$\text{p-value} \approx P[W \ge \Lambda(\mathbf{x}; \Omega_0)]$$
where $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ are the observed data.
Note:
The number of degrees of freedom is the difference between the dimension of $\Omega$ and the dimension of $\Omega_0$. The degrees of freedom can also be determined as the number of parameters estimated under the model with no restrictions minus the number of parameters estimated under the model with restrictions imposed by the null hypothesis $H_0$.
8.3.1 Example
(a) Suppose $X_1, X_2, \ldots, X_n$ is a random sample from the Gamma$(\alpha, \beta)$ distribution. Find the likelihood ratio test statistic for testing $H_0: \alpha = \alpha_0$ where $\beta$ is unknown. Indicate how to find the approximate p-value.
(b) For the data in Exercise 7.1.14 test the hypothesis $H_0: \alpha = 2$.
Solution
(a) From Example 8.1.2(b) we have $\Omega = \{(\alpha,\beta) : \alpha > 0,\ \beta > 0\}$ which has dimension $k = 2$ and $\Omega_0 = \{(\alpha_0, \beta) : \beta > 0\}$ which has dimension $q = 1$, and the hypothesis is composite.
From Exercise 7.1.11 the likelihood function is
$$L(\alpha, \beta) = \left[\beta^{\alpha}\Gamma(\alpha)\right]^{-n}\left(\prod_{i=1}^{n} x_i\right)^{\alpha}\exp\left(-\frac{1}{\beta}\sum_{i=1}^{n} x_i\right) \quad \text{for } \alpha > 0,\ \beta > 0$$
and the log likelihood function is
$$l(\alpha, \beta) = \log L(\alpha, \beta) = -n\log\Gamma(\alpha) - n\alpha\log\beta + \alpha\sum_{i=1}^{n}\log x_i - \frac{1}{\beta}\sum_{i=1}^{n} x_i \quad \text{for } \alpha > 0,\ \beta > 0$$
The maximum likelihood estimators cannot be found explicitly so we write
$$l(\tilde{\alpha}, \tilde{\beta}; \mathbf{X}) = -n\log\Gamma(\tilde{\alpha}) - n\tilde{\alpha}\log\tilde{\beta} + \tilde{\alpha}\sum_{i=1}^{n}\log X_i - \frac{1}{\tilde{\beta}}\sum_{i=1}^{n} X_i$$
If $\alpha = \alpha_0$ then the log likelihood function is
$$l(\alpha_0, \beta) = -n\log\Gamma(\alpha_0) - n\alpha_0\log\beta + \alpha_0\sum_{i=1}^{n}\log x_i - \frac{1}{\beta}\sum_{i=1}^{n} x_i \quad \text{for } \beta > 0$$
which is only a function of $\beta$. To determine $\max_{(\alpha,\beta)\in\Omega_0} l(\alpha, \beta; \mathbf{X})$ we note that
$$\frac{d}{d\beta}l(\alpha_0, \beta) = -\frac{n\alpha_0}{\beta} + \frac{\sum_{i=1}^{n} x_i}{\beta^2}$$
and $\frac{d}{d\beta}l(\alpha_0, \beta) = 0$ for $\beta = \frac{1}{n\alpha_0}\sum_{i=1}^{n} x_i = \frac{\bar{x}}{\alpha_0}$, and therefore
$$\max_{(\alpha,\beta)\in\Omega_0} l(\alpha, \beta; \mathbf{X}) = -n\log\Gamma(\alpha_0) - n\alpha_0\log\left(\frac{\bar{X}}{\alpha_0}\right) + \alpha_0\sum_{i=1}^{n}\log X_i - n\alpha_0$$
The likelihood ratio test statistic is
$$\Lambda(\mathbf{X}; \Omega_0) = 2\left[l(\tilde{\alpha}, \tilde{\beta}; \mathbf{X}) - \max_{(\alpha,\beta)\in\Omega_0} l(\alpha, \beta; \mathbf{X})\right]$$
$$= 2\left[-n\log\Gamma(\tilde{\alpha}) - n\tilde{\alpha}\log\tilde{\beta} + \tilde{\alpha}\sum_{i=1}^{n}\log X_i - \frac{1}{\tilde{\beta}}\sum_{i=1}^{n} X_i + n\log\Gamma(\alpha_0) + n\alpha_0\log\left(\frac{\bar{X}}{\alpha_0}\right) - \alpha_0\sum_{i=1}^{n}\log X_i + n\alpha_0\right]$$
$$= 2n\left[\log\left(\frac{\Gamma(\alpha_0)}{\Gamma(\tilde{\alpha})}\right) + \frac{\tilde{\alpha}-\alpha_0}{n}\sum_{i=1}^{n}\log X_i + \alpha_0\log\left(\frac{\bar{X}}{\alpha_0}\right) - \tilde{\alpha}\log\tilde{\beta} + \alpha_0 - \frac{\bar{X}}{\tilde{\beta}}\right]$$
with corresponding observed value
$$\Lambda(\mathbf{x}; \Omega_0) = 2n\left[\log\left(\frac{\Gamma(\alpha_0)}{\Gamma(\hat{\alpha})}\right) + \frac{\hat{\alpha}-\alpha_0}{n}\sum_{i=1}^{n}\log x_i + \alpha_0\log\left(\frac{\bar{x}}{\alpha_0}\right) - \hat{\alpha}\log\hat{\beta} + \alpha_0 - \frac{\bar{x}}{\hat{\beta}}\right]$$
Since $k - q = 2 - 1 = 1$
$$\text{p-value} \approx P[W \ge \Lambda(\mathbf{x}; \Omega_0)] \quad \text{where } W \sim \chi^2(1)$$
$$= 2\left[1 - P\left(Z \le \sqrt{\Lambda(\mathbf{x}; \Omega_0)}\right)\right] \quad \text{where } Z \sim N(0,1)$$
The degrees of freedom can also be determined by noticing that, under the full model, two parameters ($\alpha$ and $\beta$) were estimated, and under the null hypothesis $H_0: \alpha = \alpha_0$ only one parameter ($\beta$) was estimated. Therefore $2 - 1 = 1$ is the number of degrees of freedom.
(b) For $H_0: \alpha = 2$ and the data in Exercise 7.1.14 we have $n = 30$, $\bar{x} = 6.824333$, $\frac{1}{n}\sum_{i=1}^{n}\log x_i = 1.794204$, $\hat{\alpha} = 4.118407$, $\hat{\beta} = 1.657032$. The observed value of the likelihood ratio test statistic is $\Lambda(\mathbf{x}; \Omega_0) = 2(3.443073) = 6.886146$ with
$$\text{p-value} \approx 2\left[1 - P\left(Z \le \sqrt{6.886146}\right)\right] \quad \text{where } Z \sim N(0,1)$$
$$= 0.0087$$
calculated using R. Since $0.001 <$ p-value $\le 0.01$ there is strong evidence against $H_0: \alpha = 2$ based on the data.
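A quick way to evaluate $\Lambda(\mathbf{x}; \Omega_0)$ in R is to compute the two maximized log likelihoods directly with dgamma; the additive constants cancel in the difference. This is only a sketch: it assumes the data vector x from Exercise 7.1.14 and the maximum likelihood estimates, here called alphahat and betahat for illustration, are available in the workspace.
# likelihood ratio statistic for H0: alpha = 2 (beta unknown)
alpha0<-2
lfull<-sum(dgamma(x,shape=alphahat,scale=betahat,log=TRUE))
lnull<-sum(dgamma(x,shape=alpha0,scale=mean(x)/alpha0,log=TRUE))
lambda<-2*(lfull-lnull)
lambda
1-pchisq(lambda,df=1)   # approximate p-value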
8.3.2 Example
(a) Suppose $X_1, X_2, \ldots, X_n$ is a random sample from the Exponential$(\theta_1)$ distribution and independently $Y_1, Y_2, \ldots, Y_m$ is a random sample from the Exponential$(\theta_2)$ distribution. Find the likelihood ratio test statistic for testing $H_0: \theta_1 = \theta_2$. Indicate how to find the approximate p-value.
(b) Find the approximate p-value if the observed data are $n = 10$, $\sum_{i=1}^{10} x_i = 22$, $m = 15$, $\sum_{i=1}^{15} y_i = 40$. What would you conclude?
Solution
(a) From Example 8.1.2(c) we have $\Omega = \{(\theta_1, \theta_2) : \theta_1 > 0,\ \theta_2 > 0\}$ which has dimension $k = 2$ and $\Omega_0 = \{(\theta_1, \theta_2) : \theta_1 = \theta_2,\ \theta_1 > 0,\ \theta_2 > 0\}$ which has dimension $q = 1$, and the hypothesis is composite.
From Example 6.2.8 the likelihood function for an observed random sample $x_1, x_2, \ldots, x_n$ from an Exponential$(\theta_1)$ distribution is
$$L_1(\theta_1) = \theta_1^{-n}e^{-n\bar{x}/\theta_1} \quad \text{for } \theta_1 > 0$$
with maximum likelihood estimate $\hat{\theta}_1 = \bar{x}$.
Similarly the likelihood function for an observed random sample $y_1, y_2, \ldots, y_m$ from an Exponential$(\theta_2)$ distribution is
$$L_2(\theta_2) = \theta_2^{-m}e^{-m\bar{y}/\theta_2} \quad \text{for } \theta_2 > 0$$
with maximum likelihood estimate $\hat{\theta}_2 = \bar{y}$. Since the samples are independent the likelihood function for $(\theta_1, \theta_2)$ is
$$L(\theta_1, \theta_2) = L_1(\theta_1)L_2(\theta_2) \quad \text{for } \theta_1 > 0,\ \theta_2 > 0$$
and the log likelihood function is
$$l(\theta_1, \theta_2) = -n\log\theta_1 - \frac{n\bar{x}}{\theta_1} - m\log\theta_2 - \frac{m\bar{y}}{\theta_2} \quad \text{for } \theta_1 > 0,\ \theta_2 > 0$$
The independence of the samples also implies the maximum likelihood estimators are still $\tilde{\theta}_1 = \bar{X}$ and $\tilde{\theta}_2 = \bar{Y}$. Therefore
$$l(\tilde{\theta}_1, \tilde{\theta}_2; \mathbf{X}, \mathbf{Y}) = -n\log\bar{X} - m\log\bar{Y} - (n+m)$$
If $\theta_1 = \theta_2 = \theta$ then the log likelihood function is
$$l(\theta) = -(n+m)\log\theta - \frac{n\bar{x}+m\bar{y}}{\theta} \quad \text{for } \theta > 0$$
which is only a function of $\theta$. To determine $\max_{(\theta_1,\theta_2)\in\Omega_0} l(\theta_1, \theta_2; \mathbf{X}, \mathbf{Y})$ we note that
$$\frac{d}{d\theta}l(\theta) = -\frac{n+m}{\theta} + \frac{n\bar{x}+m\bar{y}}{\theta^2}$$
and $\frac{d}{d\theta}l(\theta) = 0$ for $\theta = \frac{n\bar{x}+m\bar{y}}{n+m}$, and therefore
$$\max_{(\theta_1,\theta_2)\in\Omega_0} l(\theta_1, \theta_2; \mathbf{X}, \mathbf{Y}) = -(n+m)\log\left(\frac{n\bar{X}+m\bar{Y}}{n+m}\right) - (n+m)$$
The likelihood ratio test statistic is
$$\Lambda(\mathbf{X}, \mathbf{Y}; \Omega_0) = 2\left[l(\tilde{\theta}_1, \tilde{\theta}_2; \mathbf{X}, \mathbf{Y}) - \max_{(\theta_1,\theta_2)\in\Omega_0} l(\theta_1, \theta_2; \mathbf{X}, \mathbf{Y})\right]$$
$$= 2\left[-n\log\bar{X} - m\log\bar{Y} - (n+m) + (n+m)\log\left(\frac{n\bar{X}+m\bar{Y}}{n+m}\right) + (n+m)\right]$$
$$= 2\left[(n+m)\log\left(\frac{n\bar{X}+m\bar{Y}}{n+m}\right) - n\log\bar{X} - m\log\bar{Y}\right]$$
with corresponding observed value
$$\Lambda(\mathbf{x}, \mathbf{y}; \Omega_0) = 2\left[(n+m)\log\left(\frac{n\bar{x}+m\bar{y}}{n+m}\right) - n\log\bar{x} - m\log\bar{y}\right]$$
Since $k - q = 2 - 1 = 1$
$$\text{p-value} \approx P[W \ge \Lambda(\mathbf{x}, \mathbf{y}; \Omega_0)] \quad \text{where } W \sim \chi^2(1)$$
$$= 2\left[1 - P\left(Z \le \sqrt{\Lambda(\mathbf{x}, \mathbf{y}; \Omega_0)}\right)\right] \quad \text{where } Z \sim N(0,1)$$
(b) For $n = 10$, $\sum_{i=1}^{10} x_i = 22$, $m = 15$, $\sum_{i=1}^{15} y_i = 40$ the observed value of the likelihood ratio test statistic is $\Lambda(\mathbf{x}, \mathbf{y}; \Omega_0) = 0.2189032$ and
$$\text{p-value} \approx P(W \ge 0.2189032) \quad \text{where } W \sim \chi^2(1)$$
$$= 2\left[1 - P\left(Z \le \sqrt{0.2189032}\right)\right] \quad \text{where } Z \sim N(0,1)$$
$$= 0.6398769$$
calculated using R. Since p-value $> 0.5$ there is no evidence against $H_0: \theta_1 = \theta_2$ based on the observed data.
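A sketch of the R calculation for part (b):
# likelihood ratio test of equal Exponential means
n<-10; sumx<-22; m<-15; sumy<-40
xbar<-sumx/n; ybar<-sumy/m
lambda<-2*((n+m)*log((sumx+sumy)/(n+m))-n*log(xbar)-m*log(ybar))
lambda                   # 0.2189032
1-pchisq(lambda,df=1)    # approximate p-value, 0.6398769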
8.3.3 Exercise
(a) Suppose $X_1, X_2, \ldots, X_n$ is a random sample from the Poisson$(\theta_1)$ distribution and independently $Y_1, Y_2, \ldots, Y_m$ is a random sample from the Poisson$(\theta_2)$ distribution. Find the likelihood ratio test statistic for testing $H_0: \theta_1 = \theta_2$. Indicate how to find the approximate p-value.
(b) Find the approximate p-value if the observed data are $n = 10$, $\sum_{i=1}^{10} x_i = 22$, $m = 15$, $\sum_{i=1}^{15} y_i = 40$. What would you conclude?
8.3.4 Example
In a large population of males ages 40-50, the proportion who are regular smokers is $\alpha$ where $0 \le \alpha \le 1$ and the proportion who have hypertension (high blood pressure) is $\beta$ where $0 \le \beta \le 1$. Suppose that $n$ men are selected at random from this population and the observed data are
Category: $S \cap H$, $S \cap \bar{H}$, $\bar{S} \cap H$, $\bar{S} \cap \bar{H}$
Frequency: $x_{11}$, $x_{12}$, $x_{21}$, $x_{22}$
where $S$ is the event the male is a smoker and $H$ is the event the male has hypertension. Find the likelihood ratio test statistic for testing $H_0$: events $S$ and $H$ are independent. Indicate how to find the approximate p-value.
Solution
The model for these data is $(X_{11}, X_{12}, X_{21}, X_{22}) \sim \text{Multinomial}(n; \theta_{11}, \theta_{12}, \theta_{21}, \theta_{22})$ with parameter space
$$\Omega = \left\{(\theta_{11}, \theta_{12}, \theta_{21}, \theta_{22}) : 0 \le \theta_{ij} \le 1 \text{ for } i,j = 1,2,\ \sum_{j=1}^{2}\sum_{i=1}^{2}\theta_{ij} = 1\right\}$$
which has dimension $k = 3$.
Let $P(S) = \alpha$ and $P(H) = \beta$. Then the hypothesis of interest can be written as $H_0: \theta_{11} = \alpha\beta$, $\theta_{12} = \alpha(1-\beta)$, $\theta_{21} = (1-\alpha)\beta$, $\theta_{22} = (1-\alpha)(1-\beta)$ and
$$\Omega_0 = \{(\theta_{11}, \theta_{12}, \theta_{21}, \theta_{22}) : \theta_{11} = \alpha\beta,\ \theta_{12} = \alpha(1-\beta),\ \theta_{21} = (1-\alpha)\beta,\ \theta_{22} = (1-\alpha)(1-\beta),\ 0 \le \alpha \le 1,\ 0 \le \beta \le 1\}$$
which has dimension $q = 2$, and the hypothesis is composite.
From Example 8.2.4 we can see that the likelihood function for $(\theta_{11}, \theta_{12}, \theta_{21}, \theta_{22})$ can be taken to be
$$L(\theta_{11}, \theta_{12}, \theta_{21}, \theta_{22}) = \theta_{11}^{x_{11}}\theta_{12}^{x_{12}}\theta_{21}^{x_{21}}\theta_{22}^{x_{22}}$$
The log likelihood function is
$$l(\theta_{11}, \theta_{12}, \theta_{21}, \theta_{22}) = \sum_{j=1}^{2}\sum_{i=1}^{2} x_{ij}\log\theta_{ij}$$
and the maximum likelihood estimate of $\theta_{ij}$ is $\hat{\theta}_{ij} = \frac{x_{ij}}{n}$ for $i,j = 1,2$. Therefore
$$l(\tilde{\theta}_{11}, \tilde{\theta}_{12}, \tilde{\theta}_{21}, \tilde{\theta}_{22}; \mathbf{X}) = \sum_{j=1}^{2}\sum_{i=1}^{2} X_{ij}\log\left(\frac{X_{ij}}{n}\right)$$
If the events $S$ and $H$ are independent then from Chapter 7, Problem 4 we have that the likelihood function is
$$L(\alpha, \beta) = (\alpha\beta)^{x_{11}}\left[\alpha(1-\beta)\right]^{x_{12}}\left[(1-\alpha)\beta\right]^{x_{21}}\left[(1-\alpha)(1-\beta)\right]^{x_{22}} = \alpha^{x_{11}+x_{12}}(1-\alpha)^{x_{21}+x_{22}}\beta^{x_{11}+x_{21}}(1-\beta)^{x_{12}+x_{22}}$$
for $0 \le \alpha \le 1$, $0 \le \beta \le 1$. The log likelihood is
$$l(\alpha, \beta) = (x_{11}+x_{12})\log\alpha + (x_{21}+x_{22})\log(1-\alpha) + (x_{11}+x_{21})\log\beta + (x_{12}+x_{22})\log(1-\beta)$$
for $0 < \alpha < 1$, $0 < \beta < 1$, and the maximum likelihood estimates are
$$\hat{\alpha} = \frac{x_{11}+x_{12}}{n} \quad \text{and} \quad \hat{\beta} = \frac{x_{11}+x_{21}}{n}$$
Therefore
$$\max_{(\theta_{11},\theta_{12},\theta_{21},\theta_{22})\in\Omega_0} l(\theta_{11}, \theta_{12}, \theta_{21}, \theta_{22}; \mathbf{X})$$
$$= (X_{11}+X_{12})\log\left(\frac{X_{11}+X_{12}}{n}\right) + (X_{21}+X_{22})\log\left(\frac{X_{21}+X_{22}}{n}\right) + (X_{11}+X_{21})\log\left(\frac{X_{11}+X_{21}}{n}\right) + (X_{12}+X_{22})\log\left(\frac{X_{12}+X_{22}}{n}\right)$$
The likelihood ratio test statistic can be written as
$$\Lambda(\mathbf{X}; \Omega_0) = 2\left[l(\tilde{\theta}_{11}, \tilde{\theta}_{12}, \tilde{\theta}_{21}, \tilde{\theta}_{22}; \mathbf{X}) - \max_{(\theta_{11},\theta_{12},\theta_{21},\theta_{22})\in\Omega_0} l(\theta_{11}, \theta_{12}, \theta_{21}, \theta_{22}; \mathbf{X})\right] = 2\sum_{j=1}^{2}\sum_{i=1}^{2} X_{ij}\log\left(\frac{X_{ij}}{E_{ij}}\right)$$
where $E_{ij} = \frac{R_iC_j}{n}$, $R_i = X_{i1} + X_{i2}$, $C_j = X_{1j} + X_{2j}$ for $i,j = 1,2$. $E_{ij}$ is the expected frequency if the hypothesis of independence is true.
The corresponding observed value is
$$\Lambda(\mathbf{x}; \Omega_0) = 2\sum_{j=1}^{2}\sum_{i=1}^{2} x_{ij}\log\left(\frac{x_{ij}}{e_{ij}}\right)$$
where $e_{ij} = \frac{r_ic_j}{n}$, $r_i = x_{i1} + x_{i2}$, $c_j = x_{1j} + x_{2j}$ for $i,j = 1,2$.
Since $k - q = 3 - 2 = 1$
$$\text{p-value} \approx P[W \ge \Lambda(\mathbf{x}; \Omega_0)] \quad \text{where } W \sim \chi^2(1)$$
$$= 2\left[1 - P\left(Z \le \sqrt{\Lambda(\mathbf{x}; \Omega_0)}\right)\right] \quad \text{where } Z \sim N(0,1)$$
This of course is the usual test of independence in a two-way table which was discussed in your previous statistics course.
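As an illustration, the test statistic and approximate p-value can be computed in R for a two-by-two table of frequencies; the frequencies below are purely hypothetical and are not taken from the notes.
# likelihood ratio test of independence for a hypothetical 2x2 table
obs<-matrix(c(35,25,20,20),nrow=2,byrow=TRUE)    # x11, x12, x21, x22
E<-outer(rowSums(obs),colSums(obs))/sum(obs)     # expected frequencies under H0
lambda<-2*sum(obs*log(obs/E))
lambda
1-pchisq(lambda,df=1)    # approximate p-value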
8.4 Chapter 8 Problems
1. Suppose $X_1, X_2, \ldots, X_n$ is a random sample from the distribution with probability density function
$$f(x; \theta) = \theta x^{\theta-1} \quad \text{for } 0 \le x \le 1,\ \theta > 0$$
(a) Find the likelihood ratio test statistic for testing $H_0: \theta = \theta_0$.
(b) If $n = 20$ and $\sum_{i=1}^{20}\log x_i = -25$, find the approximate p-value for testing $H_0: \theta = 1$ using the asymptotic distribution of the likelihood ratio statistic. What would you conclude?
2. Suppose $X_1, X_2, \ldots, X_n$ is a random sample from the Pareto$(1, \theta)$ distribution.
(a) Find the likelihood ratio test statistic for testing $H_0: \theta = \theta_0$. Indicate how to find the approximate p-value.
(b) For the data $n = 25$ and $\sum_{i=1}^{25}\log x_i = 40$, find the approximate p-value for testing $H_0: \theta = 1$. What would you conclude?
3. Suppose $X_1, X_2, \ldots, X_n$ is a random sample from the distribution with probability density function
$$f(x; \theta) = \frac{x}{\theta^2}e^{-\frac{1}{2}(x/\theta)^2} \quad \text{for } x > 0,\ \theta > 0$$
(a) Find the likelihood ratio test statistic for testing $H_0: \theta = \theta_0$. Indicate how to find the approximate p-value.
(b) If $n = 20$ and $\sum_{i=1}^{20} x_i^2 = 10$, find the approximate p-value for testing $H_0: \theta = 0.1$. What would you conclude?
4. Suppose $X_1, X_2, \ldots, X_n$ is a random sample from the Weibull$(2, \theta_1)$ distribution and independently $Y_1, Y_2, \ldots, Y_m$ is a random sample from the Weibull$(2, \theta_2)$ distribution. Find the likelihood ratio test statistic for testing $H_0: \theta_1 = \theta_2$. Indicate how to find the approximate p-value.
5. Suppose $(X_1, X_2, X_3) \sim \text{Multinomial}(n; \theta_1, \theta_2, \theta_3)$.
(a) Find the likelihood ratio test statistic for testing $H_0: \theta_1 = \theta_2 = \theta_3$. Indicate how to find the approximate p-value.
(b) Find the likelihood ratio test statistic for testing $H_0: \theta_1 = \theta^2,\ \theta_2 = 2\theta(1-\theta),\ \theta_3 = (1-\theta)^2$. Indicate how to find the approximate p-value.
6. Suppose $X_1, X_2, \ldots, X_n$ is a random sample from the $N(\mu_1, \sigma_1^2)$ distribution and independently $Y_1, Y_2, \ldots, Y_m$ is a random sample from the $N(\mu_2, \sigma_2^2)$ distribution. Find the likelihood ratio test statistic for testing $H_0: \mu_1 = \mu_2,\ \sigma_1^2 = \sigma_2^2$. Indicate how to find the approximate p-value.
9. Solutions to Chapter Exercises
9.1 Chapter 2
Exercise 2.1.5
(a) $P(A) \ge 0$ follows from Definition 2.1.3(A1). From Example 2.1.4(c) we have $P(\bar{A}) = 1 - P(A)$. But from Definition 2.1.3(A1) $P(\bar{A}) \ge 0$ and therefore $P(A) \le 1$.
(b) Since $A = (A \cap B) \cup (A \cap \bar{B})$ and $(A \cap B) \cap (A \cap \bar{B}) = \emptyset$, then by Example 2.1.4(b)
$$P(A) = P(A \cap B) + P(A \cap \bar{B})$$
or
$$P(A \cap \bar{B}) = P(A) - P(A \cap B)$$
as required.
(c) Since
$$A \cup B = (A \cap \bar{B}) \cup (A \cap B) \cup (\bar{A} \cap B)$$
is the union of three mutually exclusive events, then by Definition 2.1.3(A3) and Example 2.1.4(a) we have
$$P(A \cup B) = P(A \cap \bar{B}) + P(A \cap B) + P(\bar{A} \cap B) \qquad (9.1)$$
By the result proved in (b)
$$P(A \cap \bar{B}) = P(A) - P(A \cap B) \qquad (9.2)$$
and similarly
$$P(\bar{A} \cap B) = P(B) - P(A \cap B) \qquad (9.3)$$
Substituting (9.2) and (9.3) into (9.1) gives
$$P(A \cup B) = P(A) - P(A \cap B) + P(A \cap B) + P(B) - P(A \cap B) = P(A) + P(B) - P(A \cap B)$$
as required.
Exercise 2.3.7
By 2.11.8
$$\log(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots \quad \text{for } -1 < x \le 1$$
Let $x = p - 1$ to obtain
$$\log p = (p-1) - \frac{(p-1)^2}{2} + \frac{(p-1)^3}{3} - \cdots \qquad (9.4)$$
which holds for $0 < p \le 2$ and therefore also holds for $0 < p < 1$. Now (9.4) can be written as
$$\log p = -\left[(1-p) + \frac{(1-p)^2}{2} + \frac{(1-p)^3}{3} + \cdots\right] = -\sum_{x=1}^{\infty}\frac{(1-p)^x}{x} \quad \text{for } 0 < p < 1$$
Therefore
$$\sum_{x=1}^{\infty} f(x) = \sum_{x=1}^{\infty}\frac{(1-p)^x}{-x\log p} = \frac{-1}{\log p}\sum_{x=1}^{\infty}\frac{(1-p)^x}{x} = \frac{-1}{\log p}\left(-\log p\right) = 1$$
which holds for $0 < p < 1$.
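A quick numerical check in R that this probability function sums to one (a sketch for one arbitrary value of p, truncating the infinite sum):
# logarithmic series distribution: f(x) = -(1-p)^x/(x*log(p)), x = 1, 2, ...
p<-0.3
x<-1:10000
sum(-(1-p)^x/(x*log(p)))   # very close to 1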
Exercise 2.4.11
(a)
$$\int_{-\infty}^{\infty} f(x)\,dx = \int_{0}^{\infty}\frac{\beta x^{\beta-1}}{\theta^{\beta}}e^{-(x/\theta)^{\beta}}\,dx$$
Let $y = (x/\theta)^{\beta}$. Then $dy = \frac{\beta x^{\beta-1}}{\theta^{\beta}}\,dx$. When $x = 0$, $y = 0$ and as $x \rightarrow \infty$, $y \rightarrow \infty$. Therefore
$$\int_{-\infty}^{\infty} f(x)\,dx = \int_{0}^{\infty}\frac{\beta x^{\beta-1}}{\theta^{\beta}}e^{-(x/\theta)^{\beta}}\,dx = \int_{0}^{\infty}e^{-y}\,dy = \Gamma(1) = 0! = 1$$
(b) If $\beta = 1$ then
$$f(x) = \frac{1}{\theta}e^{-x/\theta} \quad \text{for } x > 0$$
which is the probability density function of an Exponential$(\theta)$ random variable.
(c) See Figure 9.1
Figure 9.1: Graphs of Weibull probability density function
Exercise 2.4.12
(a)
$$\int_{-\infty}^{\infty} f(x)\,dx = \int_{\beta}^{\infty}\frac{\alpha\beta^{\alpha}}{x^{\alpha+1}}\,dx = \lim_{b\rightarrow\infty}\int_{\beta}^{b}\alpha\beta^{\alpha}x^{-\alpha-1}\,dx = \lim_{b\rightarrow\infty}\left[-\left(\frac{\beta}{x}\right)^{\alpha}\right]_{\beta}^{b} = 1 - \lim_{b\rightarrow\infty}\left(\frac{\beta}{b}\right)^{\alpha} = 1 \quad \text{since } \alpha > 0$$
(b) See Figure 9.2
[Figure 9.2: Graphs of the Pareto probability density function for (a, b) = (1, 1), (1, 2), (0.5, 1), (0.5, 2)]
Exercise 2.5.4
(a) For X ~ Cauchy(θ, 1) the probability density function is
f(x; θ) = 1 / (π[1 + (x − θ)²])     for x ∈ ℝ, θ ∈ ℝ
and 0 otherwise. See Figure 9.3 for a sketch of the probability density function for
θ = −1, 0, 1.
[Figure 9.3: Cauchy(θ, 1) probability density functions for θ = −1, 0, 1]
Let
f0(x) = f(x; θ = 0) = 1 / (π(1 + x²))     for x ∈ ℝ
and 0 otherwise. Then
f(x; θ) = 1 / (π[1 + (x − θ)²]) = f0(x − θ)     for x ∈ ℝ, θ ∈ ℝ
and therefore θ is a location parameter of this distribution.
(b) For X ~ Cauchy(0, θ) the probability density function is
f(x; θ) = 1 / (πθ[1 + (x/θ)²])     for x ∈ ℝ, θ > 0
and 0 otherwise. See Figure 9.4 for a sketch of the probability density function for
θ = 0.5, 1, 2.
[Figure 9.4: Cauchy(0, θ) probability density functions for θ = 0.5, 1, 2]
Let
f1(x) = f(x; θ = 1) = 1 / (π(1 + x²))     for x ∈ ℝ
and 0 otherwise. Then
f(x; θ) = 1 / (πθ[1 + (x/θ)²]) = (1/θ) f1(x/θ)     for x ∈ ℝ, θ > 0
and therefore θ is a scale parameter of this distribution.
Exercise 2.6.11
If X ~ Exponential(1) then the probability density function of X is
f(x) = e^{−x}     for x ≥ 0
Y = θX^{1/β} = h(X) for θ > 0, β > 0 is an increasing function with inverse function
X = (Y/θ)^β = h⁻¹(Y). Since the support set of X is A = {x : x ≥ 0}, the support set of
Y is B = {y : y ≥ 0}.
Since
(d/dy) h⁻¹(y) = (d/dy) (y/θ)^β = β y^{β−1} / θ^β
then by Theorem 2.6.8 the probability density function of Y is
g(y) = f(h⁻¹(y)) |(d/dy) h⁻¹(y)| = (β y^{β−1} / θ^β) e^{−(y/θ)^β}     for y ≥ 0
which is the probability density function of a Weibull(β, θ) random variable as required.
Exercise 2.6.12
X is a random variable with probability density function
f(x) = θ x^{θ−1}     for 0 < x < 1, θ > 0
and 0 otherwise. Y = −log X is a decreasing function with inverse function X = e^{−Y} =
h⁻¹(Y). Since the support set of X is A = {x : 0 < x < 1}, the support set of Y is
B = {y : y > 0}.
Since
(d/dy) h⁻¹(y) = (d/dy) e^{−y} = −e^{−y}
then by Theorem 2.6.8 the probability density function of Y is
g(y) = f(h⁻¹(y)) |(d/dy) h⁻¹(y)| = θ (e^{−y})^{θ−1} e^{−y} = θ e^{−θy}     for y > 0
which is the probability density function of an Exponential(1/θ) random variable as required.
Exercise 2.7.4
Using integration by parts with u = 1 − F(x) and dv = dx gives du = −f(x) dx, v = x and
∫_0^∞ [1 − F(x)] dx = x[1 − F(x)] |_0^∞ + ∫_0^∞ x f(x) dx = lim_{x→∞} x ∫_x^∞ f(t) dt + E(X)
The desired result holds if
lim_{x→∞} x ∫_x^∞ f(t) dt = 0     (9.5)
Since
0 ≤ x ∫_x^∞ f(t) dt ≤ ∫_x^∞ t f(t) dt
then (9.5) holds by the Squeeze Theorem if
lim_{x→∞} ∫_x^∞ t f(t) dt = 0
Since E(X) = ∫_0^∞ x f(x) dx exists then G(x) = ∫_0^x t f(t) dt exists for all x > 0 and
lim_{x→∞} G(x) = E(X).
By the First Fundamental Theorem of Calculus 2.11.9 and the definition of an improper
integral 2.11.11
∫_x^∞ t f(t) dt = lim_{b→∞} G(b) − G(x) = E(X) − G(x)
Therefore
lim_{x→∞} ∫_x^∞ t f(t) dt = lim_{x→∞} [E(X) − G(x)] = E(X) − lim_{x→∞} G(x)
                          = E(X) − E(X) = 0
and (9.5) holds.
Exercise 2.7.9
(a) If X Poisson() then
E(X(k)) =
1P
x=0
x(k)
xe
x!
= ek
1P
x=k
xk
(x k)! let y = x k
= ek
1P
y=0
y
y!
= eke by 2.11.7
= k for k = 1; 2; : : :
Letting k = 1 and k = 2 we have
E

X(1)

= E (X) =
and
E

X(2)

= E [X (X 1)] = 2
so
V ar (X) = E [X (X 1)] + E (X) [E (X)]2
= 2 + 2 =
(b) If X Negative Binomial(k; p) then
E(X(j)) =
1P
x=0
x(j)
k
x

pk (p 1)x
= pk (p 1)j (k)(j)
1P
x=j
k j
x j

(p 1)xj by 2.11.4(1)
= pk (p 1)j (k)(j)
1P
x=j
k j
x j

(p 1)xj let y = x j
= pk (p 1)j (k)(j)
1P
y=0
k j
y

(p 1)y
= pk (p 1)j (k)(j) (1 + p 1)kj by 2.11.3(1)
= (k)(j)

p 1
p
j
for j = 1; 2; : : :
Letting k = 1 and k = 2 we have
E

X(1)

= E (X) = (k)(1)

p 1
p
1
=
k (1 p)
p
and
E

X(2)

= E [X (X 1)] = (k)(2)

p 1
p
2
= k (k + 1)

1 p
p
2
284 9. SOLUTIONS TO CHAPTER EXERCISES
so
V ar (X) = E [X (X 1)] + E (X) [E (X)]2
= k (k + 1)

1 p
p
2
+
k (1 p)
p


k (1 p)
p
2
=
k (1 p)2
p2
+
k (1 p)
p
=
k (1 p)
p2
(1 p+ p)
=
k (1 p)
p2
(c) If X Gamma(; ) then
E(Xp) =
1Z
0
xp
x1ex=
()
dx =
1Z
0
x+p1ex=
()
dx let y =
x

=
1
()
1Z
0
(y)+p1 eydy
=
+p
()
1Z
0
y+p1eydy which converges for + p > 0
= p
(+ p)
()
for p >
Letting k = 1 and k = 2 we have
E (X) =
(+ 1)
()
=
()
()
=
and
E

X2

= E

X2

= 2
(+ 2)
()
= 2
(+ 1) ()
()
= (+ 1)2
so
V ar (X) = E

X2
[E (X)]2 = (+ 1)2 ()2
= 2
9.1. CHAPTER 2 285
(c) If X Weibull(; ) then
E(Xk) =
1Z
0
xk
x1e(x=)


dx let y =

x


=
1Z
0

y1=
k
eydy = k
1Z
0
yk=eydy
= k

k

+ 1

for k = 1; 2; : : :
Letting k = 1 and k = 2 we have
E (X) =

1

+ 1

and
E

X2

= E

X2

= 2

2

+ 1

so
V ar (X) = E

X2
[E (X)]2 = 2 2

+ 1





1

+ 1
2
= 2
(


2

+ 1





1

+ 1
2)
Exercise 2.8.3
From Markov's inequality we know
P(|Y| ≥ c) ≤ E(|Y|^k) / c^k     for all k, c > 0     (9.6)
Since we are given that X is a random variable with finite mean μ and finite variance σ²
then
σ² = E[(X − μ)²] = E(|X − μ|²)
Substituting Y = X − μ and c = kσ into (9.6), with the exponent taken equal to 2, we obtain
P(|X − μ| ≥ kσ) ≤ E(|X − μ|²) / (kσ)² = σ²/(kσ)² = 1/k²
or
P(|X − μ| ≥ kσ) ≤ 1/k²
for all k > 0 as required.
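As a numerical illustration (not part of the original exercise), here is a short R sketch that estimates P(|X − μ| ≥ kσ) by simulation and compares it with the bound 1/k²; the choice of an Exponential distribution with mean 1 (so μ = σ = 1) and the sample size are arbitrary.
# Monte Carlo illustration of Chebyshev's inequality (sketch; any distribution
# with finite variance could be used -- here X ~ Exponential with mean 1)
set.seed(330)
x <- rexp(100000, rate = 1)            # mu = 1, sigma = 1
for (k in c(1.5, 2, 3)) {
  lhs <- mean(abs(x - 1) >= k * 1)     # estimate of P(|X - mu| >= k sigma)
  cat("k =", k, ": estimate =", round(lhs, 4), " bound 1/k^2 =", round(1/k^2, 4), "\n")
}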
Exercise 2.9.2
For an Exponential(θ) random variable √Var(X) = σ(θ) = θ. For g(X) = log X,
g′(X) = 1/X. Therefore by (2.5), the variance of Y = g(X) = log X is approximately
[g′(θ) σ(θ)]² = [(1/θ)(θ)]² = 1
which is a constant.
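As a check (again only a sketch, not part of the exercise), the R code below simulates Exponential(θ) samples for several values of θ; the variance of log X is the same constant for every θ, which is the point of a variance stabilizing transformation. (The exact constant is π²/6 ≈ 1.64; the linear approximation above reports it as 1.)
# the variance of log X should not depend on theta; the constant is the same for every theta
set.seed(330)
for (theta in c(0.5, 1, 5, 20)) {
  x <- rexp(200000, rate = 1/theta)    # Exponential(theta) has mean theta in these notes
  cat("theta =", theta, "  Var(log X) approx", round(var(log(x)), 3), "\n")
}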
Exercise 2.10.3
(a) If X Binomial(n; p) then
M (t) =
1P
x20
etx

n
x

px (1 p)nx
=
1P
x20

n
x

etp
x
(1 p)nx which converges for t 2 <
= (pet+ 1 p)k by 2.11.3(1)
(b) If X Poisson() then
M (t) =
1P
x=0
etx
xe
x!
= e
1P
x=0

et
x
x!
which converges for t 2 <
= eee
t
by 2.11.7
= e+e
t
for t 2 <
Exercise 2.10.6
If X Negative Binomial(k; p) then
MX (t) =

p
1 qet
k
for t < log q
By Theorem 2.10.4 the moment generating function of Y = X + k is
MY (t) = e
ktMX(t)
=

pet
1 qet
k
for t < log q
9.1. CHAPTER 2 287
Exercise 2.10.14
If X ~ Gamma(α, β) then
M(t) = (1 − βt)^{−α}     for t < 1/β
By Theorem 2.10.4 the moment generating function of Y = 2X/β is
M_Y(t) = M_X(2t/β)     for 2t/β < 1/β
       = (1 − 2t)^{−(2α)/2}     for t < 1/2
By examining the list of moment generating functions in Chapter 11 we see that this is the
moment generating function of a χ²(2α) random variable. Therefore by the Uniqueness
Theorem for Moment Generating Functions, Y has a χ²(2α) distribution.
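Here is a short R sketch (not part of the original solution) that checks the conclusion by simulation; the values of α and β are arbitrary, and rgamma with the argument scale = beta matches the Gamma(α, β) parameterization used in these notes.
# simulate X ~ Gamma(alpha, beta) and compare 2X/beta with chi-squared(2*alpha) quantiles
set.seed(330)
alpha <- 3; beta <- 2
y <- 2 * rgamma(100000, shape = alpha, scale = beta) / beta
probs <- c(0.25, 0.5, 0.75, 0.95)
round(rbind(simulated = quantile(y, probs), chisq = qchisq(probs, df = 2 * alpha)), 3)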
Exercise 2.10.15
(a) By the Exponential series 2.11.7
M (t) = et
2=2 = 1 +

t2=2

1!
+

t2=2
2
2!
+
= 1 +
1
2
t2 +
1
2!22
t4 for t 2 <
Since E(Xk) = k!coe¢ cient of tk in the Maclaurin series for M (t) we have
E (X) = 1! (0) = 0 and E

X2

= 2!

1
2

= 1
and so
V ar (X) = E

X2
[E (X)]2 = 1
(b) By Theorem 2.10.4 the moment generating function of Y = 2X 1 is
MY (t) = e
tMX (2t) for t 2 <
= ete(2t)
2=2
= et+(2t)
2=2 for t 2 <
By examining the list of moment generating functions in Chapter 11 we see that this is the
moment generating function of a N(1; 4) random variable. Therefore by the Uniqueness
Theorem for Moment Generating Functions, Y has a N(1; 4) distribution.
288 9. SOLUTIONS TO CHAPTER EXERCISES
9.2 Chapter 3
Exercise 3.2.5
(a) The joint probability function of X and Y is
f(x; y) = P (X = x; Y = y)
=
n!
x!y! (n x y)!

2
x
[2 (1 )]y
h
(1 )2
inxy
for x = 0; 1; : : : ; n; y = 0; 1; : : : ; n; x+ y n
which is a Multinomial

n; 2; 2 (1 ) ; (1 )2

distribution or the trinomial distribution.
(b) The marginal probability function of X is
f1 (x) = P (X = x) =
nxX
y=0
n!
x!y! (n x y)!

2
x
[2 (1 )]y
h
(1 )2
inxy
=
n!
x! (n x)!

2
x 1X
y=0
(n x)!
y! (n x y)! [2 (1 )]
y
h
(1 )2
inxy
=

n
x

2
x h
2 (1 ) + (1 )2
inx
by the Binomial Series 2.11.3(1)
=

n
x

2
x
1 2nx for x = 0; 1; : : : ; n
and so X Binomialn; 2.
(c) In a similar manner to (b) the marginal probability function of Y can be shown to be
Binomial(n; 2 (1 )) since P (Aa) = 2 (1 ).
(d)
P (X + Y = t) =
P
(x;y):
P
x+y=t
f(x; y) =
tP
x=0
f(x; t x)
=
tX
x=0
n!
x! (t x)! (n t)!

2
x
[2 (1 )]tx
h
(1 )2
int
=

n
t
h
(1 )2
int tX
x=0

t
x

2
x
[2 (1 )]tx
=

n
t
h
(1 )2
int
2 + 2 (1 )t by the Binomial Series 2.11.3(1)
=

n
t

2 + 2 (1 )t h(1 )2int for t = 0; 1; : : : ; n
Thus X + Y Binomialn; 2 + 2 (1 ) which makes sense since X + Y is counting the
number of times an AA or Aa type occurs in n trials and the probability of success is
P (AA) + P (Aa) = 2 + 2 (1 ).
9.2. CHAPTER 3 289
Exercise 3.3.6
(a)
1 =
1Z
1
1Z
1
f(x; y)dxdy = k
1Z
0
1Z
0
1
(1 + x+ y)3
dxdy
=
k
2
1Z
0

lim
a!1
1
(1 + x+ y)2
ja0

dy
=
k
2
1Z
0

lim
a!1
1
(1 + a+ y)2
+
1
(1 + y)2

dy
=
k
2
1Z
0
1
(1 + y)2
dy
=
k
2
lim
a!1
1
(1 + y)
ja0

=
k
2

lim
a!1
1
(1 + a)
+ 1

=
k
2
Therefore k = 2. A graph of the joint probability density function is given in Figure 9.5.
[Figure 9.5: Graph of the joint probability density function for Exercise 3.3.6]
(b) (i)
P (X 1; Y 2) =
2Z
0
1Z
0
2
(1 + x+ y)3
dxdy =
2Z
0
1
(1 + x+ y)2
j10

dy
=
2Z
0
1
(2 + y)2
+
1
(1 + y)2

dy =

1
(2 + y)
1
(1 + y)

j20
=
1
4
1
3
1
2
+ 1 =
1
12
(3 4 6 + 12) = 5
12
(ii) Since f(x; y) and the support set A = f(x; y) : x 0; y 0g are both symmetric in x
and y, P (X Y ) = 0:5
(iii)
P (X + Y 1) =
1Z
0
1yZ
0
2
(1 + x+ y)3
dxdy =
1Z
0
1
(1 + x+ y)2
j1y0

dy
=
1Z
0

1
(2)2
+
1
(1 + y)2

dy =

1
4
yj10
1
(1 + y)
j10

=

1
4
+ 0 1
2
+ 1

=
1
4
(c) Since
1Z
1
f(x; y)dy =
1Z
0
2
(1 + x+ y)3
dy = lim
a!1
1
(1 + x+ y)2
ja0

=
1
(1 + x)2
for x 0
the marginal probability density function of X is
f1 (x) =
8<: 0 x < 01
(1+x)2
x 0
By symmetry the marginal probability density function of Y is
f2 (y) =
8<: 0 y < 01
(1+y)2
y 0
9.2. CHAPTER 3 291
(d) Since
P (X x; Y y) =
yZ
0
xZ
0
2
(1 + s+ t)3
dsdt =
yZ
0
1
(1 + s+ t)2
jx0

dt
=
yZ
0
1
(1 + x+ t)2
+
1
(1 + t)2

dt
=

1
1 + x+ t
1
1 + t

jy0
=
1
1 + x+ y
1
1 + y
1
1 + x
+ 1 for x 0, y 0
the joint cumulative distribution function of X and Y is
F (x; y) = P (X x; Y y) =
8<: 0 x < 0 or y < 01 + 11+x+y 11+y 11+x x 0, y 0
(e) Since
lim
y!1F (x; y) = limy!1

1 +
1
1 + x+ y
1
1 + y
1
1 + x

= 1 1
1 + x
for x 0
the marginal cumulative distribution function of X is
F1 (x) = P (X x) =
(
0 x < 0
1 11+x x 0
Check:
d
dx
F1 (x) =
d
dx

1 1
1 + x

=
1
(1 + x)2
= f1 (x) for x 0
By symmetry the marginal cumulative distribution function of Y is
F2 (y) = P (Y y) =
8<: 0 y < 01 11+y y 0
292 9. SOLUTIONS TO CHAPTER EXERCISES
Exercise 3.3.7
(a) Since the support set is A = f(x; y) : 0 < x < y <1g
1 =
1Z
1
1Z
1
f(x; y)dxdy = k
1Z
0
yZ
0
exydxdy
= k
1Z
0
exyjy0 dy
= k
1Z
0
e2y + ey dy
= k

lim
a!1

1
2
e2y ey

ja0

= k

lim
a!1

1
2
e2a ea

1
2
+ 1

=
k
2
and therefore k = 2. A graph of the joint probability density function is given in Figure 9.6
[Figure 9.6: Graph of the joint probability density function for Exercise 3.3.7]
(b) (i) The region of integration is shown in Figure 9.7.
P (X 1; Y 2) = 2
1Z
x=0
2Z
y=x
exydydx = 2
1Z
x=0
ex
eyj2x dx
= 2
1Z
x=0
ex
e2 + ex dx = 2 1Z
x=0
e2ex + e2x dx
= 2

e2ex 1
2
e2x

j10 = 2

e3 1
2
e2 e2 + 1
2

= 1 + 2e3 3e2
[Figure 9.7: Region of integration for Exercise 3.3.7(b)(i)]
(ii) Since the support set A = f(x; y) : 0 < x < y <1g contains only values for which
x < y then P (X Y ) = 1.
(iii) The region of integration is shown in Figure 9.8
P (X + Y 1) = 2
1=2Z
x=0
1xZ
y=x
exydydx = 2
1Z
x=0
ex
eyj1xx dx
= 2
1Z
x=0
ex
ex1 + ex dx = 2 1Z
x=0
e1 + e2x dx
= 2

e1x 1
2
e2x

j1=20 = 2

e1

1
2

1
2
e1 + 0 +
1
2

= 1 2e1
[Figure 9.8: Region of integration for Exercise 3.3.7(b)(iii)]
(c) The marginal probability density function of X is
f1 (x) =
1Z
1
f(x; y)dy = 2
1Z
x
exydy = 2ex lim
a!1
eyjax = 2e2x for x > 0
and 0 otherwise which we recognize is an Exponential(1=2) probability density function.
The marginal probability density function of Y is
1Z
1
f(x; y)dy = 2
yZ
0
exydx = 2ey
exjy0 = 2ey 1 ey for y > 0
and 0 otherwise.
(d) Since
P (X x; Y y) = 2
xZ
s=0
yZ
t=s
estdtds = 2
xZ
0
es
etjys ds
= 2
xZ
0
es
ey + es ds
= 2
xZ
0
eyes + e2s ds = 2eyes 1
2
e2s

jx0
= 2exy e2x 2ey + 1 for x 0, x < y
9.2. CHAPTER 3 295
P (X x; Y y) = 2
yZ
s=0
yZ
t=s
estdtds
= e2y 2ey + 1 for x > y > 0
and
P (X x; Y y) = 0 for x 0 or y x
therefore the joint cumulative distribution function of X and Y is
F (x; y) = P (X x; Y y) =
8>><>>:
2exy e2x 2ey + 1 x > 0, y > x
e2y 2ey + 1 x > y > 0
0 x 0 or y x
(e) Since the support set is A = f(x; y) : 0 < x < y <1g and
lim
y!1F (x; y) = limy!1

2exy e2x 2ey + 1
= 1 e2x for x > 0
marginal cumulative distribution function of X is
F1 (x) = P (X x) =
(
0 x 0
1 e2x x > 0
which we also recognize as the cumulative distribution function of an Exponential(1=2)
random variable.
Since
lim
x!1F (x; y) = limx!y

2exy e2x 2ey + 1 = 2e2y e2y 2ey + 1
= e2y 2ey + 1 for y > 0
the marginal cumulative distribution function of Y is
F2 (y) = P (Y y) =
(
0 y 0
1 + e2y 2ey y 0
Check:
P (Y y) =
yZ
0
f2 (t) dt = 2
yZ
0

et e2t dt = 2et + 1
2
e2t

jy0
= 2

ey + 1
2
e2y + 1 1
2

= 1 + e2y 2ey for y > 0
or
d
dy
F2 (y) =
d
dy

e2y 2ey + 1 = 2e2y + 2ey = f2 (y) for y > 0
296 9. SOLUTIONS TO CHAPTER EXERCISES
Exercise 3.4.4
From the solution to Exercise 3.2.5 we have
f(x; y) =
n!
x!y! (n x y)!

2
x
[2 (1 )]y
h
(1 )2
inxy
for x = 0; 1; : : : ; n; y = 0; 1; : : : ; n; x+ y n
f1 (x) =

n
x

2
x
1 2nx for x = 0; 1; : : : ; n
and
f2 (y) =

n
y

[2 (1 )]y [1 2 (1 )]ny for y = 0; 1; : : : ; n
Since
f (0; 0) = (1 )2n 6= f1 (0) f2 (0) =

1 2n [1 2 (1 )]n
therefore X and Y are not independent random variables.
From the solution to Exercise 3.3.6 we have
f (x; y) =
2
(1 + x+ y)3
for x 0, y 0
f1 (x) =
1
(1 + x)2
for x 0
and
f2 (y) =
1
(1 + y)2
for y 0
Since
f (0; 0) = 2 6= f1 (0) f2 (0) = (1) (1)
therefore X and Y are not independent random variables.
From the solution to Exercise 3.3.7 we have
f (x; y) = 2exy for 0 < x < y <1
f1 (x) = 2e
2x for x > 0
and
f2 (y) = 2e
y 1 ey for y > 0
Since
f (1; 2) = 2e3 6= f1 (1) f2 (2) =

2e1

2e2

1 e2
therefore X and Y are not independent random variables.
9.2. CHAPTER 3 297
Exercise 3.5.3
P (Y = yjX = x) = P (X = x; Y = y)
P (X = x)
=
n!
x!y!(nxy)!

2
x
[2 (1 )]y
h
(1 )2
inxy
n!
x!(nx)!

2
x
1 2nx
=
(n x)!
y! (n x y)!
[2 (1 )]y
h
(1 )2
inxy

1 2y 1 2nxy
=

n x
y
"
2 (1 )
1 2
#y "
(1 )2
1 2
#(nx)y
=

n x
y
"
2 (1 )
1 2
#y "
1 2 + 2
1 2
#(nx)y
=

n x
y
"
2 (1 )
1 2
#y "
1 2 2 + 22
1 2
#(nx)y
=

n x
y
"
2 (1 )
1 2
#y "
1 2 (1 )
1 2
#(nx)y
for y = 0; 1; : : : ; (n x) which is the probability function of a Binomial

n x; 2(1)
(12)

as
required.
We are given that there are x genotypes of type AA. Therefore there are only n x
members (trials) whose genotype must be determined. The genotype can only be of type
Aa or type aa. In a population with only these two types the proportion of type Aa would
be
2 (1 )
2 (1 ) + (1 )2 =
2 (1 )
1 2
and the proportion of type aa would be
(1 )2
2 (1 ) + (1 )2 =
(1 )2
1 2 = 1 2 (1 )1 2
Since we have (n x) independent trials with probability of Success (type Aa) equal to
2(1)
(12) then it follows that the number of Aa types, given that there are x members of type
AA, would follow a Binomial

n x; 2(1)
(12)

distribution.
298 9. SOLUTIONS TO CHAPTER EXERCISES
Exercise 3.5.4
For Example 3.3.3 the conditional probability density function of X given Y = y is
f1 (xjy) = f (x; y)
f2 (y)
=
x+ y
y + 12
for 0 < x < 1 for each 0 < y < 1
Check:
1Z
1
f1 (xjy) dx =
1Z
0
x+ y
y + 12
dx =
1
y + 12

1
2
x2 + xyj10

=
1
y + 12

1
2
+ y 0

= 1
By symmetry the conditional probability density function of Y given X = x is
f2 (yjx) = x+ y
x+ 12
for 0 < y < 1 for each 0 < x < 1
and 1Z
1
f2 (yjx) dy = 1
For Exercise 3.3.6 the conditional probability density function of X given Y = y is
f1 (xjy) = f (x; y)
f2 (y)
=
2
(1+x+y)3
1
(1+y)2
=
2 (1 + y)2
(1 + x+ y)3
for x > 0 for each y > 0
Check:
1Z
1
f1 (xjy) dx =
1Z
0
2 (1 + y)2
(1 + x+ y)3
dx = (1 + y)2 lim
a!1
1
(1 + x+ y)2
j10

= (1 + y)2 lim
a!1
1
(1 + a+ y)2
+
1
(1 + y)2

= (1 + y)2
1
(1 + y)2
= 1
By symmetry the conditional probability density function of Y given X = x is
f2 (yjx) = 2 (1 + x)
2
(1 + x+ y)3
for y > 0 for each x > 0
and 1Z
1
f2 (yjx) dy = 1
For Exercise 3.3.7 the conditional probability density function of X given Y = y is
f1 (xjy) = f (x; y)
f2 (y)
=
2exy
2ey (1 ey) =
ex
1 ey for 0 < x < y for each y > 0
9.2. CHAPTER 3 299
Check:
1Z
1
f1 (xjy) dx =
yZ
0
ex
1 ey dx =
1
1 ey
exjy0 = 1 ey1 ey = 1
The conditional probability density function of Y given X = x is
f2 (yjx) = f (x; y)
f1 (x)
=
2exy
2e2x
= exy for y > x for each x > 0
Check:
1Z
1
f2 (yjx) dy =
1Z
x
exydy = ex lim
a!1
eyjax = ex 0 + ex = 1
Exercise 3.6.10
E (XY ) =
1Z
1
1Z
1
xyf(x; y)dxdy = 2
1Z
0
yZ
0
xyexydxdy
= 2
1Z
0
yey
0@ yZ
0
xexdx
1A dy = 2 1Z
0
yey
xex ex jy0 dy
= 2
1Z
0
yey
yey ey + 1 dy = 2 1Z
0

y2e2y + ye2y yey dy
= 2
1Z
0
y2e2ydy
1Z
0
(2y) e2ydy + 2
1Z
0
yeydy
= 2
1Z
0
y2e2ydy
1Z
0
(2y) e2ydy + 2 (2) but (2) = 1! = 1
= 2 2
1Z
0
y2e2ydy
1Z
0
(2y) e2ydy let u = 2y or y =
u
2
so dy =
1
2
du
= 2 2
1Z
0
u
2
2
eu
1
2
du
1Z
0
ueu
1
2
du
= 2 1
4
1Z
0
u2eudu 1
2
1Z
0
ueudu = 2 1
4
(3) 1
2
(2) but (3) = 2! = 2
= 2 1
2
1
2
= 1
300 9. SOLUTIONS TO CHAPTER EXERCISES
Since X Exponential(1=2), E (X) = 1=2 and V ar (X) = 1=4.
E (Y ) =
1Z
1
yf2(y)dy = 2
1Z
0
yey
ey + 1 dy
=
1Z
0
2ye2ydy + 2
1Z
0
yeydy =
1Z
0
2ye2ydy + 2 (2)
= 2
1Z
0
2ye2ydy let u = 2y
= 2
1Z
0
ueu
1
2
du = 2 1
2
1Z
0
ueudu = 2 1
2
(2) = 2 1
2
=
3
2
E

Y 2

=
1Z
1
y2f2(y)dy = 2
1Z
0
y2ey
ey + 1 dy
= 2
1Z
0
y2e2ydy + 2
1Z
0
y2eydy = 2
1Z
0
y2e2ydy + 2 (3)
= 4 2
1Z
0
y2e2ydy let u = 2y
= 4 2
1Z
0
u
2
2
eu
1
2
du = 4 1
4
1Z
0
u2eudu = 4 1
4
(3) = 4 1
2
=
7
2
V ar (Y ) = E

Y 2
[E (Y )]2 = 7
2


3
2
2
=
14 9
4
=
5
4
Therefore
Cov (X;Y ) = E (XY ) E (X)E (Y ) = 1

1
2

3
2

= 1 3
4
=
1
4
and
(X;Y ) =
Cov(X;Y )
XY
=
1
4q
1
4

5
4
= 1p5
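The values Cov(X, Y) = 1/4 and ρ(X, Y) = 1/√5 ≈ 0.447 can be checked with a short simulation; the sketch below uses the conditional structure found in Exercise 3.5.4, namely X ~ Exponential(1/2) and, given X = x, Y − x ~ Exponential(1).
# Monte Carlo check of Cov(X,Y) = 1/4 and rho(X,Y) = 1/sqrt(5) for f(x,y) = 2e^(-x-y), 0 < x < y
set.seed(330)
n <- 200000
x <- rexp(n, rate = 2)          # X ~ Exponential(1/2), i.e. mean 1/2
y <- x + rexp(n, rate = 1)      # given X = x, Y - x ~ Exponential(1)
c(cov = cov(x, y), rho = cor(x, y), target = 1/sqrt(5))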
9.2. CHAPTER 3 301
Exercise 3.7.12
Since Y Gamma(; 1 )
E (Y ) =

1


=


V ar (Y ) =

1

2
=

2
and
E

Y k

=
1Z
0
yk
y1ey
()
dy let u = y
=

()
1Z
0
u

k+1
eu
1

du =
k
()
1Z
0
uk+1eudu
=
k (+ k)
()
for + k > 0
Since XjY = y Weibull(p; y1=p)
E (Xjy) = y1=p

1 +
1
p

, E (XjY ) = Y 1=p

1 +
1
p

and
V ar (Xjy) =

y1=p
2


1 +
2
p

2

1 +
1
p

V ar (XjY ) = Y 2=p



1 +
2
p

2

1 +
1
p

Therefore
E (X) = E [E (XjY )] =

1 +
1
p

E

Y 1=p

=

1 +
1
p
1=p 1p
()
= 1=p


1 + 1p



1p

()
302 9. SOLUTIONS TO CHAPTER EXERCISES
and
V ar(X) = E[V ar(XjY )] + V ar[E(XjY )]
= E

Y 2=p



1 +
2
p

2

1 +
1
p

+ V ar

Y 1=p

1 +
1
p

=



1 +
2
p

2

1 +
1
p

E

Y 2=p

+ 2

1 +
1
p

V ar

Y 1=p

=



1 +
2
p

2

1 +
1
p

E

Y 2=p

+2

1 +
1
p

E

Y 1=p
2 hE Y 1=pi2
=

1 +
2
p

E

Y 2=p

2

1 +
1
p

E

Y 2=p

+2

1 +
1
p

E

Y 2=p

2

1 +
1
p
h
E

Y 1=p
i2
=

1 +
2
p

E

Y 2=p

2

1 +
1
p
h
E

Y 1=p
i2
=

1 +
2
p
2=p 2p
()
2

1 +
1
p
241p

1p

()
352
= 2=p
8><>:


1 + 2p



2p

()

24

1 + 1p



1p

()
352
9>=>;
Exercise 3.7.13
Since P Beta(a; b)
E (P ) =
a
a+ b
, V ar (P ) =
ab
(a+ b+ 1) (a+ b)2
and
E

P k

=
1Z
0
pk
(a+ b)
(a) (b)
pa1 (1 p)b1 dp
=
(a+ b)
(a) (b)
1Z
0
pa+k1 (1 p)b1 dp
=
(a+ b) (a+ k)
(a) (a+ k + b)
1Z
0
(a+ k + b)
(a+ k) (b)
pk+a1 (1 p)b1 dp
=
(a+ b) (a+ k)
(a) (a+ k + b)
(1) =
(a+ b) (a+ k)
(a) (a+ k + b)
9.2. CHAPTER 3 303
provided a+ k > 0 and a+ k + b > 0.
Since Y jP = p Geometric(p)
E (Y jp) = 1 p
p
, E (Y jP ) = 1 P
P
=
1
P
1
and
V ar (Y jp) = 1 p
p2
, V ar (Y jP ) = 1 P
P 2
=
1
P 2
1
P
Therefore
E (Y ) = E [E (Y jP )] = E

1
P
1

= E

P1
1
=
(a+ b) (a 1)
(a) (a 1 + b) 1
=
(a 1 + b) (a 1 + b) (a 1)
(a 1) (a 1) (a 1 + b) 1
=
a 1 + b
a 1 1 =
b
a 1
provided a > 1 and a+ b > 1.
Now
V ar(Y ) = E[V ar(Y jP )] + V ar[E(Y jP )]
= E

1
P 2
1
P

+ V ar

1
P
1

= E

1
P 2

E

1
P

+ E
"
1
P
1
2#


E

1
P
1
2
= E

1
P 2

E

1
P

+ E

1
P 2

2E

1
P

+ 1

E

1
P
2
+ 2E

1
P

1
= 2E

P2
E P1 E P12
= 2
(a+ b) (a 2)
(a) (a 2 + b)
(a+ b) (a 1)
(a) (a 1 + b)

(a+ b) (a 1)
(a) (a 1 + b)
2
= 2
(a+ b 2) (a+ b 1)
(a 2) (a 1)
a+ b 1
a 1

a+ b 1
a 1
2
=

a+ b 1
a 1

2

a+ b 2
a 2

1 a+ b 1
a 1

=
ab (a+ b 1)
(a 1)2 (a 2)
provided a > 2 and a+ b > 2.
304 9. SOLUTIONS TO CHAPTER EXERCISES
Exercise 3.9.4
If T = Xi +Xj ; i 6= j; then
T Binomial(n; pi + pj)
The moment generating function of T = Xi +Xj is
M (t) = E

etT

+ E

et(Xi+Xj)

= E

etXi+tXj

= M (0; : : : ; 0; t; 0; : : : ; 0; t; 0; : : : ; 0)
=

p1 + + piet + + pjet + + pk1 + pk
n
for t 2 <
=

(pi + pj) e
t + (1 pi pj)
n
for t 2 <
which is the moment generating function of a Binomial(n; pi + pj) random variable. There-
fore by the Uniqueness Theorem for Moment Generating Functions, T has a Binomial(n; pi + pj)
distribution.
9.3. CHAPTER 4 305
9.3 Chapter 4
Exercise 4.1.2
The support set of (X;Y ) is A = f(x; y) : 0 < x < y < 1g which is the union of the regions
E and F shown in Figure 9.3For s > 1
x
y
0 1
1 x=y
x=y/s
EF
G (s) = P (S s) = P

Y
X
s

= P (Y sX) = P (Y sX 0)
=
Z
(x;y)
Z
2 E
3ydxdy =
1Z
y=0
yZ
x=y=s
3ydxdy =
1Z
0
3y

xjyy=s

dy
=
1Z
0
3y

y y
s

dy =

1 1
s
1Z
0
3y2dy =

1 1
s

y3j10

= 1 1
s
The cumulative distribution function for S is
G (s) =
(
0 s 1
1 1s s > 1
As a check we note that lim
s!1+

1 1s

= 0 = G (1) and lim
s!1

1 1s

= 1 so G (s) is a
continuous function for all s 2 <.
For s > 1
g (s) =
d
ds
G (s) =
d
ds

1 1
s

=
1
s2
The probability density function of S is
g (s) =
1
s2
for s > 1
and 0 otherwise.
306 9. SOLUTIONS TO CHAPTER EXERCISES
Exercise 4.2.5
Since X Exponential(1) and Y Exponential(1) independently, the joint probability
density function of X and Y is
f (x; y) = f1 (x) f2 (y) = e
xey
= exy
with support set RXY = f(x; y) : x > 0; y > 0g which is shown in Figure 9.9.
[Figure 9.9: Support set RXY for Exercise 4.2.5]
The transformation
S : U = X + Y , V = X Y
has inverse transformation
X =
U + V
2
, Y =
U V
2
Under S the boundaries of RXY are mapped as
(k; 0) ! (k; k) for k 0
(0; k) ! (k;k) for k 0
and the point (1; 2) is mapped to and (3;1). Thus S maps RXY into
RUV = f(u; v) : u < v < u; u > 0g
as shown in Figure 9.10.
[Figure 9.10: Support set RUV for Exercise 4.2.5]
The Jacobian of the inverse transformation is
@ (x; y)
@ (u; v)
=

@x
@u
@x
@v
@y
@u
@y
@v
=
12 121
2 12
= 12
The joint probability density function of U and V is given by
g (u; v) = f

u+ v
2
;
u v
2
12

=
1
2
eu for (u; v) 2 RUV
and 0 otherwise.
To …nd the marginal probability density functions for U we note that the support set
RUV is not rectangular and the range of integration for v will depend on u.
g1 (u) =
1Z
1
g (u; v) dv
=
1
2
eu
uZ
v=u
dv
=
1
2
ueu (2)
= ueu for u > 0
and 0 otherwise which is the probability density function of a Gamma(2; 1) random variable.
Therefore U Gamma(2; 1).
308 9. SOLUTIONS TO CHAPTER EXERCISES
To …nd the marginal probability density functions for V we need to consider the two
cases v 0 and v < 0. For v 0
g2 (v) =
1Z
1
g (u; v) du
=
1
2
1Z
u=v
eudu
=
1
2

lim
b!1
eujbv

=
1
2

lim
b!1
eb ev

=
1
2
ev
For v < 0
g2 (v) =
1Z
1
g (u; v) du
=
1
2
1Z
u=v
eudu
=
1
2

lim
b!1
eujbv

=
1
2

lim
b!1
eb ev

=
1
2
ev
Therefore the probability density function of V is
g2 (v) =
(
1
2e
v v < 0
1
2e
v v 0
which is the probability density function of Double Exponential(0; 1) random variable.
Therefore V Double Exponential(0; 1).
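Here is a small R sketch (a numerical check only) comparing sample quantiles of U = X + Y and V = X − Y with Gamma(2, 1) and Double Exponential(0, 1) quantiles; the function qlaplace below is a helper written for this check, not a base R function.
# simulation check that U = X + Y ~ Gamma(2,1) and V = X - Y ~ Double Exponential(0,1)
set.seed(330)
n <- 200000
x <- rexp(n); y <- rexp(n)               # X, Y independent Exponential(1)
u <- x + y; v <- x - y
probs <- c(0.1, 0.25, 0.5, 0.75, 0.9)
qlaplace <- function(p) ifelse(p < 0.5, log(2 * p), -log(2 * (1 - p)))  # Double Exponential(0,1) quantiles
round(rbind(U.sim = quantile(u, probs), U.gamma = qgamma(probs, shape = 2, scale = 1),
            V.sim = quantile(v, probs), V.laplace = qlaplace(probs)), 3)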
9.3. CHAPTER 4 309
Exercise 4.2.7
Since X Beta(a; b) and Y Beta(a + b; c) independently, the joint probability density
function of X and Y is
f (x; y) =
(a+ b)
(a) (b)
xa1 (1 x)b1 (a+ b+ c)
(a+ b) (c)
ya+b1 (1 y)c1
=
(a+ b+ c)
(a) (b) (c)
xa1 (1 x)b1 ya+b1 (1 y)c1
with support set RXY = f(x; y) : 0 < x < 1; 0 < y < 1g as shown in Figure 9.11.
[Figure 9.11: Support set RXY for Exercise 4.2.7]
The transformation
S : U = XY , V = X
has inverse transformation
X = V , Y = U=V
Under S the boundaries of RXY are mapped as
(k; 0) ! (0; k) 0 k 1
(0; k) ! (0; 0) 0 k 1
(1; k) ! (k; 1) 0 k 1
(k; 1) ! (k; k) 0 k 1
and the point

1
2 ;
1
3

is mapped to and

1
6 ;
1
2

. Thus S maps RXY into
RUV = f(u; v) : 0 < u < v < 1g
as shown in Figure 9.12
[Figure 9.12: Support set RUV for Exercise 4.2.7]
The Jacobian of the inverse transformation is
@ (x; y)
@ (u; v)
=
0 11
v
@y
@v
= 1v
The joint probability density function of U and V is given by
g (u; v) = f

v;
u
v
1v

=
(a+ b+ c)
(a) (b) (c)
va1 (1 v)b1
u
v
a+b1
1 u
v
c1 1
v
=
(a+ b+ c)
(a) (b) (c)
ua+b1 (1 v)b1 vb1

1 u
v
c1
for (u; v) 2 RUV
and 0 otherwise.
To …nd the marginal probability density functions for U we note that the support set
RUV is not rectangular and the range of integration for v will depend on u.
g (u) =
1Z
1
g (u; v) dv =
(a+ b+ c)
(a) (b) (c)
ua+b1
1Z
v=u
(1 v)b1 vb1

1 u
v
c1
dv
=
(a+ b+ c)
(a) (b) (c)
ua+b1
1Z
u
vb+1 (1 v)b1

1 u
v
c1 1
v2
dv
=
(a+ b+ c)
(a) (b) (c)
ua1
1Z
u
ub1

1
v
1
b1
1 u
v
c1 u
v2
dv
=
(a+ b+ c)
(a) (b) (c)
ua1
1Z
u
u
v
u
b1
1 u
v
c1 u
v2
dv
9.3. CHAPTER 4 311
To evaluate this integral we make the substitution
t =
u
v u
1 u
Then
u
v
u = (1 u) t
1 u
v
= 1 u (1 u) t = (1 u) (1 t)
u
v2
= (1 u) dt
When v = u then t = 1 and when v = 1 then t = 0. Therefore
g1 (u) =
(a+ b+ c)
(a) (b) (c)
ua1
1Z
0
[(1 u) t]b1 [(1 u) (1 t)]c1 (1 u) dt
=
(a+ b+ c)
(a) (b) (c)
ua1 (1 u)b+c1
1Z
0
tb1 (1 t)c1 dt
=
(a+ b+ c)
(a) (b) (c)
ua1 (1 u)b+c1 (b) (c)
(b+ c)
=
(a+ b+ c)
(a) (b+ c)
ua1 (1 u)b+c1 for 0 < u < 1
and 0 otherwise which is the probability density function of a Beta(a; b+ c) random variable.
Therefore U Beta(a; b+ c).
Exercise 4.2.12
(a) Consider the transformation
S : U =
X=n
Y=m
, V = Y
This transformation has inverse transformation
X =
n
m

UV , Y = V
and therefore is a one-to-one transformation.
SinceX 2(n) independently of Y 2(m) then the joint probability density function
of X and Z is
f (x; y) = f1 (x) f2 (y) =
1
2n=2 (n=2)
xn=21ex=2
1
2m=2 (m=2)
ym=21ey=2
=
1
2(n+m)=2 (n=2) (m=2)
xn=21ex=2ym=21ey=2
312 9. SOLUTIONS TO CHAPTER EXERCISES
with support set RXY = f(x; y) : x > 0; y > 0g. The transformation S maps RXY into
RUV = f(u; v) : u > 0; v > 0g.
The Jacobian of the inverse transformation is
@ (x; y)
@ (u; v)
=

@x
@u
@x
@v
@y
@u
@y
@v
=


n
m

v @x@v
0 1
= nm v
The joint probability density function of U and V is given by
g (u; v) = f
n
m

uv; v
n
m

v

=
1
2(n+m)=2 (n=2) (m=2)
n
m
n=21
(uv)n=21 e(
n
m)uv=2vm=21ev=2
n
m

v
=

n
m
n=2
un=21
2(n+m)=2 (n=2) (m=2)
v(n+m)=21ev(
nu
m
+1)=2 for (u; v) 2 RUV
and 0 otherwise.
To determine the distribution of U we still need to …nd the marginal probability density
function for U .
g1 (u) =
1Z
1
g (u; v) dv
=

n
m
n=2
un=21
2(n+m)=2 (n=2) (m=2)
1Z
0
v(n+m)=21ev(
nu
m
+1)=2dv
Let y = v2

1 + nmu

so that v = 2y

1 + nmu
1 and dv = 2 1 + nmu1 dy. Note that when
v = 0 then y = 0, and when v !1 then y !1. Therefore
g1 (u) =

n
m
n=2
un=21
2(n+m)=2 (n=2) (m=2)
1Z
0

2y

1 +
n
m
u
1(n+m)=21
ey

2

1 +
n
m
u
1
dy
=

n
m
n=2
un=212(n+m)=2
2(n+m)=2 (n=2) (m=2)

1 +
n
m
u
1(n+m)=2 1Z
0
y(n+m)=21eydy
=

n
m
n=2
un=21
(n=2) (m=2)

1 +
n
m
u
(n+m)=2


n+m
2

=

n
m
n=2


n+m
2

(n=2) (m=2)
un=21

1 +
n
m
u
(n+m)=2
for u > 0
and 0 otherwise which is the probability density function of a random variable with a
F(n;m) distribution. Therefore U = X=nY=m F(n;m).
(b) To …nd E (U) we use
E (U) = E

X=n
Y=m

=
m
n
E (X)E

Y 1

9.3. CHAPTER 4 313
since X 2(n) independently of Y 2(m). From Example 4.2.10 we know that if
W 2(k) then
E (W p) =
2p (k=2 + p)
(k=2)
for
k
2
+ p > 0
Therefore
E (X) =
2 (n=2 + 1)
(n=2)
= n
E

Y 1

=
21 (m=2 1)
(m=2)
=
1
2 (m=2 1) =
1
m 2 if m > 2
and
E (U) =
m
n
(n)E

1
m 2

=
m
m 2 if m > 2
To …nd V ar (U) we need
E

U2

= E
"
(X=n)2
(Y=m)2
#
=
m2
n2
E

X2

E

Y 2

Now
E

X2

=
22 (n=2 + 2)
(n=2)
= 4
n
2
+ 1
n
2

= n (n+ 2)
E

Y 2

=
22 (m=2 2)
(m=2)
=
1
4

m
2 1

m
2 2
= 1
(m 2) (m 4) for m > 4
and
E

U2

=
m2
n2
n (n+ 2)
1
(m 2) (m 4) =
n+ 2
n
m2
(m 2) (m 4) for m > 4
Therefore
V ar (U) = E

U2
[E (U)]2
=
n+ 2
n
m2
(m 2) (m 4)

m
m 2
2
=
m2
m 2

n+ 2
n (m 4)
1
m 2

=
m2
m 2

(n+ 2) (m 2) n (m 4)
n (m 4) (m 2)

=
m2
m 2

2 (n+m 2)
n (m 4) (m 2)

=
2m2 (n+m 2)
n (m 2)2 (m 4) for m > 4
314 9. SOLUTIONS TO CHAPTER EXERCISES
Exercise 4.3.3
X1; X2; : : : ; Xn are independent and identically distributed random variables with moment
generating function M (t), E (Xi) = , and V ar (Xi) = 2 <1.
The moment generating function of Z =
p
n

X = is
MZ (t) = E

etZ

= E

et
p
n( X)=

= e
p
n=tE

exp

t
p
n

1
n
nP
i=1
Xi

= e
p
n=tE

exp

t

p
n
nP
i=1
Xi

= e
p
n=t
nQ
i=1
E

exp

t

p
n
Xi

since X1; X2; : : : ; Xn are independent
= e
p
n=t
nQ
i=1
M

t

p
n

since X1; X2; : : : ; Xn are identically distributed
= e
p
n=t

M

t

p
n
n
Exercise 4.3.7
nP
i=1
(Xi )2 =
nP
i=1

Xi X + X
2
=
nP
i=1

Xi X
2
+ 2

X nP
i=1

Xi X

+
nP
i=1

X 2
=
nP
i=1

Xi X
2
+ n

X 2
since
nP
i=1

Xi X

=
nP
i=1
Xi
nP
i=1
X =
nP
i=1
Xi n X
=
nP
i=1
Xi n

1
n
nP
i=1
Xi

=
nP
i=1
Xi
nP
i=1
Xi
= 0
9.3. CHAPTER 4 315
Exercise 4.3.11
Since X1; X2; : : : ; Xn are independent N

1;
2
1

random variables then by Theorem
4.3.8
U =
(n 1)S21
21
=
nP
i=1

Xi X
2
21
2(n 1)
Since Y1; Y2; : : : ; Ym are independent N

2;
2
2

random variables then by Theorem 4.3.8
V =
(m 1)S22
22
=
mP
i=1

Yi Y
2
22
2(m 1)
U and V are independent random variables since X1; X2; : : : ; Xn are independent of
Y1; Y2; : : : ; Ym.
Now
S21=
2
1
S22=
2
2
=
(n1)S21
21
n1
(m1)S22
22
m1
=
U= (n 1)
V= (m 1)
Therefore by Theorem 4.2.11
S21=
2
1
S22=
2
2
F (n 1;m 1)
316 9. SOLUTIONS TO CHAPTER EXERCISES
9.4 Chapter 5
Exercise 5.4.4
If Xn ~ Binomial(n, p) then
Mn(t) = E(e^{tXn}) = (pe^t + q)^n     for t ∈ ℝ     (9.7)
If μ = np then
p = μ/n  and  q = 1 − μ/n     (9.8)
Substituting (9.8) into (9.7) and simplifying gives
Mn(t) = (μe^t/n + 1 − μ/n)^n = [1 + μ(e^t − 1)/n]^n     for t ∈ ℝ
Now
lim_{n→∞} [1 + μ(e^t − 1)/n]^n = e^{μ(e^t − 1)}     for t ∈ ℝ
by Corollary 5.1.3. Since M(t) = e^{μ(e^t − 1)} for t ∈ ℝ is the moment generating function of
a Poisson(μ) random variable then by Theorem 5.4.1, Xn →_D X ~ Poisson(μ).
By Theorem 5.4.2
P(Xn = x) = (n choose x) p^x q^{n−x} ≈ (np)^x e^{−np} / x!     for x = 0, 1, ...
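A short R sketch (with an arbitrary choice of n and p) shows how close the Poisson(np) probabilities are to the Binomial(n, p) probabilities:
# compare Binomial(n, p) probabilities with the Poisson(np) approximation
n <- 100; p <- 0.03; mu <- n * p
x <- 0:10
round(cbind(x = x, binomial = dbinom(x, n, p), poisson = dpois(x, mu)), 4)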
Exercise 5.4.7
Let Xi Binomial(1; p), i = 1; 2; : : : independently. Since X1; X2; : : : are independent and
identically distributed random variables with E (Xi) = p and V ar (Xi) = p (1 p), then
p
n

Xn p
p
p (1 p) !D Z N(0; 1)
by the Central Limit Theorem. Let Sn =
nP
i=1
Xi. Then
Sn npp
np (1 p) =
n

1
n
nP
i=1
Xi p

p
n
p
p (1 p) =
p
n

Xn p
p
p (1 p) !D Z N(0; 1)
Now by 4.3.2(1) Sn Binomial(n; p) and therefore Yn and Sn have the same distribution.
It follows that
Zn =
Yn npp
np (1 p) !D Z N(0; 1)
9.4. CHAPTER 5 317
Exercise 5.5.3
(a) Let g (x) = x2 which is a continuous function for all x 2 <. Since Xn !p a then by
5.5.1(1), X2n = g (Xn)!p g (a) = a2 or X2n !p a2.
(b) Let g (x; y) = xy which is a continuous function for all (x; y) 2 <2. Since Xn !p a and
Yn !p b then by 5.5.1(2), XnYn = g (Xn; Yn)!p g (a; b) = ab or XnYn !p ab.
(c) Let g (x; y) = x=y which is a continuous function for all (x; y) 2 <2; y 6= 0. Since
Xn !p a and Yn !p b 6= 0 then by 5.5.1(2), Xn=Yn = g (Xn; Yn) !p g (a; b) = a=b , b 6= 0
or Xn=Yn !p a=b, b 6= 0.
(d) Let g (x; z) = x 2z which is a continuous function for all (x; z) 2 <2. Since Xn !p a
and Zn !D Z N(0; 1) then by Slutsky’s Theorem, Xn 2Zn = g (Xn; Zn)!D g (a; Z) =
a 2Z or Xn 2Zn !D a 2Z where Z N(0; 1). Since a 2Z N(a; 4), therefore
Xn 2Zn !D a 2Z N(a; 4)
(e) Let g (x; z) = 1=z which is a continuous function for all (x; z) 2 <2; z 6= 0. Since
Zn !D Z N(0; 1) then by Slutsky’s Theorem, 1=Zn = g (Xn; Zn) !D g (a; z) = 1=Z
or 1=Zn !D 1=Z where Z N(0; 1). Since h (z) = 1=z is a decreasing function for all z 6= 0
then by Theorem 2.6.8 the probability density function of W = 1=Z is
f (w) =
1p
2
e1=(2w
2)

1
w2

for z 6= 0
Exercise 5.5.8
By (5.9)
√n (X̄n − θ) / √θ →_D Z ~ N(0, 1)
and by Slutsky's Theorem
Un = √n (X̄n − θ) →_D √θ Z ~ N(0, θ)     (9.9)
Let g(x) = √x, a = θ, and b = 1/2. Then g′(x) = 1/(2√x) and g′(a) = g′(θ) = 1/(2√θ). By (9.9)
and the Delta Method
n^{1/2} (√X̄n − √θ) →_D (1/(2√θ)) (√θ Z) = (1/2) Z ~ N(0, 1/4)
9.5 Chapter 6
Exercise 6.4.6
The probability density function of an Exponential(1, θ) random variable is
f(x; θ) = e^{−(x−θ)}     for x ≥ θ and θ ∈ ℝ
and zero otherwise. The support set of the random variable X is [θ, ∞) which depends on
the unknown parameter θ.
The likelihood function is
L(θ) = Π_{i=1}^n f(xi; θ) = Π_{i=1}^n e^{−(xi−θ)}     if xi ≥ θ, i = 1, 2, ..., n and θ ∈ ℝ
     = [Π_{i=1}^n e^{−xi}] e^{nθ}     if xi ≥ θ for all i and θ ∈ ℝ
or more simply
L(θ) = 0 if θ > x(1),  and  L(θ) = e^{nθ} if θ ≤ x(1)
where x(1) = min(x1, x2, ..., xn) is the minimum of the sample. (Note: In order to observe
the sample x1, x2, ..., xn the value of θ can be no larger than the smallest observed xi.) L(θ)
is an increasing function of θ on the interval (−∞, x(1)], so L(θ) is maximized at θ = x(1).
The maximum likelihood estimate of θ is θ̂ = x(1) and the maximum likelihood estimator
is θ̃ = X(1).
Note that in this example there is no solution to (d/dθ) l(θ) = (d/dθ)(nθ) = 0 and the maximum
likelihood estimate of θ is not found by solving (d/dθ) l(θ) = 0.
If n = 12 and x(1) = 2
L(θ) = 0 if θ > 2,  and  L(θ) = e^{12θ} if θ ≤ 2
The relative likelihood function
R(θ) = 0 if θ > 2,  and  R(θ) = e^{12(θ−2)} if θ ≤ 2
is graphed in Figure 9.13 along with lines for determining 10% and 50% likelihood intervals.
To determine the value of θ at which the horizontal line R = p intersects the graph of
R(θ) we solve e^{12(θ−2)} = p to obtain θ = 2 + (log p)/12. Since R(θ) = 0 if θ > 2, a
100p% likelihood interval for θ is of the form [2 + (log p)/12, 2]. For p = 0.1 we obtain the
10% likelihood interval [2 + log(0.1)/12, 2] = [1.8081, 2]. For p = 0.5 we obtain the 50%
likelihood interval [2 + log(0.5)/12, 2] = [1.9422, 2].
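The endpoints above follow from solving R(θ) = p directly; here is a minimal R sketch for n = 12 and x(1) = 2 (the function name likelihood.interval is introduced only for this illustration):
# 100p% likelihood interval for the shifted exponential example: [x(1) + log(p)/n, x(1)]
likelihood.interval <- function(p, n = 12, xmin = 2) c(lower = xmin + log(p)/n, upper = xmin)
likelihood.interval(0.10)   # 10% interval, approximately [1.8081, 2]
likelihood.interval(0.50)   # 50% interval, approximately [1.9422, 2]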
[Figure 9.13: Relative likelihood function for Exercise 6.4.6]
Exercise 6.7.3
By Chapter 5, Problem 7 we have
Q(Xn, θ) = √n (Xn/n − θ) / √[(Xn/n)(1 − Xn/n)] →_D Z ~ N(0, 1)     (9.10)
and therefore Q(Xn, θ) is an asymptotic pivotal quantity.
Let a be the value such that P(Z ≤ a) = (1 + p)/2 where Z ~ N(0, 1). Then by (9.10) we
have
p ≈ P(−a ≤ √n (Xn/n − θ) / √[(Xn/n)(1 − Xn/n)] ≤ a)
  = P(Xn/n − (a/√n) √[(Xn/n)(1 − Xn/n)] ≤ θ ≤ Xn/n + (a/√n) √[(Xn/n)(1 − Xn/n)])
and an approximate 100p% equal tail confidence interval for θ is
[xn/n − (a/√n) √[(xn/n)(1 − xn/n)],  xn/n + (a/√n) √[(xn/n)(1 − xn/n)]]
or
[θ̂ − a√(θ̂(1 − θ̂)/n),  θ̂ + a√(θ̂(1 − θ̂)/n)]
where θ̂ = xn/n.
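Here is a short R sketch of this interval; the function binom.ci and the data values used in the call are illustrative only and are not part of the exercise.
# approximate 100p% confidence interval for theta based on X ~ Binomial(n, theta)
binom.ci <- function(x, n, p = 0.95) {
  thetahat <- x / n
  a <- qnorm((1 + p) / 2)                 # P(Z <= a) = (1 + p)/2
  thetahat + c(-1, 1) * a * sqrt(thetahat * (1 - thetahat) / n)
}
binom.ci(x = 42, n = 100)   # illustrative data: 42 successes in 100 trials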
320 9. SOLUTIONS TO CHAPTER EXERCISES
Exercise 6.7.12
(a) The likelihood function is
L() =
nQ
i=1
f(xi; ) =
nQ
i=1
e(x)
1 + e(x)
2
= exp

nP
i=1
xi

en
nQ
i=1
1
1 + e(xi)
2 for 2 <
or more simply
L() = en
nQ
i=1
1
1 + e(xi)
2 for 2 <
The log likelihood function is
l() = n 2
nP
i=1
log

1 + e(xi)

for 2 <
The score function is
S () =
d
d
l () = n 2
nP
i=1
e(xi)
1 + e(xi)
= n 2
nP
i=1
1
1 + exi
for 2 <
Notice that S() = 0 cannot be solved explicitly. The maximum likelihood estimate can
only be determined numerically for a given sample of data x1; x2; : : : ; xn. Note that since
d
d
S () =
"
2
nP
i=1
exi
(1 + exi)2
#
for 2 <
is negative for all values of > 0 then we know that S () is always decreasing so there
is only one solution to S() = 0. Therefore the solution to S() = 0 gives the maximum
likelihood estimate.
The information function is
I() = d
2
d2
l()
= 2
nP
i=1
exi
(1 + exi)2
for 2 <
(b) If X Logistic(; 1) then the cumulative distribution function of X is
F (x; ) =
1
1 + e(x)
for x 2 <; 2 <
Solving
u =
1
1 + e(x)
9.5. CHAPTER 6 321
gives
x = log

1
u
1

Therefore the inverse cumulative distribution function is
F1 (u) = log

1
u
1

for 0 < u < 1
If u is an observation from the Uniform(0; 1) distribution then log 1u 1 is an obser-
vation from the Logistic(; 1) distribution by Theorem 2.6.6.
(c) The generated data are
0:18 0:05 0:32 0:78 1:04 1:11 1:26 1:41 1:50 1:57
1:58 1:60 1:68 1:68 1:71 1:89 1:93 2:02 2:2:5 2:40
2:47 2:59 2:76 2:78 2:87 2:91 4:02 4:52 5:25 5:56
(d) Here is R code for calculating and graphing the likelihood function for these data.
# function for calculating the Logistic likelihood for data x and theta=th
LOLF<-function(th,x)
{n<-length(x)
L<-exp(n*th)*(prod(1+exp(th-x)))^(-2)
return(L)}
th<-seq(1,3,0.01)
L<-sapply(th,LOLF,x)
plot(th,L,"l",xlab=expression(theta),
ylab=expression(paste("L(",theta,")")),lwd=3)
The graph of the likelihood function is given in the figure below.
[Figure: Likelihood function L(θ) for the Logistic data]
(e) Here is R code for Newton’s
Method. It requires functions for calculating the score and information
# function for calculating Logistic score for data x and theta=th
LOSF<-function(th,x)
{n<-length(x)
S<-n-2*sum(1/(1+exp(x-th)))
return(S)}
#
# function for calculating Logistic information for data x and theta=th
LOIF<-function(th,x)
{n<-length(x)
I<-2*sum(exp(x-th)/(1+exp(x-th))^2)
return(I)}
#
# Newton’s Method for Logistic Example
NewtonLO<-function(th,x)
{thold<-th
thnew<-th+0.1
while (abs(thold-thnew)>0.00001)
{thold<-thnew
thnew<-thold+LOSF(thold,x)/LOIF(thold,x)
print(thnew)}
return(thnew)}
# use Newton’s Method to find the maximum likelihood estimate
# use the mean of the data to begin Newton’s Method
# since theta is the mean of the distribution
thetahat<-NewtonLO(mean(x),x)
cat("thetahat = ",thetahat)
The maximum estimate found using Newton’s Method is
^ = 2:018099
(f) Here is R code for determining the values of S(^) and I(^).
# calculate Score(thetahat) and the observed information
Sthetahat<-LOSF(thetahat,x)
cat("S(thetahat) = ",Sthetahat)
Ithetahat<-LOIF(thetahat,x)
cat("Observed Information = ",Ithetahat)
The values of S(^) and I(^) are
S(^) = 3:552714 1015
I(^) = 11:65138
9.5. CHAPTER 6 323
(g) Here is R code for plotting the relative likelihood function for based on these data.
# function for calculating Logistic relative likelihood function
LORLF<-function(th,thetahat,x)
{R<-LOLF(th,x)/LOLF(thetahat,x)
return(R)}
#
# plot the Logistic relative likelihood function
#plus a line to determine the 15% likelihood interval
th<-seq(1,3,0.01)
R<-sapply(th,LORLF,thetahat,x)
plot(th,R,"l",xlab=expression(theta),
ylab=expression(paste("R(",theta,")")),lwd=3)
abline(a=0.15,b=0,col="red",lwd=2)
The graph of the relative likelihood function is given in the figure below.
[Figure: Relative likelihood function R(θ) for the Logistic data, with the horizontal line R = 0.15]
(h) Here is R code for determining the 15% likelihood interval and the approximate 95%
con…dence interval (6.18)
# determine a 15% likelihood interval using uniroot
uniroot(function(th) LORLF(th,thetahat,x)-0.15,lower=1,upper=1.8)$root
uniroot(function(th) LORLF(th,thetahat,x)-0.15,lower=2.2,upper=3)$root
# calculate an approximate 95% confidence intervals for theta
L95<-thetahat-1.96/sqrt(Ithetahat)
U95<-thetahat+1.96/sqrt(Ithetahat)
cat("Approximate 95% confidence interval = ",L95,U95) # display values
324 9. SOLUTIONS TO CHAPTER EXERCISES
The 15% likelihood interval is
[1:4479; 2:593964]
The approximate 95% con…dence interval is
[1:443893; 2:592304]
which are very close due to the symmetric nature of the likelihood function.
9.6. CHAPTER 7 325
9.6 Chapter 7
Exercise 7.1.11
If x1; x2; :::; xn is an observed random sample from the Gamma(; ) distribution then the
likelihood function for ; is
L (; ) =
nQ
i=1
f (xi;; )
=
nQ
i=1
x1i
()
exi= for > 0; > 0
= [ ()]n

nQ
i=1
xi
1
exp

t2


for > 0; > 0
where
t2 =
nP
i=1
xi
or more simply
L (; ) = [ ()]n

nQ
i=1
xi

exp

t2


for > 0; > 0
The log likelihood function is
l (; ) = logL (; )
= n log () n log + t1 t2

for > 0; > 0
where
t1 =
nP
i=1
log xi
The score vector is
S (; ) =
h
@l
@
@l
@
i
=
h
n () + t1 n log t22 n
i
where
(z) =
d
dz
log (z)
is the digamma function.
The information matrix is
I (; ) =
24 @2l@2 @2l@@
@2l@@ @
2l
@2
35
=
24 n 0 () n
n

2t2
3
n
2
35
= n
24 0 () 1
1

2x
3

2
35
326 9. SOLUTIONS TO CHAPTER EXERCISES
where
0 (z) =
d
dz
(z)
is the trigamma function.
The expected information matrix is
J (; ) = E [I (; ) ;X1; :::; Xn]
= n
24 0 () 1
1

2E( X;;)
3

2
35
= n
24 0 () 1
1

2
2

2
35
since E

X;;

= .
S (; ) = (0 0) must be solved numerically to …nd the maximum likelihood estimates
of and .
9.6. CHAPTER 7 327
Exercise 7.1.14
The data are
1.58 2.78 2.81 3.29 3.45 3.64 3.81 4.69 4.89 5.37
5.47 5.52 5.87 6.07 6.11 6.12 6.26 6.42 6.74 7.49
7.93 7.99 8.14 8.31 8.72 9.26 10.10 12.82 15.22 17.82
The maximum likelihood estimates of α and β can be found using Newton's Method
[α^(i+1), β^(i+1)] = [α^(i), β^(i)] + S(α^(i), β^(i)) [I(α^(i), β^(i))]⁻¹     for i = 0, 1, ...
Here is R code for Newton’s Method for the Gamma Example.
# function for calculating Gamma score for a and b and data x
GASF<-function(a,b,x)
{t1<-sum(log(x))
t2<-sum(x)
n<-length(x)
S<-c(t1-n*(digamma(a)+log(b)),t2/b^2-n*a/b)
return(S)}
#
# function for calculating Gamma information for a and b and data x
GAIF<-function(a,b,x)
{I<-length(x)*cbind(c(trigamma(a),1/b),c(1/b,2*mean(x)/b^3-a/b^2))
return(I)}
#
# Newton’s Method for Gamma Example
NewtonGA<-function(a,b,x)
{thold<-c(a,b)
thnew<-thold+0.1
while (sum(abs(thold-thnew))>0.0000001)
{thold<-thnew
thnew<-thold+GASF(thold[1],thold[2],x)%*%solve(GAIF(thold[1],thold[2],x))
print(thnew)}
return(thnew)}
thetahat<-NewtonGA(2,2,x)
328 9. SOLUTIONS TO CHAPTER EXERCISES
The maximum likelihood estimates are ^ = 4:118407 and ^ = 1:657032.
The score vector evaluated at (^; ^) is
S(^; ^) =
h
0 1:421085 1014
i
which indicates we have obtained a local extrema.
The observed information matrix is
I(^; ^) =
"
8:239505 18:10466
18:10466 44:99752
#
Note that since det[I(^; ^)] = (8:239505) (44:99752) (18:10466)2 > 0 and
[I(^; ^)]11 = 8:239505 > 0 then by the second derivative test we have found the maximum
likelihood estimates.
9.6. CHAPTER 7 329
Exercise 7.2.3
(a) The following R code generates the required likelihood regions.
# function for calculating the Gamma relative likelihood for parameters a and b,
# maximum likelihood estimates that = c(ahat, bhat), and data x
GARLF<-function(a,b,that,x)
{t<-prod(x)
t2<-sum(x)
n<-length(x)
ah<-that[1]
bh<-that[2]
L<-((gamma(ah)*bh^ah)/(gamma(a)*b^a))^n*t^(a-ah)*exp(t2*(1/bh-1/b))
return(L)}
a<-seq(1,8.5,0.02)
b<-seq(0.2,4.5,0.01)
R<-outer(a,b,FUN = GARLF,thetahat,x)
contour(a,b,R,levels=c(0.01,0.05,0.10,0.50,0.9),xlab="a",ylab="b",lwd=2)
The 1%, 5%, 10%, 50%, and 90% likelihood regions for (α, β) are shown in Figure 9.14.
[Figure 9.14: Likelihood regions for the Gamma model based on 30 observations]
The likelihood contours are not very elliptical in shape. The contours suggest that large
values of α together with small values of β, or small values of α together with large values
of β, are plausible given the observed data.
(b) Since R(3, 2.7) = 0.14 the point (3, 2.7) lies inside a 10% likelihood region so it is a
plausible value of (α, β).
(d) The 1%, 5%, 10%, 50%, and 90% likelihood regions for (α, β) for 100 observations are
shown in Figure 9.15. We note that for a larger number of observations the likelihood
regions are more elliptical in shape.
[Figure 9.15: Likelihood regions for the Gamma model based on 100 observations]
Exercise 7.4.4
The following R code graphs the approximate con…dence regions. The function ConfRegion
was used in Example 7.4.3.
# graph approximate confidence regions
c<-outer(a,b,FUN = ConfRegion,thetahat,Ithetahat)
contour(a,b,c,levels=c(4.61,5.99,9.21),xlab="a",ylab="b",lwd=2)
[Figure 9.16: Approximate confidence regions for the Gamma model based on 30 observations]
These approximate confidence regions, which are ellipses, are very different from the
likelihood regions in Figure 9.14. In particular we note that (α, β) = (3, 2.7) lies inside a
10% likelihood region but outside a 99% approximate confidence region.
There are only 30 observations and these differences suggest the Normal approximation
is not very good. The likelihood regions are a better summary of the uncertainty in the
estimates.
Exercise 7.4.7
Let
[I(α̂, β̂)]⁻¹ = [ v̂11  v̂12 ; v̂12  v̂22 ]
Since
[α̃ − α, β̃ − β] [J(α̃, β̃)]^{1/2} →_D Z ~ BVN([0, 0], [ 1 0 ; 0 1 ])
then for large n, Var(α̃) ≈ v̂11, Var(β̃) ≈ v̂22 and Cov(α̃, β̃) ≈ v̂12. Therefore an approximate
95% confidence interval for α is given by
[α̂ − 1.96 √v̂11, α̂ + 1.96 √v̂11]
and an approximate 95% confidence interval for β is given by
[β̂ − 1.96 √v̂22, β̂ + 1.96 √v̂22]
For the data in Exercise 7.1.14, α̂ = 4.118407 and β̂ = 1.657032 and
[I(α̂, β̂)]⁻¹ = [ 8.239505  18.10466 ; 18.10466  44.99752 ]⁻¹ = [ 1.0469726  −0.4212472 ; −0.4212472  0.1917114 ]
An approximate 95% marginal confidence interval for α is
[4.118407 − 1.96 √1.0469726, 4.118407 + 1.96 √1.0469726] = [2.1129, 6.1239]
An approximate 95% confidence interval for β is
[1.657032 − 1.96 √0.1917114, 1.657032 + 1.96 √0.1917114] = [0.7988, 2.5152]
To obtain an approximate 95% marginal confidence interval for α + β we note that
Var(α̃ + β̃) = Var(α̃) + Var(β̃) + 2Cov(α̃, β̃) ≈ v̂11 + v̂22 + 2v̂12 = v̂
so that an approximate 95% confidence interval for α + β is given by
[α̂ + β̂ − 1.96 √v̂, α̂ + β̂ + 1.96 √v̂]
For the data in Exercise 7.1.14
α̂ + β̂ = 4.118407 + 1.657032 = 5.775439
v̂ = v̂11 + v̂22 + 2v̂12 = 1.0469726 + 0.1917114 + 2(−0.4212472) = 0.3961896
and an approximate 95% marginal confidence interval for α + β is
[5.775439 − 1.96 √0.3961896, 5.775439 + 1.96 √0.3961896] = [4.5417, 7.0091]
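The same intervals can be obtained directly from the observed information matrix; the R sketch below simply applies the ±1.96√v̂ formulas above to the numbers from Exercise 7.1.14.
# approximate 95% confidence intervals for alpha, beta and alpha + beta
# from the observed information matrix of Exercise 7.1.14
Ihat <- matrix(c(8.239505, 18.10466, 18.10466, 44.99752), nrow = 2)
Vhat <- solve(Ihat)                        # approximate variance matrix of the MLEs
thetahat <- c(alpha = 4.118407, beta = 1.657032)
se <- sqrt(diag(Vhat))
rbind(alpha = thetahat[1] + c(-1, 1) * 1.96 * se[1],
      beta  = thetahat[2] + c(-1, 1) * 1.96 * se[2],
      sum   = sum(thetahat) + c(-1, 1) * 1.96 * sqrt(sum(Vhat)))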
9.7 Chapter 8
Exercise 8.1.7
(a) Let X = number of successes in n trials. Then X ~ Binomial(n, θ) and E(X) = nθ.
If the null hypothesis is H0: θ = θ0 and the alternative hypothesis is HA: θ ≠ θ0 then a
suitable test statistic is D = |X − nθ0|.
For n = 100, x = 42, and θ0 = 0.5 the observed value of D is d = |x − nθ0| = |42 − 50| = 8.
The p-value is
P(|X − nθ0| ≥ |x − nθ0|; H0: θ = θ0)
= P(|X − 50| ≥ 8) where X ~ Binomial(100, 0.5)
= P(X ≤ 42) + P(X ≥ 58)
= 0.06660531 + 0.06660531
= 0.1332106
calculated using R. Since p-value > 0.1 there is no evidence based on the data against
H0: θ = 0.5.
(b) If the null hypothesis is H0: θ = θ0 and the alternative hypothesis is HA: θ < θ0 then
a suitable test statistic is D = nθ0 − X.
For n = 100, x = 42, and θ0 = 0.5 the observed value of D is d = nθ0 − x = 50 − 42 = 8.
The p-value is
P(nθ0 − X ≥ nθ0 − x; H0: θ = θ0)
= P(50 − X ≥ 8) where X ~ Binomial(100, 0.5)
= P(X ≤ 42)
= 0.06660531
calculated using R. Since 0.05 < p-value < 0.1 there is weak evidence based on the data
against H0: θ = 0.5.
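Here is the R calculation of the two p-values described above (a one-line sketch for each):
# p-values for Exercise 8.1.7 with n = 100, x = 42, theta0 = 0.5
n <- 100; x <- 42; theta0 <- 0.5
pbinom(42, n, theta0) + (1 - pbinom(57, n, theta0))   # two-sided: P(X <= 42) + P(X >= 58) = 0.1332106
pbinom(42, n, theta0)                                 # one-sided: P(X <= 42) = 0.06660531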
Exercise 8.2.5
The model for these data is (X1, X2, ..., X7) ~ Multinomial(63; θ1, θ2, ..., θ7) and the
hypothesis of interest is H0: θ1 = θ2 = ··· = θ7 = 1/7. Since the model and parameters
are completely specified this is a simple hypothesis. Since Σ_{j=1}^7 θj = 1 there are only k = 6
parameters.
The likelihood ratio test statistic, which can be derived in the same way as Example 8.2.4,
is
Λ(X; Ω0) = 2 Σ_{j=1}^7 Xj log(Xj/Ej)
where Ej = 63/7 = 9 is the expected frequency for outcome j.
For these data the observed value of the likelihood ratio test statistic is
λ(x; Ω0) = 2 Σ_{j=1}^7 xj log(xj/9) = 2[22 log(22/9) + 7 log(7/9) + ··· + 6 log(6/9)] = 23.27396
The approximate p-value is
p-value ≈ P(W ≥ 23.27396) where W ~ χ²(6)
        = 0.0007
calculated using R. Since p-value < 0.001 there is strong evidence based on the data against
the hypothesis that the deaths are equally likely to occur on any day of the week.
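Here is an R sketch of the calculation; the vector obs is a placeholder for the seven observed daily counts (only the values 22, 7 and 6 appear explicitly above), so the function is shown but not evaluated.
# likelihood ratio test of equal daily probabilities; obs is a placeholder vector
# of the 7 observed counts (they must sum to 63 and all be positive)
lrt.multinomial <- function(obs) {
  e <- sum(obs) / length(obs)                 # expected count under H0
  lambda <- 2 * sum(obs * log(obs / e))       # observed likelihood ratio statistic
  c(lambda = lambda, p.value = 1 - pchisq(lambda, df = length(obs) - 1))
}
# lrt.multinomial(obs)   # with the actual counts this gives lambda = 23.27396, p-value = 0.0007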
Exercise 8.3.3
(a) Ω = {(θ1, θ2): θ1 > 0, θ2 > 0} which has dimension k = 2 and
Ω0 = {(θ1, θ2): θ1 = θ2, θ1 > 0, θ2 > 0} which has dimension q = 1, and the hypothesis
H0: θ1 = θ2 is composite.
From Example 6.2.5 the likelihood function for an observed random sample x1, x2, ..., xn
from a Poisson(θ1) distribution is
L1(θ1) = θ1^{n x̄} e^{−nθ1}     for θ1 ≥ 0
with maximum likelihood estimate θ̂1 = x̄.
Similarly the likelihood function for an observed random sample y1, y2, ..., ym from a
Poisson(θ2) distribution is
L2(θ2) = θ2^{m ȳ} e^{−mθ2}     for θ2 ≥ 0
with maximum likelihood estimate θ̂2 = ȳ. Since the samples are independent the likelihood
function for (θ1, θ2) is
L(θ1, θ2) = L1(θ1) L2(θ2)     for θ1 ≥ 0, θ2 ≥ 0
and the log likelihood function is
l(θ1, θ2) = n x̄ log θ1 − nθ1 + m ȳ log θ2 − mθ2     for θ1 > 0, θ2 > 0
The independence of the samples also implies the maximum likelihood estimators are
θ̃1 = X̄ and θ̃2 = Ȳ. Therefore
l(θ̃1, θ̃2; X, Y) = n X̄ log X̄ − n X̄ + m Ȳ log Ȳ − m Ȳ
                = n X̄ log X̄ + m Ȳ log Ȳ − (n X̄ + m Ȳ)
If θ1 = θ2 = θ then the log likelihood function is
l(θ) = (n x̄ + m ȳ) log θ − (n + m)θ     for θ > 0
which is only a function of θ. To determine max_{(θ1,θ2)∈Ω0} l(θ1, θ2; X, Y) we note that
(d/dθ) l(θ) = (n x̄ + m ȳ)/θ − (n + m)
and (d/dθ) l(θ) = 0 for θ = (n x̄ + m ȳ)/(n + m), and therefore
max_{(θ1,θ2)∈Ω0} l(θ1, θ2; X, Y) = (n X̄ + m Ȳ) log[(n X̄ + m Ȳ)/(n + m)] − (n X̄ + m Ȳ)
The likelihood ratio test statistic is
Λ(X, Y; Ω0) = 2 [ l(θ̃1, θ̃2; X, Y) − max_{(θ1,θ2)∈Ω0} l(θ1, θ2; X, Y) ]
            = 2 { n X̄ log X̄ + m Ȳ log Ȳ − (n X̄ + m Ȳ) − (n X̄ + m Ȳ) log[(n X̄ + m Ȳ)/(n + m)] + (n X̄ + m Ȳ) }
            = 2 { n X̄ log X̄ + m Ȳ log Ȳ − (n X̄ + m Ȳ) log[(n X̄ + m Ȳ)/(n + m)] }
with corresponding observed value
λ(x, y; Ω0) = 2 { n x̄ log x̄ + m ȳ log ȳ − (n x̄ + m ȳ) log[(n x̄ + m ȳ)/(n + m)] }
Since k − q = 2 − 1 = 1
p-value ≈ P[W ≥ λ(x, y; Ω0)] where W ~ χ²(1)
        = 2 [1 − P(Z ≤ √λ(x, y; Ω0))] where Z ~ N(0, 1)
(b) For n = 10, Σ_{i=1}^{10} xi = 22, m = 15, Σ_{i=1}^{15} yi = 40 the observed value of the likelihood ratio
test statistic is λ(x, y; Ω0) = 0.5344 and
p-value ≈ P[W ≥ 0.5344] where W ~ χ²(1)
        = 2 [1 − P(Z ≤ √0.5344)] where Z ~ N(0, 1)
        = 0.4648
calculated using R. Since p-value > 0.1 there is no evidence against H0: θ1 = θ2 based on
the data.
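Here is an R sketch that reproduces the observed value and p-value from the sample sizes and sample totals (the function lrt.poisson is introduced only for this illustration):
# likelihood ratio test of H0: theta1 = theta2 for two independent Poisson samples
lrt.poisson <- function(sumx, n, sumy, m) {
  xbar <- sumx / n; ybar <- sumy / m
  pooled <- (sumx + sumy) / (n + m)
  lambda <- 2 * (sumx * log(xbar) + sumy * log(ybar) - (sumx + sumy) * log(pooled))
  c(lambda = lambda, p.value = 1 - pchisq(lambda, df = 1))
}
lrt.poisson(sumx = 22, n = 10, sumy = 40, m = 15)   # lambda = 0.5344, p-value = 0.4648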
10. Solutions to Selected End of Chapter Problems
10.1 Chapter 2
1(a) Starting with
1P
x=0
x =
1
1 for jj < 1
it can be shown that
1P
x=1
xx =

(1 )2 for jj < 1 (10.1)
1P
x=1
x2x1 =
1 +
(1 )3 for jj < 1 (10.2)
1P
x=1
x3x1 =
1 + 4 + 2
(1 )4 for jj < 1 (10.3)
(1)
1
k
=
1P
x=1
xx =

(1 )2 using (10.1) gives k =
(1 )2

and therefore
f(x) = (1 )2 xx1 for x = 1; 2; :::; 0 < < 1
The graph of f(x) in Figure 10.1 is for θ = 0.3.
[Figure 10.1: Graph of f(x)]
(2)
F (x) =
8><>:
0 if x < 1
xP
t=1
(1 )2 tt1 = 1 (1 + x x) x for x = 1; 2; :::
10.1. CHAPTER 2 339
Note that F (x) is speci…ed by indicating its value at each jump point.
(3)
E (X) =
1P
x=1
x (1 )2 xx1 = (1 )2
1P
x=1
x2x1
= (1 )2 (1 + )
(1 )3 using (10.2)
=
(1 + )
(1 )
E

X2

=
1P
x=1
x2 (1 )2 xx1 = (1 )2
1P
x=1
x3x1
= (1 )2 (1 + 4 +
2)
(1 )4 using (10.3)
=
(1 + 4 + 2)
(1 )2
V ar(X) = E(X2) [E (X)]2 = (1 + 4 +
2)
(1 )2
(1 + )2
(1 )2 =
2
(1 )2
(4) Using = 0:3,
P (0:5 < X 2) = P (X = 1) + P (X = 2)
= 0:49 + (0:49)(2)(0:3) = 0:784
P (X > 0:5jX 2) = P (X > 0:5; X 2)
P (X 2)
=
P (X = 1) + P (X = 2)
P (X 2) = 1
1:(b) (1)
1
k
=
1Z
1
1
1 + (x=)2
dx = 2
1Z
0
1
1 + (x=)2
dx because of symmetry
= 2
1Z
0
1
1 + y2
dy let y =
1

x; dy = dx
= 2 lim
b!1
arctan (b) = 2

2

=
Thus k = 1 and
f(x) =
1

h
1 + (x=)2
i for x 2 <; > 0
340 10. SOLUTIONS TO SELECTED END OF CHAPTER PROBLEMS
The graphs for = 0:5, 1 and 2 are plotted in Figure 10.2. The graph for each di¤erent
value of is obtained from the graph for = 1 by simply relabeling the x and y axes. That
is, on the x axis, each point x is relabeled x and on the y axis, each point y is relabeled
y=. The graph of f (x) below is for = 1: Note that the graph of f (x) is symmetric about
the y axis.
[Figure 10.2: Cauchy probability density functions for θ = 0.5, 1, 2]
(2)
F (x) =
xZ
1
1

h
1 + (t=)2
idt = 1


lim
b!1
arctan

t


jxb

=
1

arctan
x


+
1
2
for x 2 <
(3) Consider the integral
1Z
0
x

h
1 + (x=)2
idx = 1Z
0
t
1 + t2
dt = lim
b!1
1
2
ln

1 + b2

= +1
Since the integral
1Z
0
x

h
1 + (x=)2
idx
does not converge, the integral
1Z
1
x

h
1 + (x=)2
idx
10.1. CHAPTER 2 341
does not converge absolutely and E (X) does not exist. Since E (X) does not exist V ar (X)
does not exist.
(4) Using = 1,
P (0:5 < X 2) = F (2) F (0:5)
=
1

[arctan (2) arctan (0:5)]
0:2048
P (X > 0:5jX 2) = P (X > 0:5; X 2)
P (X 2)
=
P (0:5 < X 2)
P (X 2) =
F (2) F (0:5)
F (2)
=
arctan (2) arctan (0:5)
arctan (2) + 2
0:2403
1:(c) (1)
1
k
=
1Z
1
ejxjdx let y = x , then dy = dx
=
1Z
1
ejyjdy = 2
1Z
0
eydy by symmetry
= 2 (1) = 2 (0!) = 2
Thus k = 12 and
f (x) =
1
2
ejxj for x 2 <; 2 <
The graphs for = 1, 0 and 2 are plotted in Figure 10.3. The graph for each di¤erent
value of is obtained from the graph for = 0 by simply shifting the graph for = 0 to
the right units if is positive and shifting the graph for = 0 to the left units if is
negative. Note that the graph of f (x) is symmetric about the line x = .
(2)
F (x) =
8>>><>>>:
xR
1
1
2e
tdt x
R
1
1
2e
tdt+
xR

1
2e
t+dt x >
=
8<:
1
2e
x x
1
2 +
1
2 e
t+jx

= 1 12ex+ x >
342 10. SOLUTIONS TO SELECTED END OF CHAPTER PROBLEMS
[Figure 10.3: Double Exponential probability density functions for θ = −1, 0, 2]
(3) Since the improper integral
1Z

(x ) e(x)dx =
1Z
0
yeydy = (2) = 1! = 1
converges, the integral 12
1R
1
xejxjdx converges absolutely and by the symmetry of f(x)
we have E (X) = .
E

X2

=
1
2
1Z
1
x2ejxjdx let y = x , then dy = dx
=
1
2
1Z
1
(y + )2 ejyjdy =
1
2
1Z
1

y2 + 2y + 2

ejyjdy
=
1Z
0
y2eydy + 0 + 2
1Z
0
eydy using the properties of even/odd functions
= (3) + 2 (1) = 2! + 2
= 2 + 2
Therefore
V ar(X) = E(X2) [E (X)]2 = 2 + 2 2 = 2
(4) Using = 0,
P (0:5 < X 2) = F (2) F (0:5) = 1
2
(e0:5 e2) 0:2356
10.1. CHAPTER 2 343
P (X > 0:5jX 2) = P (X > 0:5; X 2)
P (X 2) =
P (0:5 < X 2)
P (X 2) =
F (2) F (0:5)
F (2)
=
1
2(e
0:5 e2)
1 12e2
0:2527
1:(e) (1)
1
k
=
1Z
0
x2exdx let y = x;
1

dy = dx
=
1
3
1Z
0
y2eydy =
1
3
(3) =
2!
3
=
2
3
Thus k =
3
2 and
f (x) =
1
2
3x2ex for x 0; > 0
The graphs for = 0:5, 1 and 2 are plotted in Figure 10.4. The graph for each di¤erent
value of is obtained from the graph for = 1 by simply relabeling the x and y axes. That
is, on the x axis, each point x is relabeled x= and on the y axis, each point y is relabeled
y.
[Figure 10.4: Gamma probability density functions for θ = 0.5, 1, 2]
(2)
F (x) =
8><>:
0 if x 0
1
2
xR
0
3t2etdt if x > 0
344 10. SOLUTIONS TO SELECTED END OF CHAPTER PROBLEMS
Using integration by parts twice we have
1
2
xZ
0
3t2etdt =
1
2
24 (t)2 etjx0 + 2 xZ
0
tetdt
35
=
1
2
24 (x)2 ex + 2
8<: (t) etjx0 +
xZ
0
etdt
9=;
35
=
1
2
h
(x)2 ex 2 (x) ex 2
n
etjx0
oi
=
1
2
h
(x)2 ex 2 (x) ex 2ex + 2
i
= 1 1
2
ex

2x2 + 2x+ 2

for x > 0
Therefore
F (x) =
8<:0 if x 01 12ex 2x2 + 2x+ 2 if x > 0
(3)
E (X) =
1
2
1Z
0
3x3exdx let y = x;
1

dy = dx
=
1
2
1Z
0
y3eydy =
1
2
(4) =
3!
2
=
3

E

X2

=
1
2
1Z
0
3x4exdx let y = x;
1

dy = dx
=
1
22
1Z
0
y4eydy =
1
22
(5) =
4!
22
=
12
2
and
V ar (X) = E

X2
[E (X)]2 = 12
2


3

2
=
3
2
10.1. CHAPTER 2 345
(4) Using = 1,
P (0:5 < X 2) = F (2) F (0:5)
=

1 1
2
ex

x2 + 2x+ 2
j20:5
=

1 1
2
e2 (4 + 4 + 2)



1 1
2
e0:5

1
4
+ 1 + 2

=
1
2

e0:5

13
4

e2 (10)

0:3089
P (X > 0:5jX 2) = P (X > 0:5; X 2)
P (X 2)
=
P (0:5 < X 2)
P (X 2) =
F (2) F (0:5)
F (2)
=
1
2

e0:5

13
4
e2 (10)
1 12e2 (10)
0:9555
1:(g) (1) Since
f (x) = e
x=


1 + ex=
2 = ex=
e2x=

1 + ex=
2 = ex=


1 + ex=
2 = f (x)
therefore f is an even function which is symmetric about the y axis.
1
k
=
1Z
1
f(x)dx =
1Z
1
ex=
1 + ex=
2dx
= 2
1Z
0
ex=
1 + ex=
2dx by symmetry
= 2

lim
b!1
1
1 + ex=
jb0

= 2

lim
b!1
1
1 + eb=
1
2

= 2

1
2

=
Therefore k = 1=.
(2)
F (x) =
xZ
1
et=
1 + et=
2dt = lima!1 11 + et= jxa
=
1
1 + ex=
lim
a!1
1
1 + ea=
=
1
1 + ex=
for x 2 <
[Figure 10.5: Graph of f(x) for θ = 2]
(3) Since f is a symmetric function about x = 0 then if E (jXj) exists then E (X) = 0.
Now
E (jXj) = 2
1Z
0
xex=
1 + ex=
2dx
Since
1Z
0
x

ex=dx =
1Z
0
yeydy = (2)
converges and
x

ex= xe
x=
1 + ex=
2 for x 0
therefore by the Comparison Test for Integrals the improper integral
E (jXj) = 2
1Z
0
xex=
1 + ex=
2dx
converges and thus E (X) = 0.
10.1. CHAPTER 2 347
By symmetry
E

X2

=
1Z
1
x2ex=
1 + ex=
2dx
= 2
0Z
1
x2ex=
1 + ex=
2dx let y = x=
= 22
0Z
1
y2ey
(1 + ey)2
dy
Using integration by parts with
u = y2; du = 2ydy; dv =
ey
(1 + ey)2
; v =
1
(1 + ey)
we have
0Z
1
y2ey
(1 + ey)2
dy = lim
a!1
y2
(1 + ey)
j0a
0Z
1
2y
1 + ey
dy
= lim
a!1
a2
(1 + ea)
2
0Z
1
y
1 + ey
dy lim
a!1 e
a =1
= lim
a!1
2a
ea 2
0Z
1
y
1 + ey
dy by L’Hospital’s Rule
= lim
a!1
2
ea
2
0Z
1
y
1 + ey
dy by L’Hospital’s Rule
= 0 2
0Z
1
y
1 + ey
dy multiply by
ey
ey
= 2
0Z
1
yey
1 + ey
dy
Let
u = ey; du = eydy; log u = y
to obtain
0Z
1
yey
1 + ey
dy =
1Z
0
log u
1 + u
du =
2
12
348 10. SOLUTIONS TO SELECTED END OF CHAPTER PROBLEMS
This de…nite integral can be found at https://en.wikipedia.org/wiki/List_of_de…nite_integrals.
1Z
0
log u
1 + u
du =
2
12
Therefore
E

X2

= 22

2


2
12

=
22
3
and
V ar (X) = E

X2
[E (X)]2 = 22
3
02
=
22
3
(4)
P (0:5 < X 2) = F (2) F (0:5)
=
1
1 + e2=2
1
1 + e0:5=2
using = 2
=
1
1 + e1
1
1 + e0:25
0:1689
P (X > 0:5jX 2) = P (X > 0:5; X 2)
P (X 2)
=
P (0:5 < X 2)
P (X 2)
=
F (2) F (0:5)
F (2)
=

1
1 + e1
1
1 + e0:25

1 + e1

0:2310
10.1. CHAPTER 2 349
2:(a) Since
f(x; ) = (1 )2 xx1
therefore
f0 (x) = f(x; = 0) = 0
and
f1 (x) = f(x; = 1) = 0
Since
f(x; ) 6= f0 (x ) and f(x; ) 6= 1

f1
x


therefore is neither a location nor scale parameter.
2.(b) Since
\[
f(x;\theta)=\frac{1}{\pi\theta\left[1+(x/\theta)^{2}\right]}\quad\text{for }x\in\Re,\ \theta>0
\]
therefore
\[
f_{1}(x)=f(x;\theta=1)=\frac{1}{\pi\left(1+x^{2}\right)}\quad\text{for }x\in\Re
\]
Since
\[
f(x;\theta)=\frac{1}{\theta}f_{1}\!\left(\frac{x}{\theta}\right)\quad\text{for all }x\in\Re,\ \theta>0
\]
therefore $\theta$ is a scale parameter for this distribution.

2.(c) Since
\[
f(x;\theta)=\frac{1}{2}e^{-|x-\theta|}\quad\text{for }x\in\Re,\ \theta\in\Re
\]
therefore
\[
f_{0}(x)=f(x;\theta=0)=\frac{1}{2}e^{-|x|}\quad\text{for }x\in\Re
\]
Since
\[
f(x;\theta)=f_{0}(x-\theta)\quad\text{for }x\in\Re,\ \theta\in\Re
\]
therefore $\theta$ is a location parameter for this distribution.

2.(e) Since
\[
f(x;\theta)=\frac{1}{2}\theta^{3}x^{2}e^{-\theta x}\quad\text{for }x\ge0,\ \theta>0
\]
therefore
\[
f_{1}(x)=f(x;\theta=1)=\frac{1}{2}x^{2}e^{-x}\quad\text{for }x\ge0
\]
Since
\[
f(x;\theta)=\theta f_{1}(\theta x)\quad\text{for }x\ge0,\ \theta>0
\]
therefore $1/\theta$ is a scale parameter for this distribution.
4: (a) Note that f (x) can be written as
f(x) =
8>><>>:
kec
2=2ec(x) x < c
ke(x)
2=2 c x + c
kec
2=2ec(x) x > + c
Therefore
1
k
= ec
2=2
cZ
1
ec(x)dx+
+cZ
c
e(x)
2=2dx+ ec
2=2
1Z
+c
ec(x)dx
= 2ec
2=2
1Z
c
ecudu+
p
2
cZ
c
1p
2
ez
2=2dz
= 2ec
2=2

lim
b!1

1
c
ecujbc

+
p
2P (jZj c) where Z N (0; 1)
=
2
c
ec
2=2ec
2
+
p
2 [2 (c) 1] where is the N (0; 1) c.d.f.
=
2
c
ec
2=2 +
p
2 [2 (c) 1]
as required.
4: (b) If x < c then
F (x) = kec
2=2
xZ
1
ec(u)du; let y = u
= kec
2=2
xZ
1
ecydy
= kec
2=2

lim
a!1

1
c
ecyjxa

=
k
c
ec
2=2+c(x)
and F ( c) = kc ec
2=2.
If c x + c then
F (x) =
k
c
ec
2=2 + k
p
2
xZ
c
1p
2
e(u)
2=2du let z = u
=
k
c
ec
2=2 + k
p
2
xZ
c
1p
2
ez
2=2dz
=
k
c
ec
2=2 + k
p
2 [ (x ) (c)]
=
k
c
ec
2=2 + k
p
2 [ (x ) + (c) 1] :
If x > + c then
F (x) = 1 kec2=2
1Z
x
ec(u)du let y = u
= 1 kec2=2
1Z
x
ecydy
= 1 kec2=2

lim
b!1

1
c
ecyjbx

= 1 k
c
ec
2=2c(x)
Therefore
F (x) =
8>><>>:
k
c e
c2=2+c(x) x < c
k
c e
c2=2 + k
p
2 [ (x ) + (c) 1] c x + c
1 kc ec
2=2c(x) x > + c
Since is a location parameter (see part (c))
E

Xk

=
1Z
1
xkf (x) dx =
1Z
1
xkf0 (x ) dx let y = u (10.4)
=
1Z
1
(y + )k f0 (y) dy
In particular
E (X) =
1Z
1
(y + ) f0 (y) dy =
1Z
1
yf0 (y) dy +
1Z
1
f0 (y) dy
=
1Z
1
yf0 (y) dy + (1) since f0 (y) is a p.d.f.
=
1Z
1
yf0 (y) dy +
Now
1
k
1Z
1
yf0 (y) dy = e
c2=2
cZ
1
yecydy +
cZ
c
yey
2=2dy + ec
2=2
1Z
c
yecydy
let y = u in the …rst integral
= ec2=2
1Z
c
uecudu+
cZ
c
yey
2=2dy + ec
2=2
1Z
c
yecydy
By integration by parts
1Z
c
yecydy = lim
b!1
241
c
yecyjbc +
1
c
bZ
c
ecydy
35
= lim
b!1

1
c
yecyjbc
1
c2
ecyjbc

=

1 +
1
c2

ec
2
(10.5)
Also since g (y) = yey2=2 is a bounded odd function and [c; c] is a symmetric interval
about 0
+cZ
c
yey
2=2dy = 0
Therefore
1
k
1Z
1
yf0 (y) dy =

1 +
1
c2

ec
2
+ 0 +

1 +
1
c2

ec
2
= 0
and
E (X) =
1Z
1
yf0 (y) dy + = 0 + =
To determine V ar (X) we note that
V ar (X) = E

X2
[E (X)]2
=
1Z
1
(y + )2 f0 (y) dy 2 using (10:4)
=
1Z
1
y2f0 (y) dy + 2
1Z
1
yf0 (y) dy +
2
1Z
1
f0 (y) dy 2
=
1Z
1
y2f0 (y) dy + 2 (0) +
2 (1) 2
=
1Z
1
y2f0 (y) dy
Now
1
k
1Z
1
y2f0 (y) dy = e
c2=2
cZ
1
y2ecydy +
cZ
c
y2ey
2=2dy + ec
2=2
1Z
c
y2ecydy
(let y = u in the …rst integral)
= ec
2=2
1Z
c
u2ecudu+
cZ
c
y2ey
2=2dy + ec
2=2
1Z
c
y2ecydy
= 2ec
2=2
1Z
c
y2ecydy +
cZ
c
y2ey
2=2dy
By integration by parts and using (10.5) we have
1Z
c
y2ecydy = lim
b!1
241
c
y2ecyjbc +
2
c
bZ
c
yecydy
35
= cec
2
+
2
c

1 +
1
c2

ec
2

=

c+
2
c
+
2
c3

ec
2
Also
+cZ
c
y2ey
2=2dy = 2
cZ
0
y2ey
2=2dy
= 2
24yey2=2jc0 +p2 cZ
0
1p
2
ey
2=2dy
35 using integration by parts
= 2
n
cec2=2 +
p
2 [ (c) 0:5]
o
=
p
2 [2 (c) 1] 2cec2=2
Therefore
V ar (X) =
1Z
1
y2f0 (y) dy
= k

2

c+
2
c
+
2
c3

ec
2
+
p
2 [2 (c) 1] 2cec2=2

=
(
1
2
ce
c2=2 +
p
2 [2 (c) 1]
)

2

c+
2
c
+
2
c3

ec
2
+
p
2 [2 (c) 1] 2cec2=2

4: (c) Let
f0 (x) = f (x; = 0)
=
(
kex2=2 if jxj c
kecjxj+c2=2 if jxj > c
Since
f0 (x ) =
(
ke(x)
2=2 if jx j c
kecjxj+c2=2 if jx j > c
= f (x)
therefore is a location parameter for this distribution.
4.(d) On the graph in Figure 10.6 we have graphed f(x) for c = 1, μ = 0 (red), f(x) for c = 2, μ = 0 (blue) and the N(0, 1) probability density function (black).

Figure 10.6: Graphs of f(x) for c = 1, μ = 0 (red), c = 2, μ = 0 (blue) and the N(0, 1) p.d.f. (black).

We note that there is very little difference between the graph of the N(0, 1) probability density function and the graph of f(x) for c = 2, μ = 0; however, as c becomes smaller (c = 1) the "tails" of the probability density function become much "fatter" relative to the N(0, 1) probability density function.
5.(a) Since $X\sim$ Geometric$(p)$,
\[
P(X\ge k)=\sum_{x=k}^{\infty}p(1-p)^{x}=\frac{p(1-p)^{k}}{1-(1-p)}\ \text{ by the Geometric Series}
=(1-p)^{k}\quad\text{for }k=0,1,\ldots\qquad(10.6)
\]
Therefore
\[
P(X\ge k+j\,|\,X\ge k)=\frac{P(X\ge k+j,\ X\ge k)}{P(X\ge k)}=\frac{P(X\ge k+j)}{P(X\ge k)}
=\frac{(1-p)^{k+j}}{(1-p)^{k}}\ \text{ by }(10.6)
\]
\[
=(1-p)^{j}=P(X\ge j)\quad\text{for }j=0,1,\ldots\qquad(10.7)
\]
Suppose we have a large number of items which are to be tested to determine if they are defective or not. Suppose a proportion p of these items are defective. Items are tested one after another until the first defective item is found. If we let the random variable X be the number of good items found before observing the first defective item then X ~ Geometric(p) and (10.6) holds. Now P(X ≥ j) is the probability we find at least j good items before observing the first defective item and P(X ≥ k + j | X ≥ k) is the probability we find at least j more good items before observing the first defective item given that we have already observed at least k good items before observing the first defective item. Since these probabilities are the same for all nonnegative integers by (10.7), this implies that, no matter how many good items we have already observed before observing the first defective item, the probability of finding at least j more good items before observing the first defective item is the same as when we first began testing. It is like we have "forgotten" that we have already observed at least k good items before observing the first defective item. In other words, conditioning on the event that we have already observed at least k good items before observing the first defective item does not affect the probability of observing at least j more good items before observing the first defective item.

5.(b) If $Y\sim$ Exponential$(\theta)$ then
\[
P(Y\ge a)=\int_{a}^{\infty}\frac{1}{\theta}e^{-y/\theta}\,dy=e^{-a/\theta}\quad\text{for }a>0\qquad(10.8)
\]
Therefore
\[
P(Y\ge a+b\,|\,Y\ge a)=\frac{P(Y\ge a+b,\ Y\ge a)}{P(Y\ge a)}=\frac{P(Y\ge a+b)}{P(Y\ge a)}
=\frac{e^{-(a+b)/\theta}}{e^{-a/\theta}}\ \text{ by }(10.8)
=e^{-b/\theta}=P(Y\ge b)\quad\text{for all }a,b>0
\]
as required.
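The two memoryless properties above are easy to see in simulation. The short sketch below is an added illustration (not part of the original solution); the values of p, θ, k, j, a and b are arbitrary choices.

```python
# Simulation check of the memoryless properties (10.7) and its exponential analogue.
import numpy as np

rng = np.random.default_rng(1)
n = 10**6

# Geometric: X = number of good items before the first defective item.
p, k, j = 0.3, 4, 3
X = rng.geometric(p, size=n) - 1            # numpy counts trials, so subtract 1
print(np.mean(X[X >= k] >= k + j),          # P(X >= k+j | X >= k)
      np.mean(X >= j),                      # P(X >= j)
      (1 - p)**j)                           # all approximately 0.343

# Exponential with mean theta
theta, a, b = 2.0, 1.5, 0.7
Y = rng.exponential(theta, size=n)
print(np.mean(Y[Y >= a] >= a + b),          # P(Y >= a+b | Y >= a)
      np.mean(Y >= b),                      # P(Y >= b)
      np.exp(-b/theta))                     # all approximately 0.705
```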
6: Since $f_{1}(x),f_{2}(x),\ldots,f_{k}(x)$ are probability density functions with support sets $A_{1},A_{2},\ldots,A_{k}$, we know that $f_{i}(x)>0$ for all $x\in A_{i}$, $i=1,2,\ldots,k$. Also, since $0<p_{1},p_{2},\ldots,p_{k}\le1$ with $\sum_{i=1}^{k}p_{i}=1$, we have that $g(x)=\sum_{i=1}^{k}p_{i}f_{i}(x)>0$ for all $x\in A=\bigcup_{i=1}^{k}A_{i}$, and $A$ is the support set of $X$. Also
\[
\int_{-\infty}^{\infty}g(x)\,dx=\sum_{i=1}^{k}p_{i}\int_{-\infty}^{\infty}f_{i}(x)\,dx=\sum_{i=1}^{k}p_{i}(1)=\sum_{i=1}^{k}p_{i}=1
\]
Therefore $g(x)$ is a probability density function.
Now the mean of $X$ is given by
\[
E(X)=\int_{-\infty}^{\infty}x\,g(x)\,dx=\sum_{i=1}^{k}p_{i}\int_{-\infty}^{\infty}xf_{i}(x)\,dx=\sum_{i=1}^{k}p_{i}\mu_{i}
\]
As well
\[
E\left(X^{2}\right)=\int_{-\infty}^{\infty}x^{2}g(x)\,dx=\sum_{i=1}^{k}p_{i}\int_{-\infty}^{\infty}x^{2}f_{i}(x)\,dx=\sum_{i=1}^{k}p_{i}\left(\sigma_{i}^{2}+\mu_{i}^{2}\right)
\]
since
\[
\int_{-\infty}^{\infty}x^{2}f_{i}(x)\,dx=\int_{-\infty}^{\infty}(x-\mu_{i})^{2}f_{i}(x)\,dx+2\mu_{i}\int_{-\infty}^{\infty}xf_{i}(x)\,dx-\mu_{i}^{2}\int_{-\infty}^{\infty}f_{i}(x)\,dx
=\sigma_{i}^{2}+2\mu_{i}^{2}-\mu_{i}^{2}=\sigma_{i}^{2}+\mu_{i}^{2}
\]
Thus the variance of $X$ is
\[
Var(X)=\sum_{i=1}^{k}p_{i}\left(\sigma_{i}^{2}+\mu_{i}^{2}\right)-\left(\sum_{i=1}^{k}p_{i}\mu_{i}\right)^{2}
\]
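The mean and variance formulas just derived are convenient to package as a small helper. The sketch below is an added illustration (not part of the original solution; the helper name and the example component parameters are arbitrary).

```python
# Mixture mean and variance: E(X) = sum p_i mu_i,
# Var(X) = sum p_i (sigma_i^2 + mu_i^2) - [E(X)]^2.
import numpy as np

def mixture_mean_var(p, mu, sigma2):
    p, mu, sigma2 = map(np.asarray, (p, mu, sigma2))
    mean = np.sum(p * mu)
    var = np.sum(p * (sigma2 + mu**2)) - mean**2
    return mean, var

# Example: a 0.6/0.4 mixture of N(0, 1) and N(3, 4) components
print(mixture_mean_var([0.6, 0.4], [0.0, 3.0], [1.0, 4.0]))   # mean 1.2, variance 4.36
```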
7:(a) Since X Gamma(; ) the probability density function of X is
f(x) =
x1ex=
()
for x > 0
and 0 otherwise. Let A = fx : f(x) > 0g = fx : x > 0g. Now y = ex = h(x) is a one-to-one
function on A and h maps the set A to the set B = fy : y > 1g. Also
x = h1 (y) = log y and
d
dy
h1 (y) =
1
y
The probability density function of Y is
g (y) = f

h1 (y)
ddyh1 (y)

=
(log y)1 e log y=
()

1
y

=
(log y)1 y1=1
()
for y 2 B
and 0 otherwise.
7:(b) Since X Gamma(; ) the probability density function of X is
f(x) =
x1ex=
()
for x > 0
and 0 otherwise. Let A = fx : f(x) > 0g = fx : x > 0g. Now y = 1=x = h(x) is a one-to-one
function on A and h maps the set A to the set B = fy : y > 0g. Also
x = h1 (y) =
1
y
and
d
dy
h1 (y) =
1
y2
The probability density function of Y is
g (y) = f

h1 (y)
ddyh1 (y)

=
y1e1=(y)
()
for y 2 B
and 0 otherwise. This is the probability density function of an Inverse Gamma(; ) random
variable. Therefore Y = X1 Inverse Gamma(; ).
7.(c) Since $X\sim$ Gamma$(k,\beta)$ the probability density function of $X$ is
\[
f(x)=\frac{x^{k-1}e^{-x/\beta}}{\Gamma(k)\beta^{k}}\quad\text{for }x>0
\]
and 0 otherwise. Let $A=\{x:f(x)>0\}=\{x:x>0\}$. Now $y=2x/\beta=h(x)$ is a one-to-one function on $A$ and $h$ maps the set $A$ to the set $B=\{y:y>0\}$. Also
\[
x=h^{-1}(y)=\frac{\beta y}{2}\quad\text{and}\quad\frac{d}{dy}h^{-1}(y)=\frac{\beta}{2}
\]
The probability density function of $Y$ is
\[
g(y)=f\!\left(h^{-1}(y)\right)\left|\frac{d}{dy}h^{-1}(y)\right|
=\frac{y^{k-1}e^{-y/2}}{\Gamma(k)\,2^{k}}\quad\text{for }y\in B
\]
and 0 otherwise, which for $k=1,2,\ldots$ is the probability density function of a $\chi^{2}(2k)$ random variable. Therefore $Y=2X/\beta\sim\chi^{2}(2k)$.
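A Monte Carlo check of this result is straightforward. The sketch below is an added illustration (k = 4 and β = 2.5 are arbitrary choices): it draws from Gamma(k, β), transforms by y = 2x/β and compares the sample with the χ²(2k) distribution via a Kolmogorov–Smirnov test.

```python
# Monte Carlo check that 2X/beta ~ chi-squared(2k) when X ~ Gamma(k, beta).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
k, beta = 4, 2.5
X = rng.gamma(shape=k, scale=beta, size=10**5)
Y = 2 * X / beta

print(stats.kstest(Y, stats.chi2(df=2*k).cdf))   # large p-value expected
print(Y.mean(), 2*k)                             # chi-squared(2k) has mean 2k
```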
7:(d) Since X N; 2 the probability density function of X is
f(x) =
1p
2
e
1
22
(x)2 for x 2 <
Let A = fx : f(x) > 0g = <. Now y = ex = h(x) is a one-to-one function on A and h maps
the set A to the set B = fy : y > 0g. Also
x = h1 (y) = log y and
d
dy
h1 (y) =
1
y
The probability density function of Y is
g (y) = f

h1 (y)
ddyh1 (y)

=
1
y
p
2
e
1
22
(log y)2 y 2 B
Note this distribution is called the Lognormal distribution.
7:(e) Since X N; 2 the probability density function of X is
f(x) =
1p
2
e
1
22
(x)2 for x 2 <
Let A = fx : f(x) > 0g = <. Now y = x1 = h(x) is a one-to-one function on A and h
maps the set A to the set B = fy : y 6= 0; y 2 x = h1 (y) =
1
y
and
d
dy
h1 (y) =
1
y2
The probability density function of Y is
g (y) = f

h1 (y)
ddyh1 (y)

=
1p
2y2
e
1
22
h
1
y


i2
for y 2 B
7:(f) Since X Uniform2 ; 2 the probability density function of X is
f (x) =
1

for

2
< x <

2
and 0 otherwise. Let A = fx : f(x) > 0g = x : 2 < x < 2 . Now y = tan(x) = h(x)
is a one-to-one function on A and h maps A to the set B = fy : 1 < y <1g. Also
x = h1 (y) = arctan(y) and ddyh
1 (y) = 1
1+y2
. The probability density function of Y is
g (y) = f

h1 (y)
ddyh1 (y)

=
1

1
1 + y2
for y 2 <
and 0 otherwise. This is the probability density function of a Cauchy(1; 0) random variable.
Therefore Y = tan(X) Cauchy(1; 0).
7:(g) Since X Pareto(; ) the probability density function of X is
f (x) =

x+1
for x ; ; > 0
and 0 otherwise. Let A = fx : f(x) > 0g = fx : x > 0g. Now y = log (x=) = h(x)
is a one-to-one function on A and h maps A to the set B = fy : 0 y <1g. Also x =
h1 (y) = ey= and ddyh
1 (y) = e
y=. The probability density function of Y is
g (y) = f

h1 (y)
ddyh1 (y)

=


ey=

ey=
+1 = ey for y 2 B
and 0 otherwise. This is the probability density function of a Exponential(1) random
variable. Therefore Y = log (X=) Exponential(1).
7:(h) If X Weibull(2; ) the probability density function of X is
f (x) =
2xe(x=)
2
2
for x > 0; > 0
and 0 otherwise. Let A = fx : f(x) > 0g = fx : x > 0g. Now y = x2 = h(x) is a one-to-one
function on A and h maps the set A to the set B = fy : y > 0g. Also
x = h1 (y) = y1=2 and
d
dy
h1 (y) =
1
2y1=2
The probability density function of Y is
g (y) = f

h1 (y)
ddyh1 (y)

=
ey=
2
2
for y 2 B
and 0 otherwise for k = 1; 2; : : : which is the probability density function of a Exponential

2

random variable. Therefore Y = X2 Exponential(2)
7:(i) Since X Double Exponential(0; 1) the probability density function of X is
f(x) =
1
2
ejxj for x 2 <
The cumulative distribution function of Y = X2 is
G (y) = P (Y y) = P X2 y = P (py X py)
=
Z py
py
1
2
ejxjdx
=
Z py
0
exdx by symmetry for y > 0
By the First Fundamental Theorem of Calculus and the chain rule the probability density
function of Y is
g (y) =
d
dy
G (y) = e
p
y d
dy
p
y
=
1
2
p
y
e
p
y for y > 0
and 0 otherwise.
7:(j) Since X t(k) the probability density function of X is
f(x) =


k+1
2



k
2
1p
k

1 +
x2
k
( k+12 )
for x 2 <
which is an even function. The cumulative distribution function of Y = X2 is
G (y) = P (Y y) = P X2 y = P (py X py)
=
p
yZ
py


k+1
2



k
2
1p
k

1 +
x2
k
( k+12 )
dx
= 2
p
yZ
0


k+1
2



k
2
1p
k

1 +
x2
k
( k+12 )
dx by symmetry for y > 0
By the First Fundamental Theorem of Calculus and the chain rule the probability density
function of Y is
g (y) =
d
dy
G (y)
= 2


k+1
2



k
2
1p
k

1 +
y
k
( k+12 ) d
dy
p
y for y > 0
= 2


k+1
2



k
2
p

1p
k

1 +
y
k
( k+12 ) 1
2
p
y
for y > 0
=


k+1
2



k
2



1
2
1
k
1=2
y
1
2
1

1 +
1
k
y
( k+12 )
for y > 0 since

1
2

=
p

and 0 otherwise. This is the probability density function of a F(1; k) random variable.
Therefore Y = X2 F(1; k).
8:(a) Let
cn =


n+1
2



n
2
1p
n
If T t (n) then T has probability density function
f (t) = cn

1 +
t2
n
(n+1)=2
for t 2 <; n = 1; 2; :::
Since f (t) = f (t), f is an even function whose graph is symmetric about the y axis.
Therefore if E (jT j) exists then due to symmetry E (T ) = 0. To determine when E (jT j)
exists, again due to symmetry, we only need to determine for what values of n the integral
1Z
0
t

1 +
t2
n
(n+1)=2
dt
converges.
There are two cases to consider: n = 1 and n > 1.
For n = 1 we have
1Z
0
t

1 + t2
1
dt = lim
b!1
1
2
ln

1 + t2
jb0 = lim
b!1
1
2
ln

1 + b2

=1
and therefore E (T ) does not exist.
For n > 1
1Z
0
t

1 +
t2
n
(n+1)=2
dt = lim
b!1
n
n 1

1 +
t2
n
(n1)=2
jb0
=
n
n 1
"
1 lim
b!1

1 +
b2
n
(n1)=2#
=
n
n 1
and the integral converges. Therefore E (T ) = 0 for n > 1.
8:(b) To determine whether V ar(T ) = E

T 2

(since E (T ) = 0) exists we need to determine
for what values of n the integral
1Z
0
t2

1 +
t2
n
(n+1)=2
dt
converges.
Now
1Z
0
t2

1 +
t2
n
(n+1)=2
dt
=
p
nZ
0
t2

1 +
t2
n
(n+1)=2
dt+
1Z
p
n
t2

1 +
t2
n
(n+1)=2
dt
The …rst integral is …nite since it is the integral of a …nite function over the …nite interval
[0;
p
n]. We will show that the second integral
1Z
p
n
t2

1 +
t2
n
(n+1)=2
dt
diverges for n = 1; 2.
Now 1Z
p
n
t2

1 +
t2
n
(n+1)=2
dt let y = t=
p
n
= n3=2
1Z
1
y2

1 + y2
(n+1)=2
dy (10.9)
For n = 1
y2
(1 + y2)
y
2
(y2 + y2)
=
1
2
for y 1
and since 1Z
1
1
2
dy
diverges, therefore by the Comparison Test for Improper Integrals, (10.9) diverges for n = 1.
(Note: For n = 1 we could also argue that V ar(T ) does not exist since E (T ) does not exist
for n = 1.)
For n = 2,
y2
(1 + y2)3=2
y
2
(y2 + y2)3=2
=
1
23=2y
for y 1
and since
1
23=2
1Z
1
1
y
dy
diverges, therefore by the Comparison Test for Improper Integrals, (10.9) diverges for n = 2.
Now for n > 2,
E

T 2

=
1Z
1
cnt
2

1 +
t2
n
(n+1)=2
dt
= 2cn
1Z
0
t2

1 +
t2
n
(n+1)=2
dt
since the integrand is an even function. Integrate by parts using
u = t, dv = t

1 +
t2
n
(n+1)=2
dt
du = dt, v =
n
n 1

1 +
t2
n
(n1)=2
Then
E

T 2

= 2cn
"
lim
b!1
t
n
n 1

1 +
t2
n
(n+1)=2
jb0
#
+2cn

n
n 1
1Z
0

1 +
t2
n
(n1)=2
dt
= 2cn

n
n 1

lim
b!1
b
1 + b
2
n
(n+1)=2 + cn nn 1
1Z
1

1 +
t2
n
(n1)=2
dt
where we use symmetry on the second integral.
Now
lim
b!1
b
1 + b
2
n
(n+1)=2 = limb!1 1n+1
n

b

1 + b
2
n
(n1)=2 = 0
by L’Hopital’s Rule. Also
cn

n
n 1
1Z
1

1 +
t2
n
(n1)=2
dt let
yp
n 2 =
tp
n
=
cn
cn2

n
n 1

n
n 2
1=2 1Z
1
cn2

1 +
y2
n 2
(n2+1)=2
dy
=
cn
cn2

n
n 1

n
n 2
1=2
where the integral equals one since the integrand is the p.d.f. of a t (n 2) random variable.
Finally
cn
cn2

n
n 1

n
n 2
1=2
=


n+1
2



n
2
p
n


n2
2
p
(n 2)


n2+1
2
n
n 1

n
n 2
1=2
=


n+1
2



n+1
2 1
n2 1


n
2
n 2
n
1=2 n
n 1

n
n 2
1=2
=

n+1
2 1



n+1
2 1



n+1
2 1
n2 1
n
2 1



n
2 1
n
n 1

=
(n 1)
2
1
(n 2) =2

n
n 1

=
n
n 2
Therefore for n > 2
V ar(T ) = E

T 2

=
n
n 2
9: (a) To …nd E

Xk

we …rst note that since
f (x) =
(a+ b)
(a) (b)
xa1 (1 x)b1 for 0 < x < 1
and 0 otherwise then
1Z
0
(a+ b)
(a) (b)
xa1 (1 x)b1 dx = 1
or
1Z
0
xa1 (1 x)b1 dx = (a) (b)
(a+ b)
(10.10)
for a > 0, b > 0. Therefore
E

Xk

=
(a+ b)
(a) (b)
1Z
0
xkxa1 (1 x)b1 dx
=
(a+ b)
(a) (b)
1Z
0
xa+k1 (1 x)b1 dx
=
(a+ b)
(a) (b)
(a+ k) (b)
(a+ b+ k)
by (10:10)
=
(a+ k)
(a)
(a+ b)
(a+ b+ k)
for k = 1; 2; : : :
For k = 1 we have
E

Xk

= E (X) =
(a+ 1)
(a)
(a+ b)
(a+ b+ 1)
=
a (a)
(a)
(a+ b)
(a+ b) (a+ b)
=
a
a+ b
For k = 2 we have
E

X2

=
(a+ 2)
(a)
(a+ b)
(a+ b+ 2)
=
(a+ 1) (a) (a)
(a)
(a+ b)
(a+ b+ 1) (a+ b) (a+ b)
=
a (a+ 1)
(a+ b) (a+ b+ 1)
Therefore
V ar (X) = E

X2
[E (X)]2
=
a (a+ 1)
(a+ b) (a+ b+ 1)


a
a+ b
2
=
a (a+ 1) (a+ b) a2 (a+ b+ 1)
(a+ b)2 (a+ b+ 1)
=
a

a2 + ab+ a+ b a2 + ab+ a
(a+ b)2 (a+ b+ 1)
=
ab
(a+ b)2 (a+ b+ 1)
9: (b)
Figure 10.7: Graphs of Beta probability density functions for (a, b) = (1, 3), (0.7, 0.7), (3, 1), (2, 4) and (2, 2)
9: (c) If a = b = 1 then
f (x) = 1 for 0 < x < 1
and 0 otherwise. This is the Uniform(0; 1) probability density function.
10: We will prove this result assuming X is a continuous random variable. The proof for
X a discrete random variable follows in a similar manner with integrals replaced by sums.
Suppose X has probability density function f (x) and E

jXjk

exists for some integer
k > 1. Then the improper integral
1Z
1
jxjk f (x) dx
converges. Let A = fx : jxj 1g. Then
1Z
1
jxjk f (x) dx =
Z
A
jxjk f (x) dx+
Z
A
jxjk f (x) dx
Since
0 jxjk f (x) f (x) for x 2 A
we have
0
Z
A
jxjk f (x) dx
Z
A
f (x) dx = P

X 2 A 1 (10.11)
Convergence of
1R
1
jxjk f (x) dx and (10.11) imply the convergence of R
A
jxjk f (x) dx.
Now
1Z
1
jxjj f (x) dx =
Z
A
jxjj f (x) dx+
Z
A
jxjj f (x) dx for j = 1; 2; :::; k 1 (10.12)
and
0
Z
A
jxjj f (x) dx 1
by the same argument as in (10.11). Since
R
A
jxjk f (x) dx converges and
jxjk f (x) jxjj f (x) for x 2 A, j = 1; 2; :::; k 1
then by the Comparison Theorem for Improper Integrals
R
A
jxjj f (x) dx converges. Since
both integrals on the right side of (10.12) exist, therefore
E

jXjj

=
1Z
1
jxjj f (x) dx exists for j = 1; 2; :::; k 1
11: If $X\sim$ Binomial$(n,\theta)$ then
\[
E(X)=n\theta\quad\text{and}\quad\sqrt{Var(X)}=\sqrt{n\theta(1-\theta)}
\]
Let $W=X/n$. Then
\[
E(W)=\theta\quad\text{and}\quad Var(W)=\frac{\theta(1-\theta)}{n}
\]
From the result in Section 2.9, if we wish to find a transformation $Y=g(W)=g(X/n)$ such that $Var(Y)\approx$ constant, then we need $g$ such that
\[
\frac{dg}{d\theta}=\frac{k}{\sqrt{\theta(1-\theta)}}
\]
where $k$ is chosen for convenience. We need to solve the separable differential equation
\[
\int dg=k\int\frac{1}{\sqrt{\theta(1-\theta)}}\,d\theta\qquad(10.13)
\]
Since
\[
\frac{d}{dx}\arcsin\!\left(\sqrt{x}\right)=\frac{1}{\sqrt{1-\left(\sqrt{x}\right)^{2}}}\,\frac{d}{dx}\!\left(\sqrt{x}\right)
=\frac{1}{\sqrt{1-x}}\cdot\frac{1}{2\sqrt{x}}=\frac{1}{2\sqrt{x(1-x)}}
\]
the solution to (10.13) is
\[
g(\theta)=k\cdot2\arcsin\!\left(\sqrt{\theta}\right)+C
\]
Letting $k=1/2$ and $C=0$ we have $g(\theta)=\arcsin(\sqrt{\theta})$.
Therefore if $X\sim$ Binomial$(n,\theta)$ and $Y=\arcsin\!\left(\sqrt{X/n}\right)$ then
\[
Var(Y)=Var\!\left[\arcsin\!\left(\sqrt{X/n}\right)\right]\approx\text{constant}
\]
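A small simulation makes the effect of this transformation visible: Var(X/n) changes substantially with θ, while Var[arcsin(√(X/n))] stays close to 1/(4n). The sketch below is an added illustration (n = 100 and the θ grid are arbitrary choices).

```python
# Variance-stabilizing transformation for the binomial proportion.
import numpy as np

rng = np.random.default_rng(2)
n, reps = 100, 200000
for theta in [0.1, 0.3, 0.5, 0.7, 0.9]:
    X = rng.binomial(n, theta, size=reps)
    W = X / n
    Y = np.arcsin(np.sqrt(W))
    print(f"theta={theta:.1f}  Var(W)={W.var():.5f}  "
          f"Var(Y)={Y.var():.5f}  1/(4n)={1/(4*n):.5f}")
```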
13:(b)
M(t) = E(etx) =
1P
x=0
etx
xe
x!
= e
1P
x=0

et
x
x!
= eee
t
by the Exponential Series
= e(e
t1) for t 2 <
M 0 (t) = e(e
t1)et
E(X) = M 0(0) =
M 00(t) = e(e
t1)(et)2 + e(e
t1)et
E(X2) = M 00(0) = 2 +
V ar(X) = E(X2) [E(X)]2 = 2 + 2 =
13:(c)
M(t) = E(etx) =
1Z

etx
1

e(x)=dx
=
e=

1Z

e
x

1

t

dx which converges for

1

t

> 0 or t <
1

Let
y =

1

t

x; dy =

1

t

dx
to obtain
M(t) =
e=

1Z

e
x

1

t

dx =
e=

1Z

e
x

1

t

dx
=
e=


1
t
1Z


1

t
e
ydy =
e=
(1 t) limb!1

eyjb


1

t

=
e=
(1 t) limb!1

e


1

t

eb

=
e=
(1 t)

e


1

t

=
et
(1 t) for t <
1

M 0(t) =
et
(1 t) +
et
(1 t)2
=
et
(1 t)2 [ (1 t) + ]
E(X) = M 0(0) = +
M 00(t) =
et
(1 t)2 () +

et
(1 t)2 +
2et
(1 t)3

[ (1 t) + ]
E(X2) = M 00(0) = + ( + 2) ( + )
= + 2 + 3 + 22 = 2 + 2 + 22
= ( + )2 + 2
V ar(X) = E(X2) [E(X)]2
= ( + )2 + 2 ( + )2
= 2
13:(d)
M(t) = E(etx) =
1
2
1Z
1
etxejxjdx
=
1
2
24 Z
1
etxexdx+
1Z

etxe(x)dx
35
=
1
2
24e Z
1
ex(t+1)dx+ e
1Z

ex(1t)dx
35
=
1
2
"
e

e(t+1)
t+ 1
!
+ e

e(1t)
1 t
!#
for t+ 1 > 0 and 1 t > 0
=
1
2

et
t+ 1
+
et
1 t

for t 2 (1; 1)
=
et
1 t2 for t 2 (1; 1)
M 0(t) =
et
1 t2 +
et (2t)
(1 t2)2
=
et
(1 t2)2 [

1 t2 + 2t]
E(X) = M 0(0) =
M 00(t) =
et
(1 t2)2 [2t + 2] +

et
(1 t2)2 +
et(4t)
(1 t2)4

(1 t2) + 2t
E(X2) = M 00(0) = 2 + 2
V ar(X) = E(X2) [E(X)]2
= 2 + 2 2
= 2
13:(e)
M (t) = E

etX

=
1Z
0
2xetxxdx
Since Z
xetxdx =
1
t

x 1
t

etx + C
M (t) =
2
t

x 1
t

etxj10
=
2
t

1 1
t

et 2
t

1
t

=
2

(t 1) et + 1
t2
; if t 6= 0
For t = 0, M (0) = E (1) = 1. Therefore
M (t) =
8<:1 if t = 02[(t1)et+1]
t2
if t 6= 0
Note that
lim
t!0
M (t) = lim
t!0
2

(t 1) et + 1
t2
indeterminate of the form

0
0

= lim
t!0
2

et + (t 1) et
2t
by Hospital’s Rule
= lim
t!0
et + (t 1) et
t
indeterminate of the form

0
0

= lim
t!0

et + et + (t 1) et by Hospital’s Rule
= 1 + 1 1 = 1
= M (0)
Therefore M (t) exists and is continuous for all t 2 <.
Using the Exponential series we have for t 6= 0
2

(t 1) et + 1
t2
=
2
t2

(t 1)
1P
i=0
ti
i!
+ 1

=
2
t2
1P
i=0
ti+1
i!

1P
i=0
ti
i!
+ 1

=
2
t2

t+
1P
i=1
ti+1
i!


1 + t+
1P
i=2
ti
i!

+ 1

=
2
t2
1P
i=1
ti+1
i!

1P
i=2
ti
i!

=
2
t2
1P
i=1
ti+1
i!

1P
i=1
ti+1
(i+ 1)!

=
2
t2
1P
i=1

1
i!
1
(i+ 1)!

ti+1
and since
2
t2
1P
i=1

1
i!
1
(i+ 1)!

ti+1jt=0 = 1
therefore M (t) has a Maclaurin series representation for all t 2 < given by
2
t2
1P
i=1

1
i!
1
(i+ 1)!

ti+1
= 2
1P
i=1

1
i!
1
(i+ 1)!

ti1
=
1P
i=0
2

1
(i+ 1)!
1
(i+ 2)!

ti
Since E

Xk

= k! the coe¢ cient of tk in the Maclaurin series for M (t) we have
E (X) = (1!) (2)

1
(1 + 1)!
1
(1 + 2)!

= 2

1
2
1
6

=
2
3
and
E

X2

= (2!) (2)

1
(2 + 1)!
1
(2 + 2)!

= 4

1
6
1
24

=
1
2
Therefore
V ar (X) = E

X2
[E (X)]2 = 1
2


2
3
2
=
1
18
Alternatively we could …nd E (X) = M 0 (0) using the limit de…nition of the derivative
M 0 (0) = lim
t!0
M (t)M (0)
t
= lim
t!0
2[(t1)et+1]
t2
1
t
= lim
t!0
2

(t 1) et + 1 t2
t3
=
2
3
using L’Hospital’s Rule
Similarly E

X2

= M 00 (0) could be found using
M 00 (0) = lim
t!0
M 0 (t)M 0 (0)
t
where
M 0 (t) =
d
dt

2

(t 1) et + 1
t2
!
for t 6= 0.
13:(f)
M (t) = E

etX

=
1Z
0
etxxdx+
2Z
1
etx (2 x) dx
=
1Z
0
xetxdx+ 2
2Z
1
etxdx
2Z
1
xetxdx:
Since Z
xetxdx =
1
t

x 1
t

etx + C
M (t) =
1
t

x 1
t

etxj10+
2
t

etxj21
1
t

x 1
t

etxj21
=
1
t

1 1
t

et 1
t

1
t

+
2
t

e2t et 1
t

2 1
t

e2t

1 1
t

et

= e2t

2
t
+
1
t

1
t
2

+ et

1
t

1 1
t

2
t
+
1
t

1 1
t

+
1
t2
=
e2t 2et + 1
t2
for t 6= 0
For t = 0, M (0) = E (1) = 1. Therefore
M (t) =
(
1 if t = 0
e2t2et+1
t2
if t 6= 0
Note that
lim
t!0
M (t) = lim
t!0
e2t 2et + 1
t2
, indeterminate of the form

0
0

, use Hospital’s Rule
= lim
t!0
2e2t 2et
2t
, indeterminate of the form

0
0

, use Hospital’s Rule
= lim
t!0
2e2t et
1
= 2 1 = 1
and therefore M (t) exists and is continuous for t 2 <.
Using the Exponential series we have for t 6= 0
e2t 2et + 1
t2
=
1
t2
f1 + 2t+ (2t)
2
2!
+
(2t)3
3!
+
(2t)4
4!
+
2

1 + t+
t2
2!
+
t3
3!
+
t4
4!
+

+ 1g
= 1 + t+
7
12
t2 + (10.14)
and since
1 + t+
7
12
t2 +

jt=0 = 1
(10.14) is the Maclaurin series representation for M (t) for t 2 <.
Since E

Xk

= k! the coe¢ cient of tk in the Maclaurin series for M (t) we have
E (X) = 1! 1 = 1 and E X2 = 2! 7
12
=
7
6
Therefore
V ar (X) = E

X2
[E (X)]2 = 7
6
1 = 1
6
Alternatively we could …nd E (X) = M 0 (0) using the limit de…nition of the derivative
M 0 (0) = lim
t!0
M (t)M (0)
t
= lim
t!0
e2t2et+1
t2
1
t
= lim
t!0
e2t 2et + 1 t2
t3
= lim
t!0

t2 + t3 + 712 t
4 + t2
t3
= lim
t!0

1 +
7
12
t+

= 1
Similarly E

X2

= M 00 (0) could be found using
M 00 (0) = lim
t!0
M 0 (t)M 0 (0)
t
where
M 00 (t) =
d
dt

e2t 2et + 1
t2

=
2

t

e2t et e2t + 2et 1
t3
for t 6= 0
14.(a)
\[
K(t)=\log M(t)
\]
\[
K'(t)=\frac{M'(t)}{M(t)}\qquad
K'(0)=\frac{M'(0)}{M(0)}=\frac{E(X)}{1}=E(X)\quad\text{since }M(0)=1
\]
\[
K''(t)=\frac{M(t)M''(t)-\left[M'(t)\right]^{2}}{\left[M(t)\right]^{2}}\qquad
K''(0)=\frac{M(0)M''(0)-\left[M'(0)\right]^{2}}{\left[M(0)\right]^{2}}=E\left(X^{2}\right)-\left[E(X)\right]^{2}=Var(X)
\]

14.(b) If $X\sim$ Negative Binomial$(k,p)$ then
\[
M(t)=\left(\frac{p}{1-qe^{t}}\right)^{k}\quad\text{for }t<-\log q,\ q=1-p
\]
Therefore
\[
K(t)=\log M(t)=k\log\!\left(\frac{p}{1-qe^{t}}\right)=k\log p-k\log\!\left(1-qe^{t}\right)\quad\text{for }t<-\log q
\]
\[
K'(t)=-k\,\frac{-qe^{t}}{1-qe^{t}}=\frac{kqe^{t}}{1-qe^{t}}\qquad
E(X)=K'(0)=\frac{kq}{1-q}=\frac{kq}{p}
\]
\[
K''(t)=kq\left[\frac{\left(1-qe^{t}\right)e^{t}-e^{t}\left(-qe^{t}\right)}{\left(1-qe^{t}\right)^{2}}\right]\qquad
Var(X)=K''(0)=kq\left[\frac{1-q+q}{(1-q)^{2}}\right]=\frac{kq}{p^{2}}
\]
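These two cumulant results can be checked numerically by differentiating K(t) = log M(t) at t = 0 with finite differences. The sketch below is an added illustration (k = 5 and p = 0.4 are arbitrary choices).

```python
# Finite-difference check of K'(0) = kq/p and K''(0) = kq/p^2 for the
# Negative Binomial(k, p) cumulant generating function.
import numpy as np

k, p = 5, 0.4
q = 1 - p
K = lambda t: k*np.log(p) - k*np.log(1 - q*np.exp(t))   # valid for t < -log(q)

h = 1e-5
K1 = (K(h) - K(-h)) / (2*h)             # central first difference ~ E(X)
K2 = (K(h) - 2*K(0.0) + K(-h)) / h**2   # central second difference ~ Var(X)
print(K1, k*q/p)      # ~7.5
print(K2, k*q/p**2)   # ~18.75
```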
15:(b)
M(t) =
1 + t
1 t
= (1 + t)
1P
k=0
tk for jtj < 1 by the Geometric series
=
1P
k=0
tk +
1P
k=0
tk+1
=

1 + t+ t2 + :::

t+ t2 + t3 + :::

= 1 +
1P
k=1
2tk for jtj < 1 (10.15)
Since
M(t) =
1P
k=0
M (k)(0)
k!
tk =
1P
k=0
E(Xk)
k!
tk (10.16)
then by matching coe¢ cients in the two series (10.15) and (10.16) we have
E(Xk)
k!
= 2 for k = 1; 2; :::
or
E(Xk) = 2k! for k = 1; 2; :::
15:(c)
M(t) =
et
1 t2
=
1P
i=0
ti
i!
1P
k=0
t2k

for jtj < 1 by the Geometric series
=

1 +
t
1!
+
t2
2!
+
t3
3!
+ :::

1 + t2 + t4 + :::

for jtj < 1
= 1 +

1
1!

t+

1 +
1
2!

t2 +

1
1!
+
1
3!

t3
+

1 +
1
2!
+
1
4!

t4 +

1
1!
+
1
3!
+
1
5!

t5 + ::: for jtj < 1
Since
M(t) =
1P
k=0
M (k)(0)
k!
tk =
1P
k=0
E(Xk)
k!
tk
then by matching coe¢ cients in the two series we have
E

X2k

= (2k)!
kP
i=0
1
(2i)!
for k = 1; 2; :::
E

X2k+1

= (2k + 1)!
kP
i=0
1
(2i+ 1)!
for k = 1; 2; :::
16: (a)
MY (t) = E

etY

= E

etjZj

=
1Z
1
etjzj
1p
2
ez
2=2dz
=
2p
2
1Z
0
etzez
2=2dz since etjzjez
2=2 is an even function
=
2p
2
1Z
0
e(z
22zt)=2dz =
2et
2=2
p
2
1Z
0
e(z
22ztt2)=2dz
= 2et
2=2
1Z
0
1p
2
e(zt)
2=2dz let y = (z t) ; dy = dz
= 2et
2=2
tZ
1
1p
2
ey
2=2dy
= 2et
2=2 (t) for t 2 <
where is the N(0; 1) cumulative distribution function.
16: (b) To …nd E (Y ) = E (jZj) we …rst note that
(0) =
1
2
d
dt
(t) = (t) =
1p
2
et
2=2
and
(0) =
1p
2
Then
d
dt
MY (t) =
d
dt
h
2et
2=2 (t)
i
= 2tet
2=2 (t) + 2et
2=2 (t)
Therefore
E (Y ) = E (jZj)
=
d
dt
MY (t) jt=0
=
h
2tet
2=2 (t) + 2et
2=2 (t)
i
jt=0
= 0 +
2p
2
=
r
2

To …nd V ar (Y ) = V ar (jZj) we note that
d
dt
(t) = 0 (t)
=
d
dt

1p
2
et
2=2

=
tet2=2p
2
and
0 (0) = 0
Therefore
d2
dt2
MY (t) =
d
dt

d
dt
MY (t)

=
d
dt
h
2tet
2=2 (t) + 2et
2=2 (t)
i
= 2
d
dt

et
2=2

[t (t) + (t)]
= 2
n
et
2=2

t (t) + (t) + 0 (t)

+

tet
2=2

[t (t) + (t)]
o
and
E

Y 2

=
d2
dt2
MY (t) jt=0
= 2

(1)

0 + (0) + 0 (0)

+ (0) [(0) (0) + (0)]

= 2 (0)
= 2

1
2

= 1
Therefore
V ar (Y ) = V ar (jZj)
= E

Y 2
[E (Y )]2
= 1
r
2

!2
= 1 2

=
2

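The values E(|Z|) = √(2/π) and Var(|Z|) = 1 − 2/π obtained in Problem 16 are easy to confirm by simulation, as in the following added sketch (not part of the original solution).

```python
# Monte Carlo confirmation of E|Z| and Var|Z| for Z ~ N(0, 1).
import numpy as np

rng = np.random.default_rng(3)
Y = np.abs(rng.standard_normal(10**6))
print(Y.mean(), np.sqrt(2/np.pi))   # both approximately 0.7979
print(Y.var(), 1 - 2/np.pi)         # both approximately 0.3634
```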
18: Since
MX(t) = E(e
tx)
=
1P
j=0
ejtpj for jtj < h; h > 0
then
MX(log s) =
1P
j=0
ej log spj
=
1P
j=0
sjpj for j log sj < h; h > 0
which is a power series in s. Similarly
MY (log s) =
1P
j=0
sjqj for j log sj < h; h > 0
which is also a power series in s.
We are given that MX(t) = MY (t) for jtj < h; h > 0. Therefore
MX(log s) = MY (log s) for j log sj < h; h > 0
and 1P
j=0
sjpj =
1P
j=0
sjqj for j log sj < h; h > 0
Since two power series are equal if and only if their coe¢ cients are all equal we have pj = qj ,
j = 0; 1; ::: and therefore X and Y have the same distribution.
19: (a) Since X has moment generating function
M (t) =
et
1 t2 for jtj < 1
the moment generating function of Y = (X 1) =2 is
MY (t) = E

etY

= E
h
et(X1)=2
i
= et=2E

e(
t
2)X

= et=2M

t
2

= et=2
et=2
1 t22 ;= for
t2
< 1
=
1
1 14 t2
for jtj < 2
19: (b)
M 0(t) = (1)

1 1
4
t2
21
2
t

=
1
2
t

1 1
4
t2
2
E(X) = M 0(0) = 0
M 00(t) =
1
2
"
1 1
4
t2
2
+ t (2)

1 1
4
t2
31
2
t
#
E(X2) = M 00(0) =
1
2
V ar(X) = E(X2) [E(X)]2 = 1
2
0 = 1
2
19: (c) Since the moment generating function of a Double Exponential(; ) random variable
is
M (t) =
et
1 2t2 for jtj <
1

and the moment generating function of Y is
MY (t) =
1
1 14 t2
for jtj < 4
therefore by the Uniqueness Theorem for Moment Generating Functions Y has a Double
Exponential

0; 12

distribution.
10.2 Chapter 3
1:(a)
1
k
=
1P
y=0
1P
x=0
q2px+y = q2
1P
y=0
py
1P
x=0
px

= q2
1P
y=0
py

1
1 p

by the Geometric Series since 0 < p < 1
= q
1P
y=0
py since q = 1 p
= q

1
1 p

by the Geometric Series
= 1
Therefore k = 1.
1:(b) The marginal probability function of X is
f1 (x) = P (X = x) =
P
y
f (x; y) =
1P
y=0
q2px+y = q2px

1P
y=0
py
!
= q2px

1
1 p

by the Geometric Series
= qpx for x = 0; 1; :::
By symmetry marginal probability function of Y is
f2 (y) = qp
y for y = 0; 1; :::
The support set of (X;Y ) is A = f(x; y) : x = 0; 1; :::; y = 0; 1; :::g. Since
f (x; y) = f1 (x) f2 (y) for (x; y) 2 A
therefore X and Y are independent random variables.
1:(c)
P (X = xjX + Y = t) = P (X = x;X + Y = t)
P (X + Y = t)
=
P (X = x; Y = t x)
P (X + Y = t)
Now
P (X + Y = t) =
P
(x;y):
P
x+y=t
q2px+y = q2
tP
x=0
px+(tx) = q2pt
tP
x=0
1
= q2pt (t+ 1) for t = 0; 1; :::
Therefore
P (X = xjX + Y = t) = q
2px+(tx)
q2pt (t+ 1)
=
1
t+ 1
for x = 0; 1; :::; t
2:(a)
f(x; y) =
e2
x! (y x)! for x = 0; 1; :::; y; y = 0; 1; :::
OR
f(x; y) =
e2
x! (y x)! for y = x; x+ 1; :::; x = 0; 1; :::
f1(x) =
P
y
f(x; y) =
1P
y=x
e2
x! (y x)!
=
e2
x!
1P
y=x
1
(y x)! let k = y x
=
e2
x!
1P
k=0
1
k!
=
e2e1
x!
=
e1
x!
x = 0; 1; ::: by the Exponential Series
Note that X Poisson(1).
f2(y) =
P
x
f(x; y) =
yP
x=0
e2
x! (y x)!
=
e2
y!
yP
x=0
y!
x!(y x)!
=
e2
y!
yP
x=0

y
x

1x
=
e2
y!
(1 + 1)y by the Binomial Series
=
2ye2
y!
for y = 0; 1; :::
Note that Y Poisson(2).
2:(b) Since (for example)
f(1; 2) = e2 6= f1(1)f2(2) = e1 2
2e2
2!
= 2e3
therefore X and Y are not independent random variables.
OR
The support set ofX is A1 = fx : x = 0; 1; :::g, the support set of Y is A2 = fy : y = 0; 1; :::g
and the support set of (X;Y ) is A = f(x; y) : x = 0; 1; :::; y; y = 0; 1; :::g. Since A 6= A1A2
therefore by the Factorization Theorem for Independence X and Y are not independent
random variables.
3:(a) The support set of (X;Y )
A =

(x; y) : 0 < y < 1 x2; 1 < x < 1
=
n
(x; y) :
p
1 y < x <
p
1 y; 0 < y < 1
o
is pictured in Figure 10.8.
Figure 10.8: Support set for Problem 3 (a)
1 =
1Z
1
1Z
1
f(x; y)dxdy = k
Z
(x;y)
Z
2A

x2 + y

dydx = k
1Z
x=1
1x2Z
y=0

x2 + y

dydx
= k
1Z
1

x2y +
1
2
y2j1x20

dx = k
1Z
1

x2

1 x2+ 1
2

1 x22 dx
= k
1Z
0
h
2x2

1 x2+ 1 x22i dx by symmetry
= k
1Z
0

1 x4 dx = k x 1
5
x5j10

=
4
5
k and thus k =
5
4
Therefore
f (x; y) =
5
4

x2 + y

for (x; y) 2 A
3:(b) The marginal probability density function of X is
f1(x) =
1Z
1
f(x; y)dy
=
5
4
1x2Z
0

x2 + y

dy
=
5
8

1 x4 for 1 < x < 1
and 0 otherwise. The support set of X is A1 = fx : 1 < x < 1g.
The marginal probability density function of Y is
f2 (y) =
1Z
1
f(x; y)dx
=
5
4
p
1yZ
p1y

x2 + y

dx
=
5
2
p
1yZ
0

x2 + y

dx because of symmetry
=
5
2

1
3
x3 + yxj
p
1y
0

=
5
2

1
3
(1 y)3=2 + y (1 y)1=2

=
5
6
(1 y)1=2 [(1 y) + 3y]
=
5
6
(1 y)1=2 (1 + 2y) for 0 < y < 1
and 0 otherwise. The support set of Y is A2 = fy : 0 < y < 1g.
3:(c) The support set A of (X;Y ) is not rectangular. To show that X and Y are not
independent random variables we only need to …nd x 2 A1, and y 2 A2 such that (x; y) =2 A.
Let x = 34 and y =
1
2 . Since
f

3
4
;
1
2

= 0 6= f1

3
4

f2

1
2

> 0
therefore X and Y are not independent random variables.
3:(d) The region of integration is pictured in Figure 10.9.
Figure 10.9: Region of integration for Problem 3 (d)
P (Y X + 1) =
Z
(x;y)
Z
2B
5
4

x2 + y

dydx = 1
Z
(x;y)
Z
2C
5
4

x2 + y

dydx
= 1
0Z
x=1
1x2Z
y=x+1
5
4

x2 + y

dydx
= 1 5
4
0Z
x=1

x2y +
1
2
y2j1x2x+1

dx
= 1 5
8
0Z
1
n
[2x2(1 x2) + 1 x22] [2x2 (x+ 1) + (x+ 1)2]o dx
= 1 5
8
0Z
1
x4 2x3 3x2 2x dx = 1 + 5
8

1
5
x5 +
1
2
x4 + x3 + x2j01

= 1 5
8

1
5
(1) + 1
2
+ (1) + 1

= 1 5
8
2 + 5
10

= 1 3
16
=
13
16
4:(a) The support set of (X;Y )
A =

(x; y) : x2 < y < 1; 1 < x < 1
= f(x; y) : py < x < py; 0 < y < 1g
is pictured in Figure 10.10.
Figure 10.10: Support set for Problem 4 (a)
1 =
1Z
1
1Z
1
f(x; y)dxdy = k
Z
(x;y)
Z
2A
x2ydydx
= k
1Z
x=1
1Z
y=x2
x2ydydx = k
1Z
x=1
x2

1
2
y2j1x2

dx
=
k
2
1Z
x=1
x2

1 x4 dx = k
2

1
3
x3 1
7
x7j11

=
k
2

1
3
1
7
1
3
(1) + 1
7
(1)

= k

1
3
1
7

=
4k
21
Therefore k = 21=4 and
f (x; y) =
21
4
x2y for (x; y) 2 A
and 0 otherwise.
4:(b) The marginal probability density function of X is
f1 (x) =
21
4
1Z
x2
x2ydy
=
21x2
8
h
y2j1x2
i
=
21x2
8

1 x4 for 1 < x < 1
and 0 otherwise. The support set of X is A1 = fx : 1 < x < 1g.
The marginal probability density function of Y
f2 (y) =
21
4
p
yZ
x=py
x2ydx
=
21
2
Z py
0
x2ydx because of symmetry
=
7
2
y
h
x3j
p
y
0
i
=
7
2
y

y3=2

=
7
2
y5=2 for 0 < y < 1
and 0 otherwise. The support set of Y is is A2 = fy : 0 < y < 1g.
The support set A of (X;Y ) is not rectangular. To show that X and Y are not inde-
pendent random variables we only need to …nd x 2 A1, and y 2 A2 such that (x; y) =2 A.
Let x = 12 and y =
1
10 . Since
f

1
2
;
1
10

= 0 6= f1

1
2

f2

1
10

> 0
therefore X and Y are not independent random variables.
4:(c) The region of integration is pictured in Figure 10.11.
Figure 10.11: Region of integration for Problem 4 (c)
P (X Y ) =
Z
(x;y)
Z
2B
21
4
x2ydydx
=
1Z
x=0
xZ
y=x2
21
4
x2ydydx
=
21
8
1Z
0
x2

y2jxx2

dx
=
21
8
1Z
0
x2

x2 x4 dx
=
21
8
1Z
0

x4 x6 dx
=
21
8

1
5
x5 1
7
x7j10

=
21
8

7 5
35

=
3
20
4:(d) The conditional probability density function of X given Y = y is
f1 (xjy) = f (x; y)
f2 (y)
=
21
4 x
2y
7
2y
5=2
=
3
2
x2y3=2 for py < x < py; 0 < y < 1
and 0 otherwise. Check:
1Z
1
f1 (xjy) dx = 1
2
y3=2
p
yZ
py
3x2dx
= y3=2
h
x3j
p
y
0
i
= y3=2y3=2
= 1
The conditional probability density function of Y given X = x is
f2 (yjx) = f (x; y)
f1 (x)
=
21
4 x
2y
21x2
8 (1 x4)
=
2y
(1 x4) for x
2 < y < 1; 1 < x < 1
and 0 otherwise. Check:
1Z
1
f2 (yjx) dy = 1
(1 x4)
1Z
x2
2ydy
=
1
(1 x4)

y2j1x2

=
1 x4
1 x4
= 1
6:(d) (i) The support set of (X;Y )
A = f(x; y) : 0 < x < y < 1g
is pictured in Figure 10.12.
Figure 10.12: Graph of support set for Problem 6 (d) (i)
1 =
1Z
1
1Z
1
f (x; y) dxdy = k
Z
(x;y)
Z
2A
(x+ y) dydx
=
1Z
y=0
yZ
x=0
k (x+ y) dxdy = k
1Z
y=0

1
2
x2 + xy

jyx=0

dy
= k
1Z
0

1
2
y2 + y2

dx
= k
1Z
0
3
2
y2dy =
k
2

y3j10

=
k
2
Therefore k = 2 and
f (x; y) = 2 (x+ y) for (x; y) 2 A
and 0 otherwise.
6:(d) (ii) The marginal probability density function of X is
f1 (x) =
1Z
1
f (x; y) dy =
1Z
y=x
2 (x+ y) dy
=

2xy + y2
j1y=x
= (2x+ 1) 2x2 + x2
= 1 + 2x 3x2 for 0 < x < 1
and 0 otherwise. The support set of X is A1 = fx : 0 < x < 1g.
The marginal probability density function of Y is
f2 (y) =
1Z
1
f (x; y) dx =
yZ
x=0
2 (x+ y) dx
=

x2 + 2yx
jyx=0
= y2 + 2y2 0
= 3y2 for 0 < y < 1
and 0 otherwise. The support set of Y is A2 = fy : 0 < y < 1g.
6:(d) (iii) The conditional probability density function of X given Y = y is
f1 (xjy) = f (x; y)
f2 (y)
=
2 (x+ y)
3y2
for 0 < x < y < 1
Check: 1Z
1
f1 (xjy) dx =
yZ
x=0
2 (x+ y)
3y2
dx =
1
3y2
yZ
x=0
2 (x+ y) dx =
3y2
3y2
= 1
The conditional probability density function of Y given X = x is
f2 (yjx) = f (x; y)
f1 (x)
=
2 (x+ y)
1 + 2x 3x2 for 0 < x < y < 1
and 0 otherwise. Check:
1Z
1
f2 (yjx) dy =
1Z
y=x
2 (x+ y)
1 + 2x 3x2dy =
1
1 + 2x 3x2
1Z
y=x
2 (x+ y) dx =
1 + 2x 3x2
1 + 2x 3x2 = 1
6:(d) (iv)
E (Xjy) =
1Z
1
xf1 (xjy) dx
=
yZ
x=0
x

2 (x+ y)
3y2

dx
=
1
3y2
yZ
x=0
2

x2 + yx

dx
=
1
3y2

2
3
x3 + yx2jyx=0

=
1
3y2

2
3
y3 + y3

=
1
3y2

5
3
y3

=
5
9
y for 0 < y < 1
E (Y jx) =
1Z
1
yf2 (yjx) dy
=
1Z
y=x
y

2 (x+ y)
1 + 2x 3x2

dy
=
1
1 + 2x 3x2
24 1Z
y=x
2

xy + y2

dy
35
=
1
1 + 2x 3x2

xy2 +
2
3
y3j1y=x

=
1
1 + 2x 3x2

x+
2
3



x3 +
2
3
x3

=
2
3 + x 53x3
1 + 2x 3x2
=
2 + 3x 5x3
3 (1 + 2x 3x2) for 0 < x < 1
6:(f) (i) The support set of (X;Y )
A = f(x; y) : 0 < y < 1 x; 0 < x < 1g
= f(x; y) : 0 < x < 1 y; 0 < y < 1g
is pictured in Figure 10.13.
Figure 10.13: Support set for Problem 6 (f) (i)
1 =
1Z
1
1Z
1
f (x; y) dxdx = k
Z
(x;y)
Z
2A
x2ydydx
= k
1Z
x=0
1xZ
y=0
x2ydydx = k
1Z
0
x2

1
2
y2j1x0

dx =
k
2
1Z
0
x2 (1 x)2 dx
=
k
2
1Z
0

x2 2x3 + x4 dx = k
2

1
3
x3 1
2
x4 +
1
5
x5j10

=
k
2

1
3
1
2
+
1
5

=
k
2

10 15 + 6
30

=
k
60
Therefore k = 60 and
f (x; y) = 60x2y for (x; y) 2 A
and 0 otherwise.
6:(f) (ii) The marginal probability density function of X is
f1 (x) =
1Z
1
f (x; y) dy
= 60x2
1xZ
0
ydy
= 30x2

y2j1x0

= 30x2 (1 x)2 for 0 < x < 1
and 0 otherwise. The support of X is A1 = fx : 0 < x < 1g. Note that X Beta(3; 3).
The marginal probability density function. of Y is
f2 (y) =
1Z
1
f (x; y) dx
= 60y
1yZ
0
x2dx
= 20y
h
x3j1y0
i
= 20y (1 y)3 for 0 < y < 1
and 0 otherwise. The support of Y is A2 = fy : 0 < y < 1g. Note that Y Beta(2; 4).
6:(f) (iii) The conditional probability density function of X given Y = y is
f1 (xjy) = f (x; y)
f2 (y)
=
60x2y
20y (1 y)3
=
3x2
(1 y)3 for 0 < x < 1 y; 0 < y < 1
Check:
1Z
1
f1 (xjy) dx = 1
(1 y)3
1yZ
0
3x2dx =
1
(1 y)3
h
x3j1y0
i
=
(1 y)3
(1 y)3 = 1
The conditional probability density function of Y given X = x is
f2 (yjx) = f (x; y)
f1 (x)
=
60x2y
30x2 (1 x)2
=
2y
(1 x)2 for 0 < y < 1 x; 0 < x < 1
and 0 otherwise. Check:
1Z
1
f2 (yjx) dy = 1
(1 x)2
1xZ
0
2ydy =
1
(1 x)2

y2j1x0

=
(1 x)2
(1 x)2 = 1
6:(f) (iv)
E (Xjy) =
1Z
1
xf1 (xjy) dx
= 3 (1 y)3
1yZ
0
x

x2

dx
= 3 (1 y)3

1
4
x4j1y0

=
3
4
(1 y)3 (1 y)4
=
3
4
(1 y) for 0 < y < 1
E (Y jx) =
1Z
1
yf2 (yjx) dy
= 2 (1 x)2
1xZ
0
y (y) dy
= 2 (1 x)2

1
3
y3j1x0

=
2
3
(1 x)2 (1 x)3
=
2
3
(1 x) for 0 < x < 1
6:(g) (i) The support set of (X;Y )
A = f(x; y) : 0 < y < xg
is pictured in Figure 10.14.
Figure 10.14: Support set for Problem 6 (g) (i)
1 =
1Z
1
1Z
1
f (x; y) dxdx = k
Z
(x;y)
Z
2A
ex2ydydx
= k
1Z
y=0
1Z
x=y
ex2ydxdy = k
1Z
0
e2y lim
b!1
h
exjby
i
dy
= k
1Z
0
e2y

ey lim
b!1
eb

dy = k
1Z
0
e3ydy
=
k
3
1Z
0
3e3ydy
But 3e3y is the probability density function of a Exponential

1
3

random variable and
therefore the integral is equal to 1. Therefore 1 = k=3 or k = 3.
Therefore
f (x; y) = 3ex2y for (x; y) 2 A
and 0 otherwise.
6:(g) (ii) The marginal probability density function of X is
f1 (x) =
1Z
1
f (x; y) dy =
xZ
0
3ex2ydy
= 3ex
1
2
e2yjx0
=
3
2
ex

1 e2x for x > 0
and 0 otherwise. The support of X is A1 = fx : x > 0g.
The marginal probability density function of Y is
f2 (y) =
1Z
1
f (x; y) dx
=
1Z
y
3ex2ydx
= 3e2y

lim
b!1
exjby

= 3e2y

ey lim
b!1
eb

= 3e3y for y > 0
and 0 otherwise. The support of Y is A2 = fy : y > 0g. Note that Y Exponential

1
3

.
6:(g) (iii) The conditional probability density function of X given Y = y is
f1 (xjy) = f (x; y)
f2 (y)
=
3ex2y
3e3y
= e(xy) for x > y > 0
Note that XjY = y Two Parameter Exponential(y; 1).
The conditional probability density function of Y given X = x is
f2 (yjx) = f (x; y)
f1 (x)
=
3ex2y
3
2e
x (1 e2x)
=
2e2y
1 e2x for 0 < y < x
and 0 otherwise. Check:
1Z
1
f2 (yjx) dy =
xZ
0
2e2y
1 e2xdy =
1
1 e2x
e2yjx0 = 1 e2x1 e2x = 1
6:(g) (iv)
E (Xjy) =
1Z
1
xf1 (xjy) dx
=
1Z
y
xe(xy)dx
= ey lim
b!1
h
(x+ 1) exjby
i
= ey

(y + 1) ey lim
b!1
b+ 1
eb

= y + 1 for y > 0
E (Y jx) =
1Z
1
yf2 (yjx) dy
=
xZ
0
2ye2y
1 e2xdy
=
1
1 e2x



y +
1
2

e2yjx0

=
1
1 e2x

1
2


x+
1
2

e2x

=
1 (2x+ 1) e2x
2 (1 e2x) for x > 0
7:(a) Since X Uniform(0; 1)
f1 (x) = 1 for 0 < x < 1
The joint probability density function of X and Y is
f (x; y) = f2 (yjx) f1 (x)
=
1
1 x (1)
=
1
1 x for 0 < x < y < 1
and 0 otherwise.
7:(b) The marginal probability density function of Y is
f2 (y) =
1Z
1
f (x; y) dx
=
yZ
x=0
1
1 xdx
= log (1 x) jy0
= log (1 y) for 0 < y < 1
and 0 otherwise.
7:(c) The conditional probability density function of X given Y = y is
f1 (xjy) = f (x; y)
f2 (y)
=
1
1x
log (1 y)
=
1
(x 1) log (1 y) for 0 < x < y < 1
and 0 otherwise.
9: Since Y j Binomial(n; ) then
E (Y j) = n and V ar (Y j) = n (1 )
Since Beta(a; b) then
E () =
a
a+ b
, V ar () =
ab
(a+ b+ 1) (a+ b)2
and
E

2

= V ar () + [E ()]2
=
ab
(a+ b+ 1) (a+ b)2
+

a
a+ b
2
=
ab+ a2 (a+ b+ 1)
(a+ b+ 1) (a+ b)2
=
a [b+ a(a+ b) + a]
(a+ b+ 1) (a+ b)2
=
a (a+ b) (a+ 1)
(a+ b+ 1) (a+ b)2
=
a (a+ 1)
(a+ b+ 1) (a+ b)
Therefore
E (Y ) = E [E (Y j)] = E (n) = nE () = n

a
a+ b

and
V ar (Y ) = E [var (Y j)] + V ar [E (Y j)]
= E [n (1 )] + V ar (n)
= n

E () E 2+ n2V ar ()
= n

a
a+ b
a (a+ 1)
(a+ b+ 1) (a+ b)

+
n2ab
(a+ b+ 1) (a+ b)2
= na

(a+ b+ 1) (a+ 1)
(a+ b+ 1) (a+ b)

+
n2ab
(a+ b+ 1) (a+ b)2
= na

b (a+ b)
(a+ b+ 1) (a+ b)2

+
n2ab
(a+ b+ 1) (a+ b)2
=
nab (a+ b+ n)
(a+ b+ 1) (a+ b)2
10.(a) Since $Y|\theta\sim$ Poisson$(\theta)$, $E(Y|\theta)=\theta$, and since $\theta\sim$ Gamma$(\alpha,\beta)$, $E(\theta)=\alpha\beta$.
\[
E(Y)=E\left[E(Y|\theta)\right]=E(\theta)=\alpha\beta
\]
Since $Y|\theta\sim$ Poisson$(\theta)$, $Var(Y|\theta)=\theta$, and since $\theta\sim$ Gamma$(\alpha,\beta)$, $Var(\theta)=\alpha\beta^{2}$.
\[
Var(Y)=E\left[Var(Y|\theta)\right]+Var\left[E(Y|\theta)\right]=E(\theta)+Var(\theta)=\alpha\beta+\alpha\beta^{2}
\]

10.(b) Since $Y|\theta\sim$ Poisson$(\theta)$ and $\theta\sim$ Gamma$(\alpha,\beta)$ we have
\[
f_{2}(y|\theta)=\frac{\theta^{y}e^{-\theta}}{y!}\quad\text{for }y=0,1,\ldots;\ \theta>0
\]
and
\[
f_{1}(\theta)=\frac{\theta^{\alpha-1}e^{-\theta/\beta}}{\Gamma(\alpha)\beta^{\alpha}}\quad\text{for }\theta>0
\]
and by the Product Rule
\[
f(\theta,y)=f_{2}(y|\theta)f_{1}(\theta)=\frac{\theta^{y}e^{-\theta}}{y!}\cdot\frac{\theta^{\alpha-1}e^{-\theta/\beta}}{\Gamma(\alpha)\beta^{\alpha}}\quad\text{for }y=0,1,\ldots;\ \theta>0
\]
The marginal probability function of $Y$ is
\[
f_{2}(y)=\int_{-\infty}^{\infty}f(\theta,y)\,d\theta
=\frac{1}{y!\,\Gamma(\alpha)\beta^{\alpha}}\int_{0}^{\infty}\theta^{y+\alpha-1}e^{-\theta(1+1/\beta)}\,d\theta
\qquad\text{let }x=\left(1+\frac{1}{\beta}\right)\theta=\left(\frac{\beta+1}{\beta}\right)\theta
\]
\[
=\frac{1}{y!\,\Gamma(\alpha)\beta^{\alpha}}\left(\frac{\beta}{\beta+1}\right)^{y+\alpha}\int_{0}^{\infty}x^{y+\alpha-1}e^{-x}\,dx
=\frac{\beta^{y}}{(1+\beta)^{y+\alpha}}\cdot\frac{\Gamma(y+\alpha)}{y!\,\Gamma(\alpha)}
\]
\[
=\frac{\beta^{y}}{(1+\beta)^{y+\alpha}}\cdot\frac{(y+\alpha-1)(y+\alpha-2)\cdots(\alpha)\Gamma(\alpha)}{y!\,\Gamma(\alpha)}
=\binom{y+\alpha-1}{y}\left(\frac{1}{1+\beta}\right)^{\alpha}\left(1-\frac{1}{1+\beta}\right)^{y}\quad\text{for }y=0,1,\ldots
\]
If $\alpha$ is a nonnegative integer then we recognize this as the probability function of a Negative Binomial$\left(\alpha,\frac{1}{1+\beta}\right)$ random variable.
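The Gamma mixture of Poissons can be checked by simulation. The sketch below is an added illustration (α = 3 and β = 2 are arbitrary, with α an integer so the Negative Binomial comparison applies); scipy's nbinom.pmf(y, α, p) equals C(y+α−1, y)p^α(1−p)^y, which is the form derived above with p = 1/(1+β).

```python
# Simulation check: theta ~ Gamma(alpha, beta), Y | theta ~ Poisson(theta)
# implies Y ~ Negative Binomial(alpha, 1/(1+beta)).
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
alpha, beta = 3, 2.0
theta = rng.gamma(shape=alpha, scale=beta, size=10**6)
Y = rng.poisson(theta)

p = 1/(1 + beta)
for y in range(5):
    print(y, np.mean(Y == y), stats.nbinom.pmf(y, alpha, p))
```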
11: (a) First note that
@j+k
@tj1@t
k
2
et1x+t2y = xjyket1x+t2y
Therefore
@j+k
@tj1@t
k
2
M (t1; t2) =
1Z
1
1Z
1
@j+k
@tj1@t
k
2
et1x+t2yf (x; y) dxdx
(assuming the operations of integration and di¤erentiation can be interchanged)
=
1Z
1
1Z
1
xjyket1x+t2yf (x; y) dxdx
= E

XjY ket1X+t2Y

and
@j+k
@tj1@t
k
2
M (t1; t2) j(t1;t2)=(0;0) = E

XjY k

as required. Note that this proof is for the case of (X;Y ) continuous random variables.
The proof for (X;Y ) discrete random variables follows in a similar manner with integrals
replaced by summations.
11: (b) Suppose that X and Y are independent random variables then
M (t1; t2) = E

et1X+t2Y

= E

et1Xet2Y

= E

et1X

E

et2Y

= MX (t1)MY (t2)
Suppose that M (t1; t2) = MX (t1)MY (t2) for all jt1j < h; jt2j < h for some h > 0. Then
1Z
1
1Z
1
et1x+t2yf (x; y) dxdx =
1Z
1
et1xf1 (x) dx
1Z
1
et2Y f2 (y) dy
=
1Z
1
1Z
1
et1x+t2yf1 (x) f2 (y) dxdx
But by the Uniqueness Theorem for Moment Generating Functions this can only hold
if f (x; y) = f1 (x) f2 (y) for all (x; y) and therefore X and Y are independent random
variables.
Thus we have shown that X and Y are independent random variables if and only if
MX (t1)MY (t2) = M (t1; t2).
Note that this proof is for the case of (X;Y ) continuous random variables. The proof for
(X;Y ) discrete random variables follows in a similar manner with integrals replaced by
summations.
11: (c) If (X;Y ) Multinomial(n; p1; p2; p3) then
M (t1; t2) =

p1e
t1 + p2e
t2 + p3
n
for t1 2 <; t2 2 <
By 11 (a)
E (XY ) =
@2
@t1@t2
M (t1; t2) j(t1;t2)=(0;0)
=
@2
@t1@t2

p1e
t1 + p2e
t2 + p3
n j(t1;t2)=(0;0)
= n (n 1) p1et1p2et2

p1e
t1 + p2e
t2 + p3
n2 j(t1;t2)=(0;0)
= n (n 1) p1p2
Also
MX (t) = M (t; 0)
=

p1e
t + 1 p1
n
for t 2 <
and
E (X) =
d
dt
MX (t) jt=0
= np1

p1e
t + 1 p1
n1 jt=0
= np1
Similarly
E (Y ) = np2
Therefore
Cov (X;Y ) = E (XY ) E (X)E (Y )
= n (n 1) p1p2 np1np2
= np1p2
13:(a) Note that since is a symmetric matrix T = so that 1 = I (2 2 identity
matrix) and T1 = I. Also tT = tT since tT is a scalar and xtT = txT since
(x ) tT is a scalar.
[x (+ t)] 1 [x (+ t)]T 2tT ttT
= [(x ) t] 1 [(x ) t]T 2tT ttT
= [(x ) t] 1
h
(x )T tT
i
2tT ttT
= (x ) 1 (x )T (x ) 1tT t1 (x )T + t1tT 2tT ttT
= (x ) 1 (x )T (x ) tT t (x )T + ttT 2tT ttT
= (x ) 1 (x )T xtT + tT txT + tT 2tT
= (x ) 1 (x )T xtT + tT xtT + tT 2tT
= (x ) 1 (x )T 2xtT
= (x ) 1 (x )T 2xtT
as required.
Now
M(t1; t2) = E

et1X1+t2X2

= E

exp

XtT

=
1Z
1
1Z
1
1
2jj1=2 exp(xt
T ) exp

1
2
(x ) 1 (x )T

dx1dx2
=
1Z
1
1Z
1
1
2jj1=2 exp

1
2
h
(x ) 1 (x )T 2xtT
i
dx1dx2
=
1Z
1
1Z
1
1
2jj1=2 exp

1
2

[x (+ t)] 1 [x (+ t)]T 2tT ttT

dx1dx2
= exp

tT +
1
2
ttT
1Z
1
1Z
1
1
2jj1=2 exp

1
2
[x (+ t)] 1 [x (+ t)]T

dx1dx2
= exp

tT +
1
2
ttT

for all t = (t1; t2) 2 <2
since
1
2jj1=2 exp

1
2
[x (+ t)] 1 [x (+ t)]T

is a BVN(+ t;) probability density function and therefore the integral is equal to one.
13:(b) Since
MX1(t) = M(t; 0) = exp

1t+
1
2
t221

for t 2 <
which is the moment generating function of a N

1;
2
1

random variable, then by the
Uniqueness Theorem for Moment Generating Functions X1 N

1;
2
1

. By a similar
argument X2 N

2;
2
2

.
13:(c) Since
@2
@t1@t2
M(t1; t2) =
@2
@t1@t2
exp

T t+
1
2
tTt

=
@2
@t1@t2
exp

1t1 + 2t2 +
1
2
t21
2
1 + t1t212 +
1
2
t22
2
2

=
@
@t2

1 + t1
2
1 + t212

M(t1; t2)

= 12M(t1; t2) +

1 + t1
2
1 + t212

2 + t2
2
2 + t112

M(t1; t2)
therefore
E(XY ) =
@2
@t1@t2
M(t1; t2)j(t1;t2)=(0;0)
= 12 + 12
From (b) we know E(X1) = 1 and E(X2) = 2: Therefore
Cov(X;Y ) = E(XY ) E(X)E(Y ) = 12 + 12 12 = 12
13:(d) By Theorem 3.8.6, X1 and X2 are independent random variables if and only if
M(t1; t2) = MX1(t1)MX2(t2)
then X1 and X2 are independent random variables if and only if
exp

1t1 + 2t2 +
1
2
t21
2
1 + t1t212 +
1
2
t22
2
2

= exp

1t1 +
1
2
t21
2
1

exp

2t2 +
1
2
t22
2
2

for all (t1; t2) 2 <2 or
1t1 + 2t2 +
1
2
t21
2
1 + t1t212 +
1
2
t22
2
2 = 1t1 +
1
2
t21
2
1 + 2t2 +
1
2
t22
2
2
or
t1t212 = 0
for all (t1; t2) 2 <2 which is true if and only if = 0. Therefore X1 and X2 are independent
random variables if and only if = 0.
13:(e) Since
E

exp(XtT )

= exp

tT +
1
2
ttT

for t 2 <2
therefore
Efexp[(XA+ b)tT ]g
= Efexp[XAtT + btT ]g
= exp(btT )E
n
exp
h
X

tAT
T io
= exp(btT ) exp



tAT
T
+
1
2

tAT



tAT
T
= exp

btT + AtT +
1
2
t

ATA

tT

= exp

(A+ b) tT +
1
2
t(ATA)t

for t 2 <2
which is the moment generating function of a BVN

A+ b; ATA

random variable, then
by the Uniqueness Theorem for Moment Generating Functions,XA+b BVNA+ b; ATA.
13:(f) First note that
1 =
1
21
2
2 (1 2)
"
22 12
12 21
#
=
1
(1 2)
"
1
21
12
12 122
#
jj1=2 =

"
21 12
12
2
2
#
1=2
=
h
21
2
2 (12)2
i1=2
= 12
p
1 2
(x ) 1 (x )T = 1
(1 2)
"
x1 1
1
2
2

x1 1
1

x2 2
2

+

x2 2
2
2#
and
(x ) 1 (x )T

x1 1
1
2
=
1
(1 2)
"
x1 1
1
2
2

x1 1
1

x2 2
2

+

x2 2
2
2#


x1 1
1
2
=
1
(1 2)
"
x1 1
1
2
2

x1 1
1

x2 2
2

+

x2 2
2
2
1 2x1 1
1
2#
=
1
22 (1 2)

(x2 2)2 2
2
1
(x1 1) (x2 2) +
222
21
(x1 1)2

=
1
22 (1 2)

(x2 2)
2
1
(x1 1)
2
=
1
22 (1 2)

x2

2 +
2
1
(x1 1)
2
The conditional probability density function of X2 given X1 = x1 is
f (x1; x2)
f1 (x1)
=
1
2jj1=2 exp
h
12 (x ) 1 (x )T
i
1p
21
exp

12

x11
1
2
=
1p
212
p
1 2 exp
(
1
2
"
(x ) 1 (x )T

x1 1
1
2#)
=
1p
22
p
1 2 exp
"
1
2
1
22 (1 2)

x2

2 +
2
1
(x1 1)
2#
for x 2 <2
which is the probability density function of a N

2 +
2
1
(x1 1) ; 22

1 2 random
variable.
14:(a)
M (t1; t2) = E

et1X+t2Y

=
1Z
y=0
yZ
x=0
et1x+t2y2exydxdy = 2
1Z
0
e(1t2)y
24 yZ
0
e(1t1)xdx
35 dy
= 2
1Z
0
e(1t2)y
"
e(1t1)x
1 t1 j
y
0
#
dy if t1 < 1
=
2
(1 t1)
1Z
0
e(1t2)y
h
1 e(1t1)y
i
dy which converges if t1 < 1; t2 < 1
=
2
(1 t1)
1Z
0
h
e(1t2)y e(2t1t2)y
i
dy
=
2
(1 t1) limb!1
"
e(1t2)y
(1 t2) +
e(2t1t2)y
(2 t1 t2) j
b
0
#
=
2
(1 t1) limb!1
"
e(1t2)b
(1 t2) +
e(2t1t2)b
(2 t1 t2) +
1
(1 t2)
1
(2 t1 t2)
#
=
2
(1 t1)

1
(1 t2)
1
(2 t1 t2)

=
2
(1 t1)

2 t1 t2 (1 t2)
(1 t2) (2 t1 t2)

=
2
(1 t1)

1 t1
(1 t2) (2 t1 t2)

=
2
(1 t2) (2 t1 t2) for t1 < 1; t2 < 1
14:(b)
MX (t) = M (t; 0) =
2
2 t =
1
1 12 t
for t < 1
which is the moment generating function of an Exponential

1
2

random variable. Therefore
by the Uniqueness Theorem for Moment Generating Functions X Exponential12.
MY (t) = M (0; t) =
2
(1 t) (2 t) =
2
2 3t+ t2 for t < 1
which is not a moment generating function we recognize. We …nd the probability density
function of Y using
f2 (y) =
yZ
0
2exydx = 2ey
exjy0
= 2ey

1 ey for y > 0
and 0 otherwise.
14:(c) E (X) = 12 since X Exponential

1
2

.
Alternatively
E (X) =
d
dt
MX (t) jt=0 =
d
dt

2
2 t

jt=0
=
2 (1) (1)
(2 t)2 jt=0 =
2
4
=
1
2
Similarly
E (Y ) =
d
dt
MY (t) jt=0 =
d
dt

2
2 3t+ t2

jt=0
=
2 (1) (3 + 2t)
(2 3t+ t2)2 jt=0 =
6
4
=
3
2
Since
@2
@t1@t2
M (t1; t2) = 2
@2
@t1@t2

1
(1 t2) (2 t1 t2)

= 2
@
@t2

1
(1 t2) (2 t1 t2)2

= 2

1
(1 t2)2
1
(2 t1 t2)2
+
1
(1 t2)
2
(2 t1 t2)3

we obtain
E (XY ) =
@2
@t1@t2
M (t1; t2) j(t1;t2)=(0;0)
= 2

1
(1 t2)2
1
(2 t1 t2)2
+
1
(1 t2)
2
(2 t1 t2)3

j(t1;t2)=(0;0)
= 2

1
4
+
2
8

= 1
Therefore
Cov(X;Y ) = E(XY ) E(X)E(Y )
= 1

1
2

3
2

=
1
4
10.3 Chapter 4
1:We are given that X and Y are independent random variables and we want to show that
U = h (X) and V = g (Y ) are independent random variables. We will assume that X and
Y are continuous random variables. The proof for discrete random variables is obtained by
replacing integrals by sums.
Suppose X has probability density function f1 (x) and support set A1, and Y has
probability density function f2 (y) and support set A2. Then the joint probability density
function of X and Y is
f (x; y) = f1 (x) f2 (y) for (x; y) 2 A1 A2
Now for any (u; v) 2 <2
P (U u; V v) = P (h (X) u; g (Y ) v)
=
ZZ
B
f1 (x) f2 (y) dxdy
where
B = f(x; y) : h (x) u; g (y) vg
Let B1 = fx : h (x) ug and B2 = fy : g (y) vg. Then B = B1 B2.
Since
P (U u; V v) =
ZZ
B1B2
f1 (x) f2 (y) dxdy
=
Z
B1
f1 (x) dx
Z
B2
f2 (y) dy
= P (h (X) u)P (g (Y ) v)
= P (U u)P (V v) for all (u; v) 2 <2
therefore U and V are independent random variables.
2:(a) The transformation
S : U = X + Y; V = X
has inverse transformation
X = V; Y = U V
The support set of (X;Y ), pictured in Figure 10.15, is
RXY = f(x; y) : 0 < x+ y < 1; 0 < x < 1; 0 < y < 1g
Figure 10.15: RXY for Problem 2 (a)
Under S
(k; 0) ! (k; k) 0 < k < 1
(k; 1 k) ! (1; k) 0 < k < 1
(0; k) ! (k; 0) 0 < k < 1
and thus S maps RXY into
RUV = f(u; v) : 0 < v < u < 1g
which is pictured in Figure 10.16.
Figure 10.16: RUV for Problem 2 (a)
The Jacobian of the inverse transformation is
@ (x; y)
@ (u; v)
=
0 11 1
= 1
The joint probability density function of U and V is given by
g (u; v) = f (v; u v) j1j
= 24v (u v) for (u; v) 2 RUV
and 0 otherwise.
2: (b) The marginal probability density function of U is given by
g1 (u) =
1Z
1
g (u; v) dv
=
uZ
0
24v (u v) dv
= 12uv2 8v3ju0
= 4u3 for 0 < u < 1
and 0 otherwise.
6:(a)
Figure 10.17: Region of integration for 0 ≤ t ≤ 1
For 0 t 1
G (t) = P (T t) = P (X + Y t)
=
Z
(x;y)
Z
2At
4xydydx =
tZ
x=0
txZ
y=0
4xydydx
=
tZ
0
2x

y2jtx0

dx
=
tZ
0
2x (t x)2 dx
=
tZ
0
2x

t2 2tx+ x2 dx
= x2t2 4
3
tx3 +
1
2
x4jt0
= t4 4
3
t4 +
1
2
t4
=
1
6
t4
Figure 10.18: Region of integration for 1 ≤ t ≤ 2
For 1 t 2 we use
G (t) = P (T t) = P (X + Y t) = 1 P (X + Y > t)
where
P (X + Y > t) =
Z
(x;y)2
Z
Bt
4xydydx =
1Z
x=t1
1Z
y=tx
4xydydx =
1Z
t1
2x

y2j1tx

dx
=
1Z
t1
2x
h
1 (t x)2
i
dx =
1Z
t1
2x

1 t2 + 2tx x2 dx
= x2

1 t2+ 4
3
tx3 1
2
x4j1t1
=

1 t2+ 4
3
t 1
2


(t 1)2 1 t2+ 4
3
t (t 1)3 1
2
(t 1)4

= 1 t2 + 4
3
t 1
2
+
1
6
(t 1)3 (t+ 3)
so
G (t) = t2 4
3
t+
1
2
1
6
(t 1)3 (t+ 3) for 1 t 2
The probability density function of T = X + T is
g (t) =
dG (t)
dt
=
8<:
2
3 t
3 if 0 t 1
2t 43
h
1
2 (t 1)2 (t+ 3) + 16 (t 1)3
i
if 1 < t 2
=
8<:
2
3 t
3 if 0 t 1
2
3
t3 + 6t 4 if 1 < t 2
and 0 otherwise.
6: (c) The transformation
S : U = X2; V = XY
has inverse transformation
X =
p
U; Y = V=
p
U
The support set of (X;Y ), pictured in Figure 10.19, is
RXY = f(x; y) : 0 < x < 1; 0 < y < 1g
Figure 10.19: Support set of (X, Y) for Problem 6 (c)
Under S
(k; 0) ! k2; 0 0 < k < 1
(0; k) ! (0; 0) 0 < k < 1
(1; k) ! (1; k) 0 < k < 1
(k; 1) ! k2; k 0 < k < 1
and thus S maps RXY into the region
RUV =

(u; v) : 0 < v <
p
u; 0 < u < 1

=

(u; v) : v2 < u < 1; 0 < v < 1

which is pictured in Figure 10.20.
Figure 10.20: Support set of (U, V) for Problem 6 (c)
The Jacobian of the inverse transformation is
@ (x; y)
@ (u; v)
=

1
2
p
u
0
@y
@u
1p
u
= 12u
The joint probability density function of U and V is given by
g (u; v) = f
p
u; v=
p
u
12u

= 4
p
u

v=
p
u
1
2u

=
2v
u
for (u; v) 2 RUV
and 0 otherwise.
6:(d) The marginal probability density function of U is
g1 (u) =
1Z
1
g (u; v) dv
=
p
uZ
0
2v
u
dv =
1
u
h
v2j
p
u
0
i
= 1 for 0 < u < 1
and 0 otherwise. Note that U Uniform(0; 1).
The marginal probability density function of V is
g2 (v) =
1Z
1
g (u; v) du
=
1Z
v2
2v
u
du
= 2v

log uj1v2

= 2v log v2
= 4v log (v) for 0 < v < 1
and 0 otherwise.
6:(e) The support set of X is A1 = fx : 0 < x < 1g and the support set of Y is
A2 = fy : 0 < y < 1g. Since
f (x; y) = 4xy = 2x (2y)
for all (x; y) 2 RXY = A1 A2, therefore by the Factorization Theorem for Independence
X and Y are independent random variables. Also
f1 (x) = 2x for x 2 A1
and
f2 (y) = 2y for y 2 A2
so X and Y have the same distribution. Therefore
E

V 3

= E
h
(XY )3
i
= E

X3

E

Y 3

=

E

X3
2
=
24 1Z
0
x3 (2x) dx
352
=

2
5
x5j10
2
=

2
5
2
=
4
25
7:(a) The transformation
S : U = X=Y; V = XY
has inverse transformation
X =
p
UV ; Y =
p
V=U
The support set of (X;Y ), picture in Figure 10.21, is
RXY = f(x; y) : 0 < x < 1; 0 < y < 1g
Figure 10.21: Support set of (X, Y) for Problem 7 (a)
Under S
(k; 0) ! (1; 0) 0 < k < 1
(0; k) ! (0; 0) 0 < k < 1
(1; k) !

1
k
; k

0 < k < 1
(k; 1) ! (k; k) 0 < k < 1
and thus S maps RXY into
RUV =

(u; v) : v < u <
1
v
; 0 < v < 1

which is picture in Figure 10.22.
Figure 10.22: Support set of (U, V) for Problem 7 (a)
The Jacobian of the inverse transformation is
@ (x; y)
@ (u; v)
=

p
v
2
p
u
p
u
2
p
v
pv
2u3=2
1
2
p
u
p
v
= 14u + 14u = 12u
The joint probability density function of U and V is given by
g (u; v) = f
p
uv;
p
v=u
12u

= 4
p
uv
p
v=u

1
2u

=
2v
u
for (u; v) 2 RUV
and 0 otherwise.
7:(b) The support set of (U; V ) is RUV =

(u; v) : v < u < 1v ; 0 < v < 1

which is not
rectangular. The support set of U is A1 = fu : 0 < u <1g and the support set of V is
A2 = fv : 0 < v < 1g.
Consider the point

1
2 ;
3
4

=2 RUV so g

1
2 ;
3
4

= 0. Since 12 2 A1 then g1

1
2

> 0. Since
3
4 2 A2 then g2

3
4

> 0. Therefore g1

1
2

g2

3
4

> 0 so
g

1
2
;
3
4

= 0 6= g1

1
2

g2

3
4

and U and V are not independent random variables.
7:(c) The marginal probability density function of V is given by
g2 (v) =
1Z
1
g (u; v) du
= 2v
1=vZ
v
1
u
du
= 2v [ln v] j1=vv
= 4v ln v for 0 < v < 1
and 0 otherwise.
The marginal probability density function of U is given by
g1 (u) =
1Z
1
g (u; v) dv
=
8>>>><>>>>:
1
u
uR
0
2vdv if 0 < u < 1
1
u
1=uR
0
2vdv if u 1
=
8><>:
u if 0 < u < 1
1
u3
if u 1
and 0 otherwise.
8: Since $X\sim$ Uniform$(0,\theta)$ and $Y\sim$ Uniform$(0,\theta)$ independently, the joint probability density function of $X$ and $Y$ is
\[
f(x,y)=\frac{1}{\theta^{2}}\quad\text{for }(x,y)\in R_{XY}
\]
where
\[
R_{XY}=\{(x,y):0<x<\theta,\ 0<y<\theta\}
\]
which is pictured in Figure 10.23.

Figure 10.23: Support set of (X, Y) for Problem 8

The transformation
\[
S:\ U=X-Y,\quad V=X+Y
\]
has inverse transformation
\[
X=\frac{1}{2}(U+V),\qquad Y=\frac{1}{2}(V-U)
\]
Under $S$
\[
(k,0)\to(k,k),\quad(0,k)\to(-k,k),\quad(\theta,k)\to(\theta-k,\theta+k),\quad(k,\theta)\to(k-\theta,k+\theta)\qquad 0<k<\theta
\]
$S$ maps $R_{XY}$ into
\[
R_{UV}=\{(u,v):-u<v<2\theta+u,\ -\theta<u\le0\ \text{ or }\ u<v<2\theta-u,\ 0<u<\theta\}
\]
which is pictured in Figure 10.24.

Figure 10.24: Support set of (U, V) for Problem 8

The Jacobian of the inverse transformation is
\[
\frac{\partial(x,y)}{\partial(u,v)}=
\begin{vmatrix}
\tfrac{1}{2} & \tfrac{1}{2}\\
-\tfrac{1}{2} & \tfrac{1}{2}
\end{vmatrix}
=\frac{1}{4}+\frac{1}{4}=\frac{1}{2}
\]
The joint probability density function of $U$ and $V$ is given by
\[
g(u,v)=f\!\left(\tfrac{1}{2}(u+v),\tfrac{1}{2}(v-u)\right)\left|\tfrac{1}{2}\right|=\frac{1}{2\theta^{2}}\quad\text{for }(u,v)\in R_{UV}
\]
and 0 otherwise.
The marginal probability density function of $U$ is
\[
g_{1}(u)=\int_{-\infty}^{\infty}g(u,v)\,dv=
\begin{cases}
\dfrac{1}{2\theta^{2}}\displaystyle\int_{-u}^{2\theta+u}dv=\dfrac{1}{2\theta^{2}}(2\theta+u+u)=\dfrac{\theta+u}{\theta^{2}} & \text{for }-\theta<u\le0\\[8pt]
\dfrac{1}{2\theta^{2}}\displaystyle\int_{u}^{2\theta-u}dv=\dfrac{1}{2\theta^{2}}(2\theta-u-u)=\dfrac{\theta-u}{\theta^{2}} & \text{for }0<u<\theta
\end{cases}
\]
\[
=\frac{\theta-|u|}{\theta^{2}}\quad\text{for }-\theta<u<\theta
\]
and 0 otherwise.
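A quick Monte Carlo check of the triangular density just derived is given below as an added illustration (θ = 3 is an arbitrary choice): a histogram of simulated values of U = X − Y is compared with g₁(u) = (θ − |u|)/θ².

```python
# Monte Carlo check that U = X - Y has density (theta - |u|)/theta^2 on (-theta, theta)
# when X and Y are independent Uniform(0, theta).
import numpy as np

rng = np.random.default_rng(5)
theta = 3.0
X = rng.uniform(0, theta, size=10**6)
Y = rng.uniform(0, theta, size=10**6)
U = X - Y

g1 = lambda u: (theta - np.abs(u)) / theta**2
hist, edges = np.histogram(U, bins=60, range=(-theta, theta), density=True)
mids = 0.5*(edges[:-1] + edges[1:])
print(np.max(np.abs(hist - g1(mids))))   # small, e.g. below 0.01
```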
9: (a) The joint probability density function of (Z1; Z2) is
f (z1; z2) =
1p
2
ez
2
1=2
1p
2
ez
2
2=2
=
1
2
e(z
2
1+z
2
2)=2 for (z1; z2) 2 <2
The support set of (Z1; Z2) is <2. The transformation
X1 = 1 + 1Z1; X2 = 2 + 2
h
Z1 +

1 21=2 Z2i
is a one-to-one transformation from <2 into <2. The inverse transformation is
Z1 =
X1 1
1
; Z2 =
1p
1 2

X2 2
2


X1 1
1

The Jacobian of the inverse transformation is
@ (z1; z2)
@ (x1; x2)
=

1
1
0
@z2
@x2
1
2(12)1=2
= 112p1 2 = 1jj1=2
Note that
z21 + z
2
2
=

x1 1
1
2
+
1
(1 2)
"
x2 2
2
2
2

x1 1
1

x2 2
2

+ 2

x1 1
1
2#
=
1
(1 2)
"
x1 1
1
2
2

x1 1
1

x2 2
2

+

x2 2
2
2#
= (x ) 1 (x )T
where
x =
h
x1 x2
i
and =
h
1 2
i
Therefore the joint probability density function of X =
h
X1 X2
i
is
g (x) =
1
2 jj1=2
exp
1
2
(x ) 1 (x )T

for x 2 <2
and thus X BVN(;).
9: (b) Since Z1 N(0; 1) and Z1 N(0; 1) independently we know Z21 2 (1) and Z22
2 (1) and Z21 + Z
2
2 2 (2). From (a) Z21 + Z22 = (X ) 1 (X )T . Therefore
(X ) 1 (X )T 2 (2).
10(a) The joint moment generating function of U and V is

M(s, t) = E[e^{sU + tV}] = E[e^{s(X+Y) + t(X−Y)}] = E[e^{(s+t)X} e^{(s−t)Y}]
        = E[e^{(s+t)X}] E[e^{(s−t)Y}]   since X and Y are independent random variables
        = MX(s + t) MY(s − t)
        = [e^{μ(s+t) + σ²(s+t)²/2}] [e^{μ(s−t) + σ²(s−t)²/2}]   since X ~ N(μ, σ²) and Y ~ N(μ, σ²)
        = e^{2μs + σ²(2s² + 2t²)/2}
        = [e^{(2μ)s + (2σ²)s²/2}] [e^{(2σ²)t²/2}]   for s ∈ ℝ, t ∈ ℝ

10(b) The moment generating function of U is

MU(s) = M(s, 0) = e^{(2μ)s + (2σ²)s²/2}   for s ∈ ℝ

which is the moment generating function of a N(2μ, 2σ²) random variable. The moment generating function of V is

MV(t) = M(0, t) = e^{(2σ²)t²/2}   for t ∈ ℝ

which is the moment generating function of a N(0, 2σ²) random variable.

Since

M(s, t) = MU(s) MV(t)   for all s ∈ ℝ, t ∈ ℝ

therefore by Theorem 3.8.6, U and V are independent random variables. Also by the Uniqueness Theorem for Moment Generating Functions, U ~ N(2μ, 2σ²) and V ~ N(0, 2σ²).
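The following short simulation in R is a quick numerical check of this result; it is not part of the original solution, and the values of μ, σ and the sample size are arbitrary choices.

# Monte Carlo check that U = X + Y and V = X - Y behave as claimed
set.seed(330)
mu <- 2; sigma <- 3; n <- 100000
x <- rnorm(n, mu, sigma); y <- rnorm(n, mu, sigma)
u <- x + y; v <- x - y
c(mean(u), var(u))   # close to 2*mu = 4 and 2*sigma^2 = 18
c(mean(v), var(v))   # close to 0 and 2*sigma^2 = 18
cor(u, v)            # close to 0, consistent with independence of U and V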
12. The transformation defined by

Y1 = X1/(X1 + X2),   Y2 = (X1 + X2)/(X1 + X2 + X3),   Y3 = X1 + X2 + X3

has inverse transformation

X1 = Y1 Y2 Y3,   X2 = Y2 Y3 (1 − Y1),   X3 = Y3 (1 − Y2)

Let

RX = {(x1, x2, x3) : 0 < x1 < ∞, 0 < x2 < ∞, 0 < x3 < ∞}

and

RY = {(y1, y2, y3) : 0 < y1 < 1, 0 < y2 < 1, 0 < y3 < ∞}

The transformation from (X1, X2, X3) → (Y1, Y2, Y3) maps RX into RY.

The Jacobian of the transformation from (X1, X2, X3) → (Y1, Y2, Y3) is

∂(x1, x2, x3)/∂(y1, y2, y3) = | y2y3    y1y3         y1y2       |
                              | −y2y3   y3(1 − y1)   y2(1 − y1) |
                              | 0       −y3          1 − y2     |
= y3 [ (1 − y1)y2²y3 + y1y2²y3 ] + (1 − y2) [ y2y3²(1 − y1) + y1y2y3² ]
= y2²y3² + y2y3² − y2²y3²
= y2 y3²

Since the entries in ∂(x1, x2, x3)/∂(y1, y2, y3) are all continuous functions and ∂(x1, x2, x3)/∂(y1, y2, y3) ≠ 0 for (y1, y2, y3) ∈ RY, by the Inverse Mapping Theorem the transformation is one-to-one and has an inverse.

Since X1, X2, X3 are independent Exponential(1) random variables, the joint probability density function of (X1, X2, X3) is

f(x1, x2, x3) = e^{−x1 − x2 − x3}   for (x1, x2, x3) ∈ RX

The joint probability density function of (Y1, Y2, Y3) is

g(y1, y2, y3) = e^{−y3} · y2y3² = y2 y3² e^{−y3}   for (y1, y2, y3) ∈ RY

and 0 otherwise.

Let A1 = {y1 : 0 < y1 < 1}, A2 = {y2 : 0 < y2 < 1}, A3 = {y3 : 0 < y3 < ∞},

g1(y1) = 1 for y1 ∈ A1,   g2(y2) = 2y2 for y2 ∈ A2,   g3(y3) = (1/2) y3² e^{−y3} for y3 ∈ A3

Since g(y1, y2, y3) = g1(y1) g2(y2) g3(y3) for all (y1, y2, y3) ∈ A1 × A2 × A3, by the Factorization Theorem for Independence, (Y1, Y2, Y3) are independent random variables. Since ∫_{Ai} gi(yi) dyi = 1 for i = 1, 2, 3, the marginal probability density function of Yi is equal to gi(yi), i = 1, 2, 3.

Note that Y1 ~ Uniform(0, 1) = Beta(1, 1), Y2 ~ Beta(2, 1), and Y3 ~ Gamma(3, 1) independently.
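A small Monte Carlo check of these marginal distributions can be done in R; this is not part of the original solution, and the sample size and seed are arbitrary choices.

# Check that Y1 ~ Beta(1,1), Y2 ~ Beta(2,1), Y3 ~ Gamma(3,1) and that they are uncorrelated
set.seed(330)
n <- 100000
x1 <- rexp(n); x2 <- rexp(n); x3 <- rexp(n)
y1 <- x1/(x1 + x2)
y2 <- (x1 + x2)/(x1 + x2 + x3)
y3 <- x1 + x2 + x3
ks.test(y1, "punif")          # Uniform(0,1) = Beta(1,1)
ks.test(y2, "pbeta", 2, 1)    # Beta(2,1)
ks.test(y3, "pgamma", 3, 1)   # Gamma(3,1) with scale 1
cor(cbind(y1, y2, y3))        # off-diagonal entries close to 0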
13. Let

RY = {(y1, y2, y3) : 0 ≤ y1 < ∞, 0 ≤ y2 < 2π, 0 ≤ y3 < π}

and RX = ℝ³. The transformation defined by

X1 = Y1 cos Y2 sin Y3,   X2 = Y1 sin Y2 sin Y3,   X3 = Y1 cos Y3

maps RY into RX.

The Jacobian of the transformation is

∂(x1, x2, x3)/∂(y1, y2, y3) = | cos y2 sin y3   −y1 sin y2 sin y3   y1 cos y2 cos y3 |
                              | sin y2 sin y3   y1 cos y2 sin y3    y1 sin y2 cos y3 |
                              | cos y3          0                   −y1 sin y3       |
= y1² sin y3 | cos y2 sin y3   −sin y2   cos y2 cos y3 |
             | sin y2 sin y3   cos y2    sin y2 cos y3 |
             | cos y3          0         −sin y3       |
= y1² sin y3 [ cos y3 (−sin² y2 cos y3 − cos² y2 cos y3) − sin y3 (cos² y2 sin y3 + sin² y2 sin y3) ]
= y1² sin y3 (−cos² y3 − sin² y3)
= −y1² sin y3

so that |∂(x1, x2, x3)/∂(y1, y2, y3)| = y1² sin y3.

Since the entries in ∂(x1, x2, x3)/∂(y1, y2, y3) are all continuous functions and ∂(x1, x2, x3)/∂(y1, y2, y3) ≠ 0 for

(y1, y2, y3) ∈ {(y1, y2, y3) : 0 < y1 < ∞, 0 ≤ y2 < 2π, 0 < y3 < π}

therefore by the Inverse Mapping Theorem the transformation is one-to-one and has an inverse.

Since X1, X2, X3 are independent N(0, 1) random variables, the joint probability density function of (X1, X2, X3) is

f(x1, x2, x3) = (2π)^{−3/2} exp[ −(x1² + x2² + x3²)/2 ]   for (x1, x2, x3) ∈ ℝ³

Now

x1² + x2² + x3² = (y1 cos y2 sin y3)² + (y1 sin y2 sin y3)² + (y1 cos y3)²
               = y1² [ cos² y2 sin² y3 + sin² y2 sin² y3 + cos² y3 ]
               = y1² [ sin² y3 (cos² y2 + sin² y2) + cos² y3 ]
               = y1² [ sin² y3 + cos² y3 ]
               = y1²

The joint probability density function of (Y1, Y2, Y3) is

g(y1, y2, y3) = (2π)^{−3/2} exp(−y1²/2) y1² sin y3   for (y1, y2, y3) ∈ RY

and 0 otherwise.

Let A1 = {y1 : y1 ≥ 0}, A2 = {y2 : 0 ≤ y2 ≤ 2π}, A3 = {y3 : 0 ≤ y3 ≤ π},

g1(y1) = (2/√(2π)) y1² exp(−y1²/2) for y1 ∈ A1,   g2(y2) = 1/(2π) for y2 ∈ A2,   g3(y3) = (1/2) sin y3 for y3 ∈ A3

Since g(y1, y2, y3) = g1(y1) g2(y2) g3(y3) for all (y1, y2, y3) ∈ A1 × A2 × A3, by the Factorization Theorem for Independence, (Y1, Y2, Y3) are independent random variables. Since ∫_{Ai} gi(yi) dyi = 1 for i = 1, 2, 3, the marginal probability density function of Yi is equal to gi(yi), i = 1, 2, 3.
15. Since X ~ χ²(n), the moment generating function of X is

MX(t) = 1/(1 − 2t)^{n/2}   for t < 1/2

Since U = X + Y ~ χ²(m), the moment generating function of U is

MU(t) = 1/(1 − 2t)^{m/2}   for t < 1/2

The moment generating function of U can also be obtained as

MU(t) = E[e^{tU}] = E[e^{t(X+Y)}] = E[e^{tX}] E[e^{tY}]   since X and Y are independent random variables
      = MX(t) MY(t)

By rearranging MU(t) = MX(t) MY(t) we obtain

MY(t) = MU(t)/MX(t) = (1 − 2t)^{−m/2}/(1 − 2t)^{−n/2} = 1/(1 − 2t)^{(m−n)/2}   for t < 1/2

which is the moment generating function of a χ²(m − n) random variable. Therefore by the Uniqueness Theorem for Moment Generating Functions, Y ~ χ²(m − n).
16(a) Note that

Σ_{i=1}^n (Xi − X̄) = (Σ_{i=1}^n Xi) − nX̄ = 0   and   Σ_{i=1}^n (si − s̄) = 0

Therefore

Σ tiXi = Σ (si − s̄ + s/n)(Xi − X̄ + X̄)   (10.17)
       = Σ [ si(Xi − X̄) + (s/n − s̄)(Xi − X̄) + (si − s̄)X̄ + (s/n)X̄ ]
       = Σ siUi + (s/n − s̄) Σ (Xi − X̄) + X̄ Σ (si − s̄) + sX̄
       = Σ siUi + sX̄

Also since X1, X2, ..., Xn are independent N(μ, σ²) random variables

E[exp(Σ tiXi)] = Π E[exp(tiXi)] = Π exp(μti + (1/2)σ²ti²) = exp( μ Σ ti + (1/2)σ² Σ ti² )   (10.18)

Therefore by (10.17) and (10.18)

E[exp(Σ siUi + sX̄)] = E[exp(Σ tiXi)] = exp( μ Σ ti + (1/2)σ² Σ ti² )   (10.19)

16(b)

Σ ti = Σ (si − s̄ + s/n) = Σ (si − s̄) + n(s/n) = 0 + s = s   (10.20)

Σ ti² = Σ (si − s̄ + s/n)²   (10.21)
      = Σ (si − s̄)² + 2(s/n) Σ (si − s̄) + Σ (s/n)²
      = Σ (si − s̄)² + 0 + s²/n
      = Σ (si − s̄)² + s²/n

16(c)

M(s1, ..., sn, s) = E[exp(Σ siUi + sX̄)] = E[exp(Σ tiXi)]
                  = exp( μ Σ ti + (σ²/2) Σ ti² )   by (10.19)
                  = exp( μs + (1/2)σ² [ Σ (si − s̄)² + s²/n ] )   by (10.20) and (10.21)
                  = exp( μs + (1/2)σ²(s²/n) ) exp( (1/2)σ² Σ (si − s̄)² )

16(d) Since

MX̄(s) = M(0, ..., 0, s) = exp( μs + (1/2)σ²(s²/n) )

and

MU(s1, ..., sn) = M(s1, ..., sn, 0) = exp( (1/2)σ² Σ (si − s̄)² )

we have

M(s1, ..., sn, s) = MX̄(s) MU(s1, ..., sn)

By the Independence Theorem for Moment Generating Functions, X̄ and U = (U1, U2, ..., Un) are independent random variables. Therefore by Chapter 4, Problem 1, X̄ and Σ Ui² = Σ (Xi − X̄)² are independent random variables.
10.4 Chapter 5
1(a) Since Yi ~ Exponential(θ, 1), i = 1, 2, ... independently, then

P(Yi > x) = ∫_x^∞ e^{−(y−θ)} dy = e^{−(x−θ)}   for x > θ, i = 1, 2, ...   (10.22)

and for x > θ

Fn(x) = P(Xn ≤ x) = P(min(Y1, Y2, ..., Yn) ≤ x) = 1 − P(Y1 > x, Y2 > x, ..., Yn > x)
      = 1 − Π P(Yi > x)   since the Yi's are independent random variables
      = 1 − e^{−n(x−θ)}   using (10.22)   (10.23)

Since

lim_{n→∞} Fn(x) = 1 if x > θ,  0 if x < θ

therefore

Xn →p θ   (10.24)

(b) By (10.24) and the Limit Theorems

Un = Xn/θ →p 1

(c) Now

P(Vn ≤ v) = P[n(Xn − θ) ≤ v] = P(Xn ≤ v/n + θ) = 1 − e^{−n(v/n)}   using (10.23)
          = 1 − e^{−v}   for v ≥ 0

which is the cumulative distribution function of an Exponential(1) random variable. Therefore Vn ~ Exponential(1) for n = 1, 2, ..., which implies

Vn →D V ~ Exponential(1)

(d) Since

P(Wn ≤ w) = P[n²(Xn − θ) ≤ w] = P(Xn ≤ w/n² + θ) = 1 − e^{−n(w/n²)} = 1 − e^{−w/n}   for w ≥ 0

therefore

lim_{n→∞} P(Wn ≤ w) = 0   for all w ∈ ℝ

which is not a cumulative distribution function. Therefore Wn has no limiting distribution.
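A short R simulation can be used to see the result of part (c) in action; this check is not part of the original solution, and the values of θ, n and the number of repetitions are arbitrary choices.

# Simulation check of 1(c): n(Xn - theta) has exactly an Exponential(1) distribution
set.seed(330)
theta <- 5; n <- 50; nsim <- 100000
xn <- replicate(nsim, min(theta + rexp(n)))   # Xn = min(Y1,...,Yn), Yi ~ Exponential(theta, 1)
vn <- n*(xn - theta)
c(mean(vn), var(vn))        # both close to 1
ks.test(vn, "pexp", 1)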
2. We first note that

P(Yn ≤ y) = P(max(X1, ..., Xn) ≤ y) = P(X1 ≤ y, ..., Xn ≤ y)
          = Π P(Xi ≤ y)   since the Xi's are independent random variables
          = [F(y)]^n   for y ∈ ℝ   (10.25)

Since F is a cumulative distribution function, F takes on values between 0 and 1. Therefore the function n[1 − F(·)] takes on values between 0 and n. Gn(z) = P(Zn ≤ z), the cumulative distribution function of Zn, equals 0 for z ≤ 0 and equals 1 for z ≥ n. For 0 < z < n

Gn(z) = P(Zn ≤ z) = P(n[1 − F(Yn)] ≤ z) = P(F(Yn) ≥ 1 − z/n)

Let A be the support set of Xi. F(x) is an increasing function for x ∈ A and therefore has an inverse, F^{−1}, which is defined on the interval (0, 1). Therefore for 0 < z < n

Gn(z) = P(F(Yn) ≥ 1 − z/n) = P(Yn ≥ F^{−1}(1 − z/n)) = 1 − P(Yn < F^{−1}(1 − z/n))
      = 1 − [F(F^{−1}(1 − z/n))]^n   by (10.25)
      = 1 − (1 − z/n)^n

Since

lim_{n→∞} [1 − (1 − z/n)^n] = 1 − e^{−z}   for z > 0

therefore

lim_{n→∞} Gn(z) = 0 if z < 0,  1 − e^{−z} if z > 0

which is the cumulative distribution function of an Exponential(1) random variable. Therefore by the definition of convergence in distribution

Zn →D Z ~ Exponential(1)
3. The moment generating function of a Poisson(θ) random variable is

M(t) = exp[θ(e^t − 1)]   for t ∈ ℝ

The moment generating function of

Yn = √n (X̄ − θ) = (1/√n) Σ Xi − √n θ

is

Mn(t) = E[e^{tYn}] = E[ exp( −√n θ t + (t/√n) Σ Xi ) ]
      = e^{−√n θ t} E[ exp( (t/√n) Σ Xi ) ]
      = e^{−√n θ t} Π E[ exp( (t/√n) Xi ) ]   since the Xi's are independent random variables
      = e^{−√n θ t} [ M(t/√n) ]^n
      = e^{−√n θ t} exp[ nθ( e^{t/√n} − 1 ) ]
      = exp[ −√n θ t + nθ( e^{t/√n} − 1 ) ]   for t ∈ ℝ

and

log Mn(t) = −√n θ t + nθ( e^{t/√n} − 1 )   for t ∈ ℝ

By Taylor's Theorem

e^x = 1 + x + x²/2 + (e^c/3!) x³

for some c between 0 and x. Therefore

e^{t/√n} = 1 + t/√n + (1/2)(t²/n) + (1/3!)(t³/n^{3/2}) e^{c_n}

for some c_n between 0 and t/√n. Therefore

log Mn(t) = −√n θ t + nθ[ t/√n + (1/2)(t²/n) + (1/3!)(t³/n^{3/2}) e^{c_n} ]
          = (1/2) θ t² + (θ t³/3!)(1/√n) e^{c_n}   for t ∈ ℝ

Since lim_{n→∞} c_n = 0, it follows that lim_{n→∞} e^{c_n} = e^0 = 1. Therefore

lim_{n→∞} log Mn(t) = lim_{n→∞} [ (1/2) θ t² + (θ t³/3!)(1/√n) e^{c_n} ] = (1/2) θ t² + (θ t³/3!)(0)(1) = (1/2) θ t²   for t ∈ ℝ

and

lim_{n→∞} Mn(t) = e^{θ t²/2}   for t ∈ ℝ

which is the moment generating function of a N(0, θ) random variable. Therefore by the Limit Theorem for Moment Generating Functions

Yn →D Y ~ N(0, θ)
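A quick R simulation illustrates this limit; it is not part of the original solution, and the values of θ, n and the number of repetitions are arbitrary choices.

# Simulation check of Problem 3: sqrt(n)(Xbar - theta) is approximately N(0, theta)
set.seed(330)
theta <- 4; n <- 400; nsim <- 50000
xbar <- replicate(nsim, mean(rpois(n, theta)))
yn <- sqrt(n)*(xbar - theta)
c(mean(yn), var(yn))   # close to 0 and theta = 4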
4. The moment generating function of an Exponential(θ) random variable is

M(t) = 1/(1 − θt)   for t < 1/θ

Since X1, X2, ..., Xn are independent random variables, the moment generating function of

Zn = (1/√n)( Σ Xi − nθ ) = (1/√n) Σ Xi − √n θ

is

Mn(t) = E[e^{tZn}] = E[ exp( t[ (1/√n) Σ Xi − √n θ ] ) ]
      = E[ e^{−√n θ t} exp( (t/√n) Σ Xi ) ]
      = e^{−√n θ t} Π E[ exp( (t/√n) Xi ) ]
      = e^{−√n θ t} [ M(t/√n) ]^n
      = e^{−√n θ t} [ 1/(1 − θt/√n) ]^n
      = [ e^{θt/√n} (1 − θt/√n) ]^{−n}   for t < √n/θ

By Taylor's Theorem

e^x = 1 + x + x²/2 + (e^c/3!) x³

for some c between 0 and x. Therefore

e^{θt/√n} = 1 + θt/√n + (1/2!)(θt/√n)² + (1/3!)(θt/√n)³ e^{c_n}
          = 1 + θt/√n + (1/2)(θ²t²/n) + (1/3!)(θ³t³/n^{3/2}) e^{c_n}

for some c_n between 0 and θt/√n. Therefore

Mn(t) = { [ 1 + θt/√n + (1/2)(θ²t²/n) + (1/3!)(θ³t³/n^{3/2}) e^{c_n} ] (1 − θt/√n) }^{−n}
      = { 1 − (1/2)(θ²t²/n) − (1/2)(θ³t³/n^{3/2}) + (1/3!)( θ³t³/n^{3/2} − θ⁴t⁴/n² ) e^{c_n} }^{−n}
      = { 1 − (1/2)(θ²t²/n) + γ(n)/n }^{−n}

where

γ(n) = −(1/2)(θ³t³/n^{1/2}) + (1/3!)( θ³t³/√n − θ⁴t⁴/n ) e^{c_n}

Since lim_{n→∞} c_n = 0, it follows that lim_{n→∞} e^{c_n} = e^0 = 1. Also

lim_{n→∞} θ³t³/√n = 0,   lim_{n→∞} θ⁴t⁴/n = 0

and therefore lim_{n→∞} γ(n) = 0. Thus by Theorem 5.1.2

lim_{n→∞} Mn(t) = lim_{n→∞} { 1 − (1/2)(θ²t²/n) + γ(n)/n }^{−n} = e^{θ²t²/2}   for t ∈ ℝ

which is the moment generating function of a N(0, θ²) random variable. Therefore by the Limit Theorem for Moment Generating Functions

Zn →D Z ~ N(0, θ²)
6. We can rewrite S²n as

S²n = (1/(n−1)) Σ (Xi − X̄n)²
    = (1/(n−1)) Σ [ (Xi − μ) − (X̄n − μ) ]²
    = (1/(n−1)) [ Σ (Xi − μ)² − 2(X̄n − μ) Σ (Xi − μ) + n(X̄n − μ)² ]
    = (1/(n−1)) [ Σ (Xi − μ)² − 2(X̄n − μ)( Σ Xi − nμ ) + n(X̄n − μ)² ]   (10.26)
    = (1/(n−1)) [ Σ (Xi − μ)² − 2(X̄n − μ) n(X̄n − μ) + n(X̄n − μ)² ]   (10.27)
    = (1/(n−1)) [ Σ (Xi − μ)² − n(X̄n − μ)² ]
    = σ² ( n/(n−1) ) [ (1/n) Σ ((Xi − μ)/σ)² − ((X̄n − μ)/σ)² ]   (10.28)

Since X1, X2, ... are independent and identically distributed random variables with E(Xi) = μ and Var(Xi) = σ² < ∞, then

X̄n →p μ   (10.29)

by the Weak Law of Large Numbers. By (10.29) and the Limit Theorems

((X̄n − μ)/σ)² →p 0   (10.30)

Let Wi = ((Xi − μ)/σ)², i = 1, 2, ... with

E(Wi) = E[ ((Xi − μ)/σ)² ] = 1

and Var(Wi) < ∞ since E(Xi⁴) < ∞. Since W1, W2, ... are independent and identically distributed random variables with E(Wi) = 1 and Var(Wi) < ∞, then

W̄n = (1/n) Σ ((Xi − μ)/σ)² →p 1   (10.31)

by the Weak Law of Large Numbers. By (10.28), (10.30), (10.31) and the Limit Theorems

S²n →p σ² (1)(1 − 0) = σ²

and therefore

Sn/σ →p 1   (10.32)

by the Limit Theorems.

Since X1, X2, ... are independent and identically distributed random variables with E(Xi) = μ and Var(Xi) = σ² < ∞, then

√n (X̄n − μ)/σ →D Z ~ N(0, 1)   (10.33)

by the Central Limit Theorem. By (10.32), (10.33) and Slutsky's Theorem

Tn = √n (X̄n − μ)/Sn = [ √n (X̄n − μ)/σ ] / (Sn/σ) →D Z/1 = Z ~ N(0, 1)
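The following R simulation is a small check of this limit for one non-Normal population; the choice of an Exponential(1) population (which has a finite fourth moment) and the values of n and nsim are arbitrary.

# Simulation check of Problem 6: the t-type statistic is approximately N(0,1)
set.seed(330)
n <- 200; nsim <- 50000
tn <- replicate(nsim, { x <- rexp(n); sqrt(n)*(mean(x) - 1)/sd(x) })
c(mean(tn), var(tn))    # close to 0 and 1
mean(abs(tn) > 1.96)    # close to 0.05, as for a N(0,1) limit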
7(a) Let Y1, Y2, ... be independent Binomial(1, θ) random variables with

E(Yi) = θ   and   Var(Yi) = θ(1 − θ)   for i = 1, 2, ...

By 4.3.2(1)

Σ Yi ~ Binomial(n, θ)

Since Y1, Y2, ... are independent and identically distributed random variables with E(Yi) = θ and Var(Yi) = θ(1 − θ) < ∞, then by the Weak Law of Large Numbers

(1/n) Σ Yi = Ȳn →p θ

Since Xn and Σ Yi have the same distribution

Tn = Xn/n →p θ   (10.34)

(b) By (10.34) and the Limit Theorems

Un = (Xn/n)(1 − Xn/n) →p θ(1 − θ)   (10.35)

(c) Since Y1, Y2, ... are independent and identically distributed random variables with E(Yi) = θ and Var(Yi) = θ(1 − θ) < ∞, then by the Central Limit Theorem

√n (Ȳn − θ)/√(θ(1 − θ)) →D Z ~ N(0, 1)

Since Xn and Σ Yi have the same distribution

Sn = √n (Xn/n − θ)/√(θ(1 − θ)) →D Z ~ N(0, 1)   (10.36)

By (10.36) and Slutsky's Theorem

Wn = √n (Xn/n − θ) = Sn √(θ(1 − θ)) →D W = √(θ(1 − θ)) Z

Since Z ~ N(0, 1), W = √(θ(1 − θ)) Z ~ N(0, θ(1 − θ)) and therefore

Wn →D W ~ N(0, θ(1 − θ))   (10.37)

(d) By (10.35), (10.37) and Slutsky's Theorem

Zn = Wn/√Un →D W/√(θ(1 − θ)) = √(θ(1 − θ)) Z / √(θ(1 − θ)) = Z ~ N(0, 1)

(e) To determine the limiting distribution of

Vn = √n [ arcsin(√(Xn/n)) − arcsin(√θ) ]

let g(x) = arcsin(√x) and a = θ. Then

g'(x) = [ 1/√(1 − (√x)²) ] [ 1/(2√x) ] = 1/(2√(x(1 − x)))

and

g'(a) = g'(θ) = 1/(2√(θ(1 − θ)))

By (10.37) and the Delta Method

Vn →D [ 1/(2√(θ(1 − θ))) ] √(θ(1 − θ)) Z = Z/2 ~ N(0, 1/4)

(f) The limiting variance of Wn is equal to θ(1 − θ) which depends on θ. The limiting variance of Zn is 1 which does not depend on θ. The limiting variance of Vn is 1/4 which does not depend on θ. The transformation g(x) = arcsin(√x) is a variance-stabilizing transformation for the Binomial distribution.
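The variance-stabilizing property in (f) can be seen in a short R simulation; this is not part of the original solution, and the values of θ, n and nsim are arbitrary choices.

# Simulation check of 7(f): var(Vn) is close to 1/4 for every theta, var(Wn) is not
set.seed(330)
n <- 500; nsim <- 20000
for (theta in c(0.1, 0.3, 0.5, 0.8)) {
  xn <- rbinom(nsim, n, theta)
  wn <- sqrt(n)*(xn/n - theta)
  vn <- sqrt(n)*(asin(sqrt(xn/n)) - asin(sqrt(theta)))
  cat(theta, round(var(wn), 3), round(var(vn), 3), "\n")  # theta(1-theta) versus about 0.25
}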
8. X1, X2, ... are independent Geometric(θ) random variables with

E(Xi) = (1 − θ)/θ   and   Var(Xi) = (1 − θ)/θ²,   i = 1, 2, ...

(a) Since X1, X2, ... are independent and identically distributed random variables with E(Xi) = (1 − θ)/θ and Var(Xi) = (1 − θ)/θ² < ∞, then by the Weak Law of Large Numbers

X̄n = Yn/n = (1/n) Σ Xi →p (1 − θ)/θ   (10.38)

(b) Since X1, X2, ... are independent and identically distributed random variables with E(Xi) = (1 − θ)/θ and Var(Xi) = (1 − θ)/θ² < ∞, then by the Central Limit Theorem

√n ( X̄n − (1 − θ)/θ ) / √((1 − θ)/θ²) →D Z ~ N(0, 1)   (10.39)

By (10.39) and Slutsky's Theorem

Wn = √n ( X̄n − (1 − θ)/θ ) = √((1 − θ)/θ²) [ √n ( X̄n − (1 − θ)/θ ) / √((1 − θ)/θ²) ] →D W = √((1 − θ)/θ²) Z

Since Z ~ N(0, 1), W = √((1 − θ)/θ²) Z ~ N(0, (1 − θ)/θ²) and therefore

Wn = √n ( X̄n − (1 − θ)/θ ) →D W ~ N(0, (1 − θ)/θ²)   (10.40)

(c) By (10.38) and the Limit Theorems

Vn = 1/(1 + X̄n) →p 1/(1 + (1 − θ)/θ) = θ/(θ + 1 − θ) = θ   (10.41)

(d) To find the distribution of

Zn = √n (Vn − θ) / √( Vn²(1 − Vn) )

we first note that

Wn = √n ( X̄n − (1 − θ)/θ ) = √n ( X̄n − (1/θ − 1) ) = √n ( X̄n + 1 − 1/θ ) = √n ( 1/Vn − 1/θ )

Therefore by (10.40)

√n ( 1/Vn − 1/θ ) →D W ~ N(0, (1 − θ)/θ²)   (10.42)

Next we determine the limiting distribution of √n (Vn − θ), which is the numerator of Zn. Let g(x) = x^{−1} and a = 1/θ. Then

g'(x) = −x^{−2}   and   g'(a) = g'(1/θ) = −θ²

By (10.42) and the Delta Theorem

√n (Vn − θ) →D −θ² √((1 − θ)/θ²) Z = −θ √(1 − θ) Z ~ N(0, θ²(1 − θ))   (10.43)

By (10.43) and Slutsky's Theorem

[ 1/√(θ²(1 − θ)) ] √n (Vn − θ) →D [ 1/√(θ²(1 − θ)) ] ( −θ √(1 − θ) Z ) = −Z ~ N(0, 1)

or

√n (Vn − θ)/√(θ²(1 − θ)) →D Z ~ N(0, 1)   (10.44)

since if Z ~ N(0, 1) then −Z ~ N(0, 1) by the symmetry of the N(0, 1) distribution.

Since

Zn = √n (Vn − θ)/√(Vn²(1 − Vn)) = [ √n (Vn − θ)/√(θ²(1 − θ)) ] / √( Vn²(1 − Vn)/(θ²(1 − θ)) )   (10.45)

then by (10.41) and the Limit Theorems

√( Vn²(1 − Vn)/(θ²(1 − θ)) ) →p √( θ²(1 − θ)/(θ²(1 − θ)) ) = 1   (10.46)

By (10.44), (10.45), (10.46), and Slutsky's Theorem

Zn →D Z/1 = Z ~ N(0, 1)
9. X1, X2, ..., Xn are independent Gamma(2, θ) random variables with

E(Xi) = 2θ   and   Var(Xi) = 2θ²   for i = 1, 2, ..., n

(a) Since X1, X2, ... are independent and identically distributed random variables with E(Xi) = 2θ and Var(Xi) = 2θ² < ∞, then by the Weak Law of Large Numbers

X̄n = (1/n) Σ Xi →p 2θ   (10.47)

By (10.47) and the Limit Theorems

X̄n/√2 →p 2θ/√2 = √2 θ   and   √2 θ/(X̄n/√2) →p 1   (10.48)

(b) Since X1, X2, ... are independent and identically distributed random variables with E(Xi) = 2θ and Var(Xi) = 2θ² < ∞, then by the Central Limit Theorem

Wn = √n (X̄n − 2θ)/√(2θ²) = √n (X̄n − 2θ)/(√2 θ) →D Z ~ N(0, 1)   (10.49)

By (10.49) and Slutsky's Theorem

Vn = √n (X̄n − 2θ) →D √2 θ Z ~ N(0, 2θ²)   (10.50)

(c) By (10.48), (10.49) and Slutsky's Theorem

Zn = √n (X̄n − 2θ)/(X̄n/√2) = [ √n (X̄n − 2θ)/(√2 θ) ] [ √2 θ/(X̄n/√2) ] →D Z (1) = Z ~ N(0, 1)

(d) To determine the limiting distribution of

Un = √n [ log(X̄n) − log(2θ) ]

let g(x) = log x and a = 2θ. Then g'(x) = 1/x and g'(a) = g'(2θ) = 1/(2θ). By (10.50) and the Delta Method

Un →D [ 1/(2θ) ] √2 θ Z = Z/√2 ~ N(0, 1/2)

(e) The limiting variance of Zn is 1 which does not depend on θ. The limiting variance of Un is 1/2 which does not depend on θ. The transformation g(x) = log x is a variance-stabilizing transformation for the Gamma distribution.
10.5 Chapter 6
3(a) If Xi ~ Geometric(θ) then

f(x; θ) = θ(1 − θ)^x   for x = 0, 1, ...; 0 < θ < 1

The likelihood function is

L(θ) = Π f(xi; θ) = Π θ(1 − θ)^{xi} = θ^n (1 − θ)^t   for 0 < θ < 1

where t = Σ xi. The log likelihood function is

l(θ) = log L(θ) = n log θ + t log(1 − θ)   for 0 < θ < 1

The score function is

S(θ) = l'(θ) = n/θ − t/(1 − θ) = [ n − (n + t)θ ] / [ θ(1 − θ) ]   for 0 < θ < 1

S(θ) = 0 for θ = n/(n + t). Since S(θ) > 0 for 0 < θ < n/(n + t) and S(θ) < 0 for n/(n + t) < θ < 1, therefore by the first derivative test the maximum likelihood estimate of θ is

θ̂ = n/(n + t)

and the maximum likelihood estimator is

θ̃ = n/(n + T)   where T = Σ Xi

The information function is

I(θ) = −S'(θ) = −l''(θ) = n/θ² + t/(1 − θ)²   for 0 < θ < 1

Since I(θ) > 0 for all 0 < θ < 1, the graph of l(θ) is concave down and this also confirms that θ̂ = n/(n + t) is the maximum likelihood estimate.
3(b) The observed information is

I(θ̂) = n/θ̂² + t/(1 − θ̂)² = n/(n/(n + t))² + t/(t/(n + t))² = (n + t)²/n + (n + t)²/t = (n + t)³/(nt) = n/[ θ̂²(1 − θ̂) ]

The expected information is

J(θ) = E[ n/θ² + T/(1 − θ)² ] = n/θ² + E(T)/(1 − θ)² = n/θ² + [ n(1 − θ)/θ ]/(1 − θ)²
     = n [ 1/θ² + 1/(θ(1 − θ)) ] = n/[ θ²(1 − θ) ]   for 0 < θ < 1

3(c) Since

μ = E(Xi) = (1 − θ)/θ

then by the invariance property of maximum likelihood estimators the maximum likelihood estimator of μ = E(Xi) is

μ̃ = (1 − θ̃)/θ̃ = T/n = X̄

3(d) If n = 20 and t = Σ xi = 40 then the maximum likelihood estimate of θ is

θ̂ = 20/(20 + 40) = 1/3

The relative likelihood function of θ is given by

R(θ) = L(θ)/L(θ̂) = θ^{20}(1 − θ)^{40} / [ (1/3)^{20}(2/3)^{40} ]   for 0 ≤ θ ≤ 1

A graph of R(θ) is given in Figure 10.25. A 15% likelihood interval is found by solving R(θ) = 0.15; a numerical version of this calculation is sketched after the figure below. The 15% likelihood interval is [0.2234, 0.4570].

R(0.5) = 0.03344 implies that θ = 0.5 is outside a 10% likelihood interval and we would conclude that θ = 0.5 is not a very plausible value of θ given the data.
Figure 10.25: Relative likelihood function R(θ) for Problem 3
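The endpoints of the 15% likelihood interval can be found numerically in R with uniroot; this sketch is not part of the original solution, and the bracketing endpoints 0.01 and 0.99 are arbitrary choices.

# Solving R(theta) = 0.15 numerically for Problem 3(d)
n <- 20; t <- 40; thetahat <- n/(n + t)
R <- function(theta) theta^n*(1 - theta)^t/(thetahat^n*(1 - thetahat)^t)
lower <- uniroot(function(theta) R(theta) - 0.15, c(0.01, thetahat))$root
upper <- uniroot(function(theta) R(theta) - 0.15, c(thetahat, 0.99))$root
c(lower, upper)   # approximately 0.2234 and 0.4570
R(0.5)            # approximately 0.0334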
4. Since (X1, X2, X3) ~ Multinomial(n; θ², 2θ(1 − θ), (1 − θ)²), the likelihood function is

L(θ) = [ n!/(x1! x2! (n − x1 − x2)!) ] (θ²)^{x1} [2θ(1 − θ)]^{x2} [(1 − θ)²]^{n − x1 − x2}
     = [ n!/(x1! x2! (n − x1 − x2)!) ] 2^{x2} θ^{2x1 + x2} (1 − θ)^{2n − 2x1 − x2}   for 0 < θ < 1

The log likelihood function is

l(θ) = log[ n!/(x1! x2! (n − x1 − x2)!) ] + x2 log 2 + (2x1 + x2) log θ + (2n − 2x1 − x2) log(1 − θ)   for 0 < θ < 1

The score function is

S(θ) = (2x1 + x2)/θ − [2n − (2x1 + x2)]/(1 − θ)
     = { (2x1 + x2)(1 − θ) − θ[2n − (2x1 + x2)] } / [ θ(1 − θ) ]
     = [ (2x1 + x2) − 2nθ ] / [ θ(1 − θ) ]   for 0 < θ < 1

S(θ) = 0 if θ = (2x1 + x2)/(2n). Since

S(θ) > 0 for 0 < θ < (2x1 + x2)/(2n)   and   S(θ) < 0 for (2x1 + x2)/(2n) < θ < 1

therefore by the first derivative test, l(θ) has an absolute maximum at θ = (2x1 + x2)/(2n). Thus the maximum likelihood estimate of θ is

θ̂ = (2x1 + x2)/(2n)

and the maximum likelihood estimator of θ is

θ̃ = (2X1 + X2)/(2n)

The information function is

I(θ) = (2x1 + x2)/θ² + [2n − (2x1 + x2)]/(1 − θ)²   for 0 < θ < 1

and the observed information is

I(θ̂) = I( (2x1 + x2)/(2n) ) = (2n)²(2x1 + x2)/(2x1 + x2)² + (2n)²[2n − (2x1 + x2)]/[2n − (2x1 + x2)]²
      = (2n)²/(2x1 + x2) + (2n)²/[2n − (2x1 + x2)]
      = 2n / [ θ̂(1 − θ̂) ]

Since

X1 ~ Binomial(n, θ²)   and   X2 ~ Binomial(n, 2θ(1 − θ))

E(2X1 + X2) = 2nθ² + n[2θ(1 − θ)] = 2nθ

The expected information is

J(θ) = E[ (2X1 + X2)/θ² + (2n − (2X1 + X2))/(1 − θ)² ] = 2nθ/θ² + (2n − 2nθ)/(1 − θ)²
     = 2n [ 1/θ + 1/(1 − θ) ] = 2n / [ θ(1 − θ) ]   for 0 < θ < 1
6(a) Given

P(k children in family; θ) = θ^k   for k = 1, 2, ...
P(0 children in family; θ) = (1 − 2θ)/(1 − θ)   for 0 < θ < 1/2

and the observed data

No. of children       0    1    2   3   4   Total
Frequency observed    17   22   7   3   1   50

the appropriate likelihood function for θ is based on the multinomial model:

L(θ) = [ 50!/(17! 22! 7! 3! 1!) ] [ (1 − 2θ)/(1 − θ) ]^{17} θ^{22} (θ²)^7 (θ³)^3 (θ⁴)^1
     = [ 50!/(17! 22! 7! 3! 1!) ] [ (1 − 2θ)/(1 − θ) ]^{17} θ^{49}   for 0 < θ < 1/2

or more simply

L(θ) = [ (1 − 2θ)/(1 − θ) ]^{17} θ^{49}   for 0 < θ < 1/2

The log likelihood function is

l(θ) = 17 log(1 − 2θ) − 17 log(1 − θ) + 49 log θ   for 0 < θ < 1/2

The score function is

S(θ) = −34/(1 − 2θ) + 17/(1 − θ) + 49/θ
     = [ −34θ(1 − θ) + 17θ(1 − 2θ) + 49(1 − θ)(1 − 2θ) ] / [ θ(1 − θ)(1 − 2θ) ]
     = ( 98θ² − 164θ + 49 ) / [ θ(1 − θ)(1 − 2θ) ]   for 0 < θ < 1/2

The information function is

I(θ) = 68/(1 − 2θ)² − 17/(1 − θ)² + 49/θ²   for 0 < θ < 1/2
6(b) S(θ) = 0 if

98θ² − 164θ + 49 = 0   or   θ² − (82/49)θ + 1/2 = 0

Therefore S(θ) = 0 if

θ = [ 82/49 ± √( (82/49)² − 4(1/2) ) ] / 2 = 41/49 ± (1/2)√( (82/49)² − 2 )
  = 41/49 ± (1/98)√( 82² − 2(49)² ) = 41/49 ± (1/98)√( 6724 − 4802 ) = 41/49 ± (1/98)√1922

Since 0 < θ < 1/2 and θ = 41/49 + (1/98)√1922 > 1, we choose θ = 41/49 − (1/98)√1922. Since

S(θ) > 0 for 0 < θ < 41/49 − (1/98)√1922   and   S(θ) < 0 for 41/49 − (1/98)√1922 < θ < 1/2

therefore the maximum likelihood estimate of θ is

θ̂ = 41/49 − (1/98)√1922 ≈ 0.389381424147286 ≈ 0.3894

The observed information for the given data is

I(θ̂) = 68/(1 − 2θ̂)² − 17/(1 − θ̂)² + 49/θ̂² ≈ 1666.88
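These numbers are easy to verify in R; this check is not part of the original solution.

# Numerical check of the maximum likelihood estimate and observed information in 6(b)
thetahat <- 41/49 - sqrt(1922)/98
thetahat                                            # 0.3893814
S <- function(theta) -34/(1 - 2*theta) + 17/(1 - theta) + 49/theta
S(thetahat)                                         # essentially 0
I <- function(theta) 68/(1 - 2*theta)^2 - 17/(1 - theta)^2 + 49/theta^2
I(thetahat)                                         # approximately 1666.9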
6(c) A graph of the relative likelihood function is given in Figure 10.26.

Figure 10.26: Relative likelihood function R(θ) for Problem 6

A 15% likelihood interval for θ is [0.34, 0.43].

6(d) Since R(0.45) ≈ 0.0097, θ = 0.45 is outside a 1% likelihood interval and therefore θ = 0.45 is not a plausible value of θ for these data.

6(e) The expected frequencies are calculated using

e0 = 50 (1 − 2θ̂)/(1 − θ̂)   and   ek = 50 θ̂^k   for k = 1, 2, ...

The observed and expected frequencies are

No. of children            0       1       2      3      4      Total
Observed frequency fk      17      22      7      3      1      50
Expected frequency ek      18.12   19.47   7.58   2.95   1.15   50

We see that the agreement between observed and expected frequencies is very good and the model gives a reasonable fit to the data.
7. Show θ̂ = x(1) is the maximum likelihood estimate. By Example 2.5.3(a), θ is a location parameter for this distribution. By Theorem 6.6.4, Q(X; θ) = θ̃ − θ = X(1) − θ is a pivotal quantity.

P(Q(X; θ) ≤ q) = P(θ̃ − θ ≤ q) = P(X(1) ≤ q + θ) = 1 − P(X(1) > q + θ)
              = 1 − Π e^{−(q + θ − θ)}   since P(Xi > x) = e^{−(x − θ)} for x > θ
              = 1 − e^{−nq}   for q ≥ 0

Since

P( θ̃ + (1/n) log(1 − p) ≤ θ ≤ θ̃ ) = P( 0 ≤ θ̃ − θ ≤ −(1/n) log(1 − p) )
   = P( 0 ≤ Q(X; θ) ≤ −(1/n) log(1 − p) )
   = P( Q(X; θ) ≤ −(1/n) log(1 − p) ) − P( Q(X; θ) ≤ 0 )
   = [ 1 − e^{log(1 − p)} ] − 0
   = 1 − (1 − p) = p

[ θ̂ + (1/n) log(1 − p), θ̂ ] is a 100p% confidence interval for θ.

Since

P( θ̃ + (1/n) log((1 − p)/2) ≤ θ ≤ θ̃ + (1/n) log((1 + p)/2) )
   = P( −(1/n) log((1 + p)/2) ≤ θ̃ − θ ≤ −(1/n) log((1 − p)/2) )
   = P( −(1/n) log((1 + p)/2) ≤ Q(X; θ) ≤ −(1/n) log((1 − p)/2) )
   = [ 1 − e^{log((1 − p)/2)} ] − [ 1 − e^{log((1 + p)/2)} ]
   = (1/2 + p/2) − (1/2 − p/2) = p

[ θ̂ + (1/n) log((1 − p)/2), θ̂ + (1/n) log((1 + p)/2) ] is a 100p% confidence interval for θ.

The interval [ θ̂ + (1/n) log(1 − p), θ̂ ] is a better choice since it contains θ̂ while the interval [ θ̂ + (1/n) log((1 − p)/2), θ̂ + (1/n) log((1 + p)/2) ] does not.
8(a) If x1, x2, ..., xn is an observed random sample from the Gamma(1/2, 1/θ) distribution then the likelihood function is

L(θ) = Π f(xi; θ) = Π [ θ^{1/2} xi^{−1/2} e^{−θ xi} / Γ(1/2) ]
     = ( Π xi^{−1/2} ) [ Γ(1/2) ]^{−n} θ^{n/2} e^{−θ t}   for θ > 0

where t = Σ xi, or more simply

L(θ) = θ^{n/2} e^{−θ t}   for θ > 0

The log likelihood function is

l(θ) = log L(θ) = (n/2) log θ − θ t   for θ > 0

and the score function is

S(θ) = (d/dθ) l(θ) = n/(2θ) − t = (n − 2θt)/(2θ)   for θ > 0

S(θ) = 0 for θ = n/(2t). Since S(θ) > 0 for 0 < θ < n/(2t) and S(θ) < 0 for θ > n/(2t), therefore by the first derivative test l(θ) has an absolute maximum at θ = n/(2t). Thus

θ̂ = n/(2t) = 1/(2x̄)

is the maximum likelihood estimate of θ and

θ̃ = 1/(2X̄)

is the maximum likelihood estimator of θ.

8(b) If Xi ~ Gamma(1/2, 1/θ), i = 1, 2, ..., n independently, then

E(Xi) = 1/(2θ)   and   Var(Xi) = 1/(2θ²)

and by the Weak Law of Large Numbers X̄ →p 1/(2θ), and by the Limit Theorems

θ̃ = 1/(2X̄) →p 1/(2 · 1/(2θ)) = θ

as required.

8(c) By the Invariance Property of Maximum Likelihood Estimates the maximum likelihood estimate of Var(Xi) = 1/(2θ²) is

1/(2θ̂²) = (1/2)(2t/n)² = 2(t/n)² = 2x̄²

8(d) The moment generating function of Xi is

M(t) = [ 1/(1 − t/θ) ]^{1/2}   for t < θ

The moment generating function of Q = 2θ Σ Xi is

MQ(t) = E[e^{tQ}] = E[ exp( 2θt Σ Xi ) ] = Π E[ exp(2θt Xi) ] = Π M(2θt)
      = [ 1/(1 − 2t) ]^{n/2}   for 2θt < θ
      = 1/(1 − 2t)^{n/2}   for t < 1/2

which is the moment generating function of a χ²(n) random variable. Therefore by the Uniqueness Theorem for Moment Generating Functions, Q ~ χ²(n).

To construct a 95% equal tail confidence interval for θ we find a and b such that P(Q ≤ a) = 0.025 = P(Q > b) so that

P(a < Q < b) = P(a < 2θT < b) = 0.95

or

P( a/(2T) < θ < b/(2T) ) = 0.95

so that [ a/(2t), b/(2t) ] is a 95% equal tail confidence interval for θ. For n = 20 we have

P(Q ≤ 9.59) = 0.025 = P(Q > 34.17)

For t = Σ xi = 6 a 95% equal tail confidence interval for θ is

[ 9.59/(2(6)), 34.17/(2(6)) ] = [0.80, 2.85]

Since θ = 0.7 is not in the 95% confidence interval it is not a plausible value of θ in light of the data.
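The chi-squared quantiles and the resulting interval can be obtained directly in R; this check is not part of the original solution.

# Computing the 95% equal tail confidence interval in 8(d)
n <- 20; t <- 6
a <- qchisq(0.025, n); b <- qchisq(0.975, n)
c(a, b)                # 9.59 and 34.17
c(a/(2*t), b/(2*t))    # the interval [0.80, 2.85]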
8(e) The information function is

I(θ) = −(d/dθ) S(θ) = n/(2θ²)   for θ > 0

The expected information is

J(θ) = E[ I(θ; X1, ..., Xn) ] = E[ n/(2θ²) ] = n/(2θ²)   for θ > 0

and

[ J(θ̃) ]^{1/2} (θ̃ − θ) = [ √n/(√2 θ̃) ](θ̃ − θ) = [ √n/(√2 θ̃) ]( 1/(2X̄) − θ )

By the Central Limit Theorem

√n ( X̄ − 1/(2θ) ) / [ 1/(√2 θ) ] = √(2n) θ ( X̄ − 1/(2θ) ) →D Z ~ N(0, 1)   (10.51)

Let

g(x) = 1/(2x)   and   a = 1/(2θ)

Then

g'(x) = −1/(2x²),   g(a) = g(1/(2θ)) = θ,   g'(a) = g'(1/(2θ)) = −2θ²

By the Delta Theorem and (10.51)

√(2n) ( 1/(2X̄) − θ ) = √(2n) (θ̃ − θ) →D −2θ Z ~ N(0, 4θ²)

By Slutsky's Theorem

[ √(2n)/(2θ) ](θ̃ − θ) →D −Z ~ N(0, 1)

But if Z ~ N(0, 1) then −Z ~ N(0, 1) and thus

[ √n/(√2 θ) ](θ̃ − θ) →D Z ~ N(0, 1)   (10.52)

Since θ̃ →p θ, then by the Limit Theorems

θ/θ̃ →p 1   (10.53)

Thus by (10.52), (10.53) and Slutsky's Theorem

[ J(θ̃) ]^{1/2} (θ̃ − θ) = √n (θ̃ − θ)/(√2 θ̃) = [ √n (θ̃ − θ)/(√2 θ) ]( θ/θ̃ ) →D Z (1) = Z ~ N(0, 1)

An approximate 95% confidence interval is given by

[ θ̂ − 1.96/√(J(θ̂)), θ̂ + 1.96/√(J(θ̂)) ]

For n = 20 and t = Σ xi = 6,

θ̂ = 1/(2(6/20)) = 5/3   and   J(θ̂) = n/(2θ̂²) = 20/[2(5/3)²] = 3.6

and an approximate 95% confidence interval is given by

[ 5/3 − 1.96/√3.6, 5/3 + 1.96/√3.6 ] = [0.63, 2.70]

For n = 20 and t = Σ xi = 6 the relative likelihood function of θ is given by

R(θ) = L(θ)/L(θ̂) = θ^{10} e^{−6θ} / [ (5/3)^{10} e^{−10} ] = (3θ/5)^{10} e^{10 − 6θ}   for θ > 0

A graph of R(θ) is given in Figure 10.27.

Figure 10.27: Relative likelihood function R(θ) for Problem 8

A 15% likelihood interval is found by solving R(θ) = 0.15. The 15% likelihood interval is [0.84, 2.91].

The exact 95% equal tail confidence interval [0.80, 2.85], the approximate 95% confidence interval [0.63, 2.70], and the 15% likelihood interval [0.84, 2.91] are all approximately of the same width. The exact confidence interval and likelihood interval are skewed to the right while the approximate confidence interval is symmetric about the maximum likelihood estimate θ̂ = 5/3. Approximate confidence intervals are symmetric about the maximum likelihood estimate because they are based on a Normal approximation. Since n = 20 the approximation cannot be completely trusted. Therefore for these data the exact confidence interval and the likelihood interval are both better interval estimates for θ.

R(0.7) = 0.056 implies that θ = 0.7 is outside a 10% likelihood interval, so based on the likelihood function we would conclude that θ = 0.7 is not a very plausible value of θ given the data. Previously we noted that θ = 0.7 is also not contained in the exact 95% confidence interval. Note however that θ = 0.7 is contained in the approximate 95% confidence interval, and so based on the approximate confidence interval we would conclude that θ = 0.7 is a reasonable value of θ given the data. Again the reason for the disagreement is that n = 20 is not large enough for the approximation to be a good one.
10.6 Chapter 7
1. Since Xi has cumulative distribution function

F(x; θ1, θ2) = 1 − (θ1/x)^{θ2}   for x ≥ θ1; θ1 > 0, θ2 > 0

the probability density function of Xi is

f(x; θ1, θ2) = (d/dx) F(x; θ1, θ2) = (θ2/x)(θ1/x)^{θ2}   for x ≥ θ1; θ1 > 0, θ2 > 0

The likelihood function is

L(θ1, θ2) = Π f(xi; θ1, θ2) = Π (θ2/xi)(θ1/xi)^{θ2}   if 0 < θ1 ≤ xi, i = 1, 2, ..., n and θ2 > 0
          = θ2^n θ1^{nθ2} ( Π xi )^{−θ2 − 1}   if 0 < θ1 ≤ x(1) and θ2 > 0

For each value of θ2 the likelihood function is maximized over θ1 by taking θ1 to be as large as possible subject to 0 < θ1 ≤ x(1). Therefore for fixed θ2 the likelihood is maximized for θ1 = x(1). Since this is true for all values of θ2, the value of (θ1, θ2) which maximizes L(θ1, θ2) will necessarily have θ1 = x(1).

To find the value of θ2 which maximizes L(x(1), θ2), consider the function

L2(θ2) = L(x(1), θ2) = θ2^n x(1)^{nθ2} ( Π xi )^{−θ2 − 1}   for θ2 > 0

and its logarithm

l2(θ2) = log L2(θ2) = n log θ2 + nθ2 log x(1) − (θ2 + 1) Σ log xi

Now

(d/dθ2) l2(θ2) = l2'(θ2) = n/θ2 + n log x(1) − Σ log xi = n/θ2 − Σ log( xi/x(1) ) = n/θ2 − t = (n − θ2 t)/θ2

where

t = Σ log( xi/x(1) )

Now l2'(θ2) = 0 for θ2 = n/t. Since l2'(θ2) > 0 for 0 < θ2 < n/t and l2'(θ2) < 0 for θ2 > n/t, therefore by the first derivative test l2(θ2) is maximized for θ2 = n/t = θ̂2. Therefore L2(θ2) = L(x(1), θ2) is also maximized for θ2 = θ̂2. Therefore the maximum likelihood estimators of θ1 and θ2 are given by

θ̂1 = X(1)   and   θ̂2 = n / Σ log( Xi/X(1) )
4(a) If the events S and H are independent events then P(S ∩ H) = P(S)P(H) = αβ, P(S ∩ H̄) = P(S)P(H̄) = α(1 − β), etc.

The likelihood function is

L(α, β) = [ n!/(x11! x12! x21! x22!) ] (αβ)^{x11} [α(1 − β)]^{x12} [(1 − α)β]^{x21} [(1 − α)(1 − β)]^{x22}

or more simply (ignoring constants with respect to α and β)

L(α, β) = α^{x11 + x12}(1 − α)^{x21 + x22} β^{x11 + x21}(1 − β)^{x12 + x22}   for 0 ≤ α ≤ 1, 0 ≤ β ≤ 1

The log likelihood is

l(α, β) = (x11 + x12) log α + (x21 + x22) log(1 − α) + (x11 + x21) log β + (x12 + x22) log(1 − β)
   for 0 < α < 1, 0 < β < 1

Since

∂l/∂α = (x11 + x12)/α − (x21 + x22)/(1 − α) = (x11 + x12)/α − [n − (x11 + x12)]/(1 − α) = (x11 + x12 − nα)/[α(1 − α)]
∂l/∂β = (x11 + x21)/β − (x12 + x22)/(1 − β) = (x11 + x21)/β − [n − (x11 + x21)]/(1 − β) = (x11 + x21 − nβ)/[β(1 − β)]

the score vector is

S(α, β) = [ (x11 + x12 − nα)/[α(1 − α)]    (x11 + x21 − nβ)/[β(1 − β)] ]   for 0 < α < 1, 0 < β < 1

Solving S(α, β) = (0, 0) gives the maximum likelihood estimates

α̂ = (x11 + x12)/n   and   β̂ = (x11 + x21)/n

The information matrix is

I(α, β) = [ −∂²l/∂α²    −∂²l/∂α∂β ]
          [ −∂²l/∂β∂α   −∂²l/∂β²  ]
        = [ (x11 + x12)/α² + (n − x11 − x12)/(1 − α)²    0                                           ]
          [ 0                                            (x11 + x21)/β² + (n − x11 − x21)/(1 − β)²   ]

4(b) Since X11 + X12 = the number of times the event S is observed and P(S) = α, the distribution of X11 + X12 is Binomial(n, α). Therefore E(X11 + X12) = nα and

E[ (X11 + X12)/α² + (n − (X11 + X12))/(1 − α)² ] = n/[ α(1 − α) ]

Since X11 + X21 = the number of times the event H is observed and P(H) = β, the distribution of X11 + X21 is Binomial(n, β). Therefore E(X11 + X21) = nβ and

E[ (X11 + X21)/β² + (n − (X11 + X21))/(1 − β)² ] = n/[ β(1 − β) ]

Therefore the expected information matrix is

J(α, β) = [ n/[α(1 − α)]   0             ]
          [ 0              n/[β(1 − β)]  ]

The inverse matrix is

[ J(α, β) ]^{−1} = [ α(1 − α)/n   0            ]
                   [ 0            β(1 − β)/n   ]

Also Var(α̃) = α(1 − α)/n and Var(β̃) = β(1 − β)/n, so the diagonal entries of [ J(α, β) ]^{−1} give us the variances of the maximum likelihood estimators.
7(a) If Yi ~ Binomial(1, pi) where pi = (1 + e^{−α − βxi})^{−1} and the xi are known constants, the likelihood function for (α, β) is

L(α, β) = Π C(1, yi) pi^{yi}(1 − pi)^{1 − yi}

or more simply

L(α, β) = Π pi^{yi}(1 − pi)^{1 − yi}

The log likelihood function is

l(α, β) = log L(α, β) = Σ [ yi log pi + (1 − yi) log(1 − pi) ]

Note that

∂pi/∂α = (∂/∂α)(1 + e^{−α − βxi})^{−1} = e^{−α − βxi}/(1 + e^{−α − βxi})²
       = [ 1/(1 + e^{−α − βxi}) ] [ e^{−α − βxi}/(1 + e^{−α − βxi}) ] = pi(1 − pi)

and

∂pi/∂β = (∂/∂β)(1 + e^{−α − βxi})^{−1} = xi e^{−α − βxi}/(1 + e^{−α − βxi})² = xi pi(1 − pi)

Therefore

∂l/∂α = Σ (∂l/∂pi)(∂pi/∂α) = Σ [ yi/pi − (1 − yi)/(1 − pi) ] (∂pi/∂α)
      = Σ { [ yi(1 − pi) − (1 − yi)pi ] / [ pi(1 − pi) ] } pi(1 − pi)
      = Σ [ yi(1 − pi) − (1 − yi)pi ]
      = Σ (yi − pi)

∂l/∂β = Σ (∂l/∂pi)(∂pi/∂β) = Σ [ yi/pi − (1 − yi)/(1 − pi) ] (∂pi/∂β)
      = Σ { [ yi(1 − pi) − (1 − yi)pi ] / [ pi(1 − pi) ] } xi pi(1 − pi)
      = Σ xi [ yi(1 − pi) − (1 − yi)pi ]
      = Σ xi (yi − pi)

The score vector is

S(α, β) = [ ∂l/∂α   ∂l/∂β ] = [ Σ (yi − pi)    Σ xi(yi − pi) ]

To obtain the expected information we first note that

∂²l/∂α²  = (∂/∂α) Σ (yi − pi) = −Σ ∂pi/∂α = −Σ pi(1 − pi)
∂²l/∂α∂β = (∂/∂β) Σ (yi − pi) = −Σ ∂pi/∂β = −Σ xi pi(1 − pi)
∂²l/∂β²  = (∂/∂β) Σ xi(yi − pi) = −Σ xi ∂pi/∂β = −Σ xi² pi(1 − pi)

The information matrix is

I(α, β) = [ −∂²l/∂α²    −∂²l/∂α∂β ]
          [ −∂²l/∂β∂α   −∂²l/∂β²  ]
        = [ Σ pi(1 − pi)       Σ xi pi(1 − pi)   ]
          [ Σ xi pi(1 − pi)    Σ xi² pi(1 − pi)  ]

which is a constant function of the random variables Y1, Y2, ..., Yn and therefore the expected information is J(α, β) = I(α, β).

7(b) The maximum likelihood estimates of α and β are found by solving the equations

S(α, β) = [ Σ (yi − pi)    Σ xi(yi − pi) ] = [ 0   0 ]

which must be done numerically. Newton's method is given by

[ α^{(i+1)}   β^{(i+1)} ] = [ α^{(i)}   β^{(i)} ] + S(α^{(i)}, β^{(i)}) [ I(α^{(i)}, β^{(i)}) ]^{−1},   i = 0, 1, ..., until convergence

where (α^{(0)}, β^{(0)}) is an initial estimate of (α, β).
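A minimal sketch of this Newton iteration in R, applied to simulated data, is given below; it is not part of the original solution, and the simulated data x, y, the true values α = 0.5, β = 1.2, and the starting value (0, 0) are arbitrary choices.

# Newton's method for the logistic model in 7(b)
set.seed(330)
n <- 100; x <- rnorm(n)
y <- rbinom(n, 1, 1/(1 + exp(-(0.5 + 1.2*x))))
theta <- c(0, 0)                                   # (alpha(0), beta(0))
for (i in 1:25) {
  p <- 1/(1 + exp(-(theta[1] + theta[2]*x)))
  S <- c(sum(y - p), sum(x*(y - p)))               # score vector
  I <- matrix(c(sum(p*(1 - p)),   sum(x*p*(1 - p)),
                sum(x*p*(1 - p)), sum(x^2*p*(1 - p))), nrow = 2)   # information matrix
  step <- solve(I, S)                              # I^{-1} S
  theta <- theta + step
  if (max(abs(step)) < 1e-8) break
}
theta
coef(glm(y ~ x, family = binomial))                # agrees with R's built-in logistic fit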
10.7 Chapter 8
1(a) The hypothesis H0: θ = θ0 is a simple hypothesis since the model is completely specified.

From Example 6.3.6 the likelihood function is

L(θ) = θ^n ( Π xi )^{θ − 1}   for θ > 0

The log likelihood function is

l(θ) = n log θ + (θ − 1) Σ log xi   for θ > 0

and the maximum likelihood estimate is

θ̂ = −n / Σ log xi

The relative likelihood function is

R(θ) = L(θ)/L(θ̂) = (θ/θ̂)^n ( Π xi )^{θ − θ̂}   for θ ≥ 0

The likelihood ratio test statistic for H0: θ = θ0 is

Λ(θ0; X) = −2 log R(θ0; X)
         = −2 log[ (θ0/θ̃)^n ( Π Xi )^{θ0 − θ̃} ]
         = −2[ n log(θ0/θ̃) + (θ0 − θ̃) Σ log Xi ]
         = −2[ n log(θ0/θ̃) + n(θ0 − θ̃)(1/n) Σ log Xi ]
         = −2n[ log(θ0/θ̃) − (θ0 − θ̃)(1/θ̃) ]   since (1/n) Σ log Xi = −1/θ̃
         = 2n[ (θ0/θ̃ − 1) − log(θ0/θ̃) ]

The observed value of the likelihood ratio test statistic is

λ(θ0; x) = −2 log R(θ0; x) = 2n[ (θ0/θ̂ − 1) − log(θ0/θ̂) ]

The parameter space is Ω = {θ : θ > 0} which has dimension 1 and thus k = 1. The approximate p-value is

p-value ≈ P(W ≥ λ(θ0; x))   where W ~ χ²(1)
        = 2[ 1 − P(Z ≤ √λ(θ0; x)) ]   where Z ~ N(0, 1)

(b) If n = 20 and Σ log xi = −25 and H0: θ = 1, then θ̂ = 20/25 = 0.8 and the observed value of the likelihood ratio test statistic is

λ(θ0; x) = 2(20)[ (1/0.8 − 1) − log(1/0.8) ] = 40[ 0.25 − log(1.25) ] = 1.074258

and

p-value ≈ P(W ≥ 1.074258)   where W ~ χ²(1)
        = 2[ 1 − P(Z ≤ √1.074258) ]   where Z ~ N(0, 1)
        = 0.2999857

calculated using R. Since p-value > 0.1 there is no evidence against H0: θ = 1 based on the data.
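The R calculation referred to above is a one-liner; the code below reproduces it and is not part of the original solution.

# The p-value in 1(b)
lambda <- 40*(0.25 - log(1.25))
lambda                          # 1.074258
1 - pchisq(lambda, 1)           # 0.2999857
2*(1 - pnorm(sqrt(lambda)))     # the same value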
4. Since Ω = {(θ1, θ2) : θ1 > 0, θ2 > 0} has dimension k = 2 and Ω0 = {(θ1, θ2) : θ1 = θ2, θ1 > 0, θ2 > 0} has dimension q = 1, the hypothesis is composite.

From Example 6.5.2 the likelihood function for an observed random sample x1, x2, ..., xn from a Weibull(2, θ1) distribution is

L1(θ1) = θ1^{−2n} exp[ −(1/θ1²) Σ xi² ]   for θ1 > 0

with maximum likelihood estimate θ̂1 = [ (1/n) Σ xi² ]^{1/2}. Similarly the likelihood function for an observed random sample y1, y2, ..., ym from a Weibull(2, θ2) distribution is

L2(θ2) = θ2^{−2m} exp[ −(1/θ2²) Σ yi² ]   for θ2 > 0

with maximum likelihood estimate θ̂2 = [ (1/m) Σ yi² ]^{1/2}.

Since the samples are independent, the likelihood function for (θ1, θ2) is

L(θ1, θ2) = L1(θ1) L2(θ2)   for θ1 > 0, θ2 > 0

and the log likelihood function is

l(θ1, θ2) = −2n log θ1 − (1/θ1²) Σ xi² − 2m log θ2 − (1/θ2²) Σ yi²   for θ1 > 0, θ2 > 0

The independence of the samples implies the maximum likelihood estimators are

θ̃1 = [ (1/n) Σ Xi² ]^{1/2}   and   θ̃2 = [ (1/m) Σ Yi² ]^{1/2}

Therefore

l(θ̃1, θ̃2; X, Y) = −n log[ (1/n) Σ Xi² ] − m log[ (1/m) Σ Yi² ] − (n + m)

If θ1 = θ2 = θ then the log likelihood function is

l(θ) = −2(n + m) log θ − (1/θ²)[ Σ xi² + Σ yi² ]   for θ > 0

which is only a function of θ. To determine max_{(θ1,θ2) ∈ Ω0} l(θ1, θ2; X, Y) we note that

(d/dθ) l(θ) = −2(n + m)/θ + (2/θ³)[ Σ xi² + Σ yi² ]

and (d/dθ) l(θ) = 0 for

θ = { [ 1/(n + m) ][ Σ xi² + Σ yi² ] }^{1/2}

and therefore

max_{(θ1,θ2) ∈ Ω0} l(θ1, θ2; X, Y) = −(n + m) log{ [ 1/(n + m) ][ Σ Xi² + Σ Yi² ] } − (n + m)

The likelihood ratio test statistic is

Λ(X, Y; Ω0) = 2[ l(θ̃1, θ̃2; X, Y) − max_{(θ1,θ2) ∈ Ω0} l(θ1, θ2; X, Y) ]
            = 2{ (n + m) log[ (1/(n + m))( Σ Xi² + Σ Yi² ) ] − n log[ (1/n) Σ Xi² ] − m log[ (1/m) Σ Yi² ] }

with corresponding observed value

λ(x, y; Ω0) = 2{ (n + m) log[ (1/(n + m))( Σ xi² + Σ yi² ) ] − n log[ (1/n) Σ xi² ] − m log[ (1/m) Σ yi² ] }

Since k − q = 2 − 1 = 1,

p-value ≈ P[ W ≥ λ(x, y; Ω0) ]   where W ~ χ²(1)
        = 2[ 1 − P(Z ≤ √λ(x, y; Ω0)) ]   where Z ~ N(0, 1)
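A minimal sketch of this test in R, applied to simulated Weibull(2, θ) samples, is shown below; it is not part of the original solution, and the sample sizes, scale values and seed are arbitrary choices.

# Likelihood ratio test of H0: theta1 = theta2 for two Weibull(2, theta) samples
set.seed(330)
x <- rweibull(25, shape = 2, scale = 1.0)
y <- rweibull(30, shape = 2, scale = 1.3)
weibull.lrt <- function(x, y) {
  n <- length(x); m <- length(y)
  lambda <- 2*((n + m)*log((sum(x^2) + sum(y^2))/(n + m)) -
               n*log(mean(x^2)) - m*log(mean(y^2)))
  c(lambda = lambda, p.value = 1 - pchisq(lambda, 1))
}
weibull.lrt(x, y)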
11. Summary of Named Distributions
Summary of Discrete Distributions

Discrete Uniform(a, b): a, b integers, b ≥ a
  f(x) = 1/(b − a + 1), x = a, a + 1, ..., b
  E(X) = (a + b)/2;  Var(X) = [(b − a + 1)² − 1]/12
  M(t) = [1/(b − a + 1)] Σ_{x=a}^{b} e^{tx}, t ∈ ℝ

Hypergeometric(N, r, n): N = 1, 2, ...; n = 0, 1, ..., N; r = 0, 1, ..., N
  f(x) = C(r, x) C(N − r, n − x) / C(N, n), x = max(0, n − N + r), ..., min(r, n)
  E(X) = nr/N;  Var(X) = (nr/N)(1 − r/N)(N − n)/(N − 1)
  M(t) not tractable

Binomial(n, p): 0 ≤ p ≤ 1, q = 1 − p, n = 1, 2, ...
  f(x) = C(n, x) p^x q^{n−x}, x = 0, 1, ..., n
  E(X) = np;  Var(X) = npq;  M(t) = (pe^t + q)^n, t ∈ ℝ

Bernoulli(p): 0 ≤ p ≤ 1, q = 1 − p
  f(x) = p^x q^{1−x}, x = 0, 1
  E(X) = p;  Var(X) = pq;  M(t) = pe^t + q, t ∈ ℝ

Negative Binomial(k, p): 0 < p ≤ 1, q = 1 − p, k = 1, 2, ...
  f(x) = C(x + k − 1, x) p^k q^x = C(−k, x) p^k (−q)^x, x = 0, 1, ...
  E(X) = kq/p;  Var(X) = kq/p²;  M(t) = [p/(1 − qe^t)]^k, t < −ln q

Geometric(p): 0 < p ≤ 1, q = 1 − p
  f(x) = pq^x, x = 0, 1, ...
  E(X) = q/p;  Var(X) = q/p²;  M(t) = p/(1 − qe^t), t < −ln q

Poisson(θ): θ ≥ 0
  f(x) = θ^x e^{−θ}/x!, x = 0, 1, ...
  E(X) = θ;  Var(X) = θ;  M(t) = e^{θ(e^t − 1)}, t ∈ ℝ

Multinomial(n; p1, p2, ..., pk): 0 ≤ pi ≤ 1, i = 1, 2, ..., k, and Σ_{i=1}^{k} pi = 1
  f(x1, x2, ..., xk) = [n!/(x1! x2! ··· xk!)] p1^{x1} p2^{x2} ··· pk^{xk}, xi = 0, 1, ..., n, i = 1, 2, ..., k, with Σ_{i=1}^{k} xi = n
  E(Xi) = npi;  Var(Xi) = npi(1 − pi), i = 1, 2, ..., k
  M(t1, t2, ..., t_{k−1}) = (p1 e^{t1} + p2 e^{t2} + ··· + p_{k−1} e^{t_{k−1}} + pk)^n, ti ∈ ℝ, i = 1, 2, ..., k − 1
Summary of Continuous Distributions

Uniform(a, b): b > a
  f(x) = 1/(b − a), a < x < b
  E(X) = (a + b)/2;  Var(X) = (b − a)²/12
  M(t) = (e^{bt} − e^{at})/[(b − a)t] for t ≠ 0, M(0) = 1

Beta(a, b): a > 0, b > 0
  f(x) = [Γ(a + b)/(Γ(a)Γ(b))] x^{a−1}(1 − x)^{b−1}, 0 < x < 1, where Γ(α) = ∫_0^∞ x^{α−1} e^{−x} dx
  E(X) = a/(a + b);  Var(X) = ab/[(a + b + 1)(a + b)²]
  M(t) = 1 + Σ_{k=1}^{∞} [ Π_{i=0}^{k−1} (a + i)/(a + b + i) ] t^k/k!, t ∈ ℝ

N(μ, σ²): μ ∈ ℝ, σ² > 0
  f(x) = e^{−(x−μ)²/(2σ²)}/(σ√(2π)), x ∈ ℝ
  E(X) = μ;  Var(X) = σ²;  M(t) = e^{μt + σ²t²/2}, t ∈ ℝ

Lognormal(μ, σ²): μ ∈ ℝ, σ² > 0
  f(x) = e^{−(log x − μ)²/(2σ²)}/(σx√(2π)), x > 0
  E(X) = e^{μ + σ²/2};  Var(X) = e^{2μ + 2σ²} − e^{2μ + σ²};  M(t) does not exist

Exponential(θ): θ > 0
  f(x) = (1/θ)e^{−x/θ}, x ≥ 0
  E(X) = θ;  Var(X) = θ²;  M(t) = 1/(1 − θt), t < 1/θ

Two Parameter Exponential(μ, θ): μ ∈ ℝ, θ > 0
  f(x) = (1/θ)e^{−(x−μ)/θ}, x ≥ μ
  E(X) = μ + θ;  Var(X) = θ²;  M(t) = e^{μt}/(1 − θt), t < 1/θ

Double Exponential(μ, θ): μ ∈ ℝ, θ > 0
  f(x) = (1/(2θ))e^{−|x−μ|/θ}, x ∈ ℝ
  E(X) = μ;  Var(X) = 2θ²;  M(t) = e^{μt}/(1 − θ²t²), |t| < 1/θ

Extreme Value(μ, θ): μ ∈ ℝ, θ > 0
  f(x) = (1/θ)e^{(x−μ)/θ − e^{(x−μ)/θ}}, x ∈ ℝ
  E(X) = μ − θγ, where γ ≈ 0.5772 is Euler's constant;  Var(X) = θ²π²/6
  M(t) = e^{μt} Γ(1 + θt), t > −1/θ

Gamma(α, β): α > 0, β > 0
  f(x) = x^{α−1}e^{−x/β}/[Γ(α)β^α], x > 0
  E(X) = αβ;  Var(X) = αβ²;  M(t) = (1 − βt)^{−α}, t < 1/β

Inverse Gamma(α, β): α > 0, β > 0
  f(x) = x^{−α−1}e^{−1/(βx)}/[Γ(α)β^α], x > 0
  E(X) = 1/[β(α − 1)] for α > 1;  Var(X) = 1/[β²(α − 1)²(α − 2)] for α > 2;  M(t) does not exist

χ²(k): k = 1, 2, ...
  f(x) = x^{k/2−1}e^{−x/2}/[2^{k/2}Γ(k/2)], x > 0
  E(X) = k;  Var(X) = 2k;  M(t) = (1 − 2t)^{−k/2}, t < 1/2

Weibull(β, θ): β > 0, θ > 0
  f(x) = (β/θ^β)x^{β−1}e^{−(x/θ)^β}, x > 0
  E(X) = θΓ(1 + 1/β);  Var(X) = θ²Γ(1 + 2/β) − θ²Γ²(1 + 1/β);  M(t) not tractable

Pareto(α, β): α > 0, β > 0
  f(x) = βα^β/x^{β+1}, x ≥ α
  E(X) = βα/(β − 1) for β > 1;  Var(X) = βα²/[(β − 1)²(β − 2)] for β > 2;  M(t) does not exist

Logistic(μ, θ): μ ∈ ℝ, θ > 0
  f(x) = e^{−(x−μ)/θ}/[θ(1 + e^{−(x−μ)/θ})²], x ∈ ℝ
  E(X) = μ;  Var(X) = θ²π²/3;  M(t) = e^{μt}Γ(1 − θt)Γ(1 + θt)

Cauchy(μ, θ): μ ∈ ℝ, θ > 0
  f(x) = 1/{θπ[1 + ((x − μ)/θ)²]}, x ∈ ℝ
  E(X), Var(X) and M(t) do not exist

t(k): k = 1, 2, ...
  f(x) = [Γ((k + 1)/2)/(√(kπ)Γ(k/2))] (1 + x²/k)^{−(k+1)/2}, x ∈ ℝ
  E(X) = 0 for k = 2, 3, ...;  Var(X) = k/(k − 2) for k = 3, 4, ...;  M(t) does not exist

F(k1, k2): k1 = 1, 2, ...; k2 = 1, 2, ...
  f(x) = (k1/k2)^{k1/2} [Γ((k1 + k2)/2)/(Γ(k1/2)Γ(k2/2))] x^{k1/2 − 1}(1 + k1x/k2)^{−(k1+k2)/2}, x > 0
  E(X) = k2/(k2 − 2) for k2 > 2;  Var(X) = 2k2²(k1 + k2 − 2)/[k1(k2 − 2)²(k2 − 4)] for k2 > 4;  M(t) does not exist

X = (X1, X2) ~ BVN(μ, Σ): μ = (μ1, μ2) with μ1 ∈ ℝ, μ2 ∈ ℝ;  Σ = [ σ1², ρσ1σ2; ρσ1σ2, σ2² ] with σ1 > 0, σ2 > 0, −1 < ρ < 1
  f(x1, x2) = [1/(2π|Σ|^{1/2})] e^{−(1/2)(x − μ)Σ^{−1}(x − μ)^T}, x1 ∈ ℝ, x2 ∈ ℝ
  E(X) = μ;  Var(X) = Σ;  M(t1, t2) = e^{μ^T t + (1/2)t^T Σ t}, t1 ∈ ℝ, t2 ∈ ℝ
12. Distribution Tables
N(0,1) Cumulative Distribution Function
This table gives values of F(x) = P(X ≤ x) for X ~ N(0,1) and x ≥ 0

x     0.00     0.01     0.02     0.03     0.04     0.05     0.06     0.07     0.08     0.09
0.0  0.50000  0.50399  0.50798  0.51197  0.51595  0.51994  0.52392  0.52790  0.53188  0.53586 
0.1  0.53983  0.54380  0.54776  0.55172  0.55567  0.55962  0.56356  0.56749  0.57142  0.57535 
0.2  0.57926  0.58317  0.58706  0.59095  0.59483  0.59871  0.60257  0.60642  0.61026  0.61409 
0.3  0.61791  0.62172  0.62552  0.62930  0.63307  0.63683  0.64058  0.64431  0.64803  0.65173 
0.4  0.65542  0.65910  0.66276  0.66640  0.67003  0.67364  0.67724  0.68082  0.68439  0.68793 
0.5  0.69146  0.69497  0.69847  0.70194  0.70540  0.70884  0.71226  0.71566  0.71904  0.72240 
0.6  0.72575  0.72907  0.73237  0.73565  0.73891  0.74215  0.74537  0.74857  0.75175  0.75490 
0.7  0.75804  0.76115  0.76424  0.76730  0.77035  0.77337  0.77637  0.77935  0.78230  0.78524 
0.8  0.78814  0.79103  0.79389  0.79673  0.79955  0.80234  0.80511  0.80785  0.81057  0.81327 
0.9  0.81594  0.81859  0.82121  0.82381  0.82639  0.82894  0.83147  0.83398  0.83646  0.83891 
1.0  0.84134  0.84375  0.84614  0.84849  0.85083  0.85314  0.85543  0.85769  0.85993  0.86214 
1.1  0.86433  0.86650  0.86864  0.87076  0.87286  0.87493  0.87698  0.87900  0.88100  0.88298 
1.2  0.88493  0.88686  0.88877  0.89065  0.89251  0.89435  0.89617  0.89796  0.89973  0.90147 
1.3  0.90320  0.90490  0.90658  0.90824  0.90988  0.91149  0.91309  0.91466  0.91621  0.91774 
1.4  0.91924  0.92073  0.92220  0.92364  0.92507  0.92647  0.92785  0.92922  0.93056  0.93189 
1.5  0.93319  0.93448  0.93574  0.93699  0.93822  0.93943  0.94062  0.94179  0.94295  0.94408 
1.6  0.94520  0.94630  0.94738  0.94845  0.94950  0.95053  0.95154  0.95254  0.95352  0.95449 
1.7  0.95543  0.95637  0.95728  0.95818  0.95907  0.95994  0.96080  0.96164  0.96246  0.96327 
1.8  0.96407  0.96485  0.96562  0.96638  0.96712  0.96784  0.96856  0.96926  0.96995  0.97062 
1.9  0.97128  0.97193  0.97257  0.97320  0.97381  0.97441  0.97500  0.97558  0.97615  0.97670 
2.0  0.97725  0.97778  0.97831  0.97882  0.97932  0.97982  0.98030  0.98077  0.98124  0.98169 
2.1  0.98214  0.98257  0.98300  0.98341  0.98382  0.98422  0.98461  0.98500  0.98537  0.98574 
2.2  0.98610  0.98645  0.98679  0.98713  0.98745  0.98778  0.98809  0.98840  0.98870  0.98899 
2.3  0.98928  0.98956  0.98983  0.99010  0.99036  0.99061  0.99086  0.99111  0.99134  0.99158 
2.4  0.99180  0.99202  0.99224  0.99245  0.99266  0.99286  0.99305  0.99324  0.99343  0.99361 
2.5  0.99379  0.99396  0.99413  0.99430  0.99446  0.99461  0.99477  0.99492  0.99506  0.99520 
2.6  0.99534  0.99547  0.99560  0.99573  0.99585  0.99598  0.99609  0.99621  0.99632  0.99643 
2.7  0.99653  0.99664  0.99674  0.99683  0.99693  0.99702  0.99711  0.99720  0.99728  0.99736 
2.8  0.99744  0.99752  0.99760  0.99767  0.99774  0.99781  0.99788  0.99795  0.99801  0.99807 
2.9  0.99813  0.99819  0.99825  0.99831  0.99836  0.99841  0.99846  0.99851  0.99856  0.99861 
3.0  0.99865  0.99869  0.99874  0.99878  0.99882  0.99886  0.99889  0.99893  0.99896  0.99900 
3.1  0.99903  0.99906  0.99910  0.99913  0.99916  0.99918  0.99921  0.99924  0.99926  0.99929 
3.2  0.99931  0.99934  0.99936  0.99938  0.99940  0.99942  0.99944  0.99946  0.99948  0.99950 
3.3  0.99952  0.99953  0.99955  0.99957  0.99958  0.99960  0.99961  0.99962  0.99964  0.99965 
3.4  0.99966  0.99968  0.99969  0.99970  0.99971  0.99972  0.99973  0.99974  0.99975  0.99976 
3.5  0.99977  0.99978  0.99978  0.99979  0.99980  0.99981  0.99981  0.99982  0.99983  0.99983 
N(0,1) Quantiles: This table gives values of F⁻¹(p) for p ≥ 0.5

p    0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.075   0.08    0.09    0.095
0.5  0.0000  0.0251  0.0502  0.0753  0.1004  0.1257  0.1510  0.1764  0.1891  0.2019  0.2275  0.2404 
0.6  0.2533  0.2793  0.3055  0.3319  0.3585  0.3853  0.4125  0.4399  0.4538  0.4677  0.4959  0.5101 
0.7  0.5244  0.5534  0.5828  0.6128  0.6433  0.6745  0.7063  0.7388  0.7554  0.7722  0.8064  0.8239 
0.8  0.8416  0.8779  0.9154  0.9542  0.9945  1.0364  1.0803  1.1264  1.1503  1.1750  1.2265  1.2536 
0.9  1.2816  1.3408  1.4051  1.4758  1.5548  1.6449  1.7507  1.8808  1.9600  2.0537  2.3263  2.5758 
Chi-Squared Quantiles
This table gives values of x for p = P(X ≤ x) =  F(x)
df\p 0.005 0.01 0.025 0.05 0.1 0.9 0.95 0.975 0.99 0.995
1 0.000 0.000 0.001 0.004 0.016 2.706 3.842 5.024 6.635 7.879
2 0.010 0.020 0.051 0.103 0.211 4.605 5.992 7.378 9.210 10.597
3 0.072 0.115 0.216 0.352 0.584 6.251 7.815 9.348 11.345 12.838
4 0.207 0.297 0.484 0.711 1.064 7.779 9.488 11.143 13.277 14.860
5 0.412 0.554 0.831 1.146 1.610 9.236 11.070 12.833 15.086 16.750
6 0.676 0.872 1.237 1.635 2.204 10.645 12.592 14.449 16.812 18.548
7 0.989 1.239 1.690 2.167 2.833 12.017 14.067 16.013 18.475 20.278
8 1.344 1.647 2.180 2.733 3.490 13.362 15.507 17.535 20.090 21.955
9 1.735 2.088 2.700 3.325 4.168 14.684 16.919 19.023 21.666 23.589
10 2.156 2.558 3.247 3.940 4.865 15.987 18.307 20.483 23.209 25.188
11 2.603 3.054 3.816 4.575 5.578 17.275 19.675 21.920 24.725 26.757
12 3.074 3.571 4.404 5.226 6.304 18.549 21.026 23.337 26.217 28.300
13 3.565 4.107 5.009 5.892 7.042 19.812 22.362 24.736 27.688 29.819
14 4.075 4.660 5.629 6.571 7.790 21.064 23.685 26.119 29.141 31.319
15 4.601 5.229 6.262 7.261 8.547 22.307 24.996 27.488 30.578 32.801
16 5.142 5.812 6.908 7.962 9.312 23.542 26.296 28.845 32.000 34.267
17 5.697 6.408 7.564 8.672 10.085 24.769 27.587 30.191 33.409 35.718
18 6.265 7.015 8.231 9.391 10.865 25.989 28.869 31.526 34.805 37.156
19 6.844 7.633 8.907 10.117 11.651 27.204 30.144 32.852 36.191 38.582
20 7.434 8.260 9.591 10.851 12.443 28.412 31.410 34.170 37.566 39.997
21 8.034 8.897 10.283 11.591 13.240 29.615 32.671 35.479 38.932 41.401
22 8.643 9.542 10.982 12.338 14.041 30.813 33.924 36.781 40.289 42.796
23 9.260 10.196 11.689 13.091 14.848 32.007 35.172 38.076 41.638 44.181
24 9.886 10.856 12.401 13.848 15.659 33.196 36.415 39.364 42.980 45.559
25 10.520 11.524 13.120 14.611 16.473 34.382 37.652 40.646 44.314 46.928
26 11.160 12.198 13.844 15.379 17.292 35.563 38.885 41.923 45.642 48.290
27 11.808 12.879 14.573 16.151 18.114 36.741 40.113 43.195 46.963 49.645
28 12.461 13.565 15.308 16.928 18.939 37.916 41.337 44.461 48.278 50.993
29 13.121 14.256 16.047 17.708 19.768 39.087 42.557 45.722 49.588 52.336
30 13.787 14.953 16.791 18.493 20.599 40.256 43.773 46.979 50.892 53.672
40 20.707 22.164 24.433 26.509 29.051 51.805 55.758 59.342 63.691 66.766
50 27.991 29.707 32.357 34.764 37.689 63.167 67.505 71.420 76.154 79.490
60 35.534 37.485 40.482 43.188 46.459 74.397 79.082 83.298 88.379 91.952
70 43.275 45.442 48.758 51.739 55.329 85.527 90.531 95.023 100.430 104.210
80 51.172 53.540 57.153 60.391 64.278 96.578 101.880 106.630 112.330 116.320
90 59.196 61.754 65.647 69.126 73.291 107.570 113.150 118.140 124.120 128.300
100 67.328 70.065 74.222 77.929 82.358 118.500 124.340 129.560 135.810 140.170
Student t Quantiles
This table gives values of x for p = P(X ≤ x) = F(x), for p ≥ 0.6
df \ p 0.6 0.7 0.8 0.9 0.95 0.975 0.99 0.995 0.999 0.9995
1 0.3249 0.7265 1.3764 3.0777 6.3138 12.7062 31.8205 63.6567 318.3088 636.6192
2 0.2887 0.6172 1.0607 1.8856 2.9200 4.3027 6.9646 9.9248 22.3271 31.5991
3 0.2767 0.5844 0.9785 1.6377 2.3534 3.1824 4.5407 5.8409 10.2145 12.9240
4 0.2707 0.5686 0.9410 1.5332 2.1318 2.7764 3.7469 4.6041 7.1732 8.6103
5 0.2672 0.5594 0.9195 1.4759 2.0150 2.5706 3.3649 4.0321 5.8934 6.8688
6 0.2648 0.5534 0.9057 1.4398 1.9432 2.4469 3.1427 3.7074 5.2076 5.9588
7 0.2632 0.5491 0.8960 1.4149 1.8946 2.3646 2.9980 3.4995 4.7853 5.4079
8 0.2619 0.5459 0.8889 1.3968 1.8595 2.3060 2.8965 3.3554 4.5008 5.0413
9 0.2610 0.5435 0.8834 1.3830 1.8331 2.2622 2.8214 3.2498 4.2968 4.7809
10 0.2602 0.5415 0.8791 1.3722 1.8125 2.2281 2.7638 3.1693 4.1437 4.5869
11 0.2596 0.5399 0.8755 1.3634 1.7959 2.2010 2.7181 3.1058 4.0247 4.4370
12 0.2590 0.5386 0.8726 1.3562 1.7823 2.1788 2.6810 3.0545 3.9296 4.3178
13 0.2586 0.5375 0.8702 1.3502 1.7709 2.1604 2.6503 3.0123 3.8520 4.2208
14 0.2582 0.5366 0.8681 1.3450 1.7613 2.1448 2.6245 2.9768 3.7874 4.1405
15 0.2579 0.5357 0.8662 1.3406 1.7531 2.1314 2.6025 2.9467 3.7328 4.0728
16 0.2576 0.5350 0.8647 1.3368 1.7459 2.1199 2.5835 2.9208 3.6862 4.0150
17 0.2573 0.5344 0.8633 1.3334 1.7396 2.1098 2.5669 2.8982 3.6458 3.9651
18 0.2571 0.5338 0.8620 1.3304 1.7341 2.1009 2.5524 2.8784 3.6105 3.9216
19 0.2569 0.5333 0.8610 1.3277 1.7291 2.0930 2.5395 2.8609 3.5794 3.8834
20 0.2567 0.5329 0.8600 1.3253 1.7247 2.0860 2.5280 2.8453 3.5518 3.8495
21 0.2566 0.5325 0.8591 1.3232 1.7207 2.0796 2.5176 2.8314 3.5272 3.8193
22 0.2564 0.5321 0.8583 1.3212 1.7171 2.0739 2.5083 2.8188 3.5050 3.7921
23 0.2563 0.5317 0.8575 1.3195 1.7139 2.0687 2.4999 2.8073 3.4850 3.7676
24 0.2562 0.5314 0.8569 1.3178 1.7109 2.0639 2.4922 2.7969 3.4668 3.7454
25 0.2561 0.5312 0.8562 1.3163 1.7081 2.0595 2.4851 2.7874 3.4502 3.7251
26 0.2560 0.5309 0.8557 1.3150 1.7056 2.0555 2.4786 2.7787 3.4350 3.7066
27 0.2559 0.5306 0.8551 1.3137 1.7033 2.0518 2.4727 2.7707 3.4210 3.6896
28 0.2558 0.5304 0.8546 1.3125 1.7011 2.0484 2.4671 2.7633 3.4082 3.6739
29 0.2557 0.5302 0.8542 1.3114 1.6991 2.0452 2.4620 2.7564 3.3962 3.6594
30 0.2556 0.5300 0.8538 1.3104 1.6973 2.0423 2.4573 2.7500 3.3852 3.6460
40 0.2550 0.5286 0.8507 1.3031 1.6839 2.0211 2.4233 2.7045 3.3069 3.5510
50 0.2547 0.5278 0.8489 1.2987 1.6759 2.0086 2.4033 2.6778 3.2614 3.4960
60 0.2545 0.5272 0.8477 1.2958 1.6706 2.0003 2.3901 2.6603 3.2317 3.4602
70 0.2543 0.5268 0.8468 1.2938 1.6669 1.9944 2.3808 2.6479 3.2108 3.4350
80 0.2542 0.5265 0.8461 1.2922 1.6641 1.9901 2.3739 2.6387 3.1953 3.4163
90 0.2541 0.5263 0.8456 1.2910 1.6620 1.9867 2.3685 2.6316 3.1833 3.4019
100 0.2540 0.5261 0.8452 1.2901 1.6602 1.9840 2.3642 2.6259 3.1737 3.3905
>100 0.2535 0.5247 0.8423 1.2832 1.6479 1.9647 2.3338 2.5857 3.1066 3.3101