Mathematics of Random Events Nikolaos Constantinou Leonardo T. Rolla 8th December 2020 University of Warwick, Autumn 2020, Module ST-342 The following may differ from the chapters posted weekly on Moodle: • Slight changes in the proof of Theorem 3.12. • Added more explanations in the proof of Theorem 2.43, and fixed a typo. • Amended the proof of Theorem 5.26 and added examples before it. • Definition 2.25, added definition of finite and changed the wording. • Expanded Definition 4.1. • Expanded Remark 6.21. • Fixed mistakes in §5.4.1. • Added Example 3.6. • Added §5.2.5. • Expanded Remark 5.38. • Typo in Theorem 5.22. • Expanded solution to Exercise 5.1. • Fixed Exercise 4.15. • Mention that sets form a partition in Example 4.14. • Expanded the proof of Lemma 4.8. • Added definition and properties of symmetric difference to §3.2. • In the proof of Proposition 2.34, fixed equation that was giving +∞−∞. • Definition 2.46, fixed a typo. • Example 1.8, typo fixed. 1 ©2019-2020 Nikolaos Constantinou & Leonardo T. Rolla This work is licensed under Reusing this material This license allows users to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use. If you remix, adapt, or build upon the material, you must license the modified version under identical terms. Attribution Most of this material is based on the lecture notes typeset by Nikolaos Constantinou when the ST342 module was taught by Dr. Wei Wu in the Autumn 2019 term. The notes are available at https://github.com/leorolla/teaching/raw/master/Probability/ST342-2019.pdf 2 Contents Notation, terminology and preliminaries 4 1 Infinity 6 1.1 Events at infinity . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.2 The measure problem . . . . . . . . . . . . . . . . . . . . . . . . 7 1.3 Infinite numbers and infinite sums . . . . . . . . . . . . . . . . . 9 2 Measure Spaces 13 2.1 Classes of sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 2.2 Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 2.3 Uniqueness and Dynkin’s pi-λ Theorem . . . . . . . . . . . . . . . 19 3 Existence and approximation 22 3.1 Carathéodory Extension Theorem . . . . . . . . . . . . . . . . . 22 3.2 Approximation by events in a generating algebra . . . . . . . . . 25 3.3 Lebesgue measure on Rd . . . . . . . . . . . . . . . . . . . . . . . 28 4 Measurable functions and random variables 30 4.1 Definition and properties . . . . . . . . . . . . . . . . . . . . . . . 30 4.2 Extended Borel functions and random variables . . . . . . . . . . 32 4.3 Operations on extended Borel functions . . . . . . . . . . . . . . 35 5 Integration and expectation 38 5.1 Axiomatic definition . . . . . . . . . . . . . . . . . . . . . . . . . 38 5.2 Main properties of the integral . . . . . . . . . . . . . . . . . . . 41 5.3 Convergence of integral and applications . . . . . . . . . . . . . . 49 5.4 Construction of the integral . . . . . . . . . . . . . . . . . . . . . 52 6 Density and Radon-Nikodým Theorem 59 6.1 Density of measures and continuous variables . . . . . . . . . . . 59 6.2 The Radon-Nikodým Theorem . . . . . . . . . . . . . . . . . . . 63 7 Product measures and independence 67 7.1 Product σ-algebras and their properties . . . . . . . . . . . . . . 67 7.2 Product measure . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 7.3 Fubini and Tonelli Theorems . . . . . . . . . . . . . . . . . . . . 71 7.4 Independence . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . 74 8 Convergence of measurable functions 78 8.1 Modes of convergence . . . . . . . . . . . . . . . . . . . . . . . . 78 8.2 Almost sure convergence and Borel-Cantelli . . . . . . . . . . . . 84 8.3 Laws of Large Numbers . . . . . . . . . . . . . . . . . . . . . . . 88 8.4 Relation between modes of convergence . . . . . . . . . . . . . . 90 References 95 3 Notation, terminology and preliminaries Notation N {1, 2, 3, 4, . . . } N0 {0, 1, 2, 3, . . . } Ac Ω \A (note that Ω is implicit) An ↑ A An ⊆ An+1 for all n and ∪nAn = A An ↓ A An ⊇ An+1 for all n and ∩nAn = A xn ↑ x xn is non-decreasing and xn → x xn ↓ x xn is non-increasing and xn → x {f < g} {ω ∈ Ω : f(ω) < g(ω)} supn fn the function h defined by h(ω) = supn fn(ω) for each ω max{f1, f2} supn fn over n ∈ {1, 2} [x]+ max{0, x}, the positive part of x [x]− max{0,−x}, the negative part of x, so x = [x]+ − [x]− bxc the largest n ∈ Z such that n 6 x ∈ R, or x if x = ±∞ µ-a.e. except on a set A with µ(A) = 0 a.e. µ-a.e. when µ is obvious from the context µ-a.e. ω every ω ∈ Ω \A, for some set A ∈ F with µ(A) = 0 a.s. with probability 1 a.s. on ω for P-a.e. ω Terminology σ-additivity Countable additivity Countable sets See [Coh13, §A.6]. Extended real numbers See §1.3.1 and [Coh13, §B.4]. Supremum and infimum See [Coh13, §B.5]. liminf, limsup, limit See [Coh13, §B.6]. One criterion of convergence is the following. 4 Lemma 0.1. Let (an)n be a sequence of real numbers, and a ∈ R. Then limn an = a if and only if, for every subsequence (ank)k, there is a further subsequence (ankj )j such that limj ankj = a. Sketch of proof. Suppose an → a and let ank be a subsequence. For each ε > 0, {n : |an − a| > ε} is finite, so {k : |ank − a| > ε} is also finite, hence ank → a. Now suppose an 6→ a, so there is ε > 0 such that {n : |an − a| > ε} is infinite. Taking (nk)k from this set, no sub-subsequence (ankj ) may converge to a. Metric spaces Given a metric space (Ω, d) and a subset B ⊆ Ω, the interior of B is the set of points x ∈ Ω for which there is a ball centred at x and contained in B. A set B ⊆ R is called open if the interior of B equals B. A set B ⊆ R is called closed if Bc is open. The sets Ω and ∅ are both opened and closed. When Ω = R, the interior of B ⊆ R is the set {x ∈ B : ∃ε > 0, (x−ε, x+ε) ⊆ B}. Also, open intervals are open and closed intervals are closed. The only sets that are open and closed at the same time are R and ∅. Every open set is given by the union of countably many disjoint open intervals. Given two metric spaces (Ω, d) and (Ω′, d′) and a function f : Ω → Ω′, we say that f is continuous if the preimage of every open sets is open. Riemann integral - Darboux’s definition See [Coh13, §p.xv]. 5 1 Infinity Can we toss a coin infinitely many times? Why is countable additivity important? We know that E[X + Y ] = EX + EY, and X > Y implies EX > EY , but why? It doesn’t seem to follow from the usual definition that treats discrete and continuous random variables as hermetically separated entities. So, what is expectation, really? Do continuous random variables even exist? How small can a class A of events be so that their probabilities determine P? Why can we differentiate moment generating functions to compute moments? Our aim is to study concepts of Measure Theory useful to Probability Theory, providing a solid ground for the latter. A measure generalises the notion of area for arbitrary sets in Euclidean spaces Rd, d > 1. 
We introduce the theory of measurable spaces, measurable functions, integral with respect to a measure, density of measures, product measures, and convergence of functions and random variables. Below we give examples that motivate the need for such a theory, discuss in which sense modern Measure Theory is the best we can hope for, and introduce the concept of infinite numbers and infinite sums used throughout the remaining chapters. 1.1 Events at infinity We know that An ↑ A implies P(An) ↑ P(A) which, assuming that P is non- negative and finitely additive, is equivalent to P being σ-additive (a shorthand for countably additive). Below we see examples where interesting models and events require some sort of limiting process in their study. Example 1.1. Coin flips {0, 1}. (i) Coin flip N times, so the sample space is Ω = {(x1, . . . , xN ) : xi ∈ {0, 1}}. For N = 7, define the event A = {(x1, . . . , xN ) : x1 = 1, x5 = 0}. Rigorously, the probability of this event is P(A) = 1/4. (ii) Coin flip infinitely many times (if we can!), so the sample space in this case is Ω = {(x1, x2, . . . ) : xi ∈ {0, 1}}. Let B be the event of 106 consecutive zeros, i.e. B = ⋃∞ n=1{ω ∈ Ω : xn = xn+1 = · · · = xn+106−1 = 0}. The probability P(B) is computed by first an estimate, and then taking limit. 4 Example 1.2 (Ruin probability). Gambling with initial wealth X0 ∈ N0. For any t > 1, we bet an integer amount and reach a wealth denoted by Xt. If at any point in time, wealth amounts to 0, it remains 0 forever. The sample 6 space that indicates the wealth process is Ω = {(x0, x1, . . . ) : xi ∈ N0}. We define the function of wealth after the n-th gamble by, Xn : Ω→ N∪{0}, where Xn(ω) = xn. In fact, (Xn)n>0 is a Markov process. Then {stay in state 0 eventually} = ∞⋃ n=0 ∞⋂ m=n {ω ∈ Ω : Xm(ω) = 0}. (1.3) By monotonicity and continuity, its probability is P({stay in state 0 eventually}) = lim n→∞P({ω ∈ Ω : Xn(ω) = 0}). 4 Example 1.4 (Brownian motion). It is possible to construct a sequence of continuous piece-wise linear functions, which in the limit give a continuous nowhere differentiable random path that is at the core of Stochastic Analysis. 4 Example 1.5 (Uniform variable). If X ∼ U [0, 1], then P(X 6= x) = 1 for every x ∈ R. Nevertheless, P(X 6= x for every x ∈ R) = 0, so there will be some unlucky x that will happen to be hit by X. Now for a countable set A, by σ-additivity we have P(X ∈ A) = ∑x∈A P(X = x) = 0. Since Q is countable, we see that a uniform random variable is always irrational. Well, this is unless there is an uncountable number of such variables in the same probability space, in which case some unlucky variables may happen to take rational values. 4 Example 1.6 (Strong law of large numbers). Let X1, X2, X3, . . . be independent and take value ±1 with probability 12 each, and take Sn = X1 + · · ·+Xn. The strong law of large numbers says that, almost surely (a shorthand for “P(· · · ) = 1”), Snn → 0 as n→∞. In order to show this, we need to express this event as a limit and compute its probability. The first part is simply: { lim n→∞ Sn n = 0} = ∞⋂ k=1 ∞⋃ n0=1 ∞⋂ n=n0 {|Snn | < 1k}. 4 Example 1.7 (Recurrence of a random walk). For the sequence (Sn)n of the previous example, we know that Snn → 0 with probability one. We want to consider whether Snn converges to zero from above, from below, or oscillating, which is the same as asking whether Sn = 0 infinitely often. 4 1.2 The measure problem It is clear (it should be!) 
from previous examples that we want to work with measures that have nice continuity properties, so we can take limits. However, when the mass is spread over uncountably many sample points ω ∈ Ω, it is not possible to assign a measure to all subsets of Ω in a reasonable way. 7 We would like to define a random variable uniformly distributed on [0, 1], by means of a function that assigns a weight to subsets of this interval. For instance, what is the probability that this number is irrational? What is the probability that its decimal expansion does not contain a 3? This is the same problem as assigning a ‘length’ to subsets of R. We are also interested in defining a measure of ‘area’ on R2, ‘volume’ on R3, and so on. Of course, a good measure of length/area/volume/etc. on Rd should: (i) give the correct value on obvious sets, such as intervals and balls; (ii) give the same value if we rotate or translate a set; (iii) be σ-additive. We stress again that σ-additivity is equivalent to a measure being continuous, and we are not willing to resign that. On the other hand, we do not want more than that: each of the uncountably many points in [0, 1] alone has length zero, but all together they have length one; likewise, each sequence of coin tosses in {0, 1}N has probability zero, but all together they have probability one. The measure problem is the following. There is no measure m defined on all subsets of Rd which satisfy all the reasonable properties listed above. What modern Measure Theory does is to work with measures that are defined on a class of sets which is large enough to be useful and small enough for these properties to hold. The next example shows that there is no measure m defined on all subsets of R3 which satisfies these three properties. Example 1.8 (Banach-Tarski paradox). Consider the ball B = {x ∈ R3 : ‖x‖ 6 1}. There exist1 k ∈ N, disjoint sets A1, . . . , A2k, and isometries (maps that preserve distances and angles) S1, . . . , S2k : R3 → R3 such that B = (A1 ∪ · · · ∪Ak) ∪ (Ak+1 ∪ · · · ∪A2k), B = S1A1 ∪ · · · ∪ SkAk, B = Sk+1Ak+1 ∪ · · · ∪ S2kA2k. So B was decomposed into finitely many pieces, which were later on moved around rigidly and recombined to produce two copies of B! Why is it a paradox? Finitely many pieces is not the issue in itself, since N can be decomposed into even and odd numbers, and they can be compressed (or stretched, in some sense) to produce two copies of N. Rigidity alone is not the issue either, since we can move each of the uncountably many points of the segment [0, 1] to form the 1See https://youtu.be/s86-Z-CbaHA for a nice overview of the proof. 8 segment [0, 2]. The paradox is that this magic was done with rigid movements on finitely many pieces. And here we can see the measure problem: if all these disjoint sets A1, . . . , A2k were to have a volume V1, . . . , V2k > 0, what would be the volume of the ball B? 4 The next example is not nearly as effective in impressing friends at a party, and would certainly not make a youtube video with 31 million views, but it has two advantages. First, it shows directly that the measure problem already occurs on d = 1. Second, we can actually explain its proof in a third of a page rather than a dozen. Example 1.9 (Vitali set). Consider the unit circle S1 with points indexed by turns instead of degrees or radians. This is the same as the interval S1 = [0, 1) with the angle addition operation x⊕ y = x+ y mod 1. 
There exists a set E ⊆ [0, 1) such that S1 is decomposed into disjoint {En}n∈N which are translations of E. And here we see again the measure problem: by σ-additivity, if the length of E is zero, then the length of the circle is zero; and if the length of E is non-zero, then the length of the circle is infinite. So E is not measurable. 4 Sketch of proof. Write Q∩[0, 1) = {rn}n=1,2,3,.... For E ⊆ S1, let En = {x⊕rn : x ∈ E} be the translation of E by rn. We want to find a set E such that (i) The sets E1, E2, E3, . . . are disjoint, (ii) The union satisfies ∪nEn = S1. Start with a small set that satisfies the first property, such as E = {0}. Enlarge the set E by adding a point x ∈ S1 \ (∪nEn). Adding such point does not break the first property (proof omitted), and may help the second one. Keep adding points this way, until it is no longer possible. When it is no longer possible, it can only be so because the second property is also satisfied.2 Remark 1.10. Some people really like to emphasise that the Banach-Tarski paradox and the Vitali set depend crucially on the Axiom of Choice (for the above sketch of proof, it is concealed in the expression “keep adding until”). We may wonder what happens if we do not accept this axiom. In this case, we cannot prove the Banach-Tarski paradox, nor the existence of a Vitali set. But neither can we prove that they do not exist, so the measure problem persists. 4 1.3 Infinite numbers and infinite sums We now define the set of extended real numbers and briefly discuss some its useful properties, then discuss the meaning of infinite sums, and move on to other perhaps philosophical questions about this theory. 2See [Coh13, 1.4.9] for a complete proof. 9 1.3.1 Extended real numbers We are about to start working with measures, and because measures can be infinite, and integrals can be negative infinite, we work with the set of extended real numbers R := [−∞,+∞] that extends R by adding two symbols ±∞. The novelty is of course to conveniently allow operations and comparisons involving these symbols. Basically, we can safely operate as one would reasonably guess: −∞ < −1 < 0 < 5 < +∞, −7 +∞ = +∞, (−2)× (−∞) = +∞, | −∞| = +∞, (+∞)× (−∞) = −∞, a 6 b =⇒ a+ x 6 b+ x lim n→∞(2 + n 2) = lim n→∞ 2 + limn→∞n 2 = 2 +∞ = +∞, etc. Since we will never need to divide by infinity, let us leave x∞ = 0 undefined (otherwise we would need to check that x is finite). The non-obvious definition is 0 · ∞ = 0. In Calculus, it would have been considered an indeterminate form, but in Measure Theory it is convenient to define it this way because the integral of a function that takes value 0 on an interval of infinite length and +∞ at a few points should still be 0. That is, the area of a rectangle having zero width and infinite length is zero. Now some caveats. First, limn(anbn) = (limn an)(limn bn) may fail in case it gives 0 · ∞. Also, note that now a + b = a + c does not imply b = c. This can be false when a = ±∞. Likewise, a < b no longer implies that a + x < b + x. So we should be careful with cancellations. The one thing that is definitely not allowed, and that Measure Theory does not handle well, is “ +∞−∞ ” ! This is simply forbidden, and if we will ever write this, it will be in quotation marks and just in order to say that this case is excluded. The reader should consult [Coh13, §§B.4–B.6] and [Tao11, p. xi] for a more complete description of operations on R. 1.3.2 Infinite sums Infinite sums of numbers on [0,+∞] are always well-defined through a rather simple formula. 
If Λ is an index set and xα ∈ [0,+∞] for all α ∈ Λ, we define:∑ α∈Λ xα = sup A⊆Λ A finite ∑ α∈A xα. 10 The set Λ can be uncountable, but the sum can be finite only if Λ+ = {α : xα > 0} is countable (proof omitted). If xα ∈ [−∞,+∞], we define Λ− = {α : xα < 0} and ∑ α∈Λ xα = sup A⊆Λ+ A finite ∑ α∈A xα − sup A⊆Λ− A finite ∑ α∈A −xα, (1.11) as long as this difference does not give “+∞−∞” ! The theory of conditionally convergent sums as∑ j∈N xj = lim n n∑ j=1 xj (1.12) is hardly meaningful to us. In case the expression in (1.11) is well-defined, we can write (Λ− ∪ Λ+) = {αj}j∈N by ordering these indices in any way we want (assuming for simplicity that these sets are countable), and formula (1.12) will give the same result as (1.11). Pretty robust. However, in case (1.12) converges but (1.11) is not well-defined (so it gives “+∞−∞”), we can re-order the index set N so that (1.12) will give any number we want. This is definitely not the type of delicacy we want to handle here. For this reason, we will only use (1.12) when either xj ∈ [0,+∞] for all j, or when ∑ j |xj | <∞. So there are two overlapping cases where we can work comfortably: non-negative extended numbers, or series which are absolutely summable. 1.3.3 Two different and overlapping theories The above tradeoff is already a good prelude to something rather deep that will appear constantly in upcoming chapters. Borrowing from [Tao11]: Because of this tradeoff, we will see two overlapping types of measure and integration theory: the non-negative theory, which involves quantities taking values in [0,+∞], and the absolutely integrable theory, which involves quantities taking values in R or C. However, at the risk of leaving it for the reader to figure out some corner cases, we can (and will) extend these theories to a theory on [−∞,+∞] by doing what we just did above. Namely, whereas the absolutely integrable theory requires that both terms in (1.11) be finite, and the non-negative theory requires that one of them be zero, we only require that one of them be finite. 11 We end this chapter with the simplest example of these overlapping theories. Theorem 1.13 (Tonelli Theorem for series). Let xm,n ∈ [0,+∞] be a doubly- indexed sequence. Then ∞∑ m=1 ∞∑ n=1 xm,n = ∑ (m,n)∈N2 xm,n = ∞∑ n=1 ∞∑ m=1 xm,n. Proof. See [Tao11, Theorem 0.0.2] for a direct proof. A proof is also given in §7.3 as an application of Tonelli Theorem. Theorem 1.14 (Fubini Theorem for series). Let xm,n ∈ [−∞,+∞] be a doubly- indexed sequence. If ∞∑ m=1 ∞∑ n=1 |xm,n| <∞, then ∞∑ m=1 ∞∑ n=1 xm,n = ∑ (m,n)∈N2 xm,n = ∞∑ n=1 ∞∑ m=1 xm,n. Proof. Given in §7.3 as a application of Fubini Theorem. Example 1.15 (Failure). Consider the doubly-indexed sequence xm,n = 1 −1 0 0 0 . . . 0 1 −1 0 0 . . . 0 0 1 −1 0 . . . 0 0 0 1 −1 . . . ... ... ... ... . . . . . . which is not absolutely summable. Note that summing columns and then rows we get 1, whereas summing rows and then columns we get 0. 4 12 2 Measure Spaces The goal in this chapter is to define the notion of a measure space (Ω,F , µ) and work on examples to get familiar with the relevant concepts. Later on, we will focus our attention on two types of measures: the Lebesgue measure m and probability measures P. 2.1 Classes of sets This may be the least pleasant topic, but we need to go through it. 2.1.1 σ-algebras We start with the notion of σ-algebra on a sample space Ω. Definition 2.1 (σ-algebra). Let Ω be a sample space. 
A collection F of subsets of Ω is called a σ-algebra (or σ-field) on Ω, if (i) Ω ∈ F , (ii) A ∈ F =⇒ Ac ∈ F , (iii) A1, A2, · · · ∈ F =⇒ ⋃∞ i=1Ai ∈ F . The pair (Ω,F) is called a measurable space. The sets A ∈ F are called F-measurable sets, or measurable sets when F is clear from the context. Remark 2.2. In this case, it also satisfies (iv) ∅ ∈ F , (v) A1, A2, · · · ∈ F =⇒ ⋂∞ i=1Ai ∈ F . 4 Let us review the following examples to build our intuition. Example 2.3. For a given sample space Ω, let P(Ω) := {A : A ⊆ Ω} be the power set of Ω. Then P(Ω) is a σ-algebra on Ω. 4 Example 2.4. For a given sample space Ω, let A = {∅,Ω}. Then A defines a σ-algebra on Ω, and it is called the trivial σ-algebra. 4 Exercise 2.5. Let Ω be a set and A the class of all subsets which are either countable or whose complement is countable. Prove that A is a σ-algebra. 4 Exercise 2.6. Let A1 and A2 be σ-algebras on a given sample space Ω, and let A = A1 ∩ A2 be the class made of the sets that belong to both A1 and A2. Prove that A := A1 ∩ A2 is also a σ-algebra on Ω. 4 Exercise 2.7 (Pull-back). If f : Ω1 → Ω2 and F2 is a σ-algebra on Ω2, prove that f−1(F2) := {f−1(A) : A ∈ F2} is a σ-algebra on Ω1. 4 13 So when a function goes from a sample space Ω1 to another space Ω2, it pulls back any σ-algebra F on Ω2 to a σ-algebra G on Ω1, as in the following diagram: Ω1 f G Ω2 F f−1 OO (2.8) 2.1.2 The σ-algebra generated by a class It will often be very useful to consider the σ-algebra “spanned” by a class. Definition 2.9 (σ-algebra generated by a class of subsets). Let E be a class of subsets of Ω. We define the σ-algebra generated by E , denoted σ(E), as the unique class of subsets of Ω with the following properties: (i) σ(E) is a σ-algebra, (ii) σ(E) ⊇ E , (iii) if F ⊇ E and F is a σ-algebra, then F ⊇ σ(E). So σ(E) is the smallest σ-algebra that contains E . We now prove that σ(E) is indeed well-defined. Proof by exercise. Let {Aα}α∈Λ be a family of σ-algebras on Ω, where Λ is an arbitrary non-empty index set. Prove that ⋂ α∈ΛAα is also a σ-algebra. Let E be a class of subsets of Ω. Prove that there exists at least one σ-algebra on Ω that contains E as a subclass. Prove that σ(E) exists by considering the family of σ-algebras that contain E . Finally, prove that σ(E) is unique. Some people like to describe σ(E) as the collection of sets that can be obtained by countably many operations of taking union, intersection and complement of sets in E . Whereas this description may indeed give some comfort to those facing such bitter abstraction for the first time, we should not rely on it because it is hard to make it precise. It is better to work directly with the definition. The examples below illustrate this. Example 2.10. Let Ω be a sample space, A ⊆ Ω and E = {A}. We claim that the σ-algebra generated by E is A := {∅,Ω, A,Ac}. Indeed, A is a σ-algebra, it contains E , and any σ-algebra that contains E must contain A. 4 Example 2.11. Let E = {A1, . . . , An}, be a partition of a sample space Ω. Then the σ-algebra generated by E is A = {Ai1 ∪· · ·∪Aim : i1, . . . , im ∈ {1, . . . , n}}∪ {∅}. Indeed, A is a σ-algebra, it contains E , and any σ-algebra containing E must contain A. 4 14 Exercise 2.12. Let Ω = {1, 2, 3} and E = {{1}, {2}, {3}}. What is σ(E)? 4 Exercise 2.13. Let E = {{x} : x ∈ Ω} be the class of singletons. What is σ(E)? 4 Proposition 2.14. Suppose E1 and E2 are classes of subsets of Ω and E1 ⊆ σ(E2). Prove that σ(E1) ⊆ σ(E2). Exercise 2.15. Prove the above proposition. 
4 2.1.3 The Borel sets on R and R See page 5 for a very brief recall of basic facts from Metric Spaces. When working on R or R, the most important σ-algebra is the following. Definition 2.16 (Borel sets on R). For the space Ω = R, the Borel σ-algebra B(R) is the σ-algebra generated by open sets. It is also the smallest σ-algebra that contains all the open intervals, or all the closed intervals. For the extended line, we define B(R) := {A ⊆ R : A ∩ R ∈ B(R)}, that is, we allow adding the points ±∞ to sets in B(R). When it is clear that Ω = R or R, we may write B instead of B(R) or B(R). Below we list more classes of sets, all of which generate B(R). Any “reasonable” set is a Borel set, and it is actually hard to construct one which is not, such as the Vitali set in Example 1.9. Nevertheless, we should keep in mind that all the theory works on σ-algebras, and B is the one we are most interested in. Denote by E1, . . . , E6 the following classes of subsets of R, respectively: 1. Closed subsets of R. 2. Open subsets of R. 3. Intervals of the form (a, b), a, b ∈ [−∞,+∞]. 4. Finite intervals of the form (a, b], a, b ∈ R. 5. Semi-infinite intervals of the form (−∞, b], b ∈ R. 6. Semi-infinite intervals of the form (−∞, b), b ∈ R. One can show that σ(E1) = · · · = σ(E6) as follows: σ(E2) ⊆ σ(E1) ⊆ σ(E2) ⊆ σ(E3) ⊆ σ(E4) ⊆ σ(E5) ⊆ σ(E6) ⊆ σ(E2). We prove two inclusions and leave five as exercise. Let A ∈ E2. Then A is the countable union of disjoint open intervals. So there exist {In}n∈N ⊆ E3 ⊆ σ(E3) such that A = ∪nIn. Since σ(E3) is a σ-algebra, it is closed under countable unions, and A ∈ σ(E3). Thus, E2 ⊆ σ(E3). By Proposition 2.14, σ(E2) ⊆ σ(E3). 15 Let A ∈ E5. Then A = (−∞, b] for some b ∈ R. Consider the sequence of sets An = (−∞, b + 1n ) in E6. Since σ(E6) ⊇ E6 is a σ-algebra, it is closed under countable intersections, and A = ∩nAn ∈ σ(E6). Thus, E5 ⊆ σ(E6). By Proposition 2.14, σ(E5) ⊆ σ(E6). Exercise 2.17. Show that each of the classes below generate B(R): (a) {[−∞, a)}a∈R, (b) {[−∞, a]}a∈R, (c) {[a,+∞]}a∈R, (d) {(a,+∞]}a∈R. 4 2.1.4 Algebras On a few occasions, we will use a class with less structure than σ-algebras. Definition 2.18 (Algebra). Let Ω be a sample space. A class A of subsets of Ω is called an algebra on Ω, if (i) Ω ∈ A, (ii) A ∈ A =⇒ Ac ∈ A, (iii) A,B ∈ A =⇒ A ∪B ∈ A. Remark 2.19. In this case, it also satisfies: (iv) ∅ ∈ A, (v) A,B ∈ A =⇒ A ∩B ∈ A. In other words, an algebra on Ω is a class of subsets of Ω, which contains Ω and is stable under finitely many set operations. 4 Example 2.20. Every σ-algebra is an algebra. 4 Remark 2.21. If Ω is a sample space and F1 ⊆ F2 ⊆ F3 ⊆ · · · are σ-algebras, then A := ∪nFn may not be a σ-algebra, but it is still an algebra. 4 Exercise 2.22. Prove that A in the previous remark is indeed an algebra. 4 Example 2.23. Let Ω = R and E = {(a, b] : −∞ 6 a < b < +∞} ∪ {(a,+∞) : a ∈ R}. Consider the class A = {I1 ∪ · · · ∪ In : Ik ∈ E for k = 1, . . . , n, Ik ∩ Ij = ∅ for j 6= k} ∪ {R, ∅}. Then A is an algebra. 4 Exercise 2.24. Prove that A in the previous example is an algebra. Suggestion: show that E ∪{∅} is closed under intersections, use this to show that A is closed under intersections, also show that A is closed under complement, and conclude that A is closed under unions. 4 16 2.2 Measures Definition 2.25 (Measure and σ-finite measure). Let (Ω,F) be a given measurable space. A function µ : F → [0,+∞] is called a measure if µ(∅) = 0 and µ ( ∞⋃ n=1 An ) = ∞∑ n=1 µ(An) for every sequence {An}∞n=1 ⊆ F of disjoint sets. 
In this case, the triple (Ω,F , µ) is called a measure space. We say that µ is finite if µ(Ω) < ∞ and σ-finite if there exist measurable sets (An)n∈N such that µ(An) <∞ for all n and ∪n∈NAn = Ω. Let us start with some simple but important examples. Example 2.26 (Dirac mass measure). Let Ω be a set, F = P(Ω) and x ∈ Ω. The measure δx is defined as δx(A) = 1A(x) for all A ⊆ Ω. This is a finite measure. It is also a probability measure. 4 Remark 2.27. Even though an indicator function and a Dirac mass function perform a similar test, they are different objects. The indicator function of a given A ⊆ Ω is a function 1A : Ω → {0, 1}, whereas the Dirac measure on a given x ∈ Ω is a map δx : P(Ω)→ {0, 1}. In other words, 1A checks whether a point lies in A, whereas δx checks whether a set contains x. 4 Example 2.28 (Counting measure). Let Ω be a set, F = P(Ω), and µ = ∑x∈Ω δx. So µ(A) = ∑ x∈Ω δx(A) counts how many elements A has. If Ω is a finite set, this is a finite measure, and if Ω is countable then this is a σ-finite measure. 4 Example 2.29. Let Ω = N, F = P(Ω), and µ(A) = ∑k∈A k−1. Then µ is a σ-finite measure (why?). 4 Exercise 2.30. Prove that the functions described in previous examples are indeed measures. 4 Exercise 2.31. Let µ1, µ2, . . . be a sequence of measures on a measurable space (Ω,F), and let x1, x2, · · · ∈ [0,+∞]. Show that µ = ∑ n xnµn is a measure on (Ω,F). 4 Example 2.32 (Coarsening). If (Ω,F , µ) is a measure space, and G ⊆ F is also a σ-algebra, then (Ω,G, µ) is also a measure space (why?). 4 Example 2.33 (Restriction). If (Ω,F , µ) is a measure space and A ∈ F , then the restriction of µ to A, denoted µ|A and given by µ|A(B) = µ(A ∩ B) is also a measure (why?). 4 17 Proposition 2.34 (Properties of a measure). Let (Ω,F , µ) be a measure space. Then, we have the following properties: (i) Countable sub-additivity: If {Ai}∞i=1 ⊆ F , then µ( ⋃∞ i=1Ai) 6 ∑∞ i=1 µ(Ai). (ii) Continuity from below: If {Ai}∞i=1 ⊆ F , Ak ⊆ Ak+1,∀k ∈ N, then µ( ⋃∞ i=1Ai) := µ( limn→∞An) = limn→∞µ(An). (iii) Continuity from above: If {Ai}∞i=1 ⊆ F , Ak ⊇ Ak+1,∀k ∈ N, and µ(Aj) < ∞ for some j ∈ N, then µ(⋂∞i=1Ai) := µ( limn→∞An) = limn→∞µ(An). Proof. For sub-additivity, set Bk := Ak \ ( ⋃k−1 i=1 Ai),∀k ∈ N, so that ⋃∞ i=1Ai =⋃∞ i=1Bi, Bk are disjoint and Bk ⊆ Ak,∀k ∈ N. It follows that, µ ( ∞⋃ i=1 Ai ) = µ ( ∞⋃ i=1 Bi ) = ∞∑ i=1 µ(Bi) 6 ∞∑ i=1 µ(Ai). For continuity from below, let A0 = ∅ and Bk := Ak \ Ak−1,∀k ∈ N, so that⋃∞ i=1Ai = ⋃∞ i=1Bi and Bk are disjoint. It follows that, µ ( ∞⋃ i=1 Ai ) = µ ( ∞⋃ i=1 Bi ) = ∞∑ i=1 µ(Bi) = ∞∑ i=1 µ(Ai \Ai−1) = lim n→∞ ( n∑ i=1 µ(Ai \Ai−1) ) = lim n→∞µ(An). For continuity from above, let Bk := Aj \ Aj+k,∀k ∈ N, use continuity from below, and subtract both sides from µ(Aj). This concludes the proof. Remark 2.35. Continuity from above may be false without the assumption that µ(Aj) is finite for some j. For instance, if µ is the counting measure on N and An = {n, n+ 1, n+ 2, . . . }. Then An ↓ ∅ =: A. Since µ(An) =∞ and µ(A) = 0, we have µ(An) 6→ µ(A) even though An ↓ A. 4 We now state the most important measure in the theory. Theorem 2.36 (Lebesgue Measure). There exists a unique measure m on (R,B) such that m([a, b]) = b− a for all a < b ∈ R. This measure m is called the Lebesgue Measure on R. Proof. Existence will be proved in §3.1 and uniqueness in §2.3. 18 Example 2.37 (Countable sets). Any countable set has Lebesgue measure 0. Although Q is present everywhere in the sense that it intersects every tiny open interval, its length is actually zero. 
4 Example 2.38 (Restriction). Given real numbers a < b, the function µ given by µ(A) = m(A ∩ [a, b]) is a measure on (R,B). 4 Definition 2.39 (Probability). A measure µ on a measurable space (Ω,F) is called a probability measure if µ(Ω) = 1. The triple (Ω,F , µ) is called a probability space. Probability measures are often denoted by P , P or P instead of µ. Example 2.40 (Equiprobable outcomes). Let n ∈ N and let Ω be a set with n elements. Take F = P(Ω) and P = 1n ∑ ω∈Ω δω. Then (Ω,F ,P) is the model of equiprobable outcomes. 4 Example 2.41. Let n ∈ N, Ω = {ω1, . . . , ωn}, and E = {A1, . . . , An}, where Ai = {ωi}, i = 1, . . . , n. Set P(Ai) = P({ωi}) := pi > 0, i = 1, . . . , n, such that∑n i=1 pi = 1. We can extend the definition of P to P(Ω) by finite additivity. Then, we can check that P defines a probability measure on (Ω,P(Ω)). 4 Exercise 2.42. Let P1,P2, . . . be a sequence of probability measures on a measurable space (Ω,F), and let p1, p2, · · · ∈ [0, 1] be such that ∑ n pn = 1 Show that P = ∑ n pnPn is a probability measure on (Ω,F). Give an interpretation to P and (pn)n. 4 2.3 Uniqueness and Dynkin’s pi-λ Theorem In this section we will study Dynkin’s pi-λ Theorem and use it to prove the following. Theorem 2.43 (Uniqueness of measures). Let µ and ν be measures on a measurable space (Ω,F), and let C ⊆ F be a class of subsets of Ω closed under intersection. Suppose that µ(An) = ν(An) < ∞ for some sequence An ↑ Ω of sets in C. If µ = ν on C, then µ = ν on σ(C). Of course, the uniqueness part of Theorem 2.36 is a particular case of the above theorem. For that, we take C = {[a, b] : a, b ∈ R}, since σ(C) = B(R) and note that [−n, n] ↑ R are in C and have finite measure. Before introducing the concepts used in its proof, let us discuss where they arise from. Suppose we want to check that two finite measures µ and ν on the 19 measurable space (Ω,F) coincide on a large class of sets, such as F . Suppose also that µ(Ω) = ν(Ω). On how many sets A do we need to test that µ(A) = ν(A)? Testing all A ∈ F is a bit overkilling. Having tested for a disjoint sequence (An)n, it will hold for Ac1 as well as ∪nAn by elementary properties of finite measures. However, having tested for given sets A,B ∈ F , there is no way to infer from the basic properties of measures alone that these two measures coincide on A ∩B. So the idea is to consider the class D = {A ∈ F : µ(A) = ν(A)} (2.44) and show that it is large. This class already has some structure inherited from the way finite measures treat disjoint unions, but that structure is still missing a small bit, namely being closed under intersections. Definition 2.45 (pi-system). A class C of sets is called a pi-system if it is closed under intersections, that is, A ∩B ∈ C for all A,B ∈ C. Good examples of pi-systems are the classes listed in §2.1.3, if we add ∅ to them. Definition 2.46 (λ-system). Let Ω be a sample space. A class D of subsets of Ω is a λ-system on Ω, if Ω ∈ D and it is closed under complement and countable disjoint unions, that is, (i) Ω ∈ D, (ii) A ∈ D =⇒ Ac ∈ D, (iii) {Ai}∞i=1 ⊆ D, Ai ∩Aj = ∅ ∀i 6= j =⇒ ⋃∞ i=1Ai ∈ D. These concepts are useful when the (large and complicated) class of sets satisfying a given property will naturally form a λ-system, and at the same time we can find a (smaller and simpler) subclass that forms a pi-system and generates a large σ-algebra. This is the case for Theorem 2.43. Proposition 2.47. 
If µ and ν are measures on a measurable space (Ω,F) such that µ(Ω) = ν(Ω) <∞, then the class D defined in (2.44) is a λ-system. Proof. Suppose A ∈ D. Then µ(Ac) = µ(Ω) − µ(A) = ν(Ω) − ν(A) = ν(Ac), so Ac ∈ D. Now consider a countable family {An}n ⊆ D of disjoint sets. Then µ(∪nAn) = ∑ n µ(An) = ∑ n ν(An) = ν(∪nAn), so ∪nAn ∈ D. We now have enough motivation for the following celebrated tool. Theorem 2.48 (Dynkin’s pi-λ Theorem). Let Ω be a sample space. Suppose C is a pi-system of subsets of Ω and D is a λ-system on Ω. If D contains C, then D contains σ(C). 20 Proof. See [Coh13, 1.6.2]. Proof of Theorem 2.43. Assume for a moment that µ(Ω) = ν(Ω) < ∞. Define the class D as in (2.44). By Proposition 2.47, D is a λ-system. By the pi-λ Theorem, σ(C) ⊆ D, which means exactly that µ = ν on σ(C) as claimed. Now drop the previous assumption, and instead assume that µ(An) = ν(An) < ∞ for some sequence An ↑ Ω of sets in C. For each n ∈ N, define the restriction measures µn and νn by µn(A) = µ(A ∩ An) and νn(A) = ν(A ∩ An). With this definition, µn(Ω) = µ(An) = ν(An) = ν(Ω) < ∞. Note also that, for each C ∈ C, we have C ∩ An ∈ C because C is a pi-system. Hence, µn(C) = µ(C ∩An) = ν(C ∩An) = νn(C) for every C ∈ C. So, for each n, the measures µn and νn are in the previous case, and, for every A ∈ σ(C), we have µn(A) = νn(A). Using continuity from below, µ(A) = lim n µ(A ∩An) = lim n µn(A) = lim n νn(A) = lim n µ(A ∩An) = νn(A), which concludes the proof.3 In §3.2, we will use the pi-λ Theorem to prove that the probability of any event can be approximated by an event in a generating algebra. Another useful application of the pi-λ Theorem is to study independent random variables. Suppose we know that the vector (X,Y ) satisfies P ( (X,Y ) ∈ (−∞, r]× (−∞, t]) = P(X ∈ (−∞, r])P(Y ∈ (−∞, t]) for all r, t ∈ R. Does it imply that P ( (X,Y ) ∈ A×B) = P(X ∈ A)P(Y ∈ B) for every A,B ∈ B(R)? The answer is yes, and the proof is a little more ingenious than the one we just showed. It uses twice the fact that {(−∞, b] : b ∈ R} is a pi-system that generates B(R). We leave it as a teaser for now. 3Expanded from [Coh13, 1.6.4]. 21 3 Existence and approximation In this chapter we introduce a fundamental tool to prove existence of measure spaces starting from the specification on an algebra. We give an example of a finitely additive measure that cannot be extended to a true measure, and state a sufficient condition for such extension to exist. We then discuss how this tool is used to construct the Lebesgue measure on R as well as Lebesgue-Stieltjes measures which include all probability measures. We then give two different proofs of the approximation theorem, which says that events on probability spaces can be approximated by events on a generating algebra. We conclude by discussing the Lebesgue measure on Rd. 3.1 Carathéodory Extension Theorem In this section, we introduce the key theorem that allows us to extend a “measure” µ from an algebra to a larger σ-algebra. Given a sample space Ω and an algebra A, we say that a function µ : A → [0,∞] is a finitely additive measure if it is non-negative and finitely additive, that is, (i) µ(∅) = 0 (ii) µ(A ∪B) = µ(A) + µ(B) for all disjoint A,B ∈ A. Note that finitely additive measures are not true measures. The theory of finitely additive measures is rather limited, as one cannot take limits there. Example 3.1 (Uniform “measure” on N). Suppose we pick a large natural number at random. The chances that this number is odd should be 12 . 
The chances that a year is a leap year should be 97400 . This notion of chance is finitely additive. For instance, the chances that a year is either odd or a leap year would be 297400 , and the chance that it is not a leap year would be 303 400 . We could even talk about independent events. For instance, this random number being a multiple of 5 and a multiple of 6 would be independent, whereas a year being multiple of 5 and leap would not. This “frequentist” notion would be µ(A) = lim n→∞ #(A ∩ [1, n]) n , and that this µ can be extended4 to a finitely additive measure defined on all P(N), including sets A for which the limit does not exist. However, taking An = {n, n + 1, n + 2, . . . }, we have µ(An) = 1 for every n in spite of An ↓ ∅ and µ(∅) = 0. Not really the type of measure that we want to study here. 4 Fortunately, asking µ to be “σ-additive whenever possible” is enough to fix this. Definition 3.2 (Pre-measure). Given a sample space Ω and an algebra A, we say that a finitely additive measure µ : A → [0,∞] is a pre-measure if it is 4This requires the Hahn-Banach Theorem which is a topic of Functional Analysis. 22 σ-additive, that is, µ( ⋃ nAn) = ∑ n µ(An) for every sequence of disjoint sets An ∈ A whose union A happens to be in A. We now state a fundamental theorem of Measure Theory and Probability. Theorem 3.3 (Carathéodory Extension Theorem). Let A be an algebra on Ω. If µ is a pre-measure on (Ω,A), then µ extends to a measure on (Ω, σ(A)). Moreover, if the extension µ is σ-finite, then it is unique. Before outlining its proof, let us consider its main application. 3.1.1 Lebesgue measure on R Here we prove the existence part in Theorem 2.36. Fix Ω = R and E = {(a, b] ∩ R : −∞ 6 a < b 6 +∞} as in Example 2.23. We already know that the class A of sets A given by R, ∅ and finite unions of disjoint left open intervals is an algebra. Set m(I) = +∞ for infinite intervals I, and m((a, b]) := b− a for finite intervals (a, b]. We can extend the definition of m to A by m(A) = n∑ j=1 m(Ij), (3.4) where A = I1 ∪ · · · ∪ In and Ik ∈ E are disjoint. Note that the above formula is well-defined even if I can be written in many different ways as such a finite disjoint union (for instance, if I1 = (0, 1], I2 = (1, 2], it gives m((0, 2]) = m((0, 1]) +m((1, 2]) = 2). We omit the straightforward proof of this fact. Lemma 3.5. The set function m : A → [0,∞] is finitely additive. Proof. Immediate from the fact that (3.4) is well-defined. What is not so easy to prove is that m is σ-additive. For instance, take Ak = ( 1k+1 , 1 k ] ∈ E . Note that the Ak’s are disjoint and ⋃ k Ak = (0, 1], which happens to be in A. In this case, ∞∑ j=1 m(Aj) = ∞∑ j=1 ( 1 j − 1 j + 1 ) = 1 = m((0, 1]). To conclude the proof of Theorem 2.36 from Theorem 3.3, one needs to show that this property holds in general. 23 Lemma 3.6. The set function m : A → [0,∞] is a pre-measure. Proof. See [Dem20, Lemma 1.1.31]. Example 3.6. If we take A ⊆ [0, 1) as a Vitali set, then A 6∈ B(R). We sketch the proof. This is just expanding part of the sketch in Example 1.9. If A were in B(R), we could define m(A) > 0. Then by translation invariance, for each rn ∈ Q ∩ [0, 1), defining An = {x + rn mod 1 : x ∈ A}, we would have m(An) = m(A). Moreover, these sets (An)n are mutually disjoint and ∪nAn = [0, 1). So m([0, 1)) = ∑ n∈Nm(An) = ∑ n∈Nm(A) which is either 0 or ∞ depending on whether m(A) > 0 or not. This contradicts m([0, 1)) = 1. 4 3.1.2 Lebesgue-Stieltjes measures Let F : R → R be a right-continuous non-decreasing function. 
Consider the sample space Ω = R and the class E = {(a, b] : −∞ < a < b < +∞}. Define the set function mF : E → [0,∞) by mF ((a, b]) = F (b)− F (a). We can define mF on semi-infinite intervals by taking limits as a → −∞ or b → +∞. Consider the algebra A defined in Example 2.23 and revisited in §3.1.1. We can extend mF to A in a unique (and obvious) way so that it is finitely additive. It can be shown that mF is, in fact, a pre-measure on A. Again, by Carathéodory Extension Theorem, there exists a measure mF on (R,B) that gives the right values on E . Moreover, since E is a pi-system which contains a sequence (−n, n] ↑ R of sets with finite measure, this measure is unique. Such measure is called the Lebesgue-Stieltjes measure induced by the function F on the space R. Remark 3.7. If F (x) = x, then mF reduces to the Lebesgue measure on R. 4 Remark 3.8. We say that F is a distribution function if it is right-continuous, non-decreasing, limx→−∞ F (x) = 0 and limx→∞ F (x) = 1. In this case, mF is a probability measure on R. Conversely, if P is a probability measure on R, then F (x) = P({ω : ω 6 b}) is a distribution function. So these two types of objects are equivalent. 4 3.1.3 Outer measures and measurable sets Here we outline the proof of Theorem 3.3. Given a pre-measure µ on (Ω,A), we define the outer measure µ∗ : P(Ω) → [0,∞] as follows. For each subset E of Ω, consider the collection DE of all sequences (An)n of sets in A that cover E: DE = { (An)n : An ∈ A for all n and E ⊆ ⋃ n An } . The outer measure µ∗ is defined by µ∗(E) = inf (An)∈DE ∑ n µ(An). (3.9) 24 One might think of the outer measure as a conservative upper bound for what should be the measure of a set seen from the outside. Loosely speaking, problematic sets such as those of Examples 1.9 and 1.8 look bigger from the outside than form the inside. It turns out, the way to exclude sets like this is to consider sets E ⊆ Ω such that, for every F ⊆ Ω, µ∗(F ) = µ∗(F ∩ E) + µ∗(F ∩ Ec). The sets E satisfying this condition are called µ∗-measurable, and the collection of such sets is denoted by F∗. Lemma 3.10 (Properties of the outer measure). The outer measure µ∗ satisfies: (i) A ∈ A =⇒ µ∗(A) = µ(A), (ii) A ⊆ B =⇒ µ∗(A) 6 µ∗(B), (iii) Let {Ai}∞i=1 ⊆ A, then µ∗( ⋃∞ i=1Ai) 6 ∑∞ i=1 µ ∗(Ai). Proof. See [Kub15, Proposition 8.3]. The last property above implies that µ∗(F ) 6 µ∗(F ∩ E) + µ∗(F ∩ Ec), so problematic sets are those where inequality is strict. For the Vitali set shown in Example 1.9, we have µ∗(E) + µ∗([0, 1) \ E) > 1, which one might interpret as E being larger seen from the outside than from the inside. Lemma 3.11 (Carathéodory). The class F∗ of µ∗-measurable sets is a σ- algebra on Ω which contains A, and the restriction of µ∗ to F∗ is a measure. Proof. See [Kub15, Theorem 8.4]. From the two lemmas above, we have that A ⊆ F∗, hence σ(A) ⊆ σ(F∗) = F∗. Moreover, µ∗, restricted to σ(A), is σ-additive, hence it is a measure. Furthermore, µ∗, restricted to A, coincides with µ. This proves the existence part of Theorem 3.3. For uniqueness, suppose µ∗ is σ-finite, so there exist a countable collection {An}n in σ(A) such that µ∗(An) < ∞ for all n, and ∪nAn = Ω. By (3.9), for each n there is a countable family {An,k}k such that ∪kAn,k ⊇ An and µ∗(An,k) < ∞. Re-indexing the doubly-indexed countable family {An,k}n,k as {Bj}j∈N and defining C` = B1 ∪ · · · ∪ B`, we have C` ∈ A, µ(C`) < ∞ and C` ↑ Ω. 
Noting that A is a pi-system, it follows from Theorem 2.43 that µ∗ is the unique measure on σ(A) that agrees with µ on A, concluding the proof of Theorem 3.3. 3.2 Approximation by events in a generating algebra The symmetric difference A4B is a way to measure how much two events A and B are different. It is defined by A4B := (A \B) ∪ (B \A). 25 The symmetric satisfies an analogue of the triangle inequality: A4C ⊆ (A4B)∪ (B4C). It also satisfies Ac4Bc = A4B. Moreover, (A ∪ B)4(C ∪ D) ⊆ (A4C) ∪ (B4D). (Why?) Theorem 3.12. Let (Ω,F ,P) be a probability space, and A ⊆ F be an algebra. For every B ∈ σ(A) and ε > 0, there exists A ∈ A such that P(A4B) < ε. Remark 3.13. We can generalise Theorem 3.12 to any measure space (Ω,F , µ), as long as µ is σ-finite and µ(B) <∞. 4 We give two different proofs. Proof using outer measure. Fix B ∈ σ(A). By construction of the outer measure P∗ from (3.9) and using Lemma 3.11, we have P(B) = P∗(B) = inf (Dn)∈DB ∞∑ n=1 P(Dn). Hence, for a given ε > 0, we can choose a sequence of sets {Dn}∞n=1 in A with B ⊆ ∞⋃ n=1 Dn and ∞∑ n=1 P(Dn) < P∗(B) + ε 2 , where we also used the fact that P(B) is finite. Now take k ∈ N such that P( ∞⋃ n=k+1 Dn) 6 ∞∑ n=k+1 P(Dn) < ε 2 , and set A := k⋃ n=1 Dn ∈ A. Then, we get B \A ⊆ ∞⋃ n=k+1 Dn and A \B ⊆ ( ∞⋃ n=1 Dn ) \B, which consequently imply that P(B \A) < ε2 and P(A \B) < ε 2 , respectively. Thus, we can deduce that P(A4B) < ε, which is precisely what we wanted to show.5 5Based on [Tay73, Theorem 4.4]. 26 Alternatively, Theorem 3.12 can be proved using the pi-λ Theorem. Proof using the pi-λ Theorem. Consider the collection G given by G = {B ∈ σ(A) : ∀ε > 0, ∃A ∈ A such that P(A4B) < ε}. Note that G contains A. Note also that, for being an algebra, A is also a pi- system. Thus, it suffices to show that G is a λ-system on Ω, since we can then apply the pi-λ Theorem to deduce that G = σ(A). Indeed, we have (i) Ω ∈ G, since P(Ω4Ω) = 0. (ii) Let B ∈ G. Then, for each ε > 0, there is A ∈ A such that P(A4B) < ε. By noticing that Ac4Bc = A4B, where Ac ∈ A, we can deduce that Bc ∈ G. (iii) Let B1, B2, · · · ∈ G, where Bj are disjoint. Then, for each j ∈ N and ε > 0, there is Aj ∈ A such that P(Aj4Bj) < ε2j+1 . By observing that ( k⋃ i=1 Ai)4( k⋃ i=1 Bi) ⊆ k⋃ i=1 (Ai4Bi). For each k ∈ N, we can apply finite sub-additivity to get P(( k⋃ i=1 Ai)4( k⋃ i=1 Bi)) < ε 2 . Now, since P is a finite measure, take K ∈ N such that P( ∞⋃ i=K+1 Bi) = ∞∑ i=K+1 P(Bi) < ε 2 , and set A := ⋃K i=1Ai ∈ A. By noticing that A4( ∞⋃ i=1 Bi) ⊆ K⋃ i=1 (Ai4Bi) ∪ ( ∞⋃ i=K+1 Bi), we can apply finite sub-additivity to conclude that ⋃∞ i=1Bi ∈ G. and so we are done.6 6Based on [Edg98, 1.1.9] 27 b2 a2 a1 b1 b3 a3b2 a2 a1 b1 Figure 3.1: Elements of E for d = 2 and d = 3. 3.3 Lebesgue measure on Rd Consider the measurable space (Rd,B(Rd)), where B(Rd) denotes the Borel σ- algebra on Rd, d > 1. More precisely, we define B(Rd) := σ(E), where E = {(a, b] : −∞ < ai < bi < +∞, i = 1, . . . , d} and (a, b] = (a1, b1]× . . . (ad, bd] ⊆ Rd. Figure 3.1 illustrates the structure of elements in E for higher dimensions, which take the shape of rectangles and cuboids for d = 2 and d = 3. It can be checked that the family A := {I1 ∪ · · · ∪ In : n ∈ N0, Ik ∈ E for k = 1, . . . , n} is an algebra of sets on Rd. Now, for any (a, b] ∈ E , we define the set function m ( (a, b] ) := (b1 − a1) · · · (bd − ad) as the d-dimensional volume of the cuboid (a, b]. As in §3.1.1, it is possible to prove that m defines a pre-measure on A. 
Hence, by Carathéodory Extension Theorem, m extends to a measure on (Rd,B(Rd)). The measure m is called the Lebesgue measure on Rd, which is the analogue to the 1-dimensional case. Also, E is a pi-system on Rd that generates B(Rd), from which one can also show uniqueness of m. Example 3.14. Take a point set {x} = {(x1, . . . , xd)} ∈ B(Rd). Indeed, as one might expect, the Lebesgue measure m on (Rd,B(Rd)) assigns value of 0 for {x}. To show this rigorously, consider Ak = {y ∈ Rd : xi − 1k 6 yi 6 xi, i = 1, . . . , d} ∈ B(Rd), where Ak ⊇ Ak+1,∀k ∈ N. Then, one see that {x} = ⋂∞i=1Ai. Now, by continuity from above, it follows that, m({x}) = m ( ∞⋂ i=1 Ai ) = lim n→∞m(An) = limn→∞ ( 1 n )d = 0. 28 x1,0 x1,1 x2,0 x2,2x2,1 xn,0 xn,k xn,n Figure 3.2: Covering L by smaller boxes as Ln. 4 Remark 3.15. As in the above example, monotone-convergence properties of a measure are quite helpful when it comes to computing measures of sets. 4 Remark 3.16. If a set in B(Rd) is countable, then it has Lebesgue measure 0. Equivalently, if a set in B(Rd) has strictly positive Lebesgue measure, then it is uncountable. 4 Example 3.17. For d > 2, we look at a line segment L = {x ∈ Rd : x = rα+β, r ∈ [0, 1]}, where α, β ∈ Rd are fixed. Set points on L by, xn,k = knα+β, so as to consider Ln = ⋃n k=1(xn,k−1, xn,k] ∈ B(Rd), where Ln ⊇ Ln+1,∀n ∈ N. The idea is to iteratively split the line segment L into smaller segments and cover each such segment by via small squares Ln, which is captured in Figure 3.2. Now, we can see that, L = ⋂∞ i=1 Li, so applying continuity from above yields, m(L) = m ( ∞⋂ i=1 Li ) = lim n→∞m(Ln) = 0. 4 Remark 3.18. Notice that, L lies in a subspace of Rd. In general, if A ∈ B(Rd) lies in a subspace of Rd, then m(A) = 0. 4 29 4 Measurable functions and random variables This chapter intends to introduce and construct the concept of measurable functions, through a number of important properties and theorems. 4.1 Definition and properties As its name suggests, a measurable function between two measurable spaces preserves the measurability properties of sets. Definition 4.1 (Measurable function). Let (Ω1,F1) and (Ω2,F2) be two measurable spaces. A function f : Ω1 → Ω2 is called F1/F2-measurable if the pre-image of any F2-measurable set is F1-measurable. That is, f−1(A) ∈ F1, ∀A ∈ F2, where f−1(A) := {ω1 ∈ Ω1 : f(ω1) ∈ A} is the pre-image of f . When the σ- algebras F1 and F2 are clear from context, we often drop ‘F1/F2-measurable’ to just ‘measurable’ or ‘F1-measurable’. Example 4.2. Let (Ω,F) be a measurable space and A ∈ F . We show that the indicator function 1A : Ω→ {0, 1} given by 1A(ω) = { 1, ω ∈ A 0, ω /∈ A is F/P({0, 1})-measurable. Indeed, we have that • 1−1A (∅) = ∅ ∈ F , • 1−1A ({0}) = Ac ∈ F , • 1−1A ({1}) = A ∈ F , • 1−1A ({0, 1}) = 1−1A ({0}) ∪ 1−1A ({1}) = Ac ∪A = Ω ∈ F , so 1A is indeed measurable. 4 Naturally, one can expect from Definition 4.1 that a composition of measurable functions is measurable in any general measurable spaces. Lemma 4.3 (Composition of measurable functions). Let (Ω1,F1), (Ω2,F2) and (Ω3,F3) be measurable spaces and f : Ω1 → Ω2, g : Ω2 → Ω3 be measurable functions. Then, the composition g ◦ f is (F1/F3-)measurable. Proof. Since g is measurable, then for any A ∈ F3, we have g−1(A) ∈ F2. Now, since f is measurable, then f−1(g−1(A)) ∈ F1. But, this implies that (g ◦ f)−1(A) = f−1(g−1(A)) ∈ F1, ∀A ∈ F3, 30 so that the composition g ◦ f is indeed measurable. 
Given a target measurable space (Ω2,F2) and a function f : Ω1 → Ω2, we can construct a σ-algebra on Ω1 induced by f . Definition 4.4 (σ-algebra generated by a function f). Let (Ω2,F2) be a measurable space and f : Ω1 → Ω2 a function, where Ω1 is a sample space. The σ-algebra generated by f on Ω1 is defined as σ(f) := {f−1(A) : A ∈ F2}, From Exercise 2.7, σ(f) is indeed a σ-algebra. See the diagram below: Ω1 f σ(f) Ω2 F2 f−1 OO Note that the notation σ(f) does not make explicit its dependence on F2. Remark 4.5. It follows from the definition of measurability that σ(f) is the smallest σ-algebra on Ω1 for which f is measurable. That is, for any σ-algebra G on Ω1, the function f is G/F2-measurable if and only if G ⊇ σ(f). The latter condition is interpreted as G being finer than σ(f). 4 Definition 4.6. If f is a measurable function from a measure space (Ω1,F1, µ) to (Ω2,F2), we can define the push-forward measure f∗µ on (Ω2,F2) by (f∗µ)(A) = µ({ω ∈ Ω1 : f(ω) ∈ A}) = µ(f−1(A)). This gives an enlarged diagram as follows: Ω1 f σ(f) µ ◦f−1 Ω2 F2 f−1 OO f∗µ (4.7) Quite often, it is unhandy to check measurability using the whole target space, in which case the following lemma comes in our aid. Lemma 4.8 (Measurability criterion). Let (Ω1,F1) and (Ω2,F2) be measurable spaces. Suppose F2 = σ(E) for some class E of subsets of Ω2. Then, f : Ω1 → Ω2 is a measurable function, if and only if f−1(A) ∈ F1 for all A ∈ E. Proof. The first direction follows from Definition 4.1. Since any A ∈ E implies that A ∈ F2, then measurability of f gives f−1(A) ∈ F1. Conversely, suppose 31 that f−1(A) ∈ F1, for each A ∈ E . To prove that f is measurable, it suffices to show that the collection G given by G = {A ⊆ Ω2 : f−1(A) ∈ F1} is a σ-algebra on Ω2, where E ⊂ G, so that F2 = σ(E) ⊆ G. Indeed, we observe that (i) First, Ω2 ∈ G, since f−1(Ω2) = Ω1 ∈ F1. (ii) Let A ∈ G, so f−1(A) ∈ F1. Since f has domain Ω1 and codomain Ω2, we have f−1(Ω2 \ A) = Ω1 \ f−1(A). Since F1 is closed under complements, we have Ω1 \ f−1(A) ∈ F1, hence Ac ∈ G. (iii) Let A1, A2, · · · ∈ G. Then, f−1(Ak) ∈ F1, for each k ∈ N. This in turn implies that f−1( ⋃∞ i=1Ai) = ⋃∞ i=1 f −1(Ai) ∈ F1. Thus, ⋃∞ i=1Ai ∈ G. and so G defines a σ-algebra on Ω2. Lemma 4.9 (Continuous functions). Let Ω1 and Ω2 be metric spaces. Take B1 and B2, respectively, as the Borel σ-algebras, that is, those generated by the families of open sets. If f : Ω1 → Ω2 is continuous, then it is measurable. Proof. Denote by O the class of open sets in Ω2. Let B ∈ O. Since f is continuous, f−1(B) is an open set in Ω1, hence f−1(B) ∈ B1. Since B2 = σ(O), by Lemma 4.8, f is measurable. 4.2 Extended Borel functions and random variables We now discuss extended Borel functions and extended random variables. 4.2.1 Extended Borel functions Definition 4.10 (Extended Borel function). Let (Ω,F) be a measurable space. If f : (Ω,F) → (R,B) is a measurable function, then it is called a Borel function. Likewise, a measurable function taking values on (R,B) is called an extended Borel function. Remark 4.11. One can think of a Borel function as simply an extended Borel function that never takes infinite values. So properties of extended Borel functions also hold for Borel functions. 4 Recall from §2.1.3 the several classes of subsets of R that generate B(R) and hence B(R). We can then apply Lemma 4.8 above to conclude that measurability suffices to hold for each such class on R. Lemma 4.12 (Equivalent criteria). Let (Ω,F) be a measurable space and f : Ω→ R. 
The following are equivalent: 32 (a) f is an extended Borel function, (b) {f < a} ∈ F , ∀a ∈ R, (c) {f 6 a} ∈ F , ∀a ∈ R, (d) {f > a} ∈ F , ∀a ∈ R, (e) {f > a} ∈ F , ∀a ∈ R. Proof. By Exercise 2.17, each one of the classes {f < a}a∈R, {f 6 a}a∈R, {f > a}a∈R and {f > a}a∈R generates B(R). Hence, the equivalence follows from Lemma 4.8. Lemma 4.13 (Pull-back of Borel sets). Let Ω be a sample space and f a function from Ω to (R,B). Then, σ(f) = σ({{f 6 a} : a ∈ R}). Proof. Since [−∞, a] ∈ B, we have {f 6 a} ∈ σ(f), for each a ∈ R. Thus, {f 6 a}a∈R ⊆ σ(f), and so G := σ({f 6 a}a∈R) ⊆ σ(σ(f)) = σ(f). On the other hand, by Lemma 4.12 f is G/B(R)-measurable. Therefore, by Remark 4.5, G ⊇ σ(f). Example 4.14. Let (Ω,F) be a measurable space. Given a partitionA1, . . . , An ∈ F of Ω and b1, . . . , bn ∈ R, define the function f : Ω→ R by f(ω) := b11A1(ω) + · · ·+ bn1An(ω), ω ∈ Ω. Suppose that Ai ∩ Aj = ∅ for i 6= j, and bi 6= bj for i 6= j. We will show that σ(f) = σ({A1, . . . , An}). To prove that σ(f) ⊆ σ({A1, . . . , An}), it suffices to show that {f 6 a} ∈ σ({A1, . . . , An}), ∀a ∈ R, and by Lemma 4.13, we can conclude that σ(f) ⊆ σ({A1, . . . , An}). Now note that {f 6 a} = {b11A1 + · · ·+ bn1An 6 a} = { Aij : bij 6 a, ij ∈ {1, . . . , n}, j = 1, . . . ,m } = m⋃ j=1 Aij ∈ σ({A1, . . . , An}), which follows by construction of f . Conversely, to show that {A1, . . . , An} ⊂ σ(f), observe that {f = bi} = Ai ∈ σ(f), i = 1, . . . , n, and so σ({A1, . . . , An}) ⊆ σ(f). 4 Exercise 4.15. Consider the measurable space ([0, 1],B([0, 1])), define the functions f, g : [0, 1]→ {0, 1, 2, 3} by 33 f(ω) = 1, ω ∈ [0, 14 ] 2, ω ∈ ( 14 , 12 ] 3, ω ∈ ( 12 , 1] , g(ω) = { 0, ω ∈ [0, 12 ] 1, ω ∈ ( 12 , 1] . Find σ(f), and check that σ(g) ⊂ σ(f). 4 4.2.2 Extended random variables Definition 4.16 (Extended random variable). An extended Borel function X on a probability space (Ω,F ,P) is called an extended random variable. The distribution of X is the push-forward measure PX on (R,B) given by PX := X∗P = P ◦X−1, that is, PX(A) = P(X ∈ A) = P({ω : X(ω) ∈ A}), A ∈ B(R). The function FX : R→ [0, 1] given by FX(x) := PX ( [−∞, x]) is called the cumulative distribution function of X. Remark 4.17. If X never take values ±∞, we say that X is a random variable. 4 Remark 4.18. For a random variable X, the measure PX on (R,B) is the Lebesgue-Stieltjes measure induced by the function FX , as defined in §3.1.2. 4 Example 4.19. Let (Ω,F ,P) be a probability space and A some event in F . Consider the random variable X given by X(ω) = 1A(ω), ω ∈ Ω, with distribution P(X = 1) = P(A) =: p ∈ [0, 1]. Then, X is called a Bernoulli random variable with parameter p. 4 Example 4.20. Let us revisit the construction of Example 4.14. Let (Ω,F ,P) be a probability space, and for n ∈ N, let A1, . . . , An be events in F that partition Ω and b1, . . . , bn distinct real numbers. Consider the random variable X defined by X(ω) = n∑ i=1 bi1Ai(ω), ω ∈ Ω, 34 with distribution P(X = bk) = P(Ak) =: pk ∈ [0, 1], k = 1, . . . , n. Then, X is called a simple random variable that takes value bk with probability pk, and has cumulative distribution function FX(x) = n∑ i=1 pi1{(−∞,bi]}(x). 4 (4.21) Example 4.22. Take Ω = R, F = B and let P = m|[0,1] be the restriction of the Lebesgue measure to [0, 1]. Let X(ω) = − log |ω|. Then X is measurable (why?), hence an extended random variable. Moreover, FX(x) = m({ω : − log |ω| 6 x} ∩ [0, 1])) = { 1− e−x, x > 0, 0, x 6 0, and X is said to follow the exponential distribution with unit parameter. 
4 4.3 Operations on extended Borel functions Given a measurable space (Ω,F), let f, g : Ω→ R be extended Borel functions. A property that we constantly use is that measurability is preserved under various operations. The properties studied in this section are so basic that in subsequent chapters we use them without making explicit reference. Lemma 4.23 (Comparison of functions). Let (Ω,F) be a measurable space and f, g : Ω → R be extended Borel functions. Then, the subsets of Ω given by {f < g}, {f 6 g}, {f = g}, and {f 6= g} are in F . Proof. Since Q is dense in R, for each ω ∈ Ω, f(ω) < g(ω) if and only if there is r ∈ Q such that f(ω) < r < g(ω). Thus, we can express {ω : f(ω) < g(ω)} as the union of {ω : f(ω) < r < g(ω)} over all r ∈ Q, and Q is countable. Further, {f < r}, {r < g} ∈ F , by Lemma 4.12 (b) and (d). Putting everything together yields {f < g} = ⋃ r∈Q ({f < r < g}) = ⋃ r∈Q ({f < r} ∩ {r < g}) ∈ F . (4.24) Now, we can see that {f 6= g} = {f < g} ∪ {f > g} ∈ F , {f = g} = {f 6= g}c ∈ F , {f 6 g} = {f < g} ∪ {f = g} ∈ F . This proves the lemma. 35 As one would expect, sums and products of Borel functions f and g on some measurable space (Ω,F) are also Borel. Importantly, we can extend this to f and g being extended Borel functions, as we only need to impose restrictions on infinite values. In particular, f + g is extended Borel if it is well-defined, that is, if there is no ω ∈ Ω for which f(ω) + g(ω) =∞−∞. Lemma 4.25 (Sums and products). Let (Ω,F) be a measurable space and f, g : Ω→ R be extended Borel functions. Then fg is an extended Borel function. If f + g is well-defined for all ω, then it is an extended Borel function. Proof. First, we show that af+b is an extended Borel function for every a, b ∈ R. For a = 0, the proof is trivial, so we assume that a ∈ R \ {0}. We have that {af + b < c} = {af < c− b} = { {f < c−ba }, a > 0 {f > c−ba }, a < 0 where {f < c−ba }, {f > c−ba } ∈ F , by Lemma 4.12 (b) and (d). Now suppose f + g is well-defined for all ω. For every b ∈ R, it follows that {f + g < b} = {f < b− g} ∈ F , since b− g is extended Borel and by applying Lemma 4.23. Next, we show that fg is extended Borel, where we treat the case of infinite values separately. First we show that f2 is also extended Borel. We have that {f2 > a} = { Ω, a < 0 {f < −√a} ∪ {f > √a}, a > 0 where Ω ∈ F and {f < −√a} ∪ {f > √a} ∈ F , by Lemma 4.12 (b) and (d). Now suppose f and g are both finite Borel functions. Notice that we can write fg = 14[(f + g) 2 − (f − g)2], where (f + g)2 and (f − g)2 are both Borel, since f + g and f − g are, so we can conclude that fg is Borel. Finally, suppose f and g are both extended Borel functions. Let A = {ω : f(ω) ∈ R, g(ω) ∈ R}, and B = Ac = {ω : f(ω) = ±∞ or g(ω) = ±∞}. Note that A,B ∈ F . Define f1 = f1A, f2 = f1B , g1 = g1A, g2 = g1B . Then f2g2 is measurable (why?), and by the previous case f1g1 is measurable. Since fg = f1g1 + f2g2, we conclude that fg is measurable. Remark 4.26. If f is an extended Borel function on some measurable space (Ω,F), then any scalar multiple of f is also extended Borel, that is for any a ∈ R, af is an extended Borel function (why?). 4 36 Lemma 4.27 (Maximum and minimum). Let (Ω,F) be a measurable space and f, g : Ω → R be extended Borel functions. Then, we have that max{f, g} and min{f, g} are extended Borel functions. Proof. By taking the suitable inequalities for minimum and maximum, For every a ∈ R, we have {max{f, g} < a} = {f < a} ∩ {g < a} ∈ F , since {f < a}, {g < a} ∈ F by Lemma 4.12 (b). 
Finally, use min{f, g} = −max{−f,−g} and Remark 4.26 for the minimum. Limiting operations also maintain measurability. Lemma 4.28 (Suprema and limits of functions). Let (Ω,F) be a measurable space and (fn)n>1 be a sequence of extended Borel functions. Then, the functions supn>1 fn, infn>1 fn, lim supn→∞ fn, lim infn→∞ fn are extended Borel functions. Moreover, if limn→∞ fn exists, then lim n→∞ fn = lim supn→∞ fn = lim inf n→∞ fn, is an extended Borel function. Proof. To prove that supn>1 fn is extended Borel, the idea is similar to the proof of Lemma 4.27. For any a ∈ R, we can write{ sup n>1 fn 6 a } = ∞⋂ n=1 {fn 6 a} ∈ F . where we used the fact that F is a σ-algebra, so it closed under countable intersections. The claim for infn>1 fn follows by noticing that inf n>1 fn = − sup n>1 (−fn), and using Remark 4.26. Now, using the definition of limit superior and limit inferior, lim sup n→∞ fn = inf n>1 sup m>n fm, lim inf n→∞ fn = supn>1 inf m>n fm, which both are extended Borel, by applying the above properties twice for each limit. Finally, assuming that the limit of fn exists, we have limn→∞ fn = lim supn→∞ fn, which concludes the proof. 37 5 Integration and expectation In Real Analysis, we first introduce the Riemann Integral, which approximates the area under a curve on a given interval via vertical rectangles and considering a sequence of step functions that converges to the associated curve. However this procedure is not the most suitable to analyse sequences of functions when taking limits. In contrast, the Lebesgue Integral considers horizontal shapes, which makes it more flexible. It is also more descriptive as to when it is possible to take limits. As an example, suppose we want to show that7 lim n→∞ ∫ ∞ 0 e−nx 2 √ x dx = 0 or d dt ∫ 1 0 etxf(x) dx = ∫ 1 0 xetxf(x) dx for some bounded continuous f . Sure, we could do these by brute force, making estimate over estimate. However, these will become much easier if we use the Dominated Convergence Theorem, to be introduced shortly. The reason to abandon the Riemann integral and adopt the Lebesgue integral does not stem from the possibility of integrating exotic functions, but rather from the robust and streamlined way in which the Lebesgue integral commutes with limits, derivatives, Fourier series and other integrals. Another reason to introduce the Lebesgue integral is that some measure spaces (such as probability spaces!) are not naturally partitioned into little intervals or little cubes, but we can still make measurements there. In order to define an integral in this case, instead of partitioning the domain and measuring the height of the graph of a function, we partition the co-domain and measure chunks of the domain, as shown in Figure 5.1. 5.1 Axiomatic definition In this section we define the integral of extended Borel functions by assuming existence of a linear unitary monotone σ-additive operator on non-negative functions. 7Example from [Bar95]. 38 g(x) x X(ω) ω Figure 5.1: Comparison between the Riemann integral on R and the Lebesgue integral on a measure space. 5.1.1 Non-negative functions Let (Ω,F , µ) denote a measure space. The integral of non-negative extended Borel functions against a measure µ is denoted by∫ Ω f(ω)µ(dω) or ∫ Ω f dµ and defined as follows. 
The Lebesgue integral is the unique operator with following properties: • Linear: ∫ Ω(αf + g) dµ = α ∫ Ω f dµ+ ∫ Ω g dµ; • Unitary: ∫ Ω 1A dµ = µ(A); • Monotone: f > g =⇒ ∫Ω f dµ > ∫Ω g dµ; • σ-additive: ∫ Ω( ∑ n fn) dµ = ∑ n ∫ Ω fn dµ; for α ∈ [0,+∞], A ∈ F and non-negative extended Borel functions f, g, fn. We prove existence and uniqueness of this operator in §5.4. Before that, let us discuss properties that follow from the above four. Exercise 5.1. Let (Ω,F , µ) be a measure space and f > 0 an extended Borel function. Show that ∫ Ω fdµ = 0 if and only if µ({f > 0}) = 0. 4 Solution. Suppose µ({f > 0}) > 0. Then by continuity from below of measures, there exists ε > 0 and n ∈ N such that µ({f > 1n}) > ε (otherwise we would have µ({f > 0}) 6 ε for every ε). Let g(ω) = ε if f(ω) > ε and 0 otherwise. Then 0 6 g 6 f , hence ∫ Ω f dµ > ∫ Ω g dµ > ε n > 0. Now suppose µ({f > 0}) = 0. Define g(ω) = +∞ if f(ω) > 0 and 0 otherwise. Then 0 6 f 6 g, hence∫ Ω f dµ 6 ∫ Ω g dµ =∞ · 0 = 0. 39 x x f(x) f+(x) f−(x) Figure 5.2: Separation of extended Borel function f : R → R by f+ and f− (coloured) . 5.1.2 Extended Borel functions and random variables We now allow f to take values on [−∞,+∞], similarly to (1.11). Denote the positive and negative parts of numbers by decomposing x = [x]+ − [x]−, [x]+ = { x, x > 0, 0, x 6 0, [x]− = { 0, x > 0, −x, x 6 0. For a function f , denote its positive part f+ and negative part f− by f±(ω) = [f(ω)]±, see Figure 5.2 for an illustration of this decomposition. We then define∫ Ω f dµ = ∫ Ω f+ dµ− ∫ Ω f− dµ. (5.2) unless it gives “ +∞ −∞ ”, in which case ∫Ω f dµ is undefined. When both integrals are finite, we say that f is integrable. Exercise 5.3. Let f be an extended Borel function on (Ω,F , µ). Prove that f is integrable if and only if ∫ Ω |f |dµ <∞. 4 This definition also results in a linear and monotone operator. Theorem 5.4. The Lebesgue integral has the following properties: • Homogeneous: ∫ Ω(αf) dµ = α ∫ Ω f dµ • Additive: ∫ Ω(f + g) dµ = ( ∫ Ω f dµ) + ( ∫ Ω g dµ) • Monotone: f > g =⇒ ∫Ω f dµ > ∫Ω g dµ. for measurable f, g and α ∈ [−∞,+∞] as long as the integrals are defined. These properties apply as follows. Homogeneity holds if both integrals are defined. Linearity is true if both integrals are defined and their sum does not result in “+∞ − ∞.” Monotonicity means that, if ∫ f+ dµ < ∞ or∫ g− dµ <∞, then both integrals are defined and satisfy the inequality. 40 We prove these properties from (5.2) in §5.4. Exercise 5.5. Let f be a non-negative extended Borel function on (Ω,F , µ). Show that ∫ Ω f dµ is defined and non-negative. 4 Exercise 5.6. Let f be an extended Borel function on (Ω,F , µ). Suppose that∫ Ω f dµ is defined. Show that | ∫ Ω f dµ| 6 ∫ Ω |f |dµ. 4 Exercise 5.7. Let f, g be extended Borel functions on (Ω,F , µ). Suppose that |f | 6 g and g is integrable. Show that f is integrable. 4 Exercise 5.8. Let f be an extended Borel function on (Ω,F , µ) and A ∈ F be a measurable set. Suppose that ∫ Ω f dµ is defined. Show that ∫ Ω f1A dµ is defined. 4 Exercise 5.9. Let f be an extended Borel function on (Ω,F , µ) and A ∈ F be a measurable set. Suppose that f is integrable. Show that f1A is integrable. 4 We now give a precise and unified definition of expectation. Definition 5.10 (Expectation). Let (Ω,F ,P) be a probability space. For an extended random variable X, the expectation of X is defined as EX = ∫ Ω X dP. Exercise 5.11. Let X be an extended random variable such that a 6 X 6 b for all ω ∈ Ω. Prove that a 6 EX 6 b. 4 Exercise 5.12. 
Supposed that X is an integrable extended random variable. Prove that E[X − EX] = 0. 4 Remark 5.13. Notation EX is used when the measure P is arbitrary or clear form the context. If more than one measure are being considered, one may use E and E˜ for the integrals against P and P˜, etc. If we need even more measures than fonts and tildes allow, it may be better to write ∫ ΩX dPα instead. 4 5.2 Main properties of the integral In this section we present some general properties of the integral, some inequalities, conditions for an extended Borel function to be integrable, and the relation with the Riemann integral. 5.2.1 General properties Given a measurable set B ∈ F , the integral of f on B is given by the integral of the function f1B which coincides with f on B and is zero outside B:∫ B f dµ := ∫ Ω (f1B) dµ = ∫ Ω f d(µ|B ) (5.14) 41 The first equality above is the definition of the integral on a subset whenever the second integral is defined. The second equality says that the integral of f on B coincides with the integral of f against the measure µ restricted to B, as defined in Example 2.33, and is valid whenever one of the two is defined. Preview of proof. We assume that f only takes values on {0, 1} (in §5.4 we will see a standard procedure to bootstrap arguments for this type of function to general functions). So f = 1A for some A ∈ F , and∫ Ω (f1B) dµ = ∫ Ω 1A∩B dµ = µ(A ∩B) = µ|B (A) = ∫ Ω f d(µ|B ) All this sounds great but at some point we will want to actually compute an integral. There are many methods, but the most important one is the Fundamental Theorem of Calculus. Theorem 5.15 (Fundamental Theorem of Calculus). If F is continuous on [a, b] and F ′ = f on (a, b), then∫ [a,b] f dx = F (b)− F (a), where dx denotes the Lebesgue measure. Proof. It is is the same as the proof for the Riemann integral. The latter uses the fact that the integral is monotone, finitely additive, and that ∫ [a,b] 1dx = b− a for all a < b. All these three properties are satisfied by the Lebesgue integral against the Lebesgue measure, hence the Fundamental Theorem of Calculus also holds in this context. Example 5.16. ∫ pi 0 sin x dx = − cospi + cos 0 = 2 4 Now, going one step further in the diagram of (4.7), if we have an extended Borel function defined on Ω2, we can define its pull-back on Ω1 by composition: Ω1 f→ Ω2 g→ R ω 7→ ω′ 7→ x . Defining the function f∗g = g ◦ f on Ω1 and get Ω1 f σ(f) µ ◦f−1 f∗g Ω2 F2 f−1 OO f∗µ g ◦f OO 42 In summary, from a function that takes points ω ∈ Ω1 to points ω′ ∈ Ω2, we can pull a σ-algebra on Ω2 back to a σ-algebra on Ω1, then push a measure on Ω1 forward to a measure on Ω2, and pull an observable defined on Ω2 back to an observable defined on Ω1. Theorem 5.17 (Change of variable). For measurable functions f : Ω1 → Ω2 and g : Ω2 → R, we have∫ Ω2 g d(f∗µ) = ∫ Ω1 (f∗g) dµ, provided one of the two is defined. Remark 5.18 (u substitution). A familiar instance of this principle is when f maps “x” to “u” and it becomes∫ A g(f(x))f ′(x) dx = ∫ B g(u) du, where A = f−1(B) and du = f∗dx = f ′(x)dx in a sense that will be made precise in §6. 4 Preview of proof. We assume that g only takes values on {0, 1} and leave the complete statement for §5.4. So g = 1C for some C ∈ F2, and f∗g = 1D where D = f−1(C) ∈ F1. Substituting these identities, we get∫ Ω2 g d(f∗µ) = (f∗µ)(C) = µ(f−1(C)) = µ(D) = ∫ Ω1 (f∗g) dµ. 
We say that a property holds µ-almost everywhere, or for µ-almost every ω, abbreviated by µ-a.e., if there is a set A ∈ F such that µ(Ac) = 0 and such that this property holds for all ω ∈ A. In case µ is clear from the context, we may simply say “a.e.” omitting µ. In case of a probability measure, it is customary to say a.s. or almost surely instead. Proposition 5.19 (a.e. equal functions). Let (Ω,F , µ) be a measure space and f, g : Ω→ R extended Borel functions. If f = g µ-a.e., then∫ Ω f dµ = ∫ Ω g dµ, provided one of the two is defined. Proof. Set E := {ω : f(ω) = g(ω)} ∈ F . Define f1 = f1E and f2 = f1+∞·1Ec . Then 0 6 f+1 6 f+ 6 f+2 and 0 6 f+1 6 g+ 6 f+2 . 43 Since ∫ Ω f2 dµ = ∫ Ω f1 dµ+∞ · µ(Ec) = ∫ Ω f1 dµ, by monotonicity we get∫ Ω f+ dµ = ∫ Ω g+ dµ. By a similar argument, ∫ Ω f − dµ = ∫ Ω g − dµ. Assuming one of these two is finite, we get ∫ Ω f dµ = ∫ Ω g dµ. Remark 5.20. Most of measure theory is insensitive to whether two functions differ on a set of zero measure. In upcoming sections and chapters, the reader will see that two a.e. functions are basically the same function for (almost!) all practical purposes. 4 Exercise 5.21. Let f be an extended Borel function on a measure space (Ω,F , µ) such that µ(f 6= 0) = 0. Show that ∫Ω f dµ = 0. 4 5.2.2 Inequalities The following inequality is fundamental for many arguments in Measure Theory and Probability. Theorem 5.22 (Chebyshev’s inequality). Let (Ω,F , µ) be a measure space and f : Ω→ R an extended Borel function. Then, for all ε > 0 and α > 0, µ({|f | > ε}) 6 1 εα ∫ Ω |f |αdµ. Proof. Let ε > 0 and α > 0. Then we have that,∫ Ω |f |αdµ > ∫ {|f |>ε} |f |αdµ > ∫ {|f |>ε} εαdµ = = εα ∫ Ω 1{|f |>ε}dµ = εαµ({|f | > ε}), and by rearranging, the inequality follows. Remark 5.23. In the setting of (Ω,F ,P) being a probability space and X : Ω→ R an extended random variable, ∀ε > 0 and α > 0, P(|X| > ε) 6 1 εα E|X|α, which is sometimes referred to as Markov’s inequality. We can, in turn, bound deviation from the mean using the variance of X by, P(|X − EX| > ε) 6 1 ε2 VX. (5.24) This is sometimes also referred to as Chebyshev’s inequality. 4 44 Exercise 5.25. Let X be a random variable such that, almost surely, a 6 X 6 b. SupposeX is not almost-surely equal to a constant. Prove that a < EX < b. 4 For the next inequality, we recall the notion of a convex function. For an open interval I ⊂ R, a function f : I → R is called convex if for every x, y ∈ I and a, b ∈ [0, 1] with a+ b = 1, we have f(ax+ by) 6 af(x) + bf(y). This inequality means that any line segment joining two points in the graph of f stay above the graph. In case f is twice differentiable, it is convex if and only if f ′′(x) > 0 for every x ∈ I. Examples of convex functions are |x|, ex, ax + b, x2 and [x]+ on R, as well as x−2, − log x, − arctan x and x−1 on (0,+∞). Theorem 5.26 (Jensen’s inequality). Let X be an integrable random variable taking values on an open interval I, and f : I → R a convex function. Then, f(EX) 6 Ef(X). Proof. Since f is convex, by [Kle14, Corollary 7.8(ii)], for each x0 ∈ I, there is c ∈ R such that f(x) > f(x0) + c(x− x0), ∀x ∈ I. Choosing x = X and x0 = EX, we get f(X) > f(EX) + c · (X − EX), a.s., and taking expectations on both sides we get the stated inequality.8 Theorem 5.27 (Cauchy-Schwarz Inequality). If EX2 < ∞ and EY 2 < ∞, then XY is integrable and E[XY ] 6 √ EX2 √ EY 2. Proof. We can assume EX2 = 1 and EY 2 = 1 (why?). The inequality follows immediately from the trick 0 6 E(X − Y )2 = 2− 2E[XY ]. (5.28) Example 5.29. 
Let (Ω,F ,P) be a probability space and X = 1A, Y = 1B be Bernoulli random variables with parameter p, where A,B are events in F . Further, assume that A and B are independent, i.e. P(A ∩B) = P(A)P(B) = p2. 8Based on [Shi16, §2.6.7]. 45 Then, we can see that E[XY ] = E[1A1B ] = E[1A∩B ] = P(A ∩B) = P(A)P(B) = p2. On the other hand, we have √ EX2 √ EY 2 = √ EX √ EY = p, and since p ∈ (0, 1) by construction, Cauchy-Schwarz Inequality is verified. 4 5.2.3 Integrability Proposition 5.30. Suppose there is a countable set A ⊆ [0,+∞] such that X only takes values in A. Then EX = ∑ x∈A xP(X = x). Proof. Using σ-additivity of the expectation, EX = E [∑ x∈A x · 1[X=x] ] = ∑ x xE1[X=x] = ∑ x xP(X = x). Example 5.31. If P(X = n) = λne−λn! for n ∈ N0, then EX = ∞∑ n=0 n λne−λ n! = ∞∑ n=1 λne−λ (n− 1)! = = λe−λ ∞∑ n=1 λn−1 (n− 1)! = λe −λ ∞∑ n=0 λn n! = λe −λeλ = λ, giving the familiar formula for the expectation of a Poisson variable. 4 Proposition 5.32. If X only takes values on {0, 1, 2, 3, . . . ,+∞}, then EX = ∑ n∈N P(X > n). Proof. Let A = {0, 1, 2, 3, . . . ,+∞}. Replacing n by a dull sum over k, EX = ∑ n∈A nP(X = n) = ∑ n∈A ∑ k∈N 1k6nP(X = n) = = ∑ k∈N ∑ n∈A 1n>kP(X = n) = ∑ k∈N ∑ n>k P(X = n) = ∑ k∈N P(X > k), where Theorem 1.13 was used to interchange sums. 46 Example 5.33. If P(X ∈ N0) = 1 and P(X > n) = (1 − p)n−1 for all n ∈ N, where 0 < p < 1, then EX = ∞∑ n=1 P (X > n) = ∞∑ n=1 (1− p)n−1 = ∞∑ j=0 (1− p)j = 11− (1− p) = 1 p , giving the familiar formula for the expectation of a Geometric variable. 4 Remark 5.34. The above propositions and their proof also work for measure spaces with X replaced by f and E by ∫ Ω · · · dµ. 4 Proposition 5.35. A random variable X is integrable if and only if∑ n P(|X| > n) <∞. Above we do not specify whether n = 0 or n = ∞ are included in the sum, because all that matters is whether the sum is finite. Proof. Let Z = |X| and Y = bZc. Observe that EY = ∑ n P(Y > n) = ∑ n P(Z > n). On the other hand, Y 6 Z 6 Y +1. By additivity, we have E[Y +1] = (EY )+1. By monotonicity, EY 6 EZ 6 (EY )+1. If one of these numbers is finite, all the others are finite, and if one is infinite, all the others are infinite (why?). This concludes the proof. Exercise 5.36. Can you find which parts of the proof break down if we work on a measure space instead of probability space? Can you find counter-examples? 4 5.2.4 Relation to the Riemann integral We conclude this section with a comparison with the Riemann integral. Theorem 5.37. Let f : [a, b] → R be a Borel function. If f is Riemann integrable then it is Lebesgue integrable and the value of both integrals coincide. Proof. Using the definition from [Coh13, §2.5], the Riemann integral of f on [a, b] equals L ∈ R if, for every ε > 0, there exist step functions g and h such that g 6 f 6 h and L − ε < ∫[a,b] g dx 6 ∫[a,b] hdx < L + ε. But this implies that g and h (and hence f) are bounded, therefore ∫ [a,b] f dx is defined and | ∫[a,b] f dx−L| < ε. Since this holds for every ε > 0, we get ∫[a,b] f dx = L. 47 Remark 5.38. The converse is false. A function can be Lebesgue integrable and not Riemann integrable. In case f is unbounded, it is not Riemann integrable even if it is continuous at almost every point. Another popular example is f : [0, 1] → R given by f = 1Q. Since m(Q) = 0, we have ∫ [0,1] fdx = 0 but this function is not Riemann integrable. 9 4 Remark 5.39. The reader may wonder what happens if f is Riemann integrable but not Borel measurable. We will never bump into such a function unless we are looking for it. 
In any case, if f : [a, b] → R is Riemann integrable then it is measurable with respect to a σ-algebra larger than B, denoted F∗ in Lemma 3.11, it is Lebesgue integrable and again the value of both integrals coincide. See [Coh13, 2.5.4] for a proof. 4 5.2.5 The fallacy of improper integrals Some people like to say that there are functions which are Riemann integrable but not Lebesgue integrable. Their infamous example is f(x) = sin x x · 1[pi,+∞)(x), usually followed by the pompous claim that it is Riemann integrable. The reason why this function causes some commotion is that f is not Lebesgue integrable (proof omitted) but nevertheless the limit lim z→+∞ ∫ [pi,z] f(x)dx is finite (proof omitted). Some would infer form this that f is Riemann integrable but not Lebesgue integrable, without realising that the above limit is the same regardless of whether the integral on [pi, z] is Lebesgue or Riemann. Anyway, there are some serious problems with the interpretation of this limit. It is finite, so what? As far as Measure Theory and Probability Theory are concerned, this limit has no meaning. It takes one particular sequence of sets An satisfying An ↑ [pi,+∞). By picking another sequence with the same property, one can make this limit give any number in R (proof omitted). There is no physical interpretation of this limit in terms of signed areas, centre of mass, etc. If this limit were to be understood as the value of the integral, Theorem 5.17 would not hold. If it were to correspond to the expectation of a random variable, the law of large numbers would not hold either. 9Let us prove that it is not Riemann integrable. Consider any partition of [0, 1] into finitely many non-degenerate intervals, and notice that all intervals will contain both rational and irrational numbers. Thus, every step function g 6 f must satisfy g 6 0 a.e., and every step function h > f must satisfy h > 1 a.e. Hence, ∫ gdx 6 0 and ∫ hdx > 1 and taking ε = 13 we see that there is no number L such that L− ε < ∫ g dx 6 ∫ hdx < L+ ε. 48 5.3 Convergence of integral and applications We can think of the region under the graph of a non-negative extended real function as having an area, volume or mass. If the function is f(x) = n·1(0, 1n ](x), or g(x) = 1(n,n+1](x), this mass is always equals to 1 and, nevertheless, it disappears when we let n → ∞. In both cases, we can say that the mass “escaped to infinity.” In the former case it disappeared vertically and in the latter case, horizontally. The three properties studied in this section explain what can happen to this mass when we take limits. 5.3.1 The main theorems The Monotone Convergence Theorem says that nothing strange can happen with the mass of an increasing sequence. More precisely, it says that extra mass cannot appear in the limit. Theorem 5.40 (Monotone Convergence Theorem). Let (fn)n be a non- decreasing sequence of non-negative extended Borel functions on (Ω,F , µ). Then lim n→∞ ∫ Ω fn(ω)µ(dω) = ∫ Ω ( lim n→∞ fn(ω))µ(dω). An easier way to remember is: 0 6 fn ↑ f =⇒ ∫ fn dµ ↑ ∫ f dµ. Proof. We included σ-additivity as an axiom instead of monotone convergence for aesthetic reasons, so the proof is a bit dull. Write fn = g1 + · · ·+ gn. Then∑ n gn = f , and by σ-additivity we have∫ Ω f dµ = ∫ Ω ( ∑ ngn) dµ = ∑ n ∫ Ω gn dµ = lim n ∫ Ω fn dµ. In §5.4 we will re-prove this theorem in order to prove σ-additivity. From this theorem, we derive another fundamental property relating integrals and limits. 
Fatou’s lemma says that, even though the previous examples show that even though we can lose mass in a limit, we can never gain mass. Theorem 5.41 (Fatou’s lemma). Let (Ω,F , µ) be a measure space and (fn)n>1 a sequence of non-negative extended Borel functions. Then,∫ Ω (lim inf n→∞ fn(ω))µ(dω) 6 lim infn→∞ ∫ Ω fn(ω)µ(dω). To remember the inequality, we consider functions fn = 1[n,∞). 49 Proof. Taking gn(ω) = inf k>n fk(ω) and defining g(ω) = lim inf n→∞ fn(ω), we have that 0 6 gn ↑ g. By the Monotone Convergence Theorem, lim n→∞ ∫ Ω gn dµ = ∫ Ω g dµ. Since gn 6 fn, we get lim inf n→∞ ∫ Ω fn dµ > lim inf n→∞ ∫ Ω gn dµ = ∫ Ω g dµ = ∫ Ω (lim inf n→∞ fn) dµ, concluding the proof. The Dominated Convergence Theorem says that, if the graphs of a sequence fn of functions are confined in a region of finite mass, then we cannot lose mass in the limit either. The reason is that the graph of fn divides this region of finite mass in two parts, and the fact that each of the two parts cannot gain mass in the limit implies that the other one cannot lose it. Theorem 5.42 (Dominated Convergence Theorem). Let (Ω,F , µ) be a measure space and (fn)n>1 a sequence of extended Borel functions such that fn(ω) → f(ω) as n → ∞ for a.e. ω. If there exists a non-negative extended Borel function g that is µ-integrable, such that |fn| 6 g for all n > 1, then f is µ-integrable, and lim n→∞ ∫ Ω fn(ω)µ(dω) = ∫ Ω ( lim n→∞ fn(ω) ) µ(dω). Proof. By changing fn, f, g on a set of measure zero, we can assume that convergence holds for every ω. Notice that fn + g > 0 for all n > 1, so by Fatou’s lemma, we get that∫ Ω (f + g) dµ 6 lim inf n→∞ ∫ Ω (fn + g) dµ and hence ∫ Ω f dµ 6 lim inf n→∞ ∫ Ω fn dµ. Similarly, we have that −fn + g > 0 for all n > 1, which gives∫ Ω (−f + g) dµ 6 lim inf n→∞ ∫ Ω (−fn + g) dµ 50 and hence ∫ Ω f dµ > lim sup n→∞ ∫ Ω fn dµ. From these two inequalities, we get lim n→∞ ∫ Ω fn dµ = ∫ Ω f dµ. 5.3.2 Some corollaries and applications Corollary 5.43. If fn ↓ f > 0 a.e. and f1 is integrable, then ∫ Ω fn dµ →∫ Ω f dµ. Proof. Exercise. Corollary 5.44. Let (Xn)n be extended random variables such that ∑ n E|Xn| converges. Then ∑ nXn converges a.s., and E[ ∑ nXn] = ∑ n EXn. Proof. Exercise. Corollary 5.45 (Bounded Convergence Theorem). If Xn→X a.s. and there is M ∈ R such that |Xn| ≤M a.s. for all n, then EXn → EX. Proof. Apply the Dominated Convergence Theorem with constant g = M . Corollary 5.46. Suppose 0 6 fn ↑ f , and there is M ∈ R such that ∫ Ω fn dµ < M for all n. Then f(ω) <∞ for a.e. ω. Proof. Exercise. Exercise 5.47. Suppose 0 6 fn 6 g, and g is integrable. Show that lim sup n→∞ ∫ Ω fn(ω)µ(dω) 6 ∫ Ω (lim sup n→∞ fn(ω))µ(dω). 4 (5.48) Proposition 5.49 (Moment generating function). Let M(t) = EetX and suppose there exists ε > 0 such that M(t) <∞ for −ε < t < ε. Then M (k)(0) = EXk for k = 0, 1, 2, 3, . . . . Proof. We will show a stronger statement, namely that dk dtkMX(t) = E[X ketX ] for every t ∈ (−ε,+ε), by induction on k. 51 For k = 0 we have the identity MX(t) = EetX which holds trivially. Suppose the identity holds for some k ∈ N. Write gx(t) = xketx and g′x(t) = xk+1etx. Then d dtM (k) X (t) = d dtEgX(t) = limh→0E gX(t+ h)− gX(t) h . If we can commute the expectation and the limit, we get M (k+1) X (t) = E lim h→0 gX(t+ h)− gX(t) h = Eg′X(t) = E[Xk+1etX ]. Fix a sequence hn → 0. In order to apply the Dominated Convergence Theorem, it is enough to bound the random term | gX(t+h)−gX(t)h | by an integrable extended random variable. 
By the Mean Value Theorem, gx(t+ h)− gx(t) h = g′x(θ), where θ ∈ [t, t+ h] depends on x, t and h. Taking δ = ε−|t|3 , for |h| < δ we have |g′x(θ)| ≤ |x|k+1e(ε−2δ)|x| 6 Ce(ε−δ)|x| for every x ∈ R, where C depends on ε and t. It follows from the assumption that Ee(ε−δ)|X| < ∞, which concludes the proof. 5.4 Construction of the integral In this last section we prove the existence and uniqueness of the operator f 7→∫ Ω f dµ defined on non-negative extended Borel functions f . In this process, we will show that non-negative functions can be approximated by simple functions, which allows us to bootstrap properties proved for binary functions (such as Theorem 5.17) to general measurable functions. 5.4.1 Simple functions We say that f is a simple function if it is measurable and only takes finitely many values. In case f is also non-negative, we define its integral as∫ Ω f dµ := ∑ x x · µ(f = x), where “µ(f = x)” means µ({ω ∈ Ω : f(ω) = x}). Let a1, . . . , an denote the distinct values attained by f , and A1, . . . , An ∈ F the sets where f takes these values. Then the sets A1, . . . , An form a partition of Ω and f = n∑ j=1 aj1Aj . 52 This representation is unique except for permutation of the indices {1, . . . , n}. In this representation, we have∫ Ω f dµ = n∑ j=1 ajµ(Aj) (5.50) for every f non-negative and simple. Lemma 5.51. This definition is unitary, monotone, and linear. It is unitary by construction. Monotonicity follows directly from linearity. Indeed, if f > g > 0 are simple, then h := f −g is non-negative and simple, and∫ Ω f dµ = ∫ Ω g dµ+ ∫ Ω hdµ > ∫ Ω g dµ. So it remains to prove linearity. Lemma 5.52. If A1, . . . , An ∈ F form a partition of Ω and f = ∑n j=1 aj1Aj , then ∫ Ω f dµ = ∑n j=1 ajµ(Aj), even if a1, . . . , an ∈ [0,+∞] are not distinct. Idea of proof. Group the sets A corresponding to same value a. Proof of linearity. Write f = ∑n j=1 aj1Aj and g = ∑k i=1 bi1Bi , where both A1, . . . , An and B1, . . . , Bk form partitions of Ω. After expanding and grouping, af + bg = · · · = n∑ j=1 k∑ i=1 (aaj + bbi)1Aj∩Bi . But the family {Aj ∩ Bi}j=1,...,n;i=1,...,k forms a partition of Ω, so using the previous lemma we get∫ Ω (af + bg) dµ = n∑ j=1 k∑ i=1 (aaj + bbi)µ(Aj ∩Bi). Splitting the terms in the sum and using that B1, . . . , Bk as well as A1, . . . , An are partitions, one can eventually gets∫ Ω (af + bg) dµ = n∑ j=1 aajµ(Aj) + k∑ i=1 bbiµ(Bi) = a ∫ Ω f dµ+ b ∫ Ω g dµ. Remark 5.53. We really had no choice for how to define the integral of simple functions, as (5.50) follows from the integral being unitary and linear. 4 5.4.2 Non-negative extended Borel functions We define the integral of a non-negative extended Borel function f by∫ Ω f dµ = sup {∫ Ω g dµ : 0 6 g 6 f and g is simple } . (5.54) As promised in §5.1, we will now prove the following. 53 Theorem 5.55. This definition is linear, unitary, monotone and σ-additive. It is to be noted that for simple non-negative functions, this definition coincides with the previous one, so it really is only extending it. In particular, this integral is unitary by construction. Monotonicity follows from inclusion: the larger f , the more simple functions are allowed in the above supremum. To prove linearity and σ-additivity, we will use monotone convergence: 0 6 fn ↑ f =⇒ ∫ Ω fn dµ→ ∫ Ω f dµ (5.56) for every non-decreasing sequence of non-negative extended Borel functions. This was already “proved” using σ-additivity, but in order to avoid a circular argument we now prove it directly from the above definition. Proof of (5.56). 
By monotonicity, the sequence ∫ Ω fn dµ is non-decreasing and bounded by ∫ Ω f dµ. Hence, it converges, and lim n→∞ ∫ Ω fn dµ 6 ∫ Ω f dµ. It remains to show the opposite inequality. Let 0 6 g 6 f be simple, taking positive values a1, . . . , ak. Let 0 < α < 1 and An = {ω : fn(ω) > αg(ω)}. Since f > g and 0 6 fn ↑ f , for each ω there is n such that fn(ω) > αg(ω), so An ↑ Ω and∫ Ω fn dµ > ∫ Ω (fn1An) dµ > ∫ Ω (αg1An) dµ = = α k∑ j=1 ajµ ({ω ∈ An : g(ω) = aj})→ α k∑ j=1 ajµ ( g = aj ) = α ∫ Ω g dµ. Hence, lim n→∞ ∫ Ω fn dµ > α ∫ Ω g dµ. Since this is true for every 0 < α < 1, we get lim n→∞ ∫ Ω fn dµ > ∫ Ω g dµ. Since this is true for every simple g such that 0 6 g 6 f , lim n→∞ ∫ Ω fn dµ > ∫ Ω f dµ. 54 g1(x) g2(x) g3(x) g2(y) x x y Figure 5.3: Graph of g2(y) and approximation gk(x) ↑ x for some fixed x. 5.4.3 Approximation by simple functions In order to bootstrap properties already established for simple functions to the case of non-negative functions, we start by showing that the latter can be approximated by the former in a monotone way, and finally apply (5.56). We define a sequence of functions that approximate every non-negative extended number in a monotone way. For n ∈ N, let gn : [0,+∞]→ R+ be defined by gn(x) = 2−n ·max { j ∈ {0, 1, . . . , 2nn} : 2−nj 6 x} , illustrated in Figure 5.3. Note that, each gn assumes finitely many values and, for every x ∈ [0,+∞], gn(x) ↑ x as n→∞. For a non-negative extended Borel function f defined on Ω, the Borel functions fn := gn ◦ f are simple and satisfy fn ↑ f , see Figure 5.4. Remark 5.57. The above construction shows that, for every non-negative extended Borel function f , there exists a sequence of simple functions (fn)n such that 0 6 fn ↑ f . 4 Remark 5.58. Combining the previous remark with the Monotone Convergence Theorem and Remark 5.53, we see that any other definition of integral of non-negative extended Borel functions that is unitary, monotone, linear and σ-additive, would be equivalent to this one. In other words, the integral is unique. 4 5.4.4 Proofs of the axioms and other general properties Proof of linearity. Suppose f > 0 and h > 0 are extended Borel functions. For homogeneity, given α > 0 and we get ∫ Ω(αf) dµ = α ∫ Ω f dµ directly from (5.54) since we can multiply each g in the supremum by α. For additivity, consider simple functions fn ↑ f and hn ↑ h. Using (5.56) three times and linearity for the integral of simple functions, 55 X(ω) g1(X(ω)) g2(X(ω)) ω Figure 5.4: Approximation of an extended Borel function X by g1(X) e g2(X). Notice the similarity with Figure 5.1. ∫ Ω (f + g) dµ = lim n→∞ ∫ Ω (fn + gn) dµ = lim n→∞ (∫ Ω fn dµ+ ∫ Ω gn dµ ) = = lim n→∞ ∫ Ω fn dµ+ lim n→∞ ∫ Ω gn dµ = ∫ Ω f dµ+ ∫ Ω g dµ. This concludes the proof. Proof of σ-additivity. Let (hn)n be a sequence of non-negative extended Borel functions. Define fn = h1 + · · ·+ hn. Then fn ↑ f := ∑ j hj , and by (5.56) we have ∞∑ j=1 ∫ Ω hj dµ = lim n→∞ n∑ j=1 ∫ Ω hj dµ = lim n→∞ ∫ Ω n∑ j=1 hj dµ = = lim n→∞ ∫ Ω fn dµ = ∫ Ω f dµ = ∫ Ω ∞∑ j=1 hj dµ, concluding the proof. Proof of (5.14). We have already proved this identity when f is an indicator function. By linearity, it also holds when f is a non-negative simple function. By (5.56), it holds for any non-negative extended Borel function f . Finally, if f is an extended Borel function,∫ Ω (f1B) dµ = ∫ Ω (f+1B) dµ− ∫ Ω (f−1B) dµ = = ∫ Ω f+ d(µ|B )− ∫ Ω f− d(µ|B ) = ∫ Ω f d(µ|B ), and when one of the two is defined, these expressions do not entail a “∞−∞.” 56 Proof of Theorem 5.17. 
We have already proved this identity when g is an indicator function. Just like in the previous proof, by linearity and (5.56), the identity holds for any non-negative extended Borel function g. Assuming g is an extended Borel function,∫ Ω2 g d(f∗µ) = ∫ Ω2 g+ d(f∗µ)− ∫ Ω2 g− d(f∗µ) = = ∫ Ω1 (f∗g)+ dµ− ∫ Ω1 (f∗g)− dµ = ∫ Ω1 (f∗g) dµ, and when one of the two is defined, these expressions do not entail a “∞−∞.” We finally give the belated proof of Theorem 5.4 using Theorem 5.55. Proof. We begin with monotonicity. Suppose f > g and ∫ Ω f + dµ < ∞. Then g+ 6 f+ and f− 6 g−, hence ∫ Ω g + dµ 6 ∫ Ω f + dµ < ∞ and − ∫Ω g− dµ 6− ∫Ω f− dµ. Adding the two last inequalities we get ∫Ω g dµ 6 ∫Ω f dµ without incurring any “∞−∞” operation. The case ∫ g− dµ <∞ is identical. We now move to homogeneity. Suppose both integrals are defined. Consider the case −∞ 6 α 6 0 first. Then∫ Ω (αf) dµ = ∫ Ω (−αf−) dµ− ∫ Ω (−αf+) dµ = = (−α) ∫ Ω f− dµ− (−α) ∫ Ω f+ dµ = = α (∫ Ω f+ dµ− ∫ Ω f− dµ ) = α ∫ Ω f dµ. These expressions cannot contain “∞−∞” because both integrals are defined. The case α > 0 is analogous. We conclude with additivity. Note that (f + g)+ − (f + g)− = f + g = f+ − f− + g+ − g−, whence (f + g)+ + f− + g− = (f + g)− + f+ + g+. These are all non-negative extended Borel functions. By additivity in this case,∫ Ω (f + g)+ dµ+ ∫ Ω f− dµ+ ∫ Ω g− dµ = ∫ Ω (f + g)− dµ+ ∫ Ω f+ dµ+ ∫ Ω g+ dµ. We are supposing that ∫ Ω f dµ+ ∫ Ω g dµ is defined, which implies that ∫ Ω f − dµ+∫ Ω g − dµ <∞ or ∫Ω f+ dµ+ ∫Ω g+ dµ <∞. Assume without loss of generality the former case. Since (f + g)− 6 f− + g−, we have ∫ Ω(f + g) − dµ < ∞, too, so we can subtract these three terms from both sides, getting∫ Ω (f + g)+ dµ− ∫ Ω (f + g)− dµ = ∫ Ω f+ dµ− ∫ Ω f− dµ+ ∫ Ω g+ dµ− ∫ Ω g− dµ. 57 This concludes the proof. 58 6 Density and Radon-Nikodým Theorem There are many situations where we want to define or study a measure in terms of another. Some of the most common uses is to tilt a probability measure and obtain a new one, to get probability distributions on R by specifying some density, and to define conditional expectations. We explore these three cases, while keeping in mind that these are rather general concepts and tools. In particular, Proposition 6.18 we prove the good old formula for the expectation of an absolutely continuous random variable. 6.1 Density of measures and continuous variables In this section we explore the concept of density and how it applies to absolutely continuous random variables. 6.1.1 Measures defined by a density Given a measure space and a non-negative extended Borel function, it is possible to construct a new measure via the Lebesgue integral. Proposition 6.1. Let (Ω,F , µ) be a measure space and f a non-negative extended Borel function. Then, ν(A) := ∫ A f dµ = ∫ Ω f1A dµ, ∀A ∈ F , defines a measure on (Ω,F). Exercise 6.2. Prove the above proposition. 4 Example 6.3. Consider the measure space (R,B,m) and the non-negative Borel function g(x) = λe−λx1[0,∞)(x) for x ∈ R, where λ > 0. Then, Pλ(B) := ∫ B g dm = ∫ R g1B dm, ∀B ∈ B, defines a measure on (R,B), by Proposition 6.1. In fact, Pλ is a probability measure, since Pλ(R) = ∫ R λe−λx1[0,∞)(x)m(dx) = ∫ ∞ 0 λe−λx dx = 1. 4 (6.4) A probability mass function assigns mass to elements of a discrete set, which can be seen as changing the weights assigned by the counting measure. In the above example we changed the weights assigned by the Lebesgue measure instead. 59 Definition 6.5 (Radon-Nikodým derivative). 
Let (Ω,F , µ) be a measure space and f : Ω→ [0,+∞] a non-negative extended Borel function. If ν is a measure on (Ω,F) such that ν(A) := ∫ A f dµ, ∀A ∈ F , (6.6) we say that f is the Radon-Nikodým derivative (or density) of ν with respect to µ, and we write f = dνdµ . Remark 6.7. Given two measures µ, ν on (Ω,F), we say that dνdµ exists if there is a non-negative extended Borel function f such that (6.6) holds. 4 Remark 6.8. We say the Radon-Nikodým derivative because it is unique, in the following sense. Given two measures µ and ν, and two non-negative extended Borel functions f and g such that (6.6) holds, then f = g for µ-a.e. ω. 4 Remark 6.9. Given a measure µ and a non-negative extended Borel function f , by Proposition 6.1 we can define a unique ν such that dνdµ = f for µ-a.e. ω. 4 Example 6.10. Let Ω = {1, 2, 3, 4, 5, 6}, F = P(Ω) and P(A) = #A6 . Take the function g(n) = 27n. Then by Proposition 6.1 the function P given by P(A) = ∫ Ω(g1A)dP defines another measure on (Ω,F). 4 Exercise 6.11. Show that the above P is a probability measure. 4 Exercise 6.12. Prove the statement in Remark 6.8. 4 Proposition 6.13 (Chain rule of the Radon-Nikodým derivative). Let ν, µ be measures on a given measurable space (Ω,F) such that dνdµ exists, and let g : Ω→ [0,+∞] be a non-negative extended Borel function. Then,∫ Ω g dν = ∫ Ω g dνdµ dµ. Proof. First, suppose that g = 1A for some A ∈ F . In this case, we have∫ Ω 1A dν = ν(A) = ∫ A dν dµ dµ = ∫ Ω 1A dν dµ dµ as claimed. By linearity, it also holds when g is a non-negative simple function. By (5.56), it holds for any non-negative extended Borel function g. 6.1.2 Absolutely continuous random variables We now give the proper definition of a very familiar object. 60 Definition 6.14 (Absolutely continuous variables). Given a random variable X on a probability space (Ω,F ,P), we say that X is an absolutely continuous random variable if dPXdx exists, where dx denotes the Lebesgue measure on (R,B). The Radon-Nikodým derivative dPXdx is called the probability density function of X and usually denoted fX : R→ R. Note that fX is determined by X and P but the latter is omitted in the notation. The probability measure of Example 6.3 is known the Exponential distribution with parameter λ, and g is called the density of this distribution. Example 6.15. The unique probability measure P on (R,B) such that dPX dx (x) = e−x 2/2 √ 2pi is called the standard normal distribution on R. 4 Example 6.16. The unique probability measure P on (R,B) such that dPX dx (x) = 1[0,1](x) is called the uniform distribution on [0, 1]. 4 Remark 6.17. The density of an absolutely continuous random variable is unique up to almost-everywhere equality. For instance, 1(0,1)(x) is also the density of the uniform distribution, as it differs from 1[0,1](x) on a set of zero Lebesgue measure. 4 Proposition 6.18 (Expectation). If X is an absolutely continuous random variable, then, for every non-negative extended Borel function g, we have Eg(X) = ∫ R g(x)fX(x) dx. Proof. First, notice that Theorem 5.17 gives Eg(X) = ∫ Ω g(X(ω))P(dω) = ∫ R g(x)PX(dx) = ∫ R g dPX Now, since X is absolutely continuous, we can apply Proposition 6.13 to get∫ R g dPX = ∫ R g dPXdx dx = ∫ R g(x)fX(x) dx, which concludes the proof. Example 6.19. If X ∼ Exp(λ), then EX = 1λ . Indeed, the density of PX with respect to the Lebesgue measure is λe−λx1[0,∞)(x). By the above proposition, we have EX = ∫ R xλe −λx1[0,∞)(x) dx = · · · = 1λ . 
4 61 6.1.3 Tilting a distribution Below we see how to change the distribution of a random variable without changing the variable itself. Example 6.20 (Tilting a probability measure). Let (Ω,F ,P) be a probability space and X ∼ N (0, 1), i.e. X is a standard Normal random variable with probability distribution, that is, P[X ∈ A] = ∫ A 1√ 2pi e− x2 2 dx, ∀A ∈ B(R). Fix some θ ∈ R. We want to define another measure P on (Ω,F), such that X ∼ N (θ, 1), under P. Take dPdP (ω) = eθX(ω) e θ2 2 and by Proposition 6.13, we get E[et(X−θ)] = E[et(X−θ)eθX− θ 2 2 ] = e−θt− θ 2 2 E[e(t+θ)X ] = e−θt− θ 2 2 e (t+θ)2 2 = e t 2 2 , t ∈ R. First, taking t = 0 we see that P(Ω) = E[1] = E[e0] = 1, so P is a indeed a probability measure. Also, by uniqueness of moment generating functions, X − θ ∼ N (0, 1) under P. 4 Remark 6.21. The fact that P is a probability measure comes from our careful choice of Radon-Nikodým derivative where the denominator equals the P- expectation of the numerator. In general, if g is a non-negative extended Borel function and Eg(X) <∞, then we can define dPdP = g(X)Eg(X) , and the distribution of X under P equals the distribution of X under P biased by the function g. The resulting measure is a probability measure because, taking A = Ω and f = g(X) in (6.6) will give P(Ω) = ∫ Ω fdP = 1 Eg(X) ∫ Ω g(X)dP = 1. 4 Example 6.22 (Size-biasing). Let (Ω,F ,P) be a probability space. Suppose X ∼ Poisson(µ). Defining dPdP = Xµ , the measure P is a probability measure and the distribution of X − 1 under P is Poisson with the same parameter. 4 Example 6.23 (Size-biasing). Let (Ω,F ,P) be a probability space. Suppose Y ∼ Exp(λ). Defining dPdP = λY , the measure P is a probability measure and the distribution of Y under P is Γ(2, λ). 4 Example 6.24 (Size-biasing). In Example 6.10, we have the experiment of rolling a die, where P corresponds to sampling one of the six faces with equal probability. The measure P corresponds to sampling one of the 21 dots with equal probability, and observing the number of dots in its face. 4 62 6.2 The Radon-Nikodým Theorem In this section we study when a measure ν can be expressed in terms of another measure µ, deformed by a variable factor dνdµ that we call Radon-Nikodým derivative. 6.2.1 Absolute continuity and the Radon-Nikodým Theorem The following is a useful property for comparing measures. Definition 6.25 (Absolute continuity between measures). Let ν, µ be measures on a given measurable space (Ω,F). We say that ν is absolutely continuous with respect to µ if, for all A ∈ F µ(A) = 0 =⇒ ν(A) = 0. We denote this relation by ν µ. Example 6.26. Consider the measurable space (N,P(N)), and recall the Dirac measure δk at a point k ∈ N, defined in Example 2.26. Denote by µC the counting measure on N, as defined in Example 2.28. If a set A ⊆ Z has µC measure 0, then it has to have δk measure 0. Hence, δk is absolutely continuous with respect to µC , i.e. δk µC . 4 Exercise 6.27. In the above example, can you think of a function f : N→ [0,∞] that would be the Radon-Nikodým derivative dδkdµC of δk with respect to µC? 4 Example 6.28. Let µC be the counting measure on (R,B), so that: • If µC(A) =∞, then A is an infinite Borel set. • If µC(A) = N <∞, then A = {a1, ..., aN} for some a1, ..., aN ∈ R. • If µC(A) = 0, then A = ∅. From the above cases, if µC(A) = 0, then m(A) = 0, where m denotes the Lebesgue measure. Therefore, m µC . 4 Exercise 6.29. In the above example, show that the converse is not true, that is, µC is not absolutely continuous with respect to m. 
4 Let µ and ν be measures on (Ω,F) and suppose the Radon-Nikodým derivative f of ν with respect to µ exists, that is, ν(A) = ∫ A f dµ for ever A ∈ F . By Exercise 5.21 we have ν µ. We wonder whether the converse implication holds, that is, if ν µ, does dνdµ exist? Theorem 6.30 (The Radon-Nikodým Theorem). Let ν and µ be σ-finite measures on a given measurable space (Ω,F). Then, ν µ if and only if 63 there exists a non-negative extended Borel function f , such that ν(A) = ∫ A f dµ, ∀A ∈ F . Proof. See [Coh13, 4.2.2]. We say that a random variable X is absolutely continuous if its distribution PX assigns measure zero to Borel sets of Lebesgue measure zero, that is PX m. From the above theorem, we see that any absolutely continuous random variable X must have a density! 6.2.2 Continuous random variables and densities We say that a random variableX is continuous if its distribution assigns measure zero to every fixed number x ∈ R. Since finite sets have Lebesgue measure zero, every absolutely continuous variable is also continuous (showing that the adverb “absolutely” has been used properly, unlike outer measures and finitely additive measures which are not measures). One may wonder about what could be a random variable which is continuous but not absolutely continuous: in fact it is possible to construct such variables, and their distributions assign positive probability to a set which is uncountable but has zero Lebesgue measure. If we are given a non-negative function f and we can check that PX ( (−∞, x]) = ∫ x −∞ f(s) ds for all x ∈ R, then we can conclude that X is absolutely continuous and has density f . Indeed, the measure P(B) = ∫ B f dx coincides with PX on the pi-system {(−∞, x]}x∈R which generates B, and the claim follows from Theorem 2.43. The converse is more delicate, and there is a theory that decomposes PX into absolutely continuous and singularly continuous parts. We will bypass the theory by taking a more modest approach that is very useful in practice. Suppose we are given a distribution function F and want to test if it is absolutely continuous. We take f to be its almost-everywhere derivative, and check whether F (x) = ∫ x −∞ f(s) ds for every x ∈ R. If this is the case, we can conclude that f is the density of F with respect to the Lebesgue measure. Exercise 6.31. Suppose U has the uniform distribution on [0, 1] and we take X = eU . Find the density of X. (Answer: x−11[1,e](x).) 4 64 6.2.3 Conditional expectation The Radon-Nikodým Theorem is a fundamental theorem in Measure Theory and Probability Theory. We now show one of the its many non-trivial applications. Definition 6.32 (Conditional expectation). Suppose (Ω,F ,P) is a probability space, X a an extended random variable, and G ⊆ F another σ-algebra. We say that an extended random variable Z is the conditional expectation of X given G if Z : Ω→ R is G-measurable (6.33) and ∫ A Z dP = ∫ A X dP for all A ∈ G. (6.34) The above identity means whenever one of the integrals is defined, the other one is also defined and they coincide. Exercise 6.35. Suppose that two extended random variables Z and W satisfy (6.33) and (6.34). Prove that P(Z = W ) = 1. 4 Remark 6.36 (Uniqueness). The above definition says the conditional expectation because, if there is such a function, it is unique in the a.s. sense. Conversely, if Z satisfies (6.34) andW = Z a.s., thenW also satisfies (6.34). 4 Theorem 6.37. In the setup of Definition 6.32, if X is a.s. finite and EX is defined, then the conditional expectation of X given G is defined. 
Proof (non-negative integrable case). Suppose that X is integrable, and define the measure ν on (Ω,G) by ν(A) = ∫ A X dP for A ∈ G. Then ν(Ω) = EX < ∞, so ν is σ-finite. By the Radon-Nikodým Theorem, dνdP exists as a non-negative extended Borel function on (Ω,G). Define Z := dνdP and note that it satisfies both (6.33) and (6.34). Proof (non-negative case). Now suppose that X is a non-negative finite random variable, not necessarily integrable. Then X can be written as X = ∑ nXn where each Xn is an integrable random variable. From the previous case, each Xn has a conditional expectation, which we denote Zn. We take Z := ∑ n Zn. Note that Z is also non-negative and G-measurable. By the previous case and σ-additivity of the integral, for every A ∈ G,∫ A Z dP = ∑ n ∫ A Zn dP = ∑ n ∫ A Xn dP = ∫ A X dP, 65 which concludes the proof. Proof (general case). We can assume that EX− <∞. From the previous case, there exist two G-measurable functions Z± such that ∫ A Z± dP = ∫ A X± dP for all A ∈ G. In particular, ∫Ω Z− dP = ∫ΩX− dP < ∞, so by Z− is a.s. finite, and we can assume that Z−(ω) < ∞ for every ω ∈ Ω. Define Z = Z+ − Z−, which is well-defined because Z− is always finite. Then for all A ∈ G, we have∫ A Z dP = ∫ A Z+ dP− ∫ A Z− dP = ∫ A X+ dP− ∫ A X− dP = ∫ A X dP which are well-defined because Z− and X− are integrable. 66 7 Product measures and independence Product measures are a fundamental tool in Measure Theory. Among its many applications, this topic is useful for computing high-dimensional integrals one variable at at time, computing the expectation of an integral as the integral of the expected value, and to construct new probability spaces from old ones by running two experiments (or even two copies of the same experiment) independently. The proofs in the first three sections are based on [Coh13, §5]. 7.1 Product σ-algebras and their properties Let (Ω1,F1) and (Ω2,F2) be measurable spaces, and let Ω1×Ω2 be the Cartesian product of Ω1 and Ω2. We say that a subset of Ω1 × Ω2 is a rectangle with measurable sides if it is of the form A×B, for some A ∈ F1 and B ∈ F2. Definition 7.1 (Product of σ-algebras). We define the product of the σ- algebras F1 and F2, denoted as F1 ⊗ F2, as the σ-algebra generated by the class F1 ×F2 = {A×B ⊆ Ω1 × Ω2 : A ∈ F1, B ∈ F2} on the sample space Ω1 × Ω2. In other words, F1 ⊗ F2 is the σ-algebra generated by the collection of all rectangles with measurable sides. It is important to note that F1 ⊗F2 is much larger than the Cartesian product F1 ×F2 of F1 and F2. Let us discuss the case of R2. Example 7.2. Consider the sample space R2 = R × R. Then the product of Borel σ-algebras B(R)⊗ B(R) equals the Borel σ-algebra B(R2). 4 Proof. [Not examinable] First, we show B(R2) ⊆ B(R)⊗ B(R). Let E := {(a, b] ∩ R : −∞ 6 a 6 b 6 +∞}, E2 := E × E , and recall that B(R2) = σ(E2). Since E2 ⊆ B(R)× B(R), we have B(R2) ⊆ σ(B(R)× B(R)) = B(R)⊗ B(R). We now show the other direction. Note that {A×B : A ∈ σ(E), B = R} = σ({A×B : A ∈ E , B = R}) as the set B is being pinned to B = R plays no role in any of the theoretic operations involved. Likewise, {A×B : A = R, B ∈ σ(E)} = σ({A×B : A = R, B ∈ E}). 67 In particular, for A,B ∈ B(R) = σ(E), we have A×B = (A× R) ∩ (R×B) ∈ σ(E × E). Thus, σ(E)× σ(E) ⊆ σ(E × E) and hence B(R)× B(R) = σ(E)× σ(E) ⊆ σ(E × E) = σ(E2) = B(R2), Therefore, B(R)⊗ B(R) = σ(B(R)× B(R)) ⊆ B(R2). Remark 7.3. It is not true that B(R2) is equal to B(R) × B(R). To see this, notice that the set {(x, y) : 0 6 x2 + y2 < 1} lies in B(R2), but not in B(R)× B(R). 
4 Given a subset or a function on Ω1 × Ω2, we need some notation involving the separate parts on Ω1 and Ω2. Definition 7.4 (Sections of subsets and functions on Ω1×Ω2). Let Ω1×Ω2 be the Cartesian product of given sample spaces Ω1 and Ω2, E a subset of Ω1×Ω2 and f a function on Ω1 × Ω2. Fix x ∈ Ω1 and y ∈ Ω2. (i) The sections Ex and Ey of E are the subsets of Ω2 and Ω1, respectively, given by Ex = {y˜ ∈ Ω2 : (x, y˜) ∈ E} and Ey = {x˜ ∈ Ω1 : (x˜, y) ∈ E}. (ii) The sections fx and fy of f are the functions on Ω2 and Ω1, respectively, given by fx(y˜) = f(x, y˜) and fy(x˜) = f(x˜, y). Naturally, we want to study subsets and functions that are measurable with respect to a product of σ-algebras. Lemma 7.5. Let (Ω1,F1) and (Ω2,F2) be measurable spaces. (i) If E is F1⊗F2-measurable, then for each x ∈ Ω1 and y ∈ Ω2, the sections Ex and Ey are F2-measurable and F1-measurable, respectively. (ii) If f : Ω1 × Ω2 → R is F1 ⊗ F2-measurable, then for each x ∈ Ω1 and y ∈ Ω2, the sections fx and fy are F2-measurable and F1-measurable, respectively. Proof. [Not examinable] For the first statement, we show that each section Ex is F2-measurable. A symmetric reasoning can be used to show that each section Ey is F1-measurable. Fix x ∈ Ω1, and define the collection G by G = {E ⊆ Ω1 × Ω2 : Ex ∈ F2}. By construction, G contains F1 × F2. It remains to show that G is a σ-algebra on Ω1 × Ω2, so that it contains σ(F1 ×F2) = F1 ⊗F2. Indeed, we have 68 (i) Ω1 × Ω2 ∈ G, since (Ω1 × Ω2)x = Ω2 ∈ F2. (ii) Let E ∈ G. Since Ex ∈ F2, then (Ec)x = (Ex)c ∈ F2, so that Ec ∈ G. (iii) Let E1, E2, · · · ∈ G. Then, (Ej)x ∈ F2 for each j ∈ N. This in turn gives( ∞⋃ i=1 Ei ) x = ∞⋃ i=1 (Ei)x ∈ F2, so that ⋃∞ i=1Ei ∈ G. where we use the identities (Ec)x = (Ex)c and ( ⋃∞ i=1Ei)x = ⋃∞ i=1(Ei)x. We now prove the second statement. Again, we only prove that each fx is F2- measurable. Note that (fx)−1(D) = (f−1(D))x for any D ∈ B(R). Combined with the first statement, this implies that (fx)−1(D) ∈ F2 for every D ∈ B, concluding the proof. 7.2 Product measure We now establish existence and uniqueness of the product measure defined by σ-finite measure spaces. We start with a basic lemma. Lemma 7.6. Let (Ω1,F1, µ) and (Ω2,F2, ν) be two σ-finite measure spaces. If E is F1 ⊗ F2-measurable, then the functions x 7→ ν(Ex) and y 7→ µ(Ey) are F1-measurable and F2-measurable, respectively. Proof. [Not examinable] We show that x 7→ ν(Ex) is F1-measurable, the other statement is analogous. First, suppose that the measure ν is finite. Define the collection G by G = {E ∈ F1 ⊗F2 : x 7→ ν(Ex) is F1-measurable}. We will show that G = F1 ⊗F2. Note from Lemma 7.5 (i) that, given E ∈ F1 ⊗ F2, each section Ex lies in F2, so that the function x 7→ ν(Ex) is well-defined. If A ∈ F1 and B ∈ F2, we have ν((A×B)x) = ν(B)1A(x), (7.7) which implies that G contains F1 × F2. Note that F1 × F2 defines a pi-system on Ω1 × Ω2. We now show that G is a λ-system. Indeed, we have (i) Ω1 × Ω2 ∈ G. (ii) For E ∈ G, we can write ν((Ec)x) = ν((Ex)c) = ν(Ω2)− ν(Ex), and so Ec ∈ G. 69 (iii) Let E1, E2, · · · ∈ G, where Ei ∩ Ej = ∅ for i 6= j. Then, by countable additivity of ν, we get ν (( ∞⋃ i=1 Ei ) x ) = ν ( ∞⋃ i=1 (Ei)x ) = ∞∑ i=1 ν((Ei)x), so that ⋃∞ i=1Ei ∈ G. Therefore, the pi-λ Theorem gives F1 ⊗ F2 = σ(F1 × F2) ⊆ G ⊆ F1 ⊗ F2, concluding the proof. Now, suppose that ν is σ-finite, and let {Di}∞i=1 be a sequence of disjoint events in F2 with ν(Dj) <∞, ∀j ∈ N , and ⋃∞ i=1Di = Ω2. Define the finite measures ν1, ν2, . . . by νk(B) = ν(B ∩Dk), k ∈ N. 
Given E ∈ F1 ⊗F2, each x 7→ νk(Ex) is F1-measurable, hence ν(Ex) = ∞∑ i=1 ν(B ∩Di) = ∞∑ i=1 νi(Ex), is also F1-measurable. Theorem 7.8 (Product measure). Let (Ω1,F1, µ) and (Ω2,F2, ν) be two σ- finite measure spaces. Then, there is a unique measure µ⊗ν on the σ-algebra F1 ⊗F2 such that (µ⊗ ν)(A×B) = µ(A)ν(B), ∀A ∈ F1, B ∈ F2. It is given by (µ⊗ ν)(E) = ∫ Ω1 ν(Ex)µ(dx) = ∫ Ω2 µ(Ey)ν(dy), ∀E ∈ F1 ⊗F2. The measure µ⊗ ν is called the product of µ and ν. Proof. [Not examinable] We start with uniqueness of the measure µ⊗ ν. Since F1 × F2 is a pi-system that generates F1 ⊗ F2, we can apply Theorem 2.43 directly to get uniqueness. Moreover, to get existence as well as the second formula, it is enough to show that each of those two integrals yield a measure which satisfies the first one. We know from Lemma 7.6 that, given E ∈ F1 ⊗ F2, the functions x 7→ ν(Ex) and y 7→ µ(Ey) are F1-measurable and F2-measurable, respectively. So we can define the set functions (µ ⊗ ν)1 : F1 ⊗ F2 → [0,∞] and (µ ⊗ ν)2 : F1 ⊗F2 → [0,∞] by (µ⊗ ν)1(E) = ∫ Ω1 ν(Ex)µ(dx) and (µ⊗ ν)2(E) = ∫ Ω2 µ(Ey)ν(dy). 70 Let us check that (µ⊗ ν)1 is a measure: (i) (µ⊗ ν)1(∅) = ∫ Ω1 ν(∅)µ(dx) = 0. (ii) Let E1, E2, · · · ∈ F1 ⊗ F2, where Ei ∩ Ej = ∅ for i 6= j. By countable additivity of ν, we have (µ⊗ ν)1( ∞⋃ i=1 Ei) = ∫ Ω1 ν(( ∞⋃ i=1 Ei)x)µ(dx) = ∫ Ω1 ν( ∞⋃ i=1 (Ei)x)µ(dx) = = ∞∑ i=1 ∫ Ω1 ν((Ei)x)µ(dx) = ∞∑ i=1 (µ⊗ ν)1(Ei). By analogous argument, (µ⊗ ν)2 is also a measure. Moreover, for every A×B ∈ F1 ×F2, by (7.7) we have (µ⊗ ν)1(A×B) = µ(A)ν(B), and analogously we can prove (µ ⊗ ν)2(A × B) = µ(A)ν(B). As noted in the first paragraph, this concludes the proof. Example 7.9. We return to the case of R2 = R×R. We know from Example 7.2 that B(R2) = B(R)⊗B(R). Let m1 and m2 denote the Lebesgue measure on R and R2, respectively. For each rectangle of the form (a1, b1]× (a2, b2], −∞ < ai 6 bi < +∞, i = 1, 2, its measure is given by m2((a1, b1]× (a2, b2]) = (b1 − a1)(b2 − a2) = = m1((a1, b1])m1((a2, b2]) = (m1 ⊗m1)((a1, b1]× (a2, b2]). (7.10) Therefore, we can deduce that m2 = m1 ⊗m1 on B(R2). This is precisely the motivation for the construction of the Lebesgue measure m2 on R2. 4 7.3 Fubini and Tonelli Theorems 7.3.1 Fubini and Tonelli Theorems Having constructed product measures, we move on to the fundamental theorems that allow us to integrate a measurable function via iterated integrals, which we can compute. 71 Theorem 7.11 (Tonelli Theorem). Let (Ω1,F1, µ) and (Ω2,F2, ν) be two σ- finite measure spaces, and let f : Ω1×Ω2 → [0,∞] be a non-negative F1⊗F2- measurable function. Then, the function x 7→ ∫Ω2 fx dν is F1-measurable, the function y 7→ ∫Ω1 fy dµ is F2-measurable, and∫ Ω1×Ω2 f d(µ⊗ ν) = ∫ Ω1 (∫ Ω2 fx dν ) µ(dx) = ∫ Ω2 (∫ Ω1 fy dµ ) ν(dy). Proof. We show this for the case of f = 1E , for some E ∈ F1⊗F2, as linearity of the integral extends this to any non-negative simple function, and applying the Monotone Convergence Theorem generalises the theorem to any non-negative measurable function. Fix x ∈ Ω1 and y ∈ Ω2, and notice that fx = (1E)x = 1Ex and fy = (1E)y = 1Ey . Thus, we get ∫ Ω2 fx dν = ν(Ex) and ∫ Ω1 fy dµ = µ(Ey). Note that measurability of the functions x 7→ ∫Ω2 fx dν and y 7→ ∫Ω1 fy dµ follows from Lemma 7.6. Equality of the three integrals in the statement follows from Theorem 7.8. Theorem 7.12 (Fubini Theorem). Let (Ω1,F1, µ) and (Ω2,F2, ν) be two σ-finite measure spaces, and let f : Ω1 × Ω2 → R be an F1 ⊗ F2-measurable and µ⊗ν-integrable function. Then, the section fx is ν-integrable µ-a.e. 
Theorem 7.12 (Fubini Theorem). Let (Ω_1, F_1, µ) and (Ω_2, F_2, ν) be two σ-finite measure spaces, and let f : Ω_1 × Ω_2 → R be an F_1 ⊗ F_2-measurable and µ ⊗ ν-integrable function. Then, the section f_x is ν-integrable for µ-a.e. x, the section f^y is µ-integrable for ν-a.e. y, and

∫_{Ω_1×Ω_2} f d(µ ⊗ ν) = ∫_{Ω_1} (∫_{Ω_2} f_x dν) µ(dx) = ∫_{Ω_2} (∫_{Ω_1} f^y dµ) ν(dy).

In the above iterated integrals, we can take the integrand to be zero on the (zero-measure) set where the inner integral is undefined.

Proof. Let f⁺ and f⁻ denote the positive and negative parts of f, respectively. From Lemma 7.5 (ii), we know that each f_x is F_2-measurable, and so are each (f⁺)_x and (f⁻)_x. Theorem 7.11 and integrability of f (i.e. integrability of f⁺ and f⁻) imply that the functions x ↦ ∫_{Ω_2} (f⁺)_x dν and x ↦ ∫_{Ω_2} (f⁻)_x dν are F_1-measurable and µ-integrable, hence finite µ-a.e. Thus, x ↦ ∫_{Ω_2} f_x dν is finite µ-a.e., i.e. f_x is ν-integrable for µ-a.e. x.

Now, consider the set N given by

N = {x ∈ Ω_1 : ∫_{Ω_2} |f_x| dν = ∞}.

As argued above, N ∈ F_1 and µ(N) = 0. By Theorem 7.11 and linearity of the integral, we get

∫_{Ω_1×Ω_2} f d(µ ⊗ ν) = ∫_{Ω_1×Ω_2} f⁺ d(µ ⊗ ν) − ∫_{Ω_1×Ω_2} f⁻ d(µ ⊗ ν)
= ∫_{Ω_1} (∫_{Ω_2} (f⁺)_x dν) µ(dx) − ∫_{Ω_1} (∫_{Ω_2} (f⁻)_x dν) µ(dx)
= ∫_{N^c} (∫_{Ω_2} (f⁺)_x dν) µ(dx) − ∫_{N^c} (∫_{Ω_2} (f⁻)_x dν) µ(dx)
= ∫_{N^c} (∫_{Ω_2} f_x dν) µ(dx) = ∫_{Ω_1} (∫_{Ω_2} f_x dν) µ(dx),

where we use that the integral over a set of measure zero is zero. The proof that ∫_{Ω_1×Ω_2} f d(µ ⊗ ν) = ∫_{Ω_2} (∫_{Ω_1} f^y dµ) ν(dy) is analogous.

If either the positive part f⁺ or the negative part f⁻ of f fails to be integrable, the conclusion of the theorem may fail. However, if f is non-negative, as in Theorem 7.11, then the integrals are allowed to be infinite.

7.3.2 Applications of Fubini and Tonelli Theorems

Example 7.13. Let (Ω, F, µ) be a σ-finite measure space, and let m be the Lebesgue measure on (R, B(R)). Consider a non-negative F-measurable function f : Ω → [0, ∞], and let E be the set given by

E = {(ω, y) ∈ Ω × R : 0 ≤ y < f(ω)}.

Observe that E encapsulates the region under the graph of f. Moreover, E lies in the product σ-algebra F ⊗ B(R), since f is F-measurable. Then, we can compute

(µ ⊗ m)(E) = ∫_Ω m(E_ω) µ(dω) = ∫_Ω m({y ∈ R : 0 ≤ y < f(ω)}) µ(dω) = ∫_Ω f(ω) µ(dω).

On the other hand, we have

(µ ⊗ m)(E) = ∫_R µ(E^y) m(dy) = ∫_0^∞ µ({ω ∈ Ω : 0 ≤ y < f(ω)}) dy,

which gives the equality

∫_Ω f(ω) µ(dω) = ∫_0^∞ µ({ω ∈ Ω : 0 ≤ y < f(ω)}) dy.

This relation tells us that we can integrate a non-negative measurable function f with respect to a σ-finite measure µ by instead integrating the function y ↦ µ(E^y) (with E as defined above), which can sometimes be easier to compute. △

To gain more intuition on Theorems 7.11 and 7.12, we revisit two important properties of doubly-indexed sequences.

Proof of Theorem 1.13. Let µ denote the counting measure on (N, P(N)); note that µ is σ-finite. Consider the function f : N × N → [0, ∞] given by f(m, n) = x_{m,n} for m, n ∈ N. Since P(N) ⊗ P(N) = P(N × N), the function f is P(N) ⊗ P(N)-measurable. Moreover,

∫_{N×N} f d(µ ⊗ µ) = ∑_{(m,n)∈N²} x_{m,n},
∫_N (∫_N f_m dµ) µ(dm) = ∑_{m=1}^∞ ∑_{n=1}^∞ x_{m,n},  and
∫_N (∫_N f^n dµ) µ(dn) = ∑_{n=1}^∞ ∑_{m=1}^∞ x_{m,n},

which are all equal by Theorem 7.11, completing the proof.

Proof of Theorem 1.14. Let µ denote the counting measure on (N, P(N)), and consider the function f : N × N → R given by f(m, n) = x_{m,n} for m, n ∈ N. Since P(N) ⊗ P(N) = P(N × N), the function f is P(N) ⊗ P(N)-measurable. Moreover, the sum ∑_{(m,n)∈N²} x_{m,n} converges absolutely if and only if f is µ ⊗ µ-integrable. Thus, by Theorem 7.12, we can conclude as for Theorem 1.13 that

∑_{m=1}^∞ ∑_{n=1}^∞ x_{m,n} = ∑_{(m,n)∈N²} x_{m,n} = ∑_{n=1}^∞ ∑_{m=1}^∞ x_{m,n}.

In other words, for absolutely convergent (not necessarily non-negative) doubly-indexed series, the order of summation can be reversed.
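A quick numerical sketch of this last fact: for an absolutely convergent double array, the two iterated sums agree. The array x_{m,n} = (−1)^{m+n}/(m²n²) and the truncation level below are illustrative choices standing in for the infinite sums.

```python
import numpy as np

# Illustration of Theorem 1.14 (Fubini for series): for an absolutely
# convergent double array, the two iterated sums agree. Here
# x_{m,n} = (-1)^(m+n) / (m^2 * n^2), truncated to a large finite block.
M = N = 500
m = np.arange(1, M + 1)
n = np.arange(1, N + 1)
x = ((-1.0) ** (m[:, None] + n[None, :])) / (m[:, None] ** 2 * n[None, :] ** 2)

print(x.sum(axis=1).sum())   # sum over n first, then over m
print(x.sum(axis=0).sum())   # sum over m first, then over n
# Both approach (sum_k (-1)^k / k^2)^2 = (pi^2/12)^2 ~ 0.6765
```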
7.4 Independence

In this section, we introduce various notions of independence and how they relate to each other.

7.4.1 Independence of σ-algebras

Let (Ω, F, P) be a probability space. Typically, the σ-algebra F on Ω is called the family of events, so that an event is an F-measurable subset of Ω. We know that two events A and B are said to be independent if their joint probability equals the product of their probabilities, that is,

P(A ∩ B) = P(A) P(B),   (7.14)

and two random variables X and Y are said to be independent if

P(X ∈ C, Y ∈ D) = P(X ∈ C) P(Y ∈ D),  ∀C, D ∈ B(R).   (7.15)

We also know it is possible to have A_1 independent of A_2, A_2 independent of A_3, and A_3 independent of A_1, without P(A_1 ∩ A_2 ∩ A_3) = P(A_1) P(A_2) P(A_3). This notion of pairwise independence can be useful at times (for instance, the variance of the sum is the sum of the variances, etc.), but here we are interested in a stronger notion of independence. For A_1, A_2, A_3 to be independent, we require that

P(B_1 ∩ B_2 ∩ B_3) = P(B_1) P(B_2) P(B_3)

for all choices of B_j ∈ {∅, A_j, A_j^c, Ω}. This motivates the following definition.

Definition 7.16 (Independent σ-algebras). Let (Ω, F, P) be a probability space and G_1, . . . , G_n a collection of sub-σ-algebras of F. We say that G_1, . . . , G_n are independent if

P(A_1 ∩ · · · ∩ A_n) = ∏_{i=1}^n P(A_i),  ∀A_i ∈ G_i, i = 1, . . . , n.

The abstract formulation in terms of σ-algebras turns out to be very powerful and versatile, as we will now see.

Definition 7.17 (Independent random variables). Let (Ω, F, P) be a probability space and X_1, . . . , X_n a collection of random variables. We say that X_1, . . . , X_n are independent if the sub-σ-algebras of F given by σ(X_1), . . . , σ(X_n) are independent.

Remark 7.18. The random variables X_1, . . . , X_n on a given probability space (Ω, F, P) are independent if and only if

P(X_1 ∈ B_1, . . . , X_n ∈ B_n) = ∏_{i=1}^n P(X_i ∈ B_i),  ∀B_i ∈ B(R), i = 1, . . . , n.

Indeed, the events in σ(X_j) are exactly those of the form {X_j ∈ B_j} for B_j ∈ B(R). △

Exercise 7.19. Suppose X_1 and X_2 are independent random variables. Suppose also that g_1, g_2 : R → R are Borel-measurable functions. Show that g_1(X_1) and g_2(X_2) are independent random variables. △

Finally, the definition of independent events is conveniently formulated in terms of independent random variables.

Definition 7.20 (Independent events). Let (Ω, F, P) be a probability space and A_1, . . . , A_n a collection of events. We say that A_1, . . . , A_n are independent if the random variables 1_{A_1}, . . . , 1_{A_n} are independent.

Remark 7.21. Events A_1, . . . , A_n on a given probability space (Ω, F, P) are independent if and only if the sub-σ-algebras of F given by G_1, . . . , G_n are independent, where G_i = σ({A_i}) = {∅, Ω, A_i, A_i^c}, i = 1, . . . , n. △

7.4.2 Independence and product measures

We now turn to product measures as discussed in the previous sections.

Remark 7.22 (Longer products). Given σ-finite measure spaces (Ω_j, F_j, µ_j) for j = 1, . . . , n, we can define F_1 ⊗ · · · ⊗ F_n := σ(F_1 × · · · × F_n). We can also define µ_1 ⊗ · · · ⊗ µ_n as the unique measure ν such that ν(A_1 × · · · × A_n) = µ_1(A_1) · · · µ_n(A_n). The theory studied in the previous sections is the particular case n = 2. Uniqueness follows from the π-λ Theorem as before, and existence follows by taking (µ_1 ⊗ · · · ⊗ µ_{n−1}) ⊗ µ_n. △

We define the distribution of a collection of random variables X_1, . . . , X_n defined on (Ω, F, P) by

P_{X_1,...,X_n}(A) := P[(X_1, . . . , X_n) ∈ A],  A ∈ B(R^n).

From Remarks 7.18 and 7.22, X_1, . . . , X_n are independent if and only if their joint distribution satisfies

P_{X_1,...,X_n} = P_{X_1} ⊗ · · · ⊗ P_{X_2} ⊗ · · · ⊗ P_{X_n}.   (7.22a)
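As a sanity check on the product form (7.22a), and a preview of the criterion established next, the following Monte Carlo sketch compares a joint probability with the product of the marginal probabilities for two simulated independent variables; the particular distributions, sample size, and thresholds are illustrative assumptions.

```python
import numpy as np

# A quick Monte Carlo sanity check of the product rule (7.22a) through
# distribution functions: for independent X ~ N(0,1) and Y ~ Exp(1),
# P(X <= a, Y <= b) should match P(X <= a) * P(Y <= b).
rng = np.random.default_rng(0)
n = 10**6
X = rng.standard_normal(n)
Y = rng.exponential(1.0, n)

a, b = 0.5, 1.0
joint = np.mean((X <= a) & (Y <= b))
product = np.mean(X <= a) * np.mean(Y <= b)
print(joint, product)   # the two estimates agree up to Monte Carlo error
```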
We now establish a useful and well-known criterion for checking independence of random variables via their cumulative distribution functions.

Proposition 7.23 (Criterion for independent random variables). Let X_1, . . . , X_n be random variables on a probability space (Ω, F, P). Then, the variables X_1, . . . , X_n are independent if and only if

P(X_1 ≤ a_1, . . . , X_n ≤ a_n) = ∏_{i=1}^n P(X_i ≤ a_i),  ∀a_1, . . . , a_n ∈ R.

Proof. The direct implication is immediate. Now suppose the above equality holds for all a_1, . . . , a_n ∈ R. Define A_{a_1,...,a_n} := (−∞, a_1] × · · · × (−∞, a_n] and E := {A_{a_1,...,a_n}}_{a_1,...,a_n ∈ R}. The above equality says that P_{X_1,...,X_n} and P_{X_1} ⊗ · · · ⊗ P_{X_n} coincide on E. Now note that E is a π-system and σ(E) = B(R^n). By Theorem 2.43, P_{X_1,...,X_n} = P_{X_1} ⊗ · · · ⊗ P_{X_n}, completing the proof.

Exercise 7.24. Suppose X and Y are independent integrable random variables. Show that XY is an integrable random variable and E[XY] = (EX)(EY). △

Hints. Let h(ω) = (X(ω), Y(ω)) ∈ R², and let f(x, y) = x and g(x, y) = y. You will need to combine Theorem 5.17 and (7.22a) with Tonelli to show that fg is integrable against h∗P, and with Fubini to justify the iterated integrals.¹⁰

¹⁰ A less abstract but more laborious approach would be to start from the non-negative case, take independent simple approximations, expand the expression for the integral, use the MCT, and finally bootstrap to the integrable case.

8 Convergence of measurable functions

Convergence is related to approximation, and as such it is at the core of Measure Theory and Probability Theory. To say that a number x_n is close to a fixed number y has a unique meaning: their difference is small. More precisely, x_n → y if |x_n − y| can be made as small as we wish by requiring n to be large. The situation becomes considerably more involved when we replace numbers by functions. If we are considering Borel functions f_n and g on a measure space (Ω, F, µ), or random variables X_n and Y that can be observed in a given probabilistic model, there are several different notions of convergence, each with its own meaning and implications.

8.1 Modes of convergence

8.1.1 Convergence of Borel functions

Definition 8.1 (a.e. convergence). Let (Ω, F, µ) be a measure space and (f_n)_{n≥1}, f be Borel functions. We say that (f_n)_{n≥1} converges to f µ-a.e. if

µ{ω : f_n(ω) ↛ f(ω)} = 0.

In this case, we write f_n →^{a.e.} f. The measure µ is implicit in the notation.

Let us consider another closely-related definition and then move on to some examples to see how the two compare.

Definition 8.2 (Convergence in measure). Let (Ω, F, µ) be a measure space and (f_n)_{n≥1}, f be Borel functions. We say that (f_n)_{n≥1} converges to f in measure if, for all ε > 0,

lim_{n→∞} µ{|f_n − f| > ε} = 0.

In this case, we write f_n →^µ f.

Example 8.3. Consider the measure space ([0, 1], B([0, 1]), m|_{[0,1]}) and the sequence (f_n)_{n≥1} of Borel functions given by

f_n = 1_{[0, 1/n)},  n ≥ 1.

Then, we can see that

m|_{[0,1]}(f_n ↛ 0) = m|_{[0,1]}({0}) = 0,

and so f_n →^{a.e.} 0. Moreover, for any 0 < ε < 1,

m{|f_n| > ε} = 1/n → 0,

and for ε ≥ 1 this measure is zero for all n, so f_n →^m 0. △
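A numerical companion to Example 8.3 (the grid approximation below is an illustrative device, not part of the notes): pointwise, f_n(x) → 0 for every x > 0, while the measure of the exceedance set {|f_n| > ε} is 1/n → 0.

```python
import numpy as np

# Numerical companion to Example 8.3: f_n = 1_[0, 1/n) on [0, 1].
# We approximate m{|f_n| > eps} by the fraction of grid points where
# f_n exceeds eps; the fraction is ~ 1/n, matching the computation above.
xs = np.linspace(0.0, 1.0, 10**6, endpoint=False)
eps = 0.5
for n in (1, 10, 100, 1000):
    fn = (xs < 1.0 / n).astype(float)   # f_n evaluated on the grid
    print(n, np.mean(fn > eps))          # ~ 1/n
```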
Example 8.4 (Travelling wave). Consider the measure space (R, B(R), m), and consider the sequence of Borel functions (f_n)_{n≥1}, where f_n = 1_{[n,n+1)}, which resembles a travelling wave. Notice that f_n(x) → 0 for every x; in particular, f_n(x) → 0 for a.e. x. However, we have

m{|f_n| > ε} = 1 ↛ 0  for ε = 1/2,

which implies that f_n does not converge to 0 in measure. △

Example 8.5 (Dancing wave). Consider the probability space (R, B(R), m|_{[0,1]}), and consider the sequence of Borel functions (f_{n,k}), where

f_{n,k} = 1_{(k/2^n, (k+1)/2^n]},  n ≥ 0, k = 0, . . . , 2^n − 1,

which resembles a dancing wave. Figure 8.1 illustrates the construction of f_{n,k}.

[Figure 8.1: First elements of the ‘dancing wave’: f_{0,0}, f_{1,0}, f_{1,1}, f_{2,0}, f_{2,1}, f_{2,2}, f_{2,3}.]

As n gets larger, each sub-interval gets shorter, so that for any 0 < ε < 1 and any p ≥ 1 we have

m{|f_{n,k}| > ε} = m((k/2^n, (k+1)/2^n]) = 2^{−n} → 0.

On the other hand, given n ≥ 0, letting k range from 0 to 2^n − 1 ensures that the sequence still oscillates between 0 and 1 at every x ∈ (0, 1]. We now order the pairs (n, k) sequentially as follows. For each j ≥ 1, there is a unique pair n ≥ 0 and 0 ≤ k < 2^n such that j = 2^n + k. Write g_j for the corresponding f_{n,k}. From the above observations, g_j → 0 in measure as j → ∞, but the set of x at which g_j(x) converges contains no point of (0, 1]. △

Definition 8.6 (Convergence in the vector space L^p). Let (Ω, F, µ) be a measure space. For p ≥ 1, we define L^p(µ) as the space of Borel functions f such that ∫_Ω |f|^p dµ < ∞. Given f and (f_n)_{n≥1} in L^p(µ), we say that (f_n)_{n≥1} converges to f in L^p if

lim_{n→∞} ∫_Ω |f_n − f|^p dµ = 0.

In this case, we write f_n →^{L^p} f.

Remark 8.7. To see that L^p is a vector space, use |f + g|^p ≤ |2f|^p + |2g|^p. △

Example 8.8 (Dancing wave revisited). Consider the ‘growing dancing wave’ given by f̃_{n,k} = 2^{n/2} f_{n,k}, and define g̃_j as the corresponding re-ordered sequence. Then for p = 1 we have

∫_{[0,1]} |f̃_{n,k}| dm = 2^{−n/2} → 0,

so g̃_j → 0 in L¹ as j → ∞. On the other hand,

∫_{[0,1]} |f̃_{n,k}|² dm = 1 ↛ 0,

so g̃_j ↛ 0 in L² as j → ∞. △

Example 8.9 (Integrating the tail). Let f(x) = e^{−|x|} on (R, B(R), m), and take the sequence f_n(x) = (1/n) f(x/n). We can think of f_n as ‘spreading’ f by a factor of n. Then f_n(x) → 0 for every x ∈ R, and

∫_R |f_n(x)|² m(dx) = (1/n) ∫_R |f(x)|² m(dx) → 0,

so f_n → 0 in L² as n → ∞. On the other hand,

∫_R |f_n(x)| m(dx) = ∫_R |f(x)| m(dx) = 2 ↛ 0,

so f_n ↛ 0 in L¹ as n → ∞. Note that 0 ≤ f_n(t) ≤ 1/n for every t ∈ R, so f_n →^{a.e.} 0. △

8.1.2 Convergence of random variables

When the measure spaces are probability spaces, we use terminology more suitable to random variables. To this end, let (Ω, F, P) be a probability space and (X_n)_{n≥1}, X be random variables.

(i) We say that (X_n)_{n≥1} converges to X P-a.s. if

P(lim_{n→∞} X_n = X) = 1.

In this case, we write X_n →^{a.s.} X (the measure P is implicit).

(ii) We say that (X_n)_{n≥1} converges to X in probability if, for every ε > 0,

lim_{n→∞} P(|X_n − X| > ε) = 0.

In this case, we write X_n →^P X.

(iii) Finally, we say that (X_n)_{n≥1} converges to X in L^p (or in p-th moment), p ∈ [1, ∞), if X ∈ L^p(P), i.e. E|X|^p < ∞, and

lim_{n→∞} E|X_n − X|^p = 0.

In this case, we write X_n →^{L^p} X.

Remark 8.10. For a probability space, if X_n →^{L^p} X, then EX_n → EX. △

Proof. Using Exercise 5.6 and Jensen’s inequality:

|EX_n − EX|^p ≤ (E|X_n − X|)^p ≤ E|X_n − X|^p → 0.
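Before turning to convergence in distribution, here is a small numerical companion to Example 8.8; the grid approximation of the integrals is an illustrative device. The L¹ norm of the growing dancing wave decays like 2^{−n/2} while the L² norm stays at 1.

```python
import numpy as np

# Numerical companion to Example 8.8: the growing dancing wave
# f~_{n,k} = 2**(n/2) * 1_((k/2^n, (k+1)/2^n]) has L1 norm 2**(-n/2) -> 0
# but L2 norm 1 for every (n, k). We approximate the integrals on a grid.
xs = np.linspace(0.0, 1.0, 2**20, endpoint=False) + 2.0**-21  # midpoints
for n in (2, 6, 10):
    k = 0                                    # any k in {0, ..., 2^n - 1}
    ind = (xs > k / 2**n) & (xs <= (k + 1) / 2**n)
    f = 2**(n / 2) * ind.astype(float)
    print(n, f.mean(), (f**2).mean())        # L1 ~ 2**(-n/2), L2 ~ 1
```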
Recall that a random variable X on a probability space (Ω, F, P) is equipped with a cumulative distribution function F_X(x) = P(X ≤ x), x ∈ R. In particular, we can discuss convergence of a sequence of random variables via their distribution functions.

Definition 8.11 (Convergence in distribution). Let (Ω, F, P) be a probability space and (X_n)_{n≥1}, X be random variables with (F_{X_n})_{n≥1} and F_X their corresponding distribution functions. We say that (X_n)_{n≥1} converges to X in distribution if

lim_{n→∞} F_{X_n}(x) = F_X(x)

for every point x ∈ R at which F_X is continuous. In this case, we write X_n →^d X.

Remark 8.12. Convergence in distribution is only concerned with the associated distribution functions. In particular, a sequence of random variables (X_n)_{n≥1} converging in distribution to some random variable X might even be defined on a different probability space from that of X. △

Example 8.13. Let (Ω, F, P) be a probability space and (X_n)_{n≥1} a sequence of random variables, each Geometric with success probability p_n, that is,

F_{X_n}(x) = 1 − (1 − p_n)^x,  x ∈ N, n ≥ 1.

Further, assume that n p_n → λ > 0 as n → ∞. We would like to show that (X_n/n)_{n≥1} converges in distribution to the exponential distribution with parameter λ. First, for x ∈ [0, ∞) and n ≥ 1, we can write

F_{X_n/n}(x) = P(X_n ≤ nx) = P(X_n ≤ ⌊nx⌋) = 1 − (1 − p_n)^{⌊nx⌋} = 1 − (1 − (np_n)/n)^{nx} (1 − (np_n)/n)^{⌊nx⌋−nx}.

Now, by observing that

(1 − (np_n)/n)^{nx} → e^{−λx}  and  (1 − (np_n)/n)^{⌊nx⌋−nx} → 1  as n → ∞,

we get

F_{X_n/n}(x) → 1 − e^{−λx}  as n → ∞,

which is precisely the distribution function of the exponential distribution with parameter λ, and so we are done. △

Example 8.14. Let (Ω, F, P) be a probability space and (X_n)_{n≥1} a sequence of random variables, each with 1/n and 1 as equiprobable outcomes, that is,

P(X_n = 1/n) = P(X_n = 1) = 1/2,  n ≥ 1.

Under this construction, we expect (X_n)_{n≥1} to converge in distribution to a Bernoulli random variable X with parameter 1/2. Indeed, for each x ∈ R, we have

F_{X_n}(x) = { 0 for x < 1/n;  1/2 for 1/n ≤ x < 1;  1 for x ≥ 1 },

which converges, as n → ∞, to

{ 0 for x ≤ 0;  1/2 for 0 < x < 1;  1 for x ≥ 1 }.

The right-hand side is precisely the distribution function of X, except at x = 0. But this is a discontinuity point of F_X, so we can still deduce that X_n →^d X. △

Example 8.15. Let (Ω, F, P) be a probability space and (X_n)_{n≥1} a sequence of random variables, each Cauchy distributed with location parameter 0 and scale parameter 1/n, that is,

f_{X_n}(x) = 1/((1/n) π (1 + n²x²)) = n/(π(1 + n²x²)),  x ∈ R, n ≥ 1.

From Figure 8.2, we can see that as n gets larger, f_{X_n} concentrates around 0, so we claim that (X_n)_{n≥1} converges in distribution to 0. For x ∈ R and n ≥ 1, we have

F_{X_n}(x) = P(X_n ≤ x) = ∫_{−∞}^x n/(π(1 + n²y²)) dy = [arctan(ny)/π]_{−∞}^x = 1/2 + arctan(nx)/π.

[Figure 8.2: Probability density functions of the Cauchy distribution with location parameter 0 and scale parameters 1, 1/2, 1/3.]

Now, taking the limit as n → ∞ gives

lim_{n→∞} F_{X_n}(x) = { 0 for x < 0;  1/2 for x = 0;  1 for x > 0 },

which is the distribution function of the constant random variable X = 0, except at x = 0, which is a point of discontinuity of F_X; hence we can still deduce that X_n →^d 0. In addition, we show convergence in probability. For each ε > 0, we have

P(|X_n| > ε) = 2 ∫_ε^∞ n/(π(1 + n²x²)) dx = 2 [arctan(nx)/π]_ε^∞ = 1 − 2 arctan(nε)/π.

Taking the limit as n → ∞ yields lim_{n→∞} P(|X_n| > ε) = 0; thus X_n →^P 0. △

Remark 8.16. In the above example, convergence in distribution also follows from Proposition 8.42, which we will see later on. △

Theorem 8.17 (Convergence in distribution in terms of test functions). For random variables X and (X_n)_n, we have X_n →^d X if and only if Ef(X_n) → Ef(X) for every bounded continuous function f : R → R.

Proof. See [Kle14, Theorem 13.23].

We shall later see that convergence in distribution is the weakest among these modes of convergence. Nevertheless, it is quite often useful in practice.
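A simulation companion to Example 8.13 (the parameter values, sample size, and seed below are illustrative assumptions): the empirical distribution function of X_n/n is compared against 1 − e^{−λx}.

```python
import numpy as np

# Simulation of Example 8.13: if X_n ~ Geometric(p_n) with n * p_n -> lambda,
# then X_n / n converges in distribution to Exponential(lambda). We compare
# the empirical CDF of X_n / n with 1 - exp(-lambda * x) for one large n.
rng = np.random.default_rng(1)
lam, n, samples = 2.0, 10**4, 10**5
p_n = lam / n
Xn = rng.geometric(p_n, size=samples)       # values in {1, 2, 3, ...}

for x in (0.25, 0.5, 1.0, 2.0):
    empirical = np.mean(Xn / n <= x)
    limit = 1.0 - np.exp(-lam * x)
    print(x, round(empirical, 4), round(limit, 4))
```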
8.2 Almost sure convergence and Borel-Cantelli

To study convergence of real-valued random variables at “most” or “almost all” points ω ∈ Ω, we need a systematic way of expressing the convergence X_n → X in terms of events that are amenable to study.

8.2.1 Infinitely often and eventually

Definition 8.18 (Infinitely often and eventually). Suppose (Ω, F) is a measurable space and (A_n)_{n≥1} is a sequence of events (i.e. measurable sets). We define the event “A_n infinitely often,” abbreviated “A_n i.o.” and also denoted lim sup_{n→∞} A_n, as

lim sup_{n→∞} A_n := ⋂_{n=1}^∞ ⋃_{m=n}^∞ A_m.

This event is the set of all ω such that ω ∈ A_n for infinitely many A_n’s, that is, the set of ω such that, for all n ≥ 1, there exists m ≥ n with ω ∈ A_m. Occurrence of the event lim sup_n A_n means that infinitely many of the events {A_n}_n occur.

We define the event “A_n eventually,” also denoted lim inf_{n→∞} A_n, as

lim inf_{n→∞} A_n := ⋃_{n=1}^∞ ⋂_{m=n}^∞ A_m.

This event is the set of all ω such that ω ∈ A_n for all but finitely many A_n’s, that is, the set of ω such that, for some n ≥ 1 and all m ≥ n, one has ω ∈ A_m. Occurrence of the event lim inf_n A_n means that all but finitely many of the events {A_n}_n occur.

Remark 8.19. By De Morgan’s laws, we have {A_n i.o.}^c = {A_n^c eventually} and {A_n^c i.o.} = {A_n eventually}^c. △

Exercise 8.20. Prove the above identities. △

Let us point out already how the notions of eventually and infinitely often relate to convergence. A sequence of numbers (x_n)_n converges to a number x if, for every ε > 0, we have |x_n − x| < ε for all n sufficiently large, which means |x_n − x| < ε for all but finitely many n’s, which means |x_n − x| < ε eventually. Likewise, x_n ↛ x if for some ε > 0 we have |x_n − x| ≥ ε infinitely often.

Proposition 8.21 (Criterion for a.e. convergence). Let (Ω, F, µ) be a measure space and (f_n)_{n≥1}, f be Borel functions. Then, (f_n)_{n≥1} converges to f µ-a.e. if and only if

µ({|f_n − f| > ε i.o.}) = 0 for all ε > 0.

Proof. We have the following chain of equivalences:

µ{f_n ↛ f} = 0
⇕ (definition of limit)
µ{∃ε > 0 : |f_n − f| > ε i.o.} = 0
⇕ (take 1/(k+1) < ε ≤ 1/k)
µ{∃k ∈ N : |f_n − f| > 1/k i.o.} = 0
⇕ (continuity from below of µ)
lim_{k→∞} µ{|f_n − f| > 1/k i.o.} = 0
⇕ (non-decreasing numbers)
∀k ∈ N, µ{|f_n − f| > 1/k i.o.} = 0
⇕ (take 1/(k+1) < ε ≤ 1/k)
∀ε > 0, µ{|f_n − f| > ε i.o.} = 0.

This completes the proof.

8.2.2 Borel-Cantelli lemmas

The above proposition gives us enough motivation to estimate the measure of the event that a sequence of events occurs infinitely often.

Lemma 8.22 (Borel-Cantelli Lemma I). Let (Ω, F, µ) be a measure space and (A_n)_{n≥1} a sequence of measurable sets. If

∑_{n=1}^∞ µ(A_n) < ∞,

then µ(lim sup_n A_n) = 0.

Proof. The idea is to find appropriate upper bounds that converge to 0. Using continuity from above and countable subadditivity of µ, we have

µ({A_n i.o.}) = µ(⋂_{n=1}^∞ ⋃_{m=n}^∞ A_m) = lim_{n→∞} µ(⋃_{m=n}^∞ A_m) ≤ lim_{n→∞} ∑_{m=n}^∞ µ(A_m) = 0.

The last limit is zero because it is the tail of a convergent series. (Continuity from above applies here since µ(⋃_{m=1}^∞ A_m) ≤ ∑_m µ(A_m) < ∞.)
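The first Borel-Cantelli Lemma can be seen in simulation (the choice P(A_n) = 1/n², the sample sizes, and the seed are illustrative assumptions): with summable probabilities, each sample path sees only finitely many of the events, and occurrences at large indices are rare.

```python
import numpy as np

# Simulation companion to Borel-Cantelli Lemma I: independent events A_n
# with P(A_n) = 1/n**2, so that sum P(A_n) < infinity. Almost surely only
# finitely many A_n occur along a sample path.
rng = np.random.default_rng(2)
paths, N = 10**4, 10**3
occur = rng.random((paths, N)) < 1.0 / np.arange(1, N + 1)**2

print(occur.sum(axis=1).mean())             # mean count ~ pi^2/6 ~ 1.645
print(np.mean(occur[:, 100:].any(axis=1)))  # P(some A_n, n > 100, occurs)
                                            # <= sum_{n>100} 1/n^2 ~ 0.01
```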
Lemma 8.23 (Borel-Cantelli Lemma II). Let (Ω, F, P) be a probability space and (A_n)_{n≥1} a sequence of independent events. If

∑_{n=1}^∞ P(A_n) = ∞,

then P(lim sup_n A_n) = 1.

Proof. We look instead at {A_n i.o.}^c. By Remark 8.19, continuity from below and from above, and independence of (A_n^c)_{n≥1}, we get

P({A_n i.o.}^c) = P(⋃_{n=1}^∞ ⋂_{m=n}^∞ A_m^c)
= lim_{n→∞} P(⋂_{m=n}^∞ A_m^c)
= lim_{n→∞} lim_{k→∞} P(⋂_{m=n}^k A_m^c)
= lim_{n→∞} lim_{k→∞} ∏_{m=n}^k (1 − P(A_m))
≤ lim_{n→∞} lim_{k→∞} ∏_{m=n}^k e^{−P(A_m)}
= lim_{n→∞} lim_{k→∞} e^{−∑_{m=n}^k P(A_m)} = e^{−∞} = 0,

where the upper bound follows from the inequality 1 − x ≤ e^{−x} for x ≥ 0.

Remark 8.24. In practice, we can apply Borel-Cantelli Lemma I to deduce almost everywhere convergence: if we can show that, for every ε > 0,

∑_{n=1}^∞ µ(|f_n − f| > ε) < ∞,

then by Proposition 8.21 we can conclude that f_n →^{a.e.} f. △

Example 8.25. Let (Ω, F, P) be a probability space and (X_n)_{n≥1} a sequence of independent random variables, each uniformly distributed on [0, 1]. Consider the sequence of random variables (Y_n)_{n≥1} given by

Y_n = min{X_1, . . . , X_n},  n ≥ 1.

We show that Y_n →^{a.s.} 0. Given n ∈ N, independence and the distribution of X_1, . . . , X_n yield, for each ε ∈ (0, 1),

P(|Y_n − 0| > ε) = P(|Y_n| > ε) = P(X_1 > ε, . . . , X_n > ε) = P(X_1 > ε)^n = (1 − ε)^n,

and consequently

∑_{n=1}^∞ P(|Y_n| > ε) = ∑_{n=1}^∞ (1 − ε)^n = (1 − ε)/ε < ∞.

Hence, for each ε ∈ (0, 1), Borel-Cantelli Lemma I gives P({|Y_n| > ε i.o.}) = 0, so by Proposition 8.21 we have Y_n →^{a.s.} 0. △

Exercise 8.26. Let (Ω, F, P) be a probability space and (X_n)_{n≥1} a sequence of random variables with distribution given by

P(X_n = a_n) = p_n  and  P(X_n = 0) = 1 − p_n,  n ≥ 1,

for sequences (a_n)_n and (p_n)_n with ∑_{n=1}^∞ p_n < ∞. Show that X_n →^{a.s.} 0. △

Example 8.27. Suppose a_n ↑ ∞ and p_n ↓ 0 with ∑_n p_n = ∞, and that (X_n)_n is a sequence of independent random variables distributed as

P(X_n = a_n) = p_n  and  P(X_n = 0) = 1 − p_n,  n ≥ 1.

Then P(X_n → 0) = 0 but X_n →^P 0 as n → ∞. △

Remark 8.28. When applying Borel-Cantelli Lemma II to argue that almost sure convergence does not hold, we must be careful to check independence of the random variables concerned. △

Example 8.29. Let (Ω, F, P) = ([0, 1], B, m|_{[0,1]}), X_n(ω) = 1_{[0,1/n]}(ω) and X(ω) = 0 for all ω. Then P(X_n → 0) = 1, since X_n(ω) → 0 for all ω ≠ 0. However, P(X_n = 1) = 1/n, which is non-summable. If we used the same reasoning as in the previous example without noticing that the variables (X_n)_n are dependent, we would conclude that P(X_n → 0) = 0, which is false. △

Proposition 8.30. If P(A_n i.o.) = 0, then P(A_n) → 0 as n → ∞.

Proof. Suppose P(A_n i.o.) = 0, and take B_k = ⋃_{n≥k} A_n. Since B_k ↓ {A_n i.o.} as k → ∞, we have P(B_k) → P(A_n i.o.) = 0. On the other hand, B_k ⊇ A_k, thus P(A_k) ≤ P(B_k), and therefore P(A_n) → 0 as n → ∞.

8.3 Laws of Large Numbers

The Law of Large Numbers needs no introduction. Let X_1, X_2, X_3, . . . be a sequence of random variables, and define S_n := X_1 + · · · + X_n. We say that the sequence (X_n)_n satisfies the law of large numbers if

(S_n − ES_n)/n → 0 as n → ∞.

We say that (X_n)_n satisfies the weak LLN if the above convergence holds in probability, and the strong LLN in case the convergence is almost sure.

8.3.1 Laws of Large Numbers with second and fourth moments

Theorem 8.31 (Chebyshev’s Weak Law of Large Numbers). Let (X_n)_n be a sequence of random variables such that Cov(X_i, X_j) = 0 for every i ≠ j and sup_n V X_n < ∞. Then (X_n)_n satisfies the weak law of large numbers.

Proof. By Chebyshev’s inequality (5.24),

P(|(S_n − ES_n)/n| > ε) ≤ V(S_n/n)/ε² = V S_n/(ε²n²) = (∑_{i=1}^n V X_i)/(ε²n²) ≤ (n · sup_k V X_k)/(ε²n²) → 0,

concluding the proof.
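The weak law and the Chebyshev bound used in its proof can both be observed in simulation; the uniform distribution, the value of ε, and the sample sizes below are illustrative assumptions.

```python
import numpy as np

# Simulation companion to Theorem 8.31: for i.i.d. (hence uncorrelated)
# centred variables with finite variance, S_n / n concentrates near 0,
# and Chebyshev gives  P(|S_n/n| > eps) <= sup_k V(X_k) / (eps^2 * n).
rng = np.random.default_rng(3)
paths, eps = 10**4, 0.1
X = rng.uniform(-1.0, 1.0, size=(paths, 10**4))   # E(X) = 0, V(X) = 1/3

for n in (10**2, 10**3, 10**4):
    Sn_over_n = X[:, :n].mean(axis=1)
    empirical = np.mean(np.abs(Sn_over_n) > eps)
    chebyshev = (1.0 / 3.0) / (eps**2 * n)
    print(n, empirical, min(chebyshev, 1.0))       # empirical <= bound
```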
Theorem 8.32 (Cantelli’s Strong Law of Large Numbers). Let X_1, X_2, . . . be independent random variables such that sup_n E X_n^4 < ∞. Then (X_n)_n satisfies the strong law of large numbers.

Proof. We can suppose that EX_k = 0 for all k (otherwise, replace X_k by X_k − EX_k, which is again independent with uniformly bounded fourth moments). Expand S_n^4 as

S_n^4 = (X_1 + · · · + X_n)^4 = ∑_{i,j,k,l} X_i X_j X_k X_l
= ∑_i X_i^4 + (4!/(2!2!)) ∑_{i<j} X_i² X_j² + (4!/3!) ∑_{i≠k} X_i³ X_k + (4!/2!) ∑_{j<k, i∉{j,k}} X_i² X_j X_k + 4! ∑_{i<j<k<l} X_i X_j X_k X_l.

By independence, the expectation of each term containing some X_k to the first power factorises through EX_k = 0, so the last three sums have zero expectation and

E S_n^4 = ∑_i E X_i^4 + 6 ∑_{i<j} E[X_i² X_j²].

Write M := sup_k E X_k^4 < ∞. By the Cauchy-Schwarz inequality, E[X_i² X_j²] ≤ √(E X_i^4) √(E X_j^4) ≤ M. Hence

E S_n^4 ≤ nM + 6 (n(n−1)/2) M = (3n² − 2n) M ≤ 3n² M.

By Markov’s inequality, we have

P(|S_n/n| > ε) ≤ E S_n^4/(ε⁴n⁴) ≤ 3M/(ε⁴n²),

which is summable in n, so by Borel-Cantelli Lemma I and Proposition 8.21 we get S_n/n →^{a.s.} 0.

8.3.2 Laws of Large Numbers with first moment

Theorem 8.33 (Kolmogorov’s Strong Law of Large Numbers). Let X_1, X_2, . . . be pairwise independent and identically distributed integrable random variables. Then (X_n)_n satisfies the strong law of large numbers.

Proof. See [Kle14, Theorem 5.17] for the proof found by Etemadi in 1981.

It should be emphasised that, when (X_n)_n are i.i.d., the condition that X_1 be integrable is the weakest possible. It is in fact equivalent to (X_1 + · · · + X_n)/n →^{a.s.} µ for some µ ∈ R, as shown by the following proposition.

Proposition 8.34. Suppose (X_n)_n are i.i.d. and not integrable. Then, almost surely, the sequence S_n/n does not converge to any real number.

Proof. For every fixed k ∈ N, the variable X_1/k is not integrable. Since (X_n)_n are identically distributed, by Proposition 5.35 we have

∑_n P(|X_n|/k > n) = ∑_n P(|X_1|/k > n) = ∞.

By Borel-Cantelli Lemma II, almost surely the event |X_n| > kn occurs for infinitely many values of n. Since a countable intersection of almost sure events is almost sure, taking the intersection over k gives

P(for every k, |X_n| > kn for infinitely many values of n) = 1.

We now show that, for any sequence (x_n)_n satisfying the above condition, we have lim sup_n |s_n/n| = +∞, where s_n := x_1 + · · · + x_n. This implies the proposition. So we have to show that, for every r ∈ N, there exists an arbitrarily large m such that |s_m/m| > r. Suppose (x_n)_n satisfies the above condition and let r ∈ N be given. Take k = 2r and n (arbitrarily large) such that |x_n| > kn. If |s_{n−1}/(n−1)| > r, we take m = n − 1 and we are done. So suppose |s_{n−1}/(n−1)| ≤ r. Then

|x_1 + · · · + x_n|/n ≥ |x_n|/n − |x_1 + · · · + x_{n−1}|/n ≥ |x_n|/n − |x_1 + · · · + x_{n−1}|/(n−1) ≥ |x_n|/n − r > k − r = r,

and we can take m = n. This concludes the proof.

8.4 Relation between modes of convergence

8.4.1 Almost-sure convergence and subsequences

Proposition 8.35 (Convergence in measure implies a.e. convergence along a subsequence). Let (Ω, F, µ) be a measure space and (f_n)_{n≥1}, f Borel functions with f_n →^µ f. Then, there exists a subsequence such that f_{n_k} →^{a.e.} f as k → ∞.

Proof. First, for each n ≥ 1 and k ∈ N, set A_{n,k} := {|f_n − f| > 1/k}. Since f_n →^µ f, for each k ∈ N we have

lim_{n→∞} µ(A_{n,k}) = lim_{n→∞} µ(|f_n − f| > 1/k) = 0.

In other words, for each k ∈ N, there exists n_k ∈ N such that, for any n ≥ n_k,

µ(A_{n,k}) = µ(|f_n − f| > 1/k) < 1/k².

We can also assume that n_k > n_{k−1} for k ≥ 2. Set B_k := A_{n_k,k}, and notice that

∑_{k=1}^∞ µ(B_k) ≤ ∑_{k=1}^∞ 1/k² < ∞,

so µ(B_k i.o.) = 0 by Borel-Cantelli Lemma I. For every ω outside the null set {B_k i.o.}, we have |f_{n_k}(ω) − f(ω)| ≤ 1/k for all large enough k, thus f_{n_k} →^{a.e.} f as k → ∞.

Convergence a.e. does not imply convergence in measure, as shown in Example 8.4. However, the implication does hold in finite measure spaces.

Proposition 8.36 (a.s. convergence implies convergence in probability). Let (Ω, F, P) be a probability space and (X_n)_{n≥1}, X be random variables. If X_n → X a.s. as n → ∞, then X_n → X in probability as n → ∞.

Proof. Since X_n →^{a.s.} X, by Proposition 8.21 we have

P(|X_n − X| > ε i.o.) = 0,  ∀ε > 0.

Then, we can apply Proposition 8.30 to get

lim_{n→∞} P(|X_n − X| > ε) = 0,  ∀ε > 0,

so that X_n →^P X, as required.
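The subsequence extraction in Proposition 8.35 can be seen concretely on the dancing wave of Example 8.5. The following sketch (the evaluation point x = 0.3 is an illustrative choice) shows the full sequence g_j returning to 1 again and again, while along the subsequence j_n = 2^n (that is, k = 0) the values are eventually 0.

```python
import numpy as np

# Companion to Proposition 8.35 and Example 8.5: the dancing wave g_j
# (with j = 2**n + k) converges to 0 in measure but at no point of (0, 1];
# yet along the subsequence j_n = 2**n it converges at every x > 0, since
# g_{2**n}(x) = 1 only when x <= 2**(-n).
def g(j, x):
    n = int(np.floor(np.log2(j)))
    k = j - 2**n
    return float(k / 2**n < x <= (k + 1) / 2**n)

x = 0.3
print([g(j, x) for j in range(1, 33)])     # keeps returning to 1: no limit at x
print([g(2**n, x) for n in range(0, 12)])  # subsequence: eventually all 0
```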
8.4.2 Convergence in L^p and in expectation

Proposition 8.37 (Convergence in L^p implies convergence in measure). Let (Ω, F, µ) be a measure space and (f_n)_{n≥1}, f be Borel functions. If f_n → f in L^p as n → ∞, then f_n → f in measure as n → ∞.

Proof. Since f_n →^{L^p} f for some p ≥ 1, we have

lim_{n→∞} ∫_Ω |f_n − f|^p dµ = 0.

By Chebyshev’s inequality (Theorem 5.22), for all ε > 0 it follows that

µ(|f_n − f| > ε) ≤ (1/ε^p) ∫_Ω |f_n − f|^p dµ → 0 as n → ∞,

thus f_n converges to f in measure, i.e. f_n →^µ f.

Proposition 8.38 (Comparison of L^p and L^q). For a probability space, if q ≥ p ≥ 1 and X_n →^{L^q} X, then X_n →^{L^p} X.

Proof. The proof goes as follows:

(E|X_n − X|^p)^{1/p} ≤ (E|X_n − X|^q)^{1/q} → 0,

so we only need to justify the first inequality. Let Y be a non-negative random variable. Take integrable random variables (Z_n)_n such that 0 ≤ Z_n ↑ Y^p (for instance, Z_n = min{Y, n}^p). The function f : [0, ∞) → [0, ∞) given by f(x) = x^{q/p} is convex. Hence, by Jensen’s inequality, we have

(EZ_n)^{q/p} ≤ E Z_n^{q/p} ≤ E Y^q.

By the Monotone Convergence Theorem, EZ_n ↑ EY^p, so we can conclude that (EY^p)^{q/p} ≤ EY^q. Applying this to Y = |X_n − X| completes the proof.

Example 8.39. Let (Ω, F, P) = ([0, 1], B, m), and take X_n = n 1_{[0,1/n)}. Then X_n → 0 a.s., but ∫_Ω |X_n| dP = 1, so X_n ↛ 0 in L¹, and hence in no L^p with p ≥ 1. △

Examples 8.8 and 8.39 show that a.s. convergence does not imply convergence in L^p, nor does convergence in L^p imply a.s. convergence. Combined with Propositions 8.36 and 8.37, convergence in probability implies neither of them.

Theorem 8.40 (Dominated Convergence Theorem revisited). Let (Ω, F, µ) be a measure space, and (f_n)_{n≥1} a sequence of Borel functions such that f_n →^µ f. If there exists a non-negative Borel function g ∈ L^p(µ), p ∈ [1, ∞), such that |f_n| ≤ g for all n ≥ 1, then f_n →^{L^p} f. If moreover p = 1 or µ is a probability measure, then

lim_{n→∞} ∫_Ω f_n dµ = ∫_Ω f dµ.

Proof. Convergence of the integral follows from Remark 8.10 if µ(Ω) = 1, and from Exercise 5.6 if p = 1, so we only need to prove that f_n → f in L^p. Let us first suppose that f_n →^{a.e.} f. By noticing that

|f_n(ω) − f(ω)|^p ≤ 2^p |g(ω)|^p,  ω ∈ Ω,

we can use Theorem 5.42 to deduce that

lim_{n→∞} ∫_Ω |f_n − f|^p dµ = 0.   (8.41)

Now, let us assume instead that f_n →^µ f. Consider a subsequence (f_{n_k})_k. Then f_{n_k} →^µ f as k → ∞. By Proposition 8.35, there is a sub-subsequence (f_{n_{k_j}})_j such that f_{n_{k_j}} →^{a.e.} f as j → ∞. By the previous case, we have ∫_Ω |f_{n_{k_j}} − f|^p dµ → 0 as j → ∞. Using Lemma 0.1, we get (8.41), concluding the proof.

8.4.3 Convergence in distribution

Convergence in distribution is actually the weakest of these concepts, and it is implied by all the other modes of convergence.

Proposition 8.42 (Convergence in probability implies convergence in distribution). Let (Ω, F, P) be a probability space and (X_n)_{n≥1}, X be random variables. If X_n → X in probability, then X_n → X in distribution as n → ∞.

Proof. Let f : R → R be a bounded continuous function. By Theorem 8.17, it is enough to show that

Ef(X_n) → Ef(X)   (8.43)

as n → ∞. First, suppose that X_n →^{a.s.} X. Since f is continuous, we have

{ω : X_n(ω) → X(ω)} ⊆ {ω : f(X_n(ω)) → f(X(ω))},

hence f(X_n) →^{a.s.} f(X). Since f is bounded, |f(X_n)| ≤ sup_{s∈R} |f(s)| < ∞ for all n ≥ 1. Applying Theorem 8.40 with a constant dominating function g, we get (8.43). If instead we suppose X_n →^P X, we can get (8.43) by taking sub-subsequences and invoking Lemma 0.1, as we did in the proof of Theorem 8.40.

Example 8.44. Let Ω = {0, 1}, P = ½δ_0 + ½δ_1, X_n(ω) = ω and Y(ω) = 1 − ω. Then, as n → ∞, X_n → Y in distribution but not in probability, since |X_n − Y| = 1 for every ω. △
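Example 8.44 is easy to reproduce in simulation (the sampling of ω and the seed below are illustrative devices): X_n and Y have identical distributions, yet they are never close to each other.

```python
import numpy as np

# Companion to Example 8.44: with Omega = {0, 1} and P uniform, X_n(w) = w
# and Y(w) = 1 - w share the Bernoulli(1/2) distribution, so X_n -> Y in
# distribution trivially; but |X_n - Y| = 1 always, so there is no
# convergence in probability.
rng = np.random.default_rng(4)
omega = rng.integers(0, 2, size=10**6)    # samples of omega under P
X, Y = omega, 1 - omega
print(X.mean(), Y.mean())                 # both ~ 0.5: same distribution
print(np.mean(np.abs(X - Y) > 0.5))       # = 1.0: P(|X_n - Y| > 1/2) = 1
```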
So convergence in distribution does not imply convergence in probability in general, but there is one situation in which it does.

Proposition 8.45 (Convergence in distribution to a constant limit). Suppose X_n →^d X and X = c a.s. for some c ∈ R. Then X_n →^P c.

Proof. First, notice that by assumption F_{X_n}(x) → 0 if x < c and F_{X_n}(x) → 1 if x > c. Let ε > 0. Then

P(|X_n − c| > ε) ≤ P(X_n ≤ c − ε) + P(X_n > c + ε) = F_{X_n}(c − ε) + 1 − F_{X_n}(c + ε) → 0 + 1 − 1 = 0,

so X_n →^P c, as claimed.

In summary, for probability spaces, the relationship between the four notions of convergence is represented in the following diagram:

[Diagram: solid arrows L^q ⇒ L^p (q ≥ p ≥ 1), L^p ⇒ P, a.s. ⇒ P, and P ⇒ d; dashed arrows P ⇒ a.s. along a subsequence, P ⇒ L^p in the dominated case, and d ⇒ P for a constant limit.]

The general implications shown in the diagram are Propositions 8.36, 8.37, 8.38 and 8.42. No other implication holds in general, as shown by Examples 8.8, 8.39 and 8.44. The dashed arrows refer to implications that hold in particular cases, namely Propositions 8.35 and 8.45 and Theorem 8.40.

References

[Bar95] R. G. Bartle. The Elements of Integration and Lebesgue Measure. Wiley Classics Library. John Wiley & Sons, Inc., New York, 1995.

[Coh13] D. L. Cohn. Measure Theory. Birkhäuser Advanced Texts: Basel Textbooks. Birkhäuser/Springer, New York, second edn., 2013.

[Dem20] A. Dembo. Probability Theory: STAT310/MATH230, 2020. Lecture notes, Stanford. Available online as of June 2020.

[Edg98] G. A. Edgar. Integral, Probability, and Fractal Measures. Springer-Verlag, New York, 1998.

[Kle14] A. Klenke. Probability Theory. Universitext. Springer, London, second edn., 2014.

[Kub15] C. S. Kubrusly. Essentials of Measure Theory. Springer, Cham, 2015.

[Shi16] A. N. Shiryaev. Probability. 1, vol. 95 of Graduate Texts in Mathematics. Springer, New York, third edn., 2016.

[Tao11] T. Tao. An Introduction to Measure Theory, vol. 126 of Graduate Studies in Mathematics. American Mathematical Society, 2011. Available online as of June 2020.

[Tay73] S. J. Taylor. Introduction to Measure and Integration. Cambridge University Press, 1973.