ST 302 – Stochastic Processes
Thorsten Rheinländer
London School of Economics and Political Science
August 17, 2006
1 Information and Conditioning
1.1 Sigma-Algebras and Filtrations
Let us recall that a σ-algebra is a collection F of subsets of the sample space Ω such that

(i) ∅, Ω ∈ F;

(ii) if A ∈ F then also A^c ∈ F (here A^c denotes the complementary set to A);

(iii) if we have a sequence A_1, A_2, ... of sets in F, then ∪_{i=1}^∞ A_i ∈ F.

We say that a collection of sets generates a σ-algebra F if F is the smallest σ-algebra which contains all the sets. The trivial σ-algebra is the one containing only ∅ and Ω.
Interpretation: We perform a random experiment. The realized outcome is an element ω of the sample space Ω. Assume that we are given some partial information in the form of a σ-algebra F. If F does not contain all subsets of Ω, we might not know the precise ω, but we may narrow down the possibilities. We know with certainty whether a set A ∈ F contains ω or does not contain it: these sets are resolved by our given information.
Example 1.1 We toss a coin three times, with the result of each toss being H (head) or T (tail). Our time set consists of t = 0, 1, 2, 3. The sample space Ω is the set of all eight possible outcomes. At time 0 (just before the first coin toss), we only know that the true ω does not belong to ∅ and does belong to Ω, hence we set

F_0 = {∅, Ω}.

At time 1, after the first coin toss, in addition the following two sets are resolved:

A_H = {HHH, HHT, HTH, HTT},
A_T = {THH, THT, TTH, TTT}.

We can say with certainty whether ω belongs to A_H or A_T. In contrast, the information about the first coin toss only is not enough to determine with certainty whether ω is contained e.g. in {HHH, HHT}: for this we would have to wait until the second coin toss. As the complement of A_H is A_T, and the union of A_H and A_T equals Ω, we set

F_1 = {∅, Ω, A_H, A_T}.

At time 2, after the second coin toss, in addition to the sets already contained in F_1, the sets

A_HH = {HHH, HHT},  A_HT = {HTH, HTT},
A_TH = {THH, THT},  A_TT = {TTH, TTT}

get resolved, together with their complements and unions. Altogether, we get

F_2 = {∅, Ω, A_H, A_T, A_HH, A_HT, A_TH, A_TT, A_HH^c, A_HT^c, A_TH^c, A_TT^c,
       A_HH ∪ A_TH, A_HH ∪ A_TT, A_HT ∪ A_TH, A_HT ∪ A_TT}.

At time 3, after the third coin toss, we know the true realization of ω, and can therefore tell for each subset of Ω whether ω is a member or not. Hence

F_3 = the set of all subsets of Ω.
As we have F_0 ⊆ F_1 ⊆ F_2 ⊆ F_3, these four σ-algebras are said to form a filtration. The random variable

X = number of tails among the first two coin tosses

is said to be F_2-measurable, but not F_1-measurable (as we cannot determine the value of X after only one coin toss).
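To make the measurability claim concrete, here is a small Python sketch (our own illustration, not part of the original notes): a random variable is F_n-measurable exactly when it is constant on each atom of F_n, that is, on each group of outcomes that agree in the first n tosses.

```python
# Check F_n-measurability on the three-toss sample space by testing whether
# X is constant on each atom of F_n (outcomes sharing the first n tosses).
import itertools

omega = ["".join(w) for w in itertools.product("HT", repeat=3)]

def X(w):
    return w[:2].count("T")     # number of tails among the first two tosses

def is_measurable(n):
    atoms = {}
    for w in omega:
        atoms.setdefault(w[:n], []).append(w)
    return all(len({X(w) for w in atom}) == 1 for atom in atoms.values())

print(is_measurable(2), is_measurable(1))   # True, False
```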
Consider now a discrete time stochastic process X = (X_0, X_1, X_2, ...), i.e. a collection of random variables indexed by time. Hence X is a function of a chance parameter ω and a time parameter n. We would write X_n(ω) for a function value, but typically suppress the chance parameter, otherwise the notation gets too heavy. Moreover, the two variables play quite different roles, since the chance parameter ω comes from the sample space Ω (which can be a very large set, with no natural ordering) whereas the time parameter n is an element of the ordered set ℕ₊.
We denote by F_n a family of sets containing all information about X up to time n. In more detail, F_n is the σ-algebra generated by sets of the form (we assume that the starting value X_0 is just a deterministic number)

{X_1 = i_1, X_2 = i_2, ..., X_n = i_n}

if the state space is discrete. If the state space is ℝ, then F_n is the σ-algebra generated by sets of the form

{X_1 ∈ (a_1, b_1), X_2 ∈ (a_2, b_2), ..., X_n ∈ (a_n, b_n)}

with intervals (a_1, b_1), (a_2, b_2), .... We have

F_0 = {∅, Ω} (at time zero, we know nothing),
F_0 ⊆ F_1 ⊆ F_2 ⊆ ... ⊆ F_n (the more time evolves, the more we know).

(F_n) = (F_0, F_1, F_2, ...) is called a filtration. Sometimes we write (F_n^X) to specify that we are collecting information about the stochastic process X.

We say a random variable H is F_n-measurable if it depends on information about X up to time n only. For example, if f is some function, then f(X_n) is F_n-measurable; and so is max_{1≤i≤n} X_i. Or, in other words, H is F_n-measurable if it is a function of X_1, ..., X_n.
We say that a stochastic process Y = (Y_n)_{n≥0} is adapted to (F_n) if each Y_n is F_n-measurable. (F_n^Y) is called the filtration generated by Y if it is the smallest filtration to which Y is adapted. Summing up:

The random variable H is measurable with respect to a σ-algebra F if the information in F is sufficient to determine H; F can even contain more information than just about H.

The stochastic process X is adapted to a filtration (F_n) if the information in (F_n) is sufficient to determine all the X_n's, and (F_n) can even contain more info. (F_n^X) contains just the information about the X_n's. More precisely, the σ-algebra F_n^X contains exactly the information about X_0, X_1, ..., X_n.
1.2 Conditional Expectation
We consider a -algebra F and want to make a prediction about some random
variable H based on the information contained in F . If H is F-measurable, then
we can precisely determineH from the given information in F . IfH is independent
of F , then the information in F is of no help in predicting the outcome of H. In
the intermediary case, we can try to use the information in F to make an ’educated
guess’about H, without being able to completely evaluate H.
You cannot avoid this section. Its material is absolutely essential for everything that follows. To quote Thomas Mikosch, "The notion of conditional expectation is one of the most difficult ones in probability theory, but it is also one of the most powerful tools... For its complete theoretical understanding, measure-theoretic probability theory is unavoidable." However, it is perfectly possible, and not even difficult, to get an operational understanding of conditional expectation. You just have to grasp the following intuitive properties; for the aforementioned reason they will be given without proof.
Definition 1.2 If E[H²] < ∞, then the conditional expectation of H given F, written E[H | F], is the least-squares-best F-measurable predictor of H: amongst all F-measurable random variables Ĥ (i.e. amongst all predictors which can be computed from available information), it minimizes

E[(Ĥ − H)²].
It turns out that one can define (by the partial averaging property below) the conditional expectation also in the more general case where we only assume that E[|H|] < ∞ (so E[H²] < ∞ need not be satisfied). We give a list of properties of E[H | F]. Here 1_A denotes the indicator function of the set A: 1_A(ω) = 1 if ω ∈ A, 1_A(ω) = 0 if ω ∉ A.
(a) (Measurability) E[H | F] is F-measurable
(the conditional expectation is a predictor based on available information).

(b) (Partial averaging) For every set A ∈ F,

E[1_A E[H | F]] = E[1_A H]

(on every set A ∈ F, the conditional expectation of H given F has the same expectation as H itself). In particular (take A = Ω),

E[E[H | F]] = E[H].
Properties (a) and (b) characterize the random variable E[H | F] uniquely (modulo null sets).

(c) (Linearity) E[a_1H_1 + a_2H_2 | F] = a_1E[H_1 | F] + a_2E[H_2 | F].

(d) (Positivity) If H ≥ 0, then E[H | F] ≥ 0.

(e) (Jensen's inequality) If f is a convex function such that E[|f(H)|] < ∞, then

f(E[H | F]) ≤ E[f(H) | F].

(f) (Tower property) If G is a sub-σ-algebra of F (contains less information than F), then

E[E[H | F] | G] = E[H | G].

In case we deal with a filtration, this property has the following form: for two time points s ≤ t (hence F_s ⊆ F_t) we have

E[E[H | F_t] | F_s] = E[H | F_s]

(nested conditional expectations are evaluated by taking only one conditional expectation with respect to the earliest time point).

(g) (Taking out what is known) If G is a random variable which is F-measurable, and such that E[|GH|] < ∞, then

E[GH | F] = G E[H | F].

In particular (choose H = 1),

E[G | F] = G

(if G depends only on available information, then we can predict G fully).

(h) (Role of independence) If H is independent of F, then

E[H | F] = E[H]

(in that case, the information in F is useless in predicting H).
Let X be any random variable. We will sometimes use the notation E[H | X] if we predict H by using the information about X only (more formally, we condition with respect to the σ-algebra generated by X). Moreover, if A ⊆ Ω, we write P(A | F) for E[1_A | F]. An important special case of (h) is when we consider the trivial σ-algebra F_0 = {∅, Ω}. In that case, the conditional expectation with respect to F_0 is just the ordinary expectation (so the result is a number):

E[H | F_0] = E[H].
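Since everything that follows leans on these properties, a tiny numerical illustration may help. The following Python sketch (our own, not part of the original notes) computes E[H | F_1] on the three-coin-toss space as the probability-weighted average of H over each atom of F_1, and verifies the partial averaging property (b).

```python
# Conditional expectation as an average over the atoms of F_1, plus a
# check of partial averaging: E[1_A E[H|F_1]] = E[1_A H] for A in F_1.
import itertools

omega = ["".join(w) for w in itertools.product("HT", repeat=3)]
P = {w: 1 / 8 for w in omega}                 # fair coin: uniform weights

def H(w):
    return w.count("T")                        # H = total number of tails

atoms = {"H": [w for w in omega if w[0] == "H"],
         "T": [w for w in omega if w[0] == "T"]}

cond_exp = {}
for key, atom in atoms.items():
    mass = sum(P[w] for w in atom)
    cond_exp[key] = sum(P[w] * H(w) for w in atom) / mass

print(cond_exp)                                # {'H': 1.0, 'T': 2.0}

A = atoms["T"]                                 # A = A_T is in F_1
lhs = sum(P[w] * cond_exp[w[0]] for w in A)    # E[1_A E[H|F_1]]
rhs = sum(P[w] * H(w) for w in A)              # E[1_A H]
assert abs(lhs - rhs) < 1e-12
```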
2 Martingales in Discrete Time
2.1 Definition
Let X = (X_n)_{n≥0} be a discrete time process with E[|X_n|] < ∞ for all n, and let (F_n) be some filtration. We assume that X is adapted to (F_n), that is, each X_n is F_n-measurable.

Definition 2.1 X is called a martingale (relative to (F_n), P) if for all n ≥ 1

E[X_n | F_{n−1}] = X_{n−1},

a submartingale if

E[X_n | F_{n−1}] ≥ X_{n−1},

and a supermartingale if

E[X_n | F_{n−1}] ≤ X_{n−1}.
Interpretation: let X_n be the wealth at time n of a player engaged in some casino game. The martingale property, rewritten as

E[X_n − X_{n−1} | F_{n−1}] = 0

(here we have used properties (c): linearity and (g): taking out what is known, of the conditional expectation), tells us that, based on all the information about the game's progress so far, one can predict that our increase in wealth will be on average zero: we are playing a fair game. In reality, our wealth will be described in most games by a supermartingale: there is a tendency to lose on average. There are some games like blackjack, however, where you can make a very clever use of past events (like memorizing the cards which have already been taken out of the deck) to achieve that your wealth process evolves like a submartingale: you are engaged in a favourable game.
We will later on introduce the notion of Markov processes: these are stochastic processes where future predictions depend on the past only via the present state; they are memoryless. A martingale need not be a Markov process (because future outcomes may depend on the whole past), and a Markov process need not be a martingale (since it may be related to an unfair game).
Exercise 2.2 Prove the following statements by applying the properties of conditional expectation. State in each step which property you use. All random variables are assumed to be sufficiently integrable.

Fix some time horizon T ∈ ℕ₀, and let H be an F_T-measurable random variable. Define a stochastic process X for n = 0, 1, 2, ..., T as

X_n = E[H | F_n].

Show that X is a martingale: check that X is adapted and that it fulfills the martingale property.

Let f be a convex function and X a martingale. Show that f(X) is a submartingale and −f(X) a supermartingale.

A stochastic process X is called predictable if for each n we have that X_n is F_{n−1}-measurable. Show that every predictable process is (F_n)-adapted, and that every predictable martingale is constant.

Let Y be a predictable process, and X a martingale. Show that then the martingale transform Y·X, defined as ((Y·X)_0 = 0 and)

(Y·X)_n = Σ_{i=1}^n Y_i (X_i − X_{i−1}),

is a martingale as well (a numerical illustration follows this exercise).
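The last item is worth seeing numerically. The sketch below (our own, not part of the original notes) uses a fair random walk for X and the predictable strategy Y_i = sign(X_{i−1}) (betting on yesterday's sign, with Y = 1 when the walk is at zero); the transform stays a fair game.

```python
# Martingale transform (Y.X)_n = sum_{i<=n} Y_i (X_i - X_{i-1}) for a fair
# random walk X and a predictable strategy Y: the mean gain stays ~0.
import numpy as np

rng = np.random.default_rng(0)
n_paths, n_steps = 100_000, 50

dX = rng.choice([-1.0, 1.0], size=(n_paths, n_steps))      # fair increments
X = np.cumsum(dX, axis=1)

# Y_i may use X_0, ..., X_{i-1} only; here Y_i = sign(X_{i-1}), and 1 at 0
prev = np.concatenate([np.zeros((n_paths, 1)), X[:, :-1]], axis=1)
Y = np.sign(prev) + (prev == 0)

transform = np.cumsum(Y * dX, axis=1)
print(transform[:, -1].mean())      # ~0: the transform is again a martingale
```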
2.2 Discrete-Time Examples

Example 2.3 Sums of independent zero-mean RVs (random variables). Let X_1, X_2, ... be a sequence of independent RVs with E[|X_k|] < ∞ and

E[X_k] = 0 for all k.

Define (S_0 := 0 and)

S_n := X_1 + X_2 + ... + X_n.

Let (F_n) = (F_n^X), the filtration which is generated by X. Note that S_n is F_n-measurable (it depends only on (X_1, X_2, ..., X_n)). Then we have for n ≥ 1

E[S_n | F_{n−1}] = E[S_{n−1} | F_{n−1}] + E[X_n | F_{n−1}]   (by linearity (c))
                 = S_{n−1} + E[X_n | F_{n−1}]                (taking out what is known (g))
                 = S_{n−1} + E[X_n]                          (by independence (h))
                 = S_{n−1}.

This proves that S = (S_n) is a martingale.
Example 2.4 Products of non-negative independent RVs of mean 1. Let Y_1, Y_2, ... be a sequence of independent non-negative RVs with

E[Y_n] = 1 for all n.

Define (M_0 := 1 and)

M_n := Y_1 Y_2 ... Y_n.

Let (F_n) = (F_n^Y), the filtration which is generated by Y. Then, for n ≥ 1, we have

E[M_n | F_{n−1}] = E[M_{n−1}Y_n | F_{n−1}]
                 = M_{n−1} E[Y_n | F_{n−1}]   (taking out what is known (g))
                 = M_{n−1} E[Y_n]             (by independence (h))
                 = M_{n−1}.

This proves that M is a martingale.
Example 2.5 Exponential martingale. Let X_1, X_2, ... be a sequence of independent and identically distributed RVs with moment-generating function

φ(θ) = E[exp(θX_i)] < ∞.

We set (S_0 := 0 and) S_n = X_1 + ... + X_n. Let (F_n) = (F_n^X), the filtration which is generated by X. Then

M_n = exp(θS_n) / φ(θ)^n

is a martingale. This follows from the previous example; just put

Y_k = exp(θX_k) / φ(θ).
Example 2.6 Exponential martingale for random walk. In the setting of the previous example, let the distribution of the X_n's be that of X where

P(X = +1) = P(X = −1) = 1/2.

The stochastic process S = (S_n)_{n≥0} is called a symmetric random walk. Then

φ(θ) = E[exp(θX)] = (1/2)(e^{+θ} + e^{−θ}) = cosh θ.

We get that

M_n = (1/cosh θ)^n exp(θS_n)

is a martingale. In case the walk is not symmetric, we have

φ(θ) = p e^{+θ} + (1 − p) e^{−θ}.

Choose now θ such that e^θ = (1 − p)/p; then φ(θ) = (1 − p) + p = 1 and the exponential martingale simplifies to

M_n = exp(θS_n) = ((1 − p)/p)^{S_n}.
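This martingale is easy to probe by simulation. The following Python sketch (our own check, not part of the original notes) verifies that for an asymmetric walk with up-probability p, the process M_n = ((1 − p)/p)^{S_n} keeps the constant expectation M_0 = 1.

```python
# E[((1-p)/p)**S_n] should equal 1 for every n, since M is a martingale.
import numpy as np

rng = np.random.default_rng(1)
p, n_paths, n_steps = 0.4, 200_000, 30

steps = rng.choice([1, -1], p=[p, 1 - p], size=(n_paths, n_steps))
S = np.cumsum(steps, axis=1)
M = ((1 - p) / p) ** S

print(M[:, 9].mean(), M[:, 29].mean())   # both ~1.0
```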
2.3 Optional Stopping Theorem

A martingale has constant expected value (why?). However, in practice one could try to get on average a better gain by using a more sophisticated approach. For example, let X be a stochastic process modelling the level of some stock price; we assume it is a martingale. Often traders base their decisions on technical analysis (e.g. they buy after some particular candlestick pattern has occurred or after the stock has broken upwards through the 200-day moving average). Their trading decision thus is triggered by the occurrence of some random event. However, unless they have visionary abilities, this occurrence at time n can be determined by looking at the values X_1, ..., X_n of the price process up to that time. This leads to the following

Definition 2.7 Consider a discrete-time filtration (F_n). A stopping time is a mapping T: Ω → {0, 1, 2, ..., ∞} such that the set {ω: T(ω) ≤ n} is contained in F_n for all n ≥ 1 (equivalently, the set {ω: T(ω) = n} is contained in F_n for all n ≥ 1). Note that T can take the value ∞.
To get an example of a stopping time, we consider a stochastic process X_0, X_1, ... with X_0 = 0. Then the smallest time n such that X_n surpasses a given level a > 0 is a stopping time T_a: if T_a(ω) = n, then all of X_0, X_1, ..., X_{n−1} are smaller than a whereas X_n is larger than a.

Not all random times are stopping times, though. Let X_n be the stock price of BMW at time n. We now set U to be the last time where the stock price of BMW was $100 or lower. U cannot be a stopping time since you must know all the future stock prices of BMW to determine whether U has occurred or not. In case you really knew that you are at time U, you could make quite some fortune in a completely riskless manner!
Of more practical relevance is therefore the question whether you can turn a game which is considered to be fair on a turn-by-turn basis (your wealth coming from this game is modelled as a martingale) into something favourable for you by using a clever stopping-time-based strategy. In fact, at first sight even a rather dull strategy would serve this purpose, as the following example suggests:

Example 2.8 Riskless gain based on a stopping time strategy. Assume our wealth from gambling evolves like a symmetric random walk S where we start with $0 (we are allowed to go into minus). We play this game until the stopping time

T_1 = min{n ≥ 1: S_n = 1},

that is, we stop once we have gained one pound and walk happily away. As you can imagine, life is not that easy. The next result tells us that this phenomenon cannot happen when our bank agent offers us a finite credit line only.
Theorem 2.9 (Optional stopping theorem) Let T be a stopping time and let M be a martingale. Then M_T is integrable and

E[M_T] = E[M_0]

in each of the following situations:

(i) T is bounded (for some N > 0, T(ω) ≤ N for all ω);

(ii) E[T] < ∞, and, for some K ≥ 0, |M_n(ω) − M_{n−1}(ω)| ≤ K for all (n, ω).
Finally, based on the optional stopping theorem, we are able to give many interesting applications.

Example 2.10 Symmetric random walk. Imagine a game based on fair coin tossing, that is, the gambler's fortune is modelled by a symmetric random walk S_n = X_1 + ... + X_n with S_0 = x. Let a < b and x ∈ (a, b). We are interested in the probability that we reach the higher level b first. To calculate this, we consider the stopping time

T = min{n: S_n ∉ (a, b)},

the first time the random walk exits from the interval (a, b). One can show that E[T] < ∞, so condition (ii) of the optional stopping theorem is fulfilled. We therefore get

x = E[S_T] = a P(S_T = a) + b P(S_T = b).

Since P(S_T = a) = 1 − P(S_T = b), we get that x = a + (b − a) P(S_T = b), hence the probability that starting in x we reach the level b before a is (x − a)/(b − a).
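The formula invites a Monte Carlo check. The parameters below are just for illustration; the sketch (our own, not part of the original notes) estimates P(S_T = b) for a symmetric walk started at x and compares it with (x − a)/(b − a).

```python
# Exit probability of a symmetric random walk from (a, b), started at x.
import numpy as np

rng = np.random.default_rng(2)
a, b, x, n_paths = -3, 5, 0, 20_000

hits_b = 0
for _ in range(n_paths):
    s = x
    while a < s < b:
        s += rng.choice((-1, 1))
    hits_b += (s == b)

print(hits_b / n_paths, (x - a) / (b - a))   # both ~0.375
```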
Example 2.11 Gambling game. Same setting as in the last example, but here we have (asymmetric walk)

P(X_i = +1) = p,   P(X_i = −1) = 1 − p =: q

(typically p < q). Using the martingale of Example 2.6, we get

(q/p)^x = (q/p)^a P(S_T = a) + (q/p)^b P(S_T = b).

Since P(S_T = a) = 1 − P(S_T = b), we get

(q/p)^x = (q/p)^a + ((q/p)^b − (q/p)^a) P(S_T = b).

We solve this to get that the probability that starting in x we reach the level b before a is

P(S_T = b) = ((q/p)^x − (q/p)^a) / ((q/p)^b − (q/p)^a).
Example 2.12 Duration of fair games. Consider a symmetric simple random walk S_n = X_1 + ... + X_n with S_0 = 0. For a < 0 < b, let

T = min{n: S_n ∉ (a, b)}.

Then we have

E[T] = −ab.

Proof: We have that M_n = S_n² − n is a martingale. Indeed:

M_{n+1} − M_n = (S_n + X_{n+1})² − (n + 1) − S_n² + n
             = 2S_nX_{n+1} + X_{n+1}² − 1.

Now, S_n is F_n-measurable, and X_{n+1} is independent of F_n, with E[X_{n+1}] = 0 and E[X_{n+1}²] = 1. This yields

E[M_{n+1} − M_n | F_n] = E[2S_nX_{n+1} + X_{n+1}² − 1 | F_n]
                       = 2S_n E[X_{n+1} | F_n] + E[X_{n+1}² | F_n] − 1
                       = 2S_n E[X_{n+1}] + E[X_{n+1}²] − 1
                       = 0.

One can show that E[T] < ∞, so condition (ii) of the optional stopping theorem is fulfilled. Applying it to the martingale S_n² − n yields

E[T] = E[S_T²] = a² · b/(b − a) + b² · (−a)/(b − a) = (a²b − ab²)/(b − a) = −ab.
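Again a quick simulation confirms the identity. The following sketch (our own, not part of the original notes) estimates the mean exit time of a symmetric walk from (a, b) and compares it with −ab.

```python
# Mean exit time of a symmetric random walk from (a, b), started at 0.
import numpy as np

rng = np.random.default_rng(3)
a, b, n_paths = -4, 6, 10_000

total_steps = 0
for _ in range(n_paths):
    s, n = 0, 0
    while a < s < b:
        s += rng.choice((-1, 1))
        n += 1
    total_steps += n

print(total_steps / n_paths, -a * b)   # both ~24
```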
3 Discrete-Time Markov Chains
3.1 Basic Definitions

Example 3.1 (Gambling game) Consider a gambling game in which on any turn you win $1 with probability p = 0.4 or lose $1 with probability 1 − p = 0.6. Let X_n be the amount of money you have after n turns. You stop the game if you are either bankrupt (X_n = 0) or if you have reached a prespecified gain of $N. Then X_0, X_1, X_2, ... is a stochastic process which has the following 'Markov property': given the present state X_n, any other information about the past is irrelevant in predicting the next state X_{n+1}.

In the following, we fix a finite set S, called the state space. The stochastic processes considered in this chapter are assumed to have values in S unless otherwise stated.

Notation 3.2 Let Y be a random variable. If we condition with respect to the σ-algebra σ(Y) generated by Y, i.e. the smallest σ-algebra with respect to which Y is measurable, then we often write E[X | Y] instead of E[X | σ(Y)]. If Y takes values in S, then σ(Y) is generated by sets of the form {ω: Y(ω) = i} where i ∈ S.
Definition 3.3 A discrete time stochastic process X = (X_n) is called a Markov chain with respect to a given filtration (F_n) if for any function h: S → ℝ we have

E[h(X_{t+s}) | F_t] = E[h(X_{t+s}) | X_t]

for any s, t ∈ ℕ₊. If this conditional expectation depends only on the time increment s, but not on t, then the chain is said to be homogeneous.

We shall from now on focus on homogeneous chains. Let i, j ∈ S. We can then form the quantities

p(i, j) = P(X_{t+1} = j | X_t = i),

which do not depend on t since the chain is homogeneous. p(i, j) is called the one-step transition probability from state i to state j. We can form the transition matrix (p(i, j))_{i,j}.

In the gambling game example, the transition probabilities for 0 < i < N are

p(i, i+1) = 0.4,
p(i, i−1) = 0.6.
Moreover, we have p(0, 0) = 1 and p(N, N) = 1. The transition matrix for N = 4 is (here the state space is S = {0, 1, 2, 3, 4})

       0    1    2    3    4
  0  1.0    0    0    0    0
  1  0.6    0  0.4    0    0
  2    0  0.6    0  0.4    0
  3    0    0  0.6    0  0.4
  4    0    0    0    0  1.0

Properties of the transition matrix:

(1) we have 0 ≤ p(i, j) ≤ 1;

(2) all rows sum to one: Σ_j p(i, j) = 1.

Any matrix with these properties can be a transition matrix of some Markov chain and is also called a stochastic matrix.
3.2 Multistep Transition Probabilities

We want to compute the probability of going from i to j in m > 0 steps (the transition probability gives us this for one step). This probability is denoted by p^m(i, j).

Key idea: to go from i to j in m + n steps, we have to go from i to some intermediary state k in m steps and then from k to j in n steps. The Markov property implies that the two parts of our journey are independent.

This observation leads to the Chapman-Kolmogorov equation:

p^{m+n}(i, j) = Σ_k p^m(i, k) p^n(k, j).

For example, for m = n = 1 we get

p²(i, j) = Σ_k p(i, k) p(k, j).

That is, the two-step transition matrix is the matrix product of the one-step transition matrix with itself. By induction it follows:

Theorem 3.4 The m-step transition matrix p^m = (p^m(i, j))_{i,j} is the mth power of the one-step transition matrix.
Example 3.5 (Coin tossing) Each turn, a fair coin is tossed. We have 3 classes, hence S = {1, 2, 3}. If you score head, you move up one class unless you are in 1st class. If you score tail you move down unless you are in 3rd class. The transition matrix is

       1    2    3
  1  0.5  0.5    0
  2  0.5    0  0.5
  3    0  0.5  0.5

The two-step transition matrix is obtained by matrix multiplication:

  (0.5  0.5   0 )   (0.5  0.5   0 )   (0.5   0.25  0.25)
  (0.5   0   0.5) · (0.5   0   0.5) = (0.25  0.5   0.25)
  ( 0   0.5  0.5)   ( 0   0.5  0.5)   (0.25  0.25  0.5 )

For the 3-step transition matrix we have to multiply

  (0.5   0.25  0.25)   (0.5  0.5   0 )   (0.375  0.375  0.25 )
  (0.25  0.5   0.25) · (0.5   0   0.5) = (0.375  0.25   0.375)
  (0.25  0.25  0.5 )   ( 0   0.5  0.5)   (0.25   0.375  0.375)

and so on.
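Theorem 3.4 reduces all of this to matrix powers, which is easy to reproduce. The sketch below (our own, not part of the original notes) recomputes the two- and three-step matrices of Example 3.5 with numpy.

```python
# m-step transition probabilities as matrix powers (Theorem 3.4).
import numpy as np

P = np.array([[0.5, 0.5, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 0.5, 0.5]])

print(np.linalg.matrix_power(P, 2))   # two-step matrix from the text
print(np.linalg.matrix_power(P, 3))   # three-step matrix from the text
```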
3.3 Classification of States

Definition 3.6 A state i is called recurrent if

P(X_n = i for some future n > 0 | X_0 = i) = 1;

otherwise it is called transient.

Example 3.7 S = {1, 2, 3}. The transition matrix is

       1    2    3
  1  1/3  1/3  1/3
  2    0  1/2  1/2
  3    0  3/4  1/4

Here 2 and 3 are recurrent, while 1 is transient (once you have gone to 2 or 3, there is no way back to 1).
Definition 3.8 We say that state i communicates with state j (write i → j) if one can get with positive probability from i to j, that is, p^n(i, j) > 0 for some n > 0. If i → j and j → i then i and j intercommunicate. If all states in a chain intercommunicate with each other, then the chain is called irreducible.

In the previous example, state 1 communicates with states 2 and 3, but does not intercommunicate with them. States 2 and 3 intercommunicate. So the example chain is not irreducible.

Theorem 3.9 If i communicates with j, but j does not communicate with i, then i is transient.

The explanation is that with positive probability we get from i to j but can never go back.

Definition 3.10 A set C ⊆ S of states is called closed if it is impossible to get out. That is, if i ∈ C and j ∉ C then p(i, j) = 0. A chain is irreducible if there exists no closed set other than the set of all states. So it is irreducible if every state can be reached from every other state.
Exercise 3.11 A seven-state chain: consider the transition matrix

       1    2    3    4    5    6    7
  1   .3    0    0    0   .7    0    0
  2   .1   .2   .3   .4    0    0    0
  3    0    0   .5   .5    0    0    0
  4    0    0    0   .5    0   .5    0
  5   .6    0    0    0   .4    0    0
  6    0    0    0    0    0   .2   .8
  7    0    0    0    1    0    0    0

Give a graphical representation of the chain by drawing arrows from i to j if i communicates with j. Identify the closed sets. Which states are transient? Is the chain irreducible?
Theorem 3.12 If C is a finite, closed, and irreducible set, then all states in C are recurrent.

The explanation consists of two steps: 1) In a finite closed set C there has to be one recurrent state: since C is closed, the chain spends an infinite amount of time in C; if it spent only a finite amount of time at each of the finitely many states of C, we would have a contradiction. 2) If i is recurrent and intercommunicates with j, then j is also recurrent.
Theorem 3.13 (Decomposition theorem) If the state space S is finite, then S can be written uniquely as a disjoint union

S = T ∪ R_1 ∪ ... ∪ R_k,

where T is a set of transient states and the R_i's are closed sets of recurrent states.

Definition 3.14 It is a theoretical possibility that the probability of coming back is non-zero only after every nth turn and zero otherwise, i.e. p^n(i, i) > 0 and p^m(i, i) = 0 for m = 1, 2, ..., n − 1. The period of a state i is the largest number that divides all n for which p^n(i, i) > 0. If all states have period 1, then the chain is called aperiodic. To exclude this technical nuisance, our chains will always be assumed to be aperiodic. This phenomenon does not occur in continuous-time Markov chains.
3.4 Limit Behaviour

Our goal in this section is to identify the long-time behaviour of a Markov chain: we are interested in π(j) := lim_{n→∞} p^n(i, j), in case that limit exists. The following concept is the key to this problem.

Definition 3.15 A stationary distribution is a vector π with 0 ≤ π(j) ≤ 1 and Σ_j π(j) = 1 which solves

Σ_i π(i) p(i, j) = π(j),

or, shorter, in matrix notation πp = π.

Theorem 3.16 If X_0 has distribution π, i.e., P(X_0 = j) = π(j), then so does X_n for all n ≥ 1.

To see this, we denote by π_m the distribution after m steps. As a consequence of the Chapman-Kolmogorov equation we have

π_m = π p^m,

where p^m is the mth power of the transition matrix. By definition of the stationary distribution we have π_1 = πp = π, π_2 = π_1 p = πp = π, and so on.
Theorem 3.17 (Convergence theorem) Suppose we have an irreducible, aperiodic chain which has a stationary distribution π (it follows from irreducibility that π must be unique). Then as n → ∞, p^n(i, j) → π(j). If the state space S is finite, then such a π exists. Moreover, if μ_i is the expected number of steps it takes to go from i back to i, then π(i) = 1/μ_i.

Example 3.18 Consider a chain with the following transition matrix p:

       1    2    3
  1   .4   .6    0
  2   .2   .5   .3
  3   .1   .7   .2

To determine the stationary distribution, we have to solve the matrix equation πp = π, which is in fact three equations for the three unknowns π(1), π(2), π(3):

0.4π(1) + 0.2π(2) + 0.1π(3) = π(1)
0.6π(1) + 0.5π(2) + 0.7π(3) = π(2)
           0.3π(2) + 0.2π(3) = π(3)

In addition, we have π(1) + π(2) + π(3) = 1, so one of the equations is redundant. The solution is

π(1) = .22353, π(2) = .56471, π(3) = .21176.

The number n of turns does not have to be very large for p^n(i, j) to be close to its limiting value π(j). For example, already after 4 steps we have

       .2278  .5634  .2088
p⁴ =   .2226  .5653  .2121
       .2215  .5645  .2140
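The stationary distribution can be computed mechanically as a left eigenvector. The sketch below (our own, not part of the original notes) does this for Example 3.18 and also shows that p⁴ already has nearly identical rows.

```python
# Stationary distribution pi with pi p = pi: left eigenvector of p for
# eigenvalue 1, normalized to sum to one.
import numpy as np

P = np.array([[0.4, 0.6, 0.0],
              [0.2, 0.5, 0.3],
              [0.1, 0.7, 0.2]])

vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
pi /= pi.sum()

print(pi)                              # ~[0.22353, 0.56471, 0.21176]
print(np.linalg.matrix_power(P, 4))    # rows already close to pi
```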
3.5 Infinite State Space

We study here the example of the reflecting random walk. This is a Markov chain with state space S = {0, 1, 2, ...}. We have as transition probabilities (for some p ∈ (0, 1))

p(i, i+1) = p       for i ≥ 0,
p(i, i−1) = 1 − p   for i ≥ 1,
p(0, 0) = 1 − p.

To find a stationary distribution we solve the equation

Σ_i π(i) p(i, j) = π(j)   with   Σ_j π(j) = 1.

In this case p(i, j) = 0 if |i − j| > 1. So we end up with

π(0)(1 − p) + π(1)(1 − p) = π(0)
π(0) p + π(2)(1 − p) = π(1)
π(1) p + π(3)(1 − p) = π(2)
... = ...
π(i−1) p + π(i+1)(1 − p) = π(i)

and so on. We set π(0) = c for some c to be determined later. By induction, the solution to our equation is

π(i) = c (p/(1 − p))^i.
We have three cases:

1. p > 1/2: then p/(1 − p) > 1 and π(i) increases exponentially fast. Hence Σ_i π(i) = 1 cannot be satisfied, and consequently there does not exist a stationary distribution π.

2. p = 1/2: p/(1 − p) = 1, so π(i) = c and Σ_i π(i) = ∞. Again, there is no stationary distribution.

3. p < 1/2: ρ := p/(1 − p) < 1, so Σ_i π(i) = c Σ_i ρ^i is a convergent geometric series. Recall that we have for such a series

Σ_{i=0}^∞ ρ^i = 1/(1 − ρ) = (1 − p)/(1 − 2p),

so taking c = (1 − 2p)/(1 − p) we get that π is a probability distribution.

One can show that the reflecting random walk is aperiodic. It is clearly irreducible. Therefore the convergence theorem implies that for p < 1/2, the probability that the walk is after n steps in state i converges for n → ∞ to π(i). In this case, the chain is recurrent. One can show that it is even positively recurrent: the expected return time μ_i to state i (i.e., the expected number of steps it takes to go from i back to i) equals μ_i = 1/π(i) < ∞.

On the other hand, if p > 1/2 then the chain is transient: once we leave a state, with positive probability we will never come back (the chain 'explodes').

The case p = 1/2 is a borderline case between recurrence and transience: the state 0 is recurrent, but μ_0 = ∞. In such a situation one says that 0 is null-recurrent.
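For p < 1/2 the formula π(i) = c(p/(1 − p))^i can be checked against long-run occupation frequencies. The sketch below (our own, not part of the original notes) simulates one long trajectory of the reflecting walk.

```python
# Long-run occupation frequencies of the reflecting random walk versus the
# stationary distribution pi(i) = c * (p/(1-p))**i, for p < 1/2.
import numpy as np

rng = np.random.default_rng(10)
p, n_steps = 0.3, 1_000_000

state, counts = 0, {}
for u in rng.random(n_steps):
    state = state + 1 if u < p else max(state - 1, 0)
    counts[state] = counts.get(state, 0) + 1

c = (1 - 2 * p) / (1 - p)
for i in range(4):
    print(i, counts.get(i, 0) / n_steps, c * (p / (1 - p)) ** i)
```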
4 Poisson Processes

From now on, we will study continuous-time stochastic processes. The time set is now ℝ₊, so that a stochastic process is a collection of random variables X = (X_t)_{t≥0}. The information about X is contained in a filtration (F_t) = (F_t)_{t≥0} where F_s ⊆ F_t for s < t. Sometimes we consider filtrations which contain not only information about the process X but also about some other events or processes. To express that an arbitrary filtration (F_t) contains all information about X (and possibly more) we use the expression 'X is adapted to (F_t)', which means that for each t ≥ 0, X_t is F_t-measurable.

We say that a stochastic process M is a martingale relative to (F_t) if E[|M_t|] < ∞ for all t and

(i) M is adapted to (F_t);

(ii) we have for all (s, t) with s < t that

E[M_t | F_s] = M_s.

Let T be a random time. As the event {T = t} often has probability zero, we formulate the stopping time property now as follows: T is a stopping time if for all t ≥ 0

{T ≤ t} ∈ F_t.

There is also an optional stopping theorem in continuous time. However, sometimes it is a bit technical to check whether its assumptions are satisfied. We will tell you when that is the case.

Suppose f is a (deterministic) function of time. We often want to approximate f(t) for small values of t and collect some remainder terms which are negligible for small t in a 'bin' which we shall denote by the symbol o(t). For example, consider the series expansion of the exponential function,

e^t = 1 + t + (1/2)t² + (1/6)t³ + ...
    = 1 + t + o(t),

reflecting that for very small t we have

e^t ≈ 1 + t.

More formally, a function r(t) is said to be o(t) (read: 'small oh of t') if

lim_{t↓0} r(t)/t = 0,

that is, r(t) goes to zero faster for t ↓ 0 than the function f(t) = t.
4.1 Standard Poisson Process

Imagine you are running a car insurance company. Claims after accidents occur at random times T_1, T_2, T_3, ... By time t we know whether the ith claim has arrived or not, so the T_n are all stopping times (relative to the filtration (F_t) which reflects all the info your company's agents and secret agents have gathered). We always set T_0 = 0.

Take now the process N = (N_t)_{t≥0} which counts the number of claims (at this stage we are not yet looking at whether it was a Rolls Royce or a Beetle which was involved in the accident),

N_t = number of claims which have arrived up to time t.

More formally we can define N as follows: recall first the indicator function

1_{T_n ≤ t} = 1 if t ≥ T_n(ω), and 0 if t < T_n(ω).

Let now N be given as

N_t = Σ_{n≥1} 1_{T_n ≤ t};

N is called the counting process associated to the stopping time sequence (T_n). As the T_n are stopping times, we have {T_n ≤ t} ∈ F_t for all n. Therefore N_t is F_t-measurable for all t, and hence N is adapted to the filtration (F_t).

Definition 4.1 A counting process N is a Poisson process with intensity λ if

(i) its increments are independent of the past;

(ii) it has stationary increments;

(iii) E[N_t] = λt for all t ≥ 0.
Here (i) means that for any s < t we have that N_t − N_s is independent of F_s, and (ii) that for all s, t and u, v such that t − s = v − u we have that the probability distribution of N_t − N_s is the same as that of N_v − N_u (and hence the same as that of N_{t−s}, since N_0 = 0).

Theorem 4.2 Let N be a Poisson process with intensity λ. Then

(i) N_t − λt is a martingale (the so-called compensated Poisson process);

(ii) (N_t − λt)² − λt is a martingale.

Proof. (i) We have already seen that N is adapted, hence this is also true for the compensated Poisson process (we are subtracting a deterministic quantity from N). As for the martingale property, we have for all s < t

E[(N_t − λt) − (N_s − λs) | F_s] = E[N_t − N_s | F_s] − λt + λs
                                 = E[N_t − N_s] − λt + λs   (by independence)
                                 = λt − λs − λt + λs         (by (iii))
                                 = 0.

As N_s − λs is F_s-measurable, we get E[N_s − λs | F_s] = N_s − λs (taking out what is known), so we can conclude that

E[N_t − λt | F_s] = N_s − λs.

(ii) This is an exercise.
We now want to relate the Poisson process to the Poisson distribution. For this we need one result from analysis: let the function f be a right-continuous solution to the functional equation

f(t + s) = f(t) f(s)   for all 0 ≤ s < t.

Then either f(t) = exp(ct) for some constant c, or f(t) = 0.

Theorem 4.3 Let N be a Poisson process with intensity λ. Then for n ≥ 0

P(N_t = n) = e^{−λt} (λt)^n / n!.

That is, N_t has the Poisson distribution with parameter λt.
Proof. Step 1. For all t ≥ 0, P(N_t = 0) = e^{−λt}.

Since {N_t = 0} = {N_s = 0} ∩ {N_t − N_s = 0} for 0 ≤ s < t, we get by the independence of increments that

P(N_t = 0) = P(N_s = 0) P(N_t − N_s = 0)
           = P(N_s = 0) P(N_{t−s} = 0)

by the stationarity of increments. Let f(t) = P(N_t = 0). We have proven that f(t) = f(s) f(t − s) for all 0 ≤ s < t. As we can exclude the case f(t) = 0 (otherwise N_t = ∞ for all t, which is absurd), it follows that P(N_t = 0) = e^{−ct} for a constant c which can easily be identified at the end of the proof with λ.

Step 2. P(N_t ≥ 2) is o(t). That is, for very small t the probability that the Poisson process has more than one jump is negligibly small. (We are not going into details here.)

Step 3.

lim_{t↓0} P(N_t = 1)/t = lim_{t↓0} [1 − P(N_t = 0) − P(N_t ≥ 2)]/t
                       = lim_{t↓0} [1 − e^{−λt} + o(t)]/t
                       = λ.

Step 4. Conclusion. For 0 < θ < 1, set

φ(t) = E[θ^{N_t}].

We have, by the independence and stationarity of the increments,

φ(t + s) = E[θ^{N_{t+s}}] = E[θ^{N_{t+s} − N_t + N_t}]
         = E[θ^{N_{t+s} − N_t}] E[θ^{N_t}]
         = E[θ^{N_s}] E[θ^{N_t}]
         = φ(s) φ(t).

Again by our functional equation, this implies that there exists some ψ(θ), depending on θ, such that φ(t) = e^{ψ(θ)t}. On the other hand, since φ(t) = E[θ^{N_t}],

φ(t) = Σ_{n≥0} θ^n P(N_t = n)
     = P(N_t = 0) + θ P(N_t = 1) + Σ_{n≥2} θ^n P(N_t = n).

Moreover, ψ(θ) = φ'(0), the derivative of φ at zero. Therefore

ψ(θ) = lim_{t↓0} [φ(t) − 1]/t
     = lim_{t↓0} [(P(N_t = 0) − 1)/t + θ P(N_t = 1)/t + o(t)/t]
     = −λ + θλ.

We get that

φ(t) = e^{ψ(θ)t} = e^{−λt + θλt},

hence by the exponential series

φ(t) = Σ_{n≥0} θ^n P(N_t = n) = e^{−λt} Σ_{n≥0} (λt)^n θ^n / n!.

Equating coefficients of the two infinite series yields

P(N_t = n) = e^{−λt} (λt)^n / n!   for n ≥ 0,

as desired.
Recall that the claims in our model arrive at the times T_1, T_2, T_3, ... If the Poisson process N is the counting process associated to this sequence of stopping times, then we get by Step 1 of the previous proof

P(T_1 ≤ t) = P(N_t ≥ 1)
           = 1 − P(N_t = 0)
           = 1 − e^{−λt}.

By the stationarity of increments of a Poisson process, we also get that for each n ≥ 0,

P(T_{n+1} − T_n ≤ t) = 1 − e^{−λt}.

That is, the waiting time up to the next arrival of a claim is exponentially distributed. Let us recall the density function f of the exponential distribution:

f(t) = λe^{−λt} for t ≥ 0,   f(t) = 0 for t < 0.

It is often useful to think of f(t) dt as 'P(T = t) dt'. Using the density, we can calculate the expected value of T = T_{n+1} − T_n, the waiting time for the arrival of the next claim, as follows (use integration by parts):

E[T] = ∫ t P(T = t) dt = ∫_0^∞ t λe^{−λt} dt
     = [−t e^{−λt}]_0^∞ + ∫_0^∞ e^{−λt} dt = 1/λ.

Similarly,

E[T²] = 2/λ²,

so we get for the variance

var(T) = E[T²] − (E[T])² = 1/λ².
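These two facts (Poisson marginals, exponential waiting times) fit together nicely in simulation. The sketch below (our own, not part of the original notes) builds N from i.i.d. Exp(λ) interarrival times and checks the mean and variance of N_t.

```python
# A Poisson process built from exponential waiting times: N_t should be
# Poisson(lam * t), so mean and variance both equal lam * t.
import numpy as np

rng = np.random.default_rng(4)
lam, t, n_paths = 2.0, 5.0, 100_000

waits = rng.exponential(1 / lam, size=(n_paths, 40))   # 40 >> lam*t arrivals
arrivals = np.cumsum(waits, axis=1)
N_t = (arrivals <= t).sum(axis=1)

print(N_t.mean(), lam * t)    # both ~10
print(N_t.var(), lam * t)     # both ~10
```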
4.2 Compound Poisson Process

Coming back to our car insurance model, so far we have only a model of how many claims arrive up to a given time (as specified by N_t), but we know nothing about the size of the claims (did the accident involve a Rolls Royce or a Beetle?). To take that size also into account, we will associate to each claim an independent and identically distributed random variable Y_i. By independent we mean that the Y_i are independent of each other and of the Poisson process of arrivals. The Y_i model the amount of money we have to pay for each claim. Naturally, we are interested in the total amount of money we have to pay up to time t.

Definition 4.4 Let N be a standard Poisson process, and Y_1, Y_2, ... as above. The compound Poisson process S is then defined as (S_t = 0 if N_t = 0 and)

S_t = Y_1 + ... + Y_{N_t}.

We calculate (if E[N_t] < ∞) that (with λ being the intensity of N)

E[S_t] = Σ_{n≥0} E[S_t | N_t = n] P(N_t = n)
       = Σ_{n≥0} n E[Y_i] P(N_t = n)
       = E[Y_i] Σ_{n≥0} n P(N_t = n)
       = E[Y_i] E[N_t] = λt E[Y_i].

By a similar computation one can get (if E[N_t²] < ∞) that

var(S_t) = var(Y_i) E[N_t] + var(N_t) (E[Y_i])²
         = λt (var(Y_i) + (E[Y_i])²) = λt E[Y_i²].
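The two moment formulas are easy to confirm. The sketch below (our own, not part of the original notes) uses Exp(1) claim sizes, for which E[Y] = 1 and E[Y²] = 2.

```python
# Compound Poisson moments: E[S_t] = lam*t*E[Y], var(S_t) = lam*t*E[Y^2].
import numpy as np

rng = np.random.default_rng(5)
lam, t, n_paths = 3.0, 2.0, 100_000

N = rng.poisson(lam * t, size=n_paths)                    # claim counts
S = np.array([rng.exponential(1.0, n).sum() for n in N])  # total claims

print(S.mean(), lam * t * 1.0)   # ~6
print(S.var(), lam * t * 2.0)    # ~12
```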
Let us recall that the moment-generating function for the Poisson distribution (here for the random variable N_{t−s}) is given as

E[e^{cN_{t−s}}] = exp(λ(e^c − 1)(t − s)).   (4.1)

We denote by M_Y(r) the moment-generating function of the random variable Y_i (which does not depend on i),

M_Y(r) = E[e^{rY_i}].

Theorem 4.5 We have

E[exp(r Σ_{i=N_s+1}^{N_t} Y_i)] = e^{λ(M_Y(r) − 1)(t − s)}.
Proof. Recall that we can write a^x = exp(x log a). By the stationarity and independence of the increments of N, as well as the fact that the Y_i's are independent of N, we get

E[exp(r Σ_{i=N_s+1}^{N_t} Y_i)] = E[exp(r Σ_{i=1}^{N_{t−s}} Y_i)]
   = E[Π_{i=1}^{N_{t−s}} exp(rY_i)]        (multiplicative property of exp)
   = E[Π_{i=1}^{N_{t−s}} E[exp(rY_i)]]     (Y_i are independent of N)
   = E[Π_{i=1}^{N_{t−s}} M_Y(r)]
   = E[(M_Y(r))^{N_{t−s}}]
   = E[exp(N_{t−s} log M_Y(r))].

We now substitute c = log M_Y(r) into the expression (4.1) for the moment-generating function of N_{t−s} to end up with

E[exp(N_{t−s} log M_Y(r))] = e^{λ(M_Y(r) − 1)(t − s)}.
4.3 Ruin Probabilities for the Classical Risk Process

Let us imagine an insurance company which starts with an initial capital u, collects premiums at a rate c, and has payments modelled by a compound Poisson process, as discussed above. Then we get the following model for the surplus of the insurance portfolio:

C_t = u + ct − Σ_{i=1}^{N_t} Y_i.

C is called the classical risk process. We assume that Y_i ≥ 0 and that the Y_i's are independent and identically distributed as in the last subsection. Let

μ = E[Y_i] < ∞.

A somewhat unpleasant event for the company happens at the ruin time

τ = inf{t > 0: C_t < 0},

where we follow the convention that inf ∅ = ∞. Therefore, the board of the company as well as regulators might want to have a close look at the (ultimate) ruin probability

ψ(u) = P(τ < ∞).

The ruin probability is a function of the initial capital u. If ψ(u) is large, then regulators might force the company to start business with more initial reserves, that is, to work with a larger u. Obviously, ψ(u) is decreasing in u. We are interested in obtaining some description of the dependence of ψ on u. Let us impose one more assumption, the so-called net profit condition:

c > λμ.

We can interpret this condition by recalling from the last section that the expected value of a compound Poisson process is given as

E[Σ_{i=1}^{N_t} Y_i] = λμt.

Therefore, the net profit condition just says that the premium rate (our income) is higher than the expected rate of our claim payments. It is a necessary requirement for an insurance company to be interested in the business at all. Moreover, we assume that the claims are not too heavy-tailed, by requiring the moment-generating function of the claim distribution to be finite:

M_Y(r) = E[exp(rY_i)] < ∞.
The crucial observation which will lead to the description of the ruin probability will be obtained by solving the following problem:

Find r > 0 such that exp(−rC_t) is a martingale!

In other words, we are looking for an exponential martingale related to the risk process C. We compute, using the independence and stationarity of increments, the definition of the risk process C, and the result of Theorem 4.5, that

E[e^{−rC_t} | F_s] = E[e^{−r(C_t − C_s)} e^{−rC_s} | F_s]
   = e^{−rC_s} E[e^{−r(C_t − C_s)}]                           (taking out + independence)
   = e^{−rC_s − rct + rcs} E[exp(r Σ_{i=N_s+1}^{N_t} Y_i)]    (definition of C)
   = e^{−rC_s − rct + rcs} E[exp(r Σ_{i=1}^{N_{t−s}} Y_i)]    (stationarity)
   = e^{−rC_s − rct + rcs} exp(λ(M_Y(r) − 1)(t − s))          (Theorem 4.5)

We now define a quantity θ(r) as

θ(r) = λ(M_Y(r) − 1) − rc,

so that we get

E[e^{−rC_t} | F_s] = e^{−rC_s} exp(θ(r)(t − s)).

Therefore, exp(−rC_t) is a martingale provided that

θ(r) = 0.

One solution of this equation is r = 0, which is useless since exp(−rC_t) is just a constant for r = 0. Some analysis, however, shows that there might be one additional solution R > 0 of this equation. In case this solution exists, we call it the adjustment coefficient R.
Example 4.6 Let the claims Y_i be Exp(α)-distributed. One computes

M_Y(r) = E[exp(rY_i)] = α/(α − r).

For θ(R) = 0 we have to solve

λ(α/(α − R) − 1) − Rc = 0.

Here a non-zero solution R exists:

R = α − λ/c;

we have R > 0 (equivalent to c > λ/α) because of the net profit condition (recall that the exponential distribution has a mean of 1/α). However, often the exponential distribution is a rather poor approximation to the 'real' claim size distribution, so we need to consider a more general situation.
Let us reformulate the result which we obtained above as follows.

Theorem 4.7 Assume that the adjustment coefficient R exists. Then exp(−RC_t) is a martingale.

Before we proceed further, let us collect one more result from martingale theory (without proof).

Theorem 4.8 (Stopped martingales are martingales) Let (M_t) be a martingale, τ a stopping time, τ ∧ t = min(τ, t). Then (M_{τ∧t}) is a martingale as well (it is equal to (M_t) up to time τ; afterwards it stays constant at level M_τ).

Having found in Theorem 4.7 an exponential martingale related to the risk process C, the proof of the following celebrated theorem is surprisingly easy.

Theorem 4.9 Assume that the adjustment coefficient R exists. Then

ψ(u) < e^{−Ru}.

Proof. Recall that ψ(u) = P(τ < ∞) where τ is the time of ruin. By the optional stopping theorem, applied to the stopped martingale exp(−RC_{τ∧t}), we get

e^{−Ru} = e^{−RC_0} = E[e^{−RC_{τ∧t}}] ≥ E[e^{−RC_{τ∧t}} 1_{τ≤t}] = E[e^{−RC_τ} 1_{τ≤t}].

Letting now t tend to infinity, we get (note that C_τ < 0 on {τ < ∞}, hence e^{−RC_τ} > 1 there)

e^{−Ru} ≥ E[e^{−RC_τ} 1_{τ<∞}] > E[1_{τ<∞}] = P(τ < ∞) = ψ(u).

It follows that the ruin probability ψ(u) decreases rather quickly, namely exponentially, with increasing initial reserve u. So running an insurance company is not too risky a business, provided the company is equipped with plenty of initial capital.
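The Lundberg bound can be compared with a direct simulation. The sketch below (our own Monte Carlo, not part of the original notes) uses Exp(α) claims, for which Example 4.6 gives R = α − λ/c; ruin is checked only at claim arrival times (the surplus cannot become negative between claims) and over finitely many claims, so the estimate slightly undershoots ψ(u).

```python
# Ruin probability for the classical risk process with Exp(alpha) claims,
# versus the Lundberg bound exp(-R*u) with R = alpha - lam/c (Example 4.6).
import numpy as np

rng = np.random.default_rng(6)
u, c, lam, alpha = 5.0, 1.2, 1.0, 1.0
n_paths, n_claims = 20_000, 500

arrival = np.cumsum(rng.exponential(1 / lam, (n_paths, n_claims)), axis=1)
claims = np.cumsum(rng.exponential(1 / alpha, (n_paths, n_claims)), axis=1)
surplus = u + c * arrival - claims             # surplus just after each claim
ruin_prob = (surplus.min(axis=1) < 0).mean()

R = alpha - lam / c
print(ruin_prob, np.exp(-R * u))               # ~0.36 versus bound ~0.43
```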
5 Stochastic Calculus for Piecewise Differentiable Processes

5.1 Stochastic Integration

Consider a risky asset whose price is modelled by a continuous-time stochastic process (S_t). A portfolio (θ_t) is a stochastic process describing the amount θ_t of the asset held by the investor at time t. If the investor follows a dynamic trading strategy, she would buy or sell at certain random transaction dates modelled as stopping times 0 = T_0 < T_1 < T_2 < ... < T_n < T_{n+1} = T. Between two transaction dates T_i and T_{i+1} the portfolio remains unchanged, and we will denote its composition by θ_i. The portfolio may then be expressed as

θ_t = θ_0 1_{t=0} + Σ_{i=0}^n θ_i 1_{]T_i, T_{i+1}]}(t).

The new portfolio θ_i is chosen based on the information available at time T_i; it is thus F_{T_i}-measurable. On the other hand, when the market maker sets the new price at time T_i, the portfolio is still described by θ_{i−1}; it takes its new value θ_i right after T_i. Therefore, the indicator function must be of the form 1_{]T_i, T_{i+1}]} (left-continuous) as opposed to 1_{[T_i, T_{i+1}[} (right-continuous). We say that θ is a simple predictable process.

Between T_i and T_{i+1} we hold θ_i shares of the asset, with the asset price moving by S_{T_{i+1}} − S_{T_i}. Hence the outcome of our trading strategy is given by

∫_0^t θ_u dS_u := θ_0 S_0 + Σ_{i=0}^n θ_i (S_{T_{i+1}∧t} − S_{T_i∧t}).
We call this expression the stochastic integral of θ with respect to S. Thus, the stochastic integral ∫_0^t θ dS represents the capital accumulated between 0 and t by following the trading strategy θ.

Example 5.1 (Trading strategies have to be predictable) Let S_t = λt − N_t, where N_t denotes a Poisson process with intensity λ; it is a martingale. Denote by T_1 the time of its first jump. Consider now the strategy which consists in buying one unit of the asset S at t = 0 for the price of zero and selling it right before the price falls down, i.e. θ_t = 1_{[0,T_1[}(t). This strategy is right-continuous, not left-continuous, and thus not simple predictable. The capital gain associated to this strategy is given by

∫_0^t θ_u dS_u = λt for t < T_1,
              = λT_1 for t ≥ T_1.

This strategy is a so-called arbitrage opportunity: it requires zero initial capital, but leads to a strictly positive outcome without any downside risk. Such strategies are extremely unrealistic and should be ruled out in any reasonable model. In particular, this strategy would allow one to sell right before the crash (which in reality could not be foreseen).
Proposition 5.2 (Stochastic integration preserves the martingale property) If S is a martingale, then for any simple predictable process θ the stochastic integral ∫θ dS is a martingale as well.

Proof. We will just take a look at the special case where we trade only at one stopping time s < T_1 < t. Then we have by definition of the stochastic integral

E[∫_0^t θ_u dS_u | F_s] = θ_0 S_0 + E[θ_0 (S_{T_1} − S_0) + θ_1 (S_t − S_{T_1}) | F_s].

For the first term we use that S is a martingale (and the optional stopping theorem):

E[θ_0 (S_{T_1} − S_0) | F_s] = θ_0 (E[S_{T_1} | F_s] − S_0)
                             = θ_0 (S_s − S_0).

We evaluate the second term by using the tower property together with 'taking out what is known' and the martingale property of S:

E[θ_1 (S_t − S_{T_1}) | F_s] = E[E[θ_1 (S_t − S_{T_1}) | F_{T_1}] | F_s]
                             = E[θ_1 E[S_t − S_{T_1} | F_{T_1}] | F_s]
                             = 0.

Summing up, we get

E[∫_0^t θ_u dS_u | F_s] = θ_0 S_0 + θ_0 (S_s − S_0)
                        = ∫_0^s θ_u dS_u.
Consider now a right- or left-continuous adapted stochastic process θ. We partition our interval [0, t] by using a 'time grid', i.e. random partition points 0 = τ_0 < τ_1 < ... < τ_n = t, where the τ_i are stopping times. If we now consider finer and finer partitions, then one can show that

θ_0 S_0 + Σ_{i=0}^n θ_{τ_i−} (S_{τ_{i+1}∧t} − S_{τ_i∧t}) → ∫_0^t θ_{u−} dS_u,   (5.1)

where θ_{u−} = lim_{ε↓0} θ_{u−ε}. More precisely, one can define a stochastic integral ∫_0^t θ_{u−} dS_u as the limit of the approximating sums in (5.1). Important here is again that the integrand has to be left-continuous, not right-continuous; this is the reason why we work with θ_{u−} instead of θ_u. This guarantees in particular that if S is a martingale, then the stochastic integral ∫θ_− dS is a martingale as well (the last statement is not completely rigorous, but essentially true).
5.2 The Chain Rule and Integration by Parts

Let us consider a stochastic process X which is right-continuous, i.e.

X_t = X_{t+} := lim_{ε↓0} X_{t+ε},

and has left limits, i.e.

X_{t−} = lim_{ε↓0} X_{t−ε}

exists. One also abbreviates this by saying that X is an RCLL process. In case X_t ≠ X_{t−} the process exhibits a jump of size

ΔX_t = X_t − X_{t−}.

We furthermore assume that every path of X is piecewise differentiable (PD), that is, it is differentiable with the possible exception of a countable set of points. We may then write

X_t = X_0 + ∫_0^t x_u du + Σ_{0<s≤t} ΔX_s,

or in differential notation

dX_t = x_t dt + ΔX_t.

Let now f be a real-valued function with continuous derivative. We can form the composed function f(X_t). At the exceptional points f(X_t) changes only at jump points, and then it jumps by f(X_t) − f(X_{t−}). In the open intervals where X is continuously differentiable we have the usual chain rule at our disposal. In total we get

df(X_t) = f'(X_t) x_t dt + f(X_t) − f(X_{t−}),

or in integral form:

Theorem 5.3 (Itô's formula for piecewise differentiable processes)

f(X_t) = f(X_s) + ∫_s^t f'(X_u) x_u du + Σ_{s<u≤t} {f(X_u) − f(X_{u−})}.
Given two stochastic processes X, Y and a time grid on [0, t], we can define the 'realized covariance'

Σ_i (X_{τ_{i+1}} − X_{τ_i})(Y_{τ_{i+1}} − Y_{τ_i})
  = Σ_i (X_{τ_{i+1}}Y_{τ_{i+1}} − X_{τ_i}Y_{τ_i} − Y_{τ_i}(X_{τ_{i+1}} − X_{τ_i}) − X_{τ_i}(Y_{τ_{i+1}} − Y_{τ_i}))
  = X_tY_t − X_0Y_0 − Σ_i (Y_{τ_i}(X_{τ_{i+1}} − X_{τ_i}) + X_{τ_i}(Y_{τ_{i+1}} − Y_{τ_i}))
  → X_tY_t − X_0Y_0 − ∫_0^t Y_{s−} dX_s − ∫_0^t X_{s−} dY_s.

Considering finer and finer grid sizes, we call the limit the quadratic covariation of X and Y, denoted by [X, Y]. Therefore,

Σ_i (X_{τ_{i+1}} − X_{τ_i})(Y_{τ_{i+1}} − Y_{τ_i}) → [X, Y]_t,

and we have proved:

Theorem 5.4 (Integration by parts formula)

X_tY_t − X_0Y_0 = ∫_0^t X_{s−} dY_s + ∫_0^t Y_{s−} dX_s + [X, Y]_t.

Note that, in contrast to the version of the Itô formula we have proved so far, X and Y do not have to be PD for the integration by parts formula to be valid. However, if X and Y are PD, then we get from the approximation

[X, Y]_t = Σ_{0<s≤t} ΔX_s ΔY_s.

In particular, since we have for a Poisson process N (jump size one!) that

N_t = Σ_{0<s≤t} ΔN_s = Σ_{0<s≤t} (ΔN_s)²,

we calculate that [N, N]_t = N_t. Moreover, we have for a compensated Poisson process that [N − λt, N − λt]_t = N_t as well.
5.3 The Stochastic Exponential

We consider an RCLL and PD stochastic process X. The stochastic exponential of X, denoted by Z = E(X), is defined as

Z_t = exp(X_t^c) Π_{0<s≤t} (1 + ΔX_s),

where X_t^c = X_t − Σ_{s≤t} ΔX_s.

Theorem 5.5 The stochastic exponential Z is the unique solution of the equation

Z_t = 1 + ∫_0^t Z_{s−} dX_s,

or, written in differential form,

dZ_t = Z_{t−} dX_t.

Proof. Let us assume that everywhere ΔX_t = 0. Then we have Z_t = exp(X_t), and the statement is a well-known result from classical analysis. If, on the other hand, dX_t = 0 with the possible exception of countably many jump points where dX_t = ΔX_t, then Z_t/Z_{t−} = 1 + ΔX_t, and we get

dZ_t/Z_{t−} = (Z_t − Z_{t−})/Z_{t−} = (1 + ΔX_t) − 1 = ΔX_t = dX_t.

The general case is a combination of the two aforementioned special cases.
As an application we calculate the moment-generating function of a Poisson distribution:

Proposition 5.6 Let N be a Poisson process with intensity λ. Then

E[e^{cN_t}] = exp(λ(e^c − 1)t).

Proof. Let Ñ_t = N_t − λt denote the compensated Poisson process; it is a martingale. For ψ > −1 we set X_t = ψÑ_t. Clearly, X_t^c = −ψλt. Consider now the differential equation

dZ_t = ψZ_{t−} dÑ_t = Z_{t−} dX_t,   Z_0 = 1.

Its solution is the stochastic exponential Z = E(X), where

Z_t = exp(X_t^c) Π_{0<s≤t} (1 + ΔX_s)
    = e^{−ψλt} Π_{s≤t} (1 + ψΔN_s)
    = e^{−ψλt} (1 + ψ)^{N_t}
    = exp(N_t ln(1 + ψ) − ψλt)
    = exp(cN_t − (e^c − 1)λt)

if we set c = ln(1 + ψ), hence ψ = e^c − 1. Now Z is a martingale since it is the stochastic integral of a compensated Poisson process (which is a martingale), Z_t = 1 + ∫_0^t ψZ_{s−} dÑ_s, with Z_0 = 1. Therefore E[Z_t] = E[Z_0] = 1, hence

E[e^{cN_t}] = exp(λ(e^c − 1)t).
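The closed form is easily checked by simulation. The sketch below (our own check, not part of the original notes) compares the empirical moment-generating function of N_t with exp(λ(e^c − 1)t).

```python
# Moment-generating function of the Poisson distribution (Proposition 5.6).
import numpy as np

rng = np.random.default_rng(11)
lam, t, c = 1.5, 2.0, 0.3

N_t = rng.poisson(lam * t, size=1_000_000)
print(np.exp(c * N_t).mean(), np.exp(lam * (np.exp(c) - 1) * t))  # both ~2.86
```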
5.4 Jump Times and Hazard Rates

We are given a stochastic process X of state variables which influence the economy via stochastic interest rates r(X_t), and via a process λ(X_t) which will later be interpreted as hazard rate process. The filtration generated by X will be denoted by (G_t). Let E_1 be a random variable independent of (G_t) which is exponentially distributed with parameter 1. Let (F_t) be the larger filtration which encompasses (G_t) and the information about E_1. Then we can construct an (F_t)-stopping time as follows: define

τ = inf{t: ∫_0^t λ(X_s) ds ≥ E_1}.

In case λ(X_s) = λ (constant λ) this would give us the first jump time of a Poisson process with intensity λ. In this sense the definition of τ is a generalization to the case of stochastic intensity. Recall now that, since E_1 ~ Exp(1), we have for an arbitrary time T that

P(T < E_1) = exp(−T).

Therefore, we get by independence and by the fact that λ(X_s) is G_t-measurable for all s < t that

P(τ > t | G_t) = P(∫_0^t λ(X_s) ds < E_1 | G_t) = exp(−∫_0^t λ(X_s) ds).
This identity gives us, in analogy with the deterministic case studied in survival analysis, the interpretation of λ(X_t) as hazard rate process. Our goal is now to calculate the expected discounted value p(0, t) of a survivor bond, which pays out one unit if the policyholder has survived up to time t (actuarial interpretation), or of a defaultable bond, which pays out one unit if t is smaller than the default time τ (credit risk interpretation). In both cases the payout function of the claim is 1_{τ>t}. We get (recall that E[1_{τ>t} | G_t] is just another notation for P(τ > t | G_t))

p(0, t) = E[exp(−∫_0^t r(X_s) ds) 1_{τ>t}]
        = E[E[exp(−∫_0^t r(X_s) ds) 1_{τ>t} | G_t]]
        = E[exp(−∫_0^t r(X_s) ds) E[1_{τ>t} | G_t]]
        = E[exp(−∫_0^t r(X_s) ds) exp(−∫_0^t λ(X_s) ds)]
        = E[exp(−∫_0^t (r + λ)(X_s) ds)].

Here the interest rate r(X_s) has been replaced by the actuarial interest rate (r + λ)(X_s). Let now N_t = 1_{τ≤t} be the survival process associated to τ. It is possible (but rather tedious) to show that

M_t = N_t − ∫_0^t λ(X_u) 1_{τ>u} du = N_t − ∫_0^{t∧τ} λ(X_u) du

is a martingale with respect to (F_t). We call ∫_0^{t∧τ} λ(X_u) du the compensator of N.
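The defining identity P(τ > t | G_t) = exp(−∫_0^t λ(X_s) ds) can be tested with any toy state process. In the sketch below (our own illustration with assumed toy dynamics, not part of the original notes), the hazard rate is constant on unit intervals and drawn uniformly from (0, 1).

```python
# tau = inf{t: int_0^t lambda(X_s) ds >= E1} with E1 ~ Exp(1), independent.
# Check P(tau > t) = E[exp(-int_0^t lambda(X_s) ds)] by Monte Carlo.
import numpy as np

rng = np.random.default_rng(12)
n_paths, t = 500_000, 3

haz = rng.random((n_paths, t))            # lambda(X_s) per unit interval
cum_hazard = haz.sum(axis=1)              # integrated hazard up to time t
E1 = rng.exponential(1.0, n_paths)

print((cum_hazard < E1).mean(),           # P(tau > t)
      np.exp(-cum_hazard).mean())         # E[exp(-integral)]; both ~0.25
```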
6 Brownian Motion

6.1 Definition and First Properties

Definition 6.1 B is a standard (one-dimensional) Brownian motion if it satisfies the following conditions: (B_0 = 0 and)

(i) its increments are independent of the past;

(ii) it has stationary increments;

(iii) the mapping t → B_t is continuous;

(iv) B_t is N(0, t) distributed (normal distribution with mean 0 and variance t).

Theorem 6.2 Brownian motion exists.

Definition 6.3 (B¹, ..., Bⁿ) is a standard n-dimensional Brownian motion if each of its components is an independent one-dimensional Brownian motion.

Let us recall that a normal distribution is characterized by its mean μ and its variance σ². Multidimensional normal distributions (Z¹, ..., Zⁿ) are characterized by giving the vector of means μ_i = E[Z^i] and the covariance matrix with entries

C_ij = E[(Z^i − μ_i)(Z^j − μ_j)].

Let us now list some properties.
Self-similarity. The processes (B_{ct}) and (√c B_t) have the same distribution (c > 0): speeding up time by a factor of c does not change properties (i)-(iii), but B_{ct} has variance ct. This can also be achieved by multiplying a standard Brownian motion by √c.

Covariance formula. E[B_sB_t] = min(s, t): let s < t. Writing B_t = B_s + (B_t − B_s) and using the fact that B_s and (B_t − B_s) are independent with mean 0, we have

E[B_sB_t] = E[B_s²] + E[B_s(B_t − B_s)] = s = min(s, t).

Strong Markov property. If T is a stopping time, then (B_{T+t} − B_T), t ≥ 0, is a Brownian motion independent of B_r, r ≤ T.
Quadratic variation. Note that we had defined the quadratic covariation [X, Y] of two processes X, Y as the limit (the t_i's form a partition Δ_n: 0 = t_0 < t_1 < ... < t_n of [0, t])

Σ_i (X_{t_{i+1}} − X_{t_i})(Y_{t_{i+1}} − Y_{t_i}) → [X, Y]_t.

Our goal is now to determine [B, B]_t. We set mesh(Δ_n) = max_i |t_{i+1} − t_i|. Let us now consider finer and finer partitions Δ_m ⊆ Δ_n for m < n such that mesh(Δ_n) → 0 for n → ∞. We have to evaluate the limit of

Q_n(t) = Σ_i (B_{t_{i+1}} − B_{t_i})².

Note first that

E[Q_n(t)] = Σ_i E[(B_{t_{i+1}} − B_{t_i})²] = Σ_i (t_{i+1} − t_i) = t.

Furthermore, recall that var(X) = E[X²] − (E[X])². By the independence of increments of a Brownian motion we thus get

var(Q_n(t)) = Σ_i var((B_{t_{i+1}} − B_{t_i})²)
            = Σ_i (E[(B_{t_{i+1}} − B_{t_i})⁴] − (t_{i+1} − t_i)²).

By self-similarity, we know that B_{t_{i+1}} − B_{t_i} is distributed like √(t_{i+1} − t_i) B_1 where B_1 ~ N(0, 1). Recall that for the N(0, 1)-RV B_1 we have E[(B_1)⁴] = 3. Hence

E[(B_{t_{i+1}} − B_{t_i})⁴] = E[(t_{i+1} − t_i)² (B_1)⁴] = 3(t_{i+1} − t_i)²,

and consequently

var(Q_n(t)) = Σ_i (3(t_{i+1} − t_i)² − (t_{i+1} − t_i)²) = 2 Σ_i (t_{i+1} − t_i)².

Letting now mesh(Δ_n) → 0 we obtain

var(Q_n(t)) ≤ 2 mesh(Δ_n) Σ_i (t_{i+1} − t_i) = 2 mesh(Δ_n) t → 0.

Since var(Q_n(t)) = E[(Q_n(t) − t)²], we have proved that Q_n(t) converges to t in mean-square. One can show that this implies that [B, B]_t = t, since [B, B]_t is the limit of Q_n(t) (in an appropriate sense). This result is remarkable: we have seen earlier on that for a PD stochastic process X,

[X, X]_t = Σ_{0<s≤t} (ΔX_s)².

As Brownian motion has continuous paths, ΔB = 0 everywhere, and hence the paths of a Brownian motion cannot be piecewise differentiable.
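The mean-square convergence shows up clearly in simulation. The sketch below (our own, not part of the original notes) computes Q_n(1) over uniform partitions: the mean stays at 1 while the variance shrinks like 2/n.

```python
# Realized quadratic variation Q_n(1) of Brownian paths on uniform grids.
import numpy as np

rng = np.random.default_rng(7)

for n in (10, 100, 10_000):
    dt = 1.0 / n
    dB = rng.normal(0.0, np.sqrt(dt), size=(1000, n))   # 1000 paths
    Q = (dB ** 2).sum(axis=1)
    print(n, Q.mean().round(4), Q.var().round(6))       # mean ~1, var ~2/n
```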
6.2 Martingales Related to Brownian Motion

Let (F_t) be the filtration generated by the Brownian motion B, i.e. F_t contains all information about B up to time t.

Theorem 6.4 B is a martingale.

Proof. Given F_s, the value of B_s is known, while B_t − B_s is independent of F_s and has mean 0:

E[B_t | F_s] = E[(B_t − B_s) + B_s | F_s]
             = E[B_t − B_s] + B_s
             = B_s.

Let now

T_a = min{t ≥ 0: B_t = a}
be the first time the Brownian motion hits a. One can show that

P(T_a < ∞) = 1.

In the following, we will apply the following version of the optional stopping theorem:

Theorem 6.5 Let M be a martingale with continuous paths. Suppose T is a stopping time with

P(T < ∞) = 1,

and there is a constant K such that |M_{T∧t}| ≤ K for all t. Then

E[M_T] = E[M_0].

Example 6.6 Exit distribution from an interval. Define the exit time by (a < 0 < b)

τ = min{t: B_t ∉ (a, b)}.

Then we have

P(B_τ = b) = −a/(b − a)   and   P(B_τ = a) = b/(b − a).

Proof. Let us first check that τ is a stopping time. We have

{τ ≤ s}^c = {τ > s} = {B_r ∈ (a, b) for all r ≤ s},

and the last event is certainly F_s-measurable. Since F_s is a σ-algebra, it contains with {τ ≤ s}^c also {τ ≤ s}, hence τ is indeed a stopping time. The fact that P(τ < ∞) = 1 can be shown by noting that τ ≤ T_a while P(T_a < ∞) = 1. Moreover, |B_{τ∧t}| ≤ |a| + b, so we may use the optional stopping theorem to get

0 = E[B_τ] = a P(B_τ = a) + b P(B_τ = b).

Our result follows since P(B_τ = a) = 1 − P(B_τ = b), by solving the linear equation.
Theorem 6.7 (B2t t) is a martingale.
39
Proof. Given Fs, the value of Bs is known while Bt Bs is independent with
mean 0 and variance t s. We get
E

B2t
Fs = E (Bs + (Bt Bs))2Fs
= B2s + 2BsE [Bt Bsj Fs] + E

(Bt Bs)2
Fs
= B2s + 0 + t s:
Subtracting t from both sides yields the result.
Example 6.8 Mean exit time from an interval. For $a < 0 < b$, define the exit time $\tau$ by
$$\tau = \min\{t : B_t \notin (a, b)\}.$$
Then
$$E[\tau] = -ab.$$
Proof. It is possible (but more difficult to show than in the last example) to apply the conclusion of the optional stopping theorem. Together with the formula for the exit distribution from the last example, this gives
$$0 = E\left[B_\tau^2 - \tau\right] = b^2 \frac{-a}{b - a} + a^2 \frac{b}{b - a} - E[\tau].$$
Rearranging now gives
$$E[\tau] = \frac{-b^2 a + b a^2}{b - a} = -ba.$$
We can derive one surprising conclusion from this example. Recall that $T_a = \min\{t \ge 0 : B_t = a\}$ and let $\tau_{a,N} = \min\{t : B_t \notin (a, N)\}$, so $T_a \ge \tau_{a,N}$ and therefore
$$E[T_a] \ge E[\tau_{a,N}] = -aN \to \infty$$
as $N \to \infty$. The same conclusion holds for $a > 0$ by symmetry, so we have for all $a \ne 0$ that
$$E[T_a] = \infty.$$
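Both the exit distribution and the mean exit time lend themselves to a quick Monte Carlo check. The sketch below is my addition: the interval $a = -1$, $b = 2$ is an arbitrary choice, and the random-walk discretization slightly overshoots the boundary, so a small bias is expected. The theory predicts $P(B_\tau = b) = 1/3$ and $E[\tau] = 2$.

```python
import numpy as np

rng = np.random.default_rng(1)
a, b = -1.0, 2.0               # hypothetical interval with a < 0 < b
dt = 1e-3                      # time step of the discretized Brownian motion
n = 20_000                     # number of simulated paths

x = np.zeros(n)                # current position of each path
tau = np.zeros(n)              # elapsed time of each path
alive = np.ones(n, dtype=bool)
while alive.any():
    x[alive] += rng.normal(0.0, np.sqrt(dt), alive.sum())
    tau[alive] += dt
    alive &= (a < x) & (x < b)  # a path dies once it leaves (a, b)

print("P(B_tau = b) ~", np.mean(x >= b), "  theory -a/(b-a) =", -a / (b - a))
print("E[tau]       ~", tau.mean(),      "  theory -ab      =", -a * b)
```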
7 Stochastic Calculus for Brownian Motion
7.1 Stochastic Integral of Elementary Strategies
Let $B_t$ denote the price of some stock at time $t$; we assume that $B$ is a Brownian motion (historically, this was one of the oldest models for stock prices). Assume we buy a random amount $\varphi$ of shares of this stock at time $s$ for the price $B_s$, wait until time $t > s$ and sell them then for the price $B_t$. It is possible that $\varphi$ is negative (this amounts to short-selling the stock). We assume that we have no insider information about future stock prices. Therefore $\varphi$ should be an $\mathcal{F}_s$-measurable random variable. Our gain (or loss) from this strategy is given as
$$\int_s^t \varphi \, dB_u := \varphi (B_t - B_s).$$
We are interested in calculating the conditional mean and variance of our gain in this case. We have, by the tower property, the fact that $\varphi$ is $\mathcal{F}_s$-measurable and the independence of $B_t - B_s$ from $\mathcal{F}_s$,
$$E[\varphi (B_t - B_s) \mid \mathcal{F}_s] = \varphi\, E[B_t - B_s \mid \mathcal{F}_s] = \varphi\, E[B_t - B_s] = 0.$$
Moreover,
$$E\left[\varphi^2 (B_t - B_s)^2 \mid \mathcal{F}_s\right] = \varphi^2 E\left[(B_t - B_s)^2 \mid \mathcal{F}_s\right] = \varphi^2 E\left[(B_t - B_s)^2\right] = \varphi^2 (t - s).$$
We now assume that we can trade at $n$ random time points $0 = T_1 \le \dots \le T_{n+1} = T$. At each time $T_i$ we hold an amount of $\varphi_i$ shares, where $\varphi_i$ is $\mathcal{F}_{T_i}$-measurable. Our gain (or loss) over the whole trading period is then given as
$$\int_0^T \varphi_u \, dB_u := \sum_{i=1}^{n} \varphi_i \left(B_{T_{i+1}} - B_{T_i}\right).$$
We call this the stochastic integral of the strategy $\varphi$ with respect to the Brownian motion $B$. In this definition it is essential that $\varphi_i$ is $\mathcal{F}_{T_i}$-measurable. Otherwise we could predict future increments of the stock price and make a riskless profit.
For instance, if we could foresee the future then we could follow the strategy $\vartheta_i = \operatorname{sign}(B_{T_{i+1}} - B_{T_i})$ to get
$$\int_0^T \vartheta_u \, dB_u = \sum_{i=1}^{n} \operatorname{sign}(B_{T_{i+1}} - B_{T_i}) \left(B_{T_{i+1}} - B_{T_i}\right) = \sum_{i=1}^{n} \left|B_{T_{i+1}} - B_{T_i}\right| \ge 0.$$
The stochastic integral $\int_0^T \varphi_u \, dB_u$ is a random variable. We can define a stochastic integral process as follows for any $t \in [0, T]$:
$$\int_0^t \varphi_u \, dB_u := \sum_{i=1}^{n} \varphi_i \left(B_{T_{i+1} \wedge t} - B_{T_i \wedge t}\right).$$
Note that if $t < T_i < T_{i+1}$, then $B_{T_{i+1} \wedge t} - B_{T_i \wedge t} = 0$. If $T_i < t < T_{i+1}$ then $B_{T_{i+1} \wedge t} - B_{T_i \wedge t} = B_t - B_{T_i}$, and if $T_i < T_{i+1} < t$ then $B_{T_{i+1} \wedge t} - B_{T_i \wedge t} = B_{T_{i+1}} - B_{T_i}$.
We always assume that our strategy is not too 'wild' in the sense that
$$E\left[\sum_{i=1}^{n} \varphi_i^2 (T_{i+1} - T_i)\right] < \infty.$$
Generalizing our considerations above (trading at one time point only), one can
prove the following result.
Theorem 7.1 The stochastic integral process $\int_0^t \varphi_u \, dB_u$ is a martingale with continuous paths. Moreover,
$$E\left[\left(\int_s^t \varphi_u \, dB_u\right)^2 \,\Bigg|\, \mathcal{F}_s\right] = E\left[\int_s^t \varphi_u^2 \, du \,\Bigg|\, \mathcal{F}_s\right].$$
Since we are only trading at $n$ discrete time points, this formula (for $s = 0$) can also be written as
$$E\left[\left(\sum_{i=1}^{n} \varphi_i \left(B_{T_{i+1} \wedge t} - B_{T_i \wedge t}\right)\right)^2\right] = E\left[\sum_{i=1}^{n} \varphi_i^2 \left(T_{i+1} \wedge t - T_i \wedge t\right)\right].$$
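To make this concrete, here is a small simulation sketch (my addition, not from the notes): it uses four deterministic trading times rather than general random ones, and the hypothetical holding rule $\varphi_i = \operatorname{sign}(B_{T_i})$, which is $\mathcal{F}_{T_i}$-measurable. The two printed second moments should agree, up to Monte Carlo error, as the theorem asserts.

```python
import numpy as np

rng = np.random.default_rng(2)
T_pts = np.array([0.0, 0.25, 0.5, 0.75, 1.0])   # deterministic trading times
n = 200_000                                      # number of simulated paths

dB = rng.normal(0.0, np.sqrt(np.diff(T_pts)), (n, 4))   # Brownian increments
B_left = np.hstack([np.zeros((n, 1)),
                    np.cumsum(dB, axis=1)[:, :-1]])      # B at each T_i

phi = np.sign(B_left)               # holdings phi_i = sign(B_{T_i})
gain = np.sum(phi * dB, axis=1)     # elementary stochastic integral over [0, T]

print("E[int phi dB]       ~", gain.mean(), " (martingale property: 0)")
print("E[(int phi dB)^2]   ~", np.mean(gain**2))
print("E[sum phi_i^2 dT_i] ~", np.mean(np.sum(phi**2 * np.diff(T_pts), axis=1)))
```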
7.2 Stochastic Integral for Left-Continuous Integrands. Itô's Formula.

Let us now consider a left-continuous trading strategy $\varphi$ which has right limits and which is adapted to $(\mathcal{F}_t)$, hence for all $t$ we have that $\varphi_t$ is $\mathcal{F}_t$-measurable. It is possible to approximate such a strategy by the elementary strategies considered in the last section. One can show that the corresponding elementary stochastic integrals then also approximate a random variable, which we call the stochastic integral of $\varphi$ with respect to $B$ and denote by $\int_0^t \varphi_u \, dB_u$. If we consider it as a function of $t$, then we get a stochastic process $\int \varphi \, dB$. It has the same two properties as in the last section.
Theorem 7.2 Let $\varphi$ be left-continuous with right limits, and such that $E\left[\int_0^T \varphi_u^2 \, du\right] < \infty$. Then

(i) $\int \varphi \, dB$ is a martingale with continuous paths; in particular (for $0 < s < t < T$)
$$E\left[\int_s^t \varphi_u \, dB_u \,\Bigg|\, \mathcal{F}_s\right] = 0.$$
(ii) We have
$$E\left[\left(\int_s^t \varphi_u \, dB_u\right)^2 \,\Bigg|\, \mathcal{F}_s\right] = E\left[\int_s^t \varphi_u^2 \, du \,\Bigg|\, \mathcal{F}_s\right].$$
Please note that since we assume that $\mathcal{F}_0$ is the trivial $\sigma$-algebra (containing only $\Omega$ and $\emptyset$), in case $s = 0$ the two formulas read as
$$E\left[\int_0^t \varphi_u \, dB_u\right] = 0,$$
$$E\left[\left(\int_0^t \varphi_u \, dB_u\right)^2\right] = E\left[\int_0^t \varphi_u^2 \, du\right].$$
Quadratic variation. Regarding the quadratic variation for stochastic integrals, one can show that
$$\left[\int \varphi \, dB, \int \vartheta \, dB\right]_t = \int_0^t \varphi_u \vartheta_u \, d[B,B]_u = \int_0^t \varphi_u \vartheta_u \, du.$$
On the other hand, if one of the integrals is PD (i.e., an integral with respect to $du$), then we get
$$\left[\int \varphi \, dB, \int \vartheta \, du\right]_t = 0.$$
We would like to develop a calculus for the stochastic integral with respect to Brownian motion. In classical calculus, the key for doing computations is the fundamental theorem relating integration and differentiation, and we want to generalize this. Let $f$ now be a twice continuously differentiable function, and consider the following heuristic motivation. Taylor's theorem tells us that
$$f(y) - f(x) = f'(x)(y - x) + \frac{1}{2} f''(x)(y - x)^2 + R,$$
where $R$ is some remainder term. We now partition the interval $[0, t]$ as $0 = t_0 < t_1 < \dots < t_n = t$ and write as a telescoping sum
$$f(B_t) - f(B_0) = \sum_{i=0}^{n-1} \left(f\left(B_{t_{i+1}}\right) - f(B_{t_i})\right) = \sum_{i=0}^{n-1} f'(B_{t_i}) \left(B_{t_{i+1}} - B_{t_i}\right) + \frac{1}{2} \sum_{i=0}^{n-1} f''(B_{t_i}) \left(B_{t_{i+1}} - B_{t_i}\right)^2 + R.$$
If we now make the partition finer and finer, the first term on the right-hand side is approximately $\int_0^t f'(B_u) \, dB_u$. What about the second term? We have by the tower property
$$E\left[\sum_{i=0}^{n-1} f''(B_{t_i}) \left(B_{t_{i+1}} - B_{t_i}\right)^2\right] = E\left[\sum_{i=0}^{n-1} E\left[f''(B_{t_i}) \left(B_{t_{i+1}} - B_{t_i}\right)^2 \,\Big|\, \mathcal{F}_{t_i}\right]\right]$$
$$= E\left[\sum_{i=0}^{n-1} f''(B_{t_i})\, E\left[\left(B_{t_{i+1}} - B_{t_i}\right)^2 \,\Big|\, \mathcal{F}_{t_i}\right]\right] = E\left[\sum_{i=0}^{n-1} f''(B_{t_i})\, E\left[\left(B_{t_{i+1}} - B_{t_i}\right)^2\right]\right] = E\left[\sum_{i=0}^{n-1} f''(B_{t_i}) (t_{i+1} - t_i)\right],$$
and the last sum inside of the expectation tends with finer and finer partitions to $\int_0^t f''(B_u) \, du$. But one can get even more: it is possible to prove that on a set with probability one we have (at least for certain subsequences of partitions) that
$$\sum_{i=0}^{n-1} f''(B_{t_i}) \left(B_{t_{i+1}} - B_{t_i}\right)^2 \to \int_0^t f''(B_u) \, du.$$
Moreover, the remainder term $R$ goes to zero. We have motivated the most important result in stochastic calculus.

Theorem 7.3 (Itô's formula) Let $f$ be a twice continuously differentiable function, $B$ a Brownian motion. Then
$$f(B_t) - f(B_0) = \int_0^t f'(B_u) \, dB_u + \frac{1}{2} \int_0^t f''(B_u) \, du.$$
The structure of this formula looks like the fundamental theorem of classical calculus,
$$f(t) - f(0) = \int_0^t f'(u) \, du;$$
however, here we have an additional term involving the second derivative of $f$. Therefore, the formulas in stochastic calculus are different from those in classical calculus. Let us now show how to apply Itô's formula to evaluate stochastic integrals.
Example 7.4 Take $f(x) = x^2$. We have $f'(x) = 2x$, $f''(x) = 2$. Itô's formula yields (assuming $B_0 = 0$)
$$B_t^2 = \int_0^t 2 B_u \, dB_u + \frac{1}{2} \int_0^t 2 \, du = 2 \int_0^t B_u \, dB_u + t.$$
Rearranging gives us
$$\int_0^t B_u \, dB_u = \frac{1}{2} B_t^2 - \frac{1}{2} t,$$
which is indeed surprising if we compare it with
$$\int_0^t u \, du = \frac{1}{2} t^2.$$
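The extra $-\tfrac{1}{2}t$ is visible numerically: approximating the stochastic integral by left-endpoint Riemann sums (as in its construction) reproduces $\tfrac{1}{2}B_t^2 - \tfrac{1}{2}t$, not $\tfrac{1}{2}B_t^2$. A quick sketch of mine:

```python
import numpy as np

rng = np.random.default_rng(3)
t, N = 1.0, 100_000
dB = rng.normal(0.0, np.sqrt(t / N), N)          # Brownian increments
B = np.concatenate([[0.0], np.cumsum(dB)])       # path with B_0 = 0

# left-endpoint sums: sum_i B_{t_i} (B_{t_{i+1}} - B_{t_i})
print("sum B dB      =", np.sum(B[:-1] * dB))
print("B_t^2/2 - t/2 =", 0.5 * B[-1]**2 - 0.5 * t)
```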
There is also a multidimensional, time-dependent generalization of Itô's formula. However, for the purpose of this lecture, we will only need the following corollaries of it.
Theorem 7.5 (Time-dependent Itô's formula) Let $f(t, x)$ be continuously differentiable: once in $t$, twice in $x$. Then
$$f(t, B_t) - f(0, B_0) = \int_0^t \frac{\partial}{\partial t} f(u, B_u) \, du + \int_0^t \frac{\partial}{\partial x} f(u, B_u) \, dB_u + \frac{1}{2} \int_0^t \frac{\partial^2}{\partial x^2} f(u, B_u) \, du.$$
Example 7.6 Let $f(t, x) = \exp(x - t/2)$. Then $\frac{\partial}{\partial t} f(t, x) = -\frac{1}{2} \exp(x - t/2)$, $\frac{\partial}{\partial x} f(t, x) = \exp(x - t/2)$, $\frac{\partial^2}{\partial x^2} f(t, x) = \exp(x - t/2)$. It follows that
$$\exp(B_t - t/2) = 1 - \frac{1}{2} \int_0^t \exp(B_u - u/2) \, du + \int_0^t \exp(B_u - u/2) \, dB_u + \frac{1}{2} \int_0^t \exp(B_u - u/2) \, du = 1 + \int_0^t \exp(B_u - u/2) \, dB_u.$$
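So $\exp(B_t - t/2)$ is, by Theorem 7.2(i), a martingale started at 1; in particular $E[\exp(B_t - t/2)] = 1$ for all $t$. A one-line Monte Carlo check (my addition), using only that $B_t \sim N(0, t)$:

```python
import numpy as np

rng = np.random.default_rng(4)
for t in [0.5, 1.0, 2.0]:
    Bt = rng.normal(0.0, np.sqrt(t), 1_000_000)   # B_t ~ N(0, t)
    print(f"t = {t}:  E[exp(B_t - t/2)] ~ {np.mean(np.exp(Bt - t/2)):.4f}")
# each estimate should be close to 1
```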
Theorem 7.7 Let $h(t, x)$ be a space-time harmonic function. That is, it fulfills the partial differential equation
$$\frac{\partial}{\partial t} h(t, x) + \frac{1}{2} \frac{\partial^2}{\partial x^2} h(t, x) = 0.$$
Let moreover $E\left[\int_0^t \left(\frac{\partial}{\partial x} h(u, B_u)\right)^2 du\right] < \infty$. Then $h(t, B_t)$ is a martingale, and
$$h(0, 0) = E[h(t, B_t)].$$
Proof. By the time-dependent Itô's formula, we have
$$h(t, B_t) = h(0, B_0) + \int_0^t \frac{\partial}{\partial x} h(u, B_u) \, dB_u,$$
and by property (i) of the stochastic integral, $\int_0^t \frac{\partial}{\partial x} h(u, B_u) \, dB_u$ is a martingale with
$$E\left[\int_0^t \frac{\partial}{\partial x} h(u, B_u) \, dB_u\right] = 0.$$
Theorem 7.8 (Integration by parts) We recall the formula from the previous chapter:
$$X_t Y_t - X_0 Y_0 = \int_0^t X_u \, dY_u + \int_0^t Y_u \, dX_u + [X, Y]_t.$$
Let $\varphi$, $\vartheta$ be strategies such that the stochastic integrals $\int \varphi \, dB$, $\int \vartheta \, dB$ are well-defined. The derivation of the integration by parts formula in the previous chapter carries over to the Brownian case as well, so we get, noting that all processes involved here are left-continuous and the stochastic integrals have zero initial value, that (setting $X = \int \varphi \, dB$, $Y = \int \vartheta \, dB$)
$$\int_0^t \varphi_u \, dB_u \int_0^t \vartheta_u \, dB_u = \int_0^t \left(\int_0^u \varphi_v \, dB_v\right) \vartheta_u \, dB_u + \int_0^t \left(\int_0^u \vartheta_v \, dB_v\right) \varphi_u \, dB_u + \int_0^t \varphi_u \vartheta_u \, du.$$
Example 7.9 We calculate the iterated integral $\int_0^t \int_0^u B_v \, dB_v \, dB_u$ by using the integration by parts formula for $\varphi = 1$, $\vartheta = B$:
$$\int_0^t \left(\int_0^u B_v \, dB_v\right) dB_u = B_t \int_0^t B_u \, dB_u - \int_0^t B_u^2 \, dB_u - \int_0^t B_u \, du. \tag{7.1}$$
We have already computed
$$\int_0^t B_u \, dB_u = \frac{1}{2} B_t^2 - \frac{1}{2} t. \tag{7.2}$$
Moreover, applying Itô's formula to $f(x) = x^3$ yields
$$B_t^3 = 3 \int_0^t B_u^2 \, dB_u + 3 \int_0^t B_u \, du,$$
hence
$$\int_0^t B_u^2 \, dB_u = \frac{1}{3} B_t^3 - \int_0^t B_u \, du. \tag{7.3}$$
Plugging (7.2), (7.3) into (7.1), we get
$$\int_0^t \int_0^u B_v \, dB_v \, dB_u = \frac{1}{2} B_t^3 - \frac{1}{2} t B_t - \frac{1}{3} B_t^3 = \frac{1}{6} B_t^3 - \frac{1}{2} t B_t.$$
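As a check of this identity, the sketch below (my addition) approximates both stochastic integrals by nested left-endpoint sums on one simulated path and compares with $\tfrac{1}{6}B_t^3 - \tfrac{1}{2}tB_t$:

```python
import numpy as np

rng = np.random.default_rng(5)
t, N = 1.0, 200_000
dB = rng.normal(0.0, np.sqrt(t / N), N)
B = np.concatenate([[0.0], np.cumsum(dB)])

inner = np.concatenate([[0.0], np.cumsum(B[:-1] * dB)])  # int_0^{t_k} B dB at each t_k
double = np.sum(inner[:-1] * dB)                         # outer left-endpoint sum
print("iterated integral   =", double)
print("B_t^3/6 - t B_t / 2 =", B[-1]**3 / 6 - 0.5 * t * B[-1])
```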
Example 7.10 (Ornstein-Uhlenbeck process) Let
$$X_t = e^{-t} \int_0^t e^s \, dB_s.$$
We have by integration by parts (using $e^{-t} = 1 - \int_0^t e^{-s} \, ds$)
$$e^{-t} \int_0^t e^s \, dB_s = \int_0^t e^{-s} e^s \, dB_s - \int_0^t \left(\int_0^s e^u \, dB_u\right) e^{-s} \, ds = B_t - \int_0^t X_s \, ds.$$
Hence $X$ satisfies the stochastic integral equation
$$X_t = -\int_0^t X_s \, ds + B_t.$$
Example 7.11 (Brownian bridge) We consider now the stochastic integral equation (for $t < 1$)
$$X_t = -\int_0^t \frac{X_s}{1 - s} \, ds + B_t,$$
and show by integration by parts that a solution is given as
$$X_t = (1 - t) \int_0^t \frac{1}{1 - s} \, dB_s.$$
Indeed:
$$(1 - t) \int_0^t \frac{1}{1 - s} \, dB_s = \int_0^t \frac{1 - s}{1 - s} \, dB_s - \int_0^t \left(\int_0^s \frac{1}{1 - u} \, dB_u\right) ds = B_t - \int_0^t \frac{X_s}{1 - s} \, ds.$$
We have $X_0 = 0$. One can show that $\lim_{t \to 1} X_t = 0$ as well. To motivate this, we calculate (for $t < 1$)
$$E[X_t] = (1 - t)\, E\left[\int_0^t \frac{1}{1 - s} \, dB_s\right] = 0$$
and
$$E\left[X_t^2\right] = (1 - t)^2 E\left[\left(\int_0^t \frac{1}{1 - s} \, dB_s\right)^2\right] = (1 - t)^2 E\left[\int_0^t \frac{1}{(1 - s)^2} \, ds\right] = (1 - t)^2 \left(\frac{1}{1 - t} - 1\right) \to 0 \text{ as } t \to 1.$$
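Note that $(1 - t)^2 \left(\frac{1}{1 - t} - 1\right) = t(1 - t)$, the familiar variance of the Brownian bridge. A simulation sketch of mine for the integral representation (it discretizes $\int_0^t \frac{1}{1-s}\, dB_s$ on a grid):

```python
import numpy as np

rng = np.random.default_rng(6)
t, N, n = 0.5, 500, 20_000
ds = t / N
s = np.arange(N) * ds                               # left endpoints in [0, t)
dB = rng.normal(0.0, np.sqrt(ds), (n, N))           # Brownian increments per path
X_t = (1.0 - t) * np.sum(dB / (1.0 - s), axis=1)    # X_t = (1-t) int_0^t dB_s/(1-s)

print("E[X_t]   ~", X_t.mean(),      "  theory: 0")
print("E[X_t^2] ~", np.mean(X_t**2), "  theory t(1-t) =", t * (1 - t))
```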
7.3 Itô-Processes and Stochastic Differential Equations

Itô-processes: these are stochastic processes of the form
$$X_t = X_0 + \int_0^t \mu_s \, ds + \int_0^t \sigma_s \, dB_s,$$
where $\mu$ and $\sigma$ are adapted stochastic processes (which includes the special case when they are deterministic). We sometimes write this formula shorter in differential notation as
$$dX_t = \mu_t \, dt + \sigma_t \, dB_t.$$
We can calculate the quadratic variation of an Itô-process as
$$[X]_t = [X, X]_t = \int_0^t \sigma_s^2 \, ds,$$
or in differential notation
$$d[X]_t = \sigma_t^2 \, dt.$$
We have an Itô formula for these kinds of processes as follows:
$$f(X_t) - f(X_0) = \int_0^t f'(X_s) \, dX_s + \frac{1}{2} \int_0^t f''(X_s) \, d[X]_s = \int_0^t f'(X_s) \mu_s \, ds + \int_0^t f'(X_s) \sigma_s \, dB_s + \frac{1}{2} \int_0^t f''(X_s) \sigma_s^2 \, ds.$$
Stochastic differential equations (SDEs): Let two functions $\mu(t, x)$ and $\sigma(t, x)$ be given. An SDE is an equation of the form
$$X_t = X_0 + \int_0^t \mu(s, X_s) \, ds + \int_0^t \sigma(s, X_s) \, dB_s,$$
or in differential notation
$$dX_t = \mu(t, X_t) \, dt + \sigma(t, X_t) \, dB_t.$$
One can state certain conditions on the functions $\mu$ and $\sigma$ which guarantee the existence of a unique solution, and there is a vast body of knowledge on efficient numerical algorithms to approximate this solution.
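The simplest such algorithm is the Euler-Maruyama scheme, which freezes $\mu$ and $\sigma$ over each small time step, exactly in the spirit of the elementary integrals of Section 7.1. A minimal sketch (my addition, not part of the notes):

```python
import numpy as np

def euler_maruyama(mu, sigma, x0, T, N, rng):
    """Approximate dX_t = mu(t, X_t) dt + sigma(t, X_t) dB_t on [0, T] in N steps."""
    dt = T / N
    X = np.empty(N + 1)
    X[0] = x0
    for i in range(N):
        dB = rng.normal(0.0, np.sqrt(dt))   # Brownian increment over one step
        X[i + 1] = X[i] + mu(i * dt, X[i]) * dt + sigma(i * dt, X[i]) * dB
    return X

# e.g. the Ornstein-Uhlenbeck SDE dX_t = -X_t dt + dB_t from the examples below
rng = np.random.default_rng(7)
path = euler_maruyama(lambda t, x: -x, lambda t, x: 1.0, 0.0, 5.0, 5_000, rng)
print("X_T =", path[-1])
```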
Examples:

The SDE
$$dX_t = -X_t \, dt + dB_t$$
has as solution the Ornstein-Uhlenbeck process
$$X_t = e^{-t} \left(X_0 + \int_0^t e^{s} \, dB_s\right).$$
The SDE
$$dX_t = \mu X_t \, dt + \sigma X_t \, dB_t$$
has as solution geometric Brownian motion
$$X_t = X_0 \exp\left(\left(\mu - \frac{1}{2} \sigma^2\right) t + \sigma B_t\right).$$
7.4 The Formula of Feynman and Kac

Theorem 7.12 (Feynman-Kac) Consider the stochastic differential equation
$$dX_t = \mu X_t \, dt + \sigma X_t \, dB_t.$$
Fix $T > 0$, and let $0 \le t \le T$. For any function $h$, we define a function $F(t, x)$ by considering the conditional expectation
$$F(t, X_t) = E\left[e^{-r(T - t)} h(X_T) \,\big|\, \mathcal{F}_t\right]. \tag{7.4}$$
Then $F(t, x)$ satisfies the partial differential equation
$$\frac{\partial}{\partial t} F(t, x) + \mu x \frac{\partial}{\partial x} F(t, x) + \frac{1}{2} \sigma^2 x^2 \frac{\partial^2}{\partial x^2} F(t, x) = r F(t, x),$$
with the terminal condition (for all $x$)
$$F(T, x) = h(x).$$
Proof. We have
$$e^{-rt} F(t, X_t) = E\left[e^{-rT} h(X_T) \,\big|\, \mathcal{F}_t\right].$$
By the tower property, and the definition of $F$,
$$E\left[e^{-rt} F(t, X_t) \,\big|\, \mathcal{F}_s\right] = E\left[E\left[e^{-rT} h(X_T) \,\big|\, \mathcal{F}_t\right] \big|\, \mathcal{F}_s\right] = E\left[e^{-rT} h(X_T) \,\big|\, \mathcal{F}_s\right] = e^{-rs} F(s, X_s).$$
It follows that $e^{-rt} F(t, X_t)$ is a martingale. By Itô's formula, the differential of this martingale is
$$d\left(e^{-rt} F(t, X_t)\right) = e^{-rt} \left(-r F + \frac{\partial}{\partial t} F + \mu X_t \frac{\partial}{\partial x} F + \frac{1}{2} \sigma^2 X_t^2 \frac{\partial^2}{\partial x^2} F\right) dt + e^{-rt} \sigma X_t \frac{\partial}{\partial x} F \, dB_t.$$
Because we have a martingale, the $dt$-term must be equal to zero, and the partial differential equation follows.
Remark 7.13 Often the converse of this statement is used: given a solution to the aforementioned PDE, it can be shown under certain conditions that it equals the conditional expectation (7.4).
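In the same spirit, (7.4) also suggests a numerical method: simulate $X_T$ and average. A sketch of mine (the parameters are hypothetical, and $h(x) = x^2$ is chosen because $E[X_T^2] = x^2 e^{(2\mu + \sigma^2)T}$ is available in closed form for comparison):

```python
import numpy as np

rng = np.random.default_rng(8)
mu, sigma, r, T, x0 = 0.05, 0.2, 0.03, 1.0, 1.0   # hypothetical parameters
h = lambda x: x**2

# sample X_T exactly via the geometric Brownian motion solution
BT = rng.normal(0.0, np.sqrt(T), 1_000_000)
XT = x0 * np.exp((mu - 0.5 * sigma**2) * T + sigma * BT)
F0 = np.exp(-r * T) * np.mean(h(XT))              # Monte Carlo estimate of F(0, x0)

closed = np.exp(-r * T) * x0**2 * np.exp((2 * mu + sigma**2) * T)
print("Monte Carlo:", F0, "  closed form:", closed)
```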
8 Continuous-Time Markov Chains
8.1 Jump Rates and Transition Probabilities
Let $(X_t)_{t \ge 0}$ be a collection of random variables taking values in a discrete set $S$.
Example. Health-Sickness Model. The following model is often used in life insurance: people can be in three states, H (healthy), S (sick), or D (dead). If a person is in state H or S, then he/she stays there for an exponentially distributed amount of time with rate $\lambda_H$ resp. $\lambda_S$, then goes to either of the two other states with a certain transition probability. Once you have reached state D, you stay there forever. D is called an absorbing or cemetery state (the latter expression fits well for this example).
Example. The previous example generalizes easily in the following way: Take a discrete-time Markov chain and assign to each state a rate $\lambda_i$. If $X_t$ is in a state $i$ with $\lambda_i = 0$, then $X_t$ stays there forever. If $\lambda_i > 0$, $X_t$ stays at $i$ for an exponentially distributed amount of time with rate $\lambda_i$, then goes to state $j$ with transition probability $p(i, j)$. The lack-of-memory property of the exponential distribution implies that, given the present state, the rest of the past is irrelevant for predicting the future.
Definition 8.1 A continuous-time stochastic process $X = (X_t)$ is called a Markov chain with respect to a given filtration $(\mathcal{F}_t)$ if for any function $h : S \to \mathbb{R}$ (here $S$ denotes the discrete state space) we have
$$E[h(X_{t+s}) \mid \mathcal{F}_t] = E[h(X_{t+s}) \mid X_t]$$
for any $s, t \in \mathbb{R}_+$. If this conditional expectation depends only on the time increment $s$, but not on $t$, then the chain is said to be homogeneous. The transition probabilities are given as $p_t(i, j) = P(X_t = j \mid X_0 = i)$. Let now $P_t$ be the matrix that has the $p_t(i, j)$'s as elements.
Properties of the family of matrices $(P_t)$:
(i) $P_0 = I$ (the identity matrix)
(ii) $P_t$ is a stochastic matrix: rows add up to 1 and elements are all between 0 and 1
(iii) $P_{s+t} = P_s P_t$ for any $s, t$; this is called the Chapman-Kolmogorov equation

These three properties tell us that $(P_t)$ is a stochastic semigroup. Consider now a small time interval of length $h$, and assume that the probability of two or more transitions in this interval is $o(h)$. We now write for $i \ne j$
$$p_h(i, j) = q(i, j)\, h + o(h)$$
for some $q(i, j) \ge 0$, and for $i = j$
$$p_h(i, i) = 1 - \lambda_i h + o(h)$$
with $\lambda_i = \sum_{j \ne i} q(i, j) \ge 0$. We call $q(i, j)$ the jump rate from $i$ to $j$.
Example 8.2 For a Poisson process with intensity $\lambda$, we have that the transition probability to go from $n$ to $n + 1$ is given for all $n \ge 0$ as
$$p_h(n, n+1) = 1 - e^{-\lambda h} + o(h) = \lambda h + o(h),$$
so we get
$$q(n, n+1) = \lambda.$$
We introduce a new matrix $Q$ with entries
$$Q(i, j) = \begin{cases} q(i, j) & \text{if } j \ne i \\ -\lambda_i & \text{if } j = i \end{cases}$$
Note that the off-diagonal elements $q(i, j)$ with $i \ne j$ are nonnegative, while the diagonal entry $-\lambda_i$ is nonpositive and chosen to make the row sum equal to 0. A computation based on the Chapman-Kolmogorov equation now reveals the connection between the matrices $P$ and $Q$ (you can look up the details in the book by Durrett, Ch. 4.2):
Theorem 8.3 We have
$$\frac{d}{dt} P_t = Q P_t \quad \text{(Kolmogorov's backward equation)},$$
$$\frac{d}{dt} P_t = P_t Q \quad \text{(Kolmogorov's forward equation)},$$
subject to $P_0 = I$.
If $Q$ were a number instead of a matrix, we could solve these equations easily: just set $P_t = \exp(Qt)$ and check by differentiating that the equation holds. Inspired by this observation, we define the matrix
$$\exp(Qt) = \sum_{n=0}^{\infty} \frac{(Qt)^n}{n!} = \sum_{n=0}^{\infty} Q^n \frac{t^n}{n!}$$
and check by differentiating (we may interchange summation and differentiation) that
$$\frac{d}{dt} \exp(Qt) = \sum_{n=0}^{\infty} Q^n \frac{n\, t^{n-1}}{n!} = \sum_{n=1}^{\infty} Q^n \frac{t^{n-1}}{(n-1)!} = Q \sum_{n=1}^{\infty} \frac{Q^{n-1} t^{n-1}}{(n-1)!} = Q \exp(Qt).$$
Comparing the backward with the forward equation, we note the remarkable fact that
$$Q P_t = P_t Q.$$
In general, matrices do not commute: $AB \ne BA$; but the explanation is that here $P_t = e^{Qt}$ is made up of powers of $Q$:
$$Q \exp(Qt) = \sum_{n=0}^{\infty} Q \frac{(Qt)^n}{n!} = \sum_{n=0}^{\infty} \frac{(Qt)^n}{n!}\, Q = \exp(Qt)\, Q.$$
Let us now write the backward equation more explicitly as
$$p_t'(i, j) = \sum_{k \ne i} q(i, k)\, p_t(k, j) - \lambda_i\, p_t(i, j). \tag{8.1}$$
Example 8.4 (Poisson process) Let $N_t$ be the number of arrivals up to time $t$ in a Poisson process with rate $\lambda$. In order to go from $i$ arrivals at time $s$ to $j$ arrivals at time $t + s$ we must have $j \ge i$ and have exactly $j - i$ arrivals in $t$ units of time, so
$$p_t(i, j) = e^{-\lambda t} \frac{(\lambda t)^{j-i}}{(j - i)!}.$$
The backward equation as in (8.1) tells us that in our situation
$$p_t'(i, j) = \lambda\, p_t(i + 1, j) - \lambda\, p_t(i, j).$$
To check this we differentiate the formula for $p_t(i, j)$ as above to get for $j > i$
$$-\lambda e^{-\lambda t} \frac{(\lambda t)^{j-i}}{(j - i)!} + \lambda e^{-\lambda t} \frac{(\lambda t)^{j-i-1}}{(j - i - 1)!} = -\lambda\, p_t(i, j) + \lambda\, p_t(i + 1, j).$$
When $j = i$, $p_t(i, i) = e^{-\lambda t}$, so the derivative is
$$-\lambda e^{-\lambda t} = -\lambda\, p_t(i, i) = -\lambda\, p_t(i, i) + \lambda\, p_t(i + 1, i)$$
since $p_t(i + 1, i) = 0$.
8.2 Limit Behaviour

In discrete time a stationary distribution is a solution of $\pi^T p = \pi^T$. In continuous time we need the following condition:

Definition 8.5 $\pi$ is said to be a stationary distribution if $\pi^T P_t = \pi^T$ for all $t > 0$.
This is difficult to check since it involves all of the $P_t$, which in addition are usually not that easy to compute. It would be better to have a condition which only involves the single matrix $Q$, which we usually obtain easily from the problem under consideration.
Theorem 8.6 $\pi$ is a stationary distribution if and only if $\pi^T Q = 0$.

Proof. We just prove the 'if' part. Assuming $\pi^T Q = 0$, we have, since $P_t = \exp(Qt)$,
$$\pi^T P_t = \pi^T \exp(Qt) = \pi^T \sum_{n=0}^{\infty} Q^n \frac{t^n}{n!} = \pi^T \left(I + \sum_{n=1}^{\infty} Q^n \frac{t^n}{n!}\right) = \pi^T + \left(\pi^T Q\right) \sum_{n=1}^{\infty} Q^{n-1} \frac{t^n}{n!} = \pi^T.$$
We now want to relate the concept of a stationary distribution to the limiting behaviour of a Markov chain.

Definition 8.7 A chain is called irreducible if for any two states $i$ and $j$ it is possible to get from $i$ to $j$ in a finite number of steps.

Theorem 8.8 If a continuous-time Markov chain is irreducible and has a stationary distribution $\pi$, then
$$\lim_{t \to \infty} p_t(i, j) = \pi(j).$$
Example. Weather chain. The weather can be Sunny, Cloudy, or Rainy. It stays sunny for an exponentially distributed number of days with mean 3, then becomes cloudy. It stays cloudy for an exponentially distributed time with mean 4, then rain comes. The rain lasts for an exponentially distributed time with mean 1 until sunshine returns. This verbal description translates into the following $Q$-matrix:

        S      C      R
  S   -1/3   1/3     0
  C    0    -1/4    1/4
  R    1     0     -1

The relation $\pi^T Q = 0$ now leads to three equations, which we solve to get $\pi(S) = 3/8$, $\pi(C) = 4/8$, $\pi(R) = 1/8$.
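This is easy to verify numerically (a sketch of mine, using SciPy's matrix exponential for $P_t = \exp(Qt)$): the rows of $P_t$ converge to $\pi$, as Theorem 8.8 predicts.

```python
import numpy as np
from scipy.linalg import expm

# Q-matrix of the weather chain, states ordered S, C, R
Q = np.array([[-1/3,  1/3,  0.0],
              [ 0.0, -1/4,  1/4],
              [ 1.0,  0.0, -1.0]])

pi = np.array([3/8, 4/8, 1/8])
print("pi^T Q =", pi @ Q)        # should be numerically zero

P = expm(Q * 50.0)               # P_t = exp(Qt) for large t
print(P)                         # every row is close to pi
```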
Definition 8.9 A birth and death chain has state space $S = \{0, 1, \dots, N\}$ and jump rates
$$q(n, n+1) = \lambda_n \quad \text{for } 0 \le n < N, \qquad q(n, n-1) = \mu_n \quad \text{for } 0 < n \le N$$
(and $= 0$ otherwise).

For a birth and death chain we can give an easy sufficient condition for $\pi$ to be a stationary distribution:
$$\pi(k)\, q(k, j) = \pi(j)\, q(j, k) \quad \text{for all } j, k.$$
If this detailed balance condition holds, then $\pi$ is a stationary distribution.
This follows by summing over all $k \ne j$ and recalling the definition of $\lambda_j$,
$$\sum_{k \ne j} \pi(k)\, q(k, j) = \pi(j) \sum_{k \ne j} q(j, k) = \pi(j)\, \lambda_j,$$
since then
$$\left(\pi^T Q\right)_j = \sum_{k \ne j} \pi(k)\, q(k, j) - \pi(j)\, \lambda_j = 0.$$
Note: the detailed balance condition is not a necessary condition for $\pi$ being stationary. Indeed, in the weather example it is not satisfied. However, it always holds for birth and death chains.
Example. Barbershop. A barber can cut hair at rate 3, where the units are people per hour, i.e. each haircut requires an exponentially distributed amount of time with mean 20 minutes. Suppose customers arrive at the times of a Poisson process with rate 2, but will leave if both chairs in the waiting room are full. What fraction of time will both chairs be full? In the long run, how many customers does the barber serve per hour?

Solution: we define our state to be the number of customers in the system, so $S = \{0, 1, 2, 3\}$. From the problem description it is clear that
$$q(i, i-1) = 3 \quad \text{for } i = 1, 2, 3, \qquad q(i, i+1) = 2 \quad \text{for } i = 0, 1, 2.$$
The detailed balance conditions say
$$2\pi(0) = 3\pi(1), \quad 2\pi(1) = 3\pi(2), \quad 2\pi(2) = 3\pi(3).$$
This together with $\pi(0) + \pi(1) + \pi(2) + \pi(3) = 1$ yields
$$\pi(0) = 27/65, \quad \pi(1) = 18/65, \quad \pi(2) = 12/65, \quad \pi(3) = 8/65.$$
From this we see that $8/65$ of the time both chairs are full, so that fraction of the arrivals is lost, and hence $57/65$, or about $87.7\%$, of the customers enter service. Since the original arrival rate is 2, this means that the barber serves an average of $114/65 \approx 1.754$ customers per hour.
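The same numbers drop out of a few lines of code (my addition), since detailed balance gives $\pi(n+1) = \tfrac{2}{3}\pi(n)$:

```python
import numpy as np

lam, mu = 2.0, 3.0                   # arrival and service rates
w = (lam / mu) ** np.arange(4)       # unnormalized weights from detailed balance
pi = w / w.sum()                     # stationary distribution on S = {0, 1, 2, 3}

print("pi =", pi)                                    # [27, 18, 12, 8] / 65
print("fraction of time both chairs full:", pi[3])
print("customers served per hour:", lam * (1 - pi[3]))
```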
Example. Queuing chain. A bank has one teller to serve customers, who need an exponential amount of service with rate $\mu$ and queue in a line if the server is busy. Customers arrive at the times of a Poisson process with rate $\lambda$, but only join the queue with probability $a_n$ if there are $n$ customers in line. We have
$$q(n, n+1) = \lambda a_n, \qquad q(n, n-1) = \mu.$$
The next result gives a natural condition to prevent the queue length from growing out of control.
Theorem 8.10 If $a_n \to 0$ as $n \to \infty$, then there is a stationary distribution.

Proof. It follows from the detailed balance conditions that
$$\pi(n+1) = \frac{\lambda a_n}{\mu}\, \pi(n).$$
If $N$ is large enough and $n \ge N$, then $\lambda a_n / \mu \le 1/2$ and it follows that
$$\pi(n+1) \le \frac{1}{2}\, \pi(n) \le \dots \le \left(\frac{1}{2}\right)^{n-N} \pi(N).$$
This implies that $\sum_n \pi(n) < \infty$, so we can pick $\pi(0)$ to make the sum $= 1$.
If we now take $a_n = 1/(n+1)$, then we get from the detailed balance conditions
$$\pi(n) = \frac{\lambda}{\mu n}\, \pi(n-1) = \frac{\lambda^2}{\mu^2 n (n-1)}\, \pi(n-2) = \dots = \frac{(\lambda/\mu)^n}{n!}\, \pi(0).$$
To find the stationary distribution we want to find $c = \pi(0)$ such that
$$c \sum_{n=0}^{\infty} \frac{(\lambda/\mu)^n}{n!} = 1.$$
As the sum is the series for the exponential function, we get $c = \exp(-\lambda/\mu)$ and the stationary distribution is Poisson.