G14CST/MATH4007 — Computational Statistics
Chapter II - Simulation
2 Simulation
By simulation, we mean the use of a model to produce artificial
‘results’
rather than obtaining them from a physical experiment. When should we
use stochastic models as opposed to deterministic ones? A lack of knowledge
about a deterministic system is often accounted for by adding some random-
ness to the models. Whenever we have a model, we can simulate from the
model to learn about it. Note however, that we learn about the model, not
the physical system!
In order to simulate from a stochastic model, we need a source of random-
ness. Most simulation methods require a sequence of uniformly distributed
random variables as the starting point from which more complicated simula-
tions can be derived. How could we generate U [0, 1] random variables? The
approach is to generate a sequence of pseudo-random numbers, which are
generated deterministically, but appear indistinguishable from random when
statistical tests are applied.
Definition: A sequence of pseudo-random numbers {ui} is a determin-
istic sequence of numbers in [0, 1] having the same statistical properties as a
similar sequence of random numbers.
Note that the sequence {ui} is reproducible provided the first term in the
sequence (called the seed) is known. A good sequence would be
“unpredictable to the uninitiated”. That is, if we were only shown the se-
quence produced, and didn’t know the deterministic formula used to produce
it, we would not be able to predict its future behaviour, and the sequence
would appear to be completely random.
2.1 Congruential generators
The following sequence
0.155109197, 0.701655716, 0.813951522, 0.568807691, 0.087282449,
was generated by starting with the number N0 = 78953536 then calculating
N1 = (aN0) mod M , N2 = (aN1) mod M , N3 = (aN2) mod M , and then
setting U1 = N1/M , U2 = N2/M , U3 = N3/M , etc.
The two constants are
a = 2^16 + 3 = 65539, M = 2^31 = 2147483648.
N1 = remainder when 65539 × 78953536 is divided by 2147483648. Then
U1 = N1/2147483648.
The general form of a congruential generator is
Ni = (aNi−1 + c) mod M,
Ui = Ni/M, where integers a, c ∈ [0,M − 1]
If c = 0, it is called a multiplicative congruential generator (otherwise, mixed).
N0 is called the seed.
The numbers produced are restricted to the M possible values
0, 1/M, 2/M, . . . , (M − 1)/M.
Clearly, they are rational numbers, but if M is large they will practically
cover the reals in [0, 1]. As soon as some Ni repeats, say, Ni = Ni+T , then
the whole subsequence repeats, i.e. Ni+t = Ni+T+t, t = 1, 2, . . .. The least
such T is called the period.
A good generator will have a long period. The period cannot be longer
than M and also depends on a and c.
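To make this concrete, here is a minimal sketch of a congruential generator in Python; the function name lcg is illustrative, and the defaults use the multiplicative constants a = 65539 and M = 2^31 quoted above. Given the seed, the whole sequence is reproducible.

```python
def lcg(seed, n, a=65539, c=0, M=2**31):
    """Generate n pseudo-random numbers in [0, 1) with a congruential generator:
    N_i = (a*N_{i-1} + c) mod M,  U_i = N_i / M."""
    N = seed
    us = []
    for _ in range(n):
        N = (a * N + c) % M
        us.append(N / M)
    return us

# Fully reproducible given the seed (seed value taken from the example above).
print(lcg(seed=78953536, n=5))
```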
2.2 Generation from non-U(0, 1) distributions
Now suppose we have a sequence U1, U2, U3, . . . of independent uniform ran-
dom numbers in [0, 1], and we want to generate X1, X2, . . . distributed inde-
pendently and identically from some other specified distribution. That is, we
wish to transform the U1, U2, . . . sequence into a X1, X2, . . . sequence. There
are often many ways of doing this. A good, efficient algorithm should be fast
because millions of random numbers from the desired distribution may be
required.
2.2.1 The inversion method
Let X be any continuous random variable and define Y = FX(X), where FX
is the distribution function of X : FX(x) = P (X ≤ x).
Claim: Y ∼ U [0, 1].
Proof: Y ∈ [0, 1] and the distribution function of Y is
FY(y) = P(Y ≤ y) = P(FX(X) ≤ y) = P(X ≤ FX^{−1}(y)) = FX(FX^{−1}(y)) = y,
which is the distribution function of a uniform random variable on [0, 1].
So, whatever the distribution of X , Y = FX(X) is uniformly distributed
on [0, 1].
The inversion method turns this backwards. Let U = FX(X), so that X = FX^{−1}(U). To generate from FX(·), take a single uniform variable U and calculate FX^{−1}(U); then X will have the required distribution:
P(X ≤ x) = P(FX^{−1}(U) ≤ x) = P(U ≤ FX(x)) = FX(x).
Example: exponential distribution
Let X have the exponential distribution with parameter λ (mean 1/λ), i.e.
f(x) = λe^{−λx} (x > 0),
F(x) = ∫_0^x λe^{−λz} dz = [−e^{−λz}]_0^x = 1 − e^{−λx}.
Set U = 1 − e^{−λX} and solve for X:
X = −(1/λ) ln(1 − U).
Note that 1 − U is uniformly distributed on [0, 1], so we might just as well use
X = −(1/λ) lnU.
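A minimal sketch of this inversion step in Python with NumPy (the function name rexp_inversion is illustrative):

```python
import numpy as np

def rexp_inversion(n, lam, rng=None):
    """Sample n Exp(lam) variates by inversion: X = -ln(U)/lam."""
    rng = rng or np.random.default_rng(0)
    u = rng.uniform(size=n)          # U ~ U(0,1); U and 1-U have the same distribution
    return -np.log(u) / lam

x = rexp_inversion(10_000, lam=2.0)
print(x.mean())                      # should be close to 1/lam = 0.5
```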
2.2.2 Discrete distributions
The inversion method works for discrete random variables in the following
sense. Let X be discretely distributed with k distinct possible values xi having probabilities pi, i = 1, . . . , k. So
P(X = xi) = pi,   Σ_{i=1}^{k} pi = 1.
Then FX(x) = Σ_{xi ≤ x} pi is a step function.
Inversion gives X = xi if Σ_{xj < xi} pj < U ≤ Σ_{xj ≤ xi} pj, which selects each possible value of X with the correct probability.
Think of this as splitting [0, 1] into intervals of length pi. The interval in
which U falls is the value of X , which occurs with probability pi as required.
Example
Let X ∼ Bi(4, 0.3). The probabilities are
P (X = 0) = 0.2401, P (X = 1) = 0.4116, P (X = 2) = 0.2646
P (X = 3) = 0.0756, P (X = 4) = 0.0081.
The algorithm says X = 0 if 0 ≤ U ≤ 0.2401,
X = 1 if 0.2401 < U ≤ 0.6517,
X = 2 if 0.6517 < U ≤ 0.9163,
X = 3 if 0.9163 < U ≤ 0.9919,
X = 4 if 0.9919 < U ≤ 1.
One way to perform the algorithm is this: let U ∼ U(0, 1).
1. Test U ≤ 0.2401. If true, return X = 0.
2. If false, test U ≤ 0.6517. If true, return X = 1.
3. If false, test U ≤ 0.9163. If true, return X = 2.
4. If false, test U ≤ 0.9919. If true, return X = 3.
5. If false, return X = 4.
Consider the speed of this. The expected number of steps (which roughly
equates to speed) is
1× 0.2401 + 2× 0.4116 + 3× 0.2646 + 4× 0.0756 + 4× 0.0081 = 2.1919.
To speed things up we can rearrange the order so that the later steps are
less likely.
1. Test U ≤ 0.4116. If true return X = 1.
2. If false, test U ≤ 0.6762. If true return X = 2.
3. If false, test U ≤ 0.9163. If true return X = 0.
4. and 5. as before.
The expected number of steps is 1 × 0.4116 + 2 × 0.2646 + 3 × 0.2401 + 4 × (0.0756 + 0.0081) = 1.9959: roughly a 10% speed increase.
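The sequential search above translates directly into code. Here is a sketch in Python with NumPy (names illustrative), using the reordered probabilities so that the most likely values are tested first:

```python
import numpy as np

# Bi(4, 0.3) probabilities for x = 1, 2, 0, 3, 4, ordered by decreasing probability
# to reduce the expected number of comparisons.
values = [1, 2, 0, 3, 4]
probs  = [0.4116, 0.2646, 0.2401, 0.0756, 0.0081]
cum = np.cumsum(probs)               # 0.4116, 0.6762, 0.9163, 0.9919, 1.0

def rbinom_inversion(rng):
    """Return one draw from Bi(4, 0.3) by sequential search over the cdf."""
    u = rng.uniform()
    for x, c in zip(values, cum):
        if u <= c:
            return x
    return values[-1]                # guard against floating-point round-off

rng = np.random.default_rng(1)
sample = [rbinom_inversion(rng) for _ in range(10_000)]
print(np.bincount(sample, minlength=5) / len(sample))  # ≈ [0.2401, 0.4116, 0.2646, 0.0756, 0.0081]
```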
2.2.3 Transformations
Here are a few useful relationships which can be used to transform simulated
random variables between related distributions:
(a) If U ∼ U(0, 1) set V = (b− a)U + a then V ∼ U(a, b) where a < b.
(b) If Yi are iid exponential with parameter λ, then
X = Σ_{i=1}^{n} Yi = −(1/λ) Σ_{i=1}^{n} logUi = −(1/λ) log(Π_{i=1}^{n} Ui)
has a Ga(n, λ) distribution.
(c) If X1 ∼ Ga(p, 1), X2 ∼ Ga(q, 1), X1 and X2 independent then Y =
X1/(X1 +X2) ∼ Be(p, q).
(d) Composition: if
f(x) = Σ_{i=1}^{r} pi fi(x),
where Σ pi = 1 and each fi is a density, then we can sample from f by first sampling an index j from the set {1, . . . , r} with corresponding probability distribution p = {p1, . . . , pr}, and then taking a sample from fj.
f(x) is a mixture density (we are “mixing” (averaging) the component
densities with weights given by the probability distribution of the in-
dex).
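A sketch of the composition method in Python with NumPy; the helper rmixture and the two-component normal mixture used to illustrate it are assumptions for the example, not from the notes:

```python
import numpy as np

def rmixture(n, weights, samplers, rng=None):
    """Sample n values from the mixture sum_i p_i f_i(x) by composition:
    first pick a component index with probabilities p, then sample from it."""
    rng = rng or np.random.default_rng(0)
    weights = np.asarray(weights, dtype=float)
    idx = rng.choice(len(weights), size=n, p=weights)   # sample the index j
    return np.array([samplers[j](rng) for j in idx])    # then sample from f_j

# Illustration: a 50/50 mixture of N(-2, 1) and N(3, 0.5^2).
x = rmixture(5,
             weights=[0.5, 0.5],
             samplers=[lambda r: r.normal(-2.0, 1.0),
                       lambda r: r.normal(3.0, 0.5)])
print(x)
```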
2.2.4 The Box–Müller algorithm for the normal distribution
We cannot generate a normal random variable by inversion, because FX is
not known in closed form (nor is its inverse). The Box–Müller method (1958)
can be used instead. Take U1, U2 ∼ U(0, 1) independently. Calculate
X1 = √(−2 lnU1) cos(2πU2),
X2 = √(−2 lnU1) sin(2πU2).
Then X1 and X2 are independent N(0, 1) variables. You will be invited to
prove this on the exercise sheet.
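A minimal sketch of the Box–Müller transform in Python with NumPy (function name illustrative):

```python
import numpy as np

def box_muller(n, rng=None):
    """Return 2n independent N(0,1) variates via the Box-Muller transform."""
    rng = rng or np.random.default_rng(0)
    u1 = rng.uniform(size=n)
    u2 = rng.uniform(size=n)
    r = np.sqrt(-2.0 * np.log(u1))
    x1 = r * np.cos(2.0 * np.pi * u2)
    x2 = r * np.sin(2.0 * np.pi * u2)
    return np.concatenate([x1, x2])

z = box_muller(5000)
print(z.mean(), z.var())   # should be close to 0 and 1
```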
2.3 Rejection Algorithm
Fundamental Theorem of Simulation:
Simulating
X ∼ f(x)
is equivalent to simulating
(X,U) ∼ U{(x, u) : 0 < u < f(x)}.
Note that fX,U(x, u) = 1 on this set (the area of the set is ∫ f(x) dx = 1), so that
∫_0^{f(x)} fX,U(x, u) du = ∫_0^{f(x)} du = f(x),
and hence f is the marginal density of the joint distribution (X,U) ∼ U{(x, u) :
0 < u < f(x)}.
The problem with this result is that simulating uniformly from the set
{(x, u) : 0 < u < f(x)}
may not be possible. A solution is to simulate the pair (X,U) in a bigger
set, where simulation is easier, and then take the pair if the constraint is
satisfied.
[Figure: a density f(x) plotted against x on [0, 1].]
Suppose that f(x) is zero outside the interval [a, b] (so that ∫_a^b f(x) dx = 1) and that f is bounded above by m. We can then simulate the pair
(Y, U) ∼ U{[a, b] × [0, m]}
(by simulating Y ∼ U[a, b] and U ∼ U[0, m] independently). We then accept the pair (Y, U) if 0 < U < f(Y), in which case we return X = Y as our simulated value from the distribution of X.
This results in the correct distribution for the simulated value X, since
P(X ≤ x) = P(Y ≤ x | U < f(Y))
= P(Y ≤ x, U < f(Y)) / P(U < f(Y))
= [∫_a^x ∫_0^{f(y)} f(y, u) du dy] / [∫_a^b ∫_0^{f(y)} f(y, u) du dy]
= ∫_a^x f(y) dy.
Note: we can use the rejection algorithm even if we only know f up to a normalising constant (as is often the case in Bayesian statistics, for example). To see this, suppose f(x) = f1(x)/c, where c is the normalising constant of f (which may be unknown, or very difficult to compute). Then, if we use f1(x) instead of f(x) in the algorithm, the value of m will change (it is the bound on f1(x), not f(x)), but the accepted values will still be from the correct distribution; this is because the acceptance probability of a candidate value Y is the same in both cases.
Example: Sampling from a beta distribution
Consider sampling from X ∼ Beta(α, β) for α, β > 1 which has pdf
f(x) = [Γ(α + β)/(Γ(α)Γ(β))] x^{α−1}(1 − x)^{β−1},   0 < x < 1.
We note
f(x) ∝ f1(x) = x^{α−1}(1 − x)^{β−1},   0 < x < 1,
and that M = sup_{0<x<1} x^{α−1}(1 − x)^{β−1} occurs at the mode x = (α − 1)/(α + β − 2), and hence
M = (α − 1)^{α−1}(β − 1)^{β−1} / (α + β − 2)^{α+β−2}.
The rejection algorithm is
1. Generate Y ∼ U(0, 1) and U ∼ U(0, M).
2. If U ≤ f1(Y) = Y^{α−1}(1 − Y)^{β−1} then let X = Y (accept), else go to 1 (reject).
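A sketch of this rectangle-based rejection sampler for the Beta(α, β) density (α, β > 1) in Python with NumPy; the function name and the sanity check are illustrative:

```python
import numpy as np

def rbeta_rejection(n, a, b, rng=None):
    """Sample from Beta(a, b), a, b > 1, by rejection in the box [0,1] x [0,M],
    using the unnormalised density f1(x) = x^(a-1) * (1-x)^(b-1)."""
    rng = rng or np.random.default_rng(0)
    mode = (a - 1) / (a + b - 2)
    M = mode**(a - 1) * (1 - mode)**(b - 1)     # sup of f1 on (0,1)
    out = []
    while len(out) < n:
        y = rng.uniform()                        # Y ~ U(0,1)
        u = rng.uniform(0.0, M)                  # U ~ U(0,M)
        if u <= y**(a - 1) * (1 - y)**(b - 1):   # accept if the point lies under f1
            out.append(y)
    return np.array(out)

x = rbeta_rejection(10_000, a=2.0, b=5.0)
print(x.mean())   # should be close to a/(a+b) = 2/7
```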
2.3.1 Generalising the Rejection Idea
If the support of f is not finite, then bounding it within a rectangle will not work. Instead of using a box to bound the density f(x) (i.e. requiring f(x) < m for some constant m) we can use a function m(x) such that f(x) ≤ m(x) for all x.
Suppose the larger bounding set is
L = {(y, u) : 0 < u < m(y)}.
Then all we require is that simulation of a uniform from L is feasible. Note
1. The closer m is to f, the more efficient our algorithm.
2. Because m(x) ≥ f(x), m is usually not a probability density. We write m(x) = Mg(x), where
∫ m(x) dx = ∫ Mg(x) dx = M
for some density g.
This suggests a more general implementation of the fundamental theorem:
Let X ∼ f(x) and let g(x) be a density function that satisfies f(x) ≤ Mg(x) for some constant M ≥ 1. Then, to simulate X ∼ f, it is sufficient to generate
Y ∼ g and U | Y = y ∼ U(0, Mg(y))
and set X = Y if U < f(Y).
Again, this results in the correct distribution for the simulated value X .
(A proof is given below.)
However, when implementing the algorithm in practice, it is more conve-
nient to rescale the uniform distribution such that the support of U doesn’t
depend on the particular value y sampled. Hence, the rejection algorithm is
usually stated in a slightly modified form:
Rejection Algorithm
If g is such that f/g is bounded, and therefore we can find a value M such that Mg(x) ≥ f(x) for all x, then we can simulate from f as follows:
1. Generate Y from density g, and U from U(0, 1).
2. If U < f(Y)/(Mg(Y)), set X = Y. Otherwise, return to step 1.
We keep sampling new pairs Y and U until the condition is satisfied.
Proof: validity of the general rejection algorithm.
Let X be a random variable denoting an accepted sample - i.e. a sample Y
from g which met the acceptance criterion. We need to show that the cdf of
X is the one we want.
P(X ≤ x) = P(Y ≤ x | U < f(Y)/(Mg(Y)))
= P(Y ≤ x, U < f(Y)/(Mg(Y))) / P(U < f(Y)/(Mg(Y)))
= [∫_{−∞}^{x} ∫_0^{f(y)/(Mg(y))} g(y) du dy] / [∫_{−∞}^{∞} ∫_0^{f(y)/(Mg(y))} g(y) du dy]
= [∫_{−∞}^{x} (f(y)/(Mg(y))) g(y) dy] / [∫_{−∞}^{∞} (f(y)/(Mg(y))) g(y) dy]
= [(1/M) ∫_{−∞}^{x} f(y) dy] / [(1/M) ∫_{−∞}^{∞} f(y) dy]
= ∫_{−∞}^{x} f(y) dy,
which is the definition of the cdf of the random variable with density f that
we initially wanted to simulate from.
Example: Sampling from a beta distribution revisited
Use rejection to sample from X ∼ Beta(α, β). Let g(y) = αy^{α−1}, 0 < y < 1; then
f1(x)/g(x) = (1 − x)^{β−1}/α
is bounded if and only if β ≥ 1.
Then M = sup_x {f1(x)/g(x)} = 1/α, which occurs at x = 0.
1. Simulate Y with pdf g(y) = αy^{α−1}, 0 < y < 1, and U ∼ U(0, 1).
2. If U < f1(Y)/(Mg(Y)) = [Y^{α−1}(1 − Y)^{β−1}] / [(1/α) · αY^{α−1}] = (1 − Y)^{β−1}, then set X = Y, else go to 1.
The full algorithm is:
1. Generate U ∼ U(0, 1) and Z ∼ U(0, 1). Let Y = Z^{1/α} (this gives Y ∼ g by inversion, since G(y) = y^α).
2. If U ≤ (1 − Y)^{β−1} then set X = Y, else go to 1.
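A sketch of this version of the algorithm in Python with NumPy (names illustrative):

```python
import numpy as np

def rbeta_rejection2(n, a, b, rng=None):
    """Sample Beta(a, b), b >= 1, by rejection from the proposal g(y) = a*y^(a-1):
    Y = Z^(1/a) with Z ~ U(0,1), accepted when U <= (1-Y)^(b-1)."""
    rng = rng or np.random.default_rng(0)
    out = []
    while len(out) < n:
        y = rng.uniform() ** (1.0 / a)      # Y ~ g by inversion (G(y) = y^a)
        u = rng.uniform()
        if u <= (1.0 - y) ** (b - 1.0):     # acceptance test from the text
            out.append(y)
    return np.array(out)

x = rbeta_rejection2(10_000, a=2.0, b=5.0)
print(x.mean())   # again close to 2/7
```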
2.3.2 Efficiency of the rejection method
The probability of accepting a particular given proposal y is
P(U ≤ f(Y)/(Mg(Y)) | Y = y) = f(y)/(Mg(y)).
To compute the probability of accepting Y from a randomly chosen pair (U, Y), we need to integrate the joint density of U and Y over the region {−∞ < y < ∞, 0 < u < f(y)/(Mg(y))}, i.e. over all possible values of Y.
Prob(Accept) = P(U ≤ f(Y)/(Mg(Y)))
= ∫_{−∞}^{∞} ∫_0^{f(y)/(Mg(y))} 1 × g(y) du dy   (by independence of U and Y, and the density of U is 1)
= ∫_{−∞}^{∞} (f(y)/(Mg(y))) g(y) dy
= (1/M) ∫_{−∞}^{∞} f(y) dy
= 1/M,
since f(y) is a pdf and integrates to 1.
Hence, the number of tries until we accept Y is a geometric random
variable with expectation M . For maximum efficiency, we want M as small
as possible, i.e. sup f(x)/g(x) as small as possible. This means finding a g
that
(a) we can sample from efficiently, and
(b) mimics f as closely as possible in shape, so that the “envelope” function Mg(x) is a good match to f. If so, then the acceptance probability f(y)/(Mg(y)) will be close to 1.
There are many good generators based on rejection from a well-chosen g.
Note that the M here must include all normalising constants. I.e., to use the algorithm, we only need to be able to evaluate f and g up to proportionality, but to calculate the efficiency we need to calculate M = sup_x {f(x)/g(x)}, where f and g are the full, normalised densities. This can be problematic for distributions with awkward normalising constants.
Rejection Example: von Mises distribution
Let θ have the von Mises distribution with pdf
f(θ) = exp(k cos θ) / (2πI(k)),   0 ≤ θ < 2π   (k ≥ 0),
where I(k) is the normalising constant. The von Mises distribution is an important distribution in directional statistics, used for modelling angular data (data on a circle). It plays an analogous role to the normal distribution on the line. Here, k is a “concentration parameter”, so 1/k is a measure of dispersion.
Let f1(θ) = (1/(2π)) exp(k cos θ), 0 ≤ θ < 2π. Note f1(θ) ∝ f(θ). Take Y ∼ U(0, 2π), so that g(y) = 1/(2π), 0 ≤ y < 2π.
Then
M = sup_θ {f1(θ)/g(θ)} = sup_θ {exp(k cos θ)} = exp(k).
Let U ∼ U(0, 1). Then if
U < f1(Y)/(Mg(Y)) = [exp(k cosY)/(2π)] / [(1/(2π)) exp(k)] = exp(k(cosY − 1)),
we accept θ = Y; otherwise we reject.
Figure 1: The solid lines show the von Mises density for low concentration
(left plot) and higher concentration (right plot). The dashed lines show
the envelope function M/2π, where 1/2π is the uniform proposal density.
When the concentration is low, the uniform density has a similar shape to the
von Mises, so all proposals have a high chance of being accepted. For high
concentration, the von Mises density is concentrated around its mode, so the
uniform proposals will include lots of values away from this mode with very
little chance of acceptance.
If we use the full density f(θ), then M = exp(k)/I(k) is the expected number of attempts per acceptance, and it is an increasing function of k: as k increases, the method becomes less efficient. This can be seen in Figure 1.
For low k, the von Mises distribution is close to uniform, and hence a uniform
proposal density is a good match. For higher concentrations, the distribution
becomes concentrated around the mean of 0 and the uniform proposals are
very inefficient: lots of proposals are generated where the von Mises density is low, and hence have little chance of being accepted because the acceptance probability f(θ)/(Mg(θ)) will be close to 0. Note that in the plot, the apparent
bimodality is artificial, since we have “cut” the circle at θ = 0 (the mode) to
plot the density. Values close to 2π are really part of the same mode, and
the distribution is really unimodal. There is an “anti-mode” (minimum of
the density) at θ = π, where cos(θ) = −1.
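A sketch of the von Mises rejection sampler with a uniform proposal, in Python with NumPy (names illustrative):

```python
import numpy as np

def rvonmises(n, k, rng=None):
    """Sample from the von Mises(k) density exp(k*cos(theta)) / (2*pi*I(k))
    by rejection from a U(0, 2*pi) proposal; acceptance prob = exp(k*(cos(y)-1))."""
    rng = rng or np.random.default_rng(0)
    out = []
    while len(out) < n:
        y = rng.uniform(0.0, 2.0 * np.pi)          # proposal Y ~ U(0, 2*pi)
        u = rng.uniform()
        if u < np.exp(k * (np.cos(y) - 1.0)):      # accept
            out.append(y)
    return np.array(out)

theta = rvonmises(5000, k=2.0)
print(np.cos(theta).mean())   # concentrated around the mode at 0 for larger k
```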
2.3.3 Improving efficiency – ‘Squeezing’
Another factor in the speed of rejection algorithms is the need to calculate
f(y)/g(y) every time when in fact f is often a rather complex function and
therefore expensive or difficult to calculate.
Suppose, however, we can find two more functions h1(x) and h2(x), which are quick to calculate and such that
h1(x) ≤ f(x)/(Mg(x)) ≤ h2(x) for all x.
Quick acceptance if U < h1(Y ).
Quick rejection if U ≥ h2(Y ). This is known as ‘squeezing’.
The squeeze rejection algorithm:
1. Generate Y from g and U from U(0, 1).
2. If U < h1(Y ), return X = Y .
3. Otherwise, if U ≥ h2(Y ), reject Y and go to step 1.
4. Otherwise, if U < f(Y)/(Mg(Y)), return X = Y.
5. Otherwise, go to step 1.
If the squeeze functions h1 and h2 are well chosen, we will rarely actually
need to calculate f/g. If g is well chosen in the first place, f(x)/Mg(x) will
be close to one for most x, and so the upper squeeze function h2 will make
only a minor improvement, but h1 can make a big difference.
Example
Suppose we wish to generate X ∼ N(0, 1) using rejection from a Cauchy
distribution with pdf
g(y) = 1 / (π(1 + y²)),   −∞ < y < ∞.
Let
M = sup_y {f(y)/g(y)} = sup_y { [(1/√(2π)) exp(−y²/2)] / [1/(π(1 + y²))] } = √(2π/e),
which occurs at y = ±1. The acceptance condition is
U ≤ f(Y)/(Mg(Y)) = (√e/2)(1 + Y²) e^{−Y²/2}.
The expression on the right-hand side may be considered complicated/expensive to compute. Now e^{−Y²/2} ≥ 1 − Y²/2, and thus the lower squeeze test is
U < (√e/2)(1 + Y²)(1 − Y²/2).
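A sketch of this normal-from-Cauchy rejection sampler in Python with NumPy, using the lower squeeze for quick acceptance before falling back to the full test (names illustrative; the Cauchy proposal is generated by inversion, Y = tan(π(U − 1/2))):

```python
import numpy as np

SQRT_E_OVER_2 = np.exp(0.5) / 2.0      # sqrt(e)/2, from M = sqrt(2*pi/e)

def rnorm_cauchy(n, rng=None):
    """Sample N(0,1) by rejection from a Cauchy proposal, with the lower
    squeeze test U < (sqrt(e)/2)*(1+Y^2)*(1 - Y^2/2) tried first."""
    rng = rng or np.random.default_rng(0)
    out = []
    while len(out) < n:
        y = np.tan(np.pi * (rng.uniform() - 0.5))      # Cauchy proposal via inversion
        u = rng.uniform()
        ratio_bound = SQRT_E_OVER_2 * (1.0 + y * y)
        if u < ratio_bound * (1.0 - 0.5 * y * y):      # cheap lower squeeze: accept
            out.append(y)
        elif u <= ratio_bound * np.exp(-0.5 * y * y):  # full (more expensive) test
            out.append(y)
    return np.array(out)

z = rnorm_cauchy(10_000)
print(z.mean(), z.std())   # close to 0 and 1
```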
Example
Consider sampling from an exponential on (0, 2), such that the pdf of X is
f(x) = e^{−x} / (1 − e^{−2}),   0 < x < 2.
Let f1(x) = e^{−x}, 0 < x < 2, and g(y) = 1/2, 0 < y < 2, so that Y ∼ U(0, 2). Also let
c = sup_x {f1(x)/g(x)} = sup_x 2e^{−x} = 2.
If U < f1(Y)/(cg(Y)) = e^{−Y}, then accept X = Y.
Now apply squeezing to the test U < e^{−Y}. We have e^x ≥ 1 + x for all x ∈ R, so
1 − x ≤ e^{−x} ≤ (1 + x)^{−1}.
This is true for any x, i.e. letting z = x − a, we have 1 − z ≤ e^{−z} ≤ (1 + z)^{−1}, and hence
e^{−a}(a + 1 − x) ≤ e^{−x} ≤ e^{−a}(1 − a + x)^{−1}.
Since a is arbitrary, looking at the two inequalities separately we can set
h1(x) = e^{−a}(a + 1 − x)
and
h2(x) = e^{−b}/(1 − b + x)
for arbitrary constants a and b.
The full algorithm is
1. Sample Y ∼ U(0, 2), U ∼ U(0, 1).
2. If U < e−a(a+ 1− Y ) go to 5.
3. If U ≥ e−b/(1− b+ Y ) go to 1.
4. If U ≥ e−Y go to 1.
5. Return X = Y .
Since we are free to choose a and b, they may be chosen to maximise the probability of quick acceptance/rejection (i.e. to avoid step 4), in which case it can be shown that a = 1 and b = 0.662 is optimal.
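A sketch of the full squeeze algorithm for this truncated exponential, in Python with NumPy, with a = 1 and b = 0.662 as above (names illustrative):

```python
import numpy as np

def rexp_truncated(n, a=1.0, b=0.662, rng=None):
    """Sample from f(x) = exp(-x)/(1 - exp(-2)) on (0, 2) by rejection from a
    U(0, 2) proposal, using squeeze functions to avoid evaluating exp(-Y)."""
    rng = rng or np.random.default_rng(0)
    ea, eb = np.exp(-a), np.exp(-b)        # constants for the squeeze bounds
    out = []
    while len(out) < n:
        y = rng.uniform(0.0, 2.0)          # 1. proposal Y ~ U(0, 2)
        u = rng.uniform()
        if u < ea * (a + 1.0 - y):         # 2. quick acceptance via h1
            out.append(y)
        elif u >= eb / (1.0 - b + y):      # 3. quick rejection via h2
            continue
        elif u < np.exp(-y):               # 4. full test (rarely needed)
            out.append(y)
    return np.array(out)

x = rexp_truncated(10_000)
print(x.mean())   # exact mean is (1 - 3*exp(-2))/(1 - exp(-2)) ≈ 0.687
```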
2.4 Ratio-of-uniforms
Our aim is to sample from fX(x) = f1(x)/∫ f1(x) dx = f1(x)/c. The following theorem provides us with another method: the ratio-of-uniforms method.
Theorem: If the pair of random variables (U, V) is uniformly distributed on
A = {(u, v) ∈ R² : 0 < u ≤ √(f1(v/u))},
then the variable X = V/U has density fX(x).
Proof: Introduce Y and make a transformation (U, V) ↦ (X, Y) via x = v/u, y = u. The absolute value of the Jacobian determinant is |det J(x, y)| = |−y| = y, thus
Area of A = ∫∫_A du dv = ∫ ∫_0^{√(f1(x))} y dy dx = ∫ [y²/2]_0^{√(f1(x))} dx = ∫ (f1(x)/2) dx = c/2.
Since (U, V) is uniform over A, (U, V) has pdf (2/c) 1_A(u, v)
⟹ (X, Y) has p.d.f. fX,Y(x, y) = (2y/c) 1_A(y, xy)
⟹ X has marginal p.d.f. fX(x) = ∫ fX,Y(x, y) dy = ∫_0^{√(f1(x))} (2y/c) dy = [y²/c]_0^{√(f1(x))} = f1(x)/c.
Therefore X = V/U has pdf fX(x) = f1(x)/∫ f1(x) dx, as required.
2.4.1 Interpretation of Ratio-of-Uniforms
Suppose we can generate (U, V ) uniformly over A; then we can generate X
from an arbitrary f(x) by just letting X = V/U .
Unfortunately, the boundary of A may be hard to evaluate. However, if
we can enclose A in a rectangle [0, a]× [b1, b2] then we can use the following
rejection algorithm:
1. Generate U1, U2 ∼ U [0, 1]
2. Let U = aU1 and V = b1 + (b2 − b1)U2; then (U, V) is a point uniformly distributed on the rectangle [0, a] × [b1, b2].
3. If (U, V ) ∈ A, then return X = V/U , else repeat from step 1.
What conditions do we need for A to be contained in a rectangle?
Theorem: If f1(x) and x²f1(x) are bounded, then A ⊂ [0, a] × [b1, b2], where
a = sup_x √(f1(x)),   b1 = inf_{x≤0} x√(f1(x)),   b2 = sup_{x≥0} x√(f1(x)).
Proof
(i) 0 < u ≤ √(f1(v/u)) ⟹ 0 ≤ u ≤ sup_x √(f1(x)) = a.
(ii) For v ≥ 0 to be a possible value, there must exist u > 0 with 0 < u² ≤ f1(v/u). Let t = v/u, i.e. u = v/t; then there must exist t > 0 such that v² ≤ t²f1(t), so that 0 < v ≤ t√(f1(t)).
∴ v ≥ 0 and (u, v) ∈ A imply v² ≤ b2², i.e. v ≤ b2 = sup_{t≥0} t√(f1(t)).
(iii) The case v < 0 follows similarly.
Finally, the probability of acceptance is
Area of A / Area of rectangle = [(1/2) ∫ f1(x) dx] / [a(b2 − b1)] = ∫ f1(x) dx / [2a(b2 − b1)].
Hence, for densities with bounded f1 and x²f1 we can write the ratio-of-uniforms algorithm as
Ratio of Uniforms Algorithm
For a given f1, find a, b1 and b2.
1. Generate U1, U2 ∼ U[0, 1].
2. Let U = aU1 and V = b1 + (b2 − b1)U2; then (U, V) is a point uniformly distributed on the rectangle [0, a] × [b1, b2].
3. Accept X = V/U if and only if U² ≤ f1(V/U); otherwise return to step 1.
Example
Suppose we wish to sample from the Rayleigh distribution with pdf
f(x) = (x/σ²) exp{−x²/(2σ²)},   x > 0.
First we must find the bounding rectangle.
(i) First find a = sup_x √(f(x)) = sup_x (x^{1/2}/σ) exp{−x²/(4σ²)}. Taking natural logs we obtain
log √(f(x)) = −log σ + (1/2) log x − x²/(4σ²),
d/dx (log √(f(x))) = 1/(2x) − x/(2σ²) = 0 ⟹ x = σ,
d²/dx² (log √(f(x))) < 0.
Therefore, there is a maximum at x = σ. Hence
a = sup_x √(f(x)) = √(f(σ)) = σ^{−1/2} exp(−1/4).
(ii) Secondly, b1 = inf_{x≤0} (x√(f(x))) = 0, since f(x) = 0 for x ≤ 0.
(iii) Finally,
log(x√(f(x))) = −log σ + (3/2) log x − x²/(4σ²),
d/dx (log(x√(f(x)))) = 3/(2x) − x/(2σ²) = 0 ⟹ x = √3 σ,
d²/dx² (log(x√(f(x)))) < 0.
Therefore, x√(f(x)) has a maximum at x = √3 σ. Hence
b2 = sup_{x≥0} (x√(f(x))) = √3 σ √((√3 σ/σ²) exp(−3σ²/(2σ²))) = σ^{1/2} (3/e)^{3/4}.
The algorithm is
1. Generate U1, U2 ∼ U(0, 1).
2. Set U = aU1 = σ^{−1/2} e^{−1/4} U1, and V = b2U2 = σ^{1/2} (3/e)^{3/4} U2.
3. If
U ≤ √(f(V/U)) = √((V/(σ²U)) exp{−V²/(2σ²U²)}),
then return X = V/U as a realisation from a Rayleigh distribution.
Probability of acceptance = ∫ f(x) dx / [2a(b2 − b1)] = 1 / [2 σ^{−1/2} e^{−1/4} · σ^{1/2} (3/e)^{3/4}] = e/(2 · 3^{3/4}) = 0.5962.
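A sketch of the ratio-of-uniforms sampler for the Rayleigh distribution, in Python with NumPy, using the rectangle [0, a] × [0, b2] derived above (names illustrative):

```python
import numpy as np

def rrayleigh_rou(n, sigma=1.0, rng=None):
    """Sample from the Rayleigh(sigma) density by ratio-of-uniforms, using
    a = sigma^(-1/2) * exp(-1/4) and b2 = sigma^(1/2) * (3/e)^(3/4)."""
    rng = rng or np.random.default_rng(0)
    a  = sigma**-0.5 * np.exp(-0.25)
    b2 = sigma**0.5 * (3.0 / np.e)**0.75
    out = []
    while len(out) < n:
        u = a * rng.uniform()                    # U ~ U(0, a)
        if u == 0.0:
            continue                             # avoid dividing by zero (prob ~ 0)
        v = b2 * rng.uniform()                   # V ~ U(0, b2), since b1 = 0
        x = v / u
        if u * u <= (x / sigma**2) * np.exp(-x**2 / (2.0 * sigma**2)):
            out.append(x)                        # accept X = V/U
    return np.array(out)

x = rrayleigh_rou(10_000, sigma=2.0)
print(x.mean())   # should be close to sigma * sqrt(pi/2)
```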
2.5 Multivariate generators
So far, we have considered several methods which can be used to simulate
from general univariate distributions given a supply of uniform random num-
bers. Now suppose we want to generate a random vector X = (X1, . . . , Xp)
from the multivariate density f(x). We can note the following points
1. If the elements of X are to be independent, i.e.
f(x) = f1(x1)f2(x2) . . . fp(xp),
then we can separately generate X1 from f1, X2 from f2, . . . , Xp from
fp using different uniform random numbers.
2. Inversion cannot be generalised to higher dimensions.
3. Rejection does work, if we can generate from g(x) (and g may be a product of independent components) and find M ≥ sup_x f(x)/g(x).
2.5.1 Sequential methods
More generally, if the components are not independent we can sample the
components sequentially as follows. Since we can write
f(x) = f1(x1)f2(x2|x1)f3(x3|x1, x2) . . . ,
we can first generate X1 from f1. Then for that given value of X1, generate
X2 from f2, and so on.
This gives us a sample (x1, . . . , xp) from the joint distribution of X1, . . . , Xp.
Note that each element x1, . . . , xp of the joint sample is automatically a
sample from the respective component marginal distributions of X1, . . . , Xp.
This gives us a very useful simulation trick as follows. Suppose we want to
sample a value of a random variable X from its distribution, but don’t know
how to. However, suppose we introduce another random variable Y , such
that we can sample from the conditional distribution of X|Y = y and the
marginal distribution of Y . Then, since
f(x, y) = f(x|y)f(y),
we can first simulate a pair (X, Y ) from the joint distribution f(x, y) as
follows:
1. Simulate a value y from the marginal f(y).
2. Simulate a value x from the conditional distribution f(x|y)
Then the value of x is automatically a sample from the marginal f(x). In
general, if we sample a random vector from its joint distribution, each com-
ponent is also a sample from its marginal distribution.
The composition method in Section 2.2.3 to simulate from mixture distri-
butions is a special case of this idea. We first sample an index Z with prob-
ability P (Z = i) = pi, i = 1, . . . , r, then simulate from the corresponding
component according to the index z sampled. This gives a sample from the
joint distribution f(x, z) = f(x|z)f(z), where f(z) is the discrete probability
distribution over the index set, and f(x|z) is the conditional distribution for
component z (which we assume we can sample easily). The value of x is
a sample from the marginal density f(x) (the mixture density), because we
first randomly sampled a component index from f(z).
Example
Suppose X | Y = y ∼ N(0, 1/y) and Y ∼ Γ(ν/2, ν/2) for some known positive integer
ν. We know how to simulate from each of these distributions easily, so we can
simulate a pair (x, y) from the joint distribution by simulating a value y from
the Gamma distribution and then a value x from the normal distribution with
variance y−1. This also gives us a sample from the marginal distribution of
x. (Intuitively, by first sampling y, we “average” over the possible values of
y.)
What is the marginal of x?
f(x, y) = f(x|y)f(y)
∝ √y exp{−(y/2) x²} · y^{ν/2 − 1} exp{−(ν/2) y}
∝ y^{ν/2 − 1/2} exp{−y(x²/2 + ν/2)}.
Now integrate over y:
f(x) = ∫_0^∞ f(x, y) dy ∝ ∫_0^∞ y^{ν/2 − 1/2} exp{−y(x²/2 + ν/2)} dy,
which, for fixed x, is in the form of an integral of a Gamma density for y with shape parameter a = (ν + 1)/2 and rate parameter b = (x² + ν)/2. Hence,
f(x) ∝ b^{−a} Γ(a) ∫_0^∞ (b^a/Γ(a)) y^{a−1} exp{−by} dy ∝ (ν + x²)^{−(ν+1)/2},
where we made the integrand into a full Gamma density by including the required constants (so it integrates to 1) and compensating by also putting the reciprocal of these constants outside the integral. Now, (ν + x²)^{−(ν+1)/2} is proportional to the density of a t distribution with ν degrees of freedom. Hence, the marginal distribution of x is a tν distribution, and so if we sample y from a Γ(ν/2, ν/2) and then x from a N(0, y^{−1}), we get a sample x from the tν distribution.
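A sketch of this two-stage sampler in Python with NumPy (names illustrative); note that NumPy's Gamma generator is parameterised by a scale, so rate ν/2 corresponds to scale 2/ν:

```python
import numpy as np

def rt_sequential(n, nu, rng=None):
    """Sample from the t_nu distribution by first drawing Y ~ Gamma(nu/2, rate nu/2)
    and then X | Y = y ~ N(0, 1/y); each X is a draw from the marginal t_nu."""
    rng = rng or np.random.default_rng(0)
    y = rng.gamma(shape=nu / 2.0, scale=2.0 / nu, size=n)   # rate nu/2 = scale 2/nu
    return rng.normal(loc=0.0, scale=1.0 / np.sqrt(y))      # sd = y^(-1/2)

x = rt_sequential(10_000, nu=5)
print(x.var())   # should be close to nu/(nu-2) = 5/3 for nu > 2
```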
2.5.2 Multivariate normal
Suppose we wish to sample from X ∼ Np(m, V), the p-dimensional normal distribution with mean vector m and covariance matrix V. Standardising, we know that Y = B^{−1}(X − m) ∼ Np(0, Ip), where B is any square root of V so that V = BB^⊤, and Ip is the (p × p) identity matrix. But if Y ∼ Np(0, Ip), the individual components Yi of Y are independent N(0, 1) univariate random variables. So we can simulate X as follows:
1. Generate Y1, Y2, . . . , Yp independently from N(0, 1) and set Y = (Y1, Y2, . . . , Yp).
2. Set X = m + BY.
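A sketch in Python with NumPy, using the Cholesky factor as one choice of square root B (names illustrative):

```python
import numpy as np

def rmvnorm(n, m, V, rng=None):
    """Sample n draws from N_p(m, V) via X = m + B Y, where V = B B^T
    (B from the Cholesky factorisation) and Y is a vector of iid N(0,1)."""
    rng = rng or np.random.default_rng(0)
    m = np.asarray(m, dtype=float)
    B = np.linalg.cholesky(np.asarray(V, dtype=float))   # one valid square root of V
    Y = rng.standard_normal(size=(n, len(m)))            # iid N(0,1) components
    return m + Y @ B.T                                   # each row is a draw from N_p(m, V)

m = [1.0, -2.0]
V = [[2.0, 0.6], [0.6, 1.0]]
X = rmvnorm(5000, m, V)
print(np.cov(X, rowvar=False))   # should be close to V
```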