York University
MATH 4430/6602 3.0AF (Stochastic Processes)
Class Project, due December 2, 2020
MATH 4430 students should form groups of 3-4 people. MATH 6602 students
should work individually. Try to make sure that somebody in each group has
some experience coding. Everybody in each group is expected to understand
everything the group does, so you are expected to talk to each other and explain
your contributions to each other. Each group should select one of the following
three projects.
You will need to carry out Monte Carlo simulations using a modern program-
ming language. I suggest you use R, which is a versatile and widely-used pro-
gramming language designed for statistics and simulation. It is open source and
free to download, either as R or R-Studio. The built-in documentation is good,
and there are many other introductions available on-line. If your group has a
really strong preference for using some other programming language (eg Python,
C++, Matlab) then you may do so. But even if you are proficient in one of those
languages, this would still be a good opportunity to try out R.
References: There is a bit of information about the Metropolis Algorithm and
the Gibbs Sampler in Chapter 2.6 of Rosenthal (though he chooses a rather dif-
ferent special case of the Gibbs sampler than I do). Much more detail (and an
introduction to R) can be found in Introducing Monte Carlo Methods with R by
Robert and Casella (Springer 2010), which is available electronically through the
York Library. I personally like Neal Madras’s book Lectures on Monte Carlo
Methods (AMS and Fields Institute, 2001), but you may not be able to find that
in electronic form.
1. Metropolis Algorithm for expectations
1.1. Introduction. The Metropolis algorithm gives a way of producing samples
from a target probability distribution µ on a discrete state space U with all µi > 0.
It is used in situations where it is difficult or impossible to directly sample from
µ (the way you would if you were generating random numbers from a normal
distribution).
It uses an irreducible transition matrix Q on U that is symmetric (ie Qij = Qji
for each i, j). Any such Q will work, but the art of using Metropolis lies in choosing
a Q that is likely to make the algorithm work quickly. We define new transition
probabilities as follows
pij = qij min(1, µj/µi),                      if j ≠ i,
pii = 1 − Σ_{k≠i} qik min(1, µk/µi).
This looks complicated, so let me describe it in a dynamic way: starting from i, use
Q to propose a state j to move to. If µj ≥ µi then accept that move. If µj < µi
then flip a biased coin so that we accept the move with probability µj/µi < 1 and
reject the move otherwise (and just stay at state i). Or, what amounts to the
same thing, generate a uniform random number U on [0, 1] and accept the move
if U ≤ µj/µi.
For example, if all the µi’s are equal, then the target distribution is uniform on
U . In that case, P = Q and there is no rejection. Because Q is symmetric, it is
reversible with respect to the uniform distribution, so µ is an invariant distribu-
tion. You may find it strange to think that sampling from a uniform distribution
could be hard, but if U is a large complicated set (so large that we can’t count its
elements easily), this may in fact be the case. Note that we don't actually need
to know the values of µi, but just the ratios µj/µi. So in the uniform case, we
know these ratios equal 1, even if we can't count the size of U precisely in order
to find µi = 1/#U.
The following refer to the general Metropolis algorithm, described above.
First task: Write out a proof that P is reversible with respect to µ. Conclude
that µ is an invariant probability distribution for P .
Second task: Write out a proof that P is irreducible, and that if µ isn’t perfectly
uniform then P is aperiodic. [Hint: Show that if i → j under Q then i → j under
P . For aperiodicity, consider a site where a transition could be rejected.]
Therefore if you pick any state to start in and run the chain for a long time n
then the random variable Xn will be approximately a sample from the distribution
µ. Going further, Xn, Xn+1, . . . , Xn+m will all have distributions approximately
µ. They won't be independent, but if m is also large, then we can still show that

Σi f(i) µi ≈ (1/(m + 1)) Σ_{k=0}^{m} f(Xn+k).
So expectations with respect to the distribution µ can be worked out this way.
This often works well in practice, even though it is sometimes hard to get rigorous
bounds on how large n and m must be for the answers to be reliable.
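In code, the burn-in n and averaging window m might be used roughly as follows (this builds on the metropolis_step sketch above; mc_expectation is a made-up name, and step can be any function mapping the current state to the next one):

  # Estimate sum_i f(i) mu_i by averaging f along the chain after a burn-in.
  mc_expectation <- function(x0, step, f, n, m) {
    x <- x0
    for (k in seq_len(n)) x <- step(x)   # burn-in: discard the first n states
    total <- f(x)                        # include X_n itself
    for (k in seq_len(m)) {              # then average over X_n, ..., X_{n+m}
      x <- step(x)
      total <- total + f(x)
    }
    total / (m + 1)
  }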
1.2. Travelling Salesman Problem. Suppose we have N cities located at the
vertices of a graph. The graph has an edge between cities if it is possible to travel
between them, and we label the edge with the travel time. A (Hamiltonian) circuit
is a sequence of cities that starts and ends at some city, visits that city only at the
beginning and end, and visits every other city exactly once. The classic Travelling
Salesman Problem is to find the shortest such circuit. When N is large, this is
hard. Our problem will be a bit different: I want you to find the average travel
time of circuits. In other words, the expected travel time for a circuit chosen
uniformly from all circuits. Use Metropolis with µ = 1 (so no rejection).
If every pair of cities is connected by an edge, we wouldn't actually require
MCMC, because sampling uniformly from Hamiltonian circuits in a complete
graph isn’t hard. But if the graph is not complete, then that is no longer true.
So what I want you to do is the following:
(1) Label 20 cities as 1, 2, . . . , 20.
(2) Pick 5 pairs of cities {i, j} for which direct travel between i and j will be
forbidden (eg because of border restrictions or because there are no flights
between those cities).
(3) Generate random positive distances between the other 185 pairs of cities
{i, j} with i ≠ j (and explain why it is 185). You should generate these
using random sampling from some distribution on the positive reals, with
a density (one possible setup is sketched just after this list).
(4) Choose some large n and m and use Metropolis to sample uniformly from
the circuits, and compute the average travel time (so f applied to a circuit
is the travel time).
(5) Check if your choice of n and m is robust enough, by trying this with 2n
and 2m instead, and seeing if your answers change much. If they do, try
again with larger values of n or m. Make sure you use the same travel
distances as before.
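One possible way to set up steps (1)-(3) in R is sketched below; I have drawn the travel times from an Exponential(1) distribution, but any distribution on the positive reals with a density will do, and the variable names are just suggestions.

  set.seed(1)                                   # for reproducibility
  N <- 20
  pairs <- t(combn(N, 2))                       # all 190 unordered pairs {i, j}
  forbidden <- pairs[sample(nrow(pairs), 5), ]  # 5 forbidden pairs, chosen at random here
  D <- matrix(NA, N, N)                         # travel-time matrix; NA = no direct travel
  for (r in seq_len(nrow(pairs))) {
    i <- pairs[r, 1]; j <- pairs[r, 2]
    D[i, j] <- D[j, i] <- rexp(1)               # random positive travel time
  }
  for (r in seq_len(nrow(forbidden))) {         # blank out the 5 forbidden pairs, leaving
    i <- forbidden[r, 1]; j <- forbidden[r, 2]  # travel times for the other 185 pairs
    D[i, j] <- D[j, i] <- NA
  }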
Of course, you need to start with a Q. I suggest that you use the following. Take
your state space to be sequences of 20 cities without any repetitions. If i is such a
sequence, then choose two of those cities at random and interchange them (unless
that would make us travel between one of the 5 forbidden pairs). For example, if
you pick cities 7 and 9, and they occur 3rd and 6th in the list respectively, then
the new proposed state will have city 7 as the 6th and city 9 as 3rd. You will have
to check that this doesn’t introduce direct travel between one of the 5 forbidden
pairs. If it does then you will reject the interchange, ie Q will leave the sequence
unchanged this step.
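In R, this proposal might look like the sketch below, using the matrix D from the setup above; tour_length and propose_swap are names I've made up, and the NA entries of D mark the forbidden pairs.

  # Total travel time of a circuit; NA if it uses a forbidden pair.
  tour_length <- function(circuit, D) {
    edges <- cbind(circuit, c(circuit[-1], circuit[1]))  # consecutive pairs, wrapping around
    sum(D[edges])
  }
  # The Q proposal: interchange two randomly chosen positions, or stay put
  # if the interchange would introduce direct travel between a forbidden pair.
  propose_swap <- function(circuit, D) {
    pos <- sample(length(circuit), 2)    # pick two positions in the sequence
    prop <- circuit
    prop[pos] <- circuit[rev(pos)]       # interchange the two cities
    if (is.na(tour_length(prop, D))) circuit else prop
  }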
Third task: Check that this Q is symmetric.
There is a problem here, namely whether Q is irreducible or not. On the
complete graph this chain would be, and the algorithm works well. On some
graphs it isn’t (for example, this approach fails badly if the graph is a lattice).
In practice it is hard to confirm irreducibility one way or the other. I think it
is unlikely that you’ll run into problems with the numbers I’ve chosen, but one
never knows. So just try it and hope for the best!
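Putting the pieces together, steps (4) and (5) might look roughly like this; since the target is uniform, the Metropolis chain is just the Q chain, and the values of n and m below are guesses that step (5) asks you to check.

  set.seed(2)
  circuit <- sample(N)                    # random starting circuit, redrawn until it
  while (is.na(tour_length(circuit, D)))  # avoids the forbidden pairs
    circuit <- sample(N)
  n <- 10000; m <- 100000                 # burn-in and averaging window (guesses)
  for (k in seq_len(n)) circuit <- propose_swap(circuit, D)
  total <- tour_length(circuit, D)
  for (k in seq_len(m)) {
    circuit <- propose_swap(circuit, D)
    total <- total + tour_length(circuit, D)
  }
  total / (m + 1)                         # estimated average travel time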
Fourth task: Code this up, carry out the above steps, and then write up a short
report (3-5 pages). Explain the problem, answer the questions I posed as the first
three tasks, explain what you did, and summarize your results, including what
parameters you settled on and how long the computations took. Include your
code as an appendix. Then e-mail me the report.
2. Metropolis Algorithm for Optimization
2.1. Introduction: This is the same as in the previous section, so read that and
carry out the first two tasks.
2.2. Knapsack Problem. This is another classic optimization problem. You
have N objects. Object i is worth vi and weighs wi. You must select which ones
of these to put into a knapsack. You want to maximize the value of the selected
objects, but there is a constraint, namely that the total weight cannot exceed
some number W. In other words, we want

max{ Σ_{i∈I} vi | I ⊂ {1, . . . , N}, Σ_{i∈I} wi ≤ W }.
If N is large, this is hard. In particular, if N is so large that you can't just
compute all the possibilities and pick the best, you need to do something different.
Instead we'll generate a random list of possibilities that stands a reasonable
chance of finding the best value, or at least of coming close to it. The
answer you’ll report will be the best choice among the ones you’ve seen.
Your states will be N -tuples of 0’s and 1’s (z1, z2, . . . , zN). zi = 0 means that
the ith object isn’t in the knapsack. zi = 1 means it is. Let S be the set of states
satisfying the weight constraint, ie such that Σi ziwi ≤ W. The Q chain will pick
i uniformly from {1, . . . , N} and will try changing zi to 1 − zi (ie if the ith object
is in, we take it out, and otherwise we put it in). If this violates the constraint on
total weight (ie if it tries to move outside S) then the Q chain rejects the move,
and the state stays unchanged for that step.
Third task: Check that Q is symmetric and irreducible as a Markov chain on S.
Of course, not every way of choosing random samples is going to work out in a
reasonable amount of time. There are reasons to believe that sampling from the
following distribution is often a good choice:
µz = C e^{β Σi zivi}    for z = (z1, . . . , zN) ∈ S.

Here β > 0 is a parameter. Note that one of the nice things about Metropolis is
that we don't need to work out the constant C that makes µ into a probability.
A quick calculation (you should do this) shows that if Q proposes moving from
z to y by switching the ith coordinate (ie if yi = 1 − zi and for other j's, yj = zj),
then

µy/µz = e^{β vi (1 − 2zi)}.

Therefore the algorithm always allows you to put in an object that doesn't violate
the weight constraint (as this ratio is > 1), but it sometimes rejects attempts to
take out objects.
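In R, one full Metropolis update for this chain might look like the following sketch; knapsack_step is a made-up name, z is the current 0/1 vector, v and w are the value and weight vectors, W is the weight limit, and beta is the parameter β.

  knapsack_step <- function(z, v, w, W, beta) {
    i <- sample(length(z), 1)                    # Q: pick one coordinate uniformly
    y <- z
    y[i] <- 1 - z[i]                             # flip object i in or out
    if (sum(y * w) > W) return(z)                # Q rejects proposals outside S
    ratio <- exp(beta * v[i] * (1 - 2 * z[i]))   # mu_y / mu_z from the calculation above
    if (runif(1) <= ratio) y else z              # accept with probability min(1, ratio)
  }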
What I want you to do is the following:
(1) Label 40 objects as 1, 2, . . . , 40.
(2) Generate random values and weights for these, by sampling from a uniform
distribution on [0, 1].
(3) Take a weight restriction of W = 10. Take β = 1. Pick some large n
and some initial choice of a state in S. Use Metropolis to generate states
X1, . . . , Xn, keeping track of the current best choice. Then report that
choice, its value, and its weight (one possible run loop is sketched after
this list).
(4) Check if your choice of n is robust enough, by trying this again with 2n,
and seeing if your answers change much. If they do, try again with a
larger value of n. Make sure you use the same weights and values as before.
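A run loop for step (3) might look roughly like this, using knapsack_step from the sketch above; the chain length n here is only a guess, which step (4) asks you to check.

  set.seed(1)
  N <- 40; W <- 10; beta <- 1
  v <- runif(N); w <- runif(N)          # step (2): random values and weights on [0, 1]
  n <- 100000                           # chain length (a guess)
  z <- rep(0, N)                        # start from the empty knapsack, which is in S
  best <- z; best_value <- 0
  for (k in seq_len(n)) {
    z <- knapsack_step(z, v, w, W, beta)
    val <- sum(z * v)
    if (val > best_value) { best <- z; best_value <- val }   # track the best state seen
  }
  c(value = best_value, weight = sum(best * w))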
Fourth task: Code this up, carry out the above steps, and then write up a short
report (3-5 pages). Explain the problem, answer the questions I posed as the first
three tasks, explain what you did, and summarize your results, including what
parameters you settled on and how long the computations took. Include your
code as an appendix. Then e-mail me the report.
3. Gibbs Sampler
3.1. Introduction. This project assumes that you’ve had at least some exposure
to Bayesian statistics.
Its objective is to obtain random samples from the posterior distribution (ie
the distribution of the model parameters, conditional on the data). With that
one can find things like point estimates of the parameters (posterior means), or
confidence intervals for the parameters (though that is more complicated since
the samples won’t be independent). In most natural applications one deals with
conditional densities, which would move us a bit outside the range of things we’ve
discussed so far, since for us (so far) all chains have been discrete. But I’ll pose
a discrete version, at the expense of making it a bit artificial. It is similar, at a
purely formal level anyway, to a real application of the Gibbs Sampler where the
data is the failure time of pumps in the nuclear power plant Farley I. The actual
application is extensively discussed in Madras’s book.
We’ll have data X1, . . . , XN where each Xi is sampled from a distribution with
unknown parameter yi. The parameters have a complicated prior distribution,
corresponding to first choosing an additional parameter z and then choosing
y1, . . . , yN independently of each other but in a way that depends on z. Once
we observe the data, we wish to sample from the posterior distribution, which
tells us the conditional distribution of the parameters z, y1, . . . , yN given the data.
This kind of thing is called a Bayesian Hierarchical model, and the algorithm we’ll
use on it is called a random scan Gibbs sampler.
More precisely, suppose a random variable Z is generated according to some
discrete distribution αz, so P(Z = z) = αz > 0 for the z's under consideration.
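Whatever the details of the prior and the likelihood, a random scan Gibbs sampler always has the same overall shape: at each step, pick one coordinate (z or one of the yi's) uniformly at random and resample it from its conditional distribution given everything else. A very rough R skeleton is below; sample_z and sample_y are placeholders whose form depends on the specific model.

  # One random-scan Gibbs update for (z, y_1, ..., y_N) given the data X.
  gibbs_step <- function(z, y, X, sample_z, sample_y) {
    i <- sample(length(y) + 1, 1)     # pick one of the N + 1 coordinates at random
    if (i > length(y)) {
      z <- sample_z(y, X)             # resample z from its conditional given y and X
    } else {
      y[i] <- sample_y(i, z, X)       # resample y_i from its conditional given z and X_i
    }
    list(z = z, y = y)
  }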