BCS/CSC 229: Computer Models of Human Perception and Cognition
Homework Assignment #2
Instructions: Answer all questions below. Include all requested calculations and graphs.
Also include the Python code that you wrote to answer the questions. When writing text
or equations, please write NEATLY!
(0) (Part A) At the top of the document that you turn in, place your name and the date.
(Part B) Next, please take the honor pledge. That is, write (by hand using a pen): “I affirm
that I have not given or received any unauthorized help on this assignment, and that this
work is my own.” Then sign your name.
(1) [WARNING: This problem is mathematically challenging. Don’t be surprised if you
struggle with it. Indeed, it may be smart to first work on the other homework problems, and
then return to this problem if time permits.] (Problem 2.4 from the draft of the textbook
by Ma, Kording, and Goldreich) Many Bayesian inference problems involve a product of
two or more Gaussians. A convenient property of Gaussians is that their product is also
Gaussian. In this problem, we will lead you through an example to derive this property
yourself. Consider an observer who infers a stimulus s from a measurement x. Suppose that
the measurement distribution p(x|s) is a Gaussian distribution with standard deviation σ
and the prior distribution is a Gaussian with mean µ and standard deviation σs.
(a) Write down the equations for p(x|s) and p(s).
(b) Use Bayes’ rule to write down the equation for the posterior p(s|x). Substitute p(x|s)
and p(s), but do not simplify.
The numerator is a product of two Gaussians. The denominator p(x) is a normalization
factor that ensures that the integral equals 1. For now, we will ignore it and focus on the
numerator.
(c) Apply the rule $e^A e^B = e^{A+B}$ to simplify the numerator.
(d) Expand the two quadratic terms in the exponent.
(e) Rewrite the exponent in the form $as^2 + bs + c$.
(f) Show that any quadratic function of the form $as^2 + bs + c$ can be written as:
$$a\left(s + \frac{b}{2a}\right)^2 + c - \frac{b^2}{4a}.$$
This operation is known as “completing the square”.
(g) Rewrite your expression obtained in (e) by completing the square.
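(h) Apply the rule $e^A e^B = e^{A+B}$ to rewrite this into the form
$$e^Z \, e^{-\frac{(s - \mu_{\text{combined}})^2}{2\sigma_{\text{combined}}^2}}.$$
Express $\mu_{\text{combined}}$ and $\sigma_{\text{combined}}$ in terms of $x$, $\sigma$, $\mu$, and $\sigma_s$.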
(i) Why is $\mu_{\text{combined}}$ the same as the maximum-a-posteriori (MAP) estimate of the
stimulus (i.e., the s that maximizes the posterior distribution p(s|x))?
(j) Recall that p(s|x) is a distribution and that its integral should therefore be equal
to 1. However, the expression that you obtained in (e) is not properly normalized because
we ignored p(x). Modify the expression such that it is properly normalized, without using p(x).
(Hint: Does $e^Z$ depend on s?)
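The identity in (f) and the role of the vertex of the quadratic can be checked numerically before attempting the algebra. The sketch below is only a sanity check, not a derivation. The helper names (quadratic, completed_square), the random coefficient ranges, the seed, and the example coefficients a = -0.5, b = 2, c = 1 are all illustrative assumptions and do not come from the assignment.

```python
import math
import random

def quadratic(a, b, c, s):
    """Evaluate a*s^2 + b*s + c directly."""
    return a * s**2 + b * s + c

def completed_square(a, b, c, s):
    """Evaluate the same quadratic in completed-square form."""
    return a * (s + b / (2 * a))**2 + c - b**2 / (4 * a)

# Compare the two forms at many random points (assumed ranges, fixed seed).
rng = random.Random(0)
max_abs_diff = 0.0
for _ in range(1000):
    a = rng.uniform(-5.0, -0.1)   # negative a, as in a Gaussian exponent
    b = rng.uniform(-5.0, 5.0)
    c = rng.uniform(-5.0, 5.0)
    s = rng.uniform(-10.0, 10.0)
    diff = abs(quadratic(a, b, c, s) - completed_square(a, b, c, s))
    max_abs_diff = max(max_abs_diff, diff)
print("largest difference between the two forms:", max_abs_diff)

# For a < 0, exp(a*s^2 + b*s + c) should peak at the vertex s = -b / (2a).
a, b, c = -0.5, 2.0, 1.0                      # illustrative coefficients
grid = [i * 0.01 for i in range(401)]         # 0.00, 0.01, ..., 4.00
values = [math.exp(quadratic(a, b, c, s)) for s in grid]
peak = grid[values.index(max(values))]
print("grid peak:", peak, " vertex:", -b / (2 * a))
```

The difference between the two forms should be at the level of floating-point rounding, and the grid peak should coincide with the vertex. The second observation is relevant to (i).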
(2) (Problem 2.12 from the draft of the textbook by Ma, Kording, and Goldreich) An observer
infers a stimulus s from a measurement x. Let’s say that on a particular trial, the
measurement is x = 30. The measurement distribution p(x|s) is Gaussian with standard
deviation σ = 5. Assume a Gaussian stimulus distribution p(s) with mean 20 and standard
deviation 4; this also serves as the prior distribution. We are now going to calculate the
posterior pdf using Python.
(a) Define a vector of possible s-values: 0, 0.2, 0.4, . . . , 40.
(b) Compute the likelihood function and the prior on this vector of values of s. [Hint: The
values of the prior distribution will not sum to one (instead, they should sum to 1/stepsize
where stepsize = 0.2). That is because we are approximating a continuous distribution by a
discrete distribution. A similar comment applies to the likelihood function, though keep in
mind that the likelihood function is not a distribution, and thus its values do not need to
sum to one.]
(c) Multiply the likelihood and the prior. In Python, elementwise multiplication of two
NumPy arrays can be achieved using the “*” operator.
(d) Divide this product by its sum over all s (normalization step).
(e) Convert this posterior probability mass function into a probability density function
by dividing by the step size you used in your vector of s-values (e.g., 0.2).
(f) Plot the likelihood, prior, and posterior in the same plot. Is the posterior wider
or narrower than the likelihood and prior? Do you expect this based on the equations we
discussed?
(g) Change the standard deviation of the measurement distribution to a very large value.
What happens to the posterior? Can you explain this?
(h) Change the standard deviation of the measurement distribution to a very small value.
What happens to the posterior? Can you explain this?
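Steps (a) through (e) can be sketched as follows. This is a minimal sketch, not a reference solution. It uses only the Python standard library, so the elementwise operations are written as list comprehensions, and the plotting in (f) is left out. The helper gaussian_pdf and all variable names are illustrative assumptions.

```python
import math

def gaussian_pdf(value, mean, sd):
    """Gaussian probability density with the given mean and standard deviation."""
    return math.exp(-(value - mean)**2 / (2 * sd**2)) / (sd * math.sqrt(2 * math.pi))

# (a) grid of candidate stimulus values: 0, 0.2, ..., 40
step = 0.2
s_grid = [i * step for i in range(201)]

x, sigma = 30.0, 5.0                 # measurement and measurement noise
prior_mean, prior_sd = 20.0, 4.0     # stimulus distribution (prior)

# (b) likelihood (a function of s for fixed x) and prior on the grid
likelihood = [gaussian_pdf(x, s, sigma) for s in s_grid]
prior = [gaussian_pdf(s, prior_mean, prior_sd) for s in s_grid]

# (c) elementwise product, (d) normalization, (e) conversion to a density
product = [l * p for l, p in zip(likelihood, prior)]
total = sum(product)
posterior_pmf = [v / total for v in product]        # sums to 1
posterior_pdf = [m / step for m in posterior_pmf]   # sums to 1 / step

# Summary statistics of the posterior, computed from the probability masses
posterior_mean = sum(s * m for s, m in zip(s_grid, posterior_pmf))
posterior_sd = math.sqrt(sum((s - posterior_mean)**2 * m
                             for s, m in zip(s_grid, posterior_pmf)))

print("sum(prior) * step     =", sum(prior) * step)
print("sum(posterior) * step =", sum(posterior_pdf) * step)
print("posterior mean, sd    =", posterior_mean, posterior_sd)
```

Rerunning the sketch with a much larger or much smaller value of sigma is one way to explore (g) and (h). The printed standard deviation can be compared with the values 5 and 4 when answering (f).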
(3) (Problem 2.13 from the draft of the textbook by Ma, Kording, and Goldreich) Repeat
Question (2), but instead of using a single value of the measurement x, start with a fixed
value of s = 10. From this value of s, draw 10 values of x from the measurement distribution.
You should observe that, from trial to trial, the likelihood function and posterior probability
density function “jump around”. Observe how the posterior shifts under the influence of the
“jumping” likelihood function and stationary prior. Explain.
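The trial-to-trial behaviour described above can be sketched as follows, again with the standard library only and without plots. The noise and prior parameters are the ones from Question (2). The seed, the helper names (gaussian_pdf, grid_posterior_pdf), and the choice to report only the location of each posterior peak are assumptions made for illustration.

```python
import math
import random

def gaussian_pdf(value, mean, sd):
    """Gaussian probability density with the given mean and standard deviation."""
    return math.exp(-(value - mean)**2 / (2 * sd**2)) / (sd * math.sqrt(2 * math.pi))

def grid_posterior_pdf(x, sigma, prior_mean, prior_sd, s_grid, step):
    """Normalized posterior density over s_grid for one measurement x."""
    product = [gaussian_pdf(x, s, sigma) * gaussian_pdf(s, prior_mean, prior_sd)
               for s in s_grid]
    total = sum(product)
    return [v / total / step for v in product]

step = 0.2
s_grid = [i * step for i in range(201)]           # 0, 0.2, ..., 40
sigma, prior_mean, prior_sd = 5.0, 20.0, 4.0      # carried over from Question (2)
s_true = 10.0

rng = random.Random(1)    # arbitrary seed, chosen only for repeatability
measurements = [rng.gauss(s_true, sigma) for _ in range(10)]

posterior_peaks = []
for x in measurements:
    post = grid_posterior_pdf(x, sigma, prior_mean, prior_sd, s_grid, step)
    posterior_peaks.append(s_grid[post.index(max(post))])

# A Gaussian likelihood peaks at s = x, so x itself marks the likelihood peak.
for x, peak in zip(measurements, posterior_peaks):
    print(f"likelihood peak (x) = {x:6.2f}   posterior peak = {peak:5.1f}")
```

Comparing the two printed columns shows how far the posterior peak moves relative to the likelihood peak on each trial, which is the pattern the question asks you to explain.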
(4) (Problem 2.14 from the draft of the textbook by Ma, Kording, and Goldreich) Continuing
from Questions (2) and (3), generate a distribution of maximum-a-posteriori (MAP) and
maximum likelihood (ML) estimates by:
(a) drawing an s from the stimulus distribution;
(b) drawing a single x from the measurement distribution, and calculating the posterior
distribution.
(c) For each of 1000 repetitions of (a) and (b), plot the MAP estimate (y-axis) against
the true stimulus (x-axis). On a separate graph, plot the MLE (i.e., measurement x) against
the true stimulus.
(d) Repeat (a), (b), and (c) using different values of the noise standard deviation relative
to prior standard deviation. When the noise standard deviation is very small, the MAP and
MLE plots should look the same. Why? When the noise standard deviation is very large,
the MAP plot looks flat, whereas the MLE plot looks very scattered. Why?
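The simulation in (a) through (d) can be sketched as follows. This is a sketch under stated assumptions rather than a reference solution: it finds the MAP estimate as the grid maximum of the unnormalized log posterior (normalization does not move the peak), it prints the spread of the estimates instead of drawing scatter plots, and the three noise levels, the seed, and the function names are illustrative choices.

```python
import random
import statistics

step = 0.2
s_grid = [i * step for i in range(201)]     # same grid as in Question (2)
prior_mean, prior_sd = 20.0, 4.0

def map_estimate(x, sigma):
    """Grid value of s that maximizes the unnormalized log posterior."""
    def log_posterior(s):
        return (-(x - s)**2 / (2 * sigma**2)
                - (s - prior_mean)**2 / (2 * prior_sd**2))
    return max(s_grid, key=log_posterior)

def simulate(sigma, n_trials=1000, seed=0):
    """Return lists of true stimuli, MAP estimates, and ML estimates."""
    rng = random.Random(seed)
    stimuli, map_estimates, ml_estimates = [], [], []
    for _ in range(n_trials):
        s = rng.gauss(prior_mean, prior_sd)   # (a) draw a stimulus
        x = rng.gauss(s, sigma)               # (b) draw a measurement
        stimuli.append(s)
        map_estimates.append(map_estimate(x, sigma))
        ml_estimates.append(x)                # the ML estimate is x itself
    return stimuli, map_estimates, ml_estimates

# (d) repeat for small, intermediate, and large noise (assumed values)
results = {}
for sigma in (0.5, 5.0, 50.0):
    stimuli, map_est, ml_est = simulate(sigma)
    results[sigma] = (statistics.pstdev(map_est), statistics.pstdev(ml_est))
    print(f"sigma = {sigma:5.1f}   sd(MAP) = {results[sigma][0]:6.2f}   "
          f"sd(ML) = {results[sigma][1]:6.2f}")
```

The lists returned by simulate are what would be passed to a plotting library for the scatter plots in (c). The printed spreads give a quick numerical view of the "flat" versus "scattered" behaviour mentioned in (d).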
(5) (Problem 3.7 from the draft of the textbook by Ma, Kording, and Goldreich) In Chapters
2 and 3 (of the Ma, Kording, and Goldreich textbook), we were able to derive analytical
expressions for the posterior distribution. For more complex psychophysical tasks, however,
analytical solutions often do not exist. In such a case, we can use numerical methods to
approximate the distribution of interest. To get some familiarity with this method, we
will reconsider the cue combination experiment described in this chapter, but we will now
compute the distribution of MAP estimates using numerical methods. We assume that the
experimenter introduces a cue conflict between the auditory and the visual stimuli: sA = 5
and sV = 10. The standard deviation of the auditory and of the visual noise is σA = 2 and
σV = 1, respectively. We assume a flat (uniform) prior over s.
(a) Randomly draw an auditory measurement xA and a visual measurement xV from
their respective distributions. (It’s okay if a measurement has a negative value.)
(b) Plot the corresponding elementary likelihood functions, p(xA|s) and p(xV |s), in one
figure.
(c) Calculate the combined likelihood function, p(xA, xV |s), by numerically multiplying
the elementary likelihood functions in Python. Plot this function.
(d) Calculate the posterior distribution by normalizing the combined likelihood function.
Plot this distribution in the same figure as the likelihood functions.
(e) Use Python to find the MAP estimate of s, i.e., the value of s at which the posterior
distribution is maximal.
(f) Compare with the MAP estimate of s computed from Eq. (3.3) using the measurements
drawn in (a). For convenience, here is Eq. (3.3):
$$\hat{s}_{\text{MAP}} = \frac{\dfrac{x_A}{\sigma_A^2} + \dfrac{x_V}{\sigma_V^2}}{\dfrac{1}{\sigma_A^2} + \dfrac{1}{\sigma_V^2}}$$
(g) In the above, we simulated a single trial and computed the observer’s MAP estimate
of s, given the noisy measurements on that trial. If an analytical solution does not exist
for the distribution of MAP estimates, we can repeat the above procedure many times to
approximate this distribution. Here, we practice this method even though an analytical
solution is available in this case. Draw 100 pairs (xA, xV ) and numerically compute the
observer’s MAP estimate for each pair as in (e).
(h) Compute the mean of the MAP estimates obtained in (g) and compare with the mean
estimate predicted using Eq. (3.5). For convenience, here is Eq. (3.5):
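$$w_A = \frac{\dfrac{1}{\sigma_A^2}}{\dfrac{1}{\sigma_A^2} + \dfrac{1}{\sigma_V^2}}$$
$$w_V = \frac{\dfrac{1}{\sigma_V^2}}{\dfrac{1}{\sigma_A^2} + \dfrac{1}{\sigma_V^2}}$$
$$\langle \hat{s} \rangle = w_A s_A + w_V s_V$$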
(i) Make a histogram of the MAP estimate (in Python, use the “numpy.histogram”
function).
(j) Relative auditory bias is defined as the mean MAP estimate minus the true auditory
stimulus, divided by the true visual stimulus minus the true auditory stimulus. Compute
relative auditory bias for your set of estimates.
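The numerical procedure in this question can be sketched as follows. This is a minimal sketch under several assumptions that are not part of the assignment: the grid from -10 to 20 with step 0.01, the seed, the bin width of 0.5, and all function and variable names are illustrative. It uses only the standard library, so a Counter of binned values stands in for numpy.histogram, and the plots in (b) through (d) are omitted.

```python
import math
import random
import statistics
from collections import Counter

s_a, s_v = 5.0, 10.0            # true auditory and visual stimuli
sigma_a, sigma_v = 2.0, 1.0     # auditory and visual noise
step = 0.01
s_grid = [-10.0 + i * step for i in range(3001)]   # assumed grid: -10 to 20

def gaussian_pdf(value, mean, sd):
    """Gaussian probability density with the given mean and standard deviation."""
    return math.exp(-(value - mean)**2 / (2 * sd**2)) / (sd * math.sqrt(2 * math.pi))

def numerical_map(x_a, x_v):
    """Grid argmax of the normalized combined likelihood (flat prior)."""
    combined = [gaussian_pdf(x_a, s, sigma_a) * gaussian_pdf(x_v, s, sigma_v)
                for s in s_grid]
    total = sum(combined)
    posterior = [v / total / step for v in combined]
    return s_grid[posterior.index(max(posterior))]

def analytical_map(x_a, x_v):
    """Eq. (3.3): precision-weighted average of the two measurements."""
    return ((x_a / sigma_a**2 + x_v / sigma_v**2)
            / (1 / sigma_a**2 + 1 / sigma_v**2))

# (g) simulate 100 trials and store both estimates for each trial
rng = random.Random(0)
numerical, analytical = [], []
for _ in range(100):
    x_a = rng.gauss(s_a, sigma_a)
    x_v = rng.gauss(s_v, sigma_v)
    numerical.append(numerical_map(x_a, x_v))
    analytical.append(analytical_map(x_a, x_v))

# (h) mean of the numerical MAP estimates versus the prediction of Eq. (3.5)
w_a = (1 / sigma_a**2) / (1 / sigma_a**2 + 1 / sigma_v**2)
w_v = (1 / sigma_v**2) / (1 / sigma_a**2 + 1 / sigma_v**2)
predicted_mean = w_a * s_a + w_v * s_v
mean_map = statistics.mean(numerical)

# (i) histogram counts, keyed by the left edge of each bin
bin_width = 0.5
counts = Counter(math.floor(m / bin_width) * bin_width for m in numerical)

# (j) relative auditory bias
relative_auditory_bias = (mean_map - s_a) / (s_v - s_a)

print("mean MAP:", mean_map, " predicted:", predicted_mean)
print("relative auditory bias:", relative_auditory_bias)
for left_edge in sorted(counts):
    print(f"{left_edge:5.1f}  {'#' * counts[left_edge]}")
```

On every trial the numerical and analytical estimates should agree to within the grid resolution, which is the comparison requested in (f). A finer grid reduces that gap at the cost of more computation.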