MATH7501 Assignment 3 Semester 1, 2021 Due 27/5/2021
You can use Mathematica as an aid for many of the computations; however, make sure to do
hand calculations where suitable as well.
1. Consider the exponential distribution with probability density function $f(x) = \lambda e^{-\lambda x}$,
defined on $x \ge 0$ and with parameter $\lambda > 0$.
(a) Show that f(x) is a valid probability density function by showing that the integral
over [0,∞) is unity.
(b) Use integration to show that the mean of the distribution is $1/\lambda$.
(c) Use integration to show that the variance of the distribution is $1/\lambda^2$.
(d) Determine the median of the distribution. The median is the number $M$ such that
$$\int_0^M f(x)\,dx = \frac{1}{2}.$$
(e) The quantile function of the distribution, $q(u)$ for $u \in [0, 1)$, is defined as follows: for
each $u$, we should have
$$\int_0^{q(u)} f(x)\,dx = u.$$
Determine an expression for $q(u)$.
(f) Suppose $U$ is a uniformly distributed random variable on $[0, 1]$. If you define a new
random variable $X = q(U)$, where $q(\cdot)$ is the quantile function of the exponential
distribution from the previous item, then $X$ is exponentially distributed. Show this empirically
for $\lambda = 3$ by generating $10^6$ uniform random variables and comparing the empirical
quantiles of the resulting data with $q(\cdot)$.
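One possible way to set up the experiment in item (f) is sketched below in Python rather than Mathematica. It assumes the quantile function from item (e) works out to $q(u) = -\ln(1-u)/\lambda$, which follows from inverting the cumulative distribution function $1 - e^{-\lambda x}$. The function names, the seed, the chosen probability levels, and the simple order-statistic definition of the empirical quantile are all illustrative choices, not part of the assignment.

```python
import math
import random

def q(u, lam):
    """Assumed quantile function of the exponential distribution with rate lam."""
    return -math.log(1.0 - u) / lam

def empirical_quantile(sorted_data, u):
    """Simple empirical quantile: the order statistic at index floor(u * n)."""
    n = len(sorted_data)
    return sorted_data[min(int(u * n), n - 1)]

def compare(lam=3.0, n=10**6, seed=0):
    """Generate n samples X = q(U) and pair empirical with theoretical quantiles."""
    rng = random.Random(seed)
    # rng.random() lies in [0, 1), so 1 - u is never zero inside the logarithm.
    xs = sorted(q(rng.random(), lam) for _ in range(n))
    levels = (0.1, 0.25, 0.5, 0.75, 0.9, 0.99)
    return [(u, empirical_quantile(xs, u), q(u, lam)) for u in levels]

for u, emp, theo in compare():
    print(f"u={u:.2f}  empirical={emp:.4f}  q(u)={theo:.4f}")
```

If the claim in item (f) holds, the two columns should agree closely at every level; plotting the empirical quantiles against $q(u)$ over a fine grid of $u$ values would be a natural way to illustrate the same comparison.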
2. Consider the normal probability distribution with parameters $\mu \in \mathbb{R}$ and $\sigma > 0$. The
probability density is
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}.$$
(a) Showing that $f(x)$ is a valid probability density function is not immediate. Do this
first numerically for $\mu = 2$ and $\sigma = 3$ by approximating the integral via a discretization
sum over 100, 1,000, 10,000, and $10^5$ terms. You should observe that as the
number of terms grows, the value of the sum approaches 1.
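A minimal Python sketch of such a discretization sum follows. The assignment does not say which rule or which finite interval to use, so the midpoint rule and the truncation of the real line to $[\mu - 10\sigma, \mu + 10\sigma]$ are assumptions made here for illustration, as are the function names.

```python
import math

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

def midpoint_sum(n_terms, mu=2.0, sigma=3.0, half_width=10.0):
    """Midpoint Riemann sum of the density over [mu - half_width*sigma, mu + half_width*sigma]."""
    a = mu - half_width * sigma
    b = mu + half_width * sigma
    dx = (b - a) / n_terms
    return sum(normal_pdf(a + (i + 0.5) * dx, mu, sigma) for i in range(n_terms)) * dx

for n in (100, 1_000, 10_000, 100_000):
    print(n, midpoint_sum(n))
```

The choice of interval matters: a narrower interval leaves out visible tail mass, while a much wider one with few terms makes the grid coarse relative to $\sigma$. Trying several values of `half_width` is one way to see both effects.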
(b) Use integration to show that the mean of the distribution is µ.
(c) Use integration to show that the variance of the distribution is $\sigma^2$.
(d) The $k$'th moment of the distribution, denoted $m_k$ for $k = 1, 2, 3, \ldots$, is
$$m_k = \int_{-\infty}^{\infty} x^k f(x)\,dx.$$
Based on the previous items, $m_0 = 1$, $m_1 = \mu$, and $m_2 = \mu^2 + \sigma^2$. Show that for
higher values $k > 2$, we have
$$m_k = \mu\, m_{k-1} + (k-1)\sigma^2 m_{k-2}.$$
(e) Use this recurrence relation to compute $m_4$ for $\mu = 2$ and $\sigma = 3$. Compare this
value to a numerical computation of the integral, using both a discretization and
Mathematica's built-in NIntegrate[] function.
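The comparison in item (e) could be sketched in Python as follows; the Mathematica NIntegrate[] check is left to the reader. The recurrence is started from $m_0 = 1$ and $m_1 = \mu$, which also reproduces $m_2 = \mu^2 + \sigma^2$. The midpoint rule, the truncation to $\pm 12\sigma$ around $\mu$, and all function names are assumptions made for this illustration.

```python
import math

def moments_by_recurrence(k_max, mu, sigma):
    """Return [m_0, ..., m_k_max] via m_k = mu*m_{k-1} + (k-1)*sigma^2*m_{k-2}."""
    m = [1.0, mu]
    for k in range(2, k_max + 1):
        m.append(mu * m[k - 1] + (k - 1) * sigma ** 2 * m[k - 2])
    return m[: k_max + 1]

def moment_by_midpoint(k, mu, sigma, n_terms=100_000, half_width=12.0):
    """Midpoint-rule approximation of the k-th moment on a truncated interval."""
    a = mu - half_width * sigma
    b = mu + half_width * sigma
    dx = (b - a) / n_terms
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    total = 0.0
    for i in range(n_terms):
        x = a + (i + 0.5) * dx
        total += x ** k * norm * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))
    return total * dx

mu, sigma = 2.0, 3.0
m4_recurrence = moments_by_recurrence(4, mu, sigma)[4]
m4_numeric = moment_by_midpoint(4, mu, sigma)
print(m4_recurrence, m4_numeric)
```

For these parameters the recurrence steps are $m_2 = 13$, $m_3 = 62$, and $m_4 = 475$, which agrees with the known closed form $\mu^4 + 6\mu^2\sigma^2 + 3\sigma^4$ for the fourth moment of a normal distribution.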
(f) Show analytically that
$$\int_{-\infty}^{\infty} f(x)\,dx = 1.$$
You may use material from the lecture or online material, but must justify and explain
your calculations.
3. Assume you are presented with univariate data of a random sample, $x_1, \ldots, x_n$, and wish
to find a single number $x^*$ that summarizes $x_1, \ldots, x_n$ as well as possible. One way to
specify this in terms of a loss function is to seek a value $x^*$ that minimizes
$$L(u) = \sum_{i=1}^{n} (x_i - u)^2.$$
Analytically, it is very easy to show that $x^* = \sum_{i=1}^{n} x_i / n$, the sample mean. Nevertheless,
it is good to see how this number can be reached via a gradient descent algorithm. Set
$\eta > 0$ and start with some arbitrary initial $x(0)$. Then you get a sequence of points $x(t)$, for
$t = 1, 2, 3, \ldots$, via
$$x(t+1) = x(t) - \eta \nabla L(x(t)).$$
(a) Show that
$$x(t) = \alpha^t x(0) + \beta\,\frac{1 - \alpha^t}{1 - \alpha}$$
for some $\alpha$ and $\beta$ (specify these values in terms of the problem parameters and
data).
(b) Determine the range of $\eta$ values for which $x(t)$ will converge to $x^*$.
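The iteration in this question can be explored numerically with a short Python sketch. The sample `xs`, the step size, and the function names are hypothetical choices for illustration. The closed form used for comparison assumes $\alpha = 1 - 2\eta n$ and $\beta = 2\eta \sum_i x_i$, which is what expanding $\nabla L(u) = -2\sum_i (x_i - u)$ in the update rule appears to give; treat it as something to verify by hand rather than as a given.

```python
def grad_L(u, xs):
    """Gradient of L(u) = sum (x_i - u)^2 with respect to u."""
    return -2.0 * sum(x - u for x in xs)

def gradient_descent(xs, eta, x0=0.0, steps=100):
    """Return the list [x(0), x(1), ..., x(steps)] of gradient descent iterates."""
    u = x0
    path = [u]
    for _ in range(steps):
        u = u - eta * grad_L(u, xs)
        path.append(u)
    return path

def closed_form(t, xs, eta, x0):
    """Assumed closed form alpha^t x(0) + beta (1 - alpha^t) / (1 - alpha)."""
    n = len(xs)
    alpha = 1.0 - 2.0 * eta * n
    beta = 2.0 * eta * sum(xs)
    return alpha ** t * x0 + beta * (1.0 - alpha ** t) / (1.0 - alpha)

xs = [3.1, 2.7, 4.8, 7.6, 5.4]   # hypothetical sample, n = 5
eta = 0.05                       # gives alpha = 0.5 under the assumption above
path = gradient_descent(xs, eta)
print(path[-1], sum(xs) / len(xs))
print(path[10], closed_form(10, xs, eta, 0.0))
```

Rerunning with a much larger step, for example `eta = 0.25` on this sample, makes the iterates oscillate with growing amplitude, which is a useful numerical check on the range derived in item (b).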
4. Consider the simple linear regression problem where you are presented with data points
$(x_1, y_1), \ldots, (x_n, y_n)$. You seek $\beta_0$ and $\beta_1$ to fit the line
$$y = \beta_0 + \beta_1 x$$
by minimizing the loss function
$$L(\beta_0, \beta_1) = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2.$$
This minimization is often carried out by methods other than gradient descent, but for
the purposes of this exercise you will use gradient descent.
(a) Compute an expression for the gradient ∇L(β0, β1).
(b) Return to question 5 from Assignment 1. In that question you dealt with the data points
(2.4, 3.1), (4.7, 2.7), (4.9, 4.8), (2.9, 7.6), (8.1, 5.4),
and fit a line parameterized by $\beta_0$ and $\beta_1$. Do this now numerically with a gradient
descent algorithm, using the expression for the gradient you developed above.
Make sure that the learning rate (step size) $\eta$ is small enough for convergence.
Illustrate your numerical experiments.
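A possible starting point for the numerical experiment, written in Python, is sketched below. It assumes the gradient components are $-2\sum_i r_i$ and $-2\sum_i x_i r_i$ with residuals $r_i = y_i - \beta_0 - \beta_1 x_i$, which is what differentiating the loss term by term gives. The step size `eta = 0.005`, the zero starting point, the iteration count, and the function names are illustrative assumptions. For this data the largest eigenvalue of the Hessian is roughly 260, so step sizes below about 0.0077 should be stable.

```python
def loss(b0, b1, xs, ys):
    """Sum of squared residuals for the line y = b0 + b1 * x."""
    return sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))

def gradient(b0, b1, xs, ys):
    """Gradient of the loss with respect to (b0, b1)."""
    residuals = [y - b0 - b1 * x for x, y in zip(xs, ys)]
    g0 = -2.0 * sum(residuals)
    g1 = -2.0 * sum(r * x for r, x in zip(residuals, xs))
    return g0, g1

def fit_by_gradient_descent(xs, ys, eta=0.005, steps=20_000, b0=0.0, b1=0.0):
    """Run gradient descent and record the loss every 2000 steps."""
    history = []
    for t in range(steps):
        if t % 2000 == 0:
            history.append((t, loss(b0, b1, xs, ys)))
        g0, g1 = gradient(b0, b1, xs, ys)
        b0, b1 = b0 - eta * g0, b1 - eta * g1
    return b0, b1, history

def fit_closed_form(xs, ys):
    """Ordinary least squares solution, used here only as a reference."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

xs = [2.4, 4.7, 4.9, 2.9, 8.1]
ys = [3.1, 2.7, 4.8, 7.6, 5.4]
b0, b1, history = fit_by_gradient_descent(xs, ys)
print("gradient descent:", b0, b1)
print("closed form:     ", *fit_closed_form(xs, ys))
for t, value in history:
    print(t, value)
```

The recorded `history` gives a loss-versus-iteration trace that could be plotted to illustrate convergence; repeating the run with several values of `eta` shows how slowly small steps converge and how quickly overly large ones diverge.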