MATH7501 Assignment 3 Semester 1, 2021 Due 27/5/2021

You can use Mathematica as an aid for many of the computations; however, make sure to do hand calculations where suitable as well.

1. Consider the exponential distribution with probability density function $f(x) = \lambda e^{-\lambda x}$ defined on $x \ge 0$ and with parameter $\lambda > 0$.

(a) Show that $f(x)$ is a valid probability density function by showing that the integral over $[0, \infty)$ is unity.

(b) Use integration to show that the mean of the distribution is $\frac{1}{\lambda}$.

(c) Use integration to show that the variance of the distribution is $\frac{1}{\lambda^2}$.

(d) Determine the median of the distribution. The median is the number $M$ such that
\[ \int_0^M f(x)\,dx = \frac{1}{2}. \]

(e) The quantile function of the distribution, $q(u)$ for $u \in [0, 1)$, is defined as follows: for each $u$, we should have
\[ \int_0^{q(u)} f(x)\,dx = u. \]
Determine an expression for $q(u)$.

(f) Say that $U$ is a uniformly distributed random variable on $[0, 1]$. If you set a new random variable $X$ via $X = q(U)$, then the distribution of $X$ is exponential (for $q(\cdot)$ evaluated for an exponential distribution as in the item above). Show this empirically for $\lambda = 3$ by generating $10^6$ uniform random variables and comparing the empirical quantile of this data with $q(\cdot)$.

2. Consider the normal probability distribution with parameters $\mu \in \mathbb{R}$ and $\sigma > 0$. The probability density is
\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}. \]

(a) Showing that $f(x)$ is a valid probability density function is not immediate. Do this first numerically for $\mu = 2$ and $\sigma = 3$ by approximating the integral via a discretization sum over $100$, $1{,}000$, $10{,}000$, and $10^5$ terms. You should observe that as the number of terms grows, the value of the sum approaches 1.

(b) Use integration to show that the mean of the distribution is $\mu$.

(c) Use integration to show the variance of the distribution is $\sigma^2$.

(d) The $k$'th moment of the distribution, denoted $m_k$ for $k = 1, 2, 3, \ldots$, is
\[ m_k = \int_{-\infty}^{\infty} x^k f(x)\,dx. \]
Based on the previous items, $m_0 = 1$, $m_1 = \mu$, and $m_2 = \mu^2 + \sigma^2$.
Show that for higher values $k > 2$, we have
\[ m_k = \mu\, m_{k-1} + (k-1)\,\sigma^2\, m_{k-2}. \]

(e) Use this recurrence relation to compute $m_4$ for $\mu = 2$ and $\sigma = 3$. Compare this value to a numerical computation of the integral, both using a discretization and using Mathematica's in-built NIntegrate[] function.

(f) Show analytically that
\[ \int_{-\infty}^{\infty} f(x)\,dx = 1. \]
You may use material from the lecture or online material, but must justify and explain your calculations.

3. Assume you are presented with univariate data of a random sample, $x_1, \ldots, x_n$, and wish to find a single number $x^*$ that summarizes $x_1, \ldots, x_n$ as well as possible. One way to specify this in terms of a loss function is to seek a value $x^*$ that minimizes
\[ L(u) = \sum_{i=1}^{n} (x_i - u)^2. \]
Analytically, it is very easy to show that $x^* = \sum_{i=1}^{n} x_i / n$, the sample mean. Nevertheless, it is good to see how this number can be reached via a gradient descent algorithm. Set $\eta > 0$ and start with some arbitrary initial $x(0)$. Then you get a sequence of points $x(t)$, for $t = 1, 2, 3, \ldots$, via
\[ x(t+1) = x(t) - \eta \nabla L(x(t)). \]

(a) Show that
\[ x(t) = \alpha^t x(0) + \beta\, \frac{1 - \alpha^t}{1 - \alpha} \]
for some $\alpha$ and $\beta$ (specify these values in terms of the problem parameters and data).

(b) Determine the range of $\eta$ values for which $x(t)$ will converge to $x^*$.

4. Consider the simple linear regression problem where you are presented with data points $(x_1, y_1), \ldots, (x_n, y_n)$. You seek $\beta_0$ and $\beta_1$ to fit the line $y = \beta_0 + \beta_1 x$ by minimizing the loss function
\[ L(\beta_0, \beta_1) = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2. \]
This minimization is often carried out by methods other than gradient descent, but for the purposes of this exercise you will use gradient descent.

(a) Compute an expression for the gradient $\nabla L(\beta_0, \beta_1)$.

(b) Return to question 5 from Assignment 1. In that question you dealt with the data points
(2.4, 3.1), (4.7, 2.7), (4.9, 4.8), (2.9, 7.6), (8.1, 5.4),
and fit a line parameterized by $\beta_0$ and $\beta_1$. Do this now numerically with a gradient descent algorithm, using the expression for the gradient you developed above. Make sure that the learning rate (step size) $\eta$ is small enough for convergence. Illustrate your numerical experiments.
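
As an illustration of the update rule $x(t+1) = x(t) - \eta \nabla L(x(t))$ from Question 3, the following is a minimal Python sketch for the one-dimensional loss $L(u) = \sum_i (x_i - u)^2$. The function name `gradient_descent_mean`, the three-point sample, the starting point, and the step size are all assumptions chosen for illustration; they are not part of the assignment data, and the sketch is not a model answer.

```python
def gradient_descent_mean(data, eta, x0=0.0, steps=100):
    """Iterate x(t+1) = x(t) - eta * L'(x(t)) for L(u) = sum((x_i - u)^2).

    Returns the full list of iterates x(0), x(1), ..., x(steps).
    """
    x = x0
    path = [x]
    for _ in range(steps):
        # Derivative of L with respect to u, evaluated at the current point.
        grad = -2.0 * sum(xi - x for xi in data)
        x = x - eta * grad
        path.append(x)
    return path


# Hypothetical sample (not from the assignment); its sample mean is 5.0.
sample = [2.0, 4.0, 9.0]
path = gradient_descent_mean(sample, eta=0.1)

print("final iterate:", path[-1])
print("sample mean:  ", sum(sample) / len(sample))
```

For this particular sample and step size the iterates settle at the sample mean. Rerunning with other values of `eta` and plotting `path` against the iteration index is one way to explore how the step size affects the behaviour of the sequence.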