MAST30027: Modern Applied Statistics Assignment 4, 2022. Due: 11:59pm Sunday October 23rd • This assignment is worth 17% of your total mark. • To get full marks, show your working including 1) R commands and outputs you use, 2) mathematics derivation, and 3) rigorous explanation why you reach conclusions or answers. If you just provide final answers, you will get zero mark. • The assignment you hand in must be typed (except for math formulas), and be submitted using LMS as a single PDF document only (no other formats allowed). For math formulas, you can take a picture of them. Your answers must be clearly numbered and in the same order as the assignment questions. • The LMS will not accept late submissions. It is your responsibility to ensure that your assignments are submitted correctly and on time, and problems with online submissions are not a valid excuse for submitting a late or incorrect version of an assignment. • We will mark a selected set of problems. We will select problems worth ≥ 50% of the full marks listed. • If you need an extension, please contact the tutor coordinator before the due date with appropriate justification and supporting documents. Late assignments will only be accepted if you have obtained an extension from the tutor coordinator before the due date. Under no circumstances an assignment will be marked if solutions for it have been released. Please DO NOT email the lecturer for extension request. • Also, please read the “Assessments” section in “Subject Overview” page of the LMS. Data: The file assignment4.txt contains 100 observations simulated from a normal distribution with mean = 5 and standard deviation = 2 by using the following code. > set.seed(30027) > x = rnorm(100, 5, 2) You can read the data as follows. > X = scan(file="assignment4.txt", what=double()) Read 100 items > mean(X) [1] 5.089332 > sqrt(var(X)) [1] 1.998487 Model: we consider a normal model: xi ∼ N(µ, σ2) for i = 1, . . . , 100. Prior: we impose the following prior for mean and variance parameters: p(µ, σ2) ∝ 1 σ2 1 Problem 1: Posterior inference using Gibbs sampling (a) (10 marks) Derive the following conditional distributions. p(µ|σ2, x1, . . . , x100) and p(σ2|µ, x1, . . . , x100). If they are known distributions, write distribution names and their parameters. [ For exam- ple, gamma distribution with shape = ∑ i xi and scale = ∑ i x 2 i ] . (b) (5 marks)Write a code that uses the Gibbs sampling to simulate samples from p(µ, σ2|x1, . . . , x100). Run at least two Gibbs sampling chains with different initial values. Please run with at least 500 iterations. Make a trace plot for each of parameters and see if samples from different chains are mixed well and behave similarly. (c) (7 marks) Using the simulated sample from one chain, for each parameter 1) make a plot that shows empirical (estimated) marginal posterior distribution, 2) estimate marginal posterior mean, and 3) report a 90% credible interval for the marginal posterior distribution. You can find a 90% credible interval in a number of ways. For this assignment, use 5% in each tail. Discard early iterations as a burn-in. You can decide burn-in period from the trace plots in (b). Problem 2: Posterior inference using Metropolis-Hastings (MH) algorithm (a) (15 marks)Write a code that uses the MH algorithm to simulate samples from p(µ, σ2|x1, . . . , x100). For the current values of parameters (µc, σ 2 c ), we propose new values (µn, σ 2 n) as follows. σ 2 n ∼ gamma(shape = 5σ2c , rate = 5) and µn ∼ Normal(mean = µc, variance = σ2n). Run at least two MH chains with different initial values. Please run with at least 10000 iterations. Make a trace plot for each of parameters and see if samples from different chains are mixed well and behave similarly. (b) (7 marks) Repeat (c) in the Problem 1. The application of Variational Inference (VI) to the current model and prior is not straightforward because the ELBO is not well defined with the improper prior (that does not integrate to a finite quantity). Thus, we will consider another prior and apply VI for the posterior inference. Model: we consider the same normal model: xi ∼ N(µ, σ2) for i = 1, . . . , 100. Prior: we impose the following prior for mean and variance parameters: p(µ, σ2) = p(µ|σ2)p(σ2), µ|σ2 ∼ N(µ0, σ2), σ2 ∼ Inverse-Gamma(a0, b0), where the Inverse-Gamma(a0, b0) has the pdf f(x) = ba00 Γ(a0) x−a0−1 exp ( −b0 x ) . If necessary, you can use the following two properties of Inverse-Gamma(a0, b0) without providing derivation. When X ∼ Inverse-Gamma(a0, b0), E[ 1 X ] = a0 b0 , E[logX] = log b0 −Ψ(a0), where Ψ(a0) = d da0 log Γ(a0). 2 Problem 3: Posterior inference using Variational Inference (VI) We will apply VI with the mean-field variational family where q(µ, σ2) = qµ(µ)qσ2(σ 2) and use the CAVI algorithm for optimisation. The CAVI iteratively optimises each factor as follows while holding the other factors fixed: q∗µ(µ) ∝ exp{Eσ2 [log p(µ, σ2, x1, . . . , x100)]}, q∗σ2(σ 2) ∝ exp{Eµ[log p(µ, σ2, x1, . . . , x100)]}, where the expectations Eσ2 and Eµ are taken with respect to q∗σ2(σ 2) and q∗µ(µ), respectively. (a) (10 marks) Derive q∗µ(µ) and q ∗ σ2(σ 2) and write the corresponding distribution names and their parameters. [ For example, for the model and prior we considered in the lecture - see the page 20 of the Variational Inference slides, we derived and presented that q∗µ(µ) is the pdf of N(µ∗, σ2∗) and q∗τ (τ) is the pdf of Gamma(a ∗, b∗), where µ∗ = λ0µ0+nx¯λ0+n , σ 2∗ = 1(λ0+n)Eτ [τ ] , Eτ [τ ] = a ∗ b∗ , a ∗ = n+12 + a0, b ∗ = b0 + ∑ i Eµ[(xi−µ)2] 2 + λ0Eµ[(µ−µ0)2] 2 , Eµ[(xi − µ)2] = σ2∗ + (µ∗)2 − 2xiµ∗ + x2i , and Eµ[(µ− µ0)2] = σ2∗ + (µ∗)2 − 2µ0µ∗ + µ20. ] (b) (10 marks) Derive the ELBO up to constant. (c) (10 marks) Implement the CAVI algorithm and obtain q∗µ(µ) and q ∗ σ2(σ 2) which minimise the KL divergence by applying the implemented algorithm to x1, . . . , xn for µ0 = 0, a0 = 2, b0 = 2. Set the CAVI algorithms to stop when either the number of iterations reaches 100 (max.iter = 100) or the ELBO has changed by less than 0.00001 ( = 0.00001). Run the CAVI algorithm at least two times using different initial values and report q∗µ(µ) and q∗σ2(σ 2) with the highest ELBO. Provide the initial values which lead the reported q∗µ(µ) and q∗σ2(σ 2) and we will probably run our own implementation using the initial values and see if we can reproduce your answer when marking. For each run, check that the ELBO increase at each iteration by plotting them. 3
欢迎咨询51作业君