
 THE UNIVERSITY OF HONG KONG

 DEPARTMENT OF STATISTICS AND ACTUARIAL SCIENCE

ARIN7101 Statistics in Artificial Intelligence (2023 Fall)

Assignment 1, due on October 15

All numerical computation MUST be conducted in Python; attach your Python code.

1. Question 1 (Bayesian inference, variational inference and sampling) Let $y = \{y_1, \dots, y_n\}$ be i.i.d. samples from the normal distribution $N(\mu, \tau^{-1})$. We specify normal-gamma prior distributions on $\mu$ and $\tau$,

$$\mu \mid \tau, \mu_0, \lambda_0 \sim N\!\left(\mu_0, (\lambda_0 \tau)^{-1}\right), \qquad \tau \mid a_0, b_0 \sim \mathrm{Gamma}(a_0, b_0),$$

$$f_{\mathrm{Gamma}}(x \mid a, b) = \frac{b^a}{\Gamma(a)}\, x^{a-1} e^{-bx}.$$

For a pair of random variables $(X, T)$, if $X \mid T \sim N(\mu, (\lambda T)^{-1})$ and $T \sim \mathrm{Gamma}(a, b)$, then $(X, T)$ follows a normal-gamma distribution with parameters $(\mu, \lambda, a, b)$. The joint probability density function of $(X, T)$ has the form

$$f(x, t \mid \mu, \lambda, a, b) = \frac{b^a \sqrt{\lambda}}{\Gamma(a)\sqrt{2\pi}}\, t^{a - \frac{1}{2}}\, e^{-bt} \exp\!\left(-\frac{\lambda t (x - \mu)^2}{2}\right).$$

For the Python programming questions, we set $\mu_0 = 0$, $\lambda_0 = 10$, $a_0 = b_0 = 10$, and the observations $y = \{y_1, \dots, y_n\}$ are stored in Q1y.csv.
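Since part (d) below notes that scipy.stats has no ready-made normal-gamma density, a small helper is useful throughout. A minimal sketch of the log-pdf above (the function name normal_gamma_logpdf is ours):

```python
import numpy as np
from scipy.special import gammaln

def normal_gamma_logpdf(x, t, mu, lam, a, b):
    """Log-density of Normal-Gamma(mu, lam, a, b) at (x, t),
    following the joint pdf f(x, t | mu, lam, a, b) given above."""
    return (a * np.log(b) + 0.5 * np.log(lam)
            - gammaln(a) - 0.5 * np.log(2 * np.pi)
            + (a - 0.5) * np.log(t) - b * t
            - 0.5 * lam * t * (x - mu) ** 2)
```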

(a) Derive the joint prior $p(\mu, \tau)$ and the likelihood function $p(y \mid \mu, \tau)$. Write down the probability density function of the posterior distribution $p(\mu, \tau \mid y)$ (no need to derive the exact distribution).

(b) In fact, for normally distributed data with unknown mean and precision (inverse of variance), the normal-gamma prior is a conjugate prior, and the posterior $p(\mu, \tau \mid y)$ is also a normal-gamma distribution with parameters

$$\left( \frac{\lambda_0 \mu_0 + n\bar{y}}{\lambda_0 + n},\; \lambda_0 + n,\; a_0 + \frac{n}{2},\; b_0 + \frac{1}{2}\sum_{i=1}^{n}(y_i - \bar{y})^2 + \frac{\lambda_0 n (\bar{y} - \mu_0)^2}{2(\lambda_0 + n)} \right),$$

where $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$ is the sample mean.

Derive the full conditional posterior distributions $p(\mu \mid y, \tau)$ and $p(\tau \mid y, \mu)$ (here you need to obtain the exact distributions).
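Once derived, the two full conditionals plug directly into a Gibbs sampler. A minimal skeleton, where draw_mu and draw_tau are hypothetical callables implementing your derived conditionals:

```python
import numpy as np

def gibbs(y, draw_mu, draw_tau, n_iter=5000, seed=0):
    """Alternately draw mu | y, tau and tau | y, mu. draw_mu(y, tau, rng)
    and draw_tau(y, mu, rng) are callables implementing the exact full
    conditionals derived in this part (left to the exercise)."""
    rng = np.random.default_rng(seed)
    mu, tau = 0.0, 1.0                 # arbitrary starting point
    draws = np.empty((n_iter, 2))
    for s in range(n_iter):
        mu = draw_mu(y, tau, rng)      # a normal draw
        tau = draw_tau(y, mu, rng)     # a gamma draw
        draws[s] = mu, tau
    return draws
```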

(c) Write down the probability density function of the posterior predictive distribution $p(y^* \mid y)$. Describe how to approximate $p(y^* \mid y)$ via the simple Monte Carlo approach.
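As a sketch of the simple Monte Carlo idea: average the normal likelihood over posterior draws $(\mu^{(s)}, \tau^{(s)})$, e.g. the draws array produced by the Gibbs skeleton above (names are ours):

```python
import numpy as np
from scipy.stats import norm

def posterior_predictive_pdf(y_star, draws):
    """Monte Carlo estimate: p(y*|y) ~= (1/S) sum_s N(y* | mu_s, 1/tau_s)."""
    mu_s, tau_s = draws[:, 0], draws[:, 1]
    # scipy's norm is parameterized by scale = standard deviation = tau**(-1/2)
    return norm.pdf(y_star, loc=mu_s, scale=1.0 / np.sqrt(tau_s)).mean()
```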


(d) The mode of the Normal-Gamma$(\mu, \lambda, a, b)$ distribution is $\left(\mu,\; \frac{a - 1/2}{b}\right)$. Consider the Laplace approximation to the joint posterior $p(\mu, \tau \mid y)$:

$$\ln \pi(\theta \mid y) \approx \ln \pi(\theta_{\mathrm{MAP}} \mid y) - \frac{1}{2}(\theta - \theta_{\mathrm{MAP}})^{\top} A\, (\theta - \theta_{\mathrm{MAP}}) = \ln \tilde{\pi}(\theta \mid y), \qquad A = -\nabla\nabla \ln \pi(\theta \mid y)\big|_{\theta = \theta_{\mathrm{MAP}}},$$

where $\theta = (\mu, \tau)^{\top}$.

Derive the approximated posterior distribution $\tilde{\pi}(\mu, \tau \mid y)$. Draw two contour plots, of $\pi(\mu, \tau \mid y)$ and $\tilde{\pi}(\mu, \tau \mid y)$ respectively, in Python. (The Python scipy.stats package does not provide a direct function for the pdf of the normal-gamma distribution; you need to compute it yourself.)
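One possible route to the two contour plots, assuming normal_gamma_logpdf from the earlier sketch; by part (b) the posterior is itself normal-gamma, and the parameters mu_n, lam_n, a_n, b_n below are hypothetical stand-ins for the values you obtain there:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal

# Hypothetical posterior parameters; replace with the part (b) values.
mu_n, lam_n, a_n, b_n = 0.1, 110.0, 60.0, 35.0

# By conjugacy, ln pi(theta | y) is the Normal-Gamma log-pdf at the
# posterior parameters; theta_map follows the stated mode formula.
log_post = lambda th: normal_gamma_logpdf(th[0], th[1], mu_n, lam_n, a_n, b_n)
theta_map = np.array([mu_n, (a_n - 0.5) / b_n])

def neg_hessian(f, x0, h=1e-4):
    """Finite-difference estimate of A = -grad grad f at x0."""
    d = len(x0)
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            ei, ej = np.eye(d)[i] * h, np.eye(d)[j] * h
            H[i, j] = (f(x0 + ei + ej) - f(x0 + ei - ej)
                       - f(x0 - ei + ej) + f(x0 - ei - ej)) / (4 * h ** 2)
    return -H

A = neg_hessian(log_post, theta_map)
laplace = multivariate_normal(theta_map, np.linalg.inv(A))

mu_g, tau_g = np.meshgrid(np.linspace(mu_n - 0.3, mu_n + 0.3, 200),
                          np.linspace(0.5, 3.0, 200))
exact = np.exp(normal_gamma_logpdf(mu_g, tau_g, mu_n, lam_n, a_n, b_n))
approx = laplace.pdf(np.dstack([mu_g, tau_g]))

fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
for ax, z, title in zip(axes, (exact, approx),
                        (r'$\pi(\mu,\tau|y)$', r'$\tilde{\pi}(\mu,\tau|y)$')):
    ax.contour(mu_g, tau_g, z)
    ax.set_title(title)
    ax.set_xlabel(r'$\mu$')
axes[0].set_ylabel(r'$\tau$')
plt.show()
```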

(e) Assume a mean-field variational approximation for the joint posterior, $q(\mu, \tau) = q_\mu(\mu) q_\tau(\tau)$. Find the optimal mean-field factors $q_\mu^*$ and $q_\tau^*$. Write down the procedure for iteratively updating the parameters of $q_\mu^*$ and $q_\tau^*$, and implement it in Python to obtain the estimated parameters of $q_\mu^*$ and $q_\tau^*$ (set the convergence criterion $\epsilon = 10^{-4}$).

(Hint: $q_j(\theta_j) \propto \exp\{\mathbb{E}_{q_{i \neq j}}[\ln P(D, \theta)]\}$.)
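A skeleton of the coordinate-ascent loop is sketched below; the update callables and the flat parameter layout are assumptions, standing in for the closed-form updates you derive from the hint:

```python
import numpy as np

def cavi(y, update_q_mu, update_q_tau, eps=1e-4, max_iter=1000):
    """Coordinate-ascent VI for q(mu, tau) = q_mu(mu) q_tau(tau).
    update_q_mu / update_q_tau are callables implementing the derived
    closed-form updates; params is a flat array of variational parameters."""
    params = np.array([0.0, 1.0, 1.0, 1.0])  # hypothetical initialization
    for _ in range(max_iter):
        old = params.copy()
        params = update_q_mu(y, params)   # refresh q_mu's parameters
        params = update_q_tau(y, params)  # refresh q_tau's parameters
        if np.max(np.abs(params - old)) < eps:  # epsilon = 1e-4 criterion
            break
    return params
```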

2. Question 2 (Regularization)

Consider the simple linear regression $y_i = \beta^{\top} x_i + \epsilon_i$, $\epsilon_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma_\epsilon^2)$, $i = 1, \dots, n$, where $n$ is the number of samples, and the residual sum of squares loss,

$$\mathrm{RSS}(\beta) = \sum_{i=1}^{n} \left(y_i - \beta^{\top} x_i\right)^2 = (y - X\beta)^{\top}(y - X\beta).$$

(a) Under the assumption that $X^{\top}X = \mathrm{diag}(\sigma_1^2, \dots, \sigma_p^2)$, where $p$ is the number of covariates in $X$, derive the closed-form formula for the LASSO regression,

$$\hat{\beta}_{\mathrm{LASSO}} = \underset{\beta}{\arg\min}\; \mathrm{RSS}(\beta) + \lambda \lVert \beta \rVert_1,$$

as a function of $X$, $y$, $\lambda$ and $(\sigma_1^2, \dots, \sigma_p^2)$ (do not include $\hat{\beta}_{\mathrm{OLS}}$ in your final results).
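The closed form is coordinate-wise and built from the soft-thresholding operator; a minimal sketch of that operator (how it combines with $X$, $y$, $\lambda$ and the $\sigma_j^2$ is the point of the exercise):

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding S_t(z) = sign(z) * max(|z| - t, 0),
    the proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)
```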

(b) The datasets q2 train.csv and q2 test.csv store age, weight, height, and several body circumference measurements for 252 men. Use 'brozek' as the response variable (y) and the other variables as predictors (x) in the linear regression model.

Normalize the training and test datasets using the sample mean and variance estimated from the training dataset. Set the learning rate of the proximal gradient method to $10^{-4}$, with convergence criterion $\epsilon = 10^{-7}$. Plot the estimated coefficients for ridge regression and LASSO regression, respectively, against $\lambda \in$ np.linspace(0, 350, 1001).
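For reference, a minimal proximal-gradient (ISTA-style) loop for the LASSO part, reusing soft_threshold from the sketch in (a); this is one plausible reading of the settings above, not necessarily the intended implementation:

```python
import numpy as np

def lasso_proximal_gradient(X, y, lam, lr=1e-4, eps=1e-7, max_iter=100000):
    """Minimize RSS(beta) + lam * ||beta||_1: gradient step on RSS,
    then a soft-thresholding (proximal) step for the l1 penalty."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        grad = -2.0 * X.T @ (y - X @ beta)          # gradient of RSS
        beta_new = soft_threshold(beta - lr * grad, lr * lam)
        if np.max(np.abs(beta_new - beta)) < eps:   # convergence check
            return beta_new
        beta = beta_new
    return beta
```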

(c) Given the LASSO regression results in (b), what is the range of $\lambda$ if you want to include 4 predictors in the linear regression model? Which four predictors would you choose?


(d) Find the optimal $\lambda \in$ np.linspace(0, 350, 1001) which yields the lowest loss on the test dataset for the LASSO regression. Which predictors are included in the model at the optimal $\lambda$?

3. Question 3 (Multiple Hypothesis Testing)

The dataset q3 pvalues.csv stores two hundred p-values from multiple tests. Consider conducting large-scale hypothesis testing to control the FWER or the FDR.

(a) Conduct Bonferroni's correction, Holm's procedure, and the Benjamini-Hochberg procedure under $\alpha = 0.05$ and $q = 0.05$, then compare the results.
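One way to run all three procedures is statsmodels' multipletests; a minimal sketch, assuming the p-values sit in the first column of the CSV:

```python
import pandas as pd
from statsmodels.stats.multitest import multipletests

# Assumes the p-values are stored in the first column of the file.
pvals = pd.read_csv('q3 pvalues.csv').iloc[:, 0].to_numpy()

for method in ('bonferroni', 'holm', 'fdr_bh'):
    reject, _, _, _ = multipletests(pvals, alpha=0.05, method=method)
    print(method, 'rejections:', reject.sum())
```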

(b) For Bonferroni's correction, Holm's procedure, and the Benjamini-Hochberg procedure separately, overlay the non-rejected p-values, the rejected p-values, and the corresponding criterion curve in a single plot. Denote the non-rejected and rejected p-values by different colors. (You can use a log scale for demonstration; the plots in our lecture notes are good examples.)

