程序代写案例-GU4206

欢迎使用51辅导，51作业君孵化低价透明的学长辅导平台，服务保持优质，平均费用压低50%以上！ 51fudao.top

GU4206-GR5206 Final Exam Practice Problems
Student (UNI)
Part 1: Warm Up
Recall the strikes dataset from Week 8 lecture notes. This data set, compiled by Bruce Western has information
on 18 countries over 35 years.
strikes <- read.csv("strikes.csv", as.is = TRUE)
dim(strikes)
## [1] 625 8
head(strikes, 3)
## country year strike.volume unemployment inflation left.parliament
## 1 Australia 1951 296 1.3 19.8 43
## 2 Australia 1952 397 2.2 17.2 43
## 3 Australia 1953 360 2.5 4.3 43
## centralization density
## 1 0.3748588 NA
## 2 0.3751829 NA
## 3 0.3745076 NA
In this problem, you will use the variable unemployment. The goal of this exercise is to take the following
inelegant nested-loop and condense it to as few of lines of code as possible. For full credit, perform the same
task in no more that two lines of code.
countries <- unique(strikes$country)
unemployment_statistic <- rep(NA,length(countries))
counter <- 1
for (c in countries) {
one_country <- strikes[strikes$country==c,]
unemployment_temp <- one_country[,c("unemployment")]
stat <- 0
for (i in 1:length(unemployment_temp)){
stat <- stat + unemployment_temp[i]/length(unemployment_temp)
}
unemployment_statistic[counter] <- stat
counter <- counter+1
}
data.frame(country=countries,unemployment=unemployment_statistic)
## country unemployment
## 1 Australia 3.5057143
## 2 Austria 2.5400000
## 3 Belgium 3.6466667
1
## 4 Canada 6.0428571
## 5 Denmark 5.7114286
## 6 Finland 2.5714286
## 7 France 3.1828571
## 8 Germany 3.1171429
## 9 Ireland 7.7714286
## 10 Italy 6.7257143
## 11 Japan 1.6028571
## 12 Netherlands 3.6914286
## 13 New.Zealand 1.0028571
## 14 Norway 1.4285714
## 15 Sweden 2.1371429
## 16 Switzerland 0.3285714
## 17 UK 3.4514286
## 18 USA 5.5428571
Write your solution below.
## Solution goes here -------
Part 2: Simulation
The goal of this section is to study the random variable
X = U1 + U2,
where U1 and U2 are independent uniform random variables, each over the unit interval. This section of the
final exam has five major components; simulate X = U1 + U2 directly, plot the pdf of X = U1 + U2, use the
inverse-transform method to simulate X, use the accept-reject method to simulate X, and use the simulated
random variable X to compute a Monte-Carlo integral of the function b(x) = sin(sin(x)), over [0, 2].
Perform the following tasks:
Part 2.i)
Simulate 10,000 cases of the random variable X = U1 + U2 directly by using runif() for U1 and U2 and
then adding the resulting vectors. Display the first 10 simulated values of X and use ggplot to construct a
histogram of the distribution of X. Use a Base R plot for partial credit.
## Solution goes here -------
Part 2.ii)
Using ggplot, plot the pdf of X over the range [−1, 3]. The pdf is given by the following piecewise function:
f(x) =

x 0 ≤ x < 1
2− x 1 ≤ x < 2
0 otherwise
For guidance on defining and plotting a piecewise function, please see Part 2.v.
2
## Solution goes here -------
Part 2.iii)
Use the inverse-transform method to simulate 10,000 draws of X. For full credit, perform the following
tasks:
a) Define a R function called F.cdf.inv which is the inverse of the cdf F (x). Test your function using the
points .1,.45,.7. More specifically, run the code F.cdf.inv(F.cdf(c(.1,.45,.7))), where F.cdf() is the
cdf.
b) Use the inverse-transform method to simulate 10,000 draws of X and display the first 10 simulated
draws.
c) Plot the 10,000 simulated cases using ggplot, i.e., plot a histogram. Also overlay the pdf f(x) on the
the histogram. For partial credit, use Base R graphics.
Note that the cumulative distribution function (cdf) of f(x) is
F (x) =

0 x < 0
1
2x
2 0 ≤ x < 1
− 12x2 + 2x− 1 1 ≤ x < 2
1 x ≥ 2
Note that the function F (x) appears to be non-invertible because of the quadratic terms. However, the
function is invertible over the domain of each piecewise interval.
Hint:
y = ax2 + bx+ c −→ x = − b2a ±
√
1
a
(y − c) +
(
b
2a
)2
Part 2.iii.a)
## Solution goes here -------
Part 2.iii.b)
## Solution goes here -------
Part 2.iii.c)
## Solution goes here -------
Part 2.iv)
Use the accept-reject method to simulate 10,000 draws of X. For full credit, perform the following tasks:
a) Clearly identify an easy to simulate distribution g(x).
b) Identify a suitable envelope function e(x) that satisfies
f(x) ≤ e(x) = g(x)/α, where 0 < α < 1.
c) Simulate 10,000 draws from the target distribution f(x) using the accept-reject algorithm. Display
the first 10 simulated values.
3
d) Using ggplot or Base R, construct a histogram of the simulated distribution with the pdf f(x)
overlayed on the plot.
Part 2.vi.a)
Part 2.vi.b)
## Solution goes here -------
Part 2.iv.c)
## Solution goes here -------
Part 2.iv.d)
## Solution goes here -------
Part 2.v)
The goal of this section is to use the simulated cases coming from target distribution f(x) to numerically
integrate b(x) via Monte-Carlo. You must draw directly from f(x) by using any of the previous three
methods. The integral of interest is:∫ 2
0
b(x)dx, where b(x) =
{
sin(sin(x)) 0 < x < 2
0 otherwise
To visualize b(x), see the code below.
x <- seq(-1,3,by=.01)
n.x <- length(x)
b <- function(x) {
out <- ifelse((x<=0)|(x>2),0,sin(sin(x)))
return(out)
}
plot_data <- data.frame(x=x,b=b(x))
library(ggplot2)
ggplot(data = plot_data) +
geom_abline(slope=0,intercept=0,linetype = "dashed")+
geom_line(mapping = aes(x = x, y = b),color="red")+
labs(title = expression(sin(sin(x))),y="b(x)")
4
0.0
0.2
0.4
0.6
0.8
−1 0 1 2 3
x
b(x
)
sin(sin(x))
## Solution goes here -----
Part 3: MLE and Bayes’ Posterior MCMC
Data Generating Process and Dataset:
Consider the simple linear regression model:
Yi = β0 + β1xi + i,
where i = 1, 2, . . . , 100 and i
iid∼ N(0, σ2 = 4). Note that we are interested in estimating β0, β1 and we are
assuming known variance σ2 = 4. The dataset of interest is MLE_MCMC_Problem.csv. The scatter
plot follows:
MCMC.data <- read.csv("MLE_MCMC_Problem.csv")
plot(MCMC.data$x,MCMC.data$y,xlab="x",ylab="y")
5
3 4 5 6 7
4
6
8
10
12
14
16
x
y
Estimation Procedure 1: Maximum Likelihood
The goal of this section is to estimate parameters β0 and β1 using maximum likelihood. The likelihood
function is
L(β|y1, . . . , yn) = L(β0, β1|y1, . . . , yn) =
100∏
i=1
f(yi|β0, β1).
The function f(yi|β0, β1) comes directly from our simple linear regression model.
Perform the following tasks:
Part 3.i)
Define the regular likelihood function L(β0, β1|y1, . . . , y100) in R. This should be a function of the vector(
β0 β1
)
. Name the likelihood function my.likeliood and evaluate your likelihood at the point c(0,0).
## Solution goes here
# my.likeliood(beta=c(0,0))
Part 3.ii)
Define the negative log-likelihood function −`(β0, β1|y1, . . . , y100) in R. This should be a function of the
vector
(
β0 β1
)
. Name the likelihood function my.NL.likeliood. Evaluate your negative log-likelihood at
the point c(0,0).
## Solution goes here
# my.NL.likelihood(beta=c(0,0))
6
Part 3.iii)
Check that your answer from problem (2) matches problem (1) by taking evaluating:
#exp(-1*my.NL.likelihood(beta=c(0,0)))
# Compare
#my.likelihood(beta=c(0,0))
Part 3.iv)
Use the nlm() function to compute the maximum likelihood estimates of β0, β1. Use the starting value
p=c(0,0). Compare your answer to the estimated coefficients produced from the linear model function lm().
Note: the least squares and maximum likelihood estimates are analytically the same in this scenario.
## Solution goes here
Estimation Procedure 2: Bayes’ MCMC
The goal of this section is to estimate parameters β0 and β1 using Bayesian techniques. Students are required
to preform the famous MCMC approach to simulate from target posterior distribution pi(β0, β1|y1, . . . , y100).
The empirical Bayes’ estimators βˆ0,B and βˆ1,B can then be computed from the simulated posterior. Note
that both posteriors are simulated simultaneously. To solve this problem, we fist choose a prior distribution
pi(β0, β1) and we must also set up the likelihood function
L(θ|y1, . . . , yn) = L(β0, β1|y1, . . . , yn) =
100∏
i=1
f(yi|β0, β1).
The likelihood was solved in problem (1).
Choice of Prior:
In ordinary linear regression, the objective function for least squares or maximum likelihood has no constraints
placed on our parameters. Thus the unknown parameters β0, β1 can theoretically take on any real numbers.
Consequently, a first natural choice of our prior should be a joint distribution that allows for real-valued
coefficients. In our setting, choose the prior distribution pi(β0, β1) to be bivariate normal with pdf defined by:
pi(β) = pi(β0, β1) = (2pi)−1 det(Σ)−1/2 exp
(
− 12(β − µ)
TΣ−1(β − µ)
)
, where β ∈ R2,
and
β =
(
β0 β1
)T
, µ =
(
µ0 µ1
)T
, Σ =
(
σ20 ρσ0σ1
ρσ0σ1 σ
2
1
)
.
This prior has parameters µ0, µ1, σ0, σ1, ρ. The following code, which is hidden in the knitted markdown pdf,
is the joint pdf of the bivariate normal. The density is plotted for parameter values µ0 = 0, µ1 = 0, σ0 =
1, σ1 = 1, ρ = 0.
Note: the code displayed (or hidden) below will not be used in the MCMC algorithm. Recall that the
proposal density, in our case g(β0, β1) = pi(β0, β1), cancels out in the Metropolis-Hastings ratio. This code is
included to give students a visual representation of our bivariate normal prior.
7
u1
−2
0
2
4u2
−2
0
2
4
f(u)
0.0
0.2
MCMC Setup
Suppose that a pilot study provided some prior knowledge to our application. Namely, the true intercept
should be near zero and the true slope is likely between 1 and 3.
Choose the prior pi(β0, β1) as bivariate normal with parameters µ0 = 0, µ1 = 2, σ0 = .25, σ1 = .5, ρ = 0.
To simulate from proposal pi(β0, β1), use the following function sim.bivariate.norm.
library(MASS)
sim.bivariate.norm <- function(n,mu0=0,mu1=0,sigma0=1,sigma1=1,rho=0) {
mu <- c(mu0,mu1)
Sigma <- matrix(c(sigma0^2,sigma0*sigma1*rho,sigma0*sigma1*rho,sigma1^2),
nrow=2)
return(mvrnorm(n = n, mu=mu, Sigma=Sigma))
}
Perform the following tasks:
Part 3.v)
Simulate 1 case and 10 cases from our prior pi(β0, β1). Recall the choice of parameters: µ0 = 0, µ1 = 2, σ0 =
.25, σ1 = .5, ρ = 0.
## Solution goes here
Part 3.vi)
Run the MCMC algorithm using the bivariate normal prior with µ0 = 0, µ1 = 2, σ0 = .25, σ1 = .5, ρ = 0. Run
10,000 MCMC iterations and display the first 20 simulated cases of your simulated chain β(t).
## Solution goes here
8
Part 3.vii)
Construct traceplots (or lineplots) of the simulated chains as a function of their iterations. There should be
two plots, one for β0 and one for β1.
## Solution goes here
Part 3.viii)
Remove the first 1000 simulated cases (10% burn-in) and display the simulated posterior pi(β0, β1|y1, . . . , y100).
To accomplish this, simply plot two histograms, one for β0 and one for β1. Also calculate the posterior Bayes’
estimators for both β0 and β1.
## Solution goes here
Part 3.ix)
Based on our prior information (β0 ≈ 0 and 1 < β1 < 3), choose a prior distribution that exhibits poor
mixing behavior. Note that you should still choose a bivariate normal for prior pi(β0, β1). Clearly specify
your choice of the prior and run 10,000 iterations of the MCMC algorithm. Also provide traceplots showing
the chain’s mixing behavior.
Solution
## Solution goes here
Part 3.x)
Remove the first 1000 simulated cases (10% burn-in) and display the simulated posterior pi(β0, β1|y1, . . . , y100).
To accomplish this, simply plot two histograms, one for β0 and one for β1. Also calculate the posterior Bayes’
estimators for both β0 and β1.
## Solution goes here
Part 3.xii)
Compare your Bayes estimators (for both priors) to the least squares estimator (or MLE). To assess the
quality of your estimators, compute the Euclidean norm:
‖βˆ − β‖.
Let the true coefficients be defined as β =
(
0 2
)T .
## Solution goes here
9

欢迎咨询51作业君