
STAT 8670 Project
Wei Gao∗
Dept of Math and Stat, Georgia State University, Atlanta, GA 30302-4110, USA
1 Introduction
Bayesian analysis is a formal method for combining prior beliefs with observed information, and it can fit very realistic but complicated models. In this project we carry out a Bayesian data analysis using Markov chain Monte Carlo to investigate whether the difference between male income and female income in the USA increased or decreased from 2004 to 2009. The data sets we use are the median earnings of full-time, year-round workers aged 16 and older, by sex and by state, for the years 2004 through 2009, downloaded from www.census.gov. For each year, we use the difference (in thousands of dollars) between men's earnings and women's earnings in each state as our observations, so there are 51 observations for each year.
Now we formulate our questions as:
• Is the mean of differences in year 2005 greater than the mean of differences in year 2004?
If so, how much?
• Is the mean of differences in year 2007 greater than the mean of differences in year 2006?
If so, how much?
• Is the mean of differences in year 2009 greater than the mean of differences in year 2008?
If so, how much?
2 Data Analysis
2.1 Comparison of differences in 2004 and 2005
The histograms of differences in 2004 and 2005 are given below.
∗Email: [email protected]
The conclusion is not obvious from the histograms alone. We assume that the differences in each year follow a normal distribution N(µ, σ²): the 2004 differences come from N(µ1, σ1²) and the 2005 differences come from N(µ2, σ2²). So our parameter vector is θ = (µ1, σ1, µ2, σ2).
We will construct a likelihood function and a prior and then multiply them together to get the posterior P(θ | Data).
Let x denote the differences in 2004 and y the differences in 2005. Then the likelihood is given by the formula

P(Data | θ) = P(x | θ) P(y | θ) = ∏_{i=1}^{51} N(x_i | µ1, σ1²) · ∏_{i=1}^{51} N(y_i | µ2, σ2²),

where the second equality holds because x_i and y_j are independent for i = 1, 2, . . . , 51 and j = 1, 2, . . . , 51. We can write the likelihood function according to this formula.
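For illustration, the likelihood can be coded directly from this formula; a minimal R sketch (matching the full listing in Section 5, where pre and pos hold the difference vectors x and y):

```r
# Likelihood of th = (mu1, sig1, mu2, sig2) given the two samples.
# pre and pos are the vectors of state-level differences (x and y);
# each has 51 entries in the real analysis.
lik <- function(th) {
  mu1 <- th[1]; sig1 <- th[2]; mu2 <- th[3]; sig2 <- th[4]
  prod(dnorm(pre, mean = mu1, sd = sig1)) *
    prod(dnorm(pos, mean = mu2, sd = sig2))
}
```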
Then we need to decide on a prior density for θ = (µ1, σ1, µ2, σ2) and write it as a function. The basic idea is to choose priors for µ1 and µ2 that do not prejudge which is bigger, and that are roughly uniform over all even remotely plausible values, with the aim of letting the data produce any interesting features in the posterior distribution. We choose the distribution of µ1 as N(9.7, 9.7²) and the distribution of µ2 as N(10, 10²), where 9.7 is the sample mean of x and 10 is the sample mean of y. We choose such wide distributions because we want the prior as flat as possible; otherwise the prior would dominate the posterior. We then choose the distribution of σ1 as exponential with mean 9.7 and the distribution of σ2 as exponential with mean 10, and take all four parameters to be independent a priori. The prior is then given by
P(µ1, σ1, µ2, σ2) = P(µ1) P(σ1) P(µ2) P(σ2)
= (2π · 9.7²)^{−1/2} e^{−(µ1 − 9.7)² / (2 · 9.7²)} · (1/9.7) e^{−σ1/9.7} · (2π · 10²)^{−1/2} e^{−(µ2 − 10)² / (2 · 10²)} · (1/10) e^{−σ2/10}.
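This prior translates directly into R; the hard check returns density 0 whenever a proposed standard deviation is non-positive (a sketch matching the listing in Section 5):

```r
# Prior density for th = (mu1, sig1, mu2, sig2): independent normals
# on the means and exponentials on the standard deviations.
prior <- function(th) {
  mu1 <- th[1]; sig1 <- th[2]; mu2 <- th[3]; sig2 <- th[4]
  if (sig1 <= 0 || sig2 <= 0) return(0)  # sigmas must be positive
  dnorm(mu1, 9.7, 9.7) * dnorm(mu2, 10, 10) *
    dexp(sig1, rate = 1/9.7) * dexp(sig2, rate = 1/10)
}
```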
Next, we multiply the prior times the likelihood to get the posterior. We will run a Markov chain using the Metropolis algorithm for 10000 iterations to simulate a sample. First we choose a starting value θ0 = (9.7, 9.7, 10, 10). Given the current state, we propose a "candidate" move, evaluate the posterior at the candidate and at the current state, and take the ratio; the candidate is accepted with probability min(1, ratio), otherwise the chain stays where it is. We record our results in a matrix with 4 columns and 10000 rows: the first row holds the starting value, and each successive row records the next θ as we run the chain.
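The update step just described can be sketched as follows. To keep the sketch self-contained, a stand-in target density is used in place of the actual posterior; in the project, post(th) = prior(th) * lik(th), as in Section 5:

```r
# Stand-in unnormalized target; the project uses post(th) = prior(th) * lik(th).
post <- function(th) prod(dnorm(th, mean = 10, sd = 2))

nit <- 10000
results <- matrix(0, nrow = nit, ncol = 4)
th <- c(9.7, 9.7, 10, 10)           # starting value theta_0
results[1, ] <- th
for (it in 2:nit) {
  cand <- th + rnorm(4, sd = 0.3)   # random-walk "candidate" move
  ratio <- post(cand) / post(th)    # ratio of posterior densities
  if (runif(1) < ratio) th <- cand  # accept with probability min(1, ratio)
  results[it, ] <- th               # record the current state
}
```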
Now we have a collection of sampled parameter vectors. Let us first look at how the chain ran. The first few hundred iterations appear noticeably influenced by our starting values, so we discard the first 200 iterations as burn-in and base our inferences on the remaining draws rather than on the whole results matrix.
Our estimated posterior probability that µ1 − µ2 < 0 is about 0.81. Thus with fairly high posterior probability the mean of the differences in year 2005 is greater than the mean of the differences in year 2004.
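The 0.81 figure is a simple Monte Carlo estimate: the fraction of retained posterior draws in which µ1 − µ2 < 0. A toy illustration (with made-up draws, not the actual chain output):

```r
# Estimate P(mu1 - mu2 < 0 | data) as the fraction of posterior draws
# with mu1 below mu2 (toy draws for illustration only).
mu1s <- c(9.6, 9.8, 9.5, 10.1)
mu2s <- c(10.2, 9.9, 10.4, 9.7)
mean(mu1s - mu2s < 0)  # 3 of 4 draws satisfy mu1 < mu2, so 0.75
```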
2.2 Comparison of differences in 2006 and 2007
The histograms of differences in 2006 and 2007 are given below.
The conclusion is again not obvious from the histograms alone. We assume that the differences in each year follow a normal distribution: the 2006 differences come from N(µ1, σ1²) and the 2007 differences come from N(µ2, σ2²). So our parameter vector is again θ = (µ1, σ1, µ2, σ2).
To get the posterior, we again construct a likelihood function and a prior and multiply them together. The likelihood function is exactly the same as in the previous subsection. For the prior on θ = (µ1, σ1, µ2, σ2), we choose the distribution of µ1 as N(9.9, 9.9²) and the distribution of µ2 as N(10.5, 10.5²), where 9.9 is the sample mean of the differences in 2006 and 10.5 is the sample mean of the differences in 2007. We then choose the distribution of σ1 as exponential with mean 9.9 and the distribution of σ2 as exponential with mean 10.5, and multiply prior times likelihood to get the posterior.
We again run a Markov chain using Metropolis for 10000 iterations to simulate a sample, this time with starting value θ0 = (9.9, 9.9, 10.5, 10.5). The results are shown below.
The first few hundred iterations again appear noticeably influenced by the starting values, so we discard the first 200 iterations as burn-in and base our inferences on the remaining draws rather than on the whole results matrix.
Our estimated posterior probability that µ1 − µ2 < 0 is about 0.94. Thus with high posterior probability the mean of the differences in year 2007 is greater than the mean of the differences in year 2006.
2.3 Comparison of differences in 2008 and 2009
The histograms of differences in 2008 and 2009 are given below.
The conclusion is again not obvious from the histograms alone. We assume that the differences in each year follow a normal distribution: the 2008 differences come from N(µ1, σ1²) and the 2009 differences come from N(µ2, σ2²). So our parameter vector is again θ = (µ1, σ1, µ2, σ2).
To decide the prior density for θ = (µ1, σ1, µ2, σ2), we choose the distribution of µ1 as N(10.7, 10.7²) and the distribution of µ2 as N(10.5, 10.5²), where 10.7 is the sample mean of the differences in 2008 and 10.5 is the sample mean of the differences in 2009. We then choose the distribution of σ1 as exponential with mean 10.7 and the distribution of σ2 as exponential with mean 10.5, and multiply prior times likelihood to get the posterior.
We again run a Markov chain using Metropolis for 10000 iterations to simulate a sample, this time with starting value θ0 = (10.7, 10.7, 10.5, 10.5). The results are shown below.
The first few hundred iterations again appear noticeably influenced by the starting values, so we discard the first 200 iterations as burn-in and base our inferences on the remaining draws rather than on the whole results matrix.
Our estimated posterior probability that µ1 − µ2 < 0 is about 0.36. Thus the mean of the differences in year 2009 is more likely than not less than the mean of the differences in year 2008 (posterior probability about 0.64), though the evidence is weaker here than in the previous two comparisons.
3 Conclusion
According to our discussion in section 2, we get the conclusion that
• The mean of differences in year 2005 is greater than the mean of differences in year 2004.
• The mean of differences in year 2007 is greater than the mean of differences in year 2006.
• The mean of differences in year 2009 is less than the mean of differences in year 2008.
Thus, overall, the difference between male income and female income increased at the beginning of the period and then decreased.
4 Reference
Dr. Jing Zhang’s lecture notes.
5 R Code
# Code for Section 2.1
dat = read.table(file="income1.txt", header=TRUE, sep=",")
summary(dat)
dif = (dat[,2] - dat[,3]) / 1000
pre = dif[dat[,1] == 1]
pos = dif[dat[,1] == 2]
mean(pre)
mean(pos)
### draw a picture
xlim = c(min(dif), max(dif))
par(mfrow=c(2,1))
hist(pre, 100, col="red", xlim=xlim)
hist(pos, 100, col="red", xlim=xlim)
### likelihood function
lik = function(th){
  mu1 = th[1]; sig1 = th[2]; mu2 = th[3]; sig2 = th[4]
  prod(dnorm(pre, mean=mu1, sd=sig1)) * prod(dnorm(pos, mean=mu2, sd=sig2))
}
### prior function
prior = function(th){
  mu1 = th[1]; sig1 = th[2]; mu2 = th[3]; sig2 = th[4]
  if (sig1 <= 0 | sig2 <= 0) return(0)
  dnorm(mu1, 9.7, 9.7) * dnorm(mu2, 10, 10) * dexp(sig1, rate=1/9.7) * dexp(sig2, rate=1/10)
}
### posterior function
post = function(th){ prior(th) * lik(th) }
# Starting values
mu1 = 9.7; sig1 = 9.7; mu2 = 10; sig2 = 10
th0 = c(mu1, sig1, mu2, sig2)
# Run the MCMC (Metropolis method):
nit = 10000
results = matrix(0, nrow=nit, ncol=4)
th = th0
results[1,] = th0
for (it in 2:nit){
  cand = th + rnorm(4, sd=.3)
  ratio = post(cand) / post(th)
  if (runif(1) < ratio) th = cand
  results[it,] = th
}
# Take a peek at what we got
edit(results)
par(mfrow=c(4,1))
plot(results[,1])
plot(results[,2])
plot(results[,3])
plot(results[,4])
# Discard the first 200 iterations as burn-in
res = results[201:10000,]
mu1s = res[,1]
sig1s = res[,2]
mu2s = res[,3]
sig2s = res[,4]
par(mfrow=c(4,1))
plot(mu1s)
plot(sig1s)
plot(mu2s)
plot(sig2s)
par(mfrow=c(2,1))
plot(mu1s - mu2s)
hist(mu1s - mu2s)
mean(mu1s - mu2s < 0)
# [1] 0.8119388
# Code for Section 2.2
dat = read.table(file="income2.txt", header=TRUE, sep=",")
summary(dat)
dif = (dat[,2] - dat[,3]) / 1000
pre = dif[dat[,1] == 1]
pos = dif[dat[,1] == 2]
mean(pre)
mean(pos)
### draw a picture
xlim = c(min(dif), max(dif))
par(mfrow=c(2,1))
hist(pre, 100, col="red", xlim=xlim)
hist(pos, 100, col="red", xlim=xlim)
### likelihood function
lik = function(th){
  mu1 = th[1]; sig1 = th[2]; mu2 = th[3]; sig2 = th[4]
  prod(dnorm(pre, mean=mu1, sd=sig1)) * prod(dnorm(pos, mean=mu2, sd=sig2))
}
### prior function
prior = function(th){
  mu1 = th[1]; sig1 = th[2]; mu2 = th[3]; sig2 = th[4]
  if (sig1 <= 0 | sig2 <= 0) return(0)
  dnorm(mu1, 9.9, 9.9) * dnorm(mu2, 10.5, 10.5) * dexp(sig1, rate=1/9.9) * dexp(sig2, rate=1/10.5)
}
### posterior function
post = function(th){ prior(th) * lik(th) }
# Starting values
mu1 = 9.9; sig1 = 9.9; mu2 = 10.5; sig2 = 10.5
th0 = c(mu1, sig1, mu2, sig2)
# Run the MCMC (Metropolis method):
nit = 10000
results = matrix(0, nrow=nit, ncol=4)
th = th0
results[1,] = th0
for (it in 2:nit){
  cand = th + rnorm(4, sd=.5)
  ratio = post(cand) / post(th)
  if (runif(1) < ratio) th = cand
  results[it,] = th
}
# Take a peek at what we got
edit(results)
par(mfrow=c(4,1))
plot(results[,1])
plot(results[,2])
plot(results[,3])
plot(results[,4])
# Discard the first 200 iterations as burn-in
res = results[201:10000,]
mu1s = res[,1]
sig1s = res[,2]
mu2s = res[,3]
sig2s = res[,4]
par(mfrow=c(4,1))
plot(mu1s)
plot(sig1s)
plot(mu2s)
plot(sig2s)
par(mfrow=c(2,1))
plot(mu1s - mu2s)
hist(mu1s - mu2s)
mean(mu1s - mu2s < 0)
# [1] 0.9378571
# Code for Section 2.3
dat = read.table(file="income3.txt", header=TRUE, sep=",")
summary(dat)
dif = (dat[,2] - dat[,3]) / 1000
pre = dif[dat[,1] == 1]
pos = dif[dat[,1] == 2]
mean(pre)
mean(pos)
### draw a picture
xlim = c(min(dif), max(dif))
par(mfrow=c(2,1))
hist(pre, 100, col="red", xlim=xlim)
hist(pos, 100, col="red", xlim=xlim)
### likelihood function
lik = function(th){
  mu1 = th[1]; sig1 = th[2]; mu2 = th[3]; sig2 = th[4]
  prod(dnorm(pre, mean=mu1, sd=sig1)) * prod(dnorm(pos, mean=mu2, sd=sig2))
}
### prior function
prior = function(th){
  mu1 = th[1]; sig1 = th[2]; mu2 = th[3]; sig2 = th[4]
  if (sig1 <= 0 | sig2 <= 0) return(0)
  dnorm(mu1, 10.7, 10.7) * dnorm(mu2, 10.5, 10.5) * dexp(sig1, rate=1/10.7) * dexp(sig2, rate=1/10.5)
}
### posterior function
post = function(th){ prior(th) * lik(th) }
# Starting values
mu1 = 10.7; sig1 = 10.7; mu2 = 10.5; sig2 = 10.5
th0 = c(mu1, sig1, mu2, sig2)
# Run the MCMC (Metropolis method):
nit = 10000
results = matrix(0, nrow=nit, ncol=4)
th = th0
results[1,] = th0
for (it in 2:nit){
  cand = th + rnorm(4, sd=.4)
  ratio = post(cand) / post(th)
  if (runif(1) < ratio) th = cand
  results[it,] = th
}
# Take a peek at what we got
edit(results)
par(mfrow=c(4,1))
plot(results[,1])
plot(results[,2])
plot(results[,3])
plot(results[,4])
# Discard the first 200 iterations as burn-in
res = results[201:10000,]
mu1s = res[,1]
sig1s = res[,2]
mu2s = res[,3]
sig2s = res[,4]
par(mfrow=c(4,1))
plot(mu1s)
plot(sig1s)
plot(mu2s)
plot(sig2s)
par(mfrow=c(2,1))
plot(mu1s - mu2s)
hist(mu1s - mu2s)
mean(mu1s - mu2s < 0)
# [1] 0.362551