STAT 8670 Project

Wei Gao∗

Dept of Math and Stat, Georgia State University, Atlanta, GA 30302-4110, USA

1 Introduction

Bayesian analysis is a formal method for combining prior beliefs with observed information, and it can fit very realistic but complicated models. In this project we carry out a Bayesian data analysis using Markov chain Monte Carlo (MCMC) to investigate whether the difference between male income and female income increased or decreased in the USA from 2004 to 2009. The data sets are the median earnings of full-time, year-round workers 16 and older, by sex and by state, for the years 2004 to 2009, downloaded from www.census.gov. For each year, our observations are the differences (in thousands of dollars) between men's earnings and women's earnings for each state, so there are 51 observations for each year (the 50 states plus the District of Columbia).

Now we formulate our questions as:

• Is the mean of differences in year 2005 greater than the mean of differences in year 2004?

If so, how much?

• Is the mean of differences in year 2007 greater than the mean of differences in year 2006?

If so, how much?

• Is the mean of differences in year 2009 greater than the mean of differences in year 2008?

If so, how much?

2 Data Analysis

2.1 Comparison of differences in 2004 and 2005

The histograms of differences in 2004 and 2005 are given below.

∗Email: [email protected]


The conclusion is not so obvious from the histograms. We may assume that the differences in each year follow a normal distribution N(µ, σ²). Let us assume that the differences in 2004 come from N(µ1, σ1²) and the differences in 2005 come from N(µ2, σ2²). So our parameter vector is θ = (µ1, σ1, µ2, σ2).

We will construct a likelihood function and a prior and then multiply them together to get

posterior P (θ|Data).

Let x denote the differences in 2004 and let y denote the differences in 2005. Then the likelihood is given by the formula

P(Data|θ) = P(x|θ) P(y|θ) = ∏(i=1..51) N(xi | µ1, σ1²) · ∏(i=1..51) N(yi | µ2, σ2²),

where the second equality holds since xi and yj are independent for i = 1, 2, . . . , 51 and j = 1, 2, . . . , 51. We can write the likelihood function according to this formula.
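One practical caveat: the likelihood above is a product of 102 normal densities, which can underflow to zero in floating point at poor parameter values (making the Metropolis ratio 0/0), so it is often safer to compute the log-likelihood. Below is a minimal Python sketch using made-up data standing in for the 51 state differences; the report's own R code (Section 5) uses the direct product.

```python
import math
import random

def normal_logpdf(v, mu, sigma):
    # log of the N(mu, sigma^2) density evaluated at v
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (v - mu) ** 2 / (2 * sigma ** 2)

def log_likelihood(x, y, mu1, sig1, mu2, sig2):
    # log P(Data | theta): a sum of log densities, one term per observation
    return (sum(normal_logpdf(xi, mu1, sig1) for xi in x)
            + sum(normal_logpdf(yi, mu2, sig2) for yi in y))

random.seed(0)
x = [random.gauss(9.7, 2.0) for _ in range(51)]   # stand-in for the 2004 differences
y = [random.gauss(10.0, 2.0) for _ in range(51)]  # stand-in for the 2005 differences

ll = log_likelihood(x, y, 9.7, 2.0, 10.0, 2.0)    # finite even where the product could underflow
```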

Then we need to decide on a prior density for θ = (µ1, σ1, µ2, σ2), and write it as a function.

The basic idea is to choose priors for µ1 and µ2 that do not prejudge which is bigger, and that are roughly uniform over all even remotely plausible values, with the aim of letting the data produce any interesting features in the posterior distribution.

We choose the prior for µ1 as N(9.7, 9.7²) and the prior for µ2 as N(10, 10²), where 9.7 is the sample mean of x and 10 is the sample mean of y. We choose the distributions like this because we want the priors to be as flat as possible; otherwise the prior would dominate the posterior. We then choose the prior for σ1 as an exponential distribution with mean 9.7 and the prior for σ2 as an exponential distribution with mean 10. All four parameters are taken to be independent a priori, so the prior is given by

P(µ1, σ1, µ2, σ2) = P(µ1) P(σ1) P(µ2) P(σ2)
= (2π · 9.7²)^(−1/2) exp(−(µ1 − 9.7)² / (2 · 9.7²)) · (1/9.7) e^(−σ1/9.7)
· (2π · 10²)^(−1/2) exp(−(µ2 − 10)² / (2 · 10²)) · (1/10) e^(−σ2/10),

for σ1, σ2 > 0.
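Since the four components are independent, the prior density is just a product of four one-dimensional densities. As a sanity check, here is a small Python sketch evaluating it at the starting value θ0 = (9.7, 9.7, 10, 10); the function names are this sketch's own, and the report's R version is in Section 5.

```python
import math

def normal_pdf(v, mu, sigma):
    # N(mu, sigma^2) density at v
    return math.exp(-(v - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def exp_pdf(v, mean):
    # exponential density with the given mean (rate = 1/mean); zero for v <= 0
    return math.exp(-v / mean) / mean if v > 0 else 0.0

def prior(mu1, sig1, mu2, sig2):
    # product of the four independent component priors
    return (normal_pdf(mu1, 9.7, 9.7) * exp_pdf(sig1, 9.7)
            * normal_pdf(mu2, 10.0, 10.0) * exp_pdf(sig2, 10.0))

p = prior(9.7, 9.7, 10.0, 10.0)   # strictly positive density at theta0
```

The zero returned for non-positive σ mirrors the `if (sig1 <= 0 | sig2 <= 0) return(0)` guard in the R code.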

Next, we multiply the prior by the likelihood to obtain the posterior P(θ|Data) up to a normalizing constant. We will run a Markov chain using the Metropolis algorithm for 10000 iterations to simulate a sample from this posterior. First we choose a starting value θ0 = (9.7, 9.7, 10, 10). Given the current state, we propose a "candidate" move, evaluate the posterior at the candidate and at the current state, and take the ratio; the candidate is accepted with probability min(1, ratio). We record our results in a matrix with 4 columns and 10000 rows: the first row is the starting value, and each successive row records the next θ as we run the chain.
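The recipe just described (propose a candidate, take the posterior ratio, accept or reject, record the state) can be sketched compactly. The following is an illustrative Python version of random-walk Metropolis on a simple one-dimensional standard-normal target, not the report's R implementation (which is listed in Section 5); names like log_target and step_sd are this sketch's own.

```python
import math
import random

def log_target(theta):
    # log density of the target, here a standard normal (up to a constant)
    return -0.5 * theta ** 2

def metropolis(n_iter, start, step_sd=0.5, seed=1):
    # Random-walk Metropolis: returns the full chain, including the start.
    random.seed(seed)
    theta = start
    chain = [theta]
    for _ in range(n_iter - 1):
        cand = theta + random.gauss(0.0, step_sd)            # candidate move
        log_ratio = log_target(cand) - log_target(theta)     # log posterior ratio
        if random.random() < math.exp(min(0.0, log_ratio)):  # accept w.p. min(1, ratio)
            theta = cand
        chain.append(theta)                                  # record state either way
    return chain

chain = metropolis(10000, start=5.0)
burned = chain[200:]                  # drop the first 200 iterations as burn-in
est_mean = sum(burned) / len(burned)  # should be near the target mean of 0
```

The same structure appears in the report's R loop: `cand = th + rnorm(4, sd=.3)` is the random-walk proposal and `if (runif(1) < ratio) th = cand` is the accept step.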

Now we have a sample of parameter vectors. Let us first take a look at how the chain ran. The first few hundred iterations appear noticeably influenced by our starting values, so we throw away the first 200 iterations as burn-in and base our inferences on the remaining draws instead of the whole results matrix.


Our estimated posterior probability that µ1 − µ2 < 0 is about 0.81. Thus, with fairly high probability, the mean of differences in year 2005 is greater than the mean of differences in year 2004.
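This probability is computed as the fraction of retained posterior draws with µ1 < µ2, i.e. the R line `mean(mu1s - mu2s < 0)` in Section 5. A tiny Python illustration of the same calculation on made-up draws:

```python
# Made-up posterior draws standing in for the retained MCMC sample.
mu1_draws = [9.5, 9.8, 10.1, 9.6, 9.9, 10.3, 9.7, 9.4]
mu2_draws = [10.2, 9.9, 10.0, 10.4, 9.8, 10.1, 10.5, 10.0]

# Monte Carlo estimate of P(mu1 - mu2 < 0 | Data): fraction of draws below zero.
prob = sum(a - b < 0 for a, b in zip(mu1_draws, mu2_draws)) / len(mu1_draws)
print(prob)  # → 0.625
```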

2.2 Comparison of differences in 2006 and 2007

The histograms of differences in 2006 and 2007 are given below.

The conclusion is not so obvious from the histograms. We again assume that the differences in each year follow a normal distribution: the differences in 2006 come from N(µ1, σ1²) and the differences in 2007 come from N(µ2, σ2²). So our parameter vector is θ = (µ1, σ1, µ2, σ2).

To get the posterior, we again construct a likelihood function and a prior and multiply them together. The likelihood function is exactly the same as in the previous subsection. For the prior density of θ = (µ1, σ1, µ2, σ2), we choose N(9.9, 9.9²) for µ1 and N(10.5, 10.5²) for µ2, where 9.9 is the sample mean of the differences in 2006 and 10.5 is the sample mean of the differences in 2007. We then choose an exponential distribution with mean 9.9 for σ1 and an exponential distribution with mean 10.5 for σ2. Next, we multiply the prior by the likelihood to get the posterior.

We again run a Markov chain using the Metropolis algorithm for 10000 iterations to simulate a sample. This time we choose the starting value θ0 = (9.9, 9.9, 10.5, 10.5). The results are shown below. The first few hundred iterations again appear noticeably influenced by the starting values, so we throw away the first 200 iterations as burn-in and base our inferences on the remaining draws instead of the whole results matrix.


Our estimated posterior probability that µ1 − µ2 < 0 is about 0.94. Thus with high probability the mean of differences in year 2007 is greater than the mean of differences in year 2006.

2.3 Comparison of differences in 2008 and 2009

The histograms of differences in 2008 and 2009 are given below.

The conclusion is not so obvious from the histograms. We again assume that the differences in each year follow a normal distribution: the differences in 2008 come from N(µ1, σ1²) and the differences in 2009 come from N(µ2, σ2²). So our parameter vector is θ = (µ1, σ1, µ2, σ2).

The likelihood function is again the same as in subsection 2.1. For the prior density of θ = (µ1, σ1, µ2, σ2), we choose N(10.7, 10.7²) for µ1 and N(10.5, 10.5²) for µ2, where 10.7 is the sample mean of the differences in 2008 and 10.5 is the sample mean of the differences in 2009. We then choose an exponential distribution with mean 10.7 for σ1 and an exponential distribution with mean 10.5 for σ2. Next, we multiply the prior by the likelihood to get the posterior.

We again run a Markov chain using the Metropolis algorithm for 10000 iterations to simulate a sample. This time we choose the starting value θ0 = (10.7, 10.7, 10.5, 10.5). The results are shown below. The first few hundred iterations again appear noticeably influenced by the starting values, so we throw away the first 200 iterations as burn-in and base our inferences on the remaining draws instead of the whole results matrix.


Our estimated posterior probability that µ1 − µ2 < 0 is only about 0.36, so the estimated probability that µ1 > µ2 is about 0.64. Thus the mean of differences in year 2009 is more likely than not less than the mean of differences in year 2008, although the evidence here is weaker than in the two earlier comparisons.

3 Conclusion

According to our discussion in Section 2, we conclude that:

• The mean of differences in year 2005 is likely greater than the mean of differences in year 2004 (posterior probability about 0.81).

• The mean of differences in year 2007 is likely greater than the mean of differences in year 2006 (posterior probability about 0.94).

• The mean of differences in year 2009 is likely less than the mean of differences in year 2008 (posterior probability about 0.64).

Thus, overall, the difference between male income and female income appears to have increased at the beginning of the period and then decreased.


4 References

Dr. Jing Zhang’s lecture notes.

5 R Code

2.1

dat = read.table(file="income1.txt", header=T, sep=',')
summary(dat)
dif = (dat[,2] - dat[,3]) / 1000
pre = dif[dat[,1] == 1]   # differences in 2004
pos = dif[dat[,1] == 2]   # differences in 2005
mean(pre)
mean(pos)

### draw a picture
xlim = c(min(dif), max(dif))
par(mfrow=c(2,1))
hist(pre, 100, col="red", xlim=xlim)
hist(pos, 100, col="red", xlim=xlim)

### likelihood function
lik = function(th){
  mu1=th[1]; sig1=th[2]; mu2=th[3]; sig2=th[4]
  prod(dnorm(pre, mean=mu1, sd=sig1)) * prod(dnorm(pos, mean=mu2, sd=sig2))
}

### prior function
prior = function(th){
  mu1=th[1]; sig1=th[2]; mu2=th[3]; sig2=th[4]
  if (sig1 <= 0 | sig2 <= 0) return(0)
  dnorm(mu1, 9.7, 9.7) * dnorm(mu2, 10, 10) * dexp(sig1, rate=1/9.7) * dexp(sig2, rate=1/10)
}

### posterior function
post = function(th){ prior(th) * lik(th) }

# Starting values
mu1 = 9.7; sig1 = 9.7; mu2 = 10; sig2 = 10
th0 = c(mu1, sig1, mu2, sig2)

# Run the MCMC (Metropolis method):
nit = 10000
results = matrix(0, nrow=nit, ncol=4)
th = th0
results[1,] = th0
for (it in 2:nit){
  cand = th + rnorm(4, sd=.3)        # random-walk proposal
  ratio = post(cand) / post(th)
  if (runif(1) < ratio) th = cand    # accept with probability min(1, ratio)
  results[it,] = th
}

# Take a peek at what we got
edit(results)
par(mfrow=c(4,1))
plot(results[,1])
plot(results[,2])
plot(results[,3])
plot(results[,4])

# Discard the first 200 iterations as burn-in
res = results[201:10000,]
mu1s = res[,1]
sig1s = res[,2]
mu2s = res[,3]
sig2s = res[,4]
par(mfrow=c(4,1))
plot(mu1s)
plot(sig1s)
plot(mu2s)
plot(sig2s)
par(mfrow=c(2,1))
plot(mu1s - mu2s)
hist(mu1s - mu2s)
mean(mu1s - mu2s < 0)
# [1] 0.8119388


2.2

dat = read.table(file="income2.txt", header=T, sep=',')
summary(dat)
dif = (dat[,2] - dat[,3]) / 1000
pre = dif[dat[,1] == 1]   # differences in 2006
pos = dif[dat[,1] == 2]   # differences in 2007
mean(pre)
mean(pos)

### draw a picture
xlim = c(min(dif), max(dif))
par(mfrow=c(2,1))
hist(pre, 100, col="red", xlim=xlim)
hist(pos, 100, col="red", xlim=xlim)

### likelihood function
lik = function(th){
  mu1=th[1]; sig1=th[2]; mu2=th[3]; sig2=th[4]
  prod(dnorm(pre, mean=mu1, sd=sig1)) * prod(dnorm(pos, mean=mu2, sd=sig2))
}

### prior function
prior = function(th){
  mu1=th[1]; sig1=th[2]; mu2=th[3]; sig2=th[4]
  if (sig1 <= 0 | sig2 <= 0) return(0)
  dnorm(mu1, 9.9, 9.9) * dnorm(mu2, 10.5, 10.5) * dexp(sig1, rate=1/9.9) * dexp(sig2, rate=1/10.5)
}

### posterior function
post = function(th){ prior(th) * lik(th) }

# Starting values
mu1 = 9.9; sig1 = 9.9; mu2 = 10.5; sig2 = 10.5
th0 = c(mu1, sig1, mu2, sig2)

# Run the MCMC (Metropolis method):
nit = 10000
results = matrix(0, nrow=nit, ncol=4)
th = th0
results[1,] = th0
for (it in 2:nit){
  cand = th + rnorm(4, sd=.5)        # random-walk proposal
  ratio = post(cand) / post(th)
  if (runif(1) < ratio) th = cand    # accept with probability min(1, ratio)
  results[it,] = th
}

# Take a peek at what we got
edit(results)
par(mfrow=c(4,1))
plot(results[,1])
plot(results[,2])
plot(results[,3])
plot(results[,4])

# Discard the first 200 iterations as burn-in
res = results[201:10000,]
mu1s = res[,1]
sig1s = res[,2]
mu2s = res[,3]
sig2s = res[,4]
par(mfrow=c(4,1))
plot(mu1s)
plot(sig1s)
plot(mu2s)
plot(sig2s)
par(mfrow=c(2,1))
plot(mu1s - mu2s)
hist(mu1s - mu2s)
mean(mu1s - mu2s < 0)
# [1] 0.9378571

2.3

dat = read.table(file="income3.txt", header=T, sep=',')
summary(dat)
dif = (dat[,2] - dat[,3]) / 1000
pre = dif[dat[,1] == 1]   # differences in 2008
pos = dif[dat[,1] == 2]   # differences in 2009
mean(pre)
mean(pos)

### draw a picture
xlim = c(min(dif), max(dif))
par(mfrow=c(2,1))
hist(pre, 100, col="red", xlim=xlim)
hist(pos, 100, col="red", xlim=xlim)

### likelihood function
lik = function(th){
  mu1=th[1]; sig1=th[2]; mu2=th[3]; sig2=th[4]
  prod(dnorm(pre, mean=mu1, sd=sig1)) * prod(dnorm(pos, mean=mu2, sd=sig2))
}

### prior function
prior = function(th){
  mu1=th[1]; sig1=th[2]; mu2=th[3]; sig2=th[4]
  if (sig1 <= 0 | sig2 <= 0) return(0)
  dnorm(mu1, 10.7, 10.7) * dnorm(mu2, 10.5, 10.5) * dexp(sig1, rate=1/10.7) * dexp(sig2, rate=1/10.5)
}

### posterior function
post = function(th){ prior(th) * lik(th) }

# Starting values
mu1 = 10.7; sig1 = 10.7; mu2 = 10.5; sig2 = 10.5
th0 = c(mu1, sig1, mu2, sig2)

# Run the MCMC (Metropolis method):
nit = 10000
results = matrix(0, nrow=nit, ncol=4)
th = th0
results[1,] = th0
for (it in 2:nit){
  cand = th + rnorm(4, sd=.4)        # random-walk proposal
  ratio = post(cand) / post(th)
  if (runif(1) < ratio) th = cand    # accept with probability min(1, ratio)
  results[it,] = th
}

# Take a peek at what we got
edit(results)
par(mfrow=c(4,1))
plot(results[,1])
plot(results[,2])
plot(results[,3])
plot(results[,4])

# Discard the first 200 iterations as burn-in
res = results[201:10000,]
mu1s = res[,1]
sig1s = res[,2]
mu2s = res[,3]
sig2s = res[,4]
par(mfrow=c(4,1))
plot(mu1s)
plot(sig1s)
plot(mu2s)
plot(sig2s)
par(mfrow=c(2,1))
plot(mu1s - mu2s)
hist(mu1s - mu2s)
mean(mu1s - mu2s < 0)
# [1] 0.362551
