STAT 8670 Project

Wei Gao (Email: [email protected])
Dept of Math and Stat, Georgia State University, Atlanta, GA 30302-4110, USA

1 Introduction

Bayesian analysis is a formal method for combining prior beliefs with observed information, and it can fit very realistic but complicated models. In this project we carry out a Bayesian data analysis, using Markov chain Monte Carlo (MCMC), to investigate whether the difference between male income and female income in the USA increased or decreased from 2004 to 2009.

The data sets are the median earnings of full-time, year-round workers aged 16 and older, by sex and by state, for the years 2004 through 2009, downloaded from www.census.gov. For each year we take the difference (in thousands of dollars) between men's and women's median earnings in each state as our observations, so there are 51 observations per year; a sketch of how the observations are formed is given after the list below. We formulate our questions as:

• Is the mean of the differences in 2005 greater than the mean of the differences in 2004? If so, by how much?

• Is the mean of the differences in 2007 greater than the mean of the differences in 2006? If so, by how much?

• Is the mean of the differences in 2009 greater than the mean of the differences in 2008? If so, by how much?
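As a concrete sketch of the data preparation (the full analysis code is in Section 5; the file name and column layout below are taken from that code, which names the two samples pre and pos rather than the x and y used in Section 2):

    dat = read.table(file = "income1.txt", header = TRUE, sep = ",")
    dif = (dat[, 2] - dat[, 3]) / 1000  # men's minus women's earnings, in thousands
    x   = dif[dat[, 1] == 1]            # 51 state differences for the earlier year (2004)
    y   = dif[dat[, 1] == 2]            # 51 state differences for the later year (2005)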
2 Data Analysis

2.1 Comparison of the differences in 2004 and 2005

The histograms of the differences in 2004 and 2005 are given below.

[Figure: histograms of the 51 state differences in 2004 (top) and 2005 (bottom).]

The conclusion is not obvious from the histograms. We assume that the differences in each year follow a normal distribution $N(\mu, \sigma^2)$: the differences in 2004 come from $N(\mu_1, \sigma_1^2)$ and the differences in 2005 come from $N(\mu_2, \sigma_2^2)$, so our parameter vector is $\theta = (\mu_1, \sigma_1, \mu_2, \sigma_2)$. We construct a likelihood function and a prior, and multiply them together to get the posterior, $P(\theta \mid \mathrm{Data}) \propto P(\mathrm{Data} \mid \theta)\, P(\theta)$.

Let $x$ denote the differences in 2004 and $y$ the differences in 2005. The likelihood is given by

$$P(\mathrm{Data} \mid \theta) = P(x \mid \theta)\, P(y \mid \theta) = \prod_{i=1}^{51} N(x_i \mid \mu_1, \sigma_1^2) \prod_{i=1}^{51} N(y_i \mid \mu_2, \sigma_2^2),$$

where the second equality holds because the $x_i$ and $y_j$ are all independent, for $i = 1, 2, \ldots, 51$ and $j = 1, 2, \ldots, 51$. We can write the likelihood function directly from this formula.

Next we decide on a prior density for $\theta = (\mu_1, \sigma_1, \mu_2, \sigma_2)$ and write it as a function. The basic idea is to choose priors for $\mu_1$ and $\mu_2$ that do not prejudge which is bigger, and that are roughly uniform over all even remotely plausible values, with the aim of letting the data produce any interesting features in the posterior distribution. We take $\mu_1 \sim N(9.7, 9.7^2)$ and $\mu_2 \sim N(10, 10^2)$, where 9.7 is the sample mean of $x$ and 10 is the sample mean of $y$; we make the priors this flat because otherwise the prior would control the posterior. We take $\sigma_1$ to be exponential with mean 9.7 and $\sigma_2$ to be exponential with mean 10, and all four parameters to be independent a priori. The prior is therefore, for $\sigma_1, \sigma_2 > 0$,

$$P(\mu_1, \sigma_1, \mu_2, \sigma_2) = P(\mu_1) P(\sigma_1) P(\mu_2) P(\sigma_2) = \frac{1}{\sqrt{2\pi}\,9.7}\, e^{-\frac{(\mu_1 - 9.7)^2}{2 \cdot 9.7^2}} \cdot \frac{1}{9.7}\, e^{-\sigma_1 / 9.7} \cdot \frac{1}{\sqrt{2\pi}\,10}\, e^{-\frac{(\mu_2 - 10)^2}{2 \cdot 10^2}} \cdot \frac{1}{10}\, e^{-\sigma_2 / 10}.$$

Multiplying the prior by the likelihood gives the (unnormalized) posterior. We run a Markov chain using the Metropolis algorithm for 10000 iterations to simulate a sample from it, starting from $\theta_0 = (9.7, 9.7, 10, 10)$. Given the current state, we propose a "candidate" move, evaluate the posterior at the candidate and at the current state, and take the ratio; the candidate is accepted with probability equal to the minimum of 1 and this ratio. We record the results in a matrix with 4 columns and 10000 rows: the first row is the starting value, and each successive row records the next $\theta$ as the chain runs.
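To make the update concrete, here is a minimal sketch of one random-walk Metropolis step as just described. The helper name metropolis_step is ours; post is the unnormalized posterior (prior times likelihood), and the proposal scale 0.3 matches the code for this comparison in Section 5.

    # One random-walk Metropolis update: perturb the current state, compare
    # posterior densities, and accept the candidate with probability min(1, ratio).
    metropolis_step = function(th, post, prop_sd = 0.3) {
      cand  = th + rnorm(length(th), sd = prop_sd)   # propose a candidate move
      ratio = post(cand) / post(th)                  # posterior ratio
      if (runif(1) < ratio) cand else th             # accept or keep current state
    }

Iterating this step 10000 times from $\theta_0$ and stacking the successive states as rows gives the results matrix described above; the full code is in Section 5.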
Now we have a collection of parameter vectors; let us first look at how the chain ran.

[Figure: trace plots of the four components of the chain.]

It looks like the first few hundred iterations may be noticeably influenced by the starting values, so we throw away the first 200 iterations as burn-in and base our inferences on the remaining draws rather than on the whole results matrix.

[Figure: trace plots of the four components after burn-in, and a trace plot and histogram of $\mu_1 - \mu_2$.]

Our estimated posterior probability that $\mu_1 - \mu_2 < 0$ is about 0.81. Thus, with posterior probability about 0.81, the mean of the differences in 2005 is greater than the mean of the differences in 2004.

2.2 Comparison of the differences in 2006 and 2007

The histograms of the differences in 2006 and 2007 are given below.

[Figure: histograms of the 51 state differences in 2006 (top) and 2007 (bottom).]

The conclusion is not obvious from the histograms. As before, we assume the differences in each year follow a normal distribution: the differences in 2006 come from $N(\mu_1, \sigma_1^2)$ and the differences in 2007 come from $N(\mu_2, \sigma_2^2)$, with parameter vector $\theta = (\mu_1, \sigma_1, \mu_2, \sigma_2)$. To get the posterior, we again construct a likelihood function and a prior and multiply them together.

The likelihood function is exactly the same as in the previous subsection. For the prior on $\theta = (\mu_1, \sigma_1, \mu_2, \sigma_2)$, we take $\mu_1 \sim N(9.9, 9.9^2)$ and $\mu_2 \sim N(10.5, 10.5^2)$, where 9.9 is the sample mean of the differences in 2006 and 10.5 is the sample mean of the differences in 2007, and we take $\sigma_1$ to be exponential with mean 9.9 and $\sigma_2$ to be exponential with mean 10.5. Multiplying the prior by the likelihood gives the posterior. We again run a Metropolis chain for 10000 iterations, this time starting from $\theta_0 = (9.9, 9.9, 10.5, 10.5)$. The results are shown below.

[Figure: trace plots of the four components of the chain.]

Again the first few hundred iterations look noticeably influenced by the starting values, so we throw away the first 200 iterations as burn-in and base our inferences on the remaining draws.

[Figure: trace plots of the four components after burn-in, and a trace plot and histogram of $\mu_1 - \mu_2$.]

Our estimated posterior probability that $\mu_1 - \mu_2 < 0$ is about 0.94. Thus, with high posterior probability, the mean of the differences in 2007 is greater than the mean of the differences in 2006.

2.3 Comparison of the differences in 2008 and 2009

The histograms of the differences in 2008 and 2009 are given below.

[Figure: histograms of the 51 state differences in 2008 (top) and 2009 (bottom).]

The conclusion is not obvious from the histograms. As before, we assume the differences in 2008 come from $N(\mu_1, \sigma_1^2)$ and the differences in 2009 come from $N(\mu_2, \sigma_2^2)$, with parameter vector $\theta = (\mu_1, \sigma_1, \mu_2, \sigma_2)$, and we use the same likelihood as before. For the prior, we take $\mu_1 \sim N(10.7, 10.7^2)$ and $\mu_2 \sim N(10.5, 10.5^2)$, where 10.7 is the sample mean of the differences in 2008 and 10.5 is the sample mean of the differences in 2009, and we take $\sigma_1$ to be exponential with mean 10.7 and $\sigma_2$ to be exponential with mean 10.5. Multiplying the prior by the likelihood gives the posterior. We again run a Metropolis chain for 10000 iterations, this time starting from $\theta_0 = (10.7, 10.7, 10.5, 10.5)$. The results are shown below.

[Figure: trace plots of the four components of the chain.]

Again the first few hundred iterations look noticeably influenced by the starting values, so we throw away the first 200 iterations as burn-in and base our inferences on the remaining draws.

[Figure: trace plots of the four components after burn-in, and a trace plot and histogram of $\mu_1 - \mu_2$.]

Our estimated posterior probability that $\mu_1 - \mu_2 < 0$ is about 0.36, so the posterior probability that the mean of the differences in 2009 is less than the mean of the differences in 2008 is about 0.64. The evidence here is weaker than in the previous two comparisons, but the 2009 mean is more likely than not to be smaller than the 2008 mean.

3 Conclusion

According to the analysis in Section 2, we conclude that:

• the mean of the differences in 2005 is greater than the mean of the differences in 2004 (posterior probability about 0.81);

• the mean of the differences in 2007 is greater than the mean of the differences in 2006 (posterior probability about 0.94);

• the mean of the differences in 2009 is likely less than the mean of the differences in 2008 (posterior probability about 0.64).

Overall, this suggests that the difference between male income and female income was increasing at the beginning of the period and then decreasing.

4 Reference

Dr. Jing Zhang's lecture notes, STAT 8670, Georgia State University.

5 R Code

## Code for Section 2.1 (2004 vs. 2005)
dat = read.table(file = "income1.txt", header = TRUE, sep = ",")
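# File layout, inferred from how the columns are used below: column 1 is a
# year indicator (1 = earlier year, 2 = later year), and columns 2 and 3 are
# men's and women's median earnings in dollars, so the difference is divided
# by 1000 to express it in thousands.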
summary(dat)
dif = (dat[, 2] - dat[, 3]) / 1000   # men's minus women's earnings, in thousands
pre = dif[dat[, 1] == 1]             # differences for 2004
pos = dif[dat[, 1] == 2]             # differences for 2005
mean(pre)
mean(pos)

### draw a picture
xlim = c(min(dif), max(dif))
par(mfrow = c(2, 1))
hist(pre, 100, col = "red", xlim = xlim)
hist(pos, 100, col = "red", xlim = xlim)

### likelihood function
lik = function(th) {
  mu1 = th[1]; sig1 = th[2]; mu2 = th[3]; sig2 = th[4]
  prod(dnorm(pre, mean = mu1, sd = sig1)) * prod(dnorm(pos, mean = mu2, sd = sig2))
}

### prior function
prior = function(th) {
  mu1 = th[1]; sig1 = th[2]; mu2 = th[3]; sig2 = th[4]
  if (sig1 <= 0 | sig2 <= 0) return(0)
  dnorm(mu1, 9.7, 9.7) * dnorm(mu2, 10, 10) * dexp(sig1, rate = 1/9.7) * dexp(sig2, rate = 1/10)
}

### posterior function (unnormalized)
post = function(th) { prior(th) * lik(th) }

# Starting values
mu1 = 9.7; sig1 = 9.7; mu2 = 10; sig2 = 10
th0 = c(mu1, sig1, mu2, sig2)

# Run the MCMC (Metropolis method):
nit = 10000
results = matrix(0, nrow = nit, ncol = 4)
th = th0
results[1, ] = th0
for (it in 2:nit) {
  cand = th + rnorm(4, sd = 0.3)    # random-walk proposal
  ratio = post(cand) / post(th)
  if (runif(1) < ratio) th = cand   # accept with probability min(1, ratio)
  results[it, ] = th
}

# Take a peek at what we got
edit(results)
par(mfrow = c(4, 1))
plot(results[, 1])
plot(results[, 2])
plot(results[, 3])
plot(results[, 4])

# Discard the first 200 iterations as burn-in
res = results[201:10000, ]
mu1s = res[, 1]
sig1s = res[, 2]
mu2s = res[, 3]
sig2s = res[, 4]
par(mfrow = c(4, 1))
plot(mu1s)
plot(sig1s)
plot(mu2s)
plot(sig2s)
par(mfrow = c(2, 1))
plot(mu1s - mu2s)
hist(mu1s - mu2s)
mean(mu1s - mu2s < 0)
# [1] 0.8119388
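# Optional sanity check (a sketch): estimate the chain's acceptance rate from
# how often it moved; for random-walk Metropolis, rates far outside roughly
# 0.2-0.5 suggest retuning the proposal sd.
mean(rowSums(abs(diff(results))) > 0)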
## Code for Section 2.2 (2006 vs. 2007)
dat = read.table(file = "income2.txt", header = TRUE, sep = ",")
summary(dat)
dif = (dat[, 2] - dat[, 3]) / 1000
pre = dif[dat[, 1] == 1]             # differences for 2006
pos = dif[dat[, 1] == 2]             # differences for 2007
mean(pre)
mean(pos)

### draw a picture
xlim = c(min(dif), max(dif))
par(mfrow = c(2, 1))
hist(pre, 100, col = "red", xlim = xlim)
hist(pos, 100, col = "red", xlim = xlim)

### likelihood function
lik = function(th) {
  mu1 = th[1]; sig1 = th[2]; mu2 = th[3]; sig2 = th[4]
  prod(dnorm(pre, mean = mu1, sd = sig1)) * prod(dnorm(pos, mean = mu2, sd = sig2))
}

### prior function
prior = function(th) {
  mu1 = th[1]; sig1 = th[2]; mu2 = th[3]; sig2 = th[4]
  if (sig1 <= 0 | sig2 <= 0) return(0)
  dnorm(mu1, 9.9, 9.9) * dnorm(mu2, 10.5, 10.5) * dexp(sig1, rate = 1/9.9) * dexp(sig2, rate = 1/10.5)
}

### posterior function (unnormalized)
post = function(th) { prior(th) * lik(th) }

# Starting values
mu1 = 9.9; sig1 = 9.9; mu2 = 10.5; sig2 = 10.5
th0 = c(mu1, sig1, mu2, sig2)

# Run the MCMC (Metropolis method):
nit = 10000
results = matrix(0, nrow = nit, ncol = 4)
th = th0
results[1, ] = th0
for (it in 2:nit) {
  cand = th + rnorm(4, sd = 0.5)    # random-walk proposal
  ratio = post(cand) / post(th)
  if (runif(1) < ratio) th = cand
  results[it, ] = th
}

# Take a peek at what we got
edit(results)
par(mfrow = c(4, 1))
plot(results[, 1])
plot(results[, 2])
plot(results[, 3])
plot(results[, 4])

# Discard the first 200 iterations as burn-in
res = results[201:10000, ]
mu1s = res[, 1]
sig1s = res[, 2]
mu2s = res[, 3]
sig2s = res[, 4]
par(mfrow = c(4, 1))
plot(mu1s)
plot(sig1s)
plot(mu2s)
plot(sig2s)
par(mfrow = c(2, 1))
plot(mu1s - mu2s)
hist(mu1s - mu2s)
mean(mu1s - mu2s < 0)
# [1] 0.9378571

## Code for Section 2.3 (2008 vs. 2009)
dat = read.table(file = "income3.txt", header = TRUE, sep = ",")
summary(dat)
dif = (dat[, 2] - dat[, 3]) / 1000
pre = dif[dat[, 1] == 1]             # differences for 2008
pos = dif[dat[, 1] == 2]             # differences for 2009
mean(pre)
mean(pos)

### draw a picture
xlim = c(min(dif), max(dif))
par(mfrow = c(2, 1))
hist(pre, 100, col = "red", xlim = xlim)
hist(pos, 100, col = "red", xlim = xlim)

### likelihood function
lik = function(th) {
  mu1 = th[1]; sig1 = th[2]; mu2 = th[3]; sig2 = th[4]
  prod(dnorm(pre, mean = mu1, sd = sig1)) * prod(dnorm(pos, mean = mu2, sd = sig2))
}

### prior function
prior = function(th) {
  mu1 = th[1]; sig1 = th[2]; mu2 = th[3]; sig2 = th[4]
  if (sig1 <= 0 | sig2 <= 0) return(0)
  dnorm(mu1, 10.7, 10.7) * dnorm(mu2, 10.5, 10.5) * dexp(sig1, rate = 1/10.7) * dexp(sig2, rate = 1/10.5)
}

### posterior function (unnormalized)
post = function(th) { prior(th) * lik(th) }

# Starting values
mu1 = 10.7; sig1 = 10.7; mu2 = 10.5; sig2 = 10.5
th0 = c(mu1, sig1, mu2, sig2)

# Run the MCMC (Metropolis method):
nit = 10000
results = matrix(0, nrow = nit, ncol = 4)
th = th0
results[1, ] = th0
for (it in 2:nit) {
  cand = th + rnorm(4, sd = 0.4)    # random-walk proposal
  ratio = post(cand) / post(th)
  if (runif(1) < ratio) th = cand
  results[it, ] = th
}

# Take a peek at what we got
edit(results)
par(mfrow = c(4, 1))
plot(results[, 1])
plot(results[, 2])
plot(results[, 3])
plot(results[, 4])

# Discard the first 200 iterations as burn-in
res = results[201:10000, ]
mu1s = res[, 1]
sig1s = res[, 2]
mu2s = res[, 3]
sig2s = res[, 4]
par(mfrow = c(4, 1))
plot(mu1s)
plot(sig1s)
plot(mu2s)
plot(sig2s)
par(mfrow = c(2, 1))
plot(mu1s - mu2s)
hist(mu1s - mu2s)
mean(mu1s - mu2s < 0)
# [1] 0.362551
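# The questions in Section 1 also ask "if so, how much?". A minimal sketch of
# one way to answer: summarize the posterior of the change in means from the
# post-burn-in draws of a given comparison.
mean(mu2s - mu1s)                         # posterior mean of the change
quantile(mu2s - mu1s, c(0.025, 0.975))    # 95% posterior interval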