{"id":332,"date":"2014-10-17T12:18:52","date_gmt":"2014-10-17T12:18:52","guid":{"rendered":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/?p=332"},"modified":"2025-02-26T13:21:38","modified_gmt":"2025-02-26T13:21:38","slug":"solutions-to-the-exercises-chapter-3-question-1","status":"publish","type":"post","link":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/2014\/10\/17\/solutions-to-the-exercises-chapter-3-question-1\/","title":{"rendered":"Solutions to the Exercises: Chapter 3 &#8211; Question 1"},"content":{"rendered":"<p>This is another of my occasional postings working through the exercises given at the end of the chapters in \u2018<em>Bayesian analysis with Stata\u2019<\/em>.<\/p>\n<p>Chapter 3 introduces the Metropolis-Hastings sampler and so enables us to tackle any small sized Bayesian analysis. For larger problems this method can be too slow and faster algorithms are needed. Despite this, an analysis with a Metropolis-Hastings sampler can be used to illustrate most of the important issues that arise with the MCMC method. Because there is so much to say, I will only\u00a0answer question 1 in this posting and I\u2019ll consider question 2 next time, which will be in two weeks as I am about to take some leave.<\/p>\n<p>Question 1 extends the example considered in the text that looked at polyp counts in women taking part in a colon cancer prevention trial. 
In the\u00a0question we are asked to repeat the analysis for men.<\/p>\n<p>We are told to use\u00a0a gamma(3,0.3) prior for the average rate, \u03bb, and to fit the zero-truncated Poisson distribution to the trial data on the men using a Metropolis-Hastings algorithm with a log-normal random walk proposal distribution.<\/p>\n<p>(i) <strong>Asks us to identify a good value for the proposal standard deviation and a suitable length of burn-in<\/strong><\/p>\n<p>My data file contains the data for men and women, so I just keep the data on men and have,<\/p>\n<pre style=\"padding-left: 120px\"><span style=\"color: #0000ff\">gender y f<\/span>\r\n<span style=\"color: #0000ff\"> Male 1 434<\/span>\r\n<span style=\"color: #0000ff\"> Male 2 178<\/span>\r\n<span style=\"color: #0000ff\"> Male 3 87<\/span>\r\n<span style=\"color: #0000ff\"> Male 4 40<\/span>\r\n<span style=\"color: #0000ff\"> Male 5 23<\/span>\r\n<span style=\"color: #0000ff\"> Male 6 12<\/span>\r\n<span style=\"color: #0000ff\"> Male 7 7<\/span>\r\n<span style=\"color: #0000ff\"> Male 8 4<\/span>\r\n<span style=\"color: #0000ff\"> Male 9 1<\/span>\r\n<span style=\"color: #0000ff\"> Male 10 3<\/span><\/pre>\n<p>I\u2019ve set a seed so that anyone can reproduce my answers exactly, but this is not normal practice and we must accept that with any simulation based method there\u00a0will be\u00a0an element of random variation in the answer. If I were to use this question as a student exercise, I would want the students to\u00a0start with\u00a0different seeds so that we could compare their answers. This would help them to understand the importance of running the chain for long enough.<\/p>\n<p>The model is the same one that was used for the women in the text of chapter 3 so the program logpost is identical, except that the total number of men is 789. Logpost calculates the log-likelihood and combines it with a gamma prior on the parameter lambda. 
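<\/p>\n<p>As a cross-check outside Stata, the quantity that logpost computes is easy to sketch in Python. Everything below (the data, the total of 789 men, the zero-truncation correction and the gamma(3,0.3) prior, taken in shape&#8211;scale form so that its mean is 0.9) comes from the post; the code itself is mine, not the book&#8217;s.<\/p>

```python
import math

# Polyp counts for the men: (count y, frequency f), as tabulated above.
data = [(1, 434), (2, 178), (3, 87), (4, 40), (5, 23),
        (6, 12), (7, 7), (8, 4), (9, 1), (10, 3)]
n = sum(f for _, f in data)  # 789 men in total

def log_post(lam, shape=3.0, scale=0.3):
    """Zero-truncated Poisson log-likelihood plus a gamma log-prior."""
    # Ordinary Poisson log-likelihood accumulated over the frequency table.
    ll = sum(f * (y * math.log(lam) - lam - math.lgamma(y + 1)) for y, f in data)
    # Zero-truncation: divide each probability by P(Y > 0) = 1 - exp(-lambda).
    ll -= n * math.log(1 - math.exp(-lam))
    # Gamma(shape, scale) log-prior, normalising constant dropped.
    return ll + (shape - 1) * math.log(lam) - lam / scale

# A coarse grid search for the posterior mode over [1, 2].
mode = max((1.0 + i / 1000 for i in range(1001)), key=log_post)
```

<p>The grid search puts the posterior mode close to the posterior mean of about 1.49 found below.<\/p>\n<p>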
My call to mcmcrun does not request a burnin so we store the chain from its initial value of lambda=1. The log-scale Metropolis-Hastings sampler is used because lambda must be positive and I have chosen a sd of 0.05 for the proposal distribution based on our experience of analysing the women.<\/p>\n<p><span style=\"color: #0000ff\">use &#8220;polyp.dta&#8221;, clear<\/span><br \/>\n<span style=\"color: #0000ff\"> keep if gender == &#8220;Male&#8221;<\/span><br \/>\n<span style=\"color: #0000ff\"> set seed 456901<\/span><\/p>\n<p><span style=\"color: #0000ff\">program logpost<\/span><br \/>\n<span style=\"color: #0000ff\"> args logp b<\/span><br \/>\n<span style=\"color: #0000ff\"> local lambda = `b'[1,1]<\/span><br \/>\n<span style=\"color: #0000ff\"> scalar `logp&#8217; = 0<\/span><br \/>\n<span style=\"color: #0000ff\"> logdensity poisson `logp&#8217; y `lambda&#8217; [fw=f]<\/span><br \/>\n<span style=\"color: #0000ff\"> scalar `logp&#8217; = `logp&#8217; &#8211; 789*log(1-exp(-`lambda&#8217;))<\/span><br \/>\n<span style=\"color: #0000ff\"> logdensity gamma `logp&#8217; `lambda&#8217; 3 0.3<\/span><br \/>\n<span style=\"color: #0000ff\"> end<\/span><\/p>\n<p><span style=\"color: #0000ff\">matrix b = ( 1 )<\/span><br \/>\n<span style=\"color: #0000ff\"> mcmcrun logpost b using temp.csv, samp(mhslogn , sd(0.05)) \/\/\/<\/span><br \/>\n<span style=\"color: #0000ff\"> burn(0) update(1000) replace par(lambda) jpost<\/span><br \/>\n<span style=\"color: #0000ff\"> insheet using temp.csv, clear<\/span><br \/>\n<span style=\"color: #0000ff\"> mcmctrace lambda, gopt(scheme(lean1))<\/span><\/p>\n<p><a href=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2014\/09\/solution_chap3_fig1.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-343\" src=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2014\/09\/solution_chap3_fig1-1024x688.png\" alt=\"solution_chap3_fig1\" width=\"620\" height=\"416\" 
srcset=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2014\/09\/solution_chap3_fig1-1024x688.png 1024w, https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2014\/09\/solution_chap3_fig1-300x201.png 300w\" sizes=\"auto, (max-width: 620px) 100vw, 620px\" \/><\/a><\/p>\n<p>This short run suggests that a burn-in of 200 would be ample for use with a starting value of 1.0. Mixing is not bad so the choice of proposal standard deviation looks reasonable. We can count the number of rejected moves between updates 201 and 1000 using<\/p>\n<p><span style=\"color: #0000ff\">.\u00a0count if lambda == lambda[_n-1] &amp; _n &gt; 200<\/span><\/p>\n<p>This gave me the value 327, so around 327\/800 or 41% of the proposals are rejected. The guideline is to reject a little over half of the proposals so perhaps we could increase the proposal standard deviation a little but this is not going to make much difference. Another resaonable approach would be to set the adapt option\u00a0with mcmcrun\u00a0in order\u00a0to learn about the proposal standard deviation during the burn-in.<\/p>\n<p>(ii) <strong>Asks us to run\u00a0the algorithm five times with a run length of 5,000 after the burn-in and note the posterior mean estimate of \u03bb. Are the results consistent to two decimal places? Find the smallest run length that will give two decimal places of accuracy<\/strong>.<\/p>\n<p>Here is some code that performs the analysis. 
Once again I have set a seed so that you can reproduce my results.<\/p>\n<p><span style=\"color: #0000ff\">set seed 456901<\/span><br \/>\n<span style=\"color: #0000ff\"> forvalues i=1\/5 {<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0\u00a0 quietly {<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 use &#8220;polyp.dta&#8221;, clear<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 keep if gender == &#8220;Male&#8221;<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 matrix b = ( 1 )<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 mcmcrun logpost b using temp.csv, samp(mhslogn , sd(0.05)) \/\/\/<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 burn(200) update(5000) replace par(lambda) jpost<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 insheet using temp.csv, clear<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 mcmcstats lambda<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0\u00a0 }<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0 di %8.4f r(mn1) %8.4f r(sd1) %8.4f r(se1)<\/span><br \/>\n<span style=\"color: #0000ff\"> }<\/span><\/p>\n<p>The displayed results are,<\/p>\n<p style=\"padding-left: 90px\"><span style=\"color: #0000ff\">Mean St Dev SEM<\/span><br \/>\n<span style=\"color: #0000ff\"> 1.4852 0.0510 0.0017<\/span><br \/>\n<span style=\"color: #0000ff\"> 1.4904 0.0517 0.0017<\/span><br \/>\n<span style=\"color: #0000ff\"> 1.4871 0.0501 0.0016<\/span><br \/>\n<span style=\"color: #0000ff\"> 1.4871 0.0519 0.0018<\/span><br \/>\n<span 
style=\"color: #0000ff\"> 1.4858 0.0481 0.0015<\/span><\/p>\n<p>So each of\u00a0the posterior mean estimates of lambda rounds to 1.49 and so they are consistent to 2 decimal places. Notice that the posterior standard deviation varies between 0.048 and 0.052 so it is stable to two decimal places but not to two significant figures. It is generally the case that you need more simulations to obtain consistent standard deviations.<\/p>\n<p>Had the chain been independent the standard error would have been around 0.051\/sqrt(5000) = 0.0007 so the correlation in the chain has roughly doubled the standard error. An observed standard error of 0.0017 means an accuracy of \u00b10.0034, which confirms that the results ought to be consistent to within 2 decimal places. As a rough guide we might\u00a0assume that for results to be accurate to 2 dec places we need the accuracy to be \u00b10.005 or the sem to be smaller than 0.0025. This suggests that we could have got away with a smaller number of iterations of the sampler. 5000*0.00172\/0.00252 = 2300. So perhaps 2,500 would have been enough.<\/p>\n<p>Re-running the analysis with 2500 updates we get,<br \/>\n<span style=\"color: #0000ff\">1.4852 0.0520 0.0024<\/span><br \/>\n<span style=\"color: #0000ff\"> 1.4861 0.0506 0.0024<\/span><br \/>\n<span style=\"color: #0000ff\"> 1.4909 0.0509 0.0023<\/span><br \/>\n<span style=\"color: #0000ff\"> 1.4872 0.0492 0.0022<\/span><br \/>\n<span style=\"color: #0000ff\"> 1.4843 0.0490 0.0021<\/span><\/p>\n<p>This (almost) confirms our approximate calculations. 
The answer seems to be about 1.487 so we need to do slightly better than \u00b10.005 in order to get 2 dec places of accuracy.<\/p>\n<p>(iii) <strong>Asks us to check the sensitivity of the estimate of \u03bb to the choice of the prior by contrasting gamma(3,0.3), gamma(6,0.15) and gamma(1.5,0.6)<\/strong><\/p>\n<p>Let\u2019s start by picturing the three priors.<br \/>\n<span style=\"color: #0000ff\">twoway ( function y=gammaden(1.5,0.6,0,x), range(0 4) ) \/\/\/<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0\u00a0 ( function y=gammaden(3,0.3,0,x), range(0 4) ) \/\/\/<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0\u00a0 ( function y=gammaden(6,0.15,0,x), range(0 4) ) , scheme(lean1) \/\/\/<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 xtitle(lambda) ytitle(&#8221; &#8220;) \/\/\/<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 leg( ring(0) pos(1) label(1 &#8220;1.5,0.6&#8221;) label(2 &#8220;3,0.3&#8221;) label(3 &#8220;6,0.15&#8221;))<\/span><\/p>\n<p><a href=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2014\/09\/solution_chap3_fig2.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-345\" src=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2014\/09\/solution_chap3_fig2-1024x688.png\" alt=\"solution_chap3_fig2\" width=\"620\" height=\"416\" srcset=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2014\/09\/solution_chap3_fig2-1024x688.png 1024w, https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2014\/09\/solution_chap3_fig2-300x201.png 300w\" sizes=\"auto, (max-width: 620px) 100vw, 620px\" \/><\/a><\/p>\n<p>&nbsp;<\/p>\n<p>The means of the priors are all 0.9 but the variances and the modes change.<\/p>\n<p>Now let\u2019s re-run the analyses with the different priors. 
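<\/p>\n<p>Before doing so, the claim that all three priors share a mean of 0.9 is quick to verify: in the shape&#8211;scale form used by gammaden, a G(a,s) density has mean a*s, variance a*s<sup>2<\/sup> and, when a&gt;1, mode (a-1)*s. A small Python check (mine, not the book&#8217;s):<\/p>

```python
# The three candidate priors in (shape a, scale s) form, as used by gammaden.
priors = {"G(1.5,0.6)": (1.5, 0.6), "G(3,0.3)": (3.0, 0.3), "G(6,0.15)": (6.0, 0.15)}

# mean = a*s, variance = a*s^2, mode = (a-1)*s (valid here since every a > 1).
moments = {name: {"mean": a * s, "var": a * s * s, "mode": (a - 1) * s}
           for name, (a, s) in priors.items()}
```

<p>The means all come out at 0.9, while the variances run from 0.135 for G(6,0.15) up to 0.54 for G(1.5,0.6).<\/p>\n<p>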
The code is as for part (i) except that I chose a burn-in of 200, a run length of 5,000 and, of course,\u00a0I changed the priors. Each time I set the seed to 456901 to make my results reproducible.<\/p>\n<pre><span style=\"color: #0000ff\">G(1.5,0.6)<\/span>\r\n<span style=\"color: #0000ff\"> --------------------------------------------------------<\/span>\r\n<span style=\"color: #0000ff\"> Parameter n mean   sd    sem    median    95% CrI<\/span>\r\n<span style=\"color: #0000ff\"> --------------------------------------------------------<\/span>\r\n<span style=\"color: #0000ff\"> lambda 5000 1.486 0.051 0.0016 1.487 ( 1.389, 1.592 )<\/span>\r\n<span style=\"color: #0000ff\"> --------------------------------------------------------<\/span><\/pre>\n<pre><span style=\"color: #0000ff\">G(3,0.3)<\/span>\r\n<span style=\"color: #0000ff\"> --------------------------------------------------------<\/span>\r\n<span style=\"color: #0000ff\"> Parameter n mean   sd    sem    median    95% CrI<\/span>\r\n<span style=\"color: #0000ff\"> --------------------------------------------------------<\/span>\r\n<span style=\"color: #0000ff\"> lambda 5000 1.485 0.051 0.0017 1.485 ( 1.386, 1.591 )<\/span>\r\n<span style=\"color: #0000ff\"> --------------------------------------------------------<\/span><\/pre>\n<pre><span style=\"color: #0000ff\">G(6,0.15)<\/span>\r\n<span style=\"color: #0000ff\"> --------------------------------------------------------<\/span>\r\n<span style=\"color: #0000ff\"> Parameter n mean   sd    sem    median    95% CrI<\/span>\r\n<span style=\"color: #0000ff\"> --------------------------------------------------------<\/span>\r\n<span style=\"color: #0000ff\"> lambda 5000 1.490 0.051 0.0017 1.491 ( 1.392, 1.596 )<\/span>\r\n<span style=\"color: #0000ff\"> --------------------------------------------------------<\/span><\/pre>\n<p>The priors show no effect on the second decimal place, i.e.\u00a0the choice between them is of no practical importance.<\/p>\n<p>Had we needed 3 dec places of accuracy, we would have required a sem of 0.00025 
so the run length would need to be 5000 &#215; 0.0017<sup>2<\/sup>\/0.00025<sup>2<\/sup> &#8776; 230,000<\/p>\n<p>I re-ran the analysis with 250,000 iterations and obtained,<\/p>\n<pre>\u00a0\r\n <span style=\"color: #0000ff\">G(1.5,0.6)<\/span>\r\n <span style=\"color: #0000ff\"><span style=\"color: #0000ff\">\u00a0--------------------------------------------------------<\/span>\r\n Parameter n mean sd sem median 95% CrI<\/span>\r\n <span style=\"color: #0000ff\"><span style=\"color: #0000ff\">\u00a0--------------------------------------------------------<\/span>\r\n lambda 250000 1.489 0.051 0.0002 1.488 ( 1.390, 1.589 )<\/span>\r\n <span style=\"color: #0000ff\"><span style=\"color: #0000ff\">\u00a0--------------------------------------------------------<\/span>\r\n G(3,0.3)<\/span>\r\n <span style=\"color: #0000ff\"><span style=\"color: #0000ff\">\u00a0--------------------------------------------------------<\/span>\r\n Parameter n mean sd sem median 95% CrI<\/span>\r\n <span style=\"color: #0000ff\"><span style=\"color: #0000ff\">\u00a0--------------------------------------------------------<\/span>\r\n lambda 250000 1.487 0.051 0.0002 1.487 ( 1.389, 1.587 )<\/span>\r\n <span style=\"color: #0000ff\"><span style=\"color: #0000ff\">\u00a0--------------------------------------------------------<\/span>\r\n G(6,0.15)<\/span>\r\n <span style=\"color: #0000ff\"><span style=\"color: #0000ff\">\u00a0--------------------------------------------------------<\/span>\r\n Parameter n mean sd sem median 95% CrI<\/span>\r\n <span style=\"color: #0000ff\"><span style=\"color: #0000ff\">\u00a0--------------------------------------------------------<\/span>\r\n lambda 250000 1.492 0.051 0.0002 1.492 ( 1.394, 1.592 )<\/span>\r\n <span style=\"color: #0000ff\">\u00a0--------------------------------------------------------<\/span><\/pre>\n<p>So the priors do affect the third decimal place in the posterior mean. 
It is very unlikely that\u00a0such a\u00a0difference would ever be of practical importance.<\/p>\n<p>I hope that this example serves to demonstrate that we should not be too fussy about the priors that we use. Unless the sample size is small, we can change the prior slightly without having any meaningful impact on the answer. Remember too that \u2018all models are wrong\u2019 and small changes to the model are likely to have at least as strong an impact as any approximation in the prior.<\/p>\n<p>(iv) <strong>Asks us to analyse the data on men and on women under a zero-truncated Poisson model in order to decide whether the average rate of polyps varies by gender.<\/strong><\/p>\n<p>As the two models are unrelated to one another we could analyse them separately but, for no special reason, I have decided to calculate a single joint posterior. The answer will be the same.<\/p>\n<p><span style=\"color: #0000ff\">program logpost<\/span><br \/>\n<span style=\"color: #0000ff\"> args logp b<\/span><\/p>\n<p><span style=\"color: #0000ff\">local lambdaM = `b'[1,1]<\/span><br \/>\n<span style=\"color: #0000ff\"> local lambdaF = `b'[1,2]<\/span><br \/>\n<span style=\"color: #0000ff\"> scalar `logp&#8217; = 0<\/span><br \/>\n<span style=\"color: #0000ff\"> logdensity poisson `logp&#8217; y `lambdaM&#8217; [fw=fM]<\/span><br \/>\n<span style=\"color: #0000ff\"> scalar `logp&#8217; = `logp&#8217; &#8211; 789*log(1-exp(-`lambdaM&#8217;))<\/span><br \/>\n<span style=\"color: #0000ff\"> logdensity gamma `logp&#8217; `lambdaM&#8217; 3 0.3<\/span><\/p>\n<p><span style=\"color: #0000ff\">logdensity poisson `logp&#8217; y `lambdaF&#8217; [fw=fF]<\/span><br \/>\n<span style=\"color: #0000ff\"> scalar `logp&#8217; = `logp&#8217; &#8211; 390*log(1-exp(-`lambdaF&#8217;))<\/span><br \/>\n<span style=\"color: #0000ff\"> logdensity gamma `logp&#8217; `lambdaF&#8217; 3 0.3<\/span><br \/>\n<span style=\"color: #0000ff\"> end<\/span><\/p>\n<p><span style=\"color: #0000ff\">matrix b = ( 1 , 1 
)<\/span><br \/>\n<span style=\"color: #0000ff\"> mcmcrun logpost b using temp.csv, samp( 2(mhslogn , sd(0.05)) ) \/\/\/<\/span><br \/>\n<span style=\"color: #0000ff\"> burn(200) update(5000) replace par(lambdaM lambdaF) jpost<\/span><br \/>\n<span style=\"color: #0000ff\"> insheet using temp.csv, clear case<\/span><br \/>\n<span style=\"color: #0000ff\"> mcmcstats lambdaM lambdaF<\/span><br \/>\n<span style=\"color: #0000ff\"> kdensity lambdaM , gen(xM yM) nograph<\/span><br \/>\n<span style=\"color: #0000ff\"> kdensity lambdaF , gen(xF yF) nograph<\/span><br \/>\n<span style=\"color: #0000ff\"> twoway (line yM xM) (line yF xF) , leg(ring(0) pos(11) col(1) \/\/\/<\/span><br \/>\n<span style=\"color: #0000ff\"> label(1 &#8220;Male&#8221;) label(2 &#8220;Female&#8221;))<\/span><\/p>\n<p>The summary statistics show the large difference between men and women<\/p>\n<pre> <span style=\"color: #0000ff\">--------------------------------------------------------<\/span>\r\n<span style=\"color: #0000ff\"> Parameter n    mean sd     sem   median    95% CrI<\/span>\r\n<span style=\"color: #0000ff\"> --------------------------------------------------------<\/span>\r\n<span style=\"color: #0000ff\"> lambdaM 5000 1.489 0.052 0.0017 1.489 ( 1.390, 1.594 )<\/span>\r\n<span style=\"color: #0000ff\"> lambdaF 5000 1.039 0.062 0.0028 1.039 ( 0.922, 1.167 )<\/span>\r\n<span style=\"color: #0000ff\"> --------------------------------------------------------<\/span><\/pre>\n<p>and this impression is reinforced by the posterior density plots<br \/>\n<a href=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2014\/09\/solution_chap3_fig3.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-346\" src=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2014\/09\/solution_chap3_fig3-1024x752.png\" alt=\"solution_chap3_fig3\" width=\"620\" height=\"455\" 
srcset=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2014\/09\/solution_chap3_fig3-1024x752.png 1024w, https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2014\/09\/solution_chap3_fig3-300x220.png 300w\" sizes=\"auto, (max-width: 620px) 100vw, 620px\" \/><\/a><\/p>\n<p>I<br \/>\nAnother way to illustrate the difference would be to calculate the difference between the two lambdas and summarize its posterior, which is positioned well away from zero. The probability that delta&lt;0 would be a good summary statistic for this analysis, except that the effect is so clear cut that the probability is indistinguishable from zero.<\/p>\n<p><span style=\"color: #0000ff\">gen delta = lambdaM \u2013 lambdaF<\/span><br \/>\n<span style=\"color: #0000ff\"> mcmcstats delta<\/span><\/p>\n<pre><span style=\"color: #0000ff\">------------------------------------------------------<\/span>\r\n<span style=\"color: #0000ff\"> Parameter n mean    sd   sem   median     95% CrI<\/span>\r\n<span style=\"color: #0000ff\"> -----------------------------------------------------<\/span>\r\n<span style=\"color: #0000ff\"> delta 5000 0.449 0.079 0.0030 0.448 ( 0.295, 0.602 )<\/span>\r\n<span style=\"color: #0000ff\"> -----------------------------------------------------<\/span><\/pre>\n<p>(v) <strong>The zero-truncated Generalized Poisson model is able to allow for clustering of polyps within susceptible individuals. Investigate this model by running a joint analysis of the data on men and women in which men and women have different means \u03bcm and \u03bcf but the same amount of clustering \u03b1. Set your own realistically vague priors. Investigate the posterior of \u03bcm-\u03bcf.<\/strong><\/p>\n<p>For some reason I called these parameters mu in the question where before I referred to lambda. Perhaps I will change\u00a0the wording\u00a0if the book is ever re-printed. 
Anyway I will stick with lambda in my answer.<\/p>\n<p>My priors on both \u03bb\u2019s are gamma G(3,0.3) and for alpha G(0.5,2). I have used a block updating algorithm. The burn-in is 1,000. The run length is 20,000 but the run is thinned by 4 to give 5,000 values.<\/p>\n<p><span style=\"color: #0000ff\">program logpost<\/span><br \/>\n<span style=\"color: #0000ff\"> args logp b<\/span><\/p>\n<p><span style=\"color: #0000ff\">local lambdaM = exp(`b'[1,1])<\/span><br \/>\n<span style=\"color: #0000ff\"> local lambdaF = exp(`b'[1,2])<\/span><br \/>\n<span style=\"color: #0000ff\"> local alpha = exp(`b'[1,3])<\/span><br \/>\n<span style=\"color: #0000ff\"> local thetaM = `lambdaM&#8217;\/(1+`alpha&#8217;*`lambdaM&#8217;)<\/span><br \/>\n<span style=\"color: #0000ff\"> local thetaF = `lambdaF&#8217;\/(1+`alpha&#8217;*`lambdaF&#8217;)<\/span><br \/>\n<span style=\"color: #0000ff\"> tempvar ay lnp<\/span><br \/>\n<span style=\"color: #0000ff\"> gen `ay&#8217; = 1+ `alpha&#8217;*y<\/span><br \/>\n<span style=\"color: #0000ff\"> gen `lnp&#8217; = y*log(`thetaM&#8217;)+(y-1)*log(`ay&#8217;)-`thetaM&#8217;*`ay&#8217;-log(1-exp(-`thetaM&#8217;))<\/span><br \/>\n<span style=\"color: #0000ff\"> qui su `lnp&#8217; [fw=fM]<\/span><br \/>\n<span style=\"color: #0000ff\"> scalar `logp&#8217; = r(sum)<\/span><br \/>\n<span style=\"color: #0000ff\"> replace `lnp&#8217; = y*log(`thetaF&#8217;)+(y-1)*log(`ay&#8217;)-`thetaF&#8217;*`ay&#8217;-log(1-exp(-`thetaF&#8217;))<\/span><br \/>\n<span style=\"color: #0000ff\"> qui su `lnp&#8217; [fw=fF]<\/span><br \/>\n<span style=\"color: #0000ff\"> scalar `logp&#8217; = `logp&#8217; + r(sum)<\/span><br \/>\n<span style=\"color: #0000ff\"> logdensity gamma `logp&#8217; `lambdaM&#8217; 3 0.3<\/span><br \/>\n<span style=\"color: #0000ff\"> logdensity gamma `logp&#8217; `lambdaF&#8217; 3 0.3<\/span><br \/>\n<span style=\"color: #0000ff\"> logdensity gamma `logp&#8217; `alpha&#8217; 0.5 2<\/span><br \/>\n<span style=\"color: #0000ff\"> scalar 
`logp&#8217; = `logp&#8217; + `b'[1,1] + `b'[1,2] + `b'[1,3]<\/span><br \/>\n<span style=\"color: #0000ff\"> end<\/span><\/p>\n<p><span style=\"color: #0000ff\">matrix b = (1.0 , 1.0, 0.1)<\/span><br \/>\n<span style=\"color: #0000ff\"> matrix V = (.11, 0.055, -0.120\\ 0.055, 0.110, -0.120 \\ -0.120, -0.120, 0.200)<\/span><br \/>\n<span style=\"color: #0000ff\">matrix L = cholesky(V)<\/span><br \/>\n<span style=\"color: #0000ff\"> mcmcrun logpost b using temp.csv, samp( mhsmnorm ,c(L) ) \/\/\/<\/span><br \/>\n<span style=\"color: #0000ff\"> burn(1000) update(20000) thin(4) replace par(logM logF logalpha) jpost<\/span><br \/>\n<span style=\"color: #0000ff\"> insheet using temp.csv, clear case<\/span><br \/>\n<span style=\"color: #0000ff\"> mcmcstats logM logF logalpha , corr<\/span><br \/>\n<span style=\"color: #0000ff\"> graph matrix logM logF logalpha<\/span><br \/>\n<span style=\"color: #0000ff\"> gen lambdaM = exp(logM)<\/span><br \/>\n<span style=\"color: #0000ff\"> gen lambdaF = exp(logF)<\/span><br \/>\n<span style=\"color: #0000ff\"> gen alpha = exp(logalpha)<\/span><br \/>\n<span style=\"color: #0000ff\"> gen delta = lambdaM &#8211; lambdaF<\/span><br \/>\n<span style=\"color: #0000ff\"> mcmctrace lambdaM lambdaF alpha delta<\/span><br \/>\n<span style=\"color: #0000ff\"> mcmcstats lambdaM lambdaF alpha delta, corr<\/span><\/p>\n<p>You might well come up with a different algorithm that works perfectly well,\u00a0so I think that I should justify my choice.<\/p>\n<p>Looking at figure 3.5 in Chapter 3,\u00a0which shows lambda against alpha for the women alone, we can see that these two parameters are quite strongly negatively correlated. This suggests that updating one parameter at a time will be inefficient.<\/p>\n<p>The Metropolis-Hastings block updater introduced in chapter 3 is based on a multivariate normal distribution, which is not ideal for this problem because we want all three parameters to be positive by definition. 
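<\/p>\n<p>As a sanity check on the likelihood itself, note that as alpha tends to zero the zero-truncated generalized Poisson should collapse back to the zero-truncated Poisson of parts (i)&#8211;(iv). Here is a Python transcription (mine) of the lnp line in logpost, together with that limit; theta = lambda\/(1+alpha*lambda) is chosen so that lambda remains the mean of the untruncated distribution.<\/p>

```python
import math

def ztgp_lnp(y, lam, alpha):
    """Zero-truncated generalized Poisson log-density, up to the -log(y!)
    constant, transcribed from the lnp line of logpost."""
    theta = lam / (1 + alpha * lam)  # keeps lam as the untruncated mean
    ay = 1 + alpha * y
    return (y * math.log(theta) + (y - 1) * math.log(ay)
            - theta * ay - math.log(1 - math.exp(-theta)))

def ztp_lnp(y, lam):
    """Zero-truncated Poisson log-density with the same constant dropped."""
    return y * math.log(lam) - lam - math.log(1 - math.exp(-lam))
```

<p>For a tiny alpha the two functions agree to many decimal places, which is a useful guard against transcription slips in logpost.<\/p>\n<p>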
The multivariate normal proposals could have problems if the chain got close to zero. In fact with this problem all three parameters are clearly non-zero so we could probably get away without any transformation, but I decided to work in terms of the logs of the parameters to protect myself from\u00a0the remote possibility of getting\u00a0close to zero.<\/p>\n<p>For the multivariate normal proposals to work well I have to make a guess at the covariance matrix of the logged parameters. From the analysis of the women shown in figure 3.5 of chapter 3 I expected\u00a0the posterior distribution of lambdaF to range from about 0.3 to 0.9; on a log scale this is\u00a0-1.20 to -0.11, a range of about 1.1. So the standard deviation must be about a quarter of this range, or roughly 0.28, and the variance about 0.08. In a similar way I\u00a0expected alpha to lie between 0.2 and 1.2,\u00a0which suggests a variance of about 0.20 for log(alpha). I then assumed a negative correlation of -0.8 between log(alpha) and each of the log(lambda)s, and a positive correlation of +0.5 between log(lambdaF) and log(lambdaM) induced by their common dependence on alpha. These guesses can be very rough and the sampler will still work efficiently.<\/p>\n<p>This covariance matrix is called V in the code.<\/p>\n<p>Inside the program logpost, I undo the log-scales by taking exponents and then calculate the log-posterior in the usual way. 
The final line of the code adds the Jacobian to allow for the change of scale.<\/p>\n<p>The log(parameters) have summary statistics<\/p>\n<pre><span style=\"color: #0000ff\">----------------------------------------------------------------------------<\/span>\r\n<span style=\"color: #0000ff\">Parameter\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 n\u00a0\u00a0\u00a0\u00a0 mean\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 sd\u00a0\u00a0\u00a0\u00a0\u00a0 sem\u00a0\u00a0 median\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 95% CrI<\/span>\r\n<span style=\"color: #0000ff\">----------------------------------------------------------------------------<\/span>\r\n<span style=\"color: #0000ff\">\u00a0logM\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 5000\u00a0\u00a0 -0.289\u00a0\u00a0\u00a0 0.131\u00a0\u00a0 0.0040\u00a0\u00a0 -0.272 (\u00a0 -0.565,\u00a0 -0.060 )<\/span>\r\n<span style=\"color: #0000ff\">\u00a0logF\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 5000\u00a0\u00a0 -0.686\u00a0\u00a0\u00a0 0.147\u00a0\u00a0 0.0045\u00a0\u00a0 -0.676 (\u00a0 -0.989,\u00a0 -0.426 )<\/span>\r\n<span style=\"color: #0000ff\">\u00a0logalpha\u00a0\u00a0\u00a0\u00a0\u00a0 5000\u00a0\u00a0 -0.377\u00a0\u00a0\u00a0 0.219\u00a0\u00a0 0.0070\u00a0\u00a0 -0.389 (\u00a0 -0.783,\u00a0\u00a0 0.069 )<\/span>\r\n<span style=\"color: #0000ff\">----------------------------------------------------------------------------<\/span><\/pre>\n<pre><span style=\"color: #0000ff\">Correlations<\/span>\r\n<span style=\"color: #0000ff\">(obs=5000)<\/span><\/pre>\n<pre><span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 |\u00a0\u00a0\u00a0\u00a0 logM\u00a0\u00a0\u00a0\u00a0 logF logalpha<\/span>\r\n<span style=\"color: #0000ff\">-------------+---------------------------<\/span>\r\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 logM |\u00a0\u00a0 1.0000<\/span>\r\n<span style=\"color: 
#0000ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 logF |\u00a0\u00a0 0.7511\u00a0\u00a0 1.0000<\/span>\r\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0 logalpha |\u00a0 -0.9042\u00a0 -0.8207\u00a0\u00a0 1.0000<\/span><\/pre>\n<p>So my guesses at the correlations were quite good. Transforming onto a more interpretable scale the summaries are<\/p>\n<pre><span style=\"color: #0000ff\"> ----------------------------------------------------------------------------<\/span>\r\n<span style=\"color: #0000ff\">Parameter\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 n\u00a0\u00a0\u00a0\u00a0 mean\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 sd\u00a0\u00a0\u00a0\u00a0\u00a0 sem\u00a0\u00a0 median\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 95% CrI<\/span>\r\n<span style=\"color: #0000ff\">----------------------------------------------------------------------------<\/span>\r\n<span style=\"color: #0000ff\">\u00a0lambdaM\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 5000\u00a0\u00a0\u00a0 0.755\u00a0\u00a0\u00a0 0.097\u00a0\u00a0 0.0029\u00a0\u00a0\u00a0 0.762 (\u00a0\u00a0 0.568,\u00a0\u00a0 0.941 )<\/span>\r\n<span style=\"color: #0000ff\">\u00a0lambdaF\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 5000\u00a0\u00a0\u00a0 0.509\u00a0\u00a0\u00a0 0.074\u00a0\u00a0 0.0023\u00a0\u00a0\u00a0 0.509 (\u00a0\u00a0 0.372,\u00a0\u00a0 0.653 )<\/span>\r\n<span style=\"color: #0000ff\">\u00a0alpha\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 5000\u00a0\u00a0\u00a0 0.703\u00a0\u00a0\u00a0 0.159\u00a0\u00a0 0.0050\u00a0\u00a0\u00a0 0.678 (\u00a0\u00a0 0.457,\u00a0\u00a0 1.072 )<\/span>\r\n<span style=\"color: #0000ff\">\u00a0delta\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 5000\u00a0\u00a0\u00a0 0.246\u00a0\u00a0\u00a0 0.065\u00a0\u00a0 0.0019\u00a0\u00a0\u00a0 0.244 (\u00a0\u00a0 0.125,\u00a0\u00a0 0.376 )<\/span>\r\n<span style=\"color: #0000ff\">----------------------------------------------------------------------------<\/span><\/pre>\n<p>There does seem to be good evidence of clustering and 
allowing for that clustering, the Poisson means for both men and women are roughly halved although there is still good evidence that men and women differ.<\/p>\n<p>Mixing seems quite good. The standard errors are quite small (random sampling would have given 0.1\/sqrt(20000)=0.0007 and even with thinning we have a se of 0.0029) and the trace plots show no drift and seem to cover the posterior well.<\/p>\n<p><a href=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2014\/10\/solution_chap3_fig4.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-364\" src=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2014\/10\/solution_chap3_fig4.png\" alt=\"solution_chap3_fig4\" width=\"811\" height=\"590\" srcset=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2014\/10\/solution_chap3_fig4.png 811w, https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2014\/10\/solution_chap3_fig4-300x218.png 300w\" sizes=\"auto, (max-width: 811px) 100vw, 811px\" \/><\/a><\/p>\n<p>And the matrix scatter plots of the chains illustrate why an algorithm that updated one parameter at a time\u00a0would have taken longer to converge.<\/p>\n<p><a href=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2014\/10\/solution_chap3_fig5.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-365\" src=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2014\/10\/solution_chap3_fig5.png\" alt=\"solution_chap3_fig5\" width=\"811\" height=\"590\" srcset=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2014\/10\/solution_chap3_fig5.png 811w, https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2014\/10\/solution_chap3_fig5-300x218.png 300w\" sizes=\"auto, (max-width: 811px) 100vw, 811px\" \/><\/a><\/p>\n<p>Here I used my guess at the covariance matrix of the posterior distribution, V,\u00a0as the covariance matrix of the multivariate normal proposals. 
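<\/p>\n<p>In code, one block proposal is just the Cholesky factor of V multiplied by a vector of standard normals. The sketch below is my Python stand-in for what mhsmnorm does internally, and it also shows where a scaling factor would enter.<\/p>

```python
import math
import random

rng = random.Random(456901)

# Guessed covariance of (logM, logF, logalpha), matching matrix V above.
V = [[0.110, 0.055, -0.120],
     [0.055, 0.110, -0.120],
     [-0.120, -0.120, 0.200]]

def cholesky(A):
    """Lower-triangular L with L L' = A (plain Python, no numpy)."""
    m = len(A)
    L = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L

L = cholesky(V)

def propose(b, scale=1.0):
    """Multivariate normal random-walk step b' = b + scale * L z.
    scale=0.5 would base the proposals on V/4 rather than V."""
    z = [rng.gauss(0.0, 1.0) for _ in b]
    return [bi + scale * sum(L[i][k] * z[k] for k in range(len(b)))
            for i, bi in enumerate(b)]
```

<p>Because the proposal covariance scale<sup>2<\/sup>V only has to be roughly proportional to the posterior covariance, rough guesses like these are usually good enough.<\/p>\n<p>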
You can make the algorithm slightly more efficient by scaling this matrix slightly, perhaps basing the proposals on V\/4 rather than V. You could even optimise the scaling factor by learning about it during the burn-in.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This is another of my occasional postings working through the exercises given at the end of the chapters in \u2018Bayesian analysis with Stata\u2019. Chapter 3 introduces the Metropolis-Hastings sampler and so enables us to tackle any small sized Bayesian analysis. For larger problems this method can be too slow and faster algorithms are needed. Despite [&hellip;]<\/p>\n","protected":false},"author":134,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[39,28],"class_list":["post-332","post","type-post","status-publish","format-standard","hentry","category-uncategorized","tag-bayesian-analysis-with-stata","tag-solutions"],"_links":{"self":[{"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/posts\/332","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/users\/134"}],"replies":[{"embeddable":true,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/comments?post=332"}],"version-history":[{"count":11,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/posts\/332\/revisions"}],"predecessor-version":[{"id":369,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/posts\/332\/revisions\/369"}],"wp:attachment":[{"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/media?parent=332"}],"wp:term":[{"taxonomy":"category","embedd
able":true,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/categories?post=332"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/tags?post=332"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}