{"id":893,"date":"2015-10-09T07:59:53","date_gmt":"2015-10-09T07:59:53","guid":{"rendered":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/?p=893"},"modified":"2025-02-26T13:21:37","modified_gmt":"2025-02-26T13:21:37","slug":"reversible-jump-mcmc-ii","status":"publish","type":"post","link":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/2015\/10\/09\/reversible-jump-mcmc-ii\/","title":{"rendered":"Reversible Jump MCMC II"},"content":{"rendered":"<p><a href=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/2015\/10\/02\/reversible-jump-mcmc\/\">Last time<\/a> I outlined the derivation of a Metropolis-Hastings algorithm for RJMCMC and this time I want to implement it in Stata for a very simple problem. I have\u00a0taken the problem from\u00a0the paper in <em>The American Statistician<\/em> by Barker and Link that I referenced at the end of <a href=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/2015\/10\/02\/reversible-jump-mcmc\/\">my last posting<\/a>.<\/p>\n<p>We have two binomial samples y<sub>1<\/sub>=8, n<sub>1<\/sub>=20 and y<sub>2<\/sub>=16, n<sub>2<\/sub>=30, but we are not sure whether to model them as,<br \/>\nM1: binomials with different probabilities \u03c0<sub>1<\/sub> and \u03c0<sub>2<\/sub><br \/>\nM2: binomials with a common probability \u03be.<\/p>\n<p>We will need priors. In their paper, Barker and Link use identical vague priors that simply cancel out of the equations, so you cannot see exactly what is going on.\u00a0I prefer to\u00a0set\u00a0different priors and I&#8217;ll arbitrarily choose, p(M1)=0.3 and p(M2)=0.7. Under M1 my priors on the parameters will be p(\u03c0<sub>1<\/sub>|M1)=Beta(5,5) and p(\u03c0<sub>2<\/sub>|M1)=Beta(10,10) and under M2, p(\u03be|M2)=Beta(20,20).<\/p>\n<p>Before setting up the RJMCMC program let us analyse the two models separately.<\/p>\n<p>Here is a standard MH program for analysing model M1. 
It uses the truncated MH sampler, <strong>mhstrnc,<\/strong> to ensure that all proposals for \u03c0<sub>1<\/sub> and \u03c0<sub>2<\/sub> are between 0 and 1. I have lazily used the function, <strong>logdensity<\/strong>;\u00a0this is not very efficient, but the\u00a0program is so simple\u00a0that speed is not an issue.<\/p>\n<p><span style=\"color: #3366ff\">program logpost<\/span><br \/>\n<span style=\"color: #3366ff\"> args lnf b<\/span><\/p>\n<p><span style=\"color: #3366ff\">local pi1 = `b'[1,1]<\/span><br \/>\n<span style=\"color: #3366ff\"> local pi2 = `b'[1,2]<\/span><br \/>\n<span style=\"color: #3366ff\"> scalar `lnf' = 0<\/span><br \/>\n<span style=\"color: #3366ff\"> logdensity binomial `lnf' 8 20 `pi1'<\/span><br \/>\n<span style=\"color: #3366ff\"> logdensity binomial `lnf' 16 30 `pi2'<\/span><br \/>\n<span style=\"color: #3366ff\"> logdensity beta `lnf' `pi1' 5 5<\/span><br \/>\n<span style=\"color: #3366ff\"> logdensity beta `lnf' `pi2' 10 10<\/span><br \/>\n<span style=\"color: #3366ff\"> end<\/span><\/p>\n<p><span style=\"color: #3366ff\">matrix b = (0.5, 0.5)<\/span><br \/>\n<span style=\"color: #3366ff\"> mcmcrun logpost b using temp.csv , replace \/\/\/<\/span><br \/>\n<span style=\"color: #3366ff\"> param(pi1 pi2) burn(500) update(5000) \/\/\/<\/span><br \/>\n<span style=\"color: #3366ff\"> samplers( 2(mhstrnc , sd(0.5) lb(0) ub(1) ) )<\/span><\/p>\n<p><span style=\"color: #3366ff\">insheet using temp.csv, clear<\/span><br \/>\n<span style=\"color: #3366ff\"> mcmcstats *<\/span><\/p>\n<p>Here is a summary of the chain,<\/p>\n<pre>-------------------------------------------------------\r\n Parameter n  mean    sd    sem median    95% CrI\r\n ------------------------------------------------------\r\n pi1    5000 0.431 0.088 0.0030 0.428 ( 0.260, 0.615 )\r\n pi2    5000 0.521 0.071 0.0022 0.522 ( 0.376, 0.656 )\r\n -------------------------------------------------------<\/pre>\n<p>Mixing is 
fine\u00a0so I will not bother with the graphics.<\/p>\n<p>You probably know that the beta prior is conjugate to the binomial likelihood, so it can be shown theoretically that the posterior will also have a beta distribution.\u00a0When the\u00a0data are y out of n\u00a0and\u00a0the prior on p is\u00a0Beta(a,b), the posterior on p will be\u00a0Beta(a+y,b+n-y).<\/p>\n<p>We could take advantage of this and program our own Gibbs sampler.\u00a0This new algorithm\u00a0ought to\u00a0converge more quickly than the Metropolis-Hastings algorithm and run faster too.<\/p>\n<p><span style=\"color: #3366ff\">program myGibbs<\/span><br \/>\n<span style=\"color: #3366ff\"> args logpost b ipar<\/span><\/p>\n<p><span style=\"color: #339966\">* posteriors: pi1 ~ Beta(5+8,5+12), pi2 ~ Beta(10+16,10+14)<\/span><br \/>\n<span style=\"color: #3366ff\">matrix `b'[1,1] = rbeta(13,17)<\/span><br \/>\n<span style=\"color: #3366ff\"> matrix `b'[1,2] = rbeta(26,24)<\/span><br \/>\n<span style=\"color: #3366ff\"> end<\/span><\/p>\n<p><span style=\"color: #3366ff\">matrix b = (0.5, 0.5)<\/span><br \/>\n<span style=\"color: #3366ff\"> mcmcrun logpost b using temp.csv , replace \/\/\/<\/span><br \/>\n<span style=\"color: #3366ff\"> param(pi1 pi2) burn(500) update(5000) \/\/\/<\/span><br \/>\n<span style=\"color: #3366ff\"> samplers( myGibbs , dim(2) )<\/span><\/p>\n<p><span style=\"color: #3366ff\">insheet using temp.csv, clear<\/span><br \/>\n<span style=\"color: #3366ff\"> mcmcstats *<\/span><\/p>\n<p>Now the results are,<\/p>\n<pre>-------------------------------------------------------\r\n Parameter n  mean    sd    sem median    95% CrI\r\n ------------------------------------------------------\r\n pi1    5000 0.433 0.088 0.0012 0.431 ( 0.270, 0.612 )\r\n pi2    5000 0.520 0.070 0.0010 0.522 ( 0.380, 0.654 )\r\n -------------------------------------------------------<\/pre>\n<p>As expected we have the same answer but we have reduced the sampling error as measured by the MCMC standard error (sem).<\/p>\n<p>We can even calculate the true posterior 
mean since Wikipedia tells us that a Beta(a,b) distribution has a mean a\/(a+b), so a Beta(13,17) should have a mean 0.433 and a Beta(26,24) should have a mean 0.520, just as we found.<\/p>\n<p>We can do the same for model 2 although I will not present\u00a0all of the details\u00a0as they are so similar. If y<sub>1<\/sub>=8, n<sub>1<\/sub>=20 and y<sub>2<\/sub>=16, n<sub>2<\/sub>=30 have a common binomial probability with prior, Beta(20,20), the posterior will be Beta(8+16+20,12+14+20)=Beta(44,46) so the posterior mean is 0.489.<\/p>\n<p>Now we will set up a chain that moves between the two models. When it is in M1 the chain will estimate (\u03c0<sub>1<\/sub>, \u03c0<sub>2<\/sub>) and should find exactly the same posterior that we have already derived. When the chain is in model 2 it should estimate\u00a0\u03be and again get the same estimate as before. The extra information will come about because the algorithm will decide whether to spend its time in M1 or M2\u00a0and it will do so in such a way that the time will be proportional to the model&#8217;s\u00a0posterior probability.<\/p>\n<p>Model M1 has 2 parameters and model M2 has 1 parameter. So in the notation that I set up last week, the maximal set of parameters is, \u03a8 = (\u03a81, \u03a82) and depending on the choice of model, we need to convert (\u03a81, \u03a82) into (\u03c0<sub>1<\/sub>, \u03c0<sub>2<\/sub>) or (\u03a81, \u03a82) into (\u03be,u) where u is\u00a0an arbitrary, unused parameter.<\/p>\n<p>Now we need to know how to move between models.<\/p>\n<p>There is no unique way to jump between the models, but as long as the jumps are 1:1 and we do not enter any cul-de-sacs, we can do what we like, although daft choices will lead to poor mixing and a very long chain.<\/p>\n<p>For instance, if under M1 we currently have estimates (0.5,0.3), it would be odd to propose a move to M2 with\u00a0\u03be =0.9. 
It would mean that we think that the separate probabilities are 0.5 and 0.3, and yet we try a move to a model with a common probability that is almost 1. Obviously such a move\u00a0would rarely be accepted and the chain would take a long time to converge. However, it would eventually move and it would eventually recover; though eventually might mean millions of iterations.<\/p>\n<p>I\u2019m going to\u00a0start with the simplest possible proposal,<\/p>\n<p>M1: \u03c0<sub>1<\/sub> = \u03a81 and \u03c0<sub>2<\/sub> = \u03a82<br \/>\nM2:\u00a0\u03be = \u03a81 and u = \u03a82<\/p>\n<p>Last time I derived the form of the Metropolis-Hastings algorithm and mentioned that we need to be aware of the problem of parameter transformation and the way that\u00a0transformation affects the increments d\u03a8 in the Metropolis-Hastings acceptance probability. Here both sets of parameters are derived, untransformed, from \u03a8.\u00a0Depending on how you choose to think about it, there\u00a0is either no transformation of parameters or the transformation is based on simple equality, so either the d\u03a8 cancel directly, or both of the Jacobians\u00a0are equal to 1 and can be omitted.<\/p>\n<p>This leads us to our first RJMCMC program.<\/p>\n<p><span style=\"color: #3366ff\">program myRJMCMC <\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0 args logpost b ipar<\/span><\/p>\n<p><span style=\"color: #3366ff\">\u00a0\u00a0 local M = `b'[1,1]<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0 if( `M' == 1 ) {<\/span><br \/>\n<span style=\"color: #339966\">* currently in model 1: update<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 matrix `b'[1,2] = rbeta(13,17)<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 matrix `b'[1,3] = rbeta(26,24)<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 local psi1 = `b'[1,2]<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 local psi2 = `b'[1,3]<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0\u00a0\u00a0 <span style=\"color: #339966\">* try switch to M=2<\/span><\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0\u00a0\u00a0 local a1 = log(binomialp(20,8,`psi1'))+log(binomialp(30,16,`psi2'))+ \/\/\/<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 log(betaden(5,5,`psi1'))+log(betaden(10,10,`psi2'))+log(0.3) <\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0\u00a0\u00a0 local a2 = log(binomialp(20,8,`psi1'))+log(binomialp(30,16,`psi1'))+ \/\/\/<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 log(betaden(20,20,`psi1'))+log(betaden(26,24,`psi2'))+log(0.7) <\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0\u00a0\u00a0 if log(runiform()) &lt; `a2' - `a1' matrix `b'[1,1] = 2<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0 }<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0 else {<\/span><br \/>\n<span style=\"color: #339966\">* currently in model 2: update<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 matrix `b'[1,2] = rbeta(44,46)<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 matrix `b'[1,3] = rbeta(26,24)<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 local psi1 = `b'[1,2]<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 local psi2 = `b'[1,3]<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 <span style=\"color: #339966\">* try switch to M=1<\/span><\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 local a1 = log(binomialp(20,8,`psi1'))+log(binomialp(30,16,`psi2'))+ \/\/\/<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 log(betaden(5,5,`psi1'))+log(betaden(10,10,`psi2'))+log(0.3) <\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 local a2 = log(binomialp(20,8,`psi1'))+log(binomialp(30,16,`psi1'))+ \/\/\/<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 log(betaden(20,20,`psi1'))+log(betaden(26,24,`psi2'))+log(0.7) <\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 if log(runiform()) &lt; `a1' - `a2' matrix `b'[1,1] = 1<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0 }<\/span><br \/>\n<span style=\"color: #3366ff\">end<\/span><\/p>\n<p><span style=\"color: #3366ff\">matrix b = (1, 0.5, 0.5)<\/span><br \/>\n<span style=\"color: #3366ff\">mcmcrun logpost b using temp.csv , replace \/\/\/<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0 param(m psi1 psi2) burn(500) update(5000) \/\/\/<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0 samplers( myRJMCMC , dim(3) )<\/span><\/p>\n<p><span style=\"color: #3366ff\">insheet using temp.csv, clear<\/span><br \/>\n<span style=\"color: #3366ff\">tabulate m<\/span><br \/>\n<span style=\"color: #3366ff\">mcmcstats *<\/span><br \/>\n<span style=\"color: #3366ff\">mcmcstats * if m == 1<\/span><br \/>\n<span style=\"color: #3366ff\">mcmcstats * if m == 2<\/span><\/p>\n<p>The vector of parameters, b, has been expanded to contain an indicator of the model as well as \u03a8.<\/p>\n<p>Let&#8217;s\u00a0note a few of the features of the program.<\/p>\n<p>(a) I have programmed for clarity rather than for efficiency; for instance, I copy the parameter values into locals, which strictly I do not need to do.<\/p>\n<p>(b) First I\u00a0find out\u00a0which model the chain is currently in. Then I update the parameter estimates just as I did when I was analysing the models separately. 
Finally I decide whether to stay with the current model or to switch, and I do this with a Metropolis-Hastings step.<\/p>\n<p>(c) I calculate the log of the acceptance ratio because I want to avoid loss of precision due to very small probabilities.<\/p>\n<p>(d) Each part of the acceptance ratio involves the likelihood, the prior on \u03a8 for that model and the prior on the model. Often there would also be a Jacobian but, as noted already, the Jacobians for our moves are both 1.<\/p>\n<p>(e) The proposal probabilities for the moves (g() in the notation that I set up last time) are omitted from the acceptance ratio because they cancel. Since there are only two models, we propose a move to M2 when in M1 and a move to M1 when in M2 with equal probability.<\/p>\n<p>(f) Under model 2 I still have to update u even though it does not occur in the model&#8217;s likelihood. Here I have chosen to use the same distribution that I used for \u03c0<sub>2<\/sub> under model 1, but this is only for convenience. The choice is arbitrary.<\/p>\n<p>Here are the results<\/p>\n<pre>          m |      Freq.     Percent        Cum.\r\n------------+-----------------------------------\r\n          1 |      1,165       23.30       23.30\r\n          2 |      3,835       76.70      100.00\r\n------------+-----------------------------------\r\n      Total |      5,000      100.00<\/pre>\n<p>So we spend about 77% of the time in model 2, which means that the posterior probability of model 2 is 0.77. 
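Because both models are conjugate, the posterior model probability can also be computed exactly from the beta-binomial marginal likelihoods, which makes a useful check on the chain. Here is a minimal sketch of that calculation; I have used Python rather than Stata purely because it keeps the lgamma arithmetic short, and the variable names are my own.

```python
from math import lgamma, exp

def ln_beta_fn(a, b):
    # log of the Beta function B(a,b) = Gamma(a)Gamma(b)/Gamma(a+b)
    return lgamma(a) + lgamma(b) - lgamma(a + b)

# Log marginal likelihoods, dropping the binomial coefficients,
# which are identical under both models and so cancel.
# M1: y1=8/20 with a Beta(5,5) prior, y2=16/30 with a Beta(10,10) prior
ln_m1 = (ln_beta_fn(5 + 8, 5 + 12) - ln_beta_fn(5, 5)) + \
        (ln_beta_fn(10 + 16, 10 + 14) - ln_beta_fn(10, 10))
# M2: pooled 24/50 with a Beta(20,20) prior
ln_m2 = ln_beta_fn(20 + 24, 20 + 26) - ln_beta_fn(20, 20)

bf21 = exp(ln_m2 - ln_m1)                 # Bayes factor for M2 over M1
post2 = 0.7 * bf21 / (0.3 + 0.7 * bf21)  # posterior probability of M2
print(bf21, post2)
```

With these priors the exact Bayes factor comes out at about 1.43 and P(M2|y) at about 0.77, agreeing with the 76.70% of the time that the chain spent in model 2.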
Our prior was 0.7 so the data have pushed us slightly further in the direction of model 2.<\/p>\n<p>We can even calculate the Bayes factor. The prior ratio of model probabilities is 0.7\/0.3=2.33 and the posterior ratio is 0.77\/0.23=3.35, so the Bayes factor is 3.35\/2.33=1.44. This is not a very impressive Bayes factor: it suggests that the data\u00a0only push us\u00a0weakly in the direction of model 2; most of our preference for model 2 comes from the prior.<\/p>\n<p>The Stata command<\/p>\n<p><span style=\"color: #3366ff\">. count if m != m[_n-1]<\/span><\/p>\n<p>tells us that we switched models 1,785 times in the chain of 5,000 iterations. The ideal for MH algorithms is to move around 30%-50% of the time, so this represents\u00a0good mixing.<\/p>\n<p>If we extract the results for the time that the chain spent in model 1, we get<\/p>\n<pre>----------------------------------------------------------------------------\r\nParameter        n     mean       sd      sem   median          95% CrI\r\n----------------------------------------------------------------------------\r\n m            1165    1.000    0.000        .    1.000 (   1.000,   1.000 )\r\n psi1         1165    0.435    0.091   0.0028    0.436 (   0.262,   0.618 )\r\n psi2         1165    0.519    0.073   0.0021    0.518 (   0.371,   0.657 )\r\n----------------------------------------------------------------------------<\/pre>\n<p>The 
parameter estimates are very similar to the values that we calculated before (0.433 and 0.520).<\/p>\n<p>Extracting the values for model 2 we get<\/p>\n<pre>----------------------------------------------------------------------------\r\nParameter        n     mean       sd      sem   median          95% CrI\r\n----------------------------------------------------------------------------\r\n m            3835    2.000    0.000        .    2.000 (   2.000,   2.000 )\r\n psi1         3835    0.489    0.052   0.0009    0.490 (   0.387,   0.591 )\r\n psi2         3835    0.521    0.070   0.0011    0.523 (   0.379,   0.654 )\r\n----------------------------------------------------------------------------<\/pre>\n<p>Once again the estimate of the common probability is very close to the theoretical mean (0.489). The second parameter is the arbitrary u and should be ignored.<\/p>\n<p>Basically that is\u00a0RJMCMC, but let us try some variations.<\/p>\n<p>First, I said that the updating of u is arbitrary, so I will change from Beta(26,24) to Beta(1,1), which is equivalent to using a uniform distribution. 
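To convince yourself that the pseudo-prior on u really does drop out of the posterior model probabilities, it is easy to re-implement the whole scheme outside Stata. The sketch below is my own Python translation of the sampler above (the seed and iteration count are arbitrary choices of mine); with the Beta(1,1) pseudo-prior it still spends roughly 77% of its time in model 2.

```python
import math
import random

def log_joint(m, psi1, psi2):
    # log p(y, psi, m) up to constants shared by both models
    def lbin(n, y, p):   # binomial log likelihood without the constant nCy term
        return y * math.log(p) + (n - y) * math.log(1 - p)
    def lbeta(p, a, b):  # Beta(a,b) log density
        return ((a - 1) * math.log(p) + (b - 1) * math.log(1 - p)
                + math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))
    if m == 1:
        return (lbin(20, 8, psi1) + lbin(30, 16, psi2)
                + lbeta(psi1, 5, 5) + lbeta(psi2, 10, 10) + math.log(0.3))
    return (lbin(20, 8, psi1) + lbin(30, 16, psi1)     # common probability xi
            + lbeta(psi1, 20, 20) + lbeta(psi2, 1, 1)  # Beta(1,1) pseudo-prior on u
            + math.log(0.7))

random.seed(2015)
m, visits2, iters = 1, 0, 20000
for _ in range(iters):
    if m == 1:   # Gibbs draws from the conjugate conditionals
        psi1, psi2 = random.betavariate(13, 17), random.betavariate(26, 24)
    else:
        psi1, psi2 = random.betavariate(44, 46), random.betavariate(1, 1)
    prop = 3 - m   # propose a jump to the other model
    if math.log(random.random()) < log_joint(prop, psi1, psi2) - log_joint(m, psi1, psi2):
        m = prop
    visits2 += (m == 2)

phat = visits2 / iters
print(phat)
```

The acceptance step matches the Stata version: the log joint under the proposed model minus the log joint under the current model, with the symmetric proposal probabilities and the unit Jacobians omitted.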
Here\u00a0is the density of the distribution that I used\u00a0before and the uniform that I will switch to; they look very different, but since the distribution is only used for u, the change may affect the mixing but should not alter the posteriors.<\/p>\n<p><a href=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2015\/09\/uniform-e1443619901996.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-914\" src=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2015\/09\/uniform-e1443619901996.png\" alt=\"uniform\" width=\"600\" height=\"436\" \/><\/a><\/p>\n<p>The code is almost the same. I&#8217;ve highlighted the changes in red.<\/p>\n<p><span style=\"color: #3366ff\">program myRJMCMC <\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0 args logpost b ipar<\/span><\/p>\n<p><span style=\"color: #3366ff\">\u00a0\u00a0 local M = `b'[1,1]<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0 if( `M' == 1 ) {<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 matrix `b'[1,2] = rbeta(13,17)<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 matrix `b'[1,3] = rbeta(26,24)<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 local psi1 = `b'[1,2]<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 local psi2 = `b'[1,3]<\/span><br \/>\n<span style=\"color: #339966\">\u00a0\u00a0\u00a0 * try switch to M=2<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0 local a1 = log(binomialp(20,8,`psi1'))+log(binomialp(30,16,`psi2'))+ \/\/\/<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 log(betaden(5,5,`psi1'))+log(betaden(10,10,`psi2'))+log(0.3) <\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0 local a2 = log(binomialp(20,8,`psi1'))+log(binomialp(30,16,`psi1'))+ \/\/\/<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 log(betaden(20,20,`psi1'))+<span style=\"color: #ff0000\">log(betaden(1,1,`psi2'))+<\/span>log(0.7) <\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0 if log(runiform()) &lt; `a2' - `a1' matrix `b'[1,1] = 2<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0 }<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0 else {<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 matrix `b'[1,2] = rbeta(44,46)<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 <span style=\"color: #ff0000\">matrix `b'[1,3] = rbeta(1,1)<\/span><\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 local psi1 = `b'[1,2]<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 local psi2 = `b'[1,3]<\/span><br \/>\n<span style=\"color: #339966\">\u00a0\u00a0\u00a0 * try switch to M=1<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0 local a1 = log(binomialp(20,8,`psi1'))+log(binomialp(30,16,`psi2'))+ \/\/\/<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 log(betaden(5,5,`psi1'))+log(betaden(10,10,`psi2'))+log(0.3) <\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0 local a2 = log(binomialp(20,8,`psi1'))+log(binomialp(30,16,`psi1'))+ \/\/\/<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 log(betaden(20,20,`psi1'))+<span style=\"color: #ff0000\">log(betaden(1,1,`psi2'))+<\/span>log(0.7) <\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0 if log(runiform()) &lt; `a1' - `a2' matrix `b'[1,1] = 1<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0 }<\/span><br \/>\n<span style=\"color: #3366ff\">end<\/span><\/p>\n<p>The results are so similar that I will not even bother to show them. The mixing is slightly worse, with a tendency to stay longer\u00a0in the same\u00a0model before switching, but even that effect is minor.<\/p>\n<p>At present the algorithm always updates the parameters for the current model and then always attempts\u00a0to switch models, but we do not need to follow this rigid pattern. For instance, we could randomly either update the current model or try to switch, and we could vary the probabilities of updates and attempted model changes to try to improve overall mixing.<\/p>\n<p>Here is a program that updates the current model 3\/4 of the time and attempts a move between models 1\/4 of the time.<\/p>\n<p><span style=\"color: #3366ff\">program myRJMCMC <\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0 args logpost b ipar<\/span><\/p>\n<p><span style=\"color: #3366ff\">\u00a0\u00a0 local M = `b'[1,1]<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0 if( runiform() &lt; 0.75 ) {<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0\u00a0\u00a0 <span style=\"color: #339966\">* update within the current model<\/span><\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0\u00a0\u00a0 if( `M' == 1 ) {<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 matrix `b'[1,2] = rbeta(13,17)<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 matrix `b'[1,3] = rbeta(26,24)<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0\u00a0\u00a0 }<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0\u00a0\u00a0 else {<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 matrix `b'[1,2] = rbeta(44,46)<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 matrix `b'[1,3] = rbeta(1,1)<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0\u00a0\u00a0 }<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0 }<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0 else {<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0\u00a0\u00a0 <span style=\"color: #339966\">* try to switch <\/span><\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 local psi1 = `b'[1,2]<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 local psi2 = `b'[1,3]<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 local a1 = log(binomialp(20,8,`psi1'))+log(binomialp(30,16,`psi2'))+ \/\/\/<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 log(betaden(5,5,`psi1'))+log(betaden(10,10,`psi2'))+log(0.3) <\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 local a2 = log(binomialp(20,8,`psi1'))+log(binomialp(30,16,`psi1'))+ \/\/\/<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 log(betaden(20,20,`psi1'))+log(betaden(1,1,`psi2'))+log(0.7) <\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 if `M' == 1 {<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 if log(runiform()) &lt; `a2' - `a1' matrix `b'[1,1] = 2<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 }<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 else {<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 if log(runiform()) &lt; `a1' - `a2' matrix `b'[1,1] = 1<\/span><br \/>\n<span style=\"color: #3366ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 }<\/span><br \/>\n<span style=\"color: 
#3366ff\">\u00a0\u00a0 }<\/span><br \/>\n<span style=\"color: #3366ff\">end<\/span><\/p>\n<p>Once again the results are so similar that it is not worth showing them. The only noticeable difference is that the mixing is worse. This time we make around 400 model switches every 5000 iterations instead of the nearly 1800 that we achieved with the first program. As a result the estimate of P(M2|Y)\u00a0is not\u00a0as good and we\u00a0ought to run the chain for longer.<\/p>\n<p>Perhaps the main conclusion is, <em>this is such a simple problem\u00a0that virtually everything that we try will work perfectly well<\/em>.<\/p>\n<p>There is one further complication to put into the algorithm and that is to introduce\u00a0a parameter transformation that will require Jacobians that\u00a0are not equal to 1. Since this posting is already quite long and because the Jacobian is the element that is most likely to cause an error, I will cover this next time.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Last time I outlined the derivation of a Metropolis-Hastings algorithm for RJMCMC and this time I want to implement it in Stata for a very simple problem. I have\u00a0taken the problem from\u00a0the paper in The American Statistician by Barker and Link that I referenced at the end of my last posting. 
We have two binomial [&hellip;]<\/p>\n","protected":false},"author":134,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[88,82],"class_list":["post-893","post","type-post","status-publish","format-standard","hentry","category-uncategorized","tag-binomial","tag-rjmcmc"],"_links":{"self":[{"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/posts\/893","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/users\/134"}],"replies":[{"embeddable":true,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/comments?post=893"}],"version-history":[{"count":14,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/posts\/893\/revisions"}],"predecessor-version":[{"id":951,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/posts\/893\/revisions\/951"}],"wp:attachment":[{"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/media?parent=893"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/categories?post=893"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/tags?post=893"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}