Reversible Jump MCMC III
15 October 2015

In the previous posting in this series on RJMCMC (https://staffblogs.le.ac.uk/bayeswithstata/2015/10/09/reversible-jump-mcmc-ii/), I created a Stata program for the following problem.

We have two binomial samples, y1=8 successes out of n1=20 trials and y2=16 out of n2=30, but we are not sure whether to model them as

M1: binomials with different probabilities π1 and π2
M2: binomials with a common probability ξ

For my priors I chose p(M1)=0.3 and p(M2)=0.7. Under M1 my priors on the parameters were p(π1|M1)=Beta(5,5) and p(π2|M1)=Beta(10,10), and under M2, p(ξ|M2)=Beta(20,20).

In the previous programs we imagined a vector of parameters Ψ with two elements (Ψ1,Ψ2). When we selected model M1 we simply set π1=Ψ1 and π2=Ψ2, while in model M2 we set ξ=Ψ1 and created an arbitrarily distributed, unused parameter u=Ψ2.
I then set up a Metropolis-Hastings algorithm to control switches between the models, and everything worked well. It turned out that the posterior probability of M2 is about 0.77, only slightly higher than our prior model probability, so the data move us slightly in the direction of model 2. When in model 1, the posterior means of the binomial probabilities are π1=0.433 and π2=0.520, and under model 2 the posterior mean is ξ=0.489.

This algorithm represents a rather special case of RJMCMC because it does not involve any transformations when we move between Ψ and the particular parameters of the two models; or, if you prefer, there is a transformation but it takes the form of a simple equality. The result is that the Jacobians of the transformations are 1 and we can leave them out of the calculations.

What I want to do now is to introduce a transformation into the algorithm so that we are forced to calculate the Jacobians.

First, let me present again the Stata program for RJMCMC without transformation.
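Before doing so, it is worth checking the figures quoted above. Because the priors are conjugate, the marginal likelihood of each model is available in closed form (a beta-binomial), so the exact posterior model probability can be computed directly. A quick cross-check in Python rather than Stata; all the numbers come from the problem as stated:

```python
from math import comb, exp, lgamma, log

def log_beta(a, b):
    # log of the beta function B(a, b)
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def log_betabinom(y, n, a, b):
    # log marginal likelihood of y successes in n trials when the
    # success probability has a Beta(a, b) prior (a beta-binomial mass)
    return log(comb(n, y)) + log_beta(a + y, b + n - y) - log_beta(a, b)

# M1: separate probabilities with Beta(5,5) and Beta(10,10) priors
log_m1 = log_betabinom(8, 20, 5, 5) + log_betabinom(16, 30, 10, 10)

# M2: one common probability with a Beta(20,20) prior; the two samples
# pool to 8+16=24 successes and 12+14=26 failures
log_m2 = (log(comb(20, 8)) + log(comb(30, 16))
          + log_beta(20 + 24, 20 + 26) - log_beta(20, 20))

# combine with the prior model probabilities p(M1)=0.3, p(M2)=0.7
post_m2 = 1 / (1 + (0.3 / 0.7) * exp(log_m1 - log_m2))
print(f"P(M2 | data) = {post_m2:.3f}")   # just under 0.77
```

This exact value, just under 0.77, is the target that the sampler below should recover.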
This is effectively the same as the program that we had before, except that I have expanded the code slightly with a few redundant calculations, to prepare for the generalization needed when we incorporate transformations.

```stata
program myRJMCMC
   args logpost b ipar

   local M = `b'[1,1]
   if( runiform() < 0.5 ) {
      * update within the current model
      if( `M' == 1 ) {
         local pi1 = rbeta(13,17)
         local pi2 = rbeta(26,24)
         local psi1 = `pi1'
         local psi2 = `pi2'
      }
      else {
         local xi = rbeta(44,46)
         local u  = rbeta(1,1)
         local psi1 = `xi'
         local psi2 = `u'
      }
      matrix `b'[1,2] = `psi1'
      matrix `b'[1,3] = `psi2'
   }
   else {
      * try to switch
      local psi1 = `b'[1,2]
      local psi2 = `b'[1,3]
      local pi1 = `psi1'
      local pi2 = `psi2'
      local xi  = `psi1'
      local u   = `psi2'
      local a1 = log(binomialp(20,8,`pi1'))+log(binomialp(30,16,`pi2'))+ ///
         log(betaden(5,5,`pi1'))+log(betaden(10,10,`pi2'))+log(0.3)
      local a2 = log(binomialp(20,8,`xi'))+log(binomialp(30,16,`xi'))+ ///
         log(betaden(20,20,`xi'))+log(betaden(1,1,`u'))+log(0.7)
      if `M' == 1 {
         if log(runiform()) < `a2' - `a1' matrix `b'[1,1] = 2
      }
      else {
         if log(runiform()) < `a1' - `a2' matrix `b'[1,1] = 1
      }
   }
end

matrix b = (1, 0.5, 0.5)
mcmcrun logpost b using temp.csv , replace ///
   param(m psi1 psi2) burn(500) update(20000) thin(2) ///
   samplers( myRJMCMC , dim(3) )

insheet using temp.csv, clear
gen pi1 = psi1
gen pi2 = psi2
mcmcstats pi1 pi2 if m == 1
gen xi = psi1
gen u = psi2
mcmcstats xi u if m == 2
tabulate m
```

The main features of the program are:

(a) randomly, with probability 0.5, I decide either to update the parameters within the current model or to propose a move between models;

(b) the within-model updates can use Gibbs sampling because we have chosen conjugate priors: the full conditionals are Beta(5+8, 5+12)=Beta(13,17) and Beta(10+16, 10+14)=Beta(26,24) under M1, and Beta(20+24, 20+26)=Beta(44,46) under M2;

(c) the MH algorithm for switching between models calculates the acceptance probability from the likelihood of the data under that model, the prior on the parameters under that model, the prior model probability and the probability of selecting that move. Calculations are performed on a log scale to avoid loss of precision due to very small quantities;

(d) the chance of proposing a move to M2 when we are in M1 is the same as the chance of proposing a move to M1 when we are in M2 (i.e. 0.5), so these probabilities cancel and I have left them out of the acceptance probabilities;

(e) the distribution of M2's unused parameter, u, is arbitrary. Here it is Beta(1,1), which is a very poor choice given that under the other model the equivalent parameter has full conditional Beta(26,24). This will have a small impact on mixing, but the algorithm will recover and give the correct posterior.

Now let us introduce a transformation into the algorithm. Under model 2, ξ is a common binomial parameter that replaces the separate binomial probabilities π1 and π2, so a natural idea would be to make ξ equal to the average of π1 and π2. We could write this as:

Under M1
π1 = Ψ1
π2 = Ψ2

Under M2
ξ = (Ψ1+Ψ2)/2
u = Ψ2

A word of warning before we continue. The paper in The American Statistician from which I took this problem uses a very similar transformation; in fact, it uses a weighted average instead of a simple average. However, as we will see, transformations based on any sort of average cause problems, and the algorithm breaks down under extreme conditions.
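Written out as code, the proposed move between parameterizations is just a linear map, so both directions and the Jacobian determinant fit in a few lines. A Python sketch (the function names are mine, purely for illustration):

```python
def to_model2(psi1, psi2):
    # M2 parameters from the underlying vector Psi:
    # xi is the plain average of the two elements, u carries the second
    xi = (psi1 + psi2) / 2
    u = psi2
    return xi, u

def from_model2(xi, u):
    # inverse map: Psi1 = 2*xi - u, Psi2 = u
    return 2 * xi - u, u

# the map is linear, so its Jacobian matrix is constant:
J = [[0.5, 0.5],
     [0.0, 1.0]]
det = J[0][0] * J[1][1] - J[0][1] * J[1][0]   # 0.5

# the round trip recovers Psi (up to floating point)
xi, u = to_model2(0.433, 0.520)
psi1, psi2 = from_model2(xi, u)
assert abs(psi1 - 0.433) < 1e-12 and psi2 == 0.520
```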
So this is a nice illustration of RJMCMC, partly because it fails and we can discover why it fails. Perhaps you can already see from the formulae for the transformations why we are going to run into difficulties.

A couple of weeks ago (https://staffblogs.le.ac.uk/bayeswithstata/2015/10/02/reversible-jump-mcmc/) we derived the formula for the MH acceptance probability as

f(Y|Ψ,Mk) f(Ψ|Mk) dΨ P(Mk) g(Mk → Mj) / f(Y|Ψ,Mj) f(Ψ|Mj) dΨ P(Mj) g(Mj → Mk)

Each product in the ratio includes a likelihood, a prior on the parameters, a prior on the model and the probability of selecting that move. The extra term dΨ is present to turn the density of the prior into a probability.

We are going to work in terms of π1 and π2, or ξ and u, so we will need to transform the dΨ's into the small elements dπ1 and dπ2, or dξ and du. This means multiplying by the Jacobian formed from derivatives such as dπ1/dΨ1.

In our program we calculate the MH acceptance probabilities with the lines

```stata
local a1 = log(binomialp(20,8,`pi1'))+log(binomialp(30,16,`pi2'))+ ///
   log(betaden(5,5,`pi1'))+log(betaden(10,10,`pi2'))+log(0.3)
local a2 = log(binomialp(20,8,`xi'))+log(binomialp(30,16,`xi'))+ ///
   log(betaden(20,20,`xi'))+log(betaden(1,1,`u'))+log(0.7)
```

The initial transformations both had a Jacobian of 1 because they were based on equalities with derivatives of 1 and, since log(1) is zero, I did not bother to include the Jacobians in the code.

The new transformation for M2 has the form

ξ = (Ψ1+Ψ2)/2
u = Ψ2

so the matrix of derivatives (dξ/dΨ1, dξ/dΨ2; du/dΨ1, du/dΨ2) is

0.5   0.5
0     1

which has determinant 0.5. This is our Jacobian; it says that small increments on the dξ scale are half the size of the corresponding increments on the dΨ1 scale.

Noting that our transformation implies Ψ1 = 2ξ - u, we can modify the Stata code. The changed lines are marked with a trailing // changed comment.

```stata
program myRJMCMC
   args logpost b ipar

   local M = `b'[1,1]
   if( runiform() < 0.5 ) {
      * update within the current model
      if( `M' == 1 ) {
         local pi1 = rbeta(13,17)
         local pi2 = rbeta(26,24)
         local psi1 = `pi1'
         local psi2 = `pi2'
      }
      else {
         local xi = rbeta(44,46)
         local u  = rbeta(1,1)
         local psi1 = 2*`xi'-`u'                // changed
         local psi2 = `u'
      }
      matrix `b'[1,2] = `psi1'
      matrix `b'[1,3] = `psi2'
   }
   else {
      * try to switch
      local psi1 = `b'[1,2]
      local psi2 = `b'[1,3]
      local pi1 = `psi1'
      local pi2 = `psi2'
      local xi  = (`psi1'+`psi2')/2             // changed
      local u   = `psi2'
      local a1 = log(binomialp(20,8,`pi1'))+log(binomialp(30,16,`pi2'))+ ///
         log(betaden(5,5,`pi1'))+log(betaden(10,10,`pi2'))+log(0.3)
      local a2 = log(binomialp(20,8,`xi'))+log(binomialp(30,16,`xi'))+ ///
         log(betaden(20,20,`xi'))+log(betaden(1,1,`u'))+log(0.7)+log(0.5)   // changed
      if `M' == 1 {
         if log(runiform()) < `a2' - `a1' matrix `b'[1,1] = 2
      }
      else {
         if log(runiform()) < `a1' - `a2' matrix `b'[1,1] = 1
      }
   }
end
```

As I warned earlier, this is almost correct but not quite: it fails to give the correct model posterior because it spends too much time in model 1.

To illustrate the breakdown of the algorithm, I ran this program with different priors for u: Beta(25,25), Beta(10,10), Beta(5,5), Beta(3,3), Beta(2,2) and Beta(1,1). Each time I ran the algorithm for 10,000 iterations and repeated the analysis 25 times. The number of iterations is rather on the low side, so there is quite a bit of sampling error between the 25 repeat runs.
Here is a boxplot of the estimates of the posterior probability of model 2, plotted on a percentage scale.

[Figure: boxplots of the estimated posterior probability of M2 (%) for each prior on u; https://staffblogs.le.ac.uk/bayeswithstata/files/2015/10/postm2byu-e1444139961358.png]

Our previous analysis showed that the correct answer is just under 77%. So we get reasonable, but not perfect, estimates when we use Beta(25,25) as our prior for u, but the estimates deteriorate markedly when we use Beta(1,1). Yet the theory says that the prior on u is arbitrary.

Beta(1,1) is equivalent to a uniform distribution, so occasionally under model M2 we might generate a value such as u=0.95. Suppose that at the same time we generate ξ=0.45; then we will calculate Ψ1 = 2ξ - u = -0.05, and when we propose a move to model M1 we will set π1 = -0.05. Now we are in trouble, because π1 is a probability and must lie between 0 and 1. (Worse, in Stata the log prior density of a negative π1 evaluates to missing and, since missing values compare greater than any number, the proposed switch is accepted when it should be rejected; this is why the chain spends too long in model 1.)

Our choice of making ξ the average would be a sensible transformation if there were no constraints on the parameters, but when the parameters must all lie in (0,1) it will sometimes fail.
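How often the breakdown occurs can be estimated by simulation: draw ξ from its Beta(44,46) full conditional and u from each candidate prior, and count how often Ψ1 = 2ξ - u falls outside (0,1). A Monte Carlo sketch in Python (invalid_rate is my name, not part of the Stata program):

```python
import random

random.seed(2015)

def invalid_rate(a, n=50_000):
    # fraction of draws for which Psi1 = 2*xi - u falls outside (0,1),
    # with xi ~ Beta(44,46) (its full conditional) and u ~ Beta(a,a)
    bad = 0
    for _ in range(n):
        xi = random.betavariate(44, 46)
        u = random.betavariate(a, a)
        if not 0 < 2 * xi - u < 1:
            bad += 1
    return bad / n

for a in (25, 10, 5, 3, 2, 1):
    # the rate rises from almost zero to several percent as the prior widens
    print(f"u ~ Beta({a},{a}): invalid proposals {invalid_rate(a):.2%}")
```

With a Beta(25,25) prior the invalid proposals are vanishingly rare, while under Beta(1,1) several percent of the proposed switches are broken, matching the pattern in the boxplot above.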
Now, when we use Beta(25,25) this problem does not arise very often, because there is such a small probability of picking an extreme value for u, and so we stay in the region where both elements of Ψ lie between 0 and 1.

It is more complicated, but much better, to use the transformation

Under M1
π1 = invlogit(Ψ1)
π2 = invlogit(Ψ2)

Under M2
ξ = invlogit( (Ψ1+Ψ2)/2 )
u = invlogit(Ψ2)

Here the averaging is done on the logit scale: we average Ψ1 and Ψ2 and then apply invlogit, so the inverse, Ψ1 = 2 logit(ξ) - logit(u), is defined for every valid ξ and u, which is exactly where the probability-scale average failed. In this set-up the elements of Ψ are unconstrained and can take any value between -∞ and ∞ and, whatever those values, they always transform to π1 and π2, or ξ and u, that lie between 0 and 1. As before, we need derivatives such as dξ/dΨ1 and dξ/dΨ2 although, in this case, the algebra is marginally easier if we calculate dΨ1/dξ and then use the relationship dξ/dΨ1 = 1/(dΨ1/dξ).

Let's start with M1 and invert the transformation.

Ψ1 = logit(π1) = log( π1/(1-π1) )
Ψ2 = logit(π2) = log( π2/(1-π2) )

So dΨ1/dπ1 = 1/[π1(1-π1)] and, inverting this formula, we obtain dπ1/dΨ1 = π1(1-π1), which means that the Jacobian matrix is

π1(1-π1)      0
0             π2(1-π2)

and the determinant is π1(1-π1)π2(1-π2).

Similar calculations show that the Jacobian for model 2 is ξ(1-ξ)u(1-u)/2.

Now we can write a much better RJMCMC program. Again the changed lines carry a trailing // changed comment.

```stata
program myRJMCMC
   args logpost b ipar

   local M = `b'[1,1]
   if( runiform() < 0.5 ) {
      * update within the current model
      if( `M' == 1 ) {
         local pi1 = rbeta(13,17)
         local pi2 = rbeta(26,24)
         local psi1 = logit(`pi1')                  // changed
         local psi2 = logit(`pi2')                  // changed
      }
      else {
         local xi = rbeta(44,46)
         local u  = rbeta(1,1)
         local psi1 = 2*logit(`xi')-logit(`u')      // changed
         local psi2 = logit(`u')                    // changed
      }
      matrix `b'[1,2] = `psi1'
      matrix `b'[1,3] = `psi2'
   }
   else {
      * try to switch
      local psi1 = `b'[1,2]
      local psi2 = `b'[1,3]
      local pi1 = invlogit(`psi1')                  // changed
      local pi2 = invlogit(`psi2')                  // changed
      local xi  = invlogit((`psi1'+`psi2')/2)       // changed
      local u   = invlogit(`psi2')                  // changed
      local J1 = `pi1'*(1-`pi1')*`pi2'*(1-`pi2')    // changed
      local J2 = `xi'*(1-`xi')*`u'*(1-`u')/2        // changed
      local a1 = log(binomialp(20,8,`pi1'))+log(binomialp(30,16,`pi2'))+ ///
         log(betaden(5,5,`pi1'))+log(betaden(10,10,`pi2'))+log(0.3)+log(`J1')   // changed
      local a2 = log(binomialp(20,8,`xi'))+log(binomialp(30,16,`xi'))+ ///
         log(betaden(20,20,`xi'))+log(betaden(1,1,`u'))+log(0.7)+log(`J2')      // changed
      if `M' == 1 {
         if log(runiform()) < `a2' - `a1' matrix `b'[1,1] = 2
      }
      else {
         if log(runiform()) < `a1' - `a2' matrix `b'[1,1] = 1
      }
   }
end
```

I ran this program for the same range of priors on u and obtained these results.

[Figure: boxplots of the estimated posterior probability of M2 (%) for each prior on u, logit version; https://staffblogs.le.ac.uk/bayeswithstata/files/2015/10/rjmcmcbinomialu2-e1444224584833.png]

It is clear that the mean estimate no longer depends on the prior for u, but the variation between repeat runs is larger when we use Beta(1,1) because we make fewer moves between models.

Here is a boxplot of the numbers of model switches per run of 10,000 iterations. Beta(10,10) looks best.

[Figure: boxplots of the number of model switches per 10,000 iterations, by prior on u; https://staffblogs.le.ac.uk/bayeswithstata/files/2015/10/modelswitches-e1444224817920.png]

Before finishing, I should point out that Peter Green's original version of RJMCMC is very slightly different from the formulation that I have presented. Suppose that we have a large set of alternative models that can require anything between 1 and 20 parameters; we are currently in a model that requires 3 parameters and we propose a move to a model with 5 parameters. We do not need to worry about the 15 parameters that neither model requires. In other words, instead of defining a full set of parameters Ψ, we can work with the reduced set that covers the two models that we are currently comparing.
This is more efficient, but it requires slightly more complicated programming. In my opinion it is both easier to follow and easier to code if we always return to the full set of parameters.

I think that we have done this example to death. Now we need to move on to more complex applications, such as the flu epidemic analysis that started us on the RJMCMC path.