{"id":878,"date":"2015-10-02T08:32:24","date_gmt":"2015-10-02T08:32:24","guid":{"rendered":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/?p=878"},"modified":"2025-02-26T13:21:37","modified_gmt":"2025-02-26T13:21:37","slug":"reversible-jump-mcmc","status":"publish","type":"post","link":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/2015\/10\/02\/reversible-jump-mcmc\/","title":{"rendered":"Reversible Jump MCMC"},"content":{"rendered":"<p>I want to introduce RJMCMC and demonstrate how it can be programmed in Stata. To understand this algorithm, you really do need to know a little about the basis of Metropolis-Hastings, so this week I will run through the ideas behind the equations. A bit of algebra I am afraid, but once it is out of the way, we can return to practical applications.<\/p>\n<p><a href=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/2015\/09\/04\/google-flu-trends-part-ii\/\">A few weeks ago<\/a>\u00a0I looked at the Google Flu Trends data and fitted a Bayesian model that identifies the start and end of the annual flu epidemic. A limitation of that model is that it must have exactly one epidemic every year. 
This is unrealistic, as there is a clear suggestion that in some years we do not see a flu epidemic at all and in\u00a0other years there appear to be two distinct outbreaks.<\/p>\n<p>These patterns can be seen in the data for Austria.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-882\" src=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2015\/09\/austria-e1442936882298.png\" alt=\"austria\" width=\"600\" height=\"439\" \/><\/p>\n<p>When\u00a0we allowed\u00a0exactly one epidemic per year, the Poisson model for the number of cases per week had an expected number, mu, where,<br \/>\n<span style=\"color: #3366ff\">log(mu) = a + b time\/1000 + c sin(0.060415*week)+\u03a3 H<sub>i<\/sub>\u03b4(year==i &amp; week&gt;=S<sub>i<\/sub> &amp; week&lt;=E<sub>i<\/sub>)<\/span><br \/>\nand\u00a0(H<sub>i<\/sub>,S<sub>i<\/sub>,E<sub>i<\/sub>) represent the height, start and end of the single epidemic in year i\u00a0and we sum over the different years.<\/p>\n<p>What I want to do now is to develop a model that allows the number of epidemics\u00a0per year to vary. 
Suppose that we were to generalize so that\u00a0there could be either 0, 1 or 2 epidemics\u00a0in any\u00a0year.\u00a0We have 12 years of data from Austria, so there will be\u00a03<sup>12<\/sup>=531,441\u00a0possible models each with a unique combination of\u00a00, 1 or 2\u00a0epidemics\u00a0in the different years.<\/p>\n<p>Since we cannot be certain which of the 531,441 models will apply, we need to develop a Bayesian analysis that will select the most appropriate\u00a0model as part of the fitting process and this means that our MCMC chain must move freely between the models.\u00a0The main obstacle\u00a0to developing such an algorithm\u00a0is that, as we move\u00a0from one\u00a0model to another, so the number of parameters will change with the number of epidemics.\u00a0We will\u00a0need to develop a version of the Metropolis-Hastings algorithm that can cope with this problem.<\/p>\n<p>The\u00a0most commonly used\u00a0algorithm for this type of Bayesian model selection is called reversible jump Markov chain Monte Carlo or RJMCMC.<\/p>\n<p>This week I want to outline the theory behind\u00a0RJMCMC and then next week I&#8217;ll take a simple\u00a0example and program it in Stata. Eventually, I want to return to the Austrian flu data and see if I can produce a Stata program to fit a Bayesian model to those data. I must admit that I have not tried this yet and I suspect that selection from half-a-million possible models might prove quite a challenge.<\/p>\n<p>When we allow up\u00a0to 2 epidemics per year, the Poisson model will have three parameters that control the pattern outside of an epidemic,\u00a0namely, the intercept, the coefficient of the linear time trend and the coefficient of the sine term. On top of these, there will be a further three parameters for each epidemic (start, end and height). So, in the extreme case, if there were 2 epidemics in every one of the 12 years, we would require 3+12*2*3=75 parameters. 
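The model count and the parameter count are easy to verify. A throwaway Python calculation (the series itself works in Stata) confirms the arithmetic:

```python
# 0, 1 or 2 epidemics in each of 12 years of Austrian data
n_models = 3 ** 12
print(n_models)                # 531441 possible models

# 3 baseline parameters (intercept, time trend, sine term) plus
# (start, end, height) per epidemic, in the extreme case of
# 2 epidemics in all 12 years:
n_params = 3 + 12 * 2 * 3
print(n_params)                # 75
```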
Let\u2019s imagine a vector \u03a8 that contains all 75.<\/p>\n<p>For a particular model we might have one epidemic during most years but occasionally no epidemic or two epidemics, so often we will need some of the 75 parameters and not others. Let\u2019s suppose that for model M<sub>k<\/sub> we will\u00a0need parameters \u03b8<sub>k<\/sub> and we\u00a0will not need parameters u<sub>k<\/sub>, where \u03b8<sub>k<\/sub> and u<sub>k<\/sub> together make up \u03a8.<\/p>\n<p>Now imagine that the chain is currently in Model k\u00a0 and\u00a0we are considering\u00a0a move\u00a0to Model j,\u00a0i.e. from parameters\u00a0(\u03b8<sub>k<\/sub>,u<sub>k<\/sub>) to (\u03b8<sub>j<\/sub>,u<sub>j<\/sub>).<\/p>\n<p>Our aim is to generate a chain\u00a0in which\u00a0we visit model M<sub>k<\/sub> with the posterior probability P(M<sub>k<\/sub>|Y,\u03a8), where \u03a8 is the current value of the parameters. Averaging over the whole chain will be like integrating over \u03a8 and will mean that\u00a0the chain is\u00a0in model k with probability P(M<sub>k<\/sub>|Y).<\/p>\n<p>If the chain really does have model probabilities P(M<sub>k<\/sub>|Y,\u03a8) and it is constructed by movements between models that have\u00a0transition probabilities such as P(move M<sub>j<\/sub> to M<sub>k<\/sub>) then for the chain to stay converged we will require that,<br \/>\n<span style=\"color: #3366ff\">Sum<sub>j<\/sub> P(M<sub>j<\/sub>|Y,\u03a8) P(move M<sub>j<\/sub> to M<sub>k<\/sub>) = P(M<sub><span style=\"font-size: small\">k<\/span><\/sub>|Y,\u03a8)<\/span><br \/>\nWhich, in words, means that the\u00a0total chance\u00a0of moving into Model k must be equal to the probability of model k.<\/p>\n<p>One way to ensure that this property holds is to have <strong>detailed balance<\/strong>, which says that,\u00a0the probability of being in Model k and moving to Model j must equal the chance of being in Model j and moving to Model k.<br \/>\n<span style=\"color: #3366ff\">P( M<sub>k<\/sub>|Y, \u03a8) P(move 
M<sub>k<\/sub> to M<sub>j<\/sub>) = P( M<sub>j<\/sub>|Y, \u03a8) P(move M<sub>j<\/sub> to M<sub>k<\/sub>)<\/span><br \/>\nIf detailed balance holds then,<br \/>\n<span style=\"color: #3366ff\">Sum<sub>j<\/sub> P(M<sub>j<\/sub>|Y,\u03a8) P(move M<sub>j<\/sub> to M<sub>k<\/sub>) = Sum<sub>j<\/sub> P( M<sub>k<\/sub>|Y, \u03a8) P(move M<sub>k<\/sub> to M<sub>j<\/sub>)<\/span><br \/>\nand as the sum of the probabilities of all of the allowed moves from Model k must be one,<br \/>\n<span style=\"color: #3366ff\">Sum<sub>j<\/sub> P(M<sub>j<\/sub>|Y,\u03a8) P(move M<sub>j<\/sub> to M<sub>k<\/sub>) = P( M<sub>k<\/sub>|Y, \u03a8) Sum<sub>j<\/sub> P(move M<sub>k<\/sub> to M<sub>j<\/sub>) = P(M<sub>k<\/sub>|Y, \u03a8)<\/span><br \/>\nwhich is just what we need.<\/p>\n<p>In Metropolis-Hastings algorithms, movement between models is a two-stage process made up of a proposal and its acceptance or rejection. So detailed balance becomes,<br \/>\n<span style=\"color: #3366ff\">P( M<sub>k<\/sub>|Y, \u03a8) g(M<sub>k<\/sub> to M<sub>j<\/sub>) a(M<sub>k<\/sub> to M<sub>j<\/sub>) = P( M<sub>j<\/sub>|Y, \u03a8) g(M<sub>j<\/sub> to M<sub>k<\/sub>) a(M<sub>j<\/sub> to M<sub>k<\/sub>)<\/span><br \/>\nwhere I have used g to represent the proposal probability and a to represent the acceptance probability.<\/p>\n<p>This tells us that, at any point in the chain, the ratio of the acceptance probabilities a(M<sub>k<\/sub> to M<sub>j<\/sub>) \/ a(M<sub>j<\/sub> to M<sub>k<\/sub>) must equal<br \/>\n<span style=\"color: #3366ff\">P(M<sub>j<\/sub>|Y, \u03a8) g(M<sub>j<\/sub> to M<sub>k<\/sub>) \/ P(M<sub>k<\/sub>|Y, \u03a8) g(M<sub>k<\/sub> to M<sub>j<\/sub>)<\/span><\/p>\n<p>In essence this is the Metropolis-Hastings algorithm for movement between models. 
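The claim that detailed balance implies the stationarity condition can be checked numerically. Below is a minimal Python sketch (hypothetical model probabilities, not the flu data, and the blog itself works in Stata) that builds Metropolis-style transition probabilities for three models and verifies both detailed balance and stationarity:

```python
# Posterior model probabilities P(M_k | Y, psi) for three hypothetical models
p = [0.5, 0.3, 0.2]
n = len(p)

# Uniform proposal over models, turned into transition probabilities that
# satisfy detailed balance via the Metropolis rule:
#   T[j][k] = g * min(1, p[k] / p[j])   for j != k
g = 1.0 / n
T = [[0.0] * n for _ in range(n)]
for j in range(n):
    for k in range(n):
        if j != k:
            T[j][k] = g * min(1.0, p[k] / p[j])
    T[j][j] = 1.0 - sum(T[j])          # stay put when the move is rejected

# Detailed balance: p_j T[j][k] == p_k T[k][j] for every pair of models
balanced = all(abs(p[j] * T[j][k] - p[k] * T[k][j]) < 1e-12
               for j in range(n) for k in range(n))

# Consequently p is stationary: one step of the chain returns p
step = [sum(p[j] * T[j][k] for j in range(n)) for k in range(n)]
stationary = all(abs(step[k] - p[k]) < 1e-12 for k in range(n))
print(balanced, stationary)            # True True
```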
We calculate this ratio based on the\u00a0posterior\u00a0P(M<sub>k<\/sub>|Y, \u03a8) and our personal choice of movement probabilities g(). Then any pair of acceptance probabilities that have this ratio will create a chain that converges to the correct posterior, so such a chain will\u00a0select\u00a0the different models with frequencies that are proportional to the posterior model probabilities.<\/p>\n<p>Since we want to encourage moves in order to obtain good mixing, it makes sense to make the larger of the two acceptance probabilities equal to one. For instance, if the larger is a(M<sub>j<\/sub> to M<sub>k<\/sub>) then we set it to\u00a01 and make,<br \/>\n<span style=\"color: #3366ff\">a(M<sub>k<\/sub> to M<sub>j<\/sub>) = P(M<sub>j<\/sub>|Y, \u03a8) g(M<sub>j<\/sub> to M<sub>k<\/sub>) \/ P(M<sub>k<\/sub>|Y, \u03a8) g(M<sub>k<\/sub> to M<sub>j<\/sub>)<\/span><br \/>\nand if a(M<sub>k<\/sub> to M<sub>j<\/sub>) is the larger then we set it to 1 and make,<br \/>\n<span style=\"color: #3366ff\">a(M<sub>j<\/sub> to M<sub>k<\/sub>) = P(M<sub>k<\/sub>|Y, \u03a8) g(M<sub>k<\/sub> to M<sub>j<\/sub>) \/ P(M<sub>j<\/sub>|Y, \u03a8) g(M<sub>j<\/sub> to M<sub>k<\/sub>)<\/span><\/p>\n<p>This can be\u00a0summarised as<br \/>\n<span style=\"color: #3366ff\">a(M<sub>j<\/sub> to M<sub>k<\/sub>) = min { P( M<sub>k<\/sub>|Y, \u03a8) g(M<sub>k<\/sub> to M<sub>j<\/sub>) \/ P( M<sub>j<\/sub>|Y, \u03a8) g(M<sub>j<\/sub> to M<sub>k<\/sub>) , 1}<\/span><br \/>\nwhich is how people usually write the M-H acceptance probability.<\/p>\n<p>All we need to do now is to decide how to calculate P(M<sub>k<\/sub>|Y, \u03a8). 
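As a concrete illustration of the min{ratio, 1} rule, here is a short Python sketch (the real implementation will be in Stata, and the function name and its log-posterior inputs are hypothetical). Working on the log scale avoids overflow when the posteriors are tiny:

```python
import math
import random

def accept_move(log_post_k, log_post_j, g_j_to_k, g_k_to_j):
    """Metropolis-Hastings acceptance for a proposed move M_j -> M_k.

    a(M_j to M_k) = min{ P(M_k|Y,psi) g(M_k to M_j)
                         / [ P(M_j|Y,psi) g(M_j to M_k) ], 1 }
    Posteriors are passed on the log scale for numerical stability.
    """
    log_ratio = (log_post_k + math.log(g_k_to_j)) \
                - (log_post_j + math.log(g_j_to_k))
    a = math.exp(min(log_ratio, 0.0))   # capping at 0 gives min{ratio, 1}
    return random.random() < a

# When model k is better supported and the proposal is symmetric,
# the ratio exceeds one and the move is always accepted:
print(accept_move(log_post_k=-10.0, log_post_j=-20.0,
                  g_j_to_k=0.5, g_k_to_j=0.5))   # True
```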
Bayes&#8217; theorem tells us that<br \/>\n<span style=\"color: #3366ff\">P(M<sub>k<\/sub>|Y,\u03a8) \u221d P(Y|\u03a8,M<sub>k<\/sub>) P(M<sub>k<\/sub>|\u03a8) \u221d P(Y|\u03a8,M<sub>k<\/sub>) P(\u03a8|M<sub>k<\/sub>) P(M<sub>k<\/sub>)<\/span><\/p>\n<p>So<br \/>\n<span style=\"color: #3366ff\">a(M<sub>j<\/sub> to M<sub>k<\/sub>) = min { P(Y|\u03a8,M<sub>k<\/sub>) P(\u03a8|M<sub>k<\/sub>) P(M<sub>k<\/sub>) g(M<sub>k<\/sub> to M<sub>j<\/sub>) \/ P(Y|\u03a8,M<sub>j<\/sub>) P(\u03a8|M<sub>j<\/sub>) P(M<sub>j<\/sub>) g(M<sub>j<\/sub> to M<sub>k<\/sub>) , 1}<\/span><\/p>\n<p>This looks like a standard MH algorithm. So where is the problem? In fact there\u00a0are three hidden issues.<\/p>\n<p>(a) we do not use \u03a8 in our model specification but a subset of \u03a8. Remember that \u03a8=(\u03b8<sub>k<\/sub>,u<sub>k<\/sub>) and the split will not be the same for M<sub>k<\/sub> and M<sub>j<\/sub>.<\/p>\n<p>(b) the algebraic forms of the models M<sub>k<\/sub> and M<sub>j<\/sub> may be different (e.g. a gamma and a log-normal), in which case the constants in the formulae may not cancel as they do\u00a0in most MH calculations.<\/p>\n<p>(c) Bayes&#8217; theorem applies to probabilities, P, but our models are usually expressed as probability densities, f. Since the area under f defines probability, you might say that P(\u03a8|M<sub>k<\/sub>) = f(\u03a8|M<sub>k<\/sub>) d\u03a8 where d\u03a8 defines a tiny region with the same dimension as \u03a8. 
So the acceptance probabilities ought to be written as<\/p>\n<p><span style=\"color: #3366ff\">a(M<sub>j<\/sub> to M<sub>k<\/sub>) = min { f(Y|\u03a8,M<sub>k<\/sub>)dY f(\u03a8|M<sub>k<\/sub>) d\u03a8 P(M<sub>k<\/sub>) g(M<sub>k<\/sub> to M<sub>j<\/sub>) \/ f(Y|\u03a8,M<sub>j<\/sub>)dY f(\u03a8|M<sub>j<\/sub>) d\u03a8 P(M<sub>j<\/sub>) g(M<sub>j<\/sub> to M<sub>k<\/sub>) , 1}<\/span><\/p>\n<p>Notice that there is no small increment associated with M<sub>k<\/sub> because the models are discrete and are described directly by probabilities, and the\u00a0delta for Y will cancel as the data are the same whichever model we use.<\/p>\n<p>The reason that we do not usually bother with the d\u03a8 is that, in many circumstances, it too cancels. Unfortunately, we have to be very careful over this cancellation when the parameterizations of the two models are different.\u00a0In that\u00a0case, the ratio of the deltas\u00a0becomes the Jacobian of the transformation from one set of parameters to the other.<\/p>\n<p>We will sort out the specifics of this calculation\u00a0when we consider particular examples.<\/p>\n<p>One question that we have not yet addressed is how we choose the moves; a\u00a0choice that is equivalent to defining the function g(). As with any MH algorithm this choice is arbitrary provided that a possible\u00a0move from M<sub>k<\/sub> to M<sub>j<\/sub> is matched by a possible move from M<sub>j<\/sub> to M<sub>k<\/sub> and we do not allow cul-de-sacs from which it is impossible to reach some of the other models. So the algorithm will work for a more or less\u00a0arbitrary\u00a0g() but some choices will lead to very poorly mixing chains in which we stay with the same model for a long time before we switch, while other types of proposals will lead to frequent moves and better mixing. 
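To make the matching of forward and reverse moves concrete, here is a hypothetical Python sketch of one simple g() for the epidemic setting: pick a year at random and propose changing that year's epidemic count by one. It returns both g(current to proposal) and g(proposal to current), which feed into the acceptance ratio. (This is only an illustrative sketch; the Stata version comes later in the series.)

```python
import random

N_YEARS = 12
MAX_EPIDEMICS = 2

def propose_local_move(n_epidemics):
    """Propose a neighbouring model: change one year's count by +/-1.

    Returns the proposed state with the forward and reverse proposal
    probabilities, g(current -> proposal) and g(proposal -> current).
    """
    year = random.randrange(N_YEARS)
    current = n_epidemics[year]
    # Steps allowed from this year's count (0 can only gain, 2 can only lose)
    steps = [s for s in (-1, +1) if 0 <= current + s <= MAX_EPIDEMICS]
    step = random.choice(steps)
    proposal = list(n_epidemics)
    proposal[year] = current + step
    # Probability of choosing this year and this step...
    g_forward = (1 / N_YEARS) * (1 / len(steps))
    # ...and of proposing the reverse step from the proposed state
    reverse_steps = [s for s in (-1, +1)
                     if 0 <= proposal[year] + s <= MAX_EPIDEMICS]
    g_reverse = (1 / N_YEARS) * (1 / len(reverse_steps))
    return proposal, g_forward, g_reverse

state = [1] * N_YEARS                      # one epidemic in every year
new_state, g_fwd, g_rev = propose_local_move(state)
print(sum(abs(a - b) for a, b in zip(state, new_state)))  # 1
```

Note that g_fwd and g_rev differ here (a year with one epidemic has two possible moves, a year with zero or two has only one), which is exactly why the g() terms must be kept in the acceptance ratio.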
Part of the skill of using RJMCMC is to come up with an efficient\u00a0strategy for proposing moves between models.<\/p>\n<p>In\u00a0the case of the epidemic models,\u00a0we could have a scheme that proposes a\u00a0move from M<sub>k<\/sub> to\u00a0a random choice among the\u00a0531,440\u00a0possible alternative models,\u00a0but such an algorithm would be incredibly difficult to program and would lead to a lot of rejections, so it would be hugely inefficient. An algorithm in which we make local moves is likely to perform much better. Perhaps, at any stage in the chain, we could select a year at random and either propose that we add\u00a0one epidemic or drop\u00a0one epidemic. Once again this is best illustrated in the context of a specific example.<\/p>\n<p>So next week I will take an example, fill in the details and then program the algorithm in Stata.<\/p>\n<p>I&#8217;ll finish this week with a reference. The original\u00a0idea of\u00a0RJMCMC was developed by Peter Green in 1995 (<em>Biometrika<\/em>, <em>82<\/em>(4), 711-732). That paper is surprisingly readable for <em>Biometrika<\/em> but\u00a0it is probably\u00a0not the best introduction to the topic.\u00a0My suggestion is\u00a0to\u00a0start with\u00a0the\u00a0simplified\u00a0approach described\u00a0in,<\/p>\n<p>Barker, R. J., &amp; Link, W. A. (2013).<br \/>\nBayesian multimodel inference by RJMCMC: a Gibbs sampling approach.<br \/>\n<em>The American Statistician<\/em>, <em>67<\/em>(3), 150-156.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I want to introduce RJMCMC and demonstrate how it can be programmed in Stata. To understand this algorithm, you really do need to know a little about the basis of Metropolis-Hastings, so this week I will run through the ideas behind the equations. 
A bit of algebra I am afraid, but once it is out [&hellip;]<\/p>\n","protected":false},"author":134,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[85,80,40,84,82],"class_list":["post-878","post","type-post","status-publish","format-standard","hentry","category-uncategorized","tag-detailed-balance","tag-google-flu-trends","tag-metropolis-hastings","tag-reversible-jump-mcmc","tag-rjmcmc"],"_links":{"self":[{"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/posts\/878","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/users\/134"}],"replies":[{"embeddable":true,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/comments?post=878"}],"version-history":[{"count":17,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/posts\/878\/revisions"}],"predecessor-version":[{"id":913,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/posts\/878\/revisions\/913"}],"wp:attachment":[{"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/media?parent=878"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/categories?post=878"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/tags?post=878"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}