**Poisson regression with two random effects: MCMC by data augmentation**
27 March 2015

When I restarted this blog after the Christmas break I resolved to concentrate more on straightforward applications of Bayesian analysis with Stata; I believe that is what most Stata users are interested in. A couple of months in, I am going to ignore my good intentions and discuss a topic that I considered too advanced to mention in my book, namely MCMC by data augmentation.

The idea is simple even if its application is sometimes involved: we look to see whether there are any extra data that, if only we had them, would make the analysis much simpler. Then we treat those data as missing or unobserved and add an extra stage to the analysis. In stage 1 we simulate the unobserved data, and in stage 2 we use the simulated values of the unobserved data to solve the problem.
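The two stages can be sketched on a toy problem of my own (not the epilepsy model of this post): exponential survival times with unknown rate θ, some right-censored at c. If the censored times were observed, a Gamma prior on θ would be conjugate and drawing θ would be trivial, so stage 1 imputes the censored tails and stage 2 draws θ from the conjugate posterior. All names here are illustrative, and the sketch is in Python rather than Stata.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: exponential times with rate theta = 2, right-censored at c.
true_theta = 2.0
n, c = 200, 0.5
t = rng.exponential(1 / true_theta, n)
censored = t > c                 # for these we only know t > c
t_obs = np.minimum(t, c)

a, b = 0.01, 0.01                # vague Gamma(a, b) prior (shape, rate)
theta = 1.0
draws = []
for it in range(2000):
    # Stage 1: impute the unobserved tails; by lack of memory a
    # censored time is c plus a fresh exponential interval
    t_full = t_obs.copy()
    t_full[censored] = c + rng.exponential(1 / theta, censored.sum())
    # Stage 2: conjugate Gamma update given the completed data
    theta = rng.gamma(a + n, 1 / (b + t_full.sum()))
    draws.append(theta)

post_mean = np.mean(draws[500:])   # discard a burn-in
```

With the data simulated at θ = 2, the posterior mean of the retained draws settles near 2; each stage is a cheap standard draw, which is the whole point of the augmentation.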
The calculation involves two stages instead of one, but if the two stages are both quick then we might still have a very efficient algorithm.

Let's put this method into symbols: y is the original data, z is the additional unobserved data, and ultimately we want the posterior p(θ|y):

p(θ|y) = ∫ p(θ|y,z) p(z|y) dz

Sampling θ directly from p(θ|y) may be hard, but if it is easy to sample the unobserved data z from p(z|θ,y) and easy to sample θ from p(θ|y,z), then we are on to a winner.

The addition of unobserved data is usually called data augmentation, and whether or not it works well is problem dependent. Sometimes we can spot a good augmentation and sometimes we can't.

Data augmentation for generalized linear models with random effects has been well studied, and there are some quite neat augmentation algorithms for those models. In fact, data augmentation is the basis of the JAGS glm module that so dramatically improved the performance of JAGS for our Poisson regression problem.

**The theory**
There is a well-known result concerning Poisson distributions that we will need: if events occur at random with a fixed rate, then the number of events in a specified time will follow a Poisson distribution and the lengths of the intervals between successive events will follow an exponential distribution.

With the epilepsy data we know the number of seizures a subject has in a 2-week period, but we do not have the exact times of the seizures. Those exact times will be our unobserved data, and the unobserved times between the seizures will have an exponential distribution.

Actually it turns out that it is easier to work with the log of the times between the seizures.
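As a quick numerical check of the Poisson/exponential result above (an illustrative NumPy sketch, not part of the post's Stata code): simulate gaps that are exponential with mean 1/μ, accumulate them into event times, and count events per unit-length period; the counts should have mean and variance both close to μ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Events at rate mu per period: gaps exponential with mean 1/mu,
# counts per unit period Poisson(mu).
mu = 4.0
gaps = rng.exponential(1 / mu, 200_000)
times = np.cumsum(gaps)

n_periods = int(times[-1])                   # complete periods covered
counts = np.bincount(times[times < n_periods].astype(int),
                     minlength=n_periods)

mean_count, var_count = counts.mean(), counts.var()
# mean_count and var_count should both be close to mu = 4,
# and gaps.mean() close to 1/mu = 0.25
```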
So we will need the distribution of the log of an exponential variable.

**Approximating the density**
Using the change of variables method discussed in section 2.9 of *Bayesian Analysis with Stata*, it is not difficult to show that if t ~ exponential(λ), so that its density is p(t) = exp(-t/λ)/λ, then the density of x = log(t/λ) is p(x) = exp(x - exp(x)). As is my usual practice, I have put the mathematics in a pdf called [augmentation](https://staffblogs.le.ac.uk/bayeswithstata/files/2015/03/augmentation.pdf) for those who want the details.

So we will need the likelihood of the distribution p(x) = exp(x - exp(x)). One way to get this is to approximate it by a mixture of normal distributions: individual normal distributions are easy to handle, and mixtures of normal distributions are not much more complicated. Here is some Stata code that plots the density exp(x - exp(x)) together with a 5-component normal mixture that approximates it quite closely.

```stata
clear
range x -10 5 1000
gen f = exp(x - exp(x))
gen y = 0.2924*normalden(x, 0.0982, 0.4900)  ///
      + 0.2599*normalden(x, -1.5320, 1.0896) ///
      + 0.2480*normalden(x, -0.7433, 0.6150) ///
      + 0.1525*normalden(x, 0.8303, 0.4382)  ///
      + 0.0472*normalden(x, -3.1428, 1.7993)
twoway (line f x, col(red)) (line y x, col(blue)), leg(off)
```

![logexpapprox](https://staffblogs.le.ac.uk/bayeswithstata/files/2015/03/logexpapprox.png)

The actual distribution is shown in red and the normal mixture in blue. The approximation is really very impressive and plenty good enough for an MCMC algorithm.

The constants in the mixture distribution are taken from a technical report by Sylvia Fruhwirth-Schnatter and Helga Wagner (http://www.jku.at/ifas/content/e108280/e146255/e146306/ifasrp04_2004.pdf), the researchers who developed this method.

**Simulating the unobserved data**
The next step is to consider how we might simulate the unobserved times of the seizures conditional on knowing how many there were (y) and the current estimate of the rate (μ).

Let's assume that μ is the expected number of events in a 2-week period and that we express the times as proportions of that period, i.e. as numbers in (0,1). For illustration we will consider someone who has three seizures, y = 3.

Since the events in a Poisson process are completely random, we could generate the times of the three seizures by taking three uniform(0,1) random numbers, e.g.
0.451, 0.325, 0.772
Sorting them in ascending order gives 0.325, 0.451, 0.772, and the intervals between them are
0.325, 0.126, 0.321
The interval up to the end of the period is 0.228, but at that time we have not yet seen the next seizure, so the full interval will be at least 0.228.

We can generate exponential intervals using -log(u)/μ, where u is another uniform(0,1) variable, and completely random processes have no memory, so a random exponential interval that is at least 0.228 long can be simulated using 0.228 - log(u)/μ. Let's suppose that we end up with four intervals
0.325, 0.126, 0.321, 0.630
These are the simulated values of the augmented or unobserved data, which I will call t.

Let's demonstrate that log t + log μ, with the intervals t sampled in this way, does follow the distribution p(ε) = exp(ε - exp(ε)).

```stata
clear
tempname pf
postfile `pf' t using temp.dta, replace
local mu = 5
local theta = log(`mu')
forvalues iter = 1/5000 {
    local n = rpoisson(`mu')
    drop _all
    if `n' == 0 {
        local t = 1 - log(runiform())/exp(`theta')
        local t = log(`t') + `theta'
        post `pf' (`t')
    }
    else {
        local m = `n' + 1
        qui set obs `m'
        qui gen u = 0 in 1
        qui replace u = runiform() in 2/`m'
        sort u
        qui gen t = u[_n+1] - u
        qui replace t = 1 - u[`m'] - log(runiform())/exp(`theta') in `m'
        qui replace t = log(t) + `theta'
        forvalues i = 1/`m' {
            post `pf' (t[`i'])
        }
    }
}
postclose `pf'
use temp.dta, clear

histogram t, density addplot(function y = exp(x - exp(x)), range(-5 2) lwid(0.5) lcol(red)) leg(off)
```

Again there is very close agreement between theory and simulation.

![hostlogt](https://staffblogs.le.ac.uk/bayeswithstata/files/2015/03/hostlogt.png)

**Selecting one component from the mixture**
So far we have
seen that for Poisson data with rate μ, the intervals between events will be exponential with mean λ = 1/μ, and
log t = log λ + ε = -log μ + ε
where ε is a variable from the distribution p(ε) = exp(ε - exp(ε)), which we can approximate with a mixture of 5 normal distributions.

One more step. The normal mixture is made up of 5 normal distributions, and life would be much simpler if we knew which of the five components our particular ε came from, so let's make that information part of the unobserved data. Call it r, an integer between 1 and 5. Simulate r, and then
log t = -log μ + e
where e comes from the rth normal distribution in the mixture.

Given all of the unobserved data, the model is just a simple normal-errors regression, and provided that we choose conjugate priors we can solve it very easily with a Gibbs sampler. So our problem is solved in three stages:
1. Simulate the intervals between the events, t
2. Simulate the number of the component from the mixture, r
3.
Fit the normal-errors model by Gibbs sampling

The three stages are all quick and should mix well, potentially much better than running a Metropolis-Hastings algorithm for the original problem.

There are a couple of points that I have jumped over, such as how to generate r and how to write the Gibbs sampler, but those too are simple; the first is described in the pdf [augmentation](https://staffblogs.le.ac.uk/bayeswithstata/files/2015/03/augmentation.pdf), and the second will be the subject of my next posting.

Enough for one week; after Easter I will program this algorithm in Mata and see how it performs when we analyse the epilepsy data.
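As a coda, here is a sketch (my own, in Python rather than Stata, with illustrative function names) of two ingredients above: a check that the five-component mixture really does track exp(ε - exp(ε)) closely, and one way to carry out stage 2, drawing the component indicator r with probability proportional to each component's weight times its normal density at the current residual.

```python
import numpy as np

rng = np.random.default_rng(2)

# Mixture constants quoted in the post (Fruhwirth-Schnatter & Wagner):
# weights, means and standard deviations of the 5 normal components.
w = np.array([0.2924, 0.2599, 0.2480, 0.1525, 0.0472])
m = np.array([0.0982, -1.5320, -0.7433, 0.8303, -3.1428])
s = np.array([0.4900, 1.0896, 0.6150, 0.4382, 1.7993])

def norm_pdf(x, mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

def mixture_pdf(x):
    """5-component normal approximation to exp(x - exp(x))."""
    x = np.atleast_1d(x)[:, None]
    return (w * norm_pdf(x, m, s)).sum(axis=1)

def draw_component(eps):
    """Stage 2 sketch: pick r in 1..5 with probability proportional
    to w_j * N(eps; m_j, s_j) for the current residual eps."""
    p = w * norm_pdf(eps, m, s)
    p = p / p.sum()
    return rng.choice(5, p=p) + 1

grid = np.linspace(-10, 4, 1000)
target = np.exp(grid - np.exp(grid))
max_err = np.abs(mixture_pdf(grid) - target).max()
```

The maximum pointwise error of the mixture over this grid is tiny compared with the peak height of about 0.37, which is why the two curves in the first figure are almost indistinguishable.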