{"id":532,"date":"2015-03-20T09:16:46","date_gmt":"2015-03-20T09:16:46","guid":{"rendered":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/?p=532"},"modified":"2025-02-26T13:21:38","modified_gmt":"2025-02-26T13:21:38","slug":"poisson-regression-with-two-random-effects-better-mata-code","status":"publish","type":"post","link":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/2015\/03\/20\/poisson-regression-with-two-random-effects-better-mata-code\/","title":{"rendered":"Poisson regression with two random effects: Better Mata code"},"content":{"rendered":"<p>Last time I presented a basic Mata program for fitting a Poisson model with two random effects to\u00a0some epilepsy data. The program was faster that its Stata equivalent but slower than WinBUGS and the mixing was quite poor. This week I will try to improve the performance of the Mata program. It would make sense to read <a href=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/2015\/03\/13\/poisson-regression-with-two-random-effects-mata\/\">last week&#8217;s posting<\/a> before trying to follow the development of the more efficient code.<\/p>\n<p><strong>Step 1: Gibbs sampling<\/strong><br \/>\nThere are 303 parameters in the model and only two of them have recognisable forms for their conditional posterior distributions and those\u00a0are the precisions on the two random effects. The gamma priors are conjugate, so the updates can be generated by sampling from a gamma distribution. I try to avoid equations in this blog so I have added the derivation of the Gibbs samplers to the pdf that I posted last week that contained the equations for this model. The updated pdf can be downloaded from here, <a href=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2015\/03\/epilepsyequationsv2.pdf\">epilepsyequationsv2<\/a>.<\/p>\n<p>Of more interest to me is the method that you have to use to add these Gibbs sampling steps into the algorithm. 
It involves writing the Mata code to create your own sampler and a modified call to mcmcrun. Since all improvements to the basic Metropolis-Hastings method\u00a0are likely to require the user to write their own samplers, I will look at this process in detail.<\/p>\n<p>Let&#8217;s start by writing the sampler in Mata. The key is to get the right arguments in the right order. We need:<br \/>\n<strong>real scalar logp:<\/strong> logp is the current value of the log-posterior, which, if known, saves you from having to re-calculate it. It is often missing, but even then we still need to refer to it.<br \/>\n<strong>pointer(function) logpost:<\/strong> logpost is the name of a Mata function that calculates the value of the log-posterior. It is not needed for a Gibbs sampler that samples from a recognised distribution, but we must still place a dummy function name in this position.<br \/>\n<strong>real matrix X:<\/strong> X is the data matrix, usually created by the data option of the mcmcrun command.<br \/>\n<strong>real rowvector t:<\/strong> t is the full vector of parameter values that the sampler will update.<br \/>\n<strong>real scalar p:<\/strong> p is the number of the parameter that is to be updated, i.e. the position within the rowvector t.<\/p>\n<p>Of course, the internal names for the arguments are arbitrary.<\/p>\n<p>The sampler will update the contents of the rowvector t, so it does not need to return any values directly; consequently it is declared to be void. 
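As a cross-check of the calling convention, here is a Python analogue (Python only so that the snippet is runnable outside Stata). A Python list plays the role of the rowvector t: it is mutable, so in-place assignment is visible to the caller, exactly as the void Mata sampler relies on. The update shown mirrors the p == 7 branch of the mygibbs function.

```python
import random

def gibbs_step(logp, logpost, X, t, p):
    """Python analogue of the Mata sampler signature described above.

    Like the void Mata function it returns nothing: t is a mutable list,
    so assigning into it updates the caller's copy.  logp and logpost are
    accepted (and here ignored) purely to honour the calling convention.
    """
    if p == 7:
        # t[8:67] is the 0-based Python version of Mata's t[|9\67|]
        ssq = sum(x * x for x in t[8:67])
        t[6] = random.gammavariate(37.5, 1.0 / (2.0 + 0.5 * ssq))
```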
Putting this all together we\u00a0can define the function.<\/p>\n<p><span style=\"color: #0000ff\">void mygibbs(real scalar logp,pointer(function) dummy,real matrix X,real rowvector t,real scalar p)<\/span><br \/>\n<span style=\"color: #0000ff\"> {<\/span><br \/>\n<span style=\"color: #0000ff\"> if( p == 7 ) t[7] = rgamma(1,1,37.5,1\/(2+0.5*t[|9\\67|]*t[|9\\67|]&#8217;))<\/span><br \/>\n<span style=\"color: #0000ff\"> else if( p == 8 ) t[8] = rgamma(1,1,126,1\/(2+0.5*t[|68\\303|]*t[|68\\303|]&#8217;))<\/span><br \/>\n<span style=\"color: #0000ff\"> }<\/span><\/p>\n<p>This function needs to be placed alongside the function logcpost in the Mata block of our Stata do file. Then we can tell mcmcrun to use it with the command,<\/p>\n<p><span style=\"color: #0000ff\">mcmcrun logcpost X theta using mataepilepsy.csv , \/\/\/<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0 param(cons base trt bxt age v4 tauU tauE u1-u59 e1-e236) \/\/\/<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0 samp( 6(mhsnorm , sd(s)) 2(mygibbs ) 295(mhsnorm , sd(s) ) ) adapt \/\/\/<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0 burn(5000) update(50000) thin(5) replace \/\/\/<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0 data(X=(y_1 y_2 y_3 y_4 cLB cTrt cBXT clnAge) theta s) mata<\/span><\/p>\n<p>The only change comes in the samplers option in which we point to mygibbs for updating the two precisions rather than the Metropolis-Hastings\u00a0sampler mhslogn that we used previously.<\/p>\n<p>This change makes very little difference to the run time; it still takes just over 3 minutes on my desktop. 
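My mental model of what the samp() option sets up can be sketched in a few lines of Python (an illustration of the idea only, not mcmcrun's actual code): the counts in 6(mhsnorm) 2(mygibbs) 295(mhsnorm) expand into one sampler per parameter, and each iteration sweeps through them in order, every sampler sharing the signature described above and updating t in place.

```python
def sweep(samplers, logpost, X, t):
    """One sweep through all parameters, dispatching each to its sampler.

    samplers is a per-parameter list (303 entries for this model) built
    by expanding the counts in the samp() option.  Each sampler shares
    the (logp, logpost, X, t, p) signature and updates t in place.
    A sketch of the idea, not mcmcrun's implementation.
    """
    logp = None                       # current log-posterior, often unknown
    for p, sampler in enumerate(samplers, start=1):
        sampler(logp, logpost, X, t, p)
```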
It does improve mixing, but only marginally since the precisions mixed well when we used Metropolis-Hastings updates.<\/p>\n<p><strong>Step 2: Placing the random effects in vectors<\/strong><br \/>\nThe actual values of the random effects are rarely of any interest in themselves and usually there is little\u00a0benefit in saving them in the results file. When we programmed the problem in <a href=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/2015\/02\/06\/poisson-regression-with-two-random-effects\/\">Stata<\/a> we took advantage of this fact and placed the values of the random effects in 5 Stata variables. This meant that we did not need to unpack them one at a time from the row vector of parameter values and we could use vectors in the calculations, which are much quicker than loops.<\/p>\n<p>We can do much the same thing in Mata by adding the values of the random effects as columns in the data matrix X.<\/p>\n<p>Before we write the Mata code, let&#8217;s see how this change affects the Stata code that sets up the problem.<\/p>\n<p><span style=\"color: #0000ff\">gen u = rnormal(0,0.5)<\/span><br \/>\n<span style=\"color: #0000ff\"> gen e1 = rnormal(0,0.5)<\/span><br \/>\n<span style=\"color: #0000ff\"> gen e2 = rnormal(0,0.5)<\/span><br \/>\n<span style=\"color: #0000ff\"> gen e3 = rnormal(0,0.5)<\/span><br \/>\n<span style=\"color: #0000ff\"> gen e4 = rnormal(0,0.5)<\/span><\/p>\n<p><span style=\"color: #0000ff\">matrix theta = J(1,8,0)<\/span><br \/>\n<span style=\"color: #0000ff\"> matrix theta[1,1] = 2.6<\/span><br \/>\n<span style=\"color: #0000ff\"> matrix theta[1,2] = 1<\/span><br \/>\n<span style=\"color: #0000ff\"> matrix theta[1,7] = 4<\/span><br \/>\n<span style=\"color: #0000ff\"> matrix theta[1,8] = 4<\/span><br \/>\n<span style=\"color: #0000ff\"> matrix s = J(1,6,0.1)<\/span><\/p>\n<p><span style=\"color: #0000ff\">mcmcrun logcpost X theta using mataepilepsy.csv , \/\/\/<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0 
param(cons base trt bxt age v4 tauU tauE ) \/\/\/<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0 samp( 6(mhsnorm , sd(s)) 2(mygibbs ) ) adapt \/\/\/<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0 burn(5000) update(50000) thin(5) replace \/\/\/<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0 data(X=(y_1 y_2 y_3 y_4 cLB cTrt cBXT clnAge u e1 e2 e3 e4) theta s) mata<\/span><\/p>\n<p>Initial values for the random effects are placed in 5 Stata variables, u, e1, e2, e3, e4 and these are added to the definition of X in the data option. Thus, for example, u\u00a0will be stored in column 9 of X. The row vector of parameter values, theta, is shortened to 8 values. Of course, we still have to update the values of random effects even if we do not refer to them explicitly. I will update them, on the quiet, in my Mata function\u00a0mygibbs.<\/p>\n<p><span style=\"color: #0000ff\">void mygibbs(real scalar logp,pointer(function) logpost,real matrix X,real rowvector t,real scalar p)<\/span><br \/>\n<span style=\"color: #0000ff\"> {<\/span><br \/>\n<span style=\"color: #0000ff\"> real scalar i, j, k<\/span><br \/>\n<span style=\"color: #0000ff\"> real colvector eta, eta1, eta2, eta3, eta4, oldlogp, newlogp, u, newt<\/span><\/p>\n<p><span style=\"color: #0000ff\">eta = t[1]:+t[2]:*X[,5]:+t[3]:*X[,6]:+t[4]:*X[,7]:+t[5]:*X[,8]:+X[,9]<\/span><br \/>\n<span style=\"color: #0000ff\"> if( p == 7 ) {<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0\u00a0eta1 = eta + X[,10]<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0\u00a0 eta2 = eta + X[,11]<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0\u00a0 eta3 = eta + X[,12]<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0\u00a0 eta4 = eta + X[,13] :+ t[6]<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0\u00a0 oldlogp = -exp(eta1) + X[,1]:*eta1 -exp(eta2) + X[,2]:*eta2 -exp(eta3) 
+ X[,3]:*eta3 -exp(eta4) + X[,4]:*eta4 -0.5*t[7]*X[,9]:*X[,9]<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0\u00a0 u = rnormal(59,1,0,0.5)<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0\u00a0 newt = u + X[,9]<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0\u00a0 eta1 = eta1 + u<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0\u00a0 eta2 = eta2 + u<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0\u00a0 eta3 = eta3 + u<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0\u00a0 eta4 = eta4 + u<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0\u00a0 newlogp = -exp(eta1) + X[,1]:*eta1 -exp(eta2) + X[,2]:*eta2 -exp(eta3) + X[,3]:*eta3 -exp(eta4) + X[,4]:*eta4 -0.5*t[7]*newt:*newt<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0\u00a0 u = log(runiform(59,1))<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0\u00a0 for(i=1;i&lt;=59;i++) {<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0 if( u[i] &lt; (newlogp[i]-oldlogp[i]) ) X[i,9] = newt[i]<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0\u00a0 }<\/span><\/p>\n<p><span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0 t[7] = rgamma(1,1,37.5,1\/(2+0.5*X[,9]&#8217;*X[,9]))<\/span><br \/>\n<span style=\"color: #0000ff\"> }<\/span><br \/>\n<span style=\"color: #0000ff\"> else if( p == 8 ) {<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0\u00a0 for( j=1;j&lt;=4;j++) {<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0 k = j + 9<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0 eta1 = eta + X[,k]<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0 if( j == 4 ) eta1 = eta1 :+ t[6]<\/span><br \/>\n<span style=\"color: #0000ff\"> 
\u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0 oldlogp = -exp(eta1) + X[,j]:*eta1 -0.5*t[8]*X[,k]:*X[,k]<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0 u = rnormal(59,1,0,0.5)<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0 newt = X[,k] + u<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0 eta1 = eta1 + u<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0 newlogp = -exp(eta1) + X[,j]:*eta1 -0.5*t[8]*newt:*newt<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0 u = log(runiform(59,1))<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0 for(i=1;i&lt;=59;i++) {<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0 if( u[i] &lt; (newlogp[i]-oldlogp[i]) ) X[i,k] = newt[i]<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0 }<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0\u00a0 }<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0\u00a0 t[8] = rgamma(1,1,126,1\/(2+0.5*(X[,10]&#8217;*X[,10]+X[,11]&#8217;*X[,11]+X[,12]&#8217;*X[,12]+X[,13]&#8217;*X[,13])))<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0 <\/span><span style=\"color: #0000ff\">}<\/span><br \/>\n<span style=\"color: #0000ff\"> }<\/span><br \/>\nYou will see that when the function mygibbs is called to update the precision of the subject-specific\u00a0random effect, we also update the individual values of that random effect and when it is called to update the precision of the within subject\u00a0random effects, we also update the values of those random effects.<\/p>\n<p>The code for updating the values of the random effects is simply a 
Metropolis-Hastings algorithm programmed to refer to the values in the data matrix columns 9 to 13.<\/p>\n<p>This code is quick; 55,000 iterations in\u00a037 seconds compared with over 3 minutes for the previous Mata program. A few weeks ago we found that the <a href=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/2015\/02\/20\/poisson-regression-with-two-random-effects-winbugs-openbugs-or-jags\/\">BUGS-like programs take about 2 minutes<\/a>. The problem that remains is the mixing, which is still not as good as you get with OpenBUGS.<\/p>\n<p><a href=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2015\/03\/epilepsytracemata2a.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-551\" src=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2015\/03\/epilepsytracemata2a.png\" alt=\"epilepsytracemata2a\" width=\"538\" height=\"394\" srcset=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2015\/03\/epilepsytracemata2a.png 538w, https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2015\/03\/epilepsytracemata2a-300x220.png 300w\" sizes=\"auto, (max-width: 538px) 100vw, 538px\" \/><\/a><\/p>\n<p><strong>Step 3: Skip the dots<\/strong><br \/>\nBy default mcmcrun shows a dot on the screen after every 250 iterations. There is a dots option to mcmcrun that enables you to alter this frequency, and if we set the frequency to a negative number then no dots are displayed. Sometimes this will make a difference to the run time because displaying to the screen is a relatively slow process and the calculation stops while this happens. However, the time saved seems to be very computer-dependent. 
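Returning for a moment to the random-effects update inside mygibbs: stripped of the model detail, it is an elementwise random-walk Metropolis step, which the following Python sketch isolates. A single shared log density stands in for the per-subject Poisson-plus-prior terms, which in the real code differ from row to row of the data matrix.

```python
import math
import random

def rw_metropolis(u, log_density, sd=0.5):
    """Elementwise random-walk Metropolis step in the style of the
    random-effects update inside mygibbs: propose u_i + N(0, sd) for
    each element and accept or reject it independently by comparing
    log(U) with the change in its log conditional density.
    """
    for i in range(len(u)):
        proposal = u[i] + random.gauss(0.0, sd)
        if math.log(random.random()) < log_density(proposal) - log_density(u[i]):
            u[i] = proposal
    return u
```

Run repeatedly against a standard normal log density, the elements of u settle down to N(0, 1) draws, which is a convenient way to convince yourself the acceptance rule is right.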
On\u00a0my work computer, dropping the dots reduced the run time from\u00a037 seconds to\u00a026 seconds, while on my home computer it fell from 22 seconds to 18 seconds.<\/p>\n<p><strong>Step 4: Block updating<\/strong><\/p>\n<p>Block updating has two advantages; the most important is that, if we allow for the correlation between parameters, it ought to improve the mixing. The secondary advantage is that block updating is usually slightly quicker because we are not required to repeat very similar calculations.<\/p>\n<p>The key to efficient block updating by Metropolis-Hastings is to have a good estimate of the covariance matrix of the parameters. To obtain this I extracted the covariance matrix from an earlier analysis.<\/p>\n<p><span style=\"color: #0000ff\">insheet using mataepilepsy.csv, clear case<\/span><br \/>\n<span style=\"color: #0000ff\"> mcmcstats cons base trt bxt age v4, cov<\/span><br \/>\n<span style=\"color: #0000ff\"> matrix V = r(V)<\/span><br \/>\n<span style=\"color: #0000ff\"> matrix L = cholesky(V)\/4<\/span><\/p>\n<p>We need to supply the Cholesky decomposition of the matrix to save the block updater from having to calculate it over and over again. 
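For readers who want to see the mechanics, here is a pure-Python sketch of why the Cholesky factor is exactly what a block updater needs: with L such that L L' = V, adding L z for z ~ N(0, I) gives a proposal with covariance V, and scaling L scales the proposal covariance by the square of that factor. The helper names are mine, not mcmcrun's.

```python
import random

def cholesky(V):
    """Lower-triangular L with L * L' = V (textbook Cholesky algorithm)."""
    n = len(V)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = V[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = s ** 0.5 if i == j else s / L[j][j]
    return L

def mvn_proposal(theta, L):
    """Random-walk block proposal theta + L z with z ~ N(0, I).

    If L is the Cholesky factor of V, the proposal covariance is V;
    dividing L by 4 shrinks the covariance by a factor of 16.
    """
    z = [random.gauss(0.0, 1.0) for _ in theta]
    return [theta[i] + sum(L[i][k] * z[k] for k in range(i + 1))
            for i in range(len(theta))]
```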
By trial and error I discovered that the Cholesky decomposition of V gave slightly too large proposals, so I divided it by 4.<\/p>\n<p>Now we just use L in the call to mcmcrun; of course L needs to be passed to Mata in the data option as well as being referred to in the sampler.<\/p>\n<p><span style=\"color: #0000ff\">mcmcrun logcpost X theta using mataepilepsy4.csv , \/\/\/<\/span><br \/>\n<span style=\"color: #0000ff\"> param(cons base trt bxt age v4 tauU tauE ) dots(-1) \/\/\/<\/span><br \/>\n<span style=\"color: #0000ff\"> samp( (mhsmnorm , chol(L)) 2(mygibbs ) ) \/\/\/<\/span><br \/>\n<span style=\"color: #0000ff\"> burn(5000) update(50000) thin(5) replace \/\/\/<\/span><br \/>\n<span style=\"color: #0000ff\"> data(X=(y_1 y_2 y_3 y_4 cLB cTrt cBXT clnAge u e1 e2 e3 e4) theta L) mata<\/span><\/p>\n<p>Because L is a 6&#215;6 matrix, mcmcrun is able to work out that you want mhsmnorm to update six parameters.<\/p>\n<p>As it turns out, the gain in speed is more impressive than the improvement in mixing. The run time drops from 26 seconds to 14 seconds but the mixing is still slightly worse than we get with WinBUGS and much worse than OpenBUGS or JAGS with the glm module.<\/p>\n<p><strong>Step 5: What next?<\/strong><br \/>\nI think that we have pushed Metropolis-Hastings sampling as far as we can. Since the program only takes 14 seconds to run compared with about 2 minutes for OpenBUGS, we could just run the chain for much longer and so get comparable accuracy without improving the mixing.<\/p>\n<p>The other option is to look to replace the Metropolis-Hastings algorithm with a better sampler. Generalized linear models with random effects are so commonly used that the question of improving the MCMC algorithms for this type of model has been given a lot of consideration in the literature. Next time I will consider the use of a data augmentation algorithm that I believe is similar to the method used in the JAGS glm module. 
As I write, I have not actually tried it yet, but I expect it to do quite well.<\/p>\n<p>The data and code referred to in this posting can be downloaded from my home page at <a href=\"https:\/\/www2.le.ac.uk\/Members\/trj\">https:\/\/www2.le.ac.uk\/Members\/trj<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Last time I presented a basic Mata program for fitting a Poisson model with two random effects to\u00a0some epilepsy data. The program was faster that its Stata equivalent but slower than WinBUGS and the mixing was quite poor. This week I will try to improve the performance of the Mata program. It would make sense [&hellip;]<\/p>\n","protected":false},"author":134,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[16,49,4],"class_list":["post-532","post","type-post","status-publish","format-standard","hentry","category-uncategorized","tag-mata","tag-poisson-regression","tag-stata"],"_links":{"self":[{"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/posts\/532","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/users\/134"}],"replies":[{"embeddable":true,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/comments?post=532"}],"version-history":[{"count":12,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/posts\/532\/revisions"}],"predecessor-version":[{"id":566,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/posts\/532\/revisions\/566"}],"wp:attachment":[{"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/media?parent=532"}],"wp:term":[{"taxonomy":"category","
embeddable":true,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/categories?post=532"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/tags?post=532"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}