{"id":453,"date":"2015-02-13T09:48:09","date_gmt":"2015-02-13T09:48:09","guid":{"rendered":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/?p=453"},"modified":"2025-02-26T13:21:38","modified_gmt":"2025-02-26T13:21:38","slug":"poisson-regression-with-two-random-effects-improving-performance","status":"publish","type":"post","link":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/2015\/02\/13\/poisson-regression-with-two-random-effects-improving-performance\/","title":{"rendered":"Poisson regression with two random effects: Improving performance"},"content":{"rendered":"<p>&nbsp;<\/p>\n<p>Last time I analysed the epilesy data taken from\u00a0Thall and Vail(1990). The model involved a Poisson regression with two random effects and I showed how this can be programmed in Stata by storing the values of the random effects in Stata variables rather than in a Stata matrix. However, the resulting do file was not very efficient and on my desktop computer it took a rather unrealistic 42 minutes to create 55,000 updates. What is worse, the mixing was quite poor so even more updates would have been necessary to produce reliable estimates.<\/p>\n<p>This week I want to address the two issues of speed and mixing, although to understand the changes you will first need to read <a href=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/2015\/02\/06\/poisson-regression-with-two-random-effects\/\">last week\u2019s posting<\/a>.<\/p>\n<p><strong>Improving the speed<\/strong><\/p>\n<p>First let us make the program more efficient. The do file\u00a0is slow to run because I used the program\u00a0<strong>logdensity<\/strong> to evaluate the log-densities of the distributions that make up the posterior.\u00a0Use of logdensity simplifies\u00a0the\u00a0coding but it evaluates terms in the densities that are constant and which could\u00a0have been\u00a0dropped from the calculation without altering the posterior. 
What is more,\u00a0logdensity repeatedly checks the inputs and so carries a lot of overhead that slows down the program.<\/p>\n<p>In our model we have a Poisson distribution in the likelihood, normal distributions over the random effects, and normal and gamma priors. These distributions are simple to program, so we ought to be able to speed up the calculations. In models with a few parameters, I\u2019ve found that programming the densities directly can halve the run time over\u00a0programs that use\u00a0logdensity; in this model there are 303 parameters so the saving ought to be even more dramatic.<\/p>\n<p>Below is a re-written version of the logpost program for the epilepsy model that does not call logdensity. It should be compared with the version that we used last week.<\/p>\n<p><span style=\"color: #0000ff\">program logpost<\/span><br \/>\n<span style=\"color: #0000ff\"> args logp b<\/span><\/p>\n<p><span style=\"color: #0000ff\">\u00a0\u00a0\u00a0 local cons = `b'[1,1]<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0 local base = `b'[1,2]<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0 local trt = `b'[1,3]<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0 local bxt = `b'[1,4]<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0 local age = `b'[1,5]<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0 local v4 = `b'[1,6]<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0 local tU = `b'[1,7]<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0 local tE = `b'[1,8]<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0 scalar `logp' = 0<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0 tempvar eta mu1 mu2 mu3 mu4 LL v<\/span><br \/>\n<span style=\"color: #0000ff\"> * log-likelihood of y_1 to y_4<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0 gen `eta' = `cons' + `base'*cLB + 
`trt'*cTrt + `bxt'*cBXT + `age'*clnAge<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0 gen `mu1' = exp(`eta' + U + E1)<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0 gen `mu2' = exp(`eta' + U + E2)<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0 gen `mu3' = exp(`eta' + U + E3)<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0 gen `mu4' = exp(`eta' + `v4' + U + E4)<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0 gen `v' = -`mu1' + y_1*log(`mu1') -`mu2' + y_2*log(`mu2') \/\/\/<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0 -`mu3' + y_3*log(`mu3') -`mu4' + y_4*log(`mu4')<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0 qui su `v' , meanonly<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0 scalar `logp' = `logp' + r(sum)<\/span><br \/>\n<span style=\"color: #0000ff\"> * log-density of the random effects<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0 qui replace `v' = -0.5*`tE'*(E1*E1 + E2*E2 + E3*E3 + E4*E4) - 0.5*`tU'*U*U \/\/\/<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0 + 2*log(`tE') + 0.5*log(`tU')<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0 qui su `v' , meanonly<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0 scalar `logp' = `logp' + r(sum)<\/span><br \/>\n<span style=\"color: #0000ff\"> * priors<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0 scalar `logp' = `logp' -0.5*(`cons'-2.6)^2 -0.5*(`base'-1)^2 \/\/\/<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0 -0.5*`trt'*`trt' -0.5*`bxt'*`bxt' 
-0.5*`age'*`age' -0.5*`v4'*`v4' + \/\/\/<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0 7*log(`tU') - 2*`tU' + 7*log(`tE') - 2*`tE'<\/span><br \/>\n<span style=\"color: #0000ff\"> end<\/span><\/p>\n<p>The improvement in speed is dramatic. The original version took 42 minutes to produce 55,000 iterations of the sampler, while this new version takes 6 minutes, faster by a factor of 7.<\/p>\n<p><strong>Improving the mixing<\/strong><\/p>\n<p>Unfortunately the improvement in speed does nothing for the mixing, which is still very poor, as this trace plot shows.<\/p>\n<p><a href=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2015\/02\/trace2.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-462\" src=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2015\/02\/trace2.png\" alt=\"trace2\" width=\"811\" height=\"590\" srcset=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2015\/02\/trace2.png 811w, https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2015\/02\/trace2-300x218.png 300w\" sizes=\"auto, (max-width: 811px) 100vw, 811px\" \/><\/a><\/p>\n<p>I ran <strong>mcmcstats<\/strong> to obtain summary statistics from the saved results and added the <strong>correlation<\/strong> option to help me diagnose the cause of the problem. 
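The correlation option reports nothing more exotic than the sample correlations of the stored draws; a stdlib-Python sketch of the calculation (an illustration only, mcmcstats itself is a Stata program):

```python
import math

def corr(xs, ys):
    """Pearson correlation of two equal-length sequences of draws."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Perfectly anti-correlated draws give -1
print(round(corr([1, 2, 3, 4], [4, 3, 2, 1]), 6))  # -1.0
```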
Not surprisingly, the three parameters that mix poorly, namely base, trt and bxt (the base&#215;trt interaction), are all highly correlated.<\/p>\n<pre>        |    cons    base     trt     bxt     age      v4  sigmaU  sigmaE\r\n -------+----------------------------------------------------------------\r\n cons   |  1.0000\r\n base   | -0.1506  1.0000\r\n trt    | -0.0661  0.7239  1.0000\r\n bxt    |  0.0521 -0.7922 -0.9514  1.0000\r\n age    | -0.0020 -0.0817 -0.1465  0.2038  1.0000\r\n v4     | -0.2659  0.0258  0.0388 -0.0382 -0.0493  1.0000\r\n sigmaU | -0.0796  0.0131  0.0387 -0.0543 -0.0155  0.0398  1.0000\r\n sigmaE | -0.0495 -0.0639 -0.0678  0.0783 -0.0732  0.0744 -0.0104  1.0000<\/pre>\n<p>One tactic for handling correlated parameters is to use a multivariate normal proposal, so that the three parameters are updated in a single step; we can then choose the proposal to allow for the anticipated correlation.<\/p>\n<p>The multivariate normal proposal version of the Metropolis-Hastings algorithm described in the book is called mhsmnorm and this takes the Cholesky decomposition of the variance matrix as its input. We do not need an exact value for this matrix, so it is not important to base it on exact correlations or standard deviations. 
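mhsmnorm takes a Cholesky factor because of how correlated normal proposals are generated: if z is a vector of independent standard normals then Lz has variance LL' = C. The factorisation can be checked outside Stata; the sketch below is stdlib Python using the rounded correlations and proposal standard deviations for base, trt and bxt (an illustration, not part of the posted do file):

```python
import math

def cholesky3(C):
    """Lower-triangular Cholesky factor of a 3x3 covariance matrix."""
    L = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(C[i][i] - s)
            else:
                L[i][j] = (C[i][j] - s) / L[j][j]
    return L

# Rounded correlations and proposal sds for base, trt and bxt
R = [[1.0, 0.70, -0.80],
     [0.70, 1.0, -0.95],
     [-0.80, -0.95, 1.0]]
sd = [0.17, 0.49, 0.26]
C = [[R[i][j] * sd[i] * sd[j] for j in range(3)] for i in range(3)]
L = cholesky3(C)
# A correlated proposal step is L @ z with z independent standard normals,
# because var(Lz) = L L' = C.
print([round(L[i][i], 3) for i in range(3)])  # [0.17, 0.35, 0.065]
```

The diagonal entries match the `matrix list L` output shown in the post.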
The proposal standard deviations for the three troublesome parameters, as used by the current algorithm, are 0.17, 0.49 and 0.26, so we can obtain the Cholesky decomposition by,<br \/>\n<span style=\"color: #0000ff\">matrix R = (1, 0.70, -0.80 \\ 0.70, 1, -0.95 \\ -0.80, -0.95, 1)<\/span><br \/>\n<span style=\"color: #0000ff\"> matrix sd = (0.17, 0.49, 0.26)<\/span><br \/>\n<span style=\"color: #0000ff\"> matrix D = diag(sd)<\/span><br \/>\n<span style=\"color: #0000ff\"> matrix C = D*R*D<\/span><br \/>\n<span style=\"color: #0000ff\"> matrix L = cholesky(C)<\/span><br \/>\n<span style=\"color: #0000ff\"> matrix list L<\/span><\/p>\n<p>and we obtain<\/p>\n<pre>          c1         c2         c3\r\nc1       .17          0          0\r\nc2      .343  .34992999          0\r\nc3     -.208  -.1419884  .06461652<\/pre>\n<p>The only change that we need to make is to replace the three normal proposal calls to mhsnorm for base, trt and bxt with a single call to mhsmnorm with L set close to this Cholesky matrix.<br \/>\n<span style=\"color: #0000ff\">matrix L = (0.17, 0, 0 \\ 0.343, 0.350, 0 \\ -0.208, -0.142, 0.065)<\/span><br \/>\n<span style=\"color: #0000ff\"> mhsnorm `logpost' `b' 1, sd(0.1)<\/span><br \/>\n<span style=\"color: #0000ff\"> mhsmnorm `logpost' `b' 2, chol(L)<\/span><br \/>\n<span style=\"color: #0000ff\"> mhsnorm `logpost' `b' 5, sd(0.1)<\/span><\/p>\n<p>In fact these steps are a little large, and trial and error shows that we get slightly better performance with L\/4, that is, with shorter steps that keep the same correlation.<br \/>\nThe trace plot is now<\/p>\n<p><a href=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2015\/02\/trace3.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-463\" 
src=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2015\/02\/trace3.png\" alt=\"trace3\" width=\"811\" height=\"590\" srcset=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2015\/02\/trace3.png 811w, https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2015\/02\/trace3-300x218.png 300w\" sizes=\"auto, (max-width: 811px) 100vw, 811px\" \/><\/a><\/p>\n<p>The new summary statistics are,<\/p>\n<pre>----------------------------------------------------------------------------\r\nParameter         n     mean       sd      sem   median         95% CrI\r\n----------------------------------------------------------------------------\r\n cons         10000    1.593    0.081   0.0039    1.593 (   1.432,   1.746 )\r\n base         10000    0.904    0.125   0.0078    0.905 (   0.667,   1.146 )\r\n trt          10000   -0.764    0.368   0.0212   -0.780 (  -1.477,  -0.004 )\r\n bxt          10000    0.254    0.181   0.0109    0.256 (  -0.105,   0.599 )\r\n age          10000    0.408    0.307   0.0147    0.414 (  -0.199,   0.993 )\r\n v4           10000   -0.099    0.090   0.0022   -0.099 (  -0.271,   0.077 )\r\n sigmaU       10000    0.492    0.053   0.0009    0.488 (   0.398,   0.606 )\r\n sigmaE       10000    0.409    0.037   0.0011    0.409 (   0.339,   0.481 )\r\n----------------------------------------------------------------------------<\/pre>\n<p><strong>Tuning the proposal standard deviations<\/strong><\/p>\n<p>Another way to improve the mixing is to use better standard deviations for the proposal distributions. The program mcmcrun has an adapt option which, when set, instructs the program to adapt the proposal standard deviations during the burn-in phase, increasing them when a proposal is accepted and decreasing them when a proposal is rejected. 
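The adaptation rule is easy to sketch on a toy problem. The Python below is an illustration only: the 1.01/0.99 multipliers and the standard normal target are my own choices, not the constants actually used by mcmcrun or mhsEPYburnin.

```python
import math
import random

random.seed(1)

def adapt_sd(target_logp, n_burn=5000, sd=5.0):
    """Random-walk Metropolis with a crude burn-in adaptation:
    grow the proposal sd when a move is accepted, shrink it when
    a move is rejected, so the acceptance rate settles mid-range."""
    x, logp = 0.0, target_logp(0.0)
    accepted = 0
    for _ in range(n_burn):
        y = x + random.gauss(0.0, sd)
        logq = target_logp(y)
        if logq - logp >= 0 or random.random() < math.exp(logq - logp):
            x, logp = y, logq
            accepted += 1
            sd *= 1.01   # accepted: try slightly bigger steps
        else:
            sd *= 0.99   # rejected: try slightly smaller steps
    return sd, accepted / n_burn

# Toy run on a standard normal target: sd drifts from 5 towards
# a value that gives a mid-range acceptance rate.
sd, rate = adapt_sd(lambda x: -0.5 * x * x)
print(round(sd, 1), round(rate, 2))
```

At equilibrium the two multipliers balance near a 50% acceptance rate, which is why the adapted sd stops changing once the chain is moving freely.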
As we are using our own sampler we cannot use this automatic facility, but we can program it for ourselves.<\/p>\n<p>First I declared a global called NITER that started at zero and was updated by one every time our sampler was called. The starting values of the proposal standard deviations were stored in a row matrix called SD. Then, depending on the current value of NITER, the program would call the sampler mhsEPY or a new version called mhsEPYburnin. mhsEPY works just as before, but mhsEPYburnin modifies the contents of SD depending on whether a proposal is accepted or not.<\/p>\n<p>I will not list the code but it can be downloaded with the other programs from my homepage. Using this adaptation the results were,<\/p>\n<pre>----------------------------------------------------------------------------\r\nParameter         n     mean       sd      sem   median         95% CrI\r\n----------------------------------------------------------------------------\r\n cons         10000    1.597    0.080   0.0033    1.596 (   1.446,   1.757 )\r\n base         10000    0.930    0.129   0.0087    0.928 (   0.675,   1.181 )\r\n trt          10000   -0.776    0.362   0.0225   -0.770 (  -1.500,  -0.074 )\r\n bxt          10000    0.267    0.187   0.0124    0.269 (  -0.098,   0.637 )\r\n age          
10000    0.419    0.334   0.0136    0.429 (  -0.261,   1.054 )\r\n v4           10000   -0.102    0.092   0.0021   -0.102 (  -0.280,   0.083 )\r\n sigmaU       10000    0.495    0.055   0.0009    0.491 (   0.397,   0.611 )\r\n sigmaE       10000    0.409    0.035   0.0008    0.407 (   0.344,   0.481 )\r\n----------------------------------------------------------------------------<\/pre>\n<p>In fact the sems do not change much because my original guesses\u00a0for\u00a0the proposal standard deviations were reasonable.<\/p>\n<p>If we compare these with the results that were obtained last week, we see that for the same number of iterations the sems have reduced dramatically. For base 0.0145 has become 0.0087, for trt 0.0526 has become 0.0225, and for the interaction 0.0417 has become 0.0124.<\/p>\n<p>These improvements roughly halve the sem, which means that to obtain our new performance the old program would need to have been run for 4&#215;55,000 = 220,000 iterations, and even with the new efficient coding of logpost this would take 24 minutes rather than the current 6 minutes.<\/p>\n<p><strong>Combined effect<\/strong><\/p>\n<p>The modifications that we have made have not changed the answer but they do make a dramatic difference to the performance. We have reduced the run time by a factor of 7 by recoding logpost and by a factor of 4 by improving the mixing. 
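The factor of 4 comes from the usual Monte Carlo scaling: the sem shrinks like 1/sqrt(n), so matching a sem that is r times smaller requires about r-squared as many iterations. Checking this against the three sems quoted above (plain Python arithmetic):

```python
# sem scales roughly as 1/sqrt(n): to match a sem that is r times
# smaller, the old sampler would need about r**2 as many iterations.
sems = {"base": (0.0145, 0.0087), "trt": (0.0526, 0.0225), "bxt": (0.0417, 0.0124)}
factors = {k: round((old / new) ** 2, 1) for k, (old, new) in sems.items()}
print(factors)  # {'base': 2.8, 'trt': 5.5, 'bxt': 11.3}
```

The implied factors range from about 2.8 for base to 11.3 for bxt, so a single overall factor of 4 is a conservative summary.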
This gives a combined factor of 28, so that something that previously took 1 minute now takes about 2 seconds.<\/p>\n<p>No doubt the code could be made even quicker and the mixing could be improved even further, but additional changes are unlikely to have such a marked impact and would probably not justify the effort. If we want further improvements then we need to switch to Mata or WinBUGS, and I\u2019ll consider that possibility next time.<\/p>\n<p>Meanwhile the programs referred to in this posting can be downloaded from my homepage <a href=\"https:\/\/www2.le.ac.uk\/Members\/trj\">https:\/\/www2.le.ac.uk\/Members\/trj<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>&nbsp; Last time I analysed the epilepsy data taken from\u00a0Thall and Vail (1990). The model involved a Poisson regression with two random effects and I showed how this can be programmed in Stata by storing the values of the random effects in Stata variables rather than in a Stata matrix. However, the resulting do file was [&hellip;]<\/p>\n","protected":false},"author":134,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-453","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/posts\/453","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/users\/134"}],"replies":[{"embeddable":true,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/comments?post=453"}],"version-history":[{"count":11,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2
\/posts\/453\/revisions"}],"predecessor-version":[{"id":479,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/posts\/453\/revisions\/479"}],"wp:attachment":[{"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/media?parent=453"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/categories?post=453"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/tags?post=453"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}