Last time I presented a basic Mata program for fitting a Poisson model with two random effects to some epilepsy data. The program was faster than its Stata equivalent but slower than WinBUGS, and the mixing was quite poor. This week I will try to improve the performance of the Mata program. It would make sense to read last week's posting before trying to follow the development of the more efficient code.
Step 1: Gibbs sampling
There are 303 parameters in the model, but only two of them have conditional posterior distributions of a recognisable form, namely the precisions of the two random effects. The gamma priors are conjugate, so the updates can be generated by sampling from a gamma distribution. I try to avoid equations in this blog, so I have added the derivation of the Gibbs samplers to the pdf that I posted last week containing the equations for this model. The updated pdf can be downloaded from here, epilepsyequationsv2.
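In outline (a sketch of the standard conjugacy result; the pdf gives the full derivation): if the n values of a random effect are distributed as $u_i \sim N(0, 1/\tau)$ and $\tau$ has a $\text{Gamma}(a, b)$ prior with rate $b$, then the full conditional is
$$\tau \mid u_1,\dots,u_n \sim \text{Gamma}\!\left(a + \tfrac{n}{2},\; b + \tfrac{1}{2}\sum_{i=1}^{n} u_i^2\right).$$
Mata's rgamma() function takes a scale rather than a rate, which is why the Gibbs steps in the code below pass the reciprocal of the second term.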
Of more interest to me is the method that you have to use to add these Gibbs sampling steps to the algorithm. It involves writing Mata code to create your own sampler and making a modified call to mcmcrun. Since all improvements to the basic Metropolis-Hastings method are likely to require the user to write their own samplers, I will look at this process in detail.
Let's start by writing the sampler in Mata. The key is to get the right arguments in the right order. We need:
real scalar logp: logp is the current value of the log-posterior, which, if known, saves you from having to re-calculate it. It is often missing, but even when it is missing we still need to refer to it.
pointer(function) logpost: logpost is the name of a Mata function that calculates the value of the log-posterior. It is not needed for a Gibbs sampler that samples from a recognised distribution, but we must still place a dummy function name in this position.
real matrix X: X is the data matrix, usually created by the data option of the mcmcrun command.
real rowvector t: t is the full vector of parameter values that the sampler will update.
real scalar p: p is the number of the parameter that is to be updated, i.e. its position within the rowvector t.
Of course, the internal names for the arguments are arbitrary.
The sampler updates the contents of the rowvector t, so it does not need to return any values directly; consequently it is declared to be void. Putting this all together, we can define the function.
void mygibbs(real scalar logp,pointer(function) dummy,real matrix X,real rowvector t,real scalar p)
{
if( p == 7 ) t[7] = rgamma(1,1,37.5,1/(2+0.5*t[|9\67|]*t[|9\67|]'))        // tauU, conditional on u1-u59
else if( p == 8 ) t[8] = rgamma(1,1,126,1/(2+0.5*t[|68\303|]*t[|68\303|]'))    // tauE, conditional on e1-e236
}
This function needs to be placed alongside the function logcpost in the Mata block of our Stata do file. Then we can tell mcmcrun to use it with the command,
mcmcrun logcpost X theta using mataepilepsy.csv , ///
param(cons base trt bxt age v4 tauU tauE u1-u59 e1-e236) ///
samp( 6(mhsnorm , sd(s)) 2(mygibbs ) 295(mhsnorm , sd(s) ) ) adapt ///
burn(5000) update(50000) thin(5) replace ///
data(X=(y_1 y_2 y_3 y_4 cLB cTrt cBXT clnAge) theta s) mata
The only change comes in the samp() option, in which we point to mygibbs for updating the two precisions rather than the Metropolis-Hastings sampler mhslogn that we used previously.
This change makes very little difference to the run time; it still takes just over 3 minutes on my desktop. It does improve the mixing, but only marginally, since the precisions already mixed well when we used Metropolis-Hastings updates.
Step 2: Placing the random effects in vectors
The actual values of the random effects are rarely of any interest in themselves, and usually there is little benefit in saving them in the results file. When we programmed the problem in Stata we took advantage of this fact and placed the values of the random effects in 5 Stata variables. This meant that we did not need to unpack them one at a time from the row vector of parameter values, and we could work with vectors in the calculations, which is much quicker than looping.
We can do much the same thing in Mata by adding the values of the random effects as columns in the data matrix X.
Before we write the Mata code, let’s see how this change affects the Stata code that sets up the problem.
gen u = rnormal(0,0.5)
gen e1 = rnormal(0,0.5)
gen e2 = rnormal(0,0.5)
gen e3 = rnormal(0,0.5)
gen e4 = rnormal(0,0.5)
matrix theta = J(1,8,0)
matrix theta[1,1] = 2.6
matrix theta[1,2] = 1
matrix theta[1,7] = 4
matrix theta[1,8] = 4
matrix s = J(1,6,0.1)
mcmcrun logcpost X theta using mataepilepsy.csv , ///
param(cons base trt bxt age v4 tauU tauE ) ///
samp( 6(mhsnorm , sd(s)) 2(mygibbs ) ) adapt ///
burn(5000) update(50000) thin(5) replace ///
data(X=(y_1 y_2 y_3 y_4 cLB cTrt cBXT clnAge u e1 e2 e3 e4) theta s) mata
Initial values for the random effects are placed in 5 Stata variables, u, e1, e2, e3 and e4, and these are added to the definition of X in the data option. Thus, for example, u will be stored in column 9 of X. The row vector of parameter values, theta, is shortened to 8 values. Of course, we still have to update the values of the random effects even if we do not refer to them explicitly. I will update them, on the quiet, in my Mata function mygibbs.
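For reference while reading the function, the columns of X follow the order in which the variables are listed in the data option:
// columns of X
//   1-4    y_1 y_2 y_3 y_4        the four Poisson counts for each subject
//   5-8    cLB cTrt cBXT clnAge   the centred covariates
//   9      u                      subject-level random effect
//   10-13  e1 e2 e3 e4            within-subject random effects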
void mygibbs(real scalar logp,pointer(function) logpost,real matrix X,real rowvector t,real scalar p)
{
real scalar i, j, k
real colvector eta, eta1, eta2, eta3, eta4, oldlogp, newlogp, u, newt
eta = t[1]:+t[2]:*X[,5]:+t[3]:*X[,6]:+t[4]:*X[,7]:+t[5]:*X[,8]:+X[,9]
if( p == 7 ) {        // update the subject-level effects u (column 9) and then tauU
eta1 = eta + X[,10]
eta2 = eta + X[,11]
eta3 = eta + X[,12]
eta4 = eta + X[,13] :+ t[6]
oldlogp = -exp(eta1) + X[,1]:*eta1 -exp(eta2) + X[,2]:*eta2 -exp(eta3) + X[,3]:*eta3 -exp(eta4) + X[,4]:*eta4 -0.5*t[7]*X[,9]:*X[,9]
u = rnormal(59,1,0,0.5)
newt = u + X[,9]
eta1 = eta1 + u
eta2 = eta2 + u
eta3 = eta3 + u
eta4 = eta4 + u
newlogp = -exp(eta1) + X[,1]:*eta1 -exp(eta2) + X[,2]:*eta2 -exp(eta3) + X[,3]:*eta3 -exp(eta4) + X[,4]:*eta4 -0.5*t[7]*newt:*newt
u = log(runiform(59,1))
for(i=1;i<=59;i++) {
if( u[i] < (newlogp[i]-oldlogp[i]) ) X[i,9] = newt[i]
}
t[7] = rgamma(1,1,37.5,1/(2+0.5*X[,9]'*X[,9]))
}
else if( p == 8 ) {        // update the within-subject effects e1-e4 (columns 10-13) and then tauE
for( j=1;j<=4;j++) {
k = j + 9
eta1 = eta + X[,k]
if( j == 4 ) eta1 = eta1 :+ t[6]
oldlogp = -exp(eta1) + X[,j]:*eta1 -0.5*t[8]*X[,k]:*X[,k]
u = rnormal(59,1,0,0.5)
newt = X[,k] + u
eta1 = eta1 + u
newlogp = -exp(eta1) + X[,j]:*eta1 -0.5*t[8]*newt:*newt
u = log(runiform(59,1))
for(i=1;i<=59;i++) {
if( u[i] < (newlogp[i]-oldlogp[i]) ) X[i,k] = newt[i]
}
}
t[8] = rgamma(1,1,126,1/(2+0.5*(X[,10]'*X[,10]+X[,11]'*X[,11]+X[,12]'*X[,12]+X[,13]'*X[,13])))
}
}
You will see that when the function mygibbs is called to update the precision of the subject-specific random effect, we also update the individual values of that random effect; and when it is called to update the precision of the within-subject random effects, we also update the values of those random effects.
The code for updating the values of the random effects is simply a Metropolis-Hastings algorithm programmed to refer to the values held in columns 9 to 13 of the data matrix.
This code is quick; 55,000 iterations in 37 seconds compared with over 3 minutes for the previous Mata program. A few weeks ago we found that the BUGS-like programs take about 2 minutes. The problem that remains is the mixing, which is still not as good as you get with OpenBUGS.
Step 3: Skip the dots
By default mcmcrun shows a dot on the screen after every 250 iterations. There is a dots option to mcmcrun that enables you to alter this frequency, and if we set the frequency to a negative number then no dots are displayed. Sometimes this makes a difference to the run time, because writing to the screen is relatively slow and the calculation stops while it happens. However, the time saved seems to be very computer-dependent. On my work computer, dropping the dots reduced the run time from 37 seconds to 26 seconds, while on my home computer the time reduced from 22 seconds to 18 seconds.
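For example, the Step 2 call with the dots suppressed would look like this; the only change is the added dots(-1) option:
mcmcrun logcpost X theta using mataepilepsy.csv , ///
param(cons base trt bxt age v4 tauU tauE ) dots(-1) ///
samp( 6(mhsnorm , sd(s)) 2(mygibbs ) ) adapt ///
burn(5000) update(50000) thin(5) replace ///
data(X=(y_1 y_2 y_3 y_4 cLB cTrt cBXT clnAge u e1 e2 e3 e4) theta s) mata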
Step 4: block updating
Block updating has two advantages; the most important is that, if we allow for the correlation between parameters, it ought to improve the mixing. The secondary advantage is that block updating is usually slightly quicker because we are not required to repeat very similar calculations.
The key to efficient block updating by Metropolis-Hastings is to have a good estimate of the covariance matrix of the parameters. To obtain this I extracted the covariance matrix from an earlier analysis.
insheet using mataepilepsy.csv, clear case
mcmcstats cons base trt bxt age v4, cov
matrix V = r(V)
matrix L = cholesky(V)/4
We need to supply the Cholesky decomposition of the matrix to save the block updater from having to calculate it over and over again. By trial and error I discovered that the Cholesky decomposition of V gave proposals that were slightly too large, so I divided it by 4.
Now we just use L in the call to mcmcrun; of course L needs to be passed to Mata in the data option as well as being referred to in the sampler.
mcmcrun logcpost X theta using mataepilepsy4.csv , ///
param(cons base trt bxt age v4 tauU tauE ) dots(-1) ///
samp( (mhsmnorm , chol(L)) 2(mygibbs ) ) ///
burn(5000) update(50000) thin(5) replace ///
data(X=(y_1 y_2 y_3 y_4 cLB cTrt cBXT clnAge u e1 e2 e3 e4) theta L) mata
Because L is a 6×6 matrix, mcmcrun is able to work out that you want mhsmnorm to update six parameters.
As it turns out, the gain in speed is more impressive than the improvement in mixing. The run time drops from 26 seconds to 14 seconds, but the mixing is still slightly worse than we get with WinBUGS and much worse than with OpenBUGS or JAGS with the glm module.
Step 5: What next?
I think that we have pushed Metropolis-Hastings sampling as far as we can. Since the program only takes 14 seconds to run, compared with about 2 minutes for OpenBUGS, we could just run the chain for much longer and so get comparable accuracy without improving the mixing.
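As a rough sketch of what that would look like (the output filename and the larger update() and thin() values are only illustrative choices, not something I have run), an eight-fold longer chain keeps the same 10,000 stored draws and, at 14 seconds per 55,000 iterations, should still finish in under 2 minutes:
mcmcrun logcpost X theta using mataepilepsy5.csv , ///
param(cons base trt bxt age v4 tauU tauE ) dots(-1) ///
samp( (mhsmnorm , chol(L)) 2(mygibbs ) ) ///
burn(5000) update(400000) thin(40) replace ///
data(X=(y_1 y_2 y_3 y_4 cLB cTrt cBXT clnAge u e1 e2 e3 e4) theta L) mata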
The other option is to replace the Metropolis-Hastings algorithm with a better sampler. Generalized linear models with random effects are so widely used that the question of improving MCMC algorithms for this type of model has received a lot of attention in the literature. Next time I will consider a data augmentation algorithm that I believe is similar to the method used in the JAGS glm module. As I write, I have not actually tried it yet, but I expect it to do quite well.
The data and code referred to in this posting can be downloaded from my home page at https://www2.le.ac.uk/Members/trj
Hi John,
How do you set up the data file to be read by Stata? I am trying to reproduce Case study 8: Growth of sea cows from your book Bayesian Analysis with Stata. What I did was to copy and paste the data that comes with WinBUGS (Dugongs: nonlinear growth curve) into the Stata do-file editor and save it to a txt file (data.txt). I created a separate file for the model and inits and I put everything in the same folder to be read by Stata. Then I prepared a script file with the model.txt, the data.txt and inits.txt as in your book. However, when I run the script file it doesn't work. The data file can't be read. I suspect that the data file is not well set up. I have tried different ways without success. For example:
First option
list(x = c( 1.0, 1.5, 1.5, 1.5, 2.5, 4.0, 5.0, 5.0, 7.0,
8.0, 8.5, 9.0, 9.5, 9.5, 10.0, 12.0, 12.0, 13.0,
13.0, 14.5, 15.5, 15.5, 16.5, 17.0, 22.5, 29.0, 31.5),
Y = c(1.80, 1.85, 1.87, 1.77, 2.02, 2.27, 2.15, 2.26, 2.47,
2.19, 2.26, 2.40, 2.39, 2.41, 2.50, 2.32, 2.32, 2.43,
2.47, 2.56, 2.65, 2.47, 2.64, 2.56, 2.70, 2.72, 2.57), N = 27)
No success
Second option
x y
1 1.80
1.5 1.85
…..
31.5 2.57
No success either.
Can you help please?
Juan Suarez
Biochemistry Student
Concordia University
Montreal. Canada
We solved this problem via an exchange of emails – I’m happy to help when I can
John
Thank you very much!
Juan Reynel