In my last posting I introduced a program for comparing different methods of convergence assessment in multi-dimensional MCMC analyses. Essentially the program samples from an imaginary posterior that takes the form of a user specified mixture of multivariate normal distributions. Previously, I illustrated some of the problems of assessing convergence by looking at a simulated chain from a single three dimensional normal distribution. This time I would like to assess convergence when the posterior takes the form of a mixture of two normal distributions.
To start with, I simulated an artificial chain from a mixture of a wide-ranging normal distribution which represents 80% of the posterior and a much tighter normal distribution that represents the other 20%. The chain had high autocorrelation (0.95) so I chose to simulate 100,000 updates and then thinned by 5 to leave 20,000. Here is a plot of the chain with the first 100 points shown in red and the two components of the mixture shown in blue and green. Remember that although the simulation is created from two separate components the posterior is a single bimodal distribution.
The chain finds both components but spends about 35% of its time in the smaller component rather than the desired 20%, so the posterior is not well represented. Because of the high autocorrelation the chain tends to stay in whichever part of the posterior that it finds itself. We can see this clearly from the trace plot that shows the poor mixing between regions. Once in the tighter mode the chain tends not to move to the outer regions of lower probability and once outside the other mode the chain can move far enough away that it does not find the first mode again for a while.
The section plots better demonstrate that the proportions of the chain in the two regions have not yet converged. I deliberately placed the second component so that its mode is slightly offset in the direction of theta1-theta2 and so I have included the section plot for the derived parameter dtheta12=theta1-theta2. In this plot the bimodel nature of the distribution is more evident and illustrates how, in multi-dimensional models, we can miss important features if we only look at simple marginal posterior distributions.
Here are the section plots for a new chain identical in every respect to the first except that it is produced with an autocorrelation of 0.50 instead of 0.95. The lower autocorrelation makes it more likely that the algorithm will switch between the main mode and the rest of the posterior and the new chain represents this particular non-normal posterior reasonably accurately. If we were unable to design a sampler with a lower autocorrelation we would have had to adopt the brute force solution of running a much longer chain.
The trace plots alert you to poor mixing but will give no indication whether the chain has been run for long enough; section plots are much more useful although even these can be misleading if important features lie in directions that do not correspond to the current parameterization.
Perhaps the hardest problems arise in models with scores or even hundreds of parameters of which just a few exhibit this type of bimodal shape in the posterior. It is very demanding to insist that the marginal plots of every parameter are inspected to pick up the few that have a problem. As Bayesian models get more complex there is a serious need for programs that automatically search through the MCMC results and only print out the plots for those parameters that have a possible problem. For instance it would be easy to write a program that calculates the D statistic for each of 100 parameters but which only plots the section plots for, say, the worst 10 or those with D below some specified level. Such a search becomes even more demanding if want to ensure that there are no important features hidden from us by the current choice of parameterization.
This will be my last posting for a few weeks as I will be working abroad. I will post again in the second half of July.