Monitoring Convergence in High Dimensions continued
Bayes with Stata blog, 20 June 2014

In my last posting I introduced a program for comparing different methods of convergence assessment in multi-dimensional MCMC analyses. Essentially the program samples from an imaginary posterior that takes the form of a user-specified mixture of multivariate normal distributions. Previously, I illustrated some of the problems of assessing convergence by looking at a simulated chain from a single three-dimensional normal distribution. This time I would like to assess convergence when the posterior takes the form of a mixture of two normal distributions.

To start with, I simulated an artificial chain from a mixture of a wide-ranging normal distribution that represents 80% of the posterior and a much tighter normal distribution that represents the other 20%. The chain had high autocorrelation (0.95), so I chose to simulate 100,000 updates and then thinned by 5 to leave 20,000. In the plot below, the first 100 points are shown in red and the two components of the mixture in blue and green.
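The program from the last posting builds such a chain directly with a specified autocorrelation. As a rough stand-in, sketched in Python rather than Stata, here is a minimal random-walk Metropolis sampler targeting an 80/20 mixture of two bivariate normals; the component means, spreads and step size are invented for illustration, and small proposal steps produce the same kind of highly autocorrelated chain:

```python
import numpy as np

# Hypothetical stand-in for the artificial posterior: an 80/20 mixture of a
# wide and a tight isotropic bivariate normal (parameters invented).
MU = [np.array([0.0, 0.0]), np.array([4.0, 2.0])]   # component means
SD = [2.0, 0.3]                                      # component sds
W  = [0.8, 0.2]                                      # mixture weights

def log_post(theta):
    """Log density of the mixture posterior (up to a constant)."""
    dens = 0.0
    for w, mu, sd in zip(W, MU, SD):
        d2 = np.sum((theta - mu) ** 2)
        dens += w * np.exp(-0.5 * d2 / sd ** 2) / sd ** 2
    return np.log(dens)

def rw_metropolis(n, step, theta0, rng):
    """Random-walk Metropolis; small steps give high autocorrelation."""
    chain = np.empty((n, 2))
    theta, lp = np.array(theta0, float), log_post(theta0)
    for i in range(n):
        prop = theta + step * rng.standard_normal(2)
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = prop, lp_prop
        chain[i] = theta
    return chain

rng = np.random.default_rng(1)
# 100,000 updates thinned by 5, as in the text
chain = rw_metropolis(100_000, step=0.4, theta0=[0.0, 0.0], rng=rng)[::5]
print(chain.shape)   # (20000, 2)
```

Shrinking the step size raises the acceptance rate but also the autocorrelation, which is exactly the trade-off explored below.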
Remember that although the simulation is created from two separate components, the posterior is a single bimodal distribution.

[Figure convergePart2_1: https://staffblogs.le.ac.uk/bayeswithstata/files/2014/06/convergePart2_1.png]

The chain finds both components but spends about 35% of its time in the smaller component rather than the desired 20%, so the posterior is not well represented. Because of the high autocorrelation the chain tends to stay in whichever part of the posterior it finds itself. The trace plot shows this poor mixing between regions clearly.
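Because the components of this artificial posterior are known, the time spent in each region can be measured by assigning every draw to the component with the larger posterior responsibility. A minimal sketch, with a 70/30 toy data set at the end purely as a self-check (not the blog's chain):

```python
import numpy as np

def mode_fractions(chain, mus, sds, weights):
    """Fraction of draws hard-assigned to each mixture component by its
    posterior responsibility (component parameters are known here because
    the posterior is artificial)."""
    chain = np.asarray(chain, float)
    dens = np.array([w * np.exp(-0.5 * np.sum((chain - mu) ** 2, axis=1) / sd ** 2) / sd ** 2
                     for w, mu, sd in zip(weights, mus, sds)])
    labels = dens.argmax(axis=0)           # component with largest responsibility
    return np.bincount(labels, minlength=len(weights)) / len(chain)

# self-check on toy data drawn 70/30 from the two components
rng = np.random.default_rng(0)
draws = np.vstack([rng.normal([0, 0], 2.0, size=(700, 2)),
                   rng.normal([4, 2], 0.3, size=(300, 2))])
frac = mode_fractions(draws, [np.array([0.0, 0.0]), np.array([4.0, 2.0])],
                      [2.0, 0.3], [0.8, 0.2])
print(frac)   # close to [0.7, 0.3]
```

Comparing these occupancy fractions with the mixture weights gives a direct numerical check of the 35% versus 20% discrepancy described above.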
Once in the tighter mode the chain tends not to move out into the surrounding regions of lower probability, and once in the other, wider mode it can move far enough away that it does not find the tighter mode again for a while.

[Figure convergePart2_2: https://staffblogs.le.ac.uk/bayeswithstata/files/2014/06/convergePart2_2.png]

The section plots demonstrate more clearly that the proportions of the chain in the two regions have not yet converged. I deliberately placed the second component so that its mode is slightly offset in the direction of theta1-theta2, so I have also included the section plot for the derived parameter dtheta12 = theta1 - theta2.
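The derived parameter is just a difference of two stored columns of the chain, and a crude numerical counterpart of a section plot is to compare summaries of consecutive sections; if the chain has converged, the rows should agree to within Monte Carlo error. A sketch with placeholder data standing in for a real chain:

```python
import numpy as np

def section_summary(x, sections=5):
    """Mean and 5%/95% quantiles of consecutive equal sections of a chain;
    stable rows are a crude numerical stand-in for overlaid section plots."""
    parts = np.array_split(np.asarray(x, float), sections)
    return np.array([[p.mean(), np.quantile(p, 0.05), np.quantile(p, 0.95)]
                     for p in parts])

# the derived parameter is just a difference of two stored columns
rng = np.random.default_rng(2)
chain = rng.standard_normal((20_000, 3))   # placeholder for a real MCMC chain
dtheta12 = chain[:, 0] - chain[:, 1]
summ = section_summary(dtheta12, sections=4)
print(summ)
```

For the chain in this posting, disagreement between sections of dtheta12 is precisely what reveals the unconverged mode proportions.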
In the section plot for dtheta12 the bimodal nature of the distribution is more evident; this illustrates how, in multi-dimensional models, we can miss important features if we only look at simple marginal posterior distributions.

[Figure convergePart2_3: https://staffblogs.le.ac.uk/bayeswithstata/files/2014/06/convergePart2_31.png]

Here are the section plots for a new chain, identical in every respect to the first except that it was produced with an autocorrelation of 0.50 instead of 0.95.
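The practical difference between the two autocorrelations can be quantified: under an AR(1) approximation, a chain of length n with lag-1 autocorrelation rho carries roughly n(1 - rho)/(1 + rho) effective draws, about 510 per 20,000 at rho = 0.95 against about 6,700 at rho = 0.50. A sketch using a simple AR(1) chain for illustration:

```python
import numpy as np

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation."""
    x = np.asarray(x, float) - np.mean(x)
    return float(np.dot(x[:-1], x[1:]) / np.dot(x, x))

def ess_ar1(n, rho):
    """Effective sample size under an AR(1) approximation."""
    return n * (1 - rho) / (1 + rho)

def ar1(n, rho, rng):
    """AR(1) chain with standard normal marginals."""
    x = np.empty(n)
    x[0] = rng.standard_normal()
    for i in range(1, n):
        x[i] = rho * x[i - 1] + np.sqrt(1 - rho ** 2) * rng.standard_normal()
    return x

rng = np.random.default_rng(3)
results = {}
for rho in (0.95, 0.50):
    x = ar1(20_000, rho, rng)
    r = lag1_autocorr(x)
    results[rho] = (r, ess_ar1(len(x), r))
    print(f"rho={rho}: lag-1 acf {r:.2f}, effective draws ~{results[rho][1]:.0f}")
```

A mixture posterior mixes more slowly than a plain AR(1) chain, so these figures understate the problem, but the order-of-magnitude gap in effective draws is the point.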
The lower autocorrelation makes it more likely that the algorithm will switch between the main mode and the rest of the posterior, and the new chain represents this particular non-normal posterior reasonably accurately. Had we been unable to design a sampler with a lower autocorrelation, we would have had to adopt the brute-force solution of running a much longer chain.

[Figure convergePart2_4: https://staffblogs.le.ac.uk/bayeswithstata/files/2014/06/convergePart2_41.png]

Trace plots alert you to poor mixing but give no indication of whether the chain has been run for long enough; section plots are much more useful, although even these can be misleading if important features lie in directions that do not correspond to the current parameterization.

Perhaps the hardest problems arise in models with scores or even hundreds of parameters, of which just a few exhibit this type of bimodal posterior. It is very demanding to insist that the marginal plots of every parameter are inspected in order to pick up the few that have a problem. As Bayesian models get more complex, there is a serious need for programs that automatically search through the MCMC results and only print the plots for those parameters that show a possible problem.
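A screen of this sort is easy to sketch. Here a two-sample Kolmogorov-Smirnov distance between the first and last thirds of each parameter's chain stands in for the D statistic; the blog's D may be defined differently (and oriented differently), so treat the ranking rule as an assumption:

```python
import numpy as np

def ks_distance(a, b):
    """Two-sample Kolmogorov-Smirnov distance between empirical CDFs."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return float(np.abs(cdf_a - cdf_b).max())

def screen_parameters(chains, worst=10):
    """Rank parameters by the KS distance between the first and last thirds
    of each chain; large distances flag parameters worth plotting."""
    n, p = chains.shape
    third = n // 3
    D = np.array([ks_distance(chains[:third, j], chains[-third:, j])
                  for j in range(p)])
    return np.argsort(D)[::-1][:worst], D   # here, large D = most suspect

# 100 parameters, one of which (index 17) has a drifting mean
rng = np.random.default_rng(4)
chains = rng.standard_normal((3_000, 100))
chains[:, 17] += np.linspace(0, 3, 3_000)
suspects, D = screen_parameters(chains, worst=5)
print(suspects)   # index 17 should head the list
```

Only the handful of flagged parameters would then get section plots, sparing the analyst 100 manual inspections.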
For instance, it would be easy to write a program that calculates the D statistic for each of 100 parameters but only plots the section plots for, say, the worst 10, or for those with D below some specified level. Such a search becomes even more demanding if we want to ensure that there are no important features hidden from us by the current choice of parameterization.

This will be my last posting for a few weeks as I will be working abroad. I will post again in the second half of July.