{"id":254,"date":"2014-08-22T10:39:38","date_gmt":"2014-08-22T10:39:38","guid":{"rendered":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/?p=254"},"modified":"2025-02-26T13:21:38","modified_gmt":"2025-02-26T13:21:38","slug":"gibbs-sampling-with-a-dirichlet-process","status":"publish","type":"post","link":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/2014\/08\/22\/gibbs-sampling-with-a-dirichlet-process\/","title":{"rendered":"Gibbs Sampling with a Dirichlet Process"},"content":{"rendered":"<p>This is the third in a series of postings on the use of Dirichlet processes for non-parametric Bayesian analysis and their implementation in Stata.<\/p>\n<p>In this posting I will create a Stata program that fits a Bayesian model that incorporates a Dirichlet process prior. As my example I will model the peak expiratory flow measurements that were first used to illustrate agreement study analysis in an article from 1986 published in the <em>Lancet<\/em>. These data were re-analysed by Sophia Rabe-Hesketh and Anders Skrondal in their book \u2018<em>Multilevel and longitudinal modeling using Stata<\/em>\u2019. The data can be downloaded\u00a0to Stata using<br \/>\n<span style=\"color: #0000ff\"> . use http:\/\/www.stata-press.com\/data\/mlmus\/pefr, clear<\/span><br \/>\nand the values are:<\/p>\n<pre style=\"padding-left: 150px\"><span style=\"color: #0000ff\">  wp1 wp2 wm1 wm2\r\n  494 490 512 525\r\n  395 397 430 415\r\n  516 512 520 508\r\n  434 401 428 444\r\n  476 470 500 500\r\n  557 611 600 625\r\n  413 415 364 460\r\n  442 431 380 390\r\n  650 638 658 642\r\n  433 429 445 432\r\n  417 420 432 420\r\n  656 633 626 605\r\n  267 275 260 227\r\n  478 492 477 467\r\n  178 165 259 268\r\n  423 372 350 370\r\n  427 421 451 443<\/span><\/pre>\n<p>What we have here are repeated measurements of peak expiratory flow (how fast you can blow air from your lungs in Litres\/min) on 17 subjects. 
The measurements were made twice using the Wright flow meter (wp1 and wp2) and then twice using a simpler device called a mini Wright meter (wm1 and wm2). The objective was to measure the agreement between the two devices.<\/p>\n<p>This problem has a hierarchical structure in that the subjects can be viewed as being sampled from a population with some distribution of peak flow. Then there is a random measurement error term associated with each reading that also has a distribution. The usual analysis assumes that both of these distributions are normal but we will create a model that keeps the normal distribution for the measurement errors but uses a Dirichlet process to give greater flexibility to the distribution across subjects.<\/p>\n<p>We can write the model for subject i=1\u202617 with method j=1,2 on occasion k=1,2 as<br \/>\n<span style=\"color: #0000ff\">Yijk = \u03b1i + \u03b2 Xijk + \u03b5ijk<\/span><br \/>\nwhere Xi1k = 0 and Xi2k = 1 to denote which type of meter was used. \u03b5i1k ~ N(0,\u03c41) and \u03b5i2k ~ N(0,\u03c42), where \u03c41 and \u03c42 are precisions, since the measurement errors might differ between meters. Finally the subject effects, \u03b1i ~ f(), where f is the distribution across subjects.<\/p>\n<p>Now we need some priors. Peak flow varies a lot with age, height and gender with values ranging between 300 and 700 L\/min. If a typical value is 500 L\/min then we would not expect the meters to differ by more than, say, 10%, so \u03b2 will be below 50, hopefully a lot below 50. 
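As a quick sanity check on the model specification, here is a minimal simulation sketch. It is in Python purely for illustration (the analysis itself is in Stata below), and the subject effect, meter offset and error SDs are made-up values, not estimates from these data:

```python
import random

random.seed(1)

# Illustrative values only (not estimates): a meter offset beta and
# different error SDs for the two meters.
beta = 5.0                 # systematic offset of the mini Wright meter
sd = {1: 20.0, 2: 28.0}    # error SD by meter j: 1 = Wright, 2 = mini Wright

def simulate_subject(alpha_i):
    """Four readings for one subject: Yijk = alpha_i + beta*X + error,
    with X = 0 for the Wright meter (j=1) and X = 1 for the mini (j=2)."""
    return {(j, k): alpha_i + beta * (j == 2) + random.gauss(0.0, sd[j])
            for j in (1, 2) for k in (1, 2)}

readings = simulate_subject(450.0)   # a subject with true peak flow 450 L/min
```

Simulating a few subjects this way makes it easy to see that the two meters share a subject effect but differ by a constant offset plus their own measurement noise.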
Measurement error would perhaps be 5%-10% and might be higher with the mini meter.<\/p>\n<p>In the end I decided on the following priors:<br \/>\nf will be drawn from a Dirichlet process <span style=\"color: #0000ff\">DP(M,F)<\/span> with M = 100 and F=N(500,sd=100).<br \/>\n<span style=\"color: #0000ff\">\u03b2 ~ N(0,sd=25)<\/span><br \/>\n<span style=\"color: #0000ff\"> \u03c41 ~ G(3,0.001), \u03c42 ~ G(2,0.001)<\/span><br \/>\nThe program given below could easily be modified to accommodate your own priors should you not like mine.<\/p>\n<p>So we want an MCMC algorithm that samples from the posterior<br \/>\n<span style=\"color: #0000ff\">p(\u03b2, \u03c41, \u03c42, \u03b1, f | Y )\u00a0 \u221d\u00a0\u00a0 \u00a0p(Y | \u03b2, \u03c41, \u03c42, \u03b1, f ) p(\u03b2) p(\u03c41) p(\u03c42) p(\u03b1 | f) p(f)<\/span><\/p>\n<p>The first term on the right-hand side is the likelihood\u00a0of the data, Y, and it does not depend on f() when we know \u03b1i, so the posterior simplifies to,<br \/>\n<span style=\"color: #0000ff\">p(\u03b2, \u03c41, \u03c42, \u03b1, f | Y )\u00a0 \u221d\u00a0\u00a0 <\/span><span style=\"color: #0000ff\">p(Y | \u03b2, \u03c41, \u03c42, \u03b1 ) p(\u03b2) p(\u03c41) p(\u03c42) p(\u03b1 | f) p(f)<\/span><\/p>\n<p>If we\u00a0integrate over this unknown distribution f() we get,<br \/>\n<span style=\"color: #0000ff\">p(\u03b2, \u03c41, \u03c42, \u03b1 | Y )\u00a0 \u221d\u00a0\u00a0\u00a0 p(Y | \u03b2, \u03c41, \u03c42, \u03b1 ) p(\u03b2) p(\u03c41) p(\u03c42) \u222b p(\u03b1 | f) p(f) df<\/span><\/p>\n<p>Although the integral is complex we do have the Chinese restaurant algorithm that tells us how to sample successive \u03b1 from \u222b p(\u03b1 | f) p(f) df.<\/p>\n<p><em><strong>A Gibbs Sampler<\/strong><\/em><\/p>\n<p>The likelihood of the data, p(Y | \u03b2, \u03c41, \u03c42, \u03b1 ), has a normal structure created by the measurement error so my chosen priors for \u03b2 and the \u03c4\u2019s are conjugate and updating is relatively straightforward (see chapter 4 of <em>Bayesian Analysis with Stata<\/em>).<\/p>\n<p>p(f) is a Dirichlet process and as we have already noted we can sample \u03b1 from \u222b p(\u03b1 | f) p(f) df using the Chinese restaurant algorithm. However, now we have an extra problem because there is information about \u03b1 in the likelihood as well as in this prior.<\/p>\n<p>The Chinese restaurant algorithm says that we draw \u03b1i conditional on all other parameters (including the other \u03b1\u2019s) from a mixture distribution that we can write as<br \/>\n<span style=\"color: #0000ff\">\u03a3j\u2260i \u03b4(\u03b1j)\/(M+16) + M F(\u03b1)\/(M+16)<\/span><br \/>\nThis is to be read as meaning that we take either one of the other 16 existing \u03b1\u2019s, each with probability 1\/(M+16), or we make a fresh draw from F with probability M\/(M+16).<\/p>\n<p>If we denote the part of the likelihood that contains \u03b1i as L, then we must draw one of the\u00a0existing \u03b1\u2019s, say \u03b1j, with probability proportional to L(\u03b1j)\/(M+16) and a fresh value with probability proportional to M \u222b L(\u03b1)F(\u03b1) d\u03b1 \/(M+16). 
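The logic of this mixed draw can be sketched in a few lines of code. This is a Python illustration with a simplified single-precision likelihood, not the Stata program given below, and all the names are mine:

```python
import math, random

def draw_alpha_i(y, t, others, M, A, T, rng=random):
    """One Chinese-restaurant draw of alpha_i.

    y      : this subject's readings, each modelled as N(alpha_i, 1/t)
             (one error precision t, simpler than the two-meter model)
    others : the current alpha values of the other subjects
    M, A, T: concentration M and base distribution F = N(A, 1/T)
    """
    n = len(y)
    # Conjugate combination of the base F with the likelihood of y
    C2 = T + n * t
    C1 = T * A + t * sum(y)
    C0 = T * A * A + t * sum(v * v for v in y)
    # Weight for a fresh draw: M * integral of L(alpha)F(alpha) d alpha,
    # up to constants that cancel with the existing-value weights below
    q = [M * math.sqrt(T / C2) * math.exp(-0.5 * (C0 - C1 * C1 / C2))]
    # Weight for re-using an existing alpha_j: its likelihood L(alpha_j)
    for a in others:
        q.append(math.exp(-0.5 * t * sum((v - a) ** 2 for v in y)))
    u = rng.random() * sum(q)          # inverse-CDF draw over the weights
    if u < q[0]:                       # fresh draw from the conjugate posterior
        return rng.gauss(C1 / C2, 1.0 / math.sqrt(C2))
    cum = q[0]
    for a, qj in zip(others, q[1:]):
        cum += qj
        if u < cum:
            return a
    return others[-1]                  # guard against float rounding
```

With M = 0 the fresh-draw weight vanishes and the draw always re-uses an existing value, which is a handy check on the logic.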
The constant of proportionality can be found by using the fact that the 17 probabilities must sum to one.<\/p>\n<p>Generally the need to sample new values from L(\u03b1)F(\u03b1) will severely limit the usefulness of this algorithm but here we have a likelihood based on a normal distribution and a normal distribution for F; it turns out that the algebra is tractable.<\/p>\n<p>The algebra is a bit painful and quite difficult to write out in a blog that does not make it easy to create equations, so I have written the algebra in full in a downloadable pdf file <a href=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2014\/08\/wright.pdf\">wright<\/a>.<\/p>\n<p>Here is a Stata program that fits the model using the algorithm described in the pdf<\/p>\n<pre><span style=\"color: #0000ff\">use http:\/\/www.stata-press.com\/data\/mlmus\/pefr, clear<\/span>\r\n<span style=\"color: #0000ff\"> *------------------------<\/span>\r\n<span style=\"color: #0000ff\"> * PRIORS<\/span>\r\n<span style=\"color: #0000ff\"> *------------------------<\/span>\r\n<span style=\"color: #0000ff\"> local a1 = 3<\/span>\r\n<span style=\"color: #0000ff\"> local b1 = 0.001<\/span>\r\n<span style=\"color: #0000ff\"> local a2 = 2<\/span>\r\n<span style=\"color: #0000ff\"> local b2 = 0.001<\/span>\r\n<span style=\"color: #0000ff\"> local m = 0<\/span>\r\n<span style=\"color: #0000ff\"> local t = 1\/625<\/span>\r\n<span style=\"color: #0000ff\"> local A = 500<\/span>\r\n<span style=\"color: #0000ff\"> local T = 0.0001<\/span>\r\n<span style=\"color: #0000ff\"> local M = 100<\/span>\r\n<span style=\"color: #0000ff\"> *------------------------<\/span>\r\n<span style=\"color: #0000ff\"> * Initial values<\/span>\r\n<span style=\"color: #0000ff\"> *------------------------<\/span>\r\n<span style=\"color: #0000ff\"> gen alpha = (wp1+wp2)\/2<\/span>\r\n<span style=\"color: #0000ff\"> gen tmp = wp1 - wp2<\/span>\r\n<span style=\"color: #0000ff\"> qui su tmp<\/span>\r\n<span style=\"color: #0000ff\"> 
local t1 = 2\/r(Var)<\/span>\r\n<span style=\"color: #0000ff\"> qui replace tmp = wm1 - wm2<\/span>\r\n<span style=\"color: #0000ff\"> qui su tmp<\/span>\r\n<span style=\"color: #0000ff\"> local t2 = 2\/r(Var)<\/span>\r\n<span style=\"color: #0000ff\"> qui replace tmp = (wm1+wm2)\/2 - alpha<\/span>\r\n<span style=\"color: #0000ff\"> qui su tmp<\/span>\r\n<span style=\"color: #0000ff\"> local b = r(mean)<\/span>\r\n<span style=\"color: #0000ff\"> *------------------------<\/span>\r\n<span style=\"color: #0000ff\"> * Updates<\/span>\r\n<span style=\"color: #0000ff\"> *------------------------<\/span>\r\n<span style=\"color: #0000ff\"> tempname pf<\/span>\r\n<span style=\"color: #0000ff\"> postfile `pf' beta tau1 tau2 alpha1 alpha2 alpha3 alpha4 alpha5 alpha6 alpha7 alpha8 alpha9 alpha10 alpha11 alpha12 \/\/\/<\/span>\r\n<span style=\"color: #0000ff\"> alpha13 alpha14 alpha15 alpha16 alpha17 using temp.dta, replace<\/span>\r\n<span style=\"color: #0000ff\"> forvalues iter=1\/10500 {<\/span>\r\n<span style=\"color: #0000ff\"> *------------------------<\/span>\r\n<span style=\"color: #0000ff\"> * beta (b)<\/span>\r\n<span style=\"color: #0000ff\"> *------------------------<\/span>\r\n<span style=\"color: #0000ff\"> local tb = 34*`t2'+`t'<\/span>\r\n<span style=\"color: #0000ff\"> qui replace tmp = (wm1-alpha) + (wm2-alpha)<\/span>\r\n<span style=\"color: #0000ff\"> qui su tmp<\/span>\r\n<span style=\"color: #0000ff\"> local mb = (`t2'*r(sum)+`t'*`m')\/`tb'<\/span>\r\n<span style=\"color: #0000ff\"> local b = rnormal(`mb',1\/sqrt(`tb'))<\/span>\r\n<span style=\"color: #0000ff\"> *------------------------<\/span>\r\n<span style=\"color: #0000ff\"> * tau1 (t1)<\/span>\r\n<span style=\"color: #0000ff\"> *------------------------<\/span>\r\n<span style=\"color: #0000ff\"> qui replace tmp = (wp1-alpha)*(wp1-alpha)+(wp2-alpha)*(wp2-alpha)<\/span>\r\n<span style=\"color: #0000ff\"> qui su tmp<\/span>\r\n<span style=\"color: #0000ff\"> local at1 = `a1'+17<\/span>\r\n<span 
style=\"color: #0000ff\"> local bt1 = 1\/(0.5*r(sum)+1\/`b1')<\/span>\r\n<span style=\"color: #0000ff\"> local t1 = rgamma(`at1',`bt1')<\/span>\r\n<span style=\"color: #0000ff\"> *------------------------<\/span>\r\n<span style=\"color: #0000ff\"> * tau2 (t2)<\/span>\r\n<span style=\"color: #0000ff\"> *------------------------<\/span>\r\n<span style=\"color: #0000ff\"> qui replace tmp = (wm1-alpha-`b')*(wm1-alpha-`b')+(wm2-alpha-`b')*(wm2-alpha-`b')<\/span>\r\n<span style=\"color: #0000ff\"> qui su tmp<\/span>\r\n<span style=\"color: #0000ff\"> local at2 = `a2'+17<\/span>\r\n<span style=\"color: #0000ff\"> local bt2 = 1\/(0.5*r(sum)+1\/`b2')<\/span>\r\n<span style=\"color: #0000ff\"> local t2 = rgamma(`at2',`bt2')<\/span>\r\n<span style=\"color: #0000ff\"> *------------------------<\/span>\r\n<span style=\"color: #0000ff\"> * alpha<\/span>\r\n<span style=\"color: #0000ff\"> *------------------------<\/span>\r\n<span style=\"color: #0000ff\"> forvalues i=1\/17 {<\/span>\r\n<span style=\"color: #0000ff\"> local C2 = `T' +2*`t1'+2*`t2'<\/span>\r\n<span style=\"color: #0000ff\"> local C1 = `T'*`A' + `t1'*(wp1[`i']+wp2[`i']) + \/\/\/<\/span>\r\n<span style=\"color: #0000ff\"> `t2'*((wm1[`i']-`b')+(wm2[`i']-`b'))<\/span>\r\n<span style=\"color: #0000ff\"> local C0 = `T'*`A'*`A' + `t1'*(wp1[`i']*wp1[`i']+wp2[`i']*wp2[`i']) \/\/\/<\/span>\r\n<span style=\"color: #0000ff\"> + `t2'*((wm1[`i']-`b')*(wm1[`i']-`b')+(wm2[`i']-`b')*(wm2[`i']-`b'))<\/span>\r\n<span style=\"color: #0000ff\"> local q0 = `M'*sqrt(`T'\/`C2')*exp(-0.5*(`C0'-`C1'*`C1'\/`C2'))<\/span>\r\n<span style=\"color: #0000ff\"> local sq = `q0'<\/span>\r\n<span style=\"color: #0000ff\"> forvalues j=1\/17 {<\/span>\r\n<span style=\"color: #0000ff\"> if `j' == `i' local q`j' = 0<\/span>\r\n<span style=\"color: #0000ff\"> else local q`j' = exp(-0.5*(`t1'*(wp1[`i']-alpha[`j'])* \/\/\/<\/span>\r\n<span style=\"color: #0000ff\"> (wp1[`i']-alpha[`j']) + `t1'*(wp2[`i']-alpha[`j'])*(wp2[`i']-alpha[`j']) 
\/\/\/<\/span>\r\n<span style=\"color: #0000ff\"> +`t2'*(wm1[`i']-alpha[`j']-`b')*(wm1[`i']-alpha[`j']-`b') + \/\/\/<\/span>\r\n<span style=\"color: #0000ff\"> `t2'*(wm2[`i']-alpha[`j']-`b')*(wm2[`i']-alpha[`j']-`b') ))<\/span>\r\n<span style=\"color: #0000ff\"> local sq = `sq' + `q`j''<\/span>\r\n<span style=\"color: #0000ff\"> }<\/span>\r\n<span style=\"color: #0000ff\"> forvalues j=0\/17 {<\/span>\r\n<span style=\"color: #0000ff\"> local q`j' = `q`j''\/`sq'<\/span>\r\n<span style=\"color: #0000ff\"> }<\/span>\r\n<span style=\"color: #0000ff\"> local j = 0<\/span>\r\n<span style=\"color: #0000ff\"> local sq = `q0'<\/span>\r\n<span style=\"color: #0000ff\"> local u = runiform()<\/span>\r\n<span style=\"color: #0000ff\"> while `u' &gt; `sq' {<\/span>\r\n<span style=\"color: #0000ff\"> local ++j<\/span>\r\n<span style=\"color: #0000ff\"> local sq = `sq' + `q`j''<\/span>\r\n<span style=\"color: #0000ff\"> }<\/span>\r\n<span style=\"color: #0000ff\"> if `j' == 0 qui replace alpha = rnormal(`C1'\/`C2',1\/sqrt(`C2')) in `i'<\/span>\r\n<span style=\"color: #0000ff\"> else qui replace alpha = alpha[`j'] in `i'<\/span>\r\n<span style=\"color: #0000ff\"> }<\/span>\r\n<span style=\"color: #0000ff\"> local plist \"(`b') (`t1') (`t2')\"<\/span>\r\n<span style=\"color: #0000ff\"> forvalues i=1\/17 {<\/span>\r\n<span style=\"color: #0000ff\"> local a = alpha[`i']<\/span>\r\n<span style=\"color: #0000ff\"> local plist \"`plist' (`a')\"<\/span>\r\n<span style=\"color: #0000ff\"> }<\/span>\r\n<span style=\"color: #0000ff\"> post `pf' `plist'<\/span>\r\n<span style=\"color: #0000ff\"> }<\/span>\r\n<span style=\"color: #0000ff\"> postclose `pf'<\/span>\r\n<span style=\"color: #0000ff\"> use temp.dta, clear<\/span>\r\n<span style=\"color: #0000ff\"> * discard the burn-in<\/span>\r\n<span style=\"color: #0000ff\"> qui drop in 1\/500<\/span>\r\n<span style=\"color: #0000ff\"> gen sd1 = 1\/sqrt(tau1)<\/span>\r\n<span style=\"color: #0000ff\"> gen sd2 = 1\/sqrt(tau2)<\/span><\/pre>\n<p>I ran this program to create 10,500 updates and then discarded the first 500, although this length of burn-in was rather more than 
I needed as my starting values are quite good. Here are the results<\/p>\n<p><span style=\"color: #0000ff\">. mcmcstats *<\/span><\/p>\n<pre><span style=\"color: #0000ff\">----------------------------------------------------------------------------<\/span>\r\n<span style=\"color: #0000ff\">Parameter\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 n\u00a0\u00a0\u00a0\u00a0 mean\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 sd\u00a0\u00a0\u00a0\u00a0\u00a0 sem\u00a0\u00a0 median\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 95% CrI<\/span>\r\n<span style=\"color: #0000ff\">----------------------------------------------------------------------------<\/span>\r\n<span style=\"color: #0000ff\">\u00a0beta\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 10000\u00a0\u00a0\u00a0 4.759\u00a0\u00a0\u00a0 5.766\u00a0\u00a0 0.0806\u00a0\u00a0\u00a0 4.796 (\u00a0 -6.585,\u00a0 16.060 )<\/span>\r\n<span style=\"color: #0000ff\">\u00a0tau1\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 10000\u00a0\u00a0\u00a0 0.003\u00a0\u00a0\u00a0 0.001\u00a0\u00a0 0.0000\u00a0\u00a0\u00a0 0.003 (\u00a0\u00a0 0.001,\u00a0\u00a0 0.005 )<\/span>\r\n<span style=\"color: #0000ff\">\u00a0tau2\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 10000\u00a0\u00a0\u00a0 0.001\u00a0\u00a0\u00a0 0.001\u00a0\u00a0 0.0000\u00a0\u00a0\u00a0 0.001 (\u00a0\u00a0 0.001,\u00a0\u00a0 0.003 )<\/span>\r\n<span style=\"color: #0000ff\">\u00a0alpha1\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 10000\u00a0 499.418\u00a0\u00a0 11.764\u00a0\u00a0 0.1309\u00a0 499.185 ( 477.360, 523.361 )<\/span>\r\n<span style=\"color: #0000ff\">\u00a0alpha2\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 10000\u00a0 405.752\u00a0\u00a0 11.634\u00a0\u00a0 0.1332\u00a0 405.933 ( 383.264, 428.662 )<\/span>\r\n<span style=\"color: #0000ff\">\u00a0alpha3\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 10000\u00a0 511.760\u00a0\u00a0 11.172\u00a0\u00a0 0.1151\u00a0 511.822 ( 489.389, 534.095 )<\/span>\r\n<span style=\"color: #0000ff\">\u00a0alpha4\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 
10000\u00a0 422.584\u00a0\u00a0 10.841\u00a0\u00a0 0.1178\u00a0 422.445 ( 401.530, 444.636 )<\/span>\r\n<span style=\"color: #0000ff\">\u00a0alpha5\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 10000\u00a0 481.346\u00a0\u00a0 11.436\u00a0\u00a0 0.1284\u00a0 481.084 ( 459.092, 504.483 )<\/span>\r\n<span style=\"color: #0000ff\">\u00a0alpha6\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 10000\u00a0 591.071\u00a0\u00a0 11.740\u00a0\u00a0 0.1301\u00a0 590.951 ( 568.720, 615.034 )<\/span>\r\n<span style=\"color: #0000ff\">\u00a0alpha7\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 10000\u00a0 413.595\u00a0\u00a0 10.816\u00a0\u00a0 0.1153\u00a0 413.818 ( 391.458, 434.218 )<\/span>\r\n<span style=\"color: #0000ff\">\u00a0alpha8\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 10000\u00a0 418.100\u00a0\u00a0 12.824\u00a0\u00a0 0.1795\u00a0 419.044 ( 390.077, 441.458 )<\/span>\r\n<span style=\"color: #0000ff\">\u00a0alpha9\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 10000\u00a0 642.327\u00a0\u00a0 10.814\u00a0\u00a0 0.1136\u00a0 642.332 ( 620.764, 663.581 )<\/span>\r\n<span style=\"color: #0000ff\">\u00a0alpha10\u00a0\u00a0\u00a0\u00a0\u00a0 10000\u00a0 431.244\u00a0\u00a0 10.917\u00a0\u00a0 0.1148\u00a0 430.854 ( 410.868, 453.090 )<\/span>\r\n<span style=\"color: #0000ff\">\u00a0alpha11\u00a0\u00a0\u00a0\u00a0\u00a0 10000\u00a0 420.439\u00a0\u00a0 10.662\u00a0\u00a0 0.1117\u00a0 420.406 ( 399.531, 441.874 )<\/span>\r\n<span style=\"color: #0000ff\">\u00a0alpha12\u00a0\u00a0\u00a0\u00a0\u00a0 10000\u00a0 631.905\u00a0\u00a0 12.173\u00a0\u00a0 0.1495\u00a0 632.674 ( 606.440, 654.117 )<\/span>\r\n<span style=\"color: #0000ff\">\u00a0alpha13\u00a0\u00a0\u00a0\u00a0\u00a0 10000\u00a0 262.062\u00a0\u00a0 12.832\u00a0\u00a0 0.1573\u00a0 262.716 ( 234.243, 285.572 )<\/span>\r\n<span style=\"color: #0000ff\">\u00a0alpha14\u00a0\u00a0\u00a0\u00a0\u00a0 10000\u00a0 479.474\u00a0\u00a0 11.340\u00a0\u00a0 0.1235\u00a0 479.730 ( 456.457, 501.834 )<\/span>\r\n<span style=\"color: #0000ff\">\u00a0alpha15\u00a0\u00a0\u00a0\u00a0\u00a0 
10000\u00a0 205.544\u00a0\u00a0 17.827\u00a0\u00a0 0.3663\u00a0 203.432 ( 175.602, 245.963 )<\/span>\r\n<span style=\"color: #0000ff\">\u00a0alpha16\u00a0\u00a0\u00a0\u00a0\u00a0 10000\u00a0 385.349\u00a0\u00a0 13.025\u00a0\u00a0 0.1674\u00a0 385.670 ( 359.127, 410.078 )<\/span>\r\n<span style=\"color: #0000ff\">\u00a0alpha17\u00a0\u00a0\u00a0\u00a0\u00a0 10000\u00a0 429.787\u00a0\u00a0 11.392\u00a0\u00a0 0.1265\u00a0 429.315 ( 408.605, 453.572 )<\/span>\r\n<span style=\"color: #0000ff\">\u00a0sd1\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 10000\u00a0\u00a0 19.689\u00a0\u00a0\u00a0 4.161\u00a0\u00a0 0.0974\u00a0\u00a0 18.929 (\u00a0 13.626,\u00a0 29.886 )<\/span>\r\n<span style=\"color: #0000ff\">\u00a0sd2\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 10000\u00a0\u00a0 27.858\u00a0\u00a0\u00a0 4.881\u00a0\u00a0 0.0887\u00a0\u00a0 27.631 (\u00a0 18.978,\u00a0 38.418 )<\/span>\r\n<span style=\"color: #0000ff\">----------------------------------------------------------------------------<\/span><\/pre>\n<p><span style=\"color: #0000ff\"><br \/>\n<\/span>Let us contrast these results with the same model but M=20 in the Dirichlet process prior. 
This represents less confidence in the normal base distribution.<\/p>\n<pre><span style=\"color: #0000ff\">----------------------------------------------------------------------------<\/span>\r\n<span style=\"color: #0000ff\">Parameter\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 n\u00a0\u00a0\u00a0\u00a0 mean\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 sd\u00a0\u00a0\u00a0\u00a0\u00a0 sem\u00a0\u00a0 median\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 95% CrI<\/span>\r\n<span style=\"color: #0000ff\">----------------------------------------------------------------------------<\/span>\r\n<span style=\"color: #0000ff\">\u00a0beta\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 10000\u00a0\u00a0\u00a0 5.028\u00a0\u00a0\u00a0 5.785\u00a0\u00a0 0.0805\u00a0\u00a0\u00a0 5.051 (\u00a0 -6.592,\u00a0 16.412 )<\/span>\r\n<span style=\"color: #0000ff\">\u00a0tau1\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 10000\u00a0\u00a0\u00a0 0.003\u00a0\u00a0\u00a0 0.001\u00a0\u00a0 0.0000\u00a0\u00a0\u00a0 0.003 (\u00a0\u00a0 0.001,\u00a0\u00a0 0.006 )<\/span>\r\n<span style=\"color: #0000ff\">\u00a0tau2\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 10000\u00a0\u00a0\u00a0 0.001\u00a0\u00a0\u00a0 0.001\u00a0\u00a0 0.0000\u00a0\u00a0\u00a0 0.001 (\u00a0\u00a0 0.001,\u00a0\u00a0 0.003 )<\/span>\r\n<span style=\"color: #0000ff\">\u00a0alpha1\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 10000\u00a0 498.001\u00a0\u00a0 11.686\u00a0\u00a0 0.1313\u00a0 497.852 ( 476.034, 521.054 )<\/span>\r\n<span style=\"color: #0000ff\">\u00a0alpha2\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 10000\u00a0 407.969\u00a0\u00a0 11.642\u00a0\u00a0 0.1351\u00a0 408.766 ( 384.703, 428.393 )<\/span>\r\n<span style=\"color: #0000ff\">\u00a0alpha3\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 10000\u00a0 510.131\u00a0\u00a0 11.398\u00a0\u00a0 0.1212\u00a0 509.970 ( 488.062, 532.669 )<\/span>\r\n<span style=\"color: #0000ff\">\u00a0alpha4\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 10000\u00a0 421.499\u00a0\u00a0\u00a0 9.441\u00a0\u00a0 
0.1087\u00a0 421.396 ( 402.823, 441.031 )<\/span>\r\n<span style=\"color: #0000ff\">\u00a0alpha5\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 10000\u00a0 482.188\u00a0\u00a0 11.449\u00a0\u00a0 0.1306\u00a0 481.866 ( 460.235, 505.658 )<\/span>\r\n<span style=\"color: #0000ff\">\u00a0alpha6\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 10000\u00a0 591.471\u00a0\u00a0 12.285\u00a0\u00a0 0.1438\u00a0 590.892 ( 568.875, 617.403 )<\/span>\r\n<span style=\"color: #0000ff\">\u00a0alpha7\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 10000\u00a0 414.766\u00a0\u00a0\u00a0 9.953\u00a0\u00a0 0.1116\u00a0 415.716 ( 393.231, 432.483 )<\/span>\r\n<span style=\"color: #0000ff\">\u00a0alpha8\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 10000\u00a0 418.245\u00a0\u00a0 11.662\u00a0\u00a0 0.1671\u00a0 419.348 ( 391.538, 439.163 )<\/span>\r\n<span style=\"color: #0000ff\">\u00a0alpha9\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 10000\u00a0 640.930\u00a0\u00a0 10.551\u00a0\u00a0 0.1171\u00a0 640.892 ( 620.243, 662.228 )<\/span>\r\n<span style=\"color: #0000ff\">\u00a0alpha10\u00a0\u00a0\u00a0\u00a0\u00a0 10000\u00a0 427.722\u00a0\u00a0 10.377\u00a0\u00a0 0.1164\u00a0 426.519 ( 409.715, 450.368 )<\/span>\r\n<span style=\"color: #0000ff\">\u00a0alpha11\u00a0\u00a0\u00a0\u00a0\u00a0 10000\u00a0 419.820\u00a0\u00a0\u00a0 9.452\u00a0\u00a0 0.1044\u00a0 420.137 ( 400.499, 438.751 )<\/span>\r\n<span style=\"color: #0000ff\">\u00a0alpha12\u00a0\u00a0\u00a0\u00a0\u00a0 10000\u00a0 633.032\u00a0\u00a0 12.139\u00a0\u00a0 0.1641\u00a0 634.452 ( 605.031, 653.556 )<\/span>\r\n<span style=\"color: #0000ff\">\u00a0alpha13\u00a0\u00a0\u00a0\u00a0\u00a0 10000\u00a0 260.309\u00a0\u00a0 14.378\u00a0\u00a0 0.2157\u00a0 261.895 ( 229.308, 285.398 )<\/span>\r\n<span style=\"color: #0000ff\">\u00a0alpha14\u00a0\u00a0\u00a0\u00a0\u00a0 10000\u00a0 480.481\u00a0\u00a0 11.333\u00a0\u00a0 0.1265\u00a0 480.546 ( 456.773, 502.093 )<\/span>\r\n<span style=\"color: #0000ff\">\u00a0alpha15\u00a0\u00a0\u00a0\u00a0\u00a0 10000\u00a0 206.370\u00a0\u00a0 
19.913\u00a0\u00a0 0.4823\u00a0 202.659 ( 175.060, 248.730 )<\/span>\r\n<span style=\"color: #0000ff\">\u00a0alpha16\u00a0\u00a0\u00a0\u00a0\u00a0 10000\u00a0 387.677\u00a0\u00a0 13.881\u00a0\u00a0 0.1853\u00a0 388.309 ( 358.905, 413.419 )<\/span>\r\n<span style=\"color: #0000ff\">\u00a0alpha17\u00a0\u00a0\u00a0\u00a0\u00a0 10000\u00a0 426.614\u00a0\u00a0 10.398\u00a0\u00a0 0.1213\u00a0 425.484 ( 408.662, 449.858 )<\/span>\r\n<span style=\"color: #0000ff\">\u00a0sd1\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 10000\u00a0\u00a0 19.694\u00a0\u00a0\u00a0 4.503\u00a0\u00a0 0.1144\u00a0\u00a0 18.673 (\u00a0 13.332,\u00a0 30.498 )<\/span>\r\n<span style=\"color: #0000ff\">\u00a0sd2\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 10000\u00a0\u00a0 28.151\u00a0\u00a0\u00a0 5.137\u00a0\u00a0 0.0956\u00a0\u00a0 27.970 (\u00a0 18.771,\u00a0 38.937 )<\/span>\r\n<span style=\"color: #0000ff\">----------------------------------------------------------------------------<\/span><\/pre>\n<p>It is interesting to see that the prior\u00a0on the distribution of the subject effects does have a noticeable influence on the estimate of beta, the parameter which measures the average difference between the two meters.<\/p>\n<p>In my\u00a0recent postings I have tried to explain the idea behind the Dirichlet process in a simple, non-mathematical way and to demonstrate that these priors\u00a0can be incorporated into practical Stata programs for data analysis.<\/p>\n<p>So far I have only scratched the surface of a vast and very active area of research. 
Other possibilities include making the base distribution depend on parameters in a hierarchical structure, placing a prior on the concentration parameter M, and extending to non-normal likelihoods and non-normal base distributions.<\/p>\n<p>Dirichlet process priors give rise to data sets with clusters of repeat observations so the results are inherently discrete even though we are modelling a continuous distribution; some people find this unsatisfactory and so a lot of work has gone into developing Dirichlet process mixture models in which the Dirichlet process defines the distribution of the parameters of a standard distribution such as the normal. The data are then randomly drawn from a mixture of those\u00a0standard distributions and consequently there are no repeated values. This creates a natural extension of a finite mixture model.<\/p>\n<p>For\u00a0all of these\u00a0modified Dirichlet process models\u00a0there are competing algorithms that might be used and these could be\u00a0exact or approximate, fast or slow.<\/p>\n<p>Dirichlet process\u00a0methods are very powerful and have found application in virtually every area of applied research.\u00a0The peak flow\u00a0example\u00a0demonstrates that they can be implemented in Stata but\u00a0in my next posting\u00a0I want to ask whether using Stata\u00a0for\u00a0such analyses\u00a0is a good idea. After all, why implement\u00a0an analysis in Stata if it is easier to program\u00a0it\u00a0in R, or even better,\u00a0if someone else has already written the R code?<\/p>\n<p>Unfortunately, that discussion will have to wait for a couple of weeks as I have a conference to go to.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This is the third in a series of postings on the use of Dirichlet processes for non-parametric Bayesian analysis and their implementation in Stata. In this posting I will create a Stata program that fits a Bayesian model that incorporates a Dirichlet process prior. 
As my example I will model the peak expiratory flow measurements [&hellip;]<\/p>\n","protected":false},"author":134,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[30,32],"class_list":["post-254","post","type-post","status-publish","format-standard","hentry","category-uncategorized","tag-dirichlet-process","tag-non-parametric-bayesian-analysis"],"_links":{"self":[{"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/posts\/254","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/users\/134"}],"replies":[{"embeddable":true,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/comments?post=254"}],"version-history":[{"count":13,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/posts\/254\/revisions"}],"predecessor-version":[{"id":288,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/posts\/254\/revisions\/288"}],"wp:attachment":[{"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/media?parent=254"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/categories?post=254"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/tags?post=254"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}