{"id":754,"date":"2015-08-07T08:12:55","date_gmt":"2015-08-07T08:12:55","guid":{"rendered":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/?p=754"},"modified":"2025-02-26T13:21:37","modified_gmt":"2025-02-26T13:21:37","slug":"bayesian-experimental-design-part-ii","status":"publish","type":"post","link":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/2015\/08\/07\/bayesian-experimental-design-part-ii\/","title":{"rendered":"Bayesian Experimental Design Part II"},"content":{"rendered":"<p><a href=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/2015\/07\/31\/bayesian-experimental-design\/\">Last time<\/a> I started to discuss Bayesian experiment design and to illustrate that discussion I presented a simple clinical trial comparing a corticosteroid cream with a placebo for patients with eczema. The measurement of response was to\u00a0be the patient\u2019s rating of the severity of their eczema on a 0-10 visual analogue scale (VAS). Patients would only be randomized if their baseline VAS was over 7 and success was to\u00a0be defined as a VAS below 3 after one week. 
I posed the question, how many patients do we need in the trial?<\/p>\n<p>In discussing the\u00a0traditional power-based calculation of\u00a0sample size, I\u00a0supposed that the researcher believed that 50% of patients on the placebo would see enough improvement to be categorised as successful and that the proportion of successes for the new drug\u00a0would be\u00a070%, but that any success rate over 60% would be of clinical importance.<\/p>\n<p>The power-based sample size calculation has many limitations, so I went on to introduce a Bayesian algorithm based on the researcher&#8217;s prior for p0, the success\u00a0probability under\u00a0the placebo, and p1, the success\u00a0probability under the treatment.<\/p>\n<p><a href=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2015\/07\/strategy-e1438076291247.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-756\" src=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2015\/07\/strategy-e1438076291247.png\" alt=\"strategy\" width=\"600\" height=\"403\" \/><\/a><\/p>\n<p>Now I would like to see how this algorithm\u00a0might be implemented in Stata.<\/p>\n<p><strong>Eliciting the priors<\/strong><\/p>\n<p>The basic parameters for this study are p0 and p1, but as we noted last time, beliefs about p1 and p0 are unlikely to be independent. So before we try to elicit priors for those probabilities directly, it is worth considering the alternatives.<\/p>\n<p>We might start by\u00a0considering the scale that will be used for measuring the benefit of treatment, which I will call \u03b8. Possible options include \u03b8=p1-p0, \u03b8=p1\/p0, and \u03b8=[p1\/(1-p1)]\/[p0\/(1-p0)]. The first two have the advantage of simplicity but the disadvantage that their range depends on p0. So if p0 is 0.5 then \u03b8=p1-p0 can lie in (-0.5,0.5), while if p0 is 0.3 then \u03b8 can lie in (-0.3,0.7). 
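These conversions between p0, the odds ratio, and the implied p1 appear repeatedly below, so it may help to see them written out. This is my own minimal sketch in Python; the post itself works in Stata:

```python
def implied_p1(p0, odds_ratio):
    """Treatment success probability implied by a control probability p0
    and an odds ratio theta = [p1/(1-p1)]/[p0/(1-p0)]."""
    z = odds_ratio * p0 / (1 - p0)   # odds of success under treatment
    return z / (1 + z)

# The researcher's best guess, p0=0.5 and p1=0.7, corresponds to an odds
# ratio of (0.7/0.3)/(0.5/0.5), roughly 2.33; applying that same odds
# ratio to p0=0.3 gives p1=0.5.
```

Inverting the odds ratio in this way is what lets us put priors on p0 and \u03b8 and still recover p1 whenever we need it.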
Were we to set priors on \u03b8 and p0 then we could not reasonably treat them as independent.<\/p>\n<p>The odds ratio \u03b8=[p1\/(1-p1)]\/[p0\/(1-p0)] has the disadvantage of being harder to understand but the advantage that its range does not depend on p0, and so it is more reasonable to set independent priors on p0 and the odds ratio.<\/p>\n<p>Our researcher said that their best guess was p0=0.5 and p1=0.7, which on the odds ratio scale implies \u03b8=[0.7\/0.3]\/[0.5\/0.5] = 2.3, and it turns out that p0=0.3 and p1=0.5 would give the same odds ratio \u03b8=[0.5\/0.5]\/[0.3\/0.7] = 2.3. So\u00a0a key question is whether the researcher is happy that an improvement from 50% to 70% success is equivalent to a change from 30% to 50%.<\/p>\n<p>We might take a range of values of p0 and see what p1\u2019s are implied by an odds ratio of 2.3 (the red line on the plot). The black lines show the relationships for other selected values of the odds ratio.<\/p>\n<p><a href=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2015\/08\/oddsratiop0-e1438698395853.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-768\" src=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2015\/08\/oddsratiop0-e1438698395853.png\" alt=\"oddsratiop0\" width=\"600\" height=\"439\" \/><\/a><\/p>\n<p>If this scale seems acceptable then we might present the\u00a0researcher with a plot of\u00a0the impact\u00a0of different odds ratios when p0 is 0.5, as they expect.<\/p>\n<p><a href=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2015\/08\/oddsratiop1-e1438698513352.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-769\" src=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2015\/08\/oddsratiop1-e1438698513352.png\" alt=\"oddsratiop1\" width=\"600\" height=\"439\" \/><\/a><\/p>\n<p>Now they can judge that if the treatment is successful for between 50% and 80% of subjects then the implied odds ratio will be 
between 1 and 4.<\/p>\n<p>Once you have explained the scale, the researcher will need to commit themselves to priors for p0 and \u03b8. My preference is to let them sketch the priors freehand and then to try to find a standard distribution that is a reasonable approximation to their sketch.<\/p>\n<p>Remember the old maxim that &#8216;all models are wrong, but some are useful&#8217;; it is equally true that &#8216;all priors are wrong, but some are useful&#8217;. So don&#8217;t be too particular about capturing those priors.<\/p>\n<p>An alternative would be to get them to draw a histogram to represent the prior. For example, p0 could be described by a histogram with blocks that have a width of 0.05 or 5% depending on the units that they find easier. The histogram could itself be used as the prior or, as before, we could seek to approximate it by a standard distribution.<\/p>\n<p>One advantage of finding a standard distribution is that one can select a distribution that does not totally rule out extreme values of the parameters. This is sometimes useful, as values ruled out by the prior can never appear in the posterior.<\/p>\n<p>For illustration, I am going to suppose that our researcher settles on a beta(30,30) prior for p0,<\/p>\n<p><a href=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2015\/08\/priorp0-e1438698680965.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-770\" src=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2015\/08\/priorp0-e1438698680965.png\" alt=\"priorp0\" width=\"600\" height=\"439\" \/><\/a><\/p>\n<p>and an independent gamma(20,0.115) 
prior on the odds ratio, \u03b8.<\/p>\n<p><a href=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2015\/08\/priortheta-e1438698768546.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-771\" src=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2015\/08\/priortheta-e1438698768546.png\" alt=\"priortheta\" width=\"600\" height=\"439\" \/><\/a><\/p>\n<p>To continue working through the algorithm we need to define the criterion for selecting the best design. Here we have two options depending on whether\u00a0the\u00a0researcher is\u00a0happy\u00a0with a partial Bayesian solution in which they\u00a0judge performance in terms of some\u00a0measure associated with the posterior, or whether\u00a0they want a full Bayesian design based on utility.<\/p>\n<p><strong>Partial Bayesian Design<\/strong><\/p>\n<p>Let\u2019s start with the partial Bayesian design and suppose that\u00a0the researcher is\u00a0happy to measure performance in terms of the width of the 95% credible interval\u00a0for the odds ratio. The more informative the experiment, the narrower will be this credible interval.<\/p>\n<p>Before we collect any data, the 95% credible interval based on the researcher\u2019s gamma prior can be calculated by the Stata commands,<\/p>\n<p><span style=\"color: #0000ff\">. local lb = invgammap(20,0.025)*0.115<\/span><br \/>\n<span style=\"color: #0000ff\">. local ub = invgammap(20,0.975)*0.115<\/span><\/p>\n<p>so the interval is (1.405, 3.412) and has a width of almost exactly 2. The researcher must therefore tell us what width of credible interval they are hoping for once the trial is over. Perhaps they want to reduce the posterior width to 0.5.<\/p>\n<p>Now we can simulate a random set of parameters from the priors and use them to simulate a data set of a chosen size. 
Let\u2019s suppose that we decide to try a sample size of 100 in each group.<\/p>\n<p><span style=\"color: #0000ff\">local p0 = rbeta(30,30)<\/span><br \/>\n<span style=\"color: #0000ff\">local or = rgamma(20,0.115)<\/span><br \/>\n<span style=\"color: #0000ff\">local z = `or'*`p0'\/(1-`p0')<\/span><br \/>\n<span style=\"color: #0000ff\">local p1 = `z'\/(1+`z')<\/span><br \/>\n<span style=\"color: #0000ff\">scalar y0 = rbinomial(100,`p0')<\/span><br \/>\n<span style=\"color: #0000ff\">scalar y1 = rbinomial(100,`p1')<\/span><\/p>\n<p>For illustration, I ran this once and got p0=0.50, or=1.96, p1=0.66, which led to y0=53 successes in the 100 placebo subjects and y1=59 successes in the 100 treated subjects. Now all we need to do is to calculate the posterior given these data and find the width of the posterior credible interval. Then we can repeat the simulation a few times to find the average or expected width and see if it is greater or less than the target. If n=100 proves too small or too large, we adjust the sample size and repeat until the expected width is close to 0.5.<\/p>\n<p>So we need a program that calculates the posterior given data y0 and y1. 
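As a cross-check on the algebra (my own Python sketch, not part of the post's Stata code), the kernel of the log-posterior is just the two binomial log-likelihoods plus the beta(30,30) and gamma(shape 20, scale 0.115) log-prior kernels, with additive constants dropped:

```python
import math

# Sketch of the log-posterior kernel for (p0, odds ratio), up to an
# additive constant.  The constants 29 and 19 are the beta and gamma
# shape parameters minus one, and odds_ratio/0.115 is the gamma rate
# term corresponding to scale 0.115.
def log_posterior(p0, odds_ratio, y0, y1, n=100):
    p1 = odds_ratio * p0 / (1 - p0 + odds_ratio * p0)   # implied p1
    loglik = (y0 * math.log(p0) + (n - y0) * math.log(1 - p0)
              + y1 * math.log(p1) + (n - y1) * math.log(1 - p1))
    logprior = (29 * math.log(p0) + 29 * math.log(1 - p0)
                + 19 * math.log(odds_ratio) - odds_ratio / 0.115)
    return loglik + logprior
```

The Stata program computes the same quantity, term for term.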
The log-posterior is calculated by,<\/p>\n<p><span style=\"color: #0000ff\">program logpost<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0 args lnf b<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0 local p0 = `b'[1,1]<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0 local or = `b'[1,2]<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0 local p1 = `or'*`p0'\/(1-`p0'+`or'*`p0')<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0 scalar `lnf' = y0*log(`p0')+(100-y0)*log(1-`p0')+y1*log(`p1')+(100-y1)*log(1-`p1') \/\/\/<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 + 29*log(`p0') + 29*log(1-`p0') + 19*log(`or') - `or'\/0.115<\/span><br \/>\n<span style=\"color: #0000ff\">end<\/span><\/p>\n<p>Now we can find the expected width of the 95% credible interval. Since n=100 is likely to be a poor guess, I only ran the simulation and analysis 10 times.<\/p>\n<p><span style=\"color: #0000ff\">tempname pf<\/span><br \/>\n<span style=\"color: #0000ff\">postfile `pf' w using width.dta, replace<\/span><br \/>\n<span style=\"color: #0000ff\">forvalues sim = 1\/10 {<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0 <\/span><span style=\"color: #0000ff\">drop _all<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0 <\/span><span style=\"color: #0000ff\">local p0 = rbeta(30,30)<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0 <\/span><span style=\"color: #0000ff\">local or = rgamma(20,0.115)<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0 <\/span><span style=\"color: #0000ff\">local z = `or'*`p0'\/(1-`p0')<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0 <\/span><span style=\"color: #0000ff\">local p1 = `z'\/(1+`z')<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0 
<\/span><span style=\"color: #0000ff\">scalar y0 = rbinomial(100,`p0')<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0 <\/span><span style=\"color: #0000ff\">scalar y1 = rbinomial(100,`p1')<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0 <\/span><span style=\"color: #0000ff\">matrix b = (0.5, 2.3)<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0 <\/span><span style=\"color: #0000ff\">mcmcrun logpost b using temp.csv, replace \/\/\/<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0 <\/span><span style=\"color: #0000ff\">\u00a0\u00a0 <\/span><span style=\"color: #0000ff\">samplers( (mhstrnc , sd(0.25) lb(0) ub(1) ) (mhslogn , sd(0.25) ) ) \/\/\/<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0 <\/span><span style=\"color: #0000ff\">\u00a0\u00a0 <\/span><span style=\"color: #0000ff\">param(p0 theta) burn(500) update(2000)<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0 <\/span><span style=\"color: #0000ff\">insheet using temp.csv, clear<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0 <\/span><span style=\"color: #0000ff\">mcmcstats theta<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0 <\/span><span style=\"color: #0000ff\">local w = r(ub1)-r(lb1)<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0 <\/span><span style=\"color: #0000ff\">post `pf' (`w')<\/span><br \/>\n<span style=\"color: #0000ff\">}<\/span><br \/>\n<span style=\"color: #0000ff\">postclose `pf'<\/span><br \/>\n<span style=\"color: #0000ff\">use width.dta, clear<\/span><br \/>\n<span style=\"color: #0000ff\">ci w<\/span><\/p>\n<p>The result was<\/p>\n<pre>\u00a0\u00a0\u00a0 Variable |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Obs\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Mean\u00a0\u00a0\u00a0 Std. Err.\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 [95% Conf. 
Interval]\r\n-------------+---------------------------------------------------------------\r\n\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 w |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 10\u00a0\u00a0\u00a0\u00a0 1.45993\u00a0\u00a0\u00a0 .0581376\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 1.328413\u00a0\u00a0\u00a0 1.591446<\/pre>\n<p>So the expected width is about 1.5, and 10 simulations were enough to place this expectation between 1.3 and 1.6. Since we are aiming for a width of 0.5, it is clear that n=100 is not nearly enough. A bit of trial and error followed and eventually I tried 25 simulations with each of the sample sizes in the range n=1500(250)3000. It only takes a couple of minutes to run.<\/p>\n<p><a href=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2015\/08\/expectedwidth-e1438704065238.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-775\" src=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2015\/08\/expectedwidth-e1438704065238.png\" alt=\"expectedwidth\" width=\"600\" height=\"439\" \/><\/a><\/p>\n<p>The graph is not very smooth because I only ran 25 simulations at each sample size; nonetheless, it is clear that we need around 2500 subjects in each group to get the\u00a095% credible interval\u00a0width down to 0.5.<\/p>\n<p>This solution is almost certainly not practical and reflects the fact that we have based the design on an aspiration for a\u00a0very precise experiment\u00a0without considering the cost of the trial. We have also dodged the question of why the researcher wants so much accuracy.<\/p>\n<p>Researchers naturally want to do highly informative experiments, but\u00a0did our researcher\u00a0really need to reduce the credible interval\u00a0width to 0.5, or would\u00a01.0 have been almost as useful? 
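As a rough, prior-free sanity check on these numbers (my own back-of-the-envelope calculation, not in the original post), the Woolf large-sample variance of the log odds ratio points to a similar sample size:

```python
import math

def approx_or_ci_width(n, p0, p1):
    """Approximate width of a 95% interval for the odds ratio with n
    subjects per group, using the Woolf variance of the log odds ratio,
    1/a + 1/b + 1/c + 1/d, evaluated at the expected cell counts.
    Prior information is ignored."""
    a, b = n * p0, n * (1 - p0)      # expected counts, control group
    c, d = n * p1, n * (1 - p1)      # expected counts, treated group
    se = math.sqrt(1/a + 1/b + 1/c + 1/d)
    odds_ratio = (p1 / (1 - p1)) / (p0 / (1 - p0))
    return odds_ratio * (math.exp(1.96 * se) - math.exp(-1.96 * se))

# With p0=0.5 and p1=0.7, n=2500 per group gives a width of about 0.54.
```

Because this calculation ignores the informative prior, it overstates the width for small n (the simulation gave about 1.5 at n=100, where the prior still carries real weight), but by n=2500 the data dominate and the approximation agrees with the simulated width of roughly 0.5.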
After all, they were expecting the odds ratio to be around 2.3, so a typical credible interval\u00a0with width 0.5 would be (2.05,2.55), which is\u00a0a very long way from what the researcher\u00a0originally called a clinically important difference, p0=0.5 and p1=0.6, odds ratio 1.5.<\/p>\n<p>As a postscript to the partial Bayesian analysis, I ought to mention the paper,<\/p>\n<p><strong>Wang F and Gelfand AE<\/strong><br \/>\n<strong>A simulation-based approach to Bayesian sample size determination for performance under a given model and for separating models<\/strong><br \/>\n<strong><em>Statistical Science<\/em> 2002;17:193-208<\/strong><\/p>\n<p>In this article, the authors\u00a0discuss the partial Bayesian solution based on the algorithm in the flowchart given earlier in this posting. The paper is especially useful because it considers many of the possible variations on this approach.<\/p>\n<p>In their paper,\u00a0Wang and Gelfand\u00a0advocate using two prior distributions, one for sampling the data and a second for performing the analysis. This idea needs to be used with care, as it\u00a0produces an answer based on two\u00a0contradictory assumptions. However, it can be useful.\u00a0Imagine two people, A and B,\u00a0who have\u00a0different prior opinions. A asks the statistician: what sample size\u00a0should I use in order to persuade B? This would require data simulated according to A&#8217;s prior and an analysis performed using B&#8217;s prior.<\/p>\n<p>We will see an example of\u00a0two priors in\u00a0next week&#8217;s posting\u00a0when I discuss the full Bayesian\u00a0sample size calculation\u00a0incorporating utility. 
That analysis will address the two issues of the precision that is actually required and the cost of the trial.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Last time I started to discuss Bayesian experiment design and to illustrate that discussion I presented a simple clinical trial comparing a corticosteroid cream with a placebo for patients with eczema. The measurement of response was to\u00a0be the patient\u2019s rating of the severity of their eczema on a 0-10 visual analogue scale (VAS). Patients would [&hellip;]<\/p>\n","protected":false},"author":134,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[74,73],"class_list":["post-754","post","type-post","status-publish","format-standard","hentry","category-uncategorized","tag-elicitation","tag-experiment-design"],"_links":{"self":[{"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/posts\/754","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/users\/134"}],"replies":[{"embeddable":true,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/comments?post=754"}],"version-history":[{"count":12,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/posts\/754\/revisions"}],"predecessor-version":[{"id":790,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/posts\/754\/revisions\/790"}],"wp:attachment":[{"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/media?parent=754"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/categories?post=754"},{"taxonomy"
:"post_tag","embeddable":true,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/tags?post=754"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}