{"id":824,"date":"2015-08-28T07:46:20","date_gmt":"2015-08-28T07:46:20","guid":{"rendered":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/?p=824"},"modified":"2025-02-26T13:21:37","modified_gmt":"2025-02-26T13:21:37","slug":"google-flu-trends","status":"publish","type":"post","link":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/2015\/08\/28\/google-flu-trends\/","title":{"rendered":"Google Flu Trends"},"content":{"rendered":"<p>I am always on the lookout for interesting medical data sets that I can use\u00a0for teaching and the other day, quite by accident,\u00a0I came across &#8220;Google flu trends&#8221;. This is an internet project operated by Google that estimates the numbers of cases of influenza each week in 25 different countries. The records go back to the beginning of 2003.<\/p>\n<p>The novel aspect of\u00a0this project\u00a0is that influenza is monitored by tracking online Google searches related to influenza. The number of searches has been calibrated against surveys\u00a0of the number of medically diagnosed cases.\u00a0You can find\u00a0more information on <a href=\"https:\/\/en.wikipedia.org\/wiki\/Google_Flu_Trends\">Wikipedia<\/a>.<\/p>\n<p>There\u00a0is a fascinating on-going debate in the media and on the internet about the ability of such search engine data to predict cases of influenza. What is most interesting is the\u00a0emotion that this\u00a0debate generates. People who\u00a0are attracted by\u00a0big data get enthusiastic about\u00a0the project and play down its limitations,\u00a0while people who equate big data\u00a0with big brother stress the weakness in the flu estimates and overlook the potential. It is as though they are not really talking about Google flu trends but merely using\u00a0the debate\u00a0as a vehicle for justifying their wider prejudices.<\/p>\n<p>Anyway, I will just accept the data and see how we might model\u00a0them to detect influenza epidemics. After all, even if the search based predictions are\u00a0wrongly calibrated they might still rise when the true number of cases of flu rises, which could be enough to enable us to detect the start of an outbreak.\u00a0As an example, here is\u00a0the data for Chile.<\/p>\n<p><a href=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2015\/08\/chile-e1439993972637.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-829\" src=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2015\/08\/chile-e1439993972637.png\" alt=\"chile\" width=\"600\" height=\"439\" \/><\/a><\/p>\n<p>The dashed vertical lines denote periods of 52 weeks. Remember that Chile is in the southern hemisphere so the winter influenza peak occurs around the middle of the year. Just looking at the plot one can see a small seasonal trend of the type that is most obvious in year 4 (around 200 weeks). Then on top of that there is usually an annual epidemic that causes a sharp spike of variable size and\u00a0variable duration. For the purposes of monitoring influenza, we would like to analyse these data in real time and to decide as soon as possible when an epidemic is underway.<\/p>\n<p>I will try to derive a\u00a0Bayesian analysis of these data and in the process of developing and fitting\u00a0such a\u00a0model I will need to create a new sampler and integrate it into the general\u00a0structure described in my book.<\/p>\n<p>In my model the weekly counts\u00a0will be thought of as being dependent on a latent variable that denotes whether or not an\u00a0epidemic is in progress.\u00a0We will define\u00a0a variable Z that takes the value 0\u00a0when there is no current epidemic, and 1, when there is a current\u00a0epidemic. Of course, Z is not observed but has to be inferred from the pattern of counts.\u00a0Our aim is\u00a0to detect as soon as possible when Z switches from 0 to 1.<\/p>\n<p>It appears from the plot that it would be reasonable to start by assuming that there is exactly one flu epidemic every year\u00a0and that\u00a0the severity varies from the very minor (e.g. year 4) to the very extreme (year 7).<\/p>\n<p>Each\u00a0epidemic is characterised by the week that it starts, S, and the week that it ends, E. These parameters will take integer values and of course S must be less that E. It would be nice to be able to update\u00a0estimates of S and E\u00a0using a Metropolis-Hastings algorithm but the programs given in my book only work for parameters defined over a continuous range: <strong>mhsnorm<\/strong> is good for parameters defined over an infinite range (-\u221e,\u221e), <strong>mhslogn<\/strong> is good for (0,\u221e) and <strong>mhstrnc<\/strong> is good for (a,b), where a and b are finite. Our first job therefore is to design a new Metropolis-Hastings sampler that can deal with integer valued parameters.<\/p>\n<p>Let\u2019s suppose that \u03b8 is a parameter defined over the integers (1, 2, \u2026 , n) and and that we have a program that works out the log-posterior given \u03b8. If the current value of the parameter is \u03b8<sub>0<\/sub>, we need to make a proposal say \u03b8<sub>1<\/sub> and compare logpost(\u03b8<sub>0<\/sub>) with logpost(\u03b8<sub>1<\/sub>) to see if we accept a move to \u03b8<sub>1<\/sub> or reject it and stay at \u03b8<sub>0<\/sub>.<\/p>\n<p>We might propose a random move to any other integer in the range [1,n]. The disadvantage of completely random moves\u00a0is that\u00a0many of the proposed moves will take us to poorly supported values and such\u00a0moves\u00a0are very likely to be rejected. Lots of rejected moves will mean poor mixing.<\/p>\n<p>An alternative strategy would be to propose a move to a neighbouring value, i.e. from \u03b8<sub>0<\/sub>=12 to \u03b8<sub>1<\/sub>=11 or 13. Such local moves are more likely to be accepted but\u00a0the change is small\u00a0so the algorithm will\u00a0require lots of moves to cover the posterior and this might also be inefficient, especially early in the chain when we have not located the peak of the posterior.<\/p>\n<p>On balance my gut feeling\u00a0favours short moves to neighbouring points. So let\u2019s assume that when \u03b8<sub>0<\/sub>=12 we are equally likely to suggest a move up to 13 or a move down to 11. Similarly if \u03b8<sub>0<\/sub>=11 we are equally likely to propose a move to 10 or 12. The key feature is Pr(propose 11 | \u03b8<sub>0<\/sub>=12 ) = Pr(propose 12 | \u03b8<sub>0<\/sub>=11 ). This is the characteristic\u00a0that defines\u00a0the special case known as a\u00a0Metropolis algorithm in which the proposal probabilities cancel and we only need to worry about the comparative values of the log-posterior.<\/p>\n<p>There is a small problem at the end points of the range. Consider the case\u00a0when \u03b8<sub>0<\/sub>=1, the\u00a0move to\u00a0\u03b8<sub>1<\/sub>=0 is not allowed because the range is [1,n]. We can cope with this, either by\u00a0modifying the rule for making proposals, or by making the log-posterior\u00a0at \u03b8<sub>1<\/sub>=0 so small that the move is never accepted. The latter is easier to program.<\/p>\n<p>The program for the new sampler will follow the same pattern as <strong>mhsnorm<\/strong>. So let\u2019s look at that program and see what needs to be changed. I have highlighted the few places that\u00a0need our attention\u00a0in green.<\/p>\n<p><span style=\"color: #0000ff\">program <span style=\"color: #008080\"><strong>mhsnorm<\/strong><\/span>, rclass<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0 syntax anything [fweight] [if] [in],\u00a0 <\/span><span style=\"color: #0000ff\"><strong><span style=\"color: #008080\">SD(string)<\/span>\u00a0\u00a0\u00a0<\/strong><\/span><span style=\"color: #0000ff\">\u00a0\u00a0 [\u00a0<\/span><span style=\"color: #0000ff\">\u00a0\u00a0 LOGP(real 1e20)<\/span><span style=\"color: #0000ff\">\u00a0\u00a0 DEBUG\u00a0\u00a0\u00a0\u00a0<\/span><span style=\"color: #0000ff\">]<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0 tokenize &#8220;`anything'&#8221;<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0 local logpost &#8220;`1&#8242;&#8221;<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0 local b\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 &#8220;`2&#8242;&#8221;<\/span><br \/>\n<span style=\"color: #0000ff\">local ipar\u00a0\u00a0 = `3&#8242;<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0 tempname logl newlogl<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0 if &#8220;`debug'&#8221; == &#8220;&#8221; local qui &#8220;qui&#8221;<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0 local oldvalue = `b'[1,`ipar&#8217;]<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0 scalar `logl&#8217; = .<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0 if `logp&#8217; &lt; 1e19 scalar `logl&#8217; = `logp&#8217;<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0 if `logl&#8217; == . {<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0 `qui&#8217; `logpost&#8217; `logl&#8217; `b&#8217; `ipar&#8217; [`weight&#8217;`exp&#8217;] `if&#8217; `in&#8217;<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0 if `logl&#8217; == . {<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 di as err &#8220;<span style=\"color: #008080\"><strong>mhsnorm<\/strong><\/span>: cannot calculate log-posterior for parameter `ipar'&#8221;<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 matrix list `b&#8217;<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 exit(1400)<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0 }<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0 }<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0 <span style=\"color: #008080\"><strong>matrix `b'[1,`ipar&#8217;] = `oldvalue&#8217; + `sd&#8217;*rnormal()<\/strong><\/span><\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0 scalar `newlogl&#8217; = .<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0 `qui&#8217; `logpost&#8217; `newlogl&#8217; `b&#8217; `ipar&#8217; [`weight&#8217;`exp&#8217;] `if&#8217; `in&#8217;<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0 if `newlogl&#8217; == . {<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0 di as err &#8220;<span style=\"color: #008080\"><strong>mhsnorm<\/strong><\/span>: cannot calculate log-posterior for parameter `ipar'&#8221;<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0 matrix list `b&#8217;<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0 exit(1400)<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0 }<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0 if log(uniform()) &lt; (`newlogl&#8217;-`logl&#8217;) local accept = 1<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0 else {<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0 local accept = 0<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0 matrix `b'[1,`ipar&#8217;] = `oldvalue&#8217;<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0 scalar `newlogl&#8217; = `logl&#8217;<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0 }<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0 return scalar accept = `accept&#8217;<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0 return scalar logp = `newlogl&#8217;<\/span><br \/>\n<span style=\"color: #0000ff\">end<\/span><\/p>\n<p>Most of the program is involved in housekeeping and error checking so the changes are minimal. First, since I\u00a0am going to call the new sampler <strong>mhsint<\/strong>,\u00a0we need to change all references to mhsnorm into mhsint, then we need to drop the redundant SD option and finally we change the line that makes the new proposal from<\/p>\n<p><span style=\"color: #008000\"><strong>matrix `b'[1,`ipar&#8217;] = `oldvalue&#8217; + `sd&#8217;*rnormal()<\/strong><\/span><\/p>\n<p>to<\/p>\n<p><span style=\"color: #0000ff\"><strong>matrix `b'[1,`ipar&#8217;] = `oldvalue&#8217; + 1 \u2013 2*(runiform()&lt;0.5)<\/strong><\/span><\/p>\n<p>This ensures that the proposal has an equal chance of being 1 more or 1 less that the\u00a0previous value.<\/p>\n<p>Let\u2019s design a simple test to\u00a0show how mhsint can be used with mcmcrun in just the same way as the other samplers mentioned in my book. I will suppose that a variable can take the values 1 or 2 or 3 with posterior probabilities 0.2, 0.3 and 0.5. Here is a program for calculating the log-posterior. If we move outside the range [1,3] then the log-posterior is set to a very low number so as to effectively block the move.<\/p>\n<p><span style=\"color: #0000ff\">program logpost<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0 args lnf b<\/span><\/p>\n<p><span style=\"color: #0000ff\">\u00a0 local k = `b'[1,1]<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0 if `k&#8217; == 1 scalar `lnf&#8217; = log(0.2)<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0 else if `k&#8217; == 2 scalar `lnf&#8217; = log(0.3)<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0 else if `k&#8217; == 3 scalar `lnf&#8217; = log(0.5)<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0 else scalar `lnf&#8217; = -999.9<\/span><br \/>\n<span style=\"color: #0000ff\"> end<\/span><\/p>\n<p>I\u2019ll start the run with the variable (that I\u2019m calling k) set to 2. This I use mcmcrun to create a short burnin and then 500 samples.<\/p>\n<p><span style=\"color: #0000ff\">matrix b = (2)<\/span><br \/>\n<span style=\"color: #0000ff\"> mcmcrun logpost b using temp.csv, replace\u00a0 \/\/\/<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0 param(k) burn(50) update(500) samp( mhsint )<\/span><br \/>\n<span style=\"color: #0000ff\"> insheet using temp.csv , clear<\/span><br \/>\n<span style=\"color: #0000ff\"> tabulate k<\/span><\/p>\n<pre>          k |\u00a0\u00a0\u00a0\u00a0 Freq.\u00a0\u00a0\u00a0\u00a0 Percent\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Cum.\r\n------------+-----------------------------------\r\n          1 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 102\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 20.40\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 20.40\r\n          2 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 145\u00a0\u00a0\u00a0\u00a0   29.00\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 49.40\r\n          3 |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 253\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 50.60\u00a0\u00a0\u00a0\u00a0  100.00\r\n------------+-----------------------------------\r\n      Total |\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 500\u00a0\u00a0\u00a0\u00a0 100.00<\/pre>\n<p>Clearly the chain has settled to the correct proportions.<\/p>\n<p>Next let\u2019s design a test that is closer to the flu problem. Suppose that we generate at sequence of 50 values, i.e. the flu counts for each week over most of a year. At the start of the year the counts\u00a0will be Poisson with a mean of 2 but at some random point between 10 and 40 weeks they will switch to a Poisson with a mean of 4.<\/p>\n<p><span style=\"color: #0000ff\">clear<\/span><br \/>\n<span style=\"color: #0000ff\">set seed 626169<\/span><br \/>\n<span style=\"color: #0000ff\">set obs 50<\/span><br \/>\n<span style=\"color: #0000ff\">local start = int(10+runiform()*31)<\/span><br \/>\n<span style=\"color: #0000ff\">gen t = _n<\/span><br \/>\n<span style=\"color: #0000ff\">gen mu = 2 + 2*(t &gt;= `start&#8217;)<\/span><br \/>\n<span style=\"color: #0000ff\">gen y = rpoisson(mu)<\/span><br \/>\n<span style=\"color: #0000ff\">scatter y t, c(l) xline(`start&#8217;)<\/span><\/p>\n<p><a href=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2015\/08\/random50-e1439994748304.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-830\" src=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2015\/08\/random50-e1439994748304.png\" alt=\"random50\" width=\"600\" height=\"439\" \/><\/a><\/p>\n<p>In this simulation the switching point is at 31. What we need to do is to estimate the time of the switch, which I\u2019ll again call k, from the 50 counts.<\/p>\n<p>Here is a program that calculates the log-posterior. If we step outside the range [10,40] then the\u00a0move is blocked by setting a very small log-posterior, otherwise we calculate a standard Poisson log-likelihood with known means 2 or 4 and a switch at the current k, which is stored as the first element of our parameter vector, b.<\/p>\n<p><span style=\"color: #0000ff\">program logpost<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 args lnf b<\/span><\/p>\n<p><span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 if `b'[1,1] &lt; 10 | `b'[1,1] &gt; 40 scalar `lnf&#8217; = -9999.9<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 else {<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 tempvar mu lnL<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 gen `mu&#8217; = 2<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 qui replace `mu&#8217; = 4 if _n &gt;= `b'[1,1]<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 gen `lnL&#8217; = y*log(`mu&#8217;) &#8211; `mu&#8217;<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 qui su `lnL&#8217;<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 scalar `lnf&#8217; = r(sum)<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 }<\/span><br \/>\n<span style=\"color: #0000ff\">end<\/span><\/p>\n<p>We can run the analysis much as before<\/p>\n<p><span style=\"color: #0000ff\">matrix b = (25)<\/span><br \/>\n<span style=\"color: #0000ff\">mcmcrun logpost b using temp.csv, replace\u00a0 \/\/\/<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0 param(k) burn(50) update(500) samp( mhsint )<\/span><br \/>\n<span style=\"color: #0000ff\">insheet using temp.csv , clear<\/span><br \/>\n<span style=\"color: #0000ff\">histogram k , start(9.5) width(1) xlabel(10(5)40)<\/span><\/p>\n<p>The estimated log-posterior does indeed peak at 31 but given the nature of the data that we saw in the time series plot it is not surprising that the switch looks more likely to have come later in the sequence rather than earlier.<\/p>\n<p><a href=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2015\/08\/posthistogram-e1439994845560.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-831\" src=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2015\/08\/posthistogram-e1439994845560.png\" alt=\"posthistogram\" width=\"600\" height=\"439\" \/><\/a><\/p>\n<p>If we inspect the trace plot of the chain then we can see that we only made steps of size one but,\u00a0despite this,\u00a0the mixing looks reasonable.<\/p>\n<p><a href=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2015\/08\/flutrace1-e1439995139435.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-834\" src=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2015\/08\/flutrace1-e1439995139435.png\" alt=\"flutrace1\" width=\"600\" height=\"439\" \/><\/a><\/p>\n<p>It would be a simple matter to change from steps of size 1 to steps of size 1 or 2 or steps of size 1 or 2 or 3. We just need to change the line of code\u00a0that creates the random proposal.<\/p>\n<p>As it stands, the program logpost does not include a prior on k, so the implicit assumption is that prior to seeing the data we thought that all switching points in the range [10,40] were equally likely.\u00a0As it happens, this\u00a0accords with the way that the data were generated but had the data been real counts of flu in a small population, we might well have had prior belief about when the switch would occur. For example, we\u00a0might believe\u00a0that 25 is the most likely switching point and that the prior probability is proportional to 1\/(1+abs(k-25))<\/p>\n<p><span style=\"color: #0000ff\">clear<\/span><br \/>\n<span style=\"color: #0000ff\">range k 10 40 31<\/span><br \/>\n<span style=\"color: #0000ff\">gen prior = 1\/(1+abs(k-25))<\/span><br \/>\n<span style=\"color: #0000ff\">twoway bar prior k<\/span><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-832\" src=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2015\/08\/priorhistogram-e1439994931856.png\" alt=\"priorhistogram\" width=\"600\" height=\"439\" \/><\/p>\n<p>We can incorporate this prior into the calculation of the log-posterior<\/p>\n<p><span style=\"color: #0000ff\">program logpost<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 args lnf b<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 if `b'[1,1] &lt; 10 | `b'[1,1] &gt; 40 scalar `lnf&#8217; = -9999.9<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 else {<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 tempvar mu lnL<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 gen `mu&#8217; = 2<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 qui replace `mu&#8217; = 4 if _n &gt;= `b'[1,1]<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 gen `lnL&#8217; = y*log(`mu&#8217;) &#8211; `mu&#8217;<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 qui su `lnL&#8217;<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 scalar `lnf&#8217; = r(sum) &#8211; log(1+abs(`b'[1,1]-25))<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 }<\/span><br \/>\n<span style=\"color: #0000ff\">end<\/span><\/p>\n<p>and the posterior is pulled slightly to the left.<img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-833\" src=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2015\/08\/posthistogram2-e1439995037463.png\" alt=\"posthistogram2\" width=\"600\" height=\"439\" \/><\/p>\n<p>Next time I will use the new sampler to help me build a model for the flu data.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I am always on the lookout for interesting medical data sets that I can use\u00a0for teaching and the other day, quite by accident,\u00a0I came across &#8220;Google flu trends&#8221;. This is an internet project operated by Google that estimates the numbers of cases of influenza each week in 25 different countries. The records go back to [&hellip;]<\/p>\n","protected":false},"author":134,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[80,40],"class_list":["post-824","post","type-post","status-publish","format-standard","hentry","category-uncategorized","tag-google-flu-trends","tag-metropolis-hastings"],"_links":{"self":[{"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/posts\/824","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/users\/134"}],"replies":[{"embeddable":true,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/comments?post=824"}],"version-history":[{"count":13,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/posts\/824\/revisions"}],"predecessor-version":[{"id":853,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/posts\/824\/revisions\/853"}],"wp:attachment":[{"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/media?parent=824"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/categories?post=824"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/tags?post=824"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}