I have not posted for the last few weeks, not because I have nothing to say but rather because I have been thinking about the future of this blog.
I have had in mind for some time that there is a need to say something about the Bayesian analysis facilities that were introduced in Stata 14, and while preparing for that topic I was forced to question the sense of advocating my own parallel set of Stata commands. When I wrote the book, Bayesian analysis with Stata, there were no Bayesian options in Stata and I was assured repeatedly that none were planned. Then, shortly after publication, there was a change of mind and some very basic Bayesian commands were added to the latest release of Stata.
Clearly there is no sense in having two competing systems, and equally obviously StataCorp have the resources and experience to produce far better commands than I could. It must be said, though, that the official additions to Stata 14 are pretty basic and cannot produce the same variety of Bayesian analyses as my own commands.
It looks to me as though StataCorp have rushed out a very limited command for Metropolis-Hastings sampling without paying much attention to the way in which it will fit within the broader Bayesian software that they will eventually need to produce. While it is true that the Bayesian commands in Stata 14 are limited, experience suggests that StataCorp will build quickly on those foundations and eventually they will produce something very impressive. I would not be surprised if they wrote a general Gibbs sampling command using Mata, similar to OpenBUGS or JAGS, or perhaps they will opt for HMC and write a Stan-like program in Mata.
StataCorp seem to have a policy of not linking Stata to other software. One can see the sense of this from an economic point of view as it makes users dependent on Stata and encourages them to buy the new releases. It also allows StataCorp to control the quality of the analyses that Stata produces. The downside is that developments are slower than they need to be and StataCorp is constantly engaged in reinventing the wheel. I think that the experience of more flexible approaches, as typified by R and Wikipedia, is that looser control has many advantages and quality is better maintained by the continual testing that results from heavy use.
Anyway, there is little point in my continuing to develop my own Stata commands for Bayesian analysis. It is a race that I cannot win.
There is, I think, still scope for a blog explaining Bayesian methods and illustrating different types of Bayesian analyses but this ought to be based on the official version of Stata and not my own commands. This creates a problem because the official commands are still too basic to do anything really interesting. In that sense, this blog is a few years ahead of its time.
There is one final factor that needs to be taken into account. I enjoy writing this blog; I certainly would not do it otherwise.
I have not made a final decision, but I will probably leave this blog open and post less frequently, and meantime give thought to starting a different blog to keep myself amused.
If I were to write another book, it would be called Bayesian methods in genetic epidemiology and I would try to follow the style of Bayesian analysis with Stata, by which I mean I would attempt to emphasise practical application and the understanding of the basic concepts, but not worry too much about the rigour of the explanation. There are many people analysing genetic data who have not had a formal training in statistics and it interests me to try to explain complex statistical ideas to such non-specialists, though I realise that this is challenging and my attempts will not always be successful.
Writing a book is a major undertaking and I do not have time for it at present, but I am attracted by the idea of self-publishing a book as a blog and releasing it in regular instalments, rather as Dickens released The Pickwick Papers (and no, I do not have any delusions about my writing style, which I know to be very limited).
The trouble is that Stata is not a good program for handling genetic epidemiology datasets, which are often huge. StataCorp made some initial design decisions that have been overtaken by events, and it has shown itself slow to adapt. Clearly the idea of only having one spreadsheet of data open at a time is too limiting, and restricting that spreadsheet to a few thousand columns makes it virtually impossible to use Stata for many modern applications. I’m sure that Stata will evolve, but progress is so slow that I fear for its long-term market share.
Anyway, a serially released book called Bayesian methods in genetic epidemiology would have to use R, so it would not be a natural continuation of this blog. I will give myself a few months to think it over and if I have the energy, I will make a start on Bayesian methods in genetic epidemiology as a fresh blog in the New Year. Meantime this blog will continue, at least for a while, but perhaps with a posting every month rather than a posting every week.