{"id":266,"date":"2014-09-05T08:50:19","date_gmt":"2014-09-05T08:50:19","guid":{"rendered":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/?p=266"},"modified":"2025-02-26T13:21:38","modified_gmt":"2025-02-26T13:21:38","slug":"stata-and-advanced-statistical-methods","status":"publish","type":"post","link":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/2014\/09\/05\/stata-and-advanced-statistical-methods\/","title":{"rendered":"Stata and Advanced Statistical Methods"},"content":{"rendered":"<p>My recent postings have been\u00a0on Bayesian non-parametric analysis with Dirichlet processes and they have raised a couple of questions in my mind that I should now like to discuss.<\/p>\n<p><em><strong>Is Stata a suitable vehicle for advanced statistical analysis?<\/strong><\/em><br \/>\nand if so,<br \/>\n<em><strong>Who should do the programming?<\/strong><\/em><\/p>\n<p>Dirichlet processes (DPs) provide very flexible Bayesian priors with an infinite number of parameters and, as such, they are quite complex and require special algorithms. Despite this, they have been widely used and they are the subject of a lot of interesting\u00a0recent research.<\/p>\n<p>I tried to pitch my\u00a0postings at what I consider to be a typical Stata user; in my mind this means someone who is keen to use\u00a0the best\u00a0statistical models but who is not themselves developing new theory or designing new algorithms. So I wrote my postings to explain the principles of a DP and I took a very simple example to show how a Gibbs sampler written in Stata could be used to fit a DP model. My hope was that, if the reader understood the principles, then they would be in a better position to follow more\u00a0complex applications.<\/p>\n<p>What next? Should I provide Stata programs for other DP applications? Or would it be better to advise a Stata user interested in such advanced analyses to learn R, because R is the language used by most of the statisticians who are developing these methods? 
There is, for instance, a package in R called, rather boringly, DPpackage, that already contains a host of programs for different Bayesian analyses with DP priors.<\/p>\n<p>Before I get too negative, let me comment on a\u00a0few of the positives about using Stata. First, the interface is much better than R, better even than RStudio. Second, Stata is more uniformly reliable than R; R does not have the equivalent of StataCorp to produce new commands and so has to rely on its users to do all of the\u00a0applications programming. Next, Mata is fantastic and gives Stata a big advantage over many other statistics programs in that we can use it to produce fast compiled code in a fully\u00a0integrated way. So a suite of Mata programs for DP models would be fast and perfectly practical. Finally, Stata has a large number of users doing important applied work who would benefit from access to such advanced methods, but\u00a0these researchers\u00a0might never\u00a0use methods like DPs if it means transferring to R. Statistical programs have an important educational role\u00a0because they introduce non-specialist users to new methods of analysis.<\/p>\n<p>Despite these positives, there is no way that I have the time to duplicate all of the work that went into DPpackage in order to create an equivalent set of programs in Stata, and even if I did have the time, it is questionable whether this would be a good way forward.<\/p>\n<p>One alternative would be for StataCorp to take on this programming task. They have the resources and would, I\u2019m sure, do a fine job, but DP models are just one of a myriad of advanced statistical techniques and if they were to join the queue, they would have relatively low priority and we would probably be at Stata release 50 before they saw the light of day. As yet, Stata does not even have an integrated program for basic Bayesian analysis, which is why I wrote\u00a0the book on which this blog is based.<\/p>\n<p>So what is the way forward? 
Well, in my opinion, no statistics package can ever hope to offer everything and so it should not try. When I want to run a basic Bayesian analysis I might use WinBUGS or OpenBUGS, programs specifically written for that purpose and so, to facilitate that, I wrote a set of Stata commands that make it easy to use Stata to configure the data for WinBUGS and then call the program and read the results back into Stata. This model of working could be more widely used.<\/p>\n<p>In my day job, I analyse a lot of genetic data so I have\u00a0written a\u00a0Stata program that enables me to send an analysis to a program called PLINK and to read the results back into Stata. So let\u2019s not try to duplicate R\u2019s DPpackage in Stata; instead, let us write a program that creates an R job within Stata, sends it to R and then reads back the results. The key would then be to make this interface as smooth as possible.<\/p>\n<p>Under this\u00a0model of working, Stata becomes the control centre sending jobs to whatever software is most suitable for that particular computation.<\/p>\n<p>Can Stata talk to R? Certainly it can, I know because I regularly use this approach. The\u00a0limitation is that unless you can program in the R language, the interface between the two packages needs to do that work for you; effectively\u00a0you would need a Stata ado file that understands how to prepare the R script for the particular task that you want to perform.<\/p>\n<p>My applications are much simpler.\u00a0R can read Stata .dta files so I do not need to prepare special data files. I write a text file containing the R script because I know the R language and then\u00a0I call R from within Stata and I read the results into Stata\u00a0via a text file. 
I could, of course, just use R and drop Stata entirely (as sometimes I do), but I like working in Stata, particularly when I want to explore my data or summarize the results, and anyway, Stata is useful when I am collaborating with colleagues who do not know R.<\/p>\n<p>To help a Stata user who is not familiar with R,\u00a0I\u00a0could write a Stata ado file that understands the options in DPpackage and which creates\u00a0the desired R job and runs it. This would certainly be quicker and easier for me than it would be to re-write the whole DPpackage.<\/p>\n<p>If I could request two things that would really increase the scope of Stata, they would be: integrated methods for communicating with other statistical programs and some way of handling very large data sets. With these facilities we would not need to duplicate the code for every specialised application that takes our fancy. Perhaps though, this would be against the interests of StataCorp; after all, who would upgrade to Stata 15 if they already had access to all of its new features by calling equivalent functions available in R?<\/p>\n<p>Over\u00a0my next few postings I will take another example\u00a0that uses\u00a0a Dirichlet process prior and show how it can be analysed using Stata in combination with the DPpackage in R. Then I will draft a Stata ado file that automates the process.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>My recent postings have been\u00a0on Bayesian non-parametric analysis with Dirichlet processes and they have raised a couple of questions in my mind that I should now like to discuss. Is Stata a suitable vehicle for advanced statistical analysis? and if so, Who should do the programming? 
Dirichlet processes (DPs) provide very flexible Bayesian priors with [&hellip;]<\/p>\n","protected":false},"author":134,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[34,33,4],"class_list":["post-266","post","type-post","status-publish","format-standard","hentry","category-uncategorized","tag-advanced-statistical-methods","tag-r","tag-stata"],"_links":{"self":[{"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/posts\/266","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/users\/134"}],"replies":[{"embeddable":true,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/comments?post=266"}],"version-history":[{"count":8,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/posts\/266\/revisions"}],"predecessor-version":[{"id":292,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/posts\/266\/revisions\/292"}],"wp:attachment":[{"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/media?parent=266"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/categories?post=266"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/tags?post=266"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}