28 responses to “Stata vs R”

  1. Antoine Baldassari

    Hi John, interesting post. I am a genetic epidemiology PhD student at the University of North Carolina, and I am going through a similar transition from Stata/Mata to R, motivated by the excellent ggplot2 package, git integration and, as you mentioned, output control.

    With that being said, I am very glad I got started with Stata. Perhaps its greatest feature is the stellar quality of its help files (whoever is in charge of this at StataCorp deserves a raise, a yacht and a thank-you card). The manual provides excellent statistical overviews (the preface to their new Bayesian suite is a succinct yet effective introduction), and intuitively introduces commands so that newcomers can easily understand the tools they work with, and implement them without much frustration. Further, since all main packages are regulated by StataCorp, syntax and options are highly consistent across functions, allowing new users to quickly build intuition.

    When I got started with statistical computing (from mostly C), I also loved the clarity of Stata’s interface, helpful error codes, easy logging, and default settings encouraging smart practices (such as modular programming). While Stata lacked project-management tools at the time, this has now been rectified somewhat (better than out-of-the-box R in my opinion).

    I’ll end this by mentioning that SAS is a miserable piece of software maintained by what seem to be time-travelling FORTRAN enthusiasts. I rank SAS somewhere between paper-and-pencil and the Google calculator.

  2. Kredittkort

    I learned Stata in the economics course I took at the University of Essex, but I have since taught myself R, and I can’t for the love of God understand how anyone would think teaching Stata over R is a good idea. Outside academia, Stata is not widely used, at least not compared to R, in my experience. There also seems to be a trend, in the energy sector at least, toward gradually using R more. Being familiar and comfortable with R will thus be a key skill for many jobs in the very near future.

    PS. Great article, I found it on Google by searching for Stata VS R as I am involved in a Facebook argument over the matter. 😀

  3. Kristin

    I’m a SAS programmer who is trying to learn R because the healthcare agency I work for cannot afford SAS. But I’m finding R very difficult to learn from a book and thought about switching to Stata. I’m interested in something that’s easy to learn because I’ll need to teach the non-statistical people on my team to use it for basic data manipulation, freqs, etc. Would you recommend that I push forward with R and maybe it will eventually click with me? Or do you think Stata would be better since the others on my team are not well versed in programming and statistics?

  4. Noman Paracha

    I’m also slowly moving towards R and found datacamp.com extremely useful. They teach you online using RStudio, which is helpful. I still use Stata from time to time as I find the user interface a lot nicer and more intuitive. But I think, as mentioned above, the gap between R and Stata is increasing day by day and opting for R may be a better option for the future.

  5. Nigel

    I also came to this article through a ‘Stata vs. R’ web search, and it’s very enlightening, so thank you!

    I am currently trialling both of these (along with a few also-rans that have now fallen out of the race) for my, admittedly limited, post-hoc data analysis needs.

    The thing that I like about Stata is that, as a non-statistician, it is relatively easy to get results for simple analyses (once you get around the overcrowded menu system). But it is very much lacking in some of the post-hoc tests that I need.

    With R, on the other hand, I can run these tests and do so relatively confidently, with the hand-holding of RStudio, but do you think I can get my head around the complexity of something as simple as quantiles over several groups? It seems that R makes some of the most straightforward procedures almost willfully obtuse. And there are several ways of achieving what appears to be the same goal, but differently. In some respects I suspect that this might be a case of too many cooks and a lack of a true systematic approach to development.

    I just wish that Stata would support those couple of missing tests and I wouldn’t even have to think about R.

  6. Hypersphere

    Thank you for the thoughtful post. You have captured the essence of the R dilemma for those approaching R for the first time:

    “R on the other hand requires a lot of basic skills before you can do even the simplest analysis but comes into its own for more complex tasks.”

    Therein lies much of the barrier to learning R. Those who are not committed to getting to the later stages of using it for complex tasks tend to go away in frustration when they cannot figure out how to do the simple things. Thus, many people will fall back to using more limited menu-driven programs or hybrids like Stata.

    It would certainly be in the best interests of Stata and its customers to embrace R rather than continuing in its efforts to compete against it. I see a similar thing happening in the physical science world, where there is competition between Matlab and R and/or Matlab and Python. Whereas some scientific graphics and data visualization programs like Origin Pro have provided an interface to R, Matlab continues to assert its superiority — such hubris is surely self-destructive and ultimately not a good way to further the aims of the company or its clients.

  7. Tom

    Very interesting article (and ditto comments). I’m a student finishing off my degree in statistics and economics. I started with Stata when I was still in my econ bachelor’s, but now in my statistics classes it’s all about R.

    Another reason why I think that R will eventually “take over”, so to speak, is that not only fields of study or business but also academia and business in general are getting more and more intertwined. Some industries/fields are switching quickly to R, where new methods, techniques and packages are created. Once some of these methods trickle down to other fields, they will want to use R as well. R can be used for basically anything and will therefore have a much higher adoption and development rate.

    A final reason is the fact that the world is globalizing very fast and income differences are large. For us the Stata license fees are bothersome, but nothing to worry about. Someone doing research or business in a poor country, however, will not want to pay a year’s worth of food for some software.

    Once R is basically a staple in academia as well as in business, who will want to learn Stata?

  8. Rob

    Totally agree about the strengths/weakness comparison outlined here as a teacher of an upper level undergrad/masters econometrics class. I especially love the R Markdown-knitr-LaTeX-PDF capability. Historically I have been using Stata (for teaching only) and actively use Python and Matlab (and R to a lesser degree) in my own research.

    Seriously contemplated converting the class to R and began porting Stata code over to R. I found that the things economists love about Stata (easy clustering for standard errors, robust standard errors, marginal effects and elasticities) can be done in R, but only after lots of code and jumping through hoops. Of course, I found code snippets on the web for doing this, but it just works in Stata.

    While also probably a lot less computationally efficient than R, I find the Mata environment a more intuitive approach to linear algebra programming (which I require the students to do) than R.

    Not trying to start a flame war, because I really wanted to do this class in R, but thought it more important for students to focus on the conceptual material rather than peeling time away for coding. I have no doubts a student who knows the conceptual material and masters R will have better job market prospects than the same student who knows Stata.

  9. Lian Dee

    I was a molecular biologist in cancer genetics and have been using SPSS or GraphPad Prism for my statistical analysis in the past. Recently, I have gone into a career in medical education research, and SPSS/GraphPad Prism are not readily available in my organization. Therefore, I have started using RStudio but have been progressing very slowly. While considering whether to stick with RStudio or switch to Stata to learn and advance my statistical analysis, I came to this website through a search for ‘STATA vs R’. My struggle with R was that it took me a lot of time to find information on a single command to either tidy up my data or run one analysis. Currently, I am interested in learning CFA/PCA/SEM analysis to advance my current work in education research. Would you recommend that I stick with R or switch to Stata?

  10. Martin

    Actually, let’s hope that you are right and Stata disappears so I don’t have to learn another statistics package when I have already learned R.

  11. Carl

    Nice discussion above.

    I’ve been looking around for quantitative studies comparing different statistical software and haven’t found any. The best studies would specify some approximation of a population of tasks or uses of statistical software. Then it would randomly sample from that “population.” Then it would assess how many lines of code are necessary to perform the tasks (or menu selections, etc.) or other metrics of how long it takes to complete tasks. Best of all would be to produce some type of prediction model that would identify the likelihood that a given task would be more efficiently done with one software versus another. Or perhaps different areas of science have subtle differences (one has more interaction variables, one has more dummy variables, one has more time series, etc., some of these differences are widely known) which would imply one software is more efficient than another.

    Perhaps Stata is better for data management and R is better for statistical analysis. Perhaps different tasks are done better by one or the other.

    Even if there are a mere 100,000 people using statistical software for 100 hours a year, that’s 10,000,000 hours of work. If a rigorous, quantitatively based study could raise the efficiency with which these people work by just 1%, that’s a 100,000 hour gain, more than enough to justify the resources that would go into such a series of studies. The fact there are no such studies is a sign of a collective action problem / a system of incentives for individual researchers that is not conducive to maximizing the good.

    Instead of a few different statistical environments (R, Stata, etc.) perhaps there should be a large number of radically different environments, of which one individual would know 10 or so, because the different approaches are so well suited for different tasks. Again, it is perhaps a sign of a collective action problem that numerous such tools aren’t being developed, but then again, a lot of people are creating all kinds of packages in R.

    As far as how students react, etc. You have to live in the real world and make things work in the real world. But individuals’ subjective reaction to a software is not necessarily a good indication of whether paying the cost of learning the software (perhaps over “the medium term / five years”) is outweighed by the increased efficacy of the software over the lengthy amount of time the veteran statistician uses it. The impact of just a 1% rise in efficiency over a 30 year career is enormous, and may justify a “steep learning curve.”

    Thanks

  12. Amel Euldji

    I find this article so helpful, thank you Professor. I’m learning R programming and I wondered whether Stata might be better for learning, but now I’ll stay with R. When I have time, I’ll take a look at Stata; it won’t be a waste of time, of course ^^.

  13. Carl

    I’m the one who left the December 19, 2016 comment. Yes, it would be hard to quantify whether R or Stata is better. But quantification is inescapable. If I choose Stata over R, I’ve implicitly said that the utility of using Stata is greater than the utility of using R. That is a quantification. So the question is, how do we bring down measurement error? Casual assessments of utility are going to have more measurement error than a rigorous attempt at measuring utility. There should at least be recommendations for people based on more rigorous evidence given a specific type of task: if you want to do X, then Stata users did the task in 4.5 hours on average, R users in 6.0 hours on average, etc.

  14. Joe

    Thanks, John, for this article. As somebody starting out in statistics and needing to choose a software package to work with, I have found this very useful.

  15. John Le Quesne

    Dear John,

    I was delighted to stumble across your tackling of this question. I love Stata, and learned what little statistics I have managed to grasp with its help. I use it, in Leicester as it happens, mostly for fairly ‘standard’ survival analyses in patient data. But I am starting to feel major shortcomings, especially when trying to handle whole genome gene expression data from cell culture experiments, and when trying to make beautiful graphs.

    I think I will continue to use it for now for survival analyses, but I need to build up some skills in R. Where should I start? The tidyverse?

    BW,

    John
