{"id":323,"date":"2014-10-10T08:51:00","date_gmt":"2014-10-10T08:51:00","guid":{"rendered":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/?p=323"},"modified":"2025-02-26T13:21:38","modified_gmt":"2025-02-26T13:21:38","slug":"using-r-with-stata-part-iv","status":"publish","type":"post","link":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/2014\/10\/10\/using-r-with-stata-part-iv\/","title":{"rendered":"Using R with Stata: Part IV"},"content":{"rendered":"<p>This is the last in a series of posting about using conducting advanced statistical analyses in Stata by sending a job to R and then reading the results back into Stata. I have spent a while on this topic and although I believe it to be very important, next time I want to return to discussing simpler matters.<\/p>\n<p>Our task for illustrating the process has been to use\u00a0an R package called DPpackage to fit a Bayesian Dirichlet process mixture (DPM) model for smoothing a scatter plot. The dataset for demonstrating the code contains 298 IgG measurements made on children of different ages.<\/p>\n<p>So far we have seen how to write a Stata ado file that prepares an R script, sends it to R and then reads back the results. To simplify the task, up until now\u00a0I have supposed that the R script includes R commands for packaging the results into Stata .dta files so that reading the results into Stata is simple.<\/p>\n<p>Many R packages create very complicated list structures of their results and so R code for extraction and re-packaging for export to Stata is not trivial. For this reason I usually adopt a different approach; I dump the entire results structure into a text file and then I use Stata ado files to read the text file and extract any elements that I want. This makes my R scripts simpler and it means that I have access to all of the results in Stata and I am not limited to those exported by the R script.<\/p>\n<p>First let\u2019s see how we dump the results to a text file. Here is the R template that I developed last time but with the lines that extract the results replaced by new code for dumping the results structure (in green).<\/p>\n<p><span style=\"color: #0000ff\">setwd(&#8216;@&#8217;)<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0<span style=\"color: #0000ff\">if( !require(&#8216;foreign&#8217;) ) {<br \/>\ninstall.packages(&#8216;foreign&#8217;,repos=&#8217;http:\/\/cran.ma.imperial.ac.uk\/&#8217;,dep=TRUE)<br \/>\nif( !require(&#8216;foreign&#8217;) ) stop() }<br \/>\nif( !require(&#8216;DPpackage&#8217;) ) {<br \/>\ninstall.packages(&#8216;DPpackage&#8217;,repos=&#8217;http:\/\/cran.ma.imperial.ac.uk\/&#8217;,dep=TRUE)<br \/>\nif( !require(&#8216;DPpackage&#8217;) ) stop() }<\/span><\/span><br \/>\n<span style=\"color: #0000ff\">D &lt;- read.dta(&#8216;@&#8217;)<\/span><br \/>\n<span style=\"color: #0000ff\"> zbar &lt;- c(@)<\/span><br \/>\n<span style=\"color: #0000ff\"> Sigma &lt;- matrix(c(@),2,2)<\/span><br \/>\n<span style=\"color: #0000ff\"> MyPrior &lt;- list(alpha=@, k0=@, nu1=@, m1=zbar, psiinv1=@*solve(Sigma))<\/span><br \/>\n<span style=\"color: #0000ff\"> MyDesign &lt;- list(nburn=@, nsave=@, nskip=@, ndisplay=1000)<\/span><br \/>\n<span style=\"color: #0000ff\"> xp &lt;- c(@)<\/span><br \/>\n<span style=\"color: #0000ff\"> fitIgG &lt;- DPcdensity(y=D\u00a3@, x=D\u00a3@, xpred=xp, ngrid=@,<\/span><br \/>\n<span style=\"color: #0000ff\"> compute.band=TRUE, type.band=&#8221;HPD&#8221;, prior=MyPrior, mcmc=MyDesign,<\/span><br \/>\n<span style=\"color: #0000ff\"> state=NULL, status=TRUE)<\/span><br \/>\n<span style=\"color: #008000\">sink(&#8216;@&#8217;)<\/span><br \/>\n<span style=\"color: #008000\"> dput(fitIgG)<\/span><br \/>\n<span style=\"color: #008000\"> sink()<\/span><\/p>\n<p>In R, the sink()\u00a0function redirects output to a file, the name of which will be specified later as it is represented by an @ in the template. Then I write out the\u00a0list structure of results\u00a0using the dput()\u00a0function and finally I redirect the output back to the log file. This approach makes for much simpler R code and places the emphasis of the programming where it belongs, in Stata.<\/p>\n<p>This\u00a0code saves the entire set of results to a text file, but it is not a text file that is easy to use because the structure of results is quite complex. To illustrate this here are the first few lines of the results created by this particular R script.<\/p>\n<p><span style=\"color: #0000ff\">structure(list(call = quote(DPcdensity.default(y = D$igg, x = D$age,<\/span><br \/>\n<span style=\"color: #0000ff\"> xpred = xpred, ngrid = 50, compute.band = TRUE, type.band = &#8220;PD&#8221;,<\/span><br \/>\n<span style=\"color: #0000ff\"> prior = prior, mcmc = mcmc, state = NULL, status = TRUE)),<\/span><br \/>\n<span style=\"color: #0000ff\"> y = c(1.5, 2.70000004768372, 1.89999997615814, 4, 1.89999997615814,<\/span><br \/>\n<span style=\"color: #0000ff\"> 4.40000009536743, 1.5, 2.20000004768372, 1.60000002384186,<\/span><br \/>\n<span style=\"color: #0000ff\"> 4.69999980926514, 1.60000002384186, 1.39999997615814, 3,<\/span><\/p>\n<p>The full text file has 7640 lines.<\/p>\n<p>To enable me to extract data from such files I have written two Stata ado files. One called <strong>rdescribe.ado<\/strong> that reads dumped results and lists all of the items that could be extracted and one called <strong>rextract.ado<\/strong> that extracts a requested item.<\/p>\n<p>Let\u2019s suppose that I dumped my DPcdensity results to a file called fitigg.txt. The Stata command<\/p>\n<p><span style=\"color: #0000ff\">.\u00a0rdescribe using fitigg.txt<\/span><br \/>\nwould list the names of the items in the results file but not their values. For this example the list is quite long, so I just give the first few lines.<\/p>\n<p><span style=\"color: #0000ff\">Structures and sub-structures within file: fitigg.txt<\/span><br \/>\n<span style=\"color: #0000ff\">call \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 (string)<\/span><br \/>\n<span style=\"color: #0000ff\">y\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 (single variable)<\/span><br \/>\n<span style=\"color: #0000ff\">coefficients\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 (matrix or set of variables)<\/span><br \/>\n<span style=\"color: #0000ff\">varnames \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 (single variable)<\/span><br \/>\n<span style=\"color: #0000ff\">modelname \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 (single value)<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 cpo \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 (single variable)<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 fso \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 (single variable)<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 prior\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0(list)<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 alpha\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0(single value)<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 k0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0(single value)<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 m1 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 (single variable)<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 nu1 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 (single value)<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 psiinv1 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 (matrix or set of variables)<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Dim: 2, 2<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 mcmc \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 (list)<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 nburn\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0(single value)<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 nsave\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0(single value)<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 nskip\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0(single value)<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 ndisplay \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 (single value)<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 state \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 (list)<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 alpha \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 (single value)<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 m1 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 (matrix or set of variables)<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Dim: 2, 1<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 muclus \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 (matrix or set of variables)<\/span><br \/>\n<span style=\"color: #0000ff\"> \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 Dim: 398, 2<\/span><\/p>\n<p>We see that the results include an item called <strong>call<\/strong>, which is a string containing the R command that was executed, next is an item containing the values of the y variable and so it goes on.<\/p>\n<p>The results structure called <strong>prior<\/strong> contains the priors that I chose and is itself a list including items such as <strong>alpha<\/strong> and <strong>k0<\/strong>. In fact you will see that there are two items called <strong>alpha<\/strong>, one in <strong>prior<\/strong> and another in <strong>state<\/strong> (state gives the contents of the sampler at the end of the run and can be used to continue the chain if desired).<\/p>\n<p>Once we know what is in the results file we can use <strong>rextract.ado<\/strong> to extract any of the items. For instance,<br \/>\n<span style=\"color: #0000ff\">\u00a0. rextract using iggresults.txt , item(y) clear<\/span><br \/>\nWill clear any existing data from Stata\u2019s memory and extract y into a Stata variable with that name.<\/p>\n<p><span style=\"color: #0000ff\">.\u00a0rextract using iggresults.txt , item(prior(alpha))<\/span><br \/>\nExtracts the value of alpha (a single number) from the list called prior and returns it as the local r(prior_alpha).<\/p>\n<p><span style=\"color: #0000ff\">.\u00a0rextract using iggresults.txt , item(state(muclus)) matrix<\/span><br \/>\nExtracts muclus (cluster means at the end of the run) and stores them in a 398&#215;2 Stata matrix called state_mclus.<\/p>\n<p><span style=\"color: #0000ff\">.\u00a0rextract using iggresults.txt , item(state(muclus)) clear<\/span><br \/>\nAlso extracts muclus but this time stores the results in two Stata variables called state_muclus_1 and state_muclus_2.<\/p>\n<p>So there you have it. There is no need to program advanced statistical analyses in Stata or even in Mata. Instead we only\u00a0need to prepare a conversion program that enables a Stata user to select their options and that then prepares the script and sends it to R. In this way any Stata user can have access to these advanced analyses even if they have no idea how to program in R.<\/p>\n<p>The code for <strong>rdescribe.ado<\/strong> and <strong>rextract.ado<\/strong> are contained in the pdf <a href=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2014\/10\/rdescribe_rextract.pdf\">rdescribe_rextract<\/a>\u00a0that you can download from here and then copy and paste into separate ado files. The programs were prepared for personal use rather than general release, but you should not have a problem getting them to work.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This is the last in a series of posting about using conducting advanced statistical analyses in Stata by sending a job to R and then reading the results back into Stata. I have spent a while on this topic and although I believe it to be very important, next time I want to return to [&hellip;]<\/p>\n","protected":false},"author":134,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[33,4],"class_list":["post-323","post","type-post","status-publish","format-standard","hentry","category-uncategorized","tag-r","tag-stata"],"_links":{"self":[{"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/posts\/323","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/users\/134"}],"replies":[{"embeddable":true,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/comments?post=323"}],"version-history":[{"count":9,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/posts\/323\/revisions"}],"predecessor-version":[{"id":363,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/posts\/323\/revisions\/363"}],"wp:attachment":[{"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/media?parent=323"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/categories?post=323"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/tags?post=323"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}