{"id":314,"date":"2014-09-26T08:31:42","date_gmt":"2014-09-26T08:31:42","guid":{"rendered":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/?p=314"},"modified":"2025-02-26T13:21:38","modified_gmt":"2025-02-26T13:21:38","slug":"using-r-with-stata-part-ii","status":"publish","type":"post","link":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/2014\/09\/26\/using-r-with-stata-part-ii\/","title":{"rendered":"Using R with Stata: Part II"},"content":{"rendered":"<p>This is\u00a0the second\u00a0in a series of posting about conducting advanced statistical analyses in Stata by sending a job to R and then reading the results back into Stata. Our task for testing this process is to use the R package, DPpackage, to fit a Bayesian Dirichlet process mixture (DPM) model for smoothing a scatter plot. The data that we are using consist of 298 IgG measurements made on children of different ages.<\/p>\n<p>In <a href=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/2014\/09\/19\/using-r-with-stata-part-i\/\">my last posting<\/a> I described the R code needed to fit the model and this time\u00a0I will consider how that code can be submitted and run without leaving Stata.<\/p>\n<p>Let us first assume that we have a text file called DPC.R than contains the R script. Repeating what we had last time, the file will contain,<\/p>\n<p><span style=\"color: #0000ff\">setwd(&#8216;E:\/IGG&#8217;)<\/span><br \/>\n<span style=\"color: #0000ff\"> require(foreign)<\/span><br \/>\n<span style=\"color: #0000ff\"> require(DPpackage)<\/span><br \/>\n<span style=\"color: #0000ff\"> D &lt;- read.dta(&#8216;igg.dta&#8217;)<\/span><br \/>\n<span style=\"color: #0000ff\"> zbar &lt;- c(7,3)<\/span><br \/>\n<span style=\"color: #0000ff\"> Sigma &lt;- matrix(c(4,1,1,1),2,2)<\/span><br \/>\n<span style=\"color: #0000ff\"> MyPrior &lt;- list(alpha=5, k0=0.25, nu1=10, m1=zbar, psiinv1=7*solve(Sigma))<\/span><br \/>\n<span style=\"color: #0000ff\"> MyDesign &lt;- list(nburn=5000, nsave=5000, nskip=3, ndisplay=1000)<\/span><br \/>\n<span style=\"color: #0000ff\"> xp \u2190 c(1,2,3,4,5)<\/span><br \/>\n<span style=\"color: #0000ff\"> fitIgG &lt;- DPcdensity(y=D$igg, x=D$age, xpred=xp,ngrid=100,<\/span><br \/>\n<span style=\"color: #0000ff\"> compute.band=TRUE, type.band=&#8221;HPD&#8221;, prior=MyPrior, mcmc=MyDesign,<\/span><br \/>\n<span style=\"color: #0000ff\"> state=NULL, status=TRUE)<\/span><br \/>\n<span style=\"color: #0000ff\"> nClu &lt;- as.data.frame(fitIgG$save.state$thetasave[,7])<\/span><br \/>\n<span style=\"color: #0000ff\"> names(nClu) &lt;- &#8220;nclus&#8221;<\/span><br \/>\n<span style=\"color: #0000ff\"> write.dta(nClu,file=&#8217;nclusters.dta&#8217;)<\/span><br \/>\n<span style=\"color: #0000ff\"> cDen &lt;- as.data.frame(cbind(fitIgG$grid,t(rbind(fitIgG$densp.m,<\/span><br \/>\n<span style=\"color: #0000ff\"> fitIgG$densp.l,fitIgG$densp.h))))<\/span><br \/>\n<span style=\"color: #0000ff\"> names(cDen) &lt;- c(&#8220;IgG&#8221;,&#8221;m1&#8243;,&#8221;m2&#8243;,&#8221;m3&#8243;,\u201dm4\u201d,\u201dm5\u201d,<\/span><br \/>\n<span style=\"color: #0000ff\"> &#8220;l1&#8243;,&#8221;l2&#8243;,&#8221;l3&#8243;,&#8221;l4&#8243;,&#8221;l5&#8243;,&#8221;h1&#8243;,&#8221;h2&#8243;,&#8221;h3&#8243;,&#8221;h4&#8243;,&#8221;h5&#8221;)<\/span><br \/>\n<span style=\"color: #0000ff\"> write.dta(cDen,file=&#8217;cDensity.dta&#8217;)<\/span><\/p>\n<p>Having created this text file, we can send it to R using Stata&#8217;s shell command. For instance, if we were to type the Stata command,<br \/>\n<span style=\"color: #0000ff\">. shell notepad<\/span><\/p>\n<p>Stata would be suspended while the operating system opens the notepad text editor. Stata would become active again when notepad is closed.<\/p>\n<p>When R is installed from <a href=\"http:\/\/www.r-project.org\/\">http:\/\/www.r-project.org\/<\/a> (R is free),\u00a0a bin directory\u00a0 is created that will include a file called R.exe. Within Stata, we can run\u00a0the R script from file DPC.R if we type,<br \/>\n<span style=\"color: #0000ff\">. shell <em>path<\/em>\/R.exe CMD BATCH DPC.R log.txt<\/span><\/p>\n<p>where the word <em>path<\/em> is replaced by the full path to R.exe on your computer. This command will suspend Stata while R runs the script as a batch job (it might take a few minutes). R sends its output, including any error messages to the log file. We can subsequently inspect this log file within Stata using,<br \/>\n<span style=\"color: #0000ff\">. type log.txt<\/span><\/p>\n<p>Now I run several different programs in this way and it is tedious to have to remember the path to each program so I store them in a file called <strong>executables.txt<\/strong>, which is saved in my PERSONAL directory.<\/p>\n<p>Here is a listing of the first 5 lines of my file executables.txt.<\/p>\n<p><span style=\"color: #0000ff\">PLINK,&#8221;C:\\Software\\PLINK\\plink.exe&#8221;<\/span><br \/>\n<span style=\"color: #0000ff\"> FBAT,&#8221;C:\\Software\\fbat\\fbat202C_win\\fbat.exe&#8221;<\/span><br \/>\n<span style=\"color: #0000ff\"> R,&#8221;C:\\Program Files\\R\\R-3.0.2\\bin\\x64\\R.exe&#8221;<\/span><br \/>\n<span style=\"color: #0000ff\"> WINBUGS,&#8221;C:\\Software\\WinBUGS14\\WinBUGS14.exe&#8221;<\/span><br \/>\n<span style=\"color: #0000ff\"> OPENBUGS,&#8221;C:\\Software\\OpenBUGS\\OpenBUGS.exe&#8221;<\/span><\/p>\n<p>The file contains the name of each program that I use, a comma, and then the path.<\/p>\n<p>Matched to this file I have an ado file called <strong>execute.ado<\/strong> that reads the required path from this file and runs the executable. The contents of execute.ado are<\/p>\n<p><span style=\"color: #0000ff\">program execute<\/span><br \/>\n<span style=\"color: #0000ff\"> gettoken PG CD : 0<\/span><br \/>\n<span style=\"color: #0000ff\"> local personal : sysdir PERSONAL<\/span><br \/>\n<span style=\"color: #0000ff\"> tempname fh<\/span><br \/>\n<span style=\"color: #0000ff\"> file open `fh&#8217; using `personal&#8217;executables.txt ,read<\/span><br \/>\n<span style=\"color: #0000ff\"> file read `fh&#8217; inLine<\/span><br \/>\n<span style=\"color: #0000ff\"> while !r(eof) {<\/span><br \/>\n<span style=\"color: #0000ff\"> tokenize `&#8221;`inLine'&#8221;&#8216;, parse(&#8220;,&#8221;)<\/span><br \/>\n<span style=\"color: #0000ff\"> if upper(&#8220;`1&#8242;&#8221;) == &#8220;`PG'&#8221; local executable &#8220;`3&#8242;&#8221;<\/span><br \/>\n<span style=\"color: #0000ff\"> file read `fh&#8217; inLine<\/span><br \/>\n<span style=\"color: #0000ff\"> }<\/span><br \/>\n<span style=\"color: #0000ff\"> file close `fh&#8217;<\/span><br \/>\n<span style=\"color: #0000ff\"> confirm file &#8220;`executable'&#8221;<\/span><br \/>\n<span style=\"color: #0000ff\"> shell &#8220;`executable'&#8221; `CD&#8217;<\/span><br \/>\n<span style=\"color: #0000ff\"> end<\/span><\/p>\n<p>So we could give Stata the command<br \/>\n<span style=\"color: #0000ff\">. execute R CMD BATCH DPC.R log.txt<\/span><br \/>\nand execute.ado would locate the appropriate path for R and \u2018shell\u2019 it together with the remainder of the command line.<\/p>\n<p>Our R script not only runs DPcdensity but it returns some of the output as Stata .dta files so after it has run we might plot a histogram of the number of cluster in each of the 5,000 stored iterations of the sampler.<\/p>\n<p><span style=\"color: #0000ff\">.\u00a0use nclusters.dta, clear<\/span><br \/>\n<span style=\"color: #0000ff\">. histogram nclus , start(0.5) width(1) freq title(Number of Clusters)<\/span><\/p>\n<p><a href=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2014\/09\/r2fig1.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter  wp-image-318\" src=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2014\/09\/r2fig1.png\" alt=\"r2fig1\" width=\"378\" height=\"277\" srcset=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2014\/09\/r2fig1.png 538w, https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2014\/09\/r2fig1-300x219.png 300w\" sizes=\"auto, (max-width: 378px) 100vw, 378px\" \/><\/a><\/p>\n<p>Alternatively we could plot the conditional distribution at age 5 years using,<\/p>\n<p><span style=\"color: #0000ff\">.\u00a0use cDensity.dta, clear<\/span><br \/>\n<span style=\"color: #0000ff\">. twoway (line m5 igg , lpat(solid) lcol(red)) \/\/\/<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 (line l5 igg , lpat(dash) lcol(black)) \/\/\/<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 (line h5 igg , lpat(dash) lcol(black)), \/\/\/<\/span><br \/>\n<span style=\"color: #0000ff\">\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 leg(off) title(Conditional distribution of IgG) ytitle(f(IgG|Age=5))<\/span><\/p>\n<p><a href=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2014\/09\/r2fig3.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter  wp-image-319\" src=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2014\/09\/r2fig3.png\" alt=\"r2fig3\" width=\"385\" height=\"282\" srcset=\"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2014\/09\/r2fig3.png 538w, https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/files\/2014\/09\/r2fig3-300x219.png 300w\" sizes=\"auto, (max-width: 385px) 100vw, 385px\" \/><\/a><\/p>\n<p>This method can be adapted for any R analysis but it has one severe limitation. To use it, you must be able to program in R. This is fine for my own work, but is not very helpful if I am collaborating with someone who is not an experienced R programmer.<\/p>\n<p>Next time I will describe how I make the facilities of R available within Stata for people who do not know how to program R. The approach involves writing a Stata ado file that creates the R script as well as running it. In this way a Stata user only needs to install R on their computer, they do not need to know anything of its programming language.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This is\u00a0the second\u00a0in a series of posting about conducting advanced statistical analyses in Stata by sending a job to R and then reading the results back into Stata. Our task for testing this process is to use the R package, DPpackage, to fit a Bayesian Dirichlet process mixture (DPM) model for smoothing a scatter plot. [&hellip;]<\/p>\n","protected":false},"author":134,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[37],"class_list":["post-314","post","type-post","status-publish","format-standard","hentry","category-uncategorized","tag-r-stata-shell-executables-txt"],"_links":{"self":[{"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/posts\/314","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/users\/134"}],"replies":[{"embeddable":true,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/comments?post=314"}],"version-history":[{"count":8,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/posts\/314\/revisions"}],"predecessor-version":[{"id":338,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/posts\/314\/revisions\/338"}],"wp:attachment":[{"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/media?parent=314"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/categories?post=314"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/staffblogs.le.ac.uk\/bayeswithstata\/wp-json\/wp\/v2\/tags?post=314"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}