This week I want to install Stan on a Windows computer and check the installation independently of Stata, so that next week we can move on to controlling Stan from within Stata.
Stan comes in several flavours; for instance, there is an R package that controls Stan, but for us the important version is the stand-alone program cmdstan, which is driven from the command line.
As I mentioned last time, all versions of Stan work by creating a C++ program, compiling and linking it, and then running the program directly through the operating system. Stan is, at heart, a Linux program and relies on a Linux C++ compiler called gcc. So first we need a way of getting gcc to run under Windows. Working in a Linux style on a Windows computer is a common requirement and there is a comprehensive way of doing it using a program called Cygwin. We do not need all of the facilities of Cygwin, so I will opt for a simpler solution based on another free program called Rtools.
Rtools is intended for R users who want to create their own packages, much as Stata users create ado files. Developers of R packages can write part of their code in C++, just as Stata developers can write part of their code in Mata, so Rtools includes gcc and all of the other components that we are going to need.
In what follows, I will assume that you are logged on to your computer as an administrator, because you will need permission to create folders and modify the PATH. In my experience the installation of Stan was completely straightforward when I tried it on my home computer, but it was much more troublesome when I tried to do it on my work computer, which is linked to a University network and which has all sorts of security systems in place.
Rtools can be downloaded from http://cran.r-project.org/bin/windows/Rtools/. Currently, the most up-to-date frozen version is Rtools32.exe, and if you run this program it will install Rtools onto your computer. I have a folder called C:/Software and so I installed Rtools into C:/Software/Rtools; if you bother to look, you will find that it contains a folder called gcc-4.6.3 that holds our compiler. During the installation you will be asked if you want to modify the Windows PATH. This is important because the PATH controls the locations where Windows will look when you ask for gcc; if you do not change the PATH then Windows will not find your C++ compiler.
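If you happen to have Python on the machine, one quick way of confirming that the PATH change took effect is to ask where gcc and make would be found. This is only a sketch and not part of the Stan or Rtools instructions; the function name find_tools is my own invention, and it simply reports whatever the PATH resolves to.

```python
import shutil


def find_tools(names=("gcc", "make")):
    """Map each tool name to the full path found on the PATH, or None if absent."""
    # shutil.which searches the PATH in the same way the command prompt would.
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, location in find_tools().items():
        print(f"{name}: {location or 'NOT FOUND on PATH'}")
```

If either tool is reported as not found, the most likely explanation is that the PATH was not modified during the Rtools installation, or that the command prompt was opened before the change was made.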
The next step is to install Stan. You can obtain cmdstan from the Stan Home Page at http://mc-stan.org/. The current version is called CmdStan v2.6.2. On Windows it is simplest to download the zipped version and then to unpack it. I chose to unpack CmdStan into C:/Software/cmdstan-2.6.2, which, after unpacking, contained folders such as doc, examples and make.
The unpacked version of cmdstan is not yet ready to run; we still need to generate the Stan binary. This step will test whether your installation of Rtools worked, because we will need a program called make that is part of Rtools.
Open a Windows command prompt. Where this is located will depend on your version of Windows; it used to be under Accessories, but on Windows 8 it is available when you right click the icon in the extreme bottom left corner. The command prompt enables you to issue commands directly to the operating system without the irritating need to click buttons or icons.
First, move to the cmdstan folder using the change directory (cd) command. On my computer I needed to type
> cd c:/software/cmdstan-2.6.2
into the command prompt window. Now we issue the command to make the Stan binary. My computer has 4 cores but I decided that Stan would only be allowed to use 2 of them in case I wanted to do other things at the same time. The option -j controls the number of cores that you allow, so I typed,
> make build -j2
Then you sit back while cmdstan is created. The manual warns that this might take 10 minutes but on my computer it took under 1 minute.
This completes the installation. If it works, then it is very quick. If it doesn’t work, then it is probably a question of permissions.
Now we need to test the installation. The folder cmdstan-2.6.2 contains a subfolder called examples, within which you will find another folder called bernoulli, and within that there are two files called bernoulli.stan and bernoulli.data.r. These files contain the model and data for fitting a Bernoulli model with parameter (probability) theta to 10 observations 0,1,0,0,0,0,0,0,0,1. As we have two successes in ten trials, we expect theta to be around 0.2. The prior on theta is beta(1,1), that is to say it is flat between 0 and 1. We will dissect the contents of these files on another occasion, but for the time being we will just accept them and get them to run.
Still using the command prompt and sitting in the cmdstan-2.6.2 folder, issue the command
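Because the beta prior is conjugate to the Bernoulli likelihood, we can work out in advance what Stan ought to tell us. A beta(a,b) prior combined with s successes in n trials gives a beta(a+s, b+n-s) posterior. The short Python sketch below does that arithmetic for the ten observations quoted above; the function names are mine and are there purely for illustration.

```python
from fractions import Fraction


def beta_posterior(successes, trials, prior_a=1, prior_b=1):
    """Conjugate update: beta(a, b) prior plus Bernoulli data gives a beta posterior."""
    return prior_a + successes, prior_b + (trials - successes)


def beta_mean(a, b):
    """Mean of a beta(a, b) distribution."""
    return Fraction(a, a + b)


def beta_mode(a, b):
    """Mode of a beta(a, b) distribution (valid when a > 1 and b > 1)."""
    return Fraction(a - 1, a + b - 2)


data = [0, 1, 0, 0, 0, 0, 0, 0, 0, 1]
a, b = beta_posterior(sum(data), len(data))
print(a, b, beta_mean(a, b), beta_mode(a, b))  # 3 9 1/4 1/5
```

So the exact posterior is beta(3,9), with its mode at 0.2 and its mean at 0.25. When we average the simulated values of theta we should therefore expect something close to 0.25 rather than exactly 0.2.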
> make c:/software/cmdstan-2.6.2/examples/bernoulli/bernoulli.exe
where, of course, you adjust the file path to suit your configuration. It is important that the path uses / in the Linux style; the Windows option of using \ will not work.
This command tells Stan to take the model file bernoulli.stan, convert it into C++ code, compile it and link it with the necessary Stan modules to create a complete program called bernoulli.exe. This executable will be saved in the same folder as bernoulli.stan.
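If you copy the location of the model file from Windows Explorer, it will contain backslashes, and it is easy to forget to change them. As a small illustration of the conversion, here is a Python sketch that turns the Windows path of a .stan file into the forward-slash .exe target that we pass to make. The helper name make_target is hypothetical; it only manipulates the text of the path and does not run anything.

```python
from pathlib import PureWindowsPath


def make_target(stan_file):
    """Convert the Windows path of a .stan file into a forward-slash .exe target."""
    # with_suffix swaps .stan for .exe; as_posix writes the path with / separators.
    return PureWindowsPath(stan_file).with_suffix(".exe").as_posix()


print(make_target(r"c:\software\cmdstan-2.6.2\examples\bernoulli\bernoulli.stan"))
```

The printed string is what would follow the word make in the command given above.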
Now move to the folder containing the executable,
> cd c:/software/cmdstan-2.6.2/examples/bernoulli
If you want to check that bernoulli.exe is there, you can list the contents of the folder, for instance with the Windows dir command.
Now we need to run the analysis. We are going to accept all of the defaults so all we have to do is provide the data. The command is
> bernoulli.exe sample data file=bernoulli.data.r
This is a one-parameter model on a tiny data set and the default only asks for a burn-in of 1,000 followed by 1,000 samples, so it is quick: less than 0.1 seconds. The output is sent to a file called output.csv in the same folder as the executable. You can open the output file with Excel or OpenOffice. Inside you will find 1,000 simulated values of theta, but irritatingly there is quite a lot else, which makes the file awkward to read.
Next time we will start the process of controlling Stan from within Stata. Eventually, the commands that we have issued within the command prompt window will become our script file, and we will need a way of extracting the values of theta from output.csv so that they can be read into Stata.
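Much of the clutter in output.csv consists of comment lines that begin with #, some of which appear in the middle of the data. To show that the values of theta are easy enough to recover, here is a rough Python sketch that skips those lines and pulls out a single column. The function name read_stan_column is my own, and the miniature file in the example is made up for illustration; it is not real Stan output.

```python
import csv
import io


def read_stan_column(lines, column="theta"):
    """Return one column, as floats, from CmdStan-style CSV lines."""
    # Discard blank lines and the comment lines that start with '#'.
    rows = (ln for ln in lines if ln.strip() and not ln.lstrip().startswith("#"))
    # The first remaining line is the header naming the columns.
    reader = csv.DictReader(rows)
    return [float(row[column]) for row in reader]


# A made-up miniature file with the same layout (NOT real Stan output).
sample = """# model = bernoulli_model
lp__,accept_stat__,theta
-7.0,0.98,0.21
# a comment in the middle of the data
-6.8,1.0,0.25
-7.3,0.91,0.12
"""

values = read_stan_column(io.StringIO(sample))
print(len(values), sum(values) / len(values))
```

To use it on the real file, you would open output.csv and pass the open file to read_stan_column in place of the io.StringIO object; the mean of the returned values is then the estimate of the posterior mean of theta.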