Most of the spatial data I work with begins its life as a shapefile. While there are a number of tools available for dealing with shapefiles in R, it is often easier to work in dedicated geographic information system (GIS) software such as ArcMap, which is now almost exclusively oriented towards Python-based scripting. With a little help from Alex, I’ve managed to wrap my head around Python. The problem is that I now find myself running multiple scripts in multiple languages. When I’m writing code for personal consumption this isn’t really a problem; the worst that usually happens is that I run the scripts in the wrong order and have to start over. When it comes to providing code to others, however, I am wary of anything that might lead to unintended errors. Consequently, I began looking into ways in which I could better integrate the Python-based scripts I use to work with geographic data with the R-based scripts I use to handle data analysis.

Perhaps the most elegant solution is to use something like RPy. I started down this road while working at home on a MacBook Pro, only to have it all fall apart when I got to work, where I am on a Windows-based system that isn’t compatible with either RPy or RPy2. As it turns out, the system command in R served as a viable workaround. More specifically, instead of writing a single script in Python using some variant of RPy, I wrote a master script in R in which I use the system command to call a separate .py file, which generates shapefiles that can then be read and analyzed in R. This is basically a modification of a trick I’ve used in the past for organizing .tex files in my dissertation and .do files in Stata. The key difference is that so long as the scripts in question can be executed via the command line, it is relatively easy to use R to organize processes working across multiple platforms. I’d be interested to hear what other solutions people have come up with for dealing with this type of problem.
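As a rough sketch of what this master-script approach might look like (the file names, layer name, and the assumption that `python` is on the PATH are all hypothetical), something like the following would run the Python geoprocessing step and then pull the results back into R:

```r
# master.R -- hypothetical master script; file and layer names are illustrative.

# Step 1: run the Python geoprocessing script from within R.
# system() hands the string to the OS shell, so this assumes "python"
# is on the PATH and make_shapefiles.py sits in the working directory.
system("python make_shapefiles.py")

# Step 2: read the resulting shapefile into R. rgdal's readOGR() is
# one common option; dsn is the directory, layer the shapefile name
# without the .shp extension.
library(rgdal)
counties <- readOGR(dsn = ".", layer = "counties")

# Step 3: carry on with the analysis in R as usual.
summary(counties)
```

Because each step only depends on files written to disk, the same pattern extends to any command-line tool, not just Python.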

  • My system for handling multiple scripts is much less elegant: text files. On a day when I’m hacking away, I open a new file and keep a list of the files I’m using. E.g. collect_data.php, recode.sql, factor_analysis.r, etc. are listed in my file with some notes about what they do, and they’re listed in roughly the order they need to be run. I save the text file with the name of the project and the date so I can reference it again any time I forget what all the scripts are doing. Notice, though, that there’s no version tracking, and obviously the text file doesn’t actually “run” the scripts. I’m a luddite; I know.

    • Adam Slez

      Versioning is another good topic for discussion. Other than keeping a minimal .txt-based change log for programs I’ve posted on my site, I don’t do much beyond tacking dates onto file names. I’ll probably end up making the leap to GitHub eventually.

  • It looks like system2 has more cross-platform operability:
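    For instance (a sketch, with a hypothetical script name), system2 separates the command from its arguments instead of passing a single shell string, which sidesteps shell-quoting differences between platforms:

    ```r
    # system2() takes the command and its arguments separately, so R
    # handles the quoting rather than the platform's shell.
    out <- system2("python", args = "make_shapefiles.py",
                   stdout = TRUE, stderr = TRUE)

    # With stdout/stderr captured, the script's output can be
    # inspected (or logged) from within R:
    print(out)
    ```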

    I’m sort of paranoid about having one scripting language call another. I don’t know enough about R internals to say exactly what R does in this case – does it fork the R process and then run Python in the new address space? I really have no idea. It may not matter that much. For my own purposes I usually just write a bash script to execute the different processes sequentially and put in some conditionals if I change machines. Since I’m usually just alternating between different Linux machines, I don’t really worry about platform independence.

    • I’m also really into Makefiles…

    • Adam Slez

      I figured the bash option might make the multiplatform problem less of an issue. I have no idea about the details of what R is actually doing in this case either. It might be useful to have a more general discussion of R internals at some point in the near future. We all put a lot of faith in the assumption that these things just work, but that doesn’t always prove true. Plus, cribbing from internals is a good way to learn new tricks.