This was posted at R-bloggers last night, after the men’s 100-meter Olympic event was over. Markus Gesmann predicted Usain Bolt’s 9.63-second result to within 0.05 seconds. Even better, he did it using a simple log-linear model that didn’t control for any other factors.

Check the original article at R-bloggers, which talks more about the progression of faster running times and includes the R code used.

Last month I had the wonderful opportunity to help instruct an intensive eight-day workshop on social network analysis. Affiliated with the Sociology of Education and Science Laboratory at the Higher School of Economics in Saint Petersburg, the workshop sought to recreate the atmosphere of ICPSR summer courses. It was the first workshop of its kind in Russia to offer social networks training as a summer methods course.

Most of the spatial data I work with begins its life as a shapefile. While there are a number of tools available for dealing with shapefiles in R, it is often easier to work in dedicated geographic information system (GIS) software such as ArcMap, which is now almost exclusively oriented towards Python-based scripting. With a little help from Alex, I’ve managed to get my head wrapped around Python. The problem is that I now find myself running multiple scripts in multiple languages. When I’m writing code for personal consumption this isn’t really a problem. The worst that usually happens is that I end up running the scripts in the wrong order and have to start over. When it comes to providing code to others, however, I am wary of anything that might lead to unintended errors. Consequently, I began looking into ways in which I could better integrate the Python-based scripts I use to work with geographic data with the R-based scripts I use to handle data analysis.

Perhaps the most elegant solution is to use something like RPy. I started down this road while working at home on a MacBook Pro, only to have it all fall apart when I got to work, where I am on a Windows-based system that isn’t compatible with either rpy or rpy2. As it turns out, the system command in R served as a viable workaround. More specifically, instead of writing a single script in Python using some variant of RPy, I wrote a master script in R in which I use the system command to call a separate .py file, which generates shapefiles that can then be read and analyzed in R. This is basically a modification of a trick I’ve used in the past for organizing .tex files in my dissertation and .do files in Stata. The key difference is that so long as the scripts in question can be executed via the command line, it is relatively easy to use R to organize processes working across multiple platforms. I’d be interested to hear what other solutions people have come up with for dealing with this type of problem.
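
The master-script pattern can also be sketched in Python. This is only an analogue of the idea, not the R script from the post: it assumes every step can be run from the command line, and the two steps shown are placeholder commands standing in for a shapefile-building script and an analysis script. The function name run_pipeline and the step labels are made up for illustration.

```python
import subprocess
import sys


def run_pipeline(steps):
    """Run each (label, command) pair in order and stop at the first failure."""
    completed = []
    for label, command in steps:
        # capture_output keeps each step's messages so they can be reported
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode != 0:
            raise RuntimeError(f"step '{label}' failed: {result.stderr.strip()}")
        completed.append((label, result.stdout.strip()))
    return completed


# Placeholder commands (hypothetical): real use would call the actual
# scripts instead, for example a .py file and then an R script via Rscript.
steps = [
    ("build shapefiles", [sys.executable, "-c", "print('shapefiles written')"]),
    ("run analysis", [sys.executable, "-c", "print('analysis done')"]),
]

if __name__ == "__main__":
    for label, output in run_pipeline(steps):
        print(f"{label}: {output}")
```

Because the order of the steps lives in one list and the run stops as soon as a step exits with an error, anyone receiving the code only has to launch a single script.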

In a previous post I made a reference to the estimation of equilibrium effects in the context of a spatial lag model. This is a question which has received surprisingly little attention given that the standard approach to interpreting parameter estimates is generally inapplicable in this setting. The problem is that in a spatial lag model, the effect of any given variable depends on the structure of geographic relationships in the underlying data. To the extent that these relationships vary across observations, the relationship between some independent variable x and some dependent variable y varies across observations as well.
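
To make the dependence on geographic structure concrete, here is the textbook form of a spatial lag model. The notation is assumed here rather than taken from the post: W is an n-by-n spatial weights matrix, rho is the spatial lag parameter, the matrix I - rho W is taken to be invertible, and the errors are taken to have mean zero given X.

```latex
\begin{align}
y &= \rho W y + X\beta + \varepsilon \\
y &= (I - \rho W)^{-1}(X\beta + \varepsilon) \\
\frac{\partial E[y \mid X]}{\partial x_k^{\top}} &= (I - \rho W)^{-1}\beta_k
\end{align}
```

Under these assumptions the effect of x_k is an n-by-n matrix rather than the single number beta_k. Entry (i, j) gives the change in the expected outcome at observation i after a unit change in x_k at observation j, so the effect differs across observations whenever the rows of W differ.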

Loops are great. They save us lots of work and they solve all sorts of problems. Sometimes, however, there are better ways of going about things. In the first place, we often use loops to implement what are really matrix operations. This is important to keep in mind when working in a language such as R, which allows you to handle matrices directly. Loops can also be memory-intensive, which is why R gurus tend to encourage the use of apply-style functions whenever possible. These points can be illustrated by working through the following:
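
The post's own example is not part of this excerpt. As a rough stand-in, written in Python rather than R, the sketch below computes row sums twice: once with nested loops and once in an apply style, using map much as one might use apply(m, 1, sum) in R. The function names and the small matrix are made up for illustration.

```python
def row_sums_loop(matrix):
    """Row sums with explicit nested loops and a running total."""
    totals = []
    for row in matrix:
        total = 0
        for value in row:
            total += value
        totals.append(total)
    return totals


def row_sums_apply(matrix):
    """Row sums in an apply style: map the sum function over the rows."""
    return list(map(sum, matrix))


# Hypothetical 2-by-3 matrix stored as a list of rows
matrix = [[1, 2, 3], [4, 5, 6]]

print(row_sums_loop(matrix))   # [6, 15]
print(row_sums_apply(matrix))  # [6, 15]
```

Both versions return the same result. The second states what is being computed (sum over each row) without the bookkeeping of counters and accumulators.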