This is not a post about Nate Silver. I promise. One of the more interesting and well-covered stories of the 2012 US elections was the so-called “quants vs. pundits” debate that focused (unfairly, given the excellent models developed by Sam Wang, Drew Linzer, and Simon Jackman) on Nate Silver’s FiveThirtyEight forecasting model. I follow a number of social scientists on Twitter, and many of their reactions to the success of these models ran along the lines of “YEAH! SCIENCE!” and “+1 for the quants!” and so on. There seemed to be real joy (aside from the fact that many of these individuals were Obama supporters) in the growing public recognition that quantitative forecasting models can produce valid results.
If the success of these models in forecasting the election results is seen as a victory for social science, why don’t sociologists emphasize the value of prediction and forecasting more? As far as I can tell, political scientists are outpacing sociologists in this area, with the potential exceptions of applied demographers and Jack Goldstone. Beyond those working on American politics mentioned above, a number of forecasting political scientists come easily to mind: Jay Ulfelder, Mike Ward, Phil Schrodt, and Jay Yonamine.
I have completed graduate methods sequences at two universities, and both had some version of the “is it the job of the social scientist to predict or to explain?” discussion that fell firmly on the “explanation” side. This seems like a fundamentally wrong-headed debate, though. The past decade or so has seen an emphasis on causal mechanisms, which are theoretically supposed to “travel” between cases. If one of the goals of sociological research is to identify these causal mechanisms, it follows that we should also be estimating quantitative models with out-of-sample predictive validity: a mechanism that truly travels should generate accurate predictions in cases beyond those used to develop it. However, when this point is made, accusations of positivism and Hempel-style covering laws are quick to follow. Phil Schrodt makes a similar point more effectively in this widely read paper [PDF].
A second argument made against prediction and for “explanation” often involves theory, or a lack thereof, in predictive models. However, this argument lacks face validity. There are seemingly endless choices for how one specifies any quantitative model, predictive or not, and theory guides parameter selection in both cases. Obviously, as datasets grow in size, it becomes easier to fit models with larger numbers of parameters from a larger choice-set. This is where the dreaded phrase “data mining” often enters the discussion. However, we all know that the models that make it into published articles are neither the first nor the only models estimated in the course of the research. Machine learning approaches to data analysis at least place a premium on out-of-sample validation. Many sociological papers implicitly make predictions but never test them.
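To make that concrete, here is a minimal sketch of what out-of-sample validation looks like in practice. It uses Python with scikit-learn and a simulated, entirely hypothetical dataset; the point is only the workflow: fit the model on one portion of the data, then score its predictions on cases it never saw.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    # Simulate a hypothetical dataset: a binary outcome driven by
    # three covariates plus noise.
    rng = np.random.default_rng(42)
    n = 1000
    X = rng.normal(size=(n, 3))
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n) > 0).astype(int)

    # Hold out 25% of the cases; the model never sees them while fitting.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=42
    )

    model = LogisticRegression().fit(X_train, y_train)

    # In-sample fit is almost always optimistic; the held-out score is
    # the more honest estimate of predictive validity.
    print("in-sample accuracy:", accuracy_score(y_train, model.predict(X_train)))
    print("out-of-sample accuracy:", accuracy_score(y_test, model.predict(X_test)))

The gap between those two numbers is exactly the kind of check that rarely appears in published sociological work.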
Acting as if published research strictly follows deductive hypothesis-testing procedures isn’t helpful or honest. Further, theories are often used in a post hoc, selective fashion to justify an interesting empirical result. Finally, on this point, although well beyond the scope of this post, it may be time to have a conversation about what we mean by theory in sociology. Based on my experience at the most recent ASA meetings in Denver, “theory” is often a stand-in for “barely abstracted empirical finding from my paper.”
Certainly, the social world is complex, humans are self-aware, and predicting human behavior is not an easy task. Recent articles by Jay Ulfelder and Mike Ward take this point up from contrasting angles. However, this is a distinct argument from the question of whether we should attempt to predict at all. I am not arguing that, given enough data and computational power, we will be able to predict the future. I am saying, though, that sociologists are well situated to identify those areas of social life that are characterized by empirical regularities and well-specified causal mechanisms. These exist, I am sure, outside of election results and international conflict.
Perhaps the first step is to change how we teach quantitative methods to graduate students. I was the TA for the year-long statistics sequence in my department and was very pleased when various forms of cross-validation were covered. Cross-validation gets students implicitly thinking about out-of-sample prediction, though it is usually taught as a way of assessing model fit rather than as prediction in its own right.
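For readers who have not seen it framed this way, cross-validation is just repeated out-of-sample prediction: the data are split into k folds, and each fold takes a turn as the held-out test set. A minimal sketch, again in Python with scikit-learn on simulated, hypothetical data:

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score

    # Simulate a hypothetical continuous outcome from two covariates.
    rng = np.random.default_rng(0)
    n = 500
    X = rng.normal(size=(n, 2))
    y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(size=n)

    # Five-fold cross-validation: each fold serves once as held-out
    # data, so every score below is an out-of-sample R^2, not an
    # in-sample measure of fit.
    scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")
    print("per-fold out-of-sample R^2:", np.round(scores, 3))
    print("mean:", round(scores.mean(), 3))

Presenting those five numbers as predictive scores, rather than as a fit diagnostic, is a small shift in emphasis that could make prediction feel like a normal part of the workflow.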
So, where are the sociologists doing prediction and forecasting? Why aren’t there more? I’m certainly guilty myself: as a PhD candidate on the job market, I can hardly throw disciplinary norms out the window. Am I missing people doing exciting work in this area?
Edited to clean up a few grammatical errors.