This is not a post about Nate Silver. I promise. One of the more interesting and well-covered stories of the 2012 US elections was the so-called “quants vs. pundits” debate, which focused (unfairly, given the excellent models developed by Sam Wang, Drew Linzer, and Simon Jackman) on Nate Silver’s FiveThirtyEight forecasting model. I follow a number of social scientists on Twitter, and many of their reactions to the success of these models ran along the lines of “YEAH! SCIENCE!” and “+1 for the quants!” and so on. There seemed to be real joy (aside from the fact that many of these individuals were Obama supporters) in the growing public recognition that quantitative forecasting models can produce valid results.

If the success of these models in forecasting the election results is seen as a victory for social science, why don’t sociologists emphasize the value of prediction and forecasting more? As far as I can tell, political scientists are outpacing sociologists in this area, with the possible exceptions of applied demographers and Jack Goldstone. Beyond those mentioned above who work on American politics, a number of forecasting political scientists come easily to mind: Jay Ulfelder, Mike Ward, Phil Schrodt, and Jay Yonamine.

I have completed graduate methods sequences at two universities, and both included some version of the “is it the job of the social scientist to predict or to explain?” discussion, which fell firmly on the “explanation” side. This seems like a fundamentally wrong-headed debate, though. The past decade or so has seen an emphasis on causal mechanisms, which are theoretically supposed to “travel” between cases. If one of the goals of sociological research is to identify these causal mechanisms, it follows that we should also be estimating quantitative models that have out-of-sample predictive validity. When this point is made, however, accusations of positivism and Hempel-style covering laws are quick to follow. Phil Schrodt makes a similar point far more effectively in this widely read paper [PDF].

A second argument against prediction and for “explanation” often involves theory, or the lack thereof, in predictive models. This argument lacks face validity, however. There are seemingly endless choices for how one specifies any quantitative model, predictive or not, and theory guides parameter selection. Obviously, as datasets grow in size, it becomes easier to fit models with larger numbers of parameters drawn from a larger choice-set. This is where the dreaded phrase “data mining” often enters the discussion. Yet we all know that the models that make it into published articles are rarely the first, or the only, models estimated in the course of the research. Machine learning approaches to data analysis at least place a premium on out-of-sample validation; many sociological papers implicitly make predictions but never test them.
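
To make that contrast concrete, here is a minimal sketch of out-of-sample validation in the machine learning sense: hold out a slice of the data, fit on the remainder, and score the model only on cases it never saw. This is an illustration of mine rather than anything from the post; it assumes Python with scikit-learn and uses simulated data.

```python
# A sketch of hold-out validation: fit on one slice of the data,
# score on a slice the model never saw. Data are simulated purely
# for illustration; no real sociological dataset is implied.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))                         # 10 hypothetical covariates
y = (X[:, 0] + rng.normal(size=500) > 0).astype(int)   # outcome driven by only one of them

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
model = LogisticRegression().fit(X_train, y_train)

# In-sample fit flatters the model; the held-out score is the honest test.
print("in-sample accuracy:    ", round(model.score(X_train, y_train), 3))
print("out-of-sample accuracy:", round(model.score(X_test, y_test), 3))
```

The gap between those two scores is the whole argument: in-sample fit is what most published models report, while the held-out score is an actual test of the predictions the model implicitly makes.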

Acting as if published research strictly follows deductive hypothesis-testing procedures is neither helpful nor honest. Further, theories are often used in a post hoc, selective fashion to justify an interesting empirical result. Finally, although well beyond the scope of this post, it may be time to have a conversation about what we mean by “theory” in sociology. Based on my experience at the most recent ASA meetings in Denver, “theory” is often a stand-in for “barely abstracted empirical finding from my paper.”

Certainly, the social world is complex, humans are self-aware, and predicting human behavior is not an easy task. Recent articles by Jay Ulfelder and Mike Ward take this point up from contrasting angles. However, this is a distinct argument from the question of whether we should attempt to predict at all. I am not arguing that, given enough data and computational power, we will be able to predict the future. I am saying, though, that sociologists are well situated to identify those areas of social life that are characterized by empirical regularities and well-specified causal mechanisms. These exist, I am sure, outside of election results and international conflict.

Perhaps the first step is to change how we teach quantitative methods to graduate students. I was the TA for the year-long statistics sequence in my department and was very pleased when various forms of cross-validation were covered. Cross-validation gets students thinking implicitly about out-of-sample prediction, though it is mostly taught as a way of assessing model fit rather than as a predictive exercise.
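
For instance, here is a minimal sketch of k-fold cross-validation reframed as a predictive exercise rather than a fit statistic: every score below comes from a fold the model never saw during fitting. Again, Python, scikit-learn, and simulated data are assumptions of mine, not part of the actual course sequence.

```python
# A sketch of k-fold cross-validation framed as prediction: each fold's
# score comes from a model fit without that fold. Simulated data again.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))                      # 5 hypothetical predictors
y = 2 * X[:, 0] - X[:, 1] + rng.normal(size=300)   # outcome with known structure plus noise

# Five held-out R^2 scores: out-of-sample predictive performance,
# not in-sample goodness of fit.
scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")
print("held-out R^2 per fold:", scores.round(3))
print("mean:", round(float(scores.mean()), 3))
```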

So, where are the sociologists doing prediction and forecasting? Why aren’t there more? I’m certainly guilty myself: as a PhD candidate on the job market, it’s hardly easy to throw disciplinary norms out the window. Am I missing people doing exciting work in this area?

Edited to clean up a few grammatical errors.

  • You can only make predictions if you have a highly constrained model without much wiggle room. Suffice it to say that most sociologists (even quants) treat it as a point of principle to reject this; hence our “assume a can opener” jokes about economists. Of course, it should perhaps occur to us that there is a downside to having ideas that are flexible but intractable to operationalize.

  • I think the social sciences (including individual-focused psychology) tend to emphasize explanation so much because it’s often possible to decree that you’ve achieved a successful explanation when predictive tests would suggest you know very little. If the world seems to make more sense after a year of research, you can claim to have explained something — even if your explanation is essentially a series of definitions and some vague directional, qualitative predictions like “the more popular party will win the election”.

    The trouble with making predictions in our fields is that there are surprisingly few things social scientists care about for which either (a) the predictors or (b) the outcomes have a standard measurement system with uncontested data sets already available. A huge literature in psychology on working memory tells us that texting while driving will increase accident rates, but we don’t have a tradition of collecting quantitative data on either variable, so we can’t make quantitative predictions.

    To my mind, the problem with building predictions isn’t the modeling culture we have (although I wholeheartedly agree with Gabriel Rossman that we tolerate far too much slack in our ad hoc assumptions), but the fact that we have shockingly few measurement devices that produce generally useful predictors or outcome variables. Voting works beautifully for Nate Silver (and for the Ideal Points folks) because the data sets are well established and the things you could predict are clearly ecologically valid.

  • “If the success of these models in forecasting the election results is seen as a victory for social science, why don’t sociologists emphasize the value of prediction and forecasting more? As far as I can tell, political scientists are outpacing sociologists in this area …”

    What kinds of things would you expect people to predict they will do or feel in your space? In politics, elections are a pretty obvious focal point.