[Figure: walker_5.png]

The graph above recently appeared as part of Scott Walker’s Twitter feed. Presumably, the idea is to suggest that under Walker’s leadership, Wisconsin has done better than the country as a whole when it comes to unemployment, though an alternative version of the ad makes it somewhat more personal, using the same basic figures to suggest that Walker—a Republican presidential candidate—is outperforming sitting Democratic president Barack Obama. In these ads, the Walker campaign repeatedly highlights the fact that the unemployment rate in Wisconsin is lower than the national average. Note, however, that the unemployment rate in Wisconsin was already lower than the national average when Walker took office. In other words, Walker inherited a good labor market. If we want to measure Walker’s effect on the Wisconsin economy, we need to look at changes in the unemployment rate over time.


The quote above comes from Firebaugh and Gibbs’s “User’s Guide to Ratio Variables” (1985: 718). I first ran across this article a couple of years ago but only just got around to reading it this past week. This article, along with a couple of companion pieces (Firebaugh and Gibbs 1986; Firebaugh 1988), helped to redefine what was, at that point, a nearly century-old debate dating back to an 1897 article by Karl Pearson on ratio variables and the problem of spurious correlation. The gist of “Pearson’s Paradox” is that “two ratios can be correlated even when their components are not—for example, X/Z and Y/Z can be correlated even when X, Y, and Z are not” (Firebaugh 1988: 524).* This basic fact became a point of contention among methodologists interested in, among other things, the best approach to controlling for population size when analyzing aggregate data in which the magnitude of a given outcome is at least partially driven by the size of the underlying units. While the debate itself is pretty interesting, the thing I liked best about the Firebaugh and Gibbs piece is the way in which the authors managed to clear away a significant amount of methodological underbrush using simple math.
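Pearson's Paradox is easy to see in a quick simulation. The sketch below is only an illustration, not anything taken from Pearson or from Firebaugh and Gibbs: the uniform distributions, the sample size, and the function name `ratio_correlation_demo` are all assumptions made for the example. It draws three mutually independent variables and compares the correlation between x and y with the correlation between x/z and y/z.

```python
import numpy as np


def ratio_correlation_demo(n=100_000, seed=0):
    """Return (corr(x, y), corr(x/z, y/z)) for independent x, y, z.

    The Uniform(1, 2) draws are an arbitrary choice for illustration;
    they keep z safely away from zero so the ratios are well behaved.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(1.0, 2.0, n)
    y = rng.uniform(1.0, 2.0, n)
    z = rng.uniform(1.0, 2.0, n)

    # The components are independent, so this should be close to zero.
    r_components = np.corrcoef(x, y)[0, 1]

    # The two ratios share the denominator z, which induces correlation.
    r_ratios = np.corrcoef(x / z, y / z)[0, 1]
    return r_components, r_ratios


if __name__ == "__main__":
    r_components, r_ratios = ratio_correlation_demo()
    print(f"corr(x, y)     = {r_components:.3f}")
    print(f"corr(x/z, y/z) = {r_ratios:.3f}")
```

With these settings the component correlation should sit near zero while the ratio correlation is clearly positive, simply because both ratios share a denominator.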

Following Firebaugh and Gibbs (1986: 103), let’s start with a component-based model in which y is a continuous outcome, x is the predictor of interest, z is a control representing the size of the population, and \eta represents a random disturbance:

    \[ y = \beta_0 + \beta_1{x} + \beta_2{z} + \eta. \]

If we then divide everything through by z we end up with an equivalent ratio-based model:

    \[ \frac{y}{z} = \beta_0{\left(\frac{1}{z}\right)} + \beta_1{\left(\frac{x}{z}\right)} + \beta_2 + \varepsilon, \]

where \varepsilon = \eta/z. On its face, the equivalence of these two expressions seems obvious. Yet prior to the work of Firebaugh and Gibbs, much of the fight was over the difference between the component-based model described above and the following:

    \[ \frac{y}{z} = \beta^*_1{\left(\frac{x}{z}\right)} + \beta^*_2 + \varepsilon^*. \]

Simply put, the fight was driven by an attempt to adjudicate between fundamentally non-comparable models, hence the reference in the title to wasted journal space, confused readers, and solutions to phantom problems.

What Firebaugh and Gibbs ultimately show is that when we compare the component method to the equivalent ratio method (i.e. when we make the correct comparison), we find alternative estimators for the same basic model. To the extent that \sigma^2, the variance of \eta, is proportional to z^2 (i.e. to the extent that the error term is characterized by a particular form of population-related heteroscedasticity), dividing through by z leaves a disturbance \varepsilon = \eta/z with constant variance, so the ratio method actually provides more efficient estimates of the parameters of interest than the corresponding component method (see Firebaugh and Gibbs 1986).** So where we once saw a potential problem, we now see a potential solution.
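The equivalence between the two estimators can be checked numerically. The following is a minimal sketch under stated assumptions: the data are simulated, the "true" coefficients (2.0, 0.5, 1.5) and the function names are invented for the example, and the error standard deviation is made proportional to z so that the heteroscedasticity condition holds exactly. It fits the ratio model by ordinary least squares and the component model by weighted least squares with weights 1/z^2, then compares the two sets of estimates.

```python
import numpy as np


def simulate(n=5_000, seed=1):
    """Simulate y = b0 + b1*x + b2*z + eta with sd(eta) proportional to z.

    The coefficients 2.0, 0.5 and 1.5 are hypothetical values chosen
    only for this illustration.
    """
    rng = np.random.default_rng(seed)
    z = rng.uniform(1.0, 10.0, n)        # population size, kept away from 0
    x = rng.uniform(0.0, 5.0, n)         # predictor of interest
    eta = rng.normal(0.0, 1.0, n) * z    # Var(eta) proportional to z^2
    y = 2.0 + 0.5 * x + 1.5 * z + eta
    return x, y, z


def fit_ratio_ols(x, y, z):
    """OLS on the ratio model: y/z regressed on 1/z, x/z and a constant."""
    design = np.column_stack([1.0 / z, x / z, np.ones_like(z)])
    coef, *_ = np.linalg.lstsq(design, y / z, rcond=None)
    return coef  # ordered as (b0, b1, b2)


def fit_component_wls(x, y, z):
    """WLS on the component model with weights 1/z^2."""
    design = np.column_stack([np.ones_like(z), x, z])
    w = 1.0 / z**2
    xtwx = design.T @ (design * w[:, None])   # X'WX
    xtwy = design.T @ (w * y)                 # X'Wy
    return np.linalg.solve(xtwx, xtwy)        # ordered as (b0, b1, b2)


if __name__ == "__main__":
    x, y, z = simulate()
    print("ratio OLS    :", fit_ratio_ols(x, y, z))
    print("component WLS:", fit_component_wls(x, y, z))
```

The two fits should agree up to floating-point error, which is just the algebra of dividing through by z showing up in the numbers. Note that the constant in the ratio model estimates \beta_2, while the coefficient on 1/z estimates \beta_0.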

Even if you don’t care about ratio variables, I think that the original piece, subsequent follow-ups, and exchanges with critics (namely Bradshaw and Radbill 1987) are well worth the read. This is a great example of someone thinking through the problem of model specification, as well as the implications of the often-overlooked distinction between specification and estimation. There is also a serious discussion of the relationship between theory and method. More specifically, Firebaugh and Gibbs go to great lengths to emphasize that, by definition, our theoretical interests cannot help us decide between mathematically equivalent expressions. The trick, of course, is recognizing equivalent expressions when you see them.

* Firebaugh (1988: 524-526) shows that Pearson’s Paradox is a byproduct of the fact that correlation coefficients do not account for the value of the y-intercept. Pearson’s Paradox does not extend to the case of regression in which the intercept is explicitly taken into account.

** Nerdy readers may recognize the ratio method for what it is: weighted least squares applied to the component model, with weights proportional to 1/z^2.

I recently discovered Gary Weissman’s excellent post on Grey’s Anatomy Network of Sexual Relations and I felt inspired.  For those who haven’t heard of the television show before, Grey’s Anatomy is a widely popular, award-winning prime-time medical drama airing on ABC which has received no shortage of critical acclaim.  Meeting conventional medical drama expectations, the show quite regularly features members of its attractive cast “hooking up.”  Or so I am told.  In an effort to teach medical students some basic social network lessons, Weissman produced a network data set on the show’s sexual contacts between characters.  Though I’m not particularly fond of the show and both sexual and fictional networks lie outside my research interests, Weissman’s post served as a remarkable demonstration of network analysis for pedagogical purposes.
