As it turns out, logistic regression is much harder than it looks. Actually, the hard part is trying to compare the results of logistic regression across models. The basic gist of the problem is that the coefficients produced by a run-of-the-mill logistic regression are affected by the degree of unobserved heterogeneity in the model, thus making it difficult to discern real differences in the true effect of a given variable or set of variables from differences induced by changes in the degree of unobserved heterogeneity.

To see how this works, let’s imagine that the value of a given binary outcome is driven by the following data generating process:

$$y^* = \alpha + \sum_{j} \beta_j x_j + \sigma\varepsilon$$

where $y^*$ refers to an unobserved latent variable ranging from $-\infty$ to $+\infty$ which depicts the underlying propensity for a given event to occur, $\beta_j$ represents the effect associated with the $j$th independent variable $x_j$, and $\sigma$ represents an adjustment factor which allows the variance of the error term $\varepsilon$ to be adjusted up or down.

Since $y^*$ is unobservable, the latent variable model can’t be estimated directly. Instead, we take the latent variable model as a point of departure and treat $y$, which we can observe, as a binary indicator of whether or not the value of $y^*$ is above a given threshold $\tau$. By convention, we typically assume that $\tau = 0$. If we further assume that $\varepsilon$ has a logistic distribution such that $E(\varepsilon) = 0$ and $\mathrm{Var}(\varepsilon) = \pi^2/3$, we find with a little bit of work that

$$\log\left(\frac{\Pr(y = 1)}{1 - \Pr(y = 1)}\right) = a + \sum_{j} b_j x_j$$

This should look familiar: it is the standard logistic regression model. If we had assumed that $\varepsilon$ took on a normal distribution such that $E(\varepsilon) = 0$ and $\mathrm{Var}(\varepsilon) = 1$, we would have ended up with a probit model. Consequently, anything I say here about logistic regression applies to probit models as well.
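For anyone who wants the "little bit of work" spelled out, here is one way to sketch it. I am assuming the usual latent-variable setup: $y^* = \alpha + \sum_j \beta_j x_j + \sigma\varepsilon$ with $\sigma > 0$, a threshold of zero, and a standard logistic error with distribution function $F(z) = e^z / (1 + e^z)$.

$$
\begin{aligned}
\Pr(y = 1 \mid x) &= \Pr(y^* > 0 \mid x) \\
&= \Pr\left(\varepsilon > -\frac{\alpha + \sum_j \beta_j x_j}{\sigma}\right) \\
&= F\left(\frac{\alpha + \sum_j \beta_j x_j}{\sigma}\right)
\end{aligned}
$$

The last line uses the symmetry of the logistic distribution around zero. Because the log-odds transformation is the inverse of $F$, it follows that

$$\log\left(\frac{\Pr(y = 1 \mid x)}{1 - \Pr(y = 1 \mid x)}\right) = \frac{\alpha + \sum_j \beta_j x_j}{\sigma}$$

so the log-odds are linear in the $x_j$, but every term on the right-hand side is divided by $\sigma$.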

The relationship between the set of “true” effects $\beta_j$ and the set of estimated effects $b_j$ is as follows:

$$b_j = \frac{\beta_j}{\sigma}$$

Simply put, when we estimate an effect using logistic regression, we are actually estimating the ratio between the true effect and the degree of unobserved heterogeneity. We can think about this as a form of implicit standardization. The problem is that to the extent that the magnitude of $\sigma$ varies across models, so does the metric according to which coefficients are standardized. What this means is that the magnitude of $b_j$ can vary across models even when the magnitude of the true effect $\beta_j$ remains constant.
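A small simulation makes the implicit standardization concrete. The sketch below is only an illustration under assumptions of my own: a single standard-normal predictor, arbitrary parameter values, and a hand-rolled Newton-Raphson fit written with the Python standard library rather than a statistics package. The function names `simulate` and `fit_logit` are hypothetical helpers, not part of any library.

```python
import math
import random


def simulate(alpha, beta, sigma, n, seed):
    """Draw (x, y) from y* = alpha + beta*x + sigma*eps, with y = 1 if y* > 0."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        u = rng.random()
        while u == 0.0:  # avoid log(0) in the inverse-CDF draw below
            u = rng.random()
        eps = math.log(u / (1.0 - u))  # standard logistic error
        y_star = alpha + beta * x + sigma * eps
        xs.append(x)
        ys.append(1 if y_star > 0 else 0)
    return xs, ys


def fit_logit(xs, ys, iters=15):
    """Maximum-likelihood logit of y on one x plus an intercept (Newton-Raphson)."""
    a = b = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            r, w = y - p, p * (1.0 - p)
            g0 += r          # gradient, intercept
            g1 += r * x      # gradient, slope
            h00 += w         # entries of X'WX
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


if __name__ == "__main__":
    # Same true effect (beta = 1) under two levels of unobserved heterogeneity.
    for sigma in (1.0, 2.0):
        xs, ys = simulate(alpha=0.5, beta=1.0, sigma=sigma, n=20000, seed=1)
        a, b = fit_logit(xs, ys)
        print(f"sigma={sigma}: estimated b={b:.3f}, beta/sigma={1.0 / sigma:.3f}")
```

The true effect is identical in both runs, yet the estimated slope should land near `beta / sigma` in each case (up to sampling noise), so doubling `sigma` roughly halves the coefficient.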

The implication here is that we can’t get away with the usual trick of comparing a series of nested models to determine the way in which the inclusion of controls affects the parameter estimates associated with a given variable of interest. Moreover, we can’t compare group-specific models unless we are willing to assume groupwise homoscedasticity. The latter principle also extends to the interpretation of interaction effects within a single model. In other words, unobserved heterogeneity can pose big problems in the context of logistic regression.
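The nested-model problem can be illustrated the same way. In the sketch below, which again rests on assumptions of my own (two independent standard-normal predictors, arbitrary true effects of 1 and 2, and a standard-library Newton-Raphson fit), the second predictor is uncorrelated with the first, so it is not a confounder in the usual sense. The helpers `solve`, `fit_logit`, `simulate`, and `compare` are hypothetical names introduced for the example.

```python
import math
import random


def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    k = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [v - f * w for v, w in zip(M[r], M[c])]
    return [M[i][k] / M[i][i] for i in range(k)]


def fit_logit(rows, ys, iters=12):
    """Logit via Newton-Raphson; each row must already include a leading 1.0."""
    k = len(rows[0])
    coef = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for row, y in zip(rows, ys):
            eta = sum(c * v for c, v in zip(coef, row))
            p = 1.0 / (1.0 + math.exp(-eta))
            w = p * (1.0 - p)
            for i in range(k):
                grad[i] += (y - p) * row[i]
                for j in range(k):
                    hess[i][j] += w * row[i] * row[j]
        step = solve(hess, grad)
        coef = [c + s for c, s in zip(coef, step)]
    return coef


def simulate(n, seed, beta1=1.0, beta2=2.0):
    """y* = beta1*x1 + beta2*x2 + eps, with x1 and x2 drawn independently."""
    rng = random.Random(seed)
    x1s, x2s, ys = [], [], []
    for _ in range(n):
        x1, x2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        u = rng.random()
        while u == 0.0:  # avoid log(0)
            u = rng.random()
        eps = math.log(u / (1.0 - u))  # standard logistic error
        x1s.append(x1)
        x2s.append(x2)
        ys.append(1 if beta1 * x1 + beta2 * x2 + eps > 0 else 0)
    return x1s, x2s, ys


def compare(n=20000, seed=7):
    """Return the x1 coefficient from the reduced and the full logit model."""
    x1s, x2s, ys = simulate(n, seed)
    reduced = fit_logit([[1.0, a] for a in x1s], ys)
    full = fit_logit([[1.0, a, b] for a, b in zip(x1s, x2s)], ys)
    return reduced[1], full[1]


if __name__ == "__main__":
    reduced_b1, full_b1 = compare()
    print(f"x1 coefficient without x2: {reduced_b1:.3f}")
    print(f"x1 coefficient with x2:    {full_b1:.3f}")
```

Leaving `x2` out pushes its contribution into the error term, which raises the unobserved heterogeneity and shrinks the `x1` coefficient, even though nothing about the true effect of `x1` has changed. Read naively, the jump in the coefficient when `x2` is added would look like suppression, when it is really just a change of scale.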

Perhaps somewhat surprisingly, discussion of this issue goes back at least as far as Winship and Mare (1984), who proposed a solution based on the use of a standardized dependent variable. Alternative solutions have since been proposed by Allison (1999), Williams (2009), and, most recently, Karlson et al., who have a paper forthcoming in *Sociological Methodology*. In addition to providing a nice overview of this line of work, Mood (2010) discusses a number of other solutions including the use of linear probability models. While the linear probability model is not without its problems, it is easy to estimate and interpret. Moreover, the problems that it does have are often easily remedied without turning to logistic regression.
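To give a rough sense of why the linear probability model is attractive here, the sketch below fits it by ordinary least squares to simulated data, with and without a second predictor that is independent of the first. Everything in it is an assumption made for illustration: the data generating process, the parameter values, and the helper names `solve`, `fit_ols`, `simulate`, and `compare_lpm`. It does not address the model's known problems, such as heteroscedastic errors or fitted values outside the unit interval.

```python
import math
import random


def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    k = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [v - f * w for v, w in zip(M[r], M[c])]
    return [M[i][k] / M[i][i] for i in range(k)]


def fit_ols(rows, ys):
    """Least squares via the normal equations; rows include a leading 1.0."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)


def simulate(n, seed, beta1=1.0, beta2=2.0):
    """Binary y from y* = beta1*x1 + beta2*x2 + eps, x1 and x2 independent."""
    rng = random.Random(seed)
    x1s, x2s, ys = [], [], []
    for _ in range(n):
        x1, x2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        u = rng.random()
        while u == 0.0:  # avoid log(0)
            u = rng.random()
        eps = math.log(u / (1.0 - u))  # standard logistic error
        x1s.append(x1)
        x2s.append(x2)
        ys.append(1 if beta1 * x1 + beta2 * x2 + eps > 0 else 0)
    return x1s, x2s, ys


def compare_lpm(n=20000, seed=7):
    """Return the x1 slope from the reduced and the full linear probability model."""
    x1s, x2s, ys = simulate(n, seed)
    reduced = fit_ols([[1.0, a] for a in x1s], ys)
    full = fit_ols([[1.0, a, b] for a, b in zip(x1s, x2s)], ys)
    return reduced[1], full[1]


if __name__ == "__main__":
    reduced_b1, full_b1 = compare_lpm()
    print(f"LPM x1 slope without x2: {reduced_b1:.4f}")
    print(f"LPM x1 slope with x2:    {full_b1:.4f}")
```

Because a least-squares slope is not rescaled by the residual variance, dropping a predictor that is uncorrelated with `x1` should leave the `x1` slope essentially unchanged, apart from sampling noise. The slope is read as a change in the probability of the outcome per unit of `x1`, averaged over the sample.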