It’s pretty apparent that race is a contentious topic in the sports media. I decided to explore popular perceptions of differential treatment of white and non-white quarterbacks in the NFL and algorithmically analyzed more than 36,000 articles from ESPN.com published over the past 17 months.

Background

Attention and concern have been growing over the way white and black quarterbacks in the NFL are portrayed by the media. Hall of Famer Warren Moon has recently claimed that black quarterbacks are stereotyped as having integrity or character issues. Geno Smith was the subject of a withering profile by Nolan Nawrocki (who published a similar profile of Cam Newton two years ago). Drew Magary questioned why no black quarterback has won a Super Bowl in 25 years.

In addition to the supposed “character” and “integrity” issues, black quarterbacks are often described as “athletic” or “mobile” while white quarterbacks are credited as having “vision” or “intelligence.” Black quarterbacks are expected to be dual threats (capable of both called passes and runs) while white quarterbacks are expected to be traditional drop-back pocket passers. Think Peyton Manning vs. Michael Vick. This is not a new observation. Salon commented on it in 2002.

Methods

To explore the extent of this difference, I used Scrapy to download the text of 36,156 articles from ESPN.com. I included all articles that mention the NFL between January 1, 2012 and May 8, 2013. I excluded Insider content as I am not a subscriber.

I identified all of the quarterbacks on NFL rosters for the 2011-2012 and 2012-2013 seasons (thanks, PFR!) as well as quarterbacks taken in the 2013 draft. I then used a (very) slightly modified version of Neal Caren’s Python project investigating how the NY Times talks about men and women. Most of it is plain Python with a little help from the Natural Language Toolkit (NLTK).

As is standard practice, each sentence in each article was converted to lower case, stripped of punctuation, and split into a vector of individual words. Each sentence was then checked to see whether it mentioned one of the above quarterbacks by name, and coded as mentioning a white quarterback, a non-white quarterback, or both. The words used in each coded sentence were then stored.
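A minimal sketch of the coding step described above, not the actual script. It assumes matching on lower-cased surnames only, uses tiny hypothetical name sets in place of the full rosters, and skips NLTK so that it runs with the standard library alone. The names `WHITE_QBS`, `NON_WHITE_QBS`, `tokenize`, and `code_sentence` are illustrative.

```python
import string

# Hypothetical, deliberately tiny name sets; the real analysis used full rosters.
WHITE_QBS = {"manning", "brady"}
NON_WHITE_QBS = {"vick", "newton"}

# Translation table that deletes ASCII punctuation (curly quotes are not covered).
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def tokenize(sentence):
    """Lower-case a sentence, drop ASCII punctuation, and split on whitespace."""
    # Note: a possessive such as "brady's" becomes "bradys" and would not match
    # the name sets below without extra handling.
    return sentence.lower().translate(_PUNCT_TABLE).split()


def code_sentence(sentence):
    """Return (label, words); label is 'white', 'non-white', 'both', or None."""
    words = tokenize(sentence)
    word_set = set(words)
    has_white = bool(word_set & WHITE_QBS)
    has_non_white = bool(word_set & NON_WHITE_QBS)
    if has_white and has_non_white:
        label = "both"
    elif has_white:
        label = "white"
    elif has_non_white:
        label = "non-white"
    else:
        label = None  # no tracked quarterback mentioned
    return label, words
```

Calling `code_sentence("Vick threw a touchdown.")` returns the label `"non-white"` together with the sentence's word list, which can then be stored for counting.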

I ran the script iteratively to trim out noise words. Initial runs produced results consisting mainly of other players’ names, names of teams and cities, and so forth. I modified the script to remove most other names, cities, team names, and team nicknames. This was a decision that potentially affected my results, and I’ll discuss why in a moment.
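The trimming step can be sketched as a simple stop-list filter. This is an assumption about the general approach, not the actual script: `NOISE_WORDS` and `remove_noise` are hypothetical names, and the handful of entries stands in for the much longer list of player names, cities, and team nicknames that was built up over repeated runs.

```python
# Hypothetical stop list; in practice it grows each time a run surfaces new noise.
NOISE_WORDS = {"jets", "patriots", "denver", "revis"}


def remove_noise(words, noise=NOISE_WORDS):
    """Return the words of a sentence with stop-listed tokens removed, order kept."""
    return [w for w in words if w not in noise]
```

Because the list is curated by hand, every addition is a judgment call, which is why the choice of what to remove can shape the final rankings.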

Finally, I generated the list of words that are most associated with either white or non-white quarterbacks and the ratio at which they are used with each race.
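One way to produce such a ranking is sketched below. It assumes the input is a sequence of `(label, words)` pairs, counts each word once per sentence, ignores sentences coded as "both", and ranks by the ratio of raw sentence counts. Whether the original script normalized the counts is not something this sketch claims to know; `count_sentences_by_word` and `rank_by_ratio` are illustrative names.

```python
from collections import Counter


def count_sentences_by_word(coded_sentences):
    """Count, per group, how many sentences contain each word."""
    white, non_white = Counter(), Counter()
    for label, words in coded_sentences:
        unique = set(words)  # count a word once per sentence
        if label == "white":
            white.update(unique)
        elif label == "non-white":
            non_white.update(unique)
        # Sentences labeled "both" or None are skipped (an assumption).
    return white, non_white


def rank_by_ratio(numer, denom, min_count=1):
    """Rank words by numer/denom sentence counts, highest ratio first."""
    rows = []
    for word, n in numer.items():
        if n < min_count:
            continue
        d = denom.get(word, 0)
        ratio = n / d if d else float("inf")  # word never seen in the other group
        rows.append((word, n, d, ratio))
    # Sort by ratio, then by count, then alphabetically for stable output.
    rows.sort(key=lambda r: (-r[3], -r[1], r[0]))
    return rows
```

Calling `rank_by_ratio(white, non_white)` gives the white-leaning list and `rank_by_ratio(non_white, white)` the non-white-leaning one. Since the two groups have very different numbers of sentences, raw ratios carry a built-in baseline, so dividing each count by its group's sentence total before comparing may be worth considering.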

Results

My findings indicate that differing language is indeed present, though not necessarily in the ways described above. Fourteen percent of all sentences in the articles were coded as discussing a quarterback of one race or the other. There were about 2.4 sentences written about white quarterbacks (81,046 sentences) for every sentence written about a non-white quarterback (34,441 sentences), which is not surprising given that most quarterbacks on NFL rosters are white.

Below are the words that are most likely to be used to describe white and non-white quarterbacks. The numbers beside each word are the number of sentences that used that word to describe a white or non-white quarterback. As you can see, many of the words refer to throwing, which is unsurprising for a quarterback. Most of the words are generally neutral or positive (“MVP”, (hall of) “famer”, “completion”, “bonus”, “win”, etc.), although we do see “sacked”, “sacks”, and “interceptions.” The high rankings of “neck” and “tennessean” are almost certainly due to the Peyton Manning story. I have no idea what “radio”, “music”, and “podcast” are about.

Most Likely to Appear in Sentences about White QBs

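| Times Assigned to White QBs | Times Assigned to non-White QBs | Word |
|---|---|---|
| 233 | 0 | owner/general |
| 439 | 0 | famer |
| 254 | 10 | wildcat |
| 231 | 11 | vice |
| 2553 | 128 | room |
| 187 | 10 | music |
| 342 | 24 | neck |
| 509 | 49 | radio |
| 230 | 28 | tennessean |
| 255 | 32 | executive |
| 195 | 25 | georgia |
| 416 | 57 | mvp |
| 240 | 40 | completion |
| 254 | 44 | throughout |
| 262 | 46 | overtime |
| 258 | 46 | favorite |
| 500 | 91 | thrown |
| 207 | 38 | listen |
| 516 | 102 | sacked |
| 240 | 48 | podcast |
| 291 | 59 | president |
| 195 | 40 | poor |
| 233 | 48 | rank |
| 249 | 52 | april |
| 651 | 136 | throwing |
| 350 | 74 | march |
| 203 | 43 | relationship |
| 830 | 177 | throw |
| 1051 | 226 | threw |
| 1238 | 269 | interceptions |
| 521 | 117 | throws |
| 560 | 128 | qbr |
| 387 | 90 | shoulder |
| 406 | 95 | stadium |
| 260 | 61 | bonus |
| 198 | 47 | replaced |
| 219 | 52 | completing |
| 474 | 113 | saturday |
| 11914 | 2861 | quarterback |
| 595 | 144 | rating |
| 1087 | 264 | qb |
| 374 | 92 | comments |
| 252 | 62 | wins |
| 1595 | 401 | win |
| 1645 | 414 | backup |
| 812 | 205 | interception |
| 190 | 48 | et |
| 676 | 171 | home |
| 233 | 59 | board |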

Now, for the interesting part. What are the words used to describe non-white quarterbacks? First, note the prevalence of other position names throughout the list. Remember when I said above that I had to trim out a lot of other players’ names to get meaningful results? This was doubly true for non-white quarterbacks. Take what you will from it, but non-white quarterbacks seem much more likely to be discussed in concert with their teammates rather than alone.

Further, we see some words that support the differential treatment discussed above: “talented”, “rushing”, “talent”, “threat” (as in dual-threat), “dynamic”, “runs”, and “speed.” We also see a big focus on injuries (which is almost certainly because of many, many articles on RG III and Ray Lewis) as well as physical descriptions: “triceps”, “acl”, “knee”, “age”, “hamstring”, and “pounds.” The list is pretty different in significant ways, though I am pleased and surprised not to see “character”, “integrity”, or “off-the-field” in the list. Fewer words are offered than for white quarterbacks because the differences level off for non-white quarterbacks (presumably because there are far fewer sentences).

Most Likely to Appear in Sentences about non-White QBs

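| Times Assigned to White QBs | Times Assigned to non-White QBs | Word |
|---|---|---|
| 10 | 154 | baylor |
| 7 | 92 | triceps |
| 17 | 107 | acl |
| 35 | 158 | torn |
| 73 | 169 | restricted |
| 93 | 184 | linebackers |
| 44 | 78 | promising |
| 76 | 126 | tender |
| 104 | 164 | receptions |
| 50 | 75 | barry |
| 75 | 103 | safeties |
| 100 | 137 | talented |
| 58 | 79 | nose |
| 245 | 332 | rushing |
| 197 | 265 | receiving |
| 61 | 82 | opposite |
| 314 | 367 | man |
| 265 | 308 | talent |
| 69 | 78 | fill |
| 203 | 222 | tackles |
| 213 | 230 | catches |
| 115 | 121 | threat |
| 978 | 1001 | linebacker |
| 471 | 480 | knee |
| 105 | 107 | age |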
105107age

Limitations

Of course, any research project must also disclose its limitations. First, as this is an algorithmic analysis, I can’t say for certain that these differences are 100% valid. I also cannot attribute these differences to intent on the writers’ part. Second, the data only comprise about a year and a half of articles. I can’t say how this language has changed over time. Third, ESPN’s coverage of Manning(s), Brady, RG III, Tebow, etc. is likely to skew these findings in particular ways. I don’t know how Sports Illustrated, NBC Sports, or Fox Sports cover the same issues.

For those with a little statistical know-how, notice that there are no hypothesis tests or significance levels offered. However, given the disparity between many of the sentence counts, I’m confident the differences are not spurious. Finally, the method above misses those sentences that refer to quarterbacks without using their names. For instance, a sentence about Colin Kaepernick written as “[t]he young 49ers quarterback showed off his speed and athleticism by throwing for two touchdowns and running for another” wouldn’t be detected by this method.
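On the point about hypothesis tests: for readers who want to attach a significance level to a single word, here is a minimal sketch of a Pearson chi-square test on a 2x2 table, using only the standard library. It is not part of the original analysis. It assumes the “threat” row reads as 115 white and 121 non-white sentences, uses the sentence totals reported in the Results, and treats sentences as independent, which is questionable because many come from the same article. `chi_square_2x2` is a hypothetical helper name.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    stat = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Sentence totals from the Results section.
WHITE_TOTAL = 81046
NON_WHITE_TOTAL = 34441

# Assumed reading of the "threat" row: sentences containing the word, per group.
white_with, non_white_with = 115, 121

stat, p = chi_square_2x2(
    white_with, WHITE_TOTAL - white_with,
    non_white_with, NON_WHITE_TOTAL - non_white_with,
)
print(f"chi-square = {stat:.2f}, p = {p:.3g}")
```

Running the same test over every word in the lists would call for a multiple-comparisons correction, so a single small p-value should be read with caution.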

All of my code can be found on my GitHub page. This will be cross-posted on spreadBlog as well as Bad Hessian.