It’s pretty apparent that race is a contentious topic in the sports media. I decided to explore popular perceptions of differential treatment of white and non-white quarterbacks in the NFL and algorithmically analyzed more than 36,000 articles from ESPN.com published over the past 17 months.
Background
Attention and concern have been growing over the way white and black quarterbacks in the NFL are portrayed by the media. Hall of Famer Warren Moon has recently claimed that black quarterbacks are stereotyped as having integrity or character issues. Geno Smith was the subject of a withering profile by Nolan Nawrocki (who published a similar profile of Cam Newton two years ago). Drew Magary questioned why no black quarterback has won a Super Bowl in 25 years.
In addition to the supposed “character” and “integrity” issues, black quarterbacks are often described as “athletic” or “mobile” while white quarterbacks are credited as having “vision” or “intelligence.” Black quarterbacks are expected to be dual threats (capable of both called passes and runs) while white quarterbacks are expected to be traditional drop-back pocket passers. Think Peyton Manning vs. Michael Vick. This is not a new observation. Salon commented on it in 2002.
Methods
To explore the extent of this difference, I used Scrapy to download the text of 36,156 articles from ESPN.com. I included all articles that mention the NFL between January 1, 2012 and May 8, 2013. I excluded Insider content as I am not a subscriber.
I identified all of the quarterbacks on NFL rosters for the 2011-2012 and 2012-2013 seasons (thanks, PFR!) as well as quarterbacks taken in the 2013 draft. I then used a (very) slightly modified version of Neal Caren’s Python project investigating how the NY Times talks about men and women. Most of it is plain Python with a little bit of help from the Natural Language Toolkit.
As is standard practice, each sentence in each article was converted to lower case, stripped of punctuation, and split into a vector of individual words. Each sentence was then analyzed to determine whether it mentioned one of the above quarterbacks by name, and was coded as mentioning a white quarterback, a non-white quarterback, or both. The words used in that sentence were then stored.
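A minimal sketch of that coding step might look like the following. It is not the actual script: the two-name rosters are placeholders (the names are the contrast used in the Background section), the regex sentence splitter stands in for whatever tokenizer the real project uses, and matching on full names is an assumption.

```python
import re
import string

# Placeholder rosters for illustration only; the real lists came from PFR.
WHITE_QBS = {"peyton manning"}
NONWHITE_QBS = {"michael vick"}

# Translation table that deletes ASCII punctuation (curly quotes are not covered).
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def split_sentences(text):
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Lower-case a sentence, drop punctuation, and split it into words."""
    return sentence.lower().translate(_PUNCT_TABLE).split()


def code_sentence(sentence):
    """Return 'white', 'nonwhite', 'both', or None if no quarterback is named."""
    lowered = sentence.lower()
    has_white = any(name in lowered for name in WHITE_QBS)
    has_nonwhite = any(name in lowered for name in NONWHITE_QBS)
    if has_white and has_nonwhite:
        return "both"
    if has_white:
        return "white"
    if has_nonwhite:
        return "nonwhite"
    return None
```

Full-name matching is the simplest choice here; matching on last names alone would catch more sentences but would also confuse players who share a surname.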
I ran the script iteratively to trim out noise words. Initial runs produced results consisting mainly of other players’ names, names of teams and cities, and so forth. I modified the script to remove most other names, cities, team names, and team nicknames. This was a decision that potentially affected my results, and I’ll discuss why in a moment.
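The trimming step can be sketched roughly as below. The `NOISE_WORDS` set and the `count_sentences_per_word` helper are hypothetical names with made-up contents; the real list was built up by hand over several runs. Each word is counted once per sentence, since the tables report numbers of sentences rather than raw word occurrences.

```python
from collections import Counter

# Hypothetical noise list: team names, cities, and other players' names.
NOISE_WORDS = {"denver", "broncos", "philadelphia", "eagles"}


def count_sentences_per_word(token_lists, noise=NOISE_WORDS):
    """Count, for each word, how many sentences contain it, skipping noise words."""
    counts = Counter()
    for tokens in token_lists:
        # set() ensures a word repeated within one sentence is counted once.
        counts.update(set(tokens) - noise)
    return counts
```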
Finally, I generated the list of words that are most associated with either white or non-white quarterbacks and the ratio at which they are used with each race.
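One way to produce such a ranking is sketched below. The add-one smoothing, which keeps the ratio finite when a word never appears for one group, is an assumption of this sketch and not necessarily what the original script does. The example counts are taken from the tables in the Results section.

```python
def rank_by_ratio(white_counts, nonwhite_counts):
    """Rank words by (white + 1) / (non-white + 1), highest ratio first."""
    rows = []
    for word in set(white_counts) | set(nonwhite_counts):
        white = white_counts.get(word, 0)
        nonwhite = nonwhite_counts.get(word, 0)
        # Add-one smoothing (an assumption) avoids dividing by zero.
        ratio = (white + 1) / (nonwhite + 1)
        rows.append((word, white, nonwhite, ratio))
    return sorted(rows, key=lambda row: row[3], reverse=True)


# Sentence counts copied from the Results tables.
white = {"famer": 439, "wildcat": 254, "acl": 17, "baylor": 10}
nonwhite = {"famer": 0, "wildcat": 10, "acl": 107, "baylor": 154}

for word, w, n, ratio in rank_by_ratio(white, nonwhite):
    print(f"{word:10s} {w:5d} {n:5d} {ratio:8.2f}")
```

Because these are raw counts, a ratio of 1 is not the neutral point. With the sentence totals reported in the Results (81,046 versus 34,441), a word used at the same rate for both groups would show a raw ratio of roughly 2.4.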
Results
My findings indicate that differing language is indeed present, though not necessarily in the ways described above. Fourteen percent of all sentences in the articles were coded as mentioning a white or a non-white quarterback. There were about 2.4 sentences written about white quarterbacks (81,046 sentences) for every sentence written about a non-white quarterback (34,441 sentences), which is not surprising given that most quarterbacks on NFL rosters are white.
Below are the words that are most likely to be used to describe white and non-white quarterbacks. The numbers beside each word are the number of sentences that used that word to describe a white or non-white quarterback. As you can see, many of the words refer to throwing, which is unsurprising for a quarterback. Most of the words are generally neutral or positive (“MVP”, (hall of) “famer”, “completion”, “bonus”, “win”, etc.), although we do see “sacked”, “poor”, and “interceptions.” The high rankings of “neck” and “Tennessean” are almost certainly due to the Peyton Manning story. I have no idea what “radio”, “music”, and “podcast” are about.
Most Likely to Appear in Sentences about White QBs
Times Assigned to White QBs | Times Assigned to non-White QBs | Word |
---|---|---|
233 | 0 | owner/general |
439 | 0 | famer |
254 | 10 | wildcat |
231 | 11 | vice |
2553 | 128 | room |
187 | 10 | music |
342 | 24 | neck |
509 | 49 | radio |
230 | 28 | tennessean |
255 | 32 | executive |
195 | 25 | georgia |
416 | 57 | mvp |
240 | 40 | completion |
254 | 44 | throughout |
262 | 46 | overtime |
258 | 46 | favorite |
500 | 91 | thrown |
207 | 38 | listen |
516 | 102 | sacked |
240 | 48 | podcast |
291 | 59 | president |
195 | 40 | poor |
233 | 48 | rank |
249 | 52 | april |
651 | 136 | throwing |
350 | 74 | march |
203 | 43 | relationship |
830 | 177 | throw |
1051 | 226 | threw |
1238 | 269 | interceptions |
521 | 117 | throws |
560 | 128 | qbr |
387 | 90 | shoulder |
406 | 95 | stadium |
260 | 61 | bonus |
198 | 47 | replaced |
219 | 52 | completing |
474 | 113 | saturday |
11914 | 2861 | quarterback |
595 | 144 | rating |
1087 | 264 | qb |
374 | 92 | comments |
252 | 62 | wins |
1595 | 401 | win |
1645 | 414 | backup |
812 | 205 | interception |
190 | 48 | et |
676 | 171 | home |
233 | 59 | board |
Now, for the interesting part. What are the words used to describe non-white quarterbacks? First, note the prevalence of other position names throughout the list. Remember when I said above that I had to trim out a lot of other players’ names to get meaningful results? This was doubly true for non-white quarterbacks. Take what you will from it, but non-white quarterbacks seem much more likely to be discussed in concert with their teammates rather than alone.
Further, we see some words that support the differential treatment discussed above: “talented”, “rushing”, “talent”, “threat” (as in dual-threat), “dynamic”, “runs”, and “speed.” We also see a big focus on injuries (which is almost certainly because of many, many articles on RG III and Ray Lewis) as well as physical descriptions: “triceps”, “acl”, “knee”, “age”, “hamstring”, and “pounds.” The list is pretty different in significant ways, though I am pleased and surprised not to see “character”, “integrity”, or “off-the-field” in the list. Fewer words are offered than for white quarterbacks because the differences level off for non-white quarterbacks (presumably because there are many fewer sentences).
Most Likely to Appear in Sentences about non-White QBs
Times Assigned to White QBs | Times Assigned to non-White QBs | Word |
---|---|---|
10 | 154 | baylor |
7 | 92 | triceps |
17 | 107 | acl |
35 | 158 | torn |
73 | 169 | restricted |
93 | 184 | linebackers |
44 | 78 | promising |
76 | 126 | tender |
104 | 164 | receptions |
50 | 75 | barry |
75 | 103 | safeties |
100 | 137 | talented |
58 | 79 | nose |
245 | 332 | rushing |
197 | 265 | receiving |
61 | 82 | opposite |
314 | 367 | man |
265 | 308 | talent |
69 | 78 | fill |
203 | 222 | tackles |
213 | 230 | catches |
115 | 121 | threat |
978 | 1001 | linebacker |
471 | 480 | knee |
105 | 107 | age |
Limitations
Of course, any research project must also disclose its limitations. First, as this is an algorithmic analysis, I can’t say for certain that these differences are 100% valid. I also cannot attribute these differences to intent on the writers’ part. Second, the data only comprise about a year and a half of articles. I can’t say how this language has changed over time. Third, ESPN’s coverage of Manning(s), Brady, RG III, Tebow, etc. is likely to skew these findings in particular ways. I don’t know how Sports Illustrated, NBC Sports, or Fox Sports cover the same issues.
For those with a little statistical know-how, notice that no hypothesis tests or significance levels are offered. However, given the disparity between many of the sentence counts, I’m confident the differences are not spurious. Finally, the method above misses those sentences that refer to quarterbacks without using their name. For instance, a sentence about Colin Kaepernick written as “[t]he young 49ers quarterback showed off his speed and athleticism by throwing for two touchdowns and running for another” wouldn’t be detected by this method.
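For readers who want to try the missing test themselves, a Pearson chi-square test on a 2x2 table is one option. The sketch below is not part of the original analysis. It assumes the reported sentence totals (81,046 and 34,441) are the right denominators and treats sentences as independent, which sentences from the same article are not. The function name `chi_square_2x2` is made up for this example.

```python
import math


def chi_square_2x2(with_a, total_a, with_b, total_b):
    """Pearson chi-square (1 degree of freedom, no continuity correction).

    with_a / total_a: sentences containing the word / all sentences, group A.
    with_b / total_b: the same for group B.
    Returns (chi-square statistic, p-value).
    """
    a, b = with_a, total_a - with_a
    c, d = with_b, total_b - with_b
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# "acl": 17 sentences about white QBs, 107 about non-white QBs (from the table).
chi2, p = chi_square_2x2(17, 81046, 107, 34441)
print(f"chi2 = {chi2:.1f}, p = {p:.3g}")
```

Running a test like this for every word in the tables means making many comparisons at once, so any single p-value should be read with some correction for multiple testing in mind.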
All of my code can be found on my GitHub page. This will be cross-posted on spreadBlog as well as Bad Hessian.