Greetings, everyone.  We are delighted to have been invited to author our first Bad Hessians guest post.  We are a couple of graduate students in the sociology department at the University of North Carolina – Brandon Gorman and Charles Seguin.  Our post is about a project we began last year after we noticed that, during the Arab Spring, between January 25th and February 11th, 2011, Western media completely shifted from describing Hosni Mubarak as a “key US ally” to an “entrenched dictator.”  This made us wonder – what structures US media attention to foreign leaders?

Scholars have studied media coverage of foreign leaders and countries in the past, but these studies have generally been based on single cases or small-n comparisons.  We wanted to look at coverage of a wide range of foreign leaders across a long time span.  In order to gather these data, we wrote a Python script to scrape the New York Times, alongside other media outlets, for mentions of foreign leaders from 1950 to 2008.  We focused on countries contained in the Correlates of War Project datasets.  These include 165 countries—from small island nations to the world’s largest and most influential countries.

In order to scrape these media outlets, we first had to build a database of foreign leaders.  We began by using the Archigos dataset, which provides names of foreign leaders and the years in which they were in power.  The names in this dataset formed basic search terms, which then had to be refined to match specific spellings of leaders’ names used in the media.  Iran’s Mohammad Mossadeq, for example, is referred to alternatively as “Mohamed Mossadeq”, “Mohammad Mossadegh”, “Mohammad Mosadeq”, etc.  Likewise, one Armenian leader, Levon Ter-Petrosyan, had four different spellings of his last name.  We’re very close to having a finalized dataset here (after far too many iterations) but there may be a few errors to deal with.
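
One way spelling variants like these could be collapsed onto a single leader is a lookup table keyed on a normalized form of the name.  The following is an illustrative sketch only, not the script used for this project: the `VARIANTS` table and the `normalize`, `build_lookup`, and `canonical_leader` names are hypothetical, and the table would have to be built by hand from the spellings actually found in the press.

```python
import unicodedata

# Hypothetical variant table: canonical name -> alternative spellings.
# The spellings listed here are the ones quoted in the post.
VARIANTS = {
    "Mohammad Mossadeq": [
        "Mohamed Mossadeq",
        "Mohammad Mossadegh",
        "Mohammad Mosadeq",
    ],
}


def normalize(name):
    """Lowercase, strip accents, and collapse runs of whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


def build_lookup(variants):
    """Map every normalized spelling (canonical included) to its canonical name."""
    lookup = {}
    for canonical, spellings in variants.items():
        for spelling in [canonical, *spellings]:
            lookup[normalize(spelling)] = canonical
    return lookup


LOOKUP = build_lookup(VARIANTS)


def canonical_leader(name):
    """Return the canonical leader name, or None if the spelling is unknown."""
    return LOOKUP.get(normalize(name))
```

Calling `canonical_leader("Mohammad Mossadegh")` returns the canonical spelling, while a name missing from the table returns `None`, which makes unmatched spellings easy to flag for manual review.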

In the first stage of this project, we are looking strictly at the quantity of media attention rather than its framing or sentiment – although in later stages we plan to categorize the full text of the articles using text classification.  For this stage, we scraped the yearly number of articles for each foreign leader and merged these data with other political science datasets such as the CoW, PITF, and Polity IV.  We’re using these data to test a number of hypotheses related to media attention to foreign leaders, as well as more general sociological theories of media attention and politics; for now we’ll just give some descriptive tables.
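
A merge of this kind can be thought of as a left join of leader-year article counts onto country-year covariates.  The sketch below is a minimal illustration under assumptions: the field names (`country`, `year`, `articles`, `polity_score`) and the `merge_leader_years` function are hypothetical, and the rows contain invented placeholder values rather than real counts or scores.

```python
# Toy rows with invented placeholder values, not real counts or scores.
article_counts = [
    {"leader": "Leader A", "country": "Country X", "year": 1960, "articles": 120},
    {"leader": "Leader B", "country": "Country Y", "year": 1960, "articles": 15},
]
country_years = [
    {"country": "Country X", "year": 1960, "polity_score": -7},
]


def merge_leader_years(counts, covariates):
    """Left-join leader-year counts onto country-year covariates.

    Every leader-year row is kept; rows without a matching country-year
    are marked with matched=False so gaps can be inspected.
    """
    by_key = {(row["country"], row["year"]): row for row in covariates}
    merged = []
    for row in counts:
        extra = by_key.get((row["country"], row["year"]), {})
        combined = {**extra, **row}  # leader-year fields win on key clashes
        combined["matched"] = bool(extra)
        merged.append(combined)
    return merged


merged_rows = merge_leader_years(article_counts, country_years)
```

Keeping unmatched rows, instead of silently dropping them, is a deliberate choice here: country names and years rarely line up perfectly across datasets, so the `matched` flag gives a quick way to find the leader-years that still need attention.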

Here are the thirty most covered leaders in the New York Times from 1950 to 2008:

Notice that while many of these leaders are heads of large, powerful countries such as the USSR, many are not.

We can break this down further into the leader-years with the most coverage. Each entry in the next table represents the number of articles mentioning a specific leader in a given year.

At a glance, these years and leaders make a lot of sense.  Years when the US was at war with Iraq saw high coverage of Saddam Hussein.  We can also see the influence of the Cuban Missile Crisis with Khrushchev and Castro in the early 1960s, Nasser and the Suez Crisis in the mid-1950s, Sadat and the Egyptian-Israeli peace treaty in the late 1970s, and so forth.
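
Both rankings described above, total coverage per leader and coverage per leader-year, can be derived from the same leader-year rows.  This is a minimal sketch, assuming one row per leader per year with an `articles` count; the function names and the toy rows (invented placeholder values) are hypothetical.

```python
from collections import Counter

# Toy leader-year rows with invented placeholder values.
toy_rows = [
    {"leader": "Leader A", "year": 1960, "articles": 10},
    {"leader": "Leader A", "year": 1961, "articles": 5},
    {"leader": "Leader B", "year": 1960, "articles": 12},
]


def top_leaders(rows, n=30):
    """Sum articles across all years per leader and return the n largest totals."""
    totals = Counter()
    for row in rows:
        totals[row["leader"]] += row["articles"]
    return totals.most_common(n)


def top_leader_years(rows, n=30):
    """Return the n single leader-years with the most articles."""
    ranked = sorted(rows, key=lambda row: row["articles"], reverse=True)
    return [(row["leader"], row["year"], row["articles"]) for row in ranked[:n]]
```

The two rankings can disagree, as the toy rows show: a leader with steady coverage over many years can top the overall list without ever having the single busiest year.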

Finally, the relative share of attention to different regions has changed over time.  The figure below shows the percentage of New York Times articles mentioning foreign leaders by region.  Note that this is both a function of the (generally increasing) number of nation-states within these regions over time and the amount of attention each leader is getting.

Europe tends to dominate the field, but there are notable exceptions.  Notice the spike in coverage of Middle Eastern leaders around 1979 during the oil crisis, and again around 2003 during the second Gulf War.  It seems like there may be a trend towards more diversity in coverage – but it’s hard to parse out in figures like these because countries pop in and out of existence between 1950 and 2008 – notice, for instance, the rise in attention share to Sub-Saharan Africa after decolonization.
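
Regional shares of this kind can be computed as each region's percentage of all leader mentions in a given year.  The sketch below is illustrative only: it assumes each leader-year row already carries a `region` label, and the `regional_shares` function and toy rows (invented placeholder values) are hypothetical.

```python
from collections import defaultdict

# Toy leader-year rows with invented placeholder values.
toy_region_rows = [
    {"leader": "Leader A", "region": "Europe", "year": 1979, "articles": 60},
    {"leader": "Leader B", "region": "Middle East", "year": 1979, "articles": 30},
    {"leader": "Leader C", "region": "Middle East", "year": 1979, "articles": 10},
]


def regional_shares(rows):
    """Return {year: {region: percent of that year's leader mentions}}."""
    totals = defaultdict(lambda: defaultdict(int))
    for row in rows:
        totals[row["year"]][row["region"]] += row["articles"]

    shares = {}
    for year, by_region in totals.items():
        year_total = sum(by_region.values())
        if year_total == 0:
            continue  # no mentions that year, so shares are undefined
        shares[year] = {
            region: 100.0 * count / year_total
            for region, count in by_region.items()
        }
    return shares


shares_by_year = regional_shares(toy_region_rows)
```

Because the counts are summed over leaders, a region's share rises both when its leaders get more coverage and when it simply has more leaders, which is the confound with the number of nation-states noted above.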

This is just a teaser.  You’ll have to wait for next year’s ASA for our real results.  We know it’s hard but trust us – it’ll be worth the wait.


  • C. Anderson

    Wow, what an interesting analysis. Nothing too unexpected, but a dataset like this really gives some perspective on what’s going on around the world and who we egocentric Americans are thinking about. You have a great writing voice too; reminds me of Ezra Klein.

    I’d love to see a comparison between different media outlets, or even different media entirely. For instance, what does the Washington Post or the WSJ have to say versus the NYT? I guess you might run into noise there due to the unequal length of the publications. I’ll leave it to you stats folks to figure out how to control for that!

    I’d love to know if TV presents a different spread than print does, or especially social media with the recent Arab Spring. What about country-to-country coverage comparison? I am so happy that there are so many wonks in the world. More science please!

  • Charles Seguin

    Thanks for the kind words about the post! We have done a little bit of comparison with the Washington Post, Los Angeles Times and the Chicago Tribune. There are a few differences, but overall they’re pretty highly correlated. Of course, at this point we’ve only looked at the amount of attention, so they could be saying different things about the same people. We’re still working on putting together a dataset for television news, and social media is the final frontier 🙂

  • Laura

    Interesting analysis. How does your Python script work for articles before 1980? What database are you using? To my knowledge, text versions of New York Times articles only start in 1980; before that they are scanned images. This is certainly true of ProQuest and LexisNexis, and the New York Times website itself. If you used ProQuest, how did you deal with their inconsistent archiving of New York Times articles over the years?

  • Charles Seguin

    Hey Laura. For some databases you need different URLs for pre- and post-1980 searches. You’re totally right about the PDFs before 1980: they’re all images, and even after 1980 many are not full text. We haven’t had to deal with this for the descriptive results above, since we’ve just been analyzing the number of articles rather than their content. We’re working right now on converting these images to text for later analysis. Shoot me an email for the specifics of the script and database.