This week, the Global Data on Events, Location, and Tone, or GDELT dataset went public. The architect of this project is Kalev Leetaru, a researcher in library and information sciences, and owes much to the work of Phil Schrodt.
The scale of this project is nothing short of groundbreaking. It includes 200 million dyadic events from 1979-2012. Each event profiles target and source actors, including not only states, but also substate actors, the type of event drawn from the Schrodt-specified CAMEO project, and even longitude and latitude of the event for many of the events. The events are drawn from several different news sources, including the AP, AFP, Reuters, and Xinhua and are computer-coded with Schrodt’s TABARI system.
To give you a sense how much more this has improved upon the granularity of what we once had, the last large project of this sort that hadn’t been in the domain of a national security organization is King and Lowe’s 10 million dyadic events dataset. Furthermore, the dataset will be updated daily. And to put a cherry on the top, as Jay Ulfelder pointed out, it was funded by the National Science Foundation.
For my own purposes, I’m planning on using these data to extract protest event counts. Social movement scholars have typically relied on handcoding newspaper archives to count for particular protest events, which is typically time-consuming and also susceptible to selection and description bias (Earl et al. 2004 have a good review of this). This dataset has the potential to take some of the time out of this; the jury is still out on how well it accounts for the biases, though.
For what it’s worth, though, it looks like it does a pretty bang-up job with some of the Egypt data. Here’s a simple plot I did across time for CAMEO codes related to protest with some Egyptian entity as the source actor. Rather low until January 2011, and then staying more steady through out the year, peaking again in November 2011, during the Mohamed Mahmoud clashes.
These data have a lot of potential for political sociology, where computer-coded event data haven’t really made much of an appearance. Considering the granularity of the data, that it accounts for many substate actors, social movement scholars would be remiss not to start digging in.
A few other resources on GDELT:
Leetaru and Schrodt’s 2013 ISA paper
Jay Yonamine‘s (one of Schrodt’s students) paper on predicting levels of violence in Afghanistan