Sadly, we haven’t posted in a while. My own excuse is that I’ve been working a lot on a dissertation chapter. I’m presenting this work at the Young Scholars in Social Movements conference at Notre Dame at the beginning of May and have just finished a rather rough draft of that chapter. The abstract:
Scholars and policy makers recognize the need for better and timelier data about contentious collective action, both the peaceful protests that are understood as part of democracy and the violent events that are threats to it. News media provide the only consistent source of information available outside government intelligence agencies and are thus the focus of all scholarly efforts to improve collective action data. Human coding of news sources is time-consuming and thus can never be timely and is necessarily limited to a small number of sources, a small time interval, or a limited set of protest “issues” as captured by particular keywords. There have been a number of attempts to address this need through machine coding of electronic versions of news media, but approaches so far remain less than optimal. The goal of this paper is to outline the steps needed to build, test and validate an open-source system for coding protest events from any electronically available news source using advances from natural language processing and machine learning. Such a system should have the effect of increasing the speed and reducing the labor costs associated with identifying and coding collective actions in news sources, thus increasing the timeliness of protest data and reducing biases due to excessive reliance on too few news sources. The system will also be open, available for replication, and extendable by future social movement researchers, and social and computational scientists.
You can find the chapter at SSRN.
This is very much a work still in progress. There are some tasks I know need to be done immediately: improving evaluation for the closed-ended coding task, incorporating the open-ended coding, and clarifying the methods. If you do event data work, I would love your feedback. Also, if you can think of a witty, Googleable name for the system, I’d love to hear that too.
For my dissertation, I’ve been working on a way to generate new protest event data using principles from natural language processing and machine learning. In the process, I’ve been assessing other datasets to see how well they have captured protest events.
I’ve mused before on assessing GDELT (currently under reorganized management) for protest events. One step in doing this has been to compare it to the Dynamics of Collective Action dataset. The Dynamics of Collective Action dataset (hereafter DoCA) is a remarkable undertaking, supervised by some leading names in social movements (Soule, McCarthy, Olzak, and McAdam), in which their team hand-coded 35 years of the New York Times for protest events. Each event record includes not only when and where the event took place (the information GDELT also includes), but over 90 other variables, including a qualitative description of the event, the claims of the protesters, their target, the form of protest, and the groups initiating it.
Pam Oliver, Chaeyoon Lim, and I compared the two datasets by looking at a simple monthly time series of event counts and also did a qualitative comparison of a specific month.
Michael Corey asked me to post this CfP for a conference “Demography in the Digital Age,” occurring at Facebook the day before ASA (August 15). Note that this is the same day as the ASA Datathon, but if you’re a demographer this looks very cool.
On August 15, 2014, Facebook is sponsoring a conference on data collection in the digital age. Planned for the day before the American Sociological Association meetings in SF, the conference aims to bring together faculty, grad students, and industry professionals to share techniques for data collection in an era of social media and increasing interconnectivity across the world.
I’m excited to say that Sociological Science, the new general audience open-access sociology journal, has published its first batch of articles. These include a great set of pieces, including one from my collaborator Chaeyoon Lim on network effects and emotional well-being. But the article “The Structure of Online Activism” by Lewis, Gray, and Meierhenrich caught my eye, for obvious reasons.
I’ve got some thoughts on this article, and following the philosophy of Sociological Science of encouraging “ex post corrections/comments over ex ante R&R demands,” here’s my response, which I’m also posting as a formal response on the Sociological Science site.
With season 6 of RuPaul’s Drag Race beginning exactly two weeks from today, it is officially the Drag Race preseason. I had lofty ideas for this season, like doing some elaborate forecasting from Twitter data à la the line of research that’s grown around elections forecasting. But little things (my dissertation) have limited the kind of commitment I can make to that endeavor.
Instead, I’m taking some inspiration from Jay Ulfelder and using a wiki survey to generate a forecast for the winner of season 6. I’m not really sure if a preseason forecast is actually a very good tool here — I’d venture the average Drag Race viewer isn’t well-versed in the careers of most of the queens who are appearing on this season. But there are definitely viewers who have some strong opinions formed already (like my RPDR viewing buddy Ryan) so I hope to get those folks voting within the next two weeks.
I present to you, thus, the RuPaul’s Drag Race wiki survey. Please share far and wide!
Laura K. Nelson wrote a nice review of my recent Mobilization article last week for the Mobilizing Ideas blog. She sums up some of the work that I did in preparing the article and training the machine learning classifier for coding mobilization in the April 6th Movement’s Facebook messages.
Brayden King at Northwestern asked me to pass this on.
The Kellogg School of Management at Northwestern University seeks a post-doctoral researcher interested in at least one of the following areas of scholarship: social movements, collective behavior, networks, and organizational theory. We particularly encourage scholars to apply who have advanced quantitative training, programming skills, and familiarity with “big data” methods. The ideal candidate will have a PhD in sociology, communications, political science, or information sciences.
The post-doctoral position will allow the scholar to advance his or her own research agenda while also working on collaborative projects related to social media and activism. The post-doctoral position will be managed by Brayden King and will be affiliated with the Management and Organizations department and NICO (Northwestern Institute on Complex Systems). The term of this position is negotiable.
To apply, please e-mail your curriculum vitae, along with a brief statement of how your research interests relate to this position, to Juliana Steers (firstname.lastname@example.org) with “MORS Post-Doctoral Position” as the subject. Arrange to have two letters of recommendation e-mailed to the same address. Salary and research budget are competitive, and the position includes full medical insurance. Applications are due March 2, 2014.
Northwestern University is an Equal Opportunity, Affirmative Action Employer of all protected classes including veterans and individuals with disabilities.
Michael Corey, a former UChicago PhD soc student (and recent guest poster at OrgTheory), asked me to forward this job posting at Facebook.
Quantitative UX Researcher
Menlo Park, CA
Facebook is working to connect the world in a big way. To succeed we need to understand the unique character of each of the world’s communities, what Facebook means or could mean to them, and how best to make our technology work for them. We’re looking for people with strong quantitative research skills to help in this effort. The ideal candidate will be a social scientist with expertise in quantitative research methodologies OR a quantitative specialist with experience solving social problems. They’ll be comfortable improvising and have the ability to work cross-functionally and thrive in a fast-paced organization.
Help shape the research agenda and drive research projects from end-to-end
Collaborate with product teams to define relevant questions about user growth and engagement
Deploy appropriate quantitative methodologies to answer those questions
Develop novel approaches where traditional methods won’t do
Collaborate with qualitative researchers as needed and iterate quickly to generate usable insights for product and business decisions
Deliver insights and recommendations clearly to relevant audiences
Ability to ask, as well as answer, meaningful and impactful questions
Ability to communicate complex analyses and results to any audience
Experience with Unix, Python, and large datasets (> 1TB) a plus
Master’s or Ph.D. in the social sciences (e.g., Psychology, Communication, Sociology, Political Science, Economics), OR in a quantitative field (e.g., Statistics, Informatics, Econometrics) with experience answering social questions
Fluency in data manipulation and analysis (R/SAS/Stata, SQL/Hive)
Expertise in quantitative research methodologies (e.g., survey sampling and design, significance testing, regression modeling, experimental design, behavioral data analysis)
I’m really excited to officially announce the first annual pre-ASA datathon, taking place at Berkeley’s D-Lab on August 15-16, 2014.
The theme is “big cities, big data: big opportunity for computational social science,” the idea being to look at contemporary urban issues, especially housing challenges, using data gathered and made publicly available by cities such as San Francisco, New York, Chicago, Austin, Boston, Somerville, and Seattle.
The hacking will start at noon on August 15 and go until the next day. Sleeping is optional. We’ll have a presentation and judging session in the evening of August 16 in San Francisco, exact location TBD.
We’re working with several academic and industry partners to bring together tools and datasets which social scientists can use at the event. So stay tuned as that develops.
You can apply here and see the full call [PDF].
ALSO — Check out the CITASA Symposium the morning of the 15th (citasasymposium.info) before joining us at noon for the Datathon! There’ll be a number of great talks which will complement the hacking over at the D-Lab.
I was pleased to see Fabio Rojas make an open invitation for more female scholars on OrgTheory. Writing for a technically oriented blog, I’ve been painfully aware of the dearth of female voices expressed here. And as computational social scientists, we should be incredibly wary of the possibility of reproducing many of the same kinds of inequalities that have plagued computer science and tech at large. We see this when “big data isn’t big enough,” as Jen Schradie has put it, and when non-dominant voices are shushed in myriad ways online; I fear it when all our current contributors are men. Sociology has come a long way in opening up space for more “scholars at the margins” (a term I’m taking from Eric Grollman and his blog Conditionally Accepted), but there’s still a long way to go.
This is, then, an open invitation for anyone to contribute to Bad Hessian, especially women, people of color, queer people, people with disabilities, working-class or poor people, fat people, immigrants, and single parents. Our doors are always open for guest contributors and new regular contributors. Computational social science ought to be as committed as possible not only to bringing computational methods into the social sciences, but also to making sure that everyone, especially those at the margins, has a place to speak to and engage with those methods.