Laura K. Nelson wrote a nice review of my recent Mobilization article last week for the Mobilizing Ideas blog. She sums up some of the work that I did in preparing the article and training the machine learning classifier for coding mobilization in the April 6th Movement's Facebook messages.

At the end, she poses two questions:

How can we understand the opposing results from his two methods? While the word count method found that media words peaked on April 6 and May 4 and offline coordination did not follow the expected pattern, the machine learning approach found that the offline coordination category rose around the days of action but the media and press category did not. Why this discrepancy and what does it mean? Does this reflect something in the data or is it simply an artifact of the ambiguity of language and the difficulty of using automated methods to study it? I believe his findings are not contradictory, as they may appear, but they reveal something about the data.

There may indeed be a story in the data around this. One potential explanation is that people were primarily linking to major news organizations but were not engaging in any discussion or criticism of how the media and press covered the event. I agree with her that the article would have been improved by providing some examples of the types of messages being posted on April 6 itself.

This is also one place where Hopkins and King's ReadMe fails: since it generates point estimates of the proportion of a particular class in a corpus, you cannot use it to generate per-document probabilities.
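To make that distinction concrete, here is a minimal sketch in Python with scikit-learn. This is not the pipeline from the article, and the messages, labels, and category names are hypothetical placeholders; the averaging step at the end is only a crude stand-in for ReadMe's actual proportion estimator, meant to show the difference in output, not to reproduce the method.

```python
# Sketch: per-document probabilities vs. a corpus-level proportion.
# All texts and labels below are hypothetical placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical hand-coded training messages and category labels.
train_texts = [
    "meet at the square at noon tomorrow",
    "the newspaper finally covered the strike",
    "solidarity with everyone out there",
]
train_labels = ["coordination", "media", "none"]

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)
clf = LogisticRegression(max_iter=1000).fit(X_train, train_labels)

# A standard classifier returns a probability for every single document...
new_texts = ["everyone gather downtown", "the press story was wrong"]
X_new = vectorizer.transform(new_texts)
doc_probs = clf.predict_proba(X_new)  # shape: (n_documents, n_classes)
print(dict(zip(clf.classes_, doc_probs[0])))

# ...whereas a ReadMe-style estimator returns only the aggregate share of
# each class in the corpus, with no per-document output. Averaging the
# per-document probabilities is a crude illustration of that output shape:
corpus_proportions = doc_probs.mean(axis=0)
print(dict(zip(clf.classes_, corpus_proportions)))
```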

Her second question:

What was going on in the none category? If not for mobilization, how else were the members in this Facebook group using this platform? Hanna clearly could not address everything in the limited space in this article, but I hope his future articles will unpack the ambiguity in this none category and suggest some conclusions about what the users were saying in these posts.

Something I didn't explore in the article, but did in several earlier drafts of the master's thesis from which it was drawn, was the idea of discussions around issues, an idea drawn from the well-worn path of the social movement framing perspective. But it was harder to discern whether several unified claims were emerging. I think it would be a worthwhile project for folks to explore these kinds of data for emergent claims. This may be one place where topic modeling is a more adequate tool for the job.
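For anyone wanting to take that up, a first pass might look something like the sketch below, using scikit-learn's LDA implementation on placeholder messages rather than the actual April 6th corpus; the texts and parameter choices are assumptions for illustration only.

```python
# Sketch: fitting a small topic model to look for emergent claims.
# The messages here are invented placeholders, not movement data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

messages = [
    "raise the minimum wage for textile workers",
    "the emergency law must be repealed",
    "wages are too low to feed a family",
    "end the emergency law and free the detainees",
]

vectorizer = CountVectorizer(stop_words="english")
doc_term = vectorizer.fit_transform(messages)

# Two topics for this toy example; a real corpus would require model
# selection over the number of topics.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(doc_term)

# Print the top words per topic as candidate "claims" to read by hand.
terms = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[::-1][:5]]
    print(f"topic {k}: {', '.join(top)}")
```

The top words per topic would then need close reading to judge whether they cohere into identifiable claims, in the spirit of the framing perspective.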

I appreciate Nelson’s review and hope to see more machine learning used to address existing questions in the social movement literature.