Qualitative Research and Programming

A few weeks ago I helped organize and instruct a Software Carpentry workshop geared towards social scientists, with great help from folks at UW-Madison’s Advanced Computing Institute. Aside from tweaking a few examples (e.g. replacing an example using fake cochlear implant data with one of fake survey data), the curriculum was largely the same. The Software Carpentry curriculum is made to help researchers, mostly in STEM fields, write code for reproducibility and collaboration. There’s instruction in the Unix shell, a scripting language of your choice (we did Python), and collaboration with Git.

We had a good mix of folks at the workshop, ranging from those who had some familiarity with coding to those who had zero experience. There were a number of questions about how folks could use these tools in their research, a lot of them coming from qualitative researchers.

I was curious about what other ways researchers who use qualitative methods could incorporate programming into their research routine. So I took to Facebook and Twitter.

Question for folks who do qual work: is there any common research tasks you think would be facilitated by programming?

— Alex Hanna (@alexhanna) June 22, 2015

Among the replies, there were three tasks that stood out as potentially useful to qual folks:

1. Qualitative coding with help from automated methods

A number of people suggested that there need to be better tools for automated or semi-automated coding of texts. At one extreme, some suggested that fully automated methods such as topic modeling were desirable but not sufficient for the task, either because they already had in mind a set of codes to be drawn from the text, or because topic modeling produces things which are not at all like those desired codes (e.g. topics are not conceptually the same as, say, a social movement frame).

Some other folks desired some basic coding by keyword or phrase as a means of facilitating production of qualitative codes. Probably the most extreme request was a tool which could replace existing qualitative coding programs like NVivo and Dedoose.
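To make the keyword idea concrete, here is a minimal sketch of what "basic coding by keyword or phrase" could look like in Python. The codebook, its code names, and the `suggest_codes` function are all made up for illustration; the point is only that a script can flag candidate passages, leaving the actual coding decision to the researcher.

```python
import re
from collections import defaultdict

# Hypothetical codebook: each qualitative code maps to keywords or phrases.
CODEBOOK = {
    "repression": ["arrest", "tear gas", "crackdown"],
    "mobilization": ["march", "rally", "strike"],
}

def suggest_codes(text, codebook=CODEBOOK):
    """Return {code: [matched keywords]} for one passage of text."""
    matches = defaultdict(list)
    for code, keywords in codebook.items():
        for keyword in keywords:
            # Whole-word, case-insensitive match on the exact keyword or phrase.
            pattern = r"\b" + re.escape(keyword) + r"\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                matches[code].append(keyword)
    return dict(matches)

suggestions = suggest_codes("Police used tear gas at the rally.")
```

Exact matching like this misses inflected forms ("marched", "arrests"), so it is best read as a first pass for surfacing passages, not as a replacement for reading them.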

2. Storage and filtering for documents

Others rejected any kind of automated coding of texts and emphasized the necessity of being close to the text to identify themes. In their view, it’d be much more fruitful to have a tool which could organize relevant documents with their associated metadata. The tool would also provide some methods for filtering documents by keyword or metadata so it wouldn’t be necessary to do close qualitative coding of every text in a corpus.
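A rough sketch of that kind of filtering, assuming documents are kept in memory as text plus a dictionary of metadata. The `Document` class, the `filter_documents` function, and the sample corpus are hypothetical, not an existing tool.

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    text: str
    metadata: dict = field(default_factory=dict)

def filter_documents(documents, keyword=None, **metadata):
    """Keep documents containing `keyword` and matching every metadata value."""
    selected = []
    for doc in documents:
        # Case-insensitive substring check on the document text.
        if keyword is not None and keyword.lower() not in doc.text.lower():
            continue
        # Every requested metadata field must match exactly.
        if any(doc.metadata.get(key) != value for key, value in metadata.items()):
            continue
        selected.append(doc)
    return selected

# Made-up corpus for illustration only.
corpus = [
    Document("Union members voted to strike.", {"source": "newspaper", "year": 2011}),
    Document("The council met on Tuesday.", {"source": "newspaper", "year": 2011}),
    Document("Strike enters second week.", {"source": "blog", "year": 2012}),
]

strike_blog_posts = filter_documents(corpus, keyword="strike", source="blog")
```

The researcher still reads whatever comes back; the filter only narrows down which texts get a close reading.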

Similarly, a number of people wanted some basic text cleaning applications to remove things like HTML and other text detritus.
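For the HTML part of that cleaning, Python's standard library gets you surprisingly far. This is a small sketch using `html.parser`; the `TextExtractor` class and `strip_html` function are names I made up, and messy real-world pages may well need a dedicated library instead.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0  # >0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)

def strip_html(raw):
    """Return the text of an HTML fragment with tags removed."""
    parser = TextExtractor()
    parser.feed(raw)
    parser.close()
    # Collapse the runs of whitespace left behind by removed markup.
    return " ".join("".join(parser.chunks).split())

cleaned = strip_html("<p>Tom &amp; Jerry</p><script>var x = 1;</script>")
```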

3. Data collection

Lastly, some commenters wanted scraping tools to gather public texts at scale, things which could poll Google News or scour TV archives. This would probably be accomplished via some kind of web crawler or accessing particular APIs.

* * *

I have two observations from these suggestions:

First, most of these tasks seem to relate to text. Whether it relates to document management or data collection, the primary interest is in text as data. Accordingly, it may be possible to find some existing software to recalibrate somewhat towards a qualitative social science crowd. I’m less familiar with the software which has accompanied the digital humanities wave, so anyone who is, feel free to chime in.

Second, although my original prompt was about programming, the emphasis in the replies is more on tools than on programming. Not that there’s anything wrong with that: we usually envision the task we’d like to accomplish before we discover that it requires knowledge of several different technologies (e.g. the coding interface I developed for my dissertation sits on a codebase of Python, HTML, CSS, and JavaScript).

The second point matters when considering how to move from a desired task to teaching the technical skills most valuable for accomplishing it. While the Software Carpentry curriculum is very good, it seems more oriented towards the STEM fields and towards people who may have some coding experience but little training in software engineering best practices. Dissecting a task into components for instruction isn’t exactly intuitive. For instance, last year I wrote a forum scraper to help a friend with a qualitative project. But how would I teach that? The skills needed to write a scraper are some combination of knowledge of how HTML is structured, how HTTP requests are made, and a basic knowledge of Python/your fav scripting language.
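To show how those pieces fit together, here is a stripped-down sketch using only the standard library. It is not the scraper I wrote for my friend. The assumption that each forum post lives in a `<div class="post">` is invented for the example, as are the function and class names; a real forum's markup would need to be inspected first.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

def fetch(url):
    """HTTP piece: send a GET request and return the page body as text."""
    request = Request(url, headers={"User-Agent": "research-scraper-sketch"})
    with urlopen(request, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")

class PostParser(HTMLParser):
    """HTML piece: collect the text of every <div class="post"> (assumed markup)."""

    def __init__(self):
        super().__init__()
        self.posts = []
        self._depth = 0  # >0 while inside a post <div>, counting nested <div>s

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._depth:
            self._depth += 1
        elif "post" in classes:
            self._depth = 1
            self.posts.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.posts[-1] += data

def extract_posts(html):
    """Return the whitespace-normalized text of each post on a page."""
    parser = PostParser()
    parser.feed(html)
    parser.close()
    return [" ".join(post.split()) for post in parser.posts]

# Parsing a literal string keeps the example runnable without network access.
sample = ('<div class="post"><p>First <b>reply</b></p></div>'
          '<div class="nav">menu</div><div class="post">Second</div>')
posts = extract_posts(sample)
```

In practice you would call `extract_posts(fetch(url))` for each page of a thread, pausing between requests and checking the site's terms of use. Even this toy version touches all three skills, which is part of why it is hard to teach as a single lesson.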

All said, those suggestions were definitely helpful in thinking about what a programming curriculum for social scientists may look like in the future. If you’re someone who does qualitative work and has some other use cases to consider, please leave ’em in the comments.

Bad Hessian

