A few weeks ago I helped organize and instruct a Software Carpentry workshop geared towards social scientists, with great help from folks at UW-Madison’s Advanced Computing Institute. Aside from tweaking a few examples (e.g. replacing an example using fake cochlear implant data with one using fake survey data), the curriculum was largely the same. The Software Carpentry curriculum is designed to help researchers, mostly in STEM fields, write code for reproducibility and collaboration. There’s instruction in the Unix shell, a scripting language of your choice (we did Python), and collaboration with Git.

We had a good mix of folks at the workshop, ranging from those with some familiarity with coding to those with zero experience. There were a number of questions about how folks could use these tools in their research, many of them coming from qualitative researchers.

I was curious about what other ways researchers who use qualitative methods could incorporate programming into their research routine. So I took to Facebook and Twitter.

Among the replies, there were three tasks that stood out as potentially useful to qual folks:

1. Qualitative coding with help from automated methods

A number of people suggested that there need to be better tools for automated or semi-automated coding of texts. On one extreme, some suggested that fully automated methods such as topic modeling were desirable but not sufficient for the task, either because they already had in mind a set of codes to be drawn from the text, or because topic modeling produces things that are not at all like those desired codes (e.g. topics are not conceptually the same as, say, a social movement frame).

Some other folks desired some basic coding by keyword or phrase as a means of facilitating production of qualitative codes. Probably the most extreme request was a tool which could replace existing qualitative coding programs like NVivo and Dedoose.
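To make the keyword-based idea concrete, here is a minimal Python sketch of what semi-automated coding by keyword might look like. The codebook and its keywords are invented for illustration, not drawn from any real project, and a real tool would need stemming, phrase matching, and human review on top of this:

```python
# Hypothetical codebook: maps a qualitative code to keywords that signal it.
CODEBOOK = {
    "injustice": ["unfair", "rights", "discrimination"],
    "economics": ["jobs", "wages", "cost"],
}

def assign_codes(text, codebook=CODEBOOK):
    """Return the set of codes whose keywords appear anywhere in the text."""
    lowered = text.lower()
    return {code for code, keywords in codebook.items()
            if any(keyword in lowered for keyword in keywords)}
```

This is the kind of first pass that could surface candidate passages for a researcher to code properly, rather than replace close reading.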

2. Storage and filtering for documents

Others rejected any kind of automated coding of texts and emphasized the necessity of being close to the text to identify themes. In their view, it’d be much more fruitful to have a tool which could organize relevant documents with their associated metadata. The tool would also provide some methods for filtering documents by keyword or metadata, so it wouldn’t be necessary to do close qualitative coding of every text in a corpus.
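The core of such a store-and-filter tool is small enough to sketch in Python. The documents and the metadata fields (`source`, `year`) below are hypothetical; a real corpus would live in a database rather than a list:

```python
# Each document is a dict of text plus whatever metadata the researcher tracks.
documents = [
    {"text": "Interview transcript about climate policy...",
     "source": "interview", "year": 2014},
    {"text": "Newspaper editorial on energy costs...",
     "source": "news", "year": 2015},
]

def filter_docs(docs, keyword=None, **metadata):
    """Keep documents containing a keyword and matching any metadata fields."""
    results = []
    for doc in docs:
        if keyword and keyword.lower() not in doc["text"].lower():
            continue  # keyword requested but not found
        if any(doc.get(field) != value for field, value in metadata.items()):
            continue  # some metadata field doesn't match
        results.append(doc)
    return results
```

Filtering down to, say, only 2015 news items containing "climate" would then be a one-liner, leaving close reading for the documents that survive the filter.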

Similarly, a number of people wanted some basic text cleaning applications to remove things like HTML markup and other text detritus.
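Stripping HTML is one cleaning task Python’s standard library can handle on its own. Here’s a minimal approach using `html.parser`; real-world pages are messier, and a dedicated library like Beautiful Soup is usually more robust:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect only the text content of a page, discarding all tags."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for each run of text between tags.
        self.chunks.append(data)

def strip_html(html):
    parser = TextExtractor()
    parser.feed(html)
    return "".join(parser.chunks).strip()
```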

3. Data collection

Lastly, some commenters wanted scraping tools to gather public texts at scale, things which could poll Google News or scour TV archives. This would probably be accomplished via some kind of web crawler or accessing particular APIs.
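The parsing core of such a crawler, pulling the links out of a fetched page so they can be visited in turn, can also be sketched with the standard library. The fetching step itself (via `urllib.request` or a third-party library) is omitted here, and the example URL is made up:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Gather href targets from anchor tags -- the heart of any crawler."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's URL.
                    self.links.append(urljoin(self.base_url, value))

def extract_links(html, base_url):
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links
```

A crawler then loops: fetch a page, extract its links, queue the new ones, repeat, while respecting rate limits and the site’s terms of service.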

* * *

I have two observations from these suggestions:

First, most of these tasks relate to text. Whether the goal is document management or data collection, the primary interest is in text as data. Accordingly, it may be possible to find some existing software to recalibrate somewhat towards a qualitative social science crowd. I’m less familiar with the software that has accompanied the digital humanities wave, so anyone who is, feel free to chime in.

Second, although my original prompt was about programming, there is more of an emphasis on tools than on programming. Not that there’s anything wrong with that: we usually envision what kind of task we’d like to accomplish before we discover it requires knowledge of several different technologies (e.g. the coding interface I developed for my dissertation sits on a codebase of Python, HTML, CSS, and JavaScript).

The second point matters when considering how to move from a desired task to teaching the technical skills most valuable for accomplishing it. While the Software Carpentry curriculum is very good, it seems oriented more towards STEM fields and towards people who may have some coding experience but little training in software engineering best practices. Dissecting a task into components for instruction isn’t exactly intuitive. For instance, last year I wrote a forum scraper to help a friend with a qualitative project. But how would I teach that? The skills necessary to write a scraper are some combination of knowing how HTML is structured, how HTTP requests are made, and a basic knowledge of Python (or your favorite scripting language).

All said, those suggestions were definitely helpful in thinking about what a programming curriculum for social scientists may look like in the future. If you’re someone who does qualitative work and have some other use cases to consider, please leave ’em in the comments.

  • oriolmirosa

    Thanks for this, Alex. I have a few comments beyond the few tweets we exchanged yesterday. I think that the three tasks that you mention fall into two different groups. The first one (automated coding) is the one I brought up, basically because it is the one for which my programming skills are not strong enough to be able to do on my own. However, my impression is that this is something that even the most advanced machine learning is struggling with. There are some text mining techniques and algorithms that allow us to do some topic modeling, but there is still a long way to go before they can be used with the nuance that social scientists require. In this respect, I believe that (most) social scientists should work with machine learning experts in order to move forward.

    The other two tasks, however, can be done by social scientists with relatively easy and quick-to-learn computing skills. I am by no means an expert programmer, and I managed to learn how to scrape websites and extract text using a few R packages in only a handful of hours. I think that these tasks should be the main focus of programming training for qualitative social scientists. In fact, after I showed the work that I had done to a few students and faculty in my department (I basically scraped conservative websites for texts that deal with climate change), I have seen an explosion of interest, and I have spent hours coaching colleagues and students on how to do these basic tasks with R. I think that a workshop or class focusing on acquiring these skills would be welcomed by many qualitative social scientists.

    • Thanks for the comment, Oriol. That’s very good to know, and good to hear that scraping tasks are manageable for the students you’ve spoken to.

  • Regarding the more ambitious request of a tool to replace existing
    qualitative coding programs like NVivo and Dedoose, could I indulge in a
    little self-promotion? I’ve written some open source tools to shift data between NVivo’s file formats and a simple SQLite database. From there the data can be manipulated or analysed using whichever method you like (Python, R, etc.) and still remain accessible by NVivo.

    You can find this project at https://github.com/BarraQDA/nvivotools and some of the tools can be used without local installation via http://wooey.barraqda.org/

    I’d love to hear what people think of this work.