Cyrus Dioun is a PhD candidate in Sociology at UC Berkeley and a Data Science Fellow at the Berkeley Institute for Data Science. Garret Christensen is an Economics PhD Research Fellow at the Berkeley Initiative for Transparency in the Social Sciences and a Data Science Fellow at the Berkeley Institute for Data Science.

In recent years, the failure to reproduce the results of some of the social sciences’ most high-profile studies (Reinhart and Rogoff 2010; LaCour and Green 2014) has created a crisis of confidence. From the adoption of austerity measures in Europe to retractions by Science and This American Life, these errors and, in some cases, fabrications have had major consequences. It seems that The Royal Society’s 453-year-old motto, “Nullius in verba” (or “take no one’s word for it”), is still relevant today.

Social scientists who use computational methods are well positioned to “show their work” and pioneer new standards of transparency and reproducibility. Reproducibility is the ability of a second investigator to recreate the finished results of a study, including key findings, tables, and figures, given only a set of files related to the research project.

Practicing reproducibility not only allows other social scientists to verify the author’s results, but also helps an author take part in more “hygienic” research practices, clearly documenting every step and assumption. This annotation and explication is essential when working with large data sets and computational methods that can seem to be an opaque “black box” to outsiders.

Yet making work reproducible can feel daunting. How do you make research reproducible? Where do you start? There are few explicit how-to guides for social scientists.

The Berkeley Institute for Data Science (BIDS) and the Berkeley Initiative for Transparency in the Social Sciences (BITSS) hope to address this shortcoming and create a resource on reproducibility for social scientists. Under the auspices of BIDS and BITSS, we are editing a volume of short case studies on reproducible workflows focused specifically on social science research. BIDS is currently finishing a volume on reproducibility in the natural sciences that is under review at a number of academic presses. These presses have expressed interest in publishing a follow-up volume on reproducibility in the social sciences.

We are inviting you and your colleagues to share your reproducible workflows. We hope to collect 20 to 30 case studies covering a range of topics from the social science disciplines and from social scientists working in professional schools. Each case study will be short, about 1,500 to 2,000 words plus one diagram that demonstrates the “how” of reproducible research, and will follow a standard template of short-answer questions to make it easy to contribute. Each case study will consist of an introduction (100-200 words), workflow narrative (500-800 words), “pain points” (200-400 words), key benefits (200-400 words), and tools used (200-400 words). To help facilitate the process, we have a template as well as an example: Garret’s case study with accompanying diagram. (Draw.io is an easy-to-use online tool for drawing your diagram.)

BITSS will be sponsoring a Summer Institute for Transparency and Reproducibility in the Social Sciences from June 8 to June 10 in Berkeley, CA. On June 9, BITSS will devote a special session to writing up workflow narratives and creating diagrams for inclusion in this edited volume. While the Summer Institute admissions deadline has passed, BITSS may still consider applications from especially motivated researchers and contributors to the volume. BITSS is also offering a similar workshop through ICPSR at the University of Michigan on July 5-6.

Attending the BITSS workshop is not required to contribute to the volume. We invite submissions from faculty, graduate students, and post-docs in the social sciences and professional schools.

If you are interested in contributing to (or learning more about) this volume, please email Cyrus Dioun (dioun@berkeley.edu) or Garret Christensen (garret@berkeley.edu) no later than May 6th. Completed drafts will be due June 28th.

References

LaCour, Michael J., and Donald P. Green. “When contact changes minds: An experiment on transmission of support for gay equality.” Science 346, no. 6215 (2014): 1366-1369.

Reinhart, Carmen M., and Kenneth S. Rogoff. “Growth in a Time of Debt.” American Economic Review 100, no. 2 (2010): 573-578.

Matt Rafalow is a Ph.D. candidate in sociology at UC Irvine, and a researcher for the Connected Learning Research Network. http://mattrafalow.org/

Tech-minded educators and startups increasingly point to big data as the future of learning. Putting schools in the cloud, they argue, opens new doors for student achievement: greater access to resources online, data-driven and individualized curricula, and more flexibility for teachers when designing their lessons. When I started my ethnographic study of high-tech middle schools, I had these ambitions in mind. But what I heard from teachers on the ground told a much more complicated story about the politics of data collection and use in the classroom.

For example, Mr. Kenworth, an art teacher and self-described techie, recounted to me with nerdy glee how he hacked together a solution to the bureaucratic red tape that interfered with his classes. Administrators at Sheldon Junior High, the Southern California middle school where he taught, required that all student behavior online be collected and linked to individual students. Among the burdens this imposed on teachers’ curricular flexibility was how it limited students’ options for group projects. “I oversee yearbook,” he said. “The school network can be slow, but more than that it requires that students log in and it’s not always easy for them to edit someone else’s files.” Kenworth explained that data tracking of this kind made it harder for students to share files with one another, minimizing opportunities to easily and playfully co-create documents, like yearbook files, from their own computers.

As a workaround to the login-centered school data policy, Kenworth secretly wired together a local area network just for his students’ yearbook group. “I’m the only computer lab on campus with its own network,” he said. “The computers are not connected to the district. They’re using an open directory whereas all other computers have to navigate a different system.” He reflected on why he created the private network. “The design of these data systems is terrible,” he said, furrowing his brow. “They want you to use their technology and their approach. It’s not open at all.”

Learning about teachers’ frustrations with school data collection procedures revealed, to me, the pressure points imposed on them by educational institutions’ increasing commitment to collecting data on student online behavior. Mr. Kenworth’s tactics, in particular, make explicit the social structures in place that tie the hands of teachers and students as they use digital technologies in the classroom. Whereas much of the scholarly writing in education focuses on inequalities that emerge from digital divides, like unequal access to technology or differences in kids’ digital skill acquisition, little attention is paid to matters of student privacy. Most of the debate around student data occurs in the news media; academia, in classic form, has not yet caught up to these issues. But education researchers need to begin studying data collection processes in schools because they are shaping pedagogy and students’ experience of schooling in important ways. At some schools I have studied, like the one where Mr. Kenworth teaches, administrators use student data not only to discipline children but also to inform recommendations for academic tracks in high school. Students are not made aware that this data is being collected, nor how it could be used.

Students and their families are being left out of any discussion about the big datasets being assembled that include online behaviors linked to their children. This reflects, I believe, an unequal distribution of power driven by educational institutions’ unchecked procedures for collecting and using student data. The school did not explicitly prohibit Mr. Kenworth’s activities, but if administrators found out, they would likely reprimand him and link his computers to the district network. Kenworth’s contention that this data collection process limits how he can run his yearbook group extends far beyond editing shared yearbook files. It shows just how committed schools are to collecting detailed information about their students’ digital footprints. At the present moment, what they choose to do with that data is entirely up to them.