big data « Bad Hessian

Matt Rafalow is a Ph.D. candidate in sociology at UC Irvine, and a researcher for the Connected Learning Research Network. http://mattrafalow.org/

Tech-minded educators and startups increasingly point to big data as the future of learning. Putting schools in the cloud, they argue, opens new doors for student achievement: greater access to resources online, data-driven and individualized curricula, and more flexibility for teachers when designing their lessons. When I started my ethnographic study of high-tech middle schools I had these ambitions in mind. But what I heard from teachers on the ground told a much more complicated story about the politics of data collection and use in the classroom.

For example, Mr. Kenworth, an art teacher and self-described techie, recounted to me with nerdy glee how he hacked together a solution to the bureaucratic red tape that interfered with his classes. Administrators at Sheldon Junior High, the Southern California-based middle school where he taught, required that all student behavior online be collected and linked to individual students. Among the burdens that this imposed on teachers’ curricular flexibility was how it limited students’ options for group projects. “I oversee yearbook,” he said. “The school network can be slow, but more than that it requires that students log in and it’s not always easy for them to edit someone else’s files.” Kenworth explained that tracking data in this way made it harder for students to share files with one another, minimizing opportunities to easily and playfully co-create documents, like yearbook files, from their own computers.

As a workaround to the login-centered school data policy, Kenworth secretly wired together a local area network just for his students’ yearbook group. “I’m the only computer lab on campus with its own network,” he said. “The computers are not connected to the district. They’re using an open directory whereas all other computers have to navigate a different system.” He reflected on why he created the private network. “The design of these data systems is terrible,” he said, furrowing his brow. “They want you to use their technology and their approach. It’s not open at all.”

Learning about teachers’ frustrations with school data collection procedures revealed, to me, the pressure points imposed on them by educational institutions’ increasing commitment to collecting data on student online behavior. Mr. Kenworth’s tactics, in particular, make explicit the social structures in place that tie the hands of teachers and students as they use digital technologies in the classroom. Whereas much of the scholarly writing in education focuses on inequalities that emerge from digital divides, like unequal access to technology or differences in kids’ digital skill acquisition, little attention is paid to matters of student privacy. Most of the debate around student data occurs in the news media – academia, in classic form, has not yet caught up to these issues. But education researchers need to begin studying data collection processes in schools because they are shaping pedagogy and students’ experience of schooling in important ways. At some schools I have studied, like where Mr. Kenworth teaches, administrators use student data not only to discipline children but also to inform recommendations for academic tracks in high school. Students are not made aware that this data is being collected or how it could be used.

Students and their families are being left out of any discussion about the big datasets being assembled that include online behaviors linked to their children. This reflects, I believe, an unequal distribution of power driven by educational institutions’ unchecked procedures for supplying and using student data. The school did not explicitly prohibit Mr. Kenworth’s activities, but if administrators found out they would likely reprimand him and link his computers to the district network. But Kenworth’s contention that these data collection processes limit how he can run his yearbook group extends far beyond editing shared yearbook files. It shows just how committed schools are to collecting detailed information about their students’ digital footprints. At the present moment, what they choose to do with that data is entirely up to them.

This is a guest post by Jen Schradie. Jen is a doctoral candidate in the Department of Sociology at the University of California-Berkeley and the Berkeley Center for New Media. She has a master’s degree in sociology from UC Berkeley and an MPA from the Harvard Kennedy School. Using both statistical methods and qualitative fieldwork, her research is at the intersection of social media, social movements and social class. Her broad research agenda is to interrogate digital democracy claims in light of societal and structural differences. Before academia, she directed six documentary films on social movements confronting corporate power. You can find her at www.schradie.com or @schradie on Twitter.

Five years ago, Chris Anderson, editor-in-chief of Wired Magazine, wrote a provocative article entitled “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete” (2008). He argued that hypothesis testing is no longer necessary with Google’s petabytes of data, which provide all of the answers to how society works. Correlation now “supersedes” causation:

This is a world where massive amounts of data and applied mathematics replace every other tool that might be brought to bear. Out with every theory of human behavior, from linguistics to sociology. Forget taxonomy, ontology, and psychology. Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity. With enough data, the numbers speak for themselves.

An easy strawman, Anderson’s piece generated a host of articles in academic journals decrying his claim. The overall consensus, unsurprisingly, was that the scientific method – i.e., hypothesis testing – is far from over. Most argued, as Pigliucci (2009:534) articulated:

But, if we stop looking for models and hypotheses, are we still really doing science? Science, unlike advertising, is not about finding patterns—although that is certainly part of the process—it is about finding explanations for those patterns.

Other analysts focused on the debate around “correlation is not causation.” Some critiqued Anderson on the grounds that correlation can lead you in the wrong direction through spurious noise. Others implicitly pointed to what Box (1976) articulated so well pre-Big Data: that science is an iterative process in which correlation is useful because it can trigger research that uses hypothesis testing.



Defying the Cloud: Murmurs from Middle School

Big Data and the Survival of the Scientific Method