This is a guest post by Jen Schradie. Jen is a doctoral candidate in the Department of Sociology at the University of California-Berkeley and the Berkeley Center for New Media. She has a master’s degree in sociology from UC Berkeley and an MPA from the Harvard Kennedy School. Using both statistical methods and qualitative fieldwork, her research is at the intersection of social media, social movements and social class. Her broad research agenda is to interrogate digital democracy claims in light of societal and structural differences. Before academia, she directed six documentary films on social movements confronting corporate power. You can find her at www.schradie.com or @schradie on Twitter.
Five years ago, Chris Anderson, editor-in-chief of Wired Magazine, wrote a provocative article entitled “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete” (2008). He argued that hypothesis testing is no longer necessary given Google’s petabytes of data, which provide all of the answers to how society works. Correlation now “supersedes” causation:
This is a world where massive amounts of data and applied mathematics replace every other tool that might be brought to bear. Out with every theory of human behavior, from linguistics to sociology. Forget taxonomy, ontology, and psychology. Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity. With enough data, the numbers speak for themselves.
An easy strawman, Anderson’s piece generated a host of articles in academic journals decrying his claim. Unsurprisingly, the overall consensus was that the scientific method (i.e., hypothesis testing) is far from over. Most argued, as Pigliucci (2009:534) put it,
But, if we stop looking for models and hypotheses, are we still really doing science? Science, unlike advertising, is not about finding patterns—although that is certainly part of the process—it is about finding explanations for those patterns.
Other analysts focused on the debate around “correlation is not causation.” Some critiqued Anderson on the grounds that correlation can lead researchers in the wrong direction by picking up spurious noise. Others implicitly pointed to what Box (1976) articulated so well before Big Data: science is an iterative process in which correlation is useful because it can trigger research that uses hypothesis testing.
For sociologists, this issue may simply seem like fodder for an undergraduate lesson in the social scientific method. But Anderson’s argument has not disappeared. Instead, it has grown exponentially, as has the term Big Data. In the process, a lot of confusion has arisen about what in the heck Big Data is and how it is used: are we talking about a source of data, a data collection technique, an analytic tool, or perhaps data visualization? And how big is Big Data, anyway? Is it simply an accumulation of tweets on a hashtag, or something much larger? The answer to all of these questions is, well, kind of, sort of, yes.
Digital technology has enabled researchers to access, store, analyze and report unprecedentedly massive amounts of data, often (though not always) from online sources. The trendy, and already clichéd, term is not just data, but Big Data.
Access to Big Data and information from the digital cloud, as well as the research techniques for using it, has become much easier. When Anderson wrote his article five years ago, conducting social network analysis required clunky software and manually cleaning up databases. Twitter, for instance, was just emerging on the social media landscape. Mining Big Data from the digital cloud, whether by writing computer code to scrape and download complex search results or by using software that does so, is increasingly prevalent in social science research. Now, with a few mouse clicks, Twitter conversations can be displayed instantaneously, creating data for an article. OK, it’s not quite that simple, but it is so much more efficient than early versions of, say, UCINET. Research has never been so efficient.
“Um, yeah,” many of you may be thinking, “I know all of that.” But I have found that many sociologists have a lot of questions about Big Data. Six months ago, a colleague contacted me to schedule a time to meet so that I could explain to her what Big Data was. Before we could connect, though, she had already started a job as a “data scientist.” Now it’s my turn to ask her what she’s doing.
But the question remains: has Big Data changed the scientific method? When I first read Anderson’s article, I thought he was out to some digital utopian lunch. And I still do. In fact, on PBS’s MediaShift, I blogged about persistent representativeness problems with Big Data scraped from the cloud, problems rooted in digital inequality. And I am a proud card-carrying member of the scientific method society. However, is it possible that there is something to correlation after all, something more than a question of patterns? (Gasp!) Big Data could actually be more of an anthropological source of data than we might realize, rather than the object of a fetish in which big data is “right” and small data is “wrong.”
It might be useful to turn to the opposite end of the methodological spectrum: ethnography. Some ethnographers follow grounded theory, holding that we shouldn’t enter a research site with preconceived theories or hypotheses, so that we stay open to any and all findings that emerge from it. Other ethnographers believe that researchers should start out with a theory and then constantly refine it with every observation or interview. Perhaps both of these approaches could be used with Big Data. With all of the methods we now have for analyzing and visualizing Big Data so quickly, an anthropological approach could provide a rich and textured alternative path along the data science road. As someone who uses multiple methods, I may be alienating quantitative and qualitative researchers alike by at times being a participant observer of Big Data. However, I am not suggesting fishing expeditions in lieu of hypothesis testing, nor any Anderson-esque junking of the scientific method. The numbers do not speak for themselves. Instead, it is our job as social scientists to understand the difference between the data, whatever its size, and the method, whatever that may be.
Anderson, Chris. 2008. “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete.” Wired Magazine.
Box, G. E. P. 1976. “Science and Statistics.” Journal of the American Statistical Association 71.
Pigliucci, Massimo. 2009. “The End of Theory in Science.” EMBO reports 10:534.