Wednesday, 17 October 2012

KES 2 - Big Data

On the 26th of September, we met at the OII for the second Knowledge Exchange Seminar for the Blurring the Boundaries network. The topic this time was quantitative methods.

Session 1 kicked off with a presentation on Big Data from Ralph Schroeder and Eric Meyer of the OII, which raised the question: is the availability of Big Data changing the kinds of questions social researchers are asking? Is the Big Data tail wagging the research dog?

From the discussions after the presentation and throughout the day, a common theme emerged: there is still a great deal of uncertainty about what kinds of questions Big Data is useful for. If we want to ensure that we’re doing high quality social science research and not letting the Big Data tail wag the dog, then we need to think carefully about questions first, and about appropriate methods and data sources (big or small) second. It seems clear that the potential for companies to learn from their data, to predict consumer behaviour and to increase sales, for example, is great. However, prediction is not the only concern in the social sciences, and the questions we might want to keep in mind are: what do we potentially lose with a shift in attention to Big Data? How is Big Data changing the questions people are asking and the ways in which we do research? There seemed to me to be a consensus that it’s still relatively early days in terms of Big Data’s role in social science research. At the moment there are some examples of researchers grabbing the ‘low hanging fruit’, and it’s up to social scientists to show, over time, how Big Data can be used in a way that aligns with the goals of social research.

A second common theme of the day was the idea that data fusion is a key issue for the social sciences. It may be that the potential for Big Data to be useful comes not from having lots of the same type of data, but from finding ways to integrate different types of data. This is what Jim Hendler calls Broad Data: the overlaying of many different types of data sets, structured and unstructured, big and small, public and private, open and closed, personal and non-personal, anonymous and identified, aggregate and individual. It’s about finding the structure in all this data and a way to link it all together so that it becomes meaningful. He said the goal is integrating data assets. How do social scientists learn these skills?

In thinking about skills, another common theme emerged around the training of social scientists in the UK in quantitative methods. The ESRC is pushing for better quant methods training at undergraduate level for social scientists, but there is certainly a question as to whether our researchers are equipped with the statistics skills to understand what kinds of questions can be answered with Big Data. There is also the question of whether what we now need are social science researchers who are also computer scientists, rather than researchers with traditional statistics skills. Certainly a common theme that emerged from the day was that we absolutely need more multi-disciplinary teams, involving both social scientists and computer scientists, if we’re working with this type of data. There are issues then around research funding for collaborative research, and questions about whether the REF does enough to encourage truly multi-disciplinary working when pressure to publish in discipline-specific journals is substantial in many fields.