Friday, 19 October 2012

KES 2 - Data Visualisation

The third session at the Knowledge Exchange Seminar on quantitative methods on the 26th of September was from Scott Hale of the OII. The topic was Data Visualisation and he gave a quick run-through of some of the pitfalls and problems encountered when using visualisation software without really thinking about the story you want to tell with your data. He also showed examples of really helpful visualisations that made it possible for large, complex data sets to be viewed and understood. One key theme here was that, due to the challenges of representing the temporal dimension of social media data, interactivity was often required. Again, this raised the real issue of the skills and expertise needed in the world of social media research and the requirement for working in multidisciplinary teams.

The question was raised about whether the network could provide details about data visualisation platforms and blogs and as there are already some great resources out there, I agree this is worth trying to pull together. I’ve begun compiling a list below and urge members to share theirs too!

Nathan Yau’s blog, Flowing Data (Nathan is also author of the book Visualize This, a practical guide to visualisation)
Andy Kirk’s blog, Visualising Data (every month, Andy pulls together a list of the best visualisations on the web)
Moritz Stefaner’s blog, Well-formed Data
David McCandless’s blog, Information is Beautiful
The award-winning Data Journalism Handbook, which has excellent chapters on data visualisation

Thursday, 18 October 2012

KES 2 - Populations and sampling

The second session at the Knowledge Exchange Seminar on quantitative methods on the 26th of September was from Grant Blank of the OII. The topic was Populations and Sampling, and he asked the questions: What is the “population” on social media platforms? How do platforms differ in population characteristics? How can we select cases or sample on social media?
One of the key issues in terms of sampling online is that it’s difficult to develop a sampling frame; Grant pointed out that a biased sampling frame was unavoidable in much online research. However, despite the potential problems, the advantages of online data collection often outweigh the challenges, not least because it’s cheap and fast. 

Since Twitter data are so easy to collect, much of the discussion following the session was around the challenges of sampling tweets. How can we get a random, representative sample of tweets, especially if we’re interested in looking at more than just a snapshot in time? It seems to me that a potential aim for the network might be to put together some guidelines around sampling from Twitter for new researchers who are looking for guidance. Again, the question was raised about what kinds of questions Twitter data can really help us to answer, if we know that Twitter users are not representative of the whole population and that even getting a random, representative sample of tweets is problematic. Some case studies and examples of research questions where Twitter data have been used to good effect could also be helpful to network members.
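For readers thinking about this practically: one standard technique for drawing a uniform random sample from a stream of unknown length (such as tweets arriving in real time) is reservoir sampling. A minimal Python sketch, offered as my own illustration rather than anything discussed in the session:

```python
import random

def reservoir_sample(stream, k):
    """Keep a uniform random sample of k items from a stream of unknown length."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items
            reservoir.append(item)
        else:
            # Replace a random slot with probability k/(i+1), which keeps
            # every item seen so far equally likely to be in the sample
            j = random.randrange(i + 1)
            if j < k:
                reservoir[j] = item
    return reservoir

# Example with a placeholder stream; real tweets would come from an API
sample = reservoir_sample((f"tweet {n}" for n in range(1000)), 5)
print(sample)
```

Note that this only addresses uniformity over the tweets you actually see; it does nothing about which tweets a platform chooses to expose in the first place.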
Little time was spent discussing sampling from other social media platforms, but an interesting reference was provided: Gjoka et al. (2010), which proposes a random walk technique to obtain an unbiased sample from social network sites.
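The core idea behind the Gjoka et al. approach is that a simple random walk over a friendship graph over-samples well-connected users, and a Metropolis-Hastings correction removes that degree bias. A toy Python sketch of the corrected walk on a hypothetical adjacency-list graph (the graph and names here are illustrative, not from the paper):

```python
import random

def metropolis_hastings_walk(graph, start, steps):
    """Random walk whose stationary distribution is uniform over nodes,
    correcting the degree bias of a simple random walk."""
    current = start
    visited = []
    for _ in range(steps):
        neighbour = random.choice(graph[current])
        # Move with probability min(1, deg(current)/deg(neighbour)),
        # so high-degree nodes are not over-sampled
        if random.random() < min(1.0, len(graph[current]) / len(graph[neighbour])):
            current = neighbour
        visited.append(current)
    return visited

# Toy friendship graph; a real study would crawl a platform via its API
graph = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "b", "d"],
    "d": ["b", "c"],
}
walk = metropolis_hastings_walk(graph, "a", 1000)
print(len(walk))
```

In practice the walk is run long enough to "mix", and early steps are usually discarded as burn-in before treating the visited nodes as a sample.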

Wednesday, 17 October 2012

KES 2 - Big Data

On the 26th of September, we met at the OII for the second Knowledge Exchange Seminar for the Blurring the Boundaries network. The topic this time was quantitative methods.

Session 1 kicked off with a presentation from Ralph Schroeder and Eric Meyer of the OII on Big Data, which raised the questions: Is the availability of Big Data changing the kinds of questions social researchers are asking? Is the Big Data tail wagging the research dog?

From the discussions after the presentation and throughout the day, a common theme emerged: there is still a great deal of uncertainty about what kinds of questions Big Data is useful for. If we want to ensure that we’re doing high-quality social science research and not letting the Big Data tail wag the dog, then we need to think carefully about questions first, and about appropriate methods and data sources (big or small) second. It seems clear that the potential for companies to learn from their data, to predict consumer behaviour and to increase sales, for example, is great. However, prediction is not the only concern in the social sciences, and the questions we might want to keep in mind are: what do we potentially lose with a shift in attention to Big Data? How is Big Data changing the questions people are asking and the ways in which we do research? There seemed to me to be a consensus that it’s still relatively early days for Big Data’s role in social science research; at the moment there are examples of researchers grabbing the ‘low-hanging fruit’, and it’s up to social scientists, over time, to show how Big Data can be used in ways that align with the goals of social research.

A second common theme of the day was the idea that data fusion is a key issue for the social sciences. It may be that the potential for Big Data to be useful comes not from having lots of the same type of data, but from finding ways to integrate different types of data. This is what Jim Hendler calls Broad Data: the overlaying of many different types of data sets, structured and unstructured, big and small, public and private, open and closed, personal and non-personal, anonymous and identified, aggregate and individual. It’s about finding the structure in all this data and a way to link it together so that it becomes meaningful. He said the goal is integrating data assets. How do social scientists learn these skills?

In thinking about skills, another common theme emerged around the training of UK social scientists in quantitative methods. The ESRC is pushing for better quantitative methods training at undergraduate level for social scientists, but there is certainly a question as to whether our researchers are equipped with the statistical skills to understand what kinds of questions can be answered with Big Data. There is also the question of whether what we now need are social science researchers who are also computer scientists, rather than researchers with traditional statistics skills. Certainly a common theme of the day was that, if we’re working with this type of data, we need more multidisciplinary teams involving both social scientists and computer scientists. There are then issues around funding for collaborative research, and questions about whether the REF does enough to encourage truly multidisciplinary working when the pressure to publish in discipline-specific journals is substantial in many fields.