Wednesday, 19 November 2014

Making the most out of big data: computer mediated methods

Patrick Readshaw is a Media and Cultural Studies Doctoral Candidate at Canterbury Christ Church University. Patrick is interested in social media as an alternative and empowering source of information on current events, free from the constraints of other agenda-setting media forms. You can contact Patrick by email on  

When I was asked to write a blog for NSMNSS, I was certainly excited and, this being my first post of this kind, suitably anxious about the prospect. However, my ongoing thesis has never ceased to provide interesting discussions with individuals in linked or parallel fields relating to social media. The main caveat in these discussions is that I often have to try not to overcomplicate things. With that in mind, and my ham-fisted introduction out of the way, I want to take some time to break down the value of so-called “new media systems” like Twitter and how I personally go about dealing with the data I collect.

Since social media sites such as Facebook burst onto the scene 10 years ago, researchers and market analysts have been looking for a way to tap into the content on these sites. In recent years there have been several attempts to do this, some more successful than others (Lewis, Zamith & Hermida, 2013), particularly with regard to the scale of the medium in question. For the uninitiated (apologies to those already familiar), the term “Big Data” is the catch-all for the enormous trails of information generated by consumers going about their day in an increasingly digitized world (Manyika et al., 2011). It is this sheer volume of information that poses the first hurdle to be overcome when conducting research online. For example, earlier this year I was collecting data on the European Parliamentary Election and gathered over 16,000 tweets in about three weeks. Bearing in mind that on average a tweet contains approximately 12 words in 1.5 sentences (Twitter, 2013), for those three weeks I had roughly 196,500 words or 24,500 sentences to come to terms with. That is a lot of data for one person to deal with alone, especially if only applying manual techniques such as content analysis.

So ultimately you have to ask two questions. Firstly, how many undergraduates or interns chained to computers running basic content analysis will it take to complete the analysis in a reasonable space of time, and will that analysis be reliable across analysts? Secondly, while computational methods save time on analysis, can you guarantee the same level of depth as with manual content analysis? Considering that content analysis goes beyond the basic frequency statistics that can be collected simply from Twitter’s own search engine, I advocate the use of computer-mediated techniques in which the data collected is first reduced using filters that remove retweets or spam responses, and then structured somewhat, or at least conceptualised along a number of important factors, using hierarchical cluster analysis among other techniques. Both Howard (2011) and Papacharissi (2010) utilise this mixed methods approach, as do Lewis, Zamith and Hermida (2013), whose method I adapted to my own work and applied as described above. Furthermore, these individual pieces of research suggest the value of the medium overall as a source of data, due to its role as one of the primary news disseminators when access to mainstream news media is blocked, such as during the 2011 Arab Spring events. Burgess and Bruns (2012) have conducted additional research looking at the 2010 federal election campaign in Australia, advising the use of computational methods to reduce the sample and so facilitate manual methods, ultimately maintaining depth during content analysis. As can be imagined, Lewis, Zamith and Hermida (2013) and Manovich (2012) both support the methodologies utilized by the studies above and advocate making the most of the technical advances that have allowed the content in question to be organized and harnessed in an efficient way.
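To make the reduce-then-cluster idea a little more concrete, here is a minimal sketch in Python. It is not the pipeline used in the work described above or in any of the cited studies. It assumes that a retweet can be spotted by a leading "RT @", that exact duplicates are a crude stand-in for spam, and that tweets can be compared by the overlap of their word sets (Jaccard distance) with average-linkage agglomerative clustering. The function names, the sample tweets and the 0.5 threshold are all hypothetical choices made for illustration.

```python
import re
from itertools import combinations


def is_retweet(text):
    """Assumption: retweets start with 'RT @' (a convention, not a guarantee)."""
    return text.strip().lower().startswith("rt @")


def tokens(text):
    """Lowercased set of words, with URLs and @mentions stripped out."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return frozenset(re.findall(r"[a-z0-9#']+", text))


def reduce_sample(tweets):
    """Drop retweets and exact duplicates (duplicates stand in for spam here)."""
    seen, kept = set(), []
    for tweet in tweets:
        key = tokens(tweet)
        if is_retweet(tweet) or not key or key in seen:
            continue
        seen.add(key)
        kept.append(tweet)
    return kept


def jaccard_distance(a, b):
    """1 minus shared words over all words; 0.0 if both sets are empty."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def cluster(tweets, threshold=0.5):
    """Average-linkage agglomerative clustering on Jaccard distance.

    Repeatedly merges the two closest clusters and stops once the closest
    pair is further apart than `threshold` (a hypothetical cut-off).
    """
    bags = [tokens(t) for t in tweets]
    clusters = [[i] for i in range(len(tweets))]

    def linkage(c1, c2):
        total = sum(jaccard_distance(bags[i], bags[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        if linkage(clusters[a], clusters[b]) > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b is safe
        del clusters[b]
    return [[tweets[i] for i in c] for c in clusters]


# Invented sample data, for illustration only.
sample = [
    "RT @someone: Polls open for the European election",
    "Polls open for the European election today",
    "European election polls open today across the UK",
    "Win a free phone! Click http://example.com",
    "Win a free phone! Click http://example.com",
    "Turnout in the election looks low this year",
]

kept = reduce_sample(sample)
groups = cluster(kept)
for number, group in enumerate(groups, start=1):
    print(number, group)
```

On a real collection of this size, the pairwise comparison above would be slow, and an established statistics package would be the more sensible choice. The point of the sketch is only that the filtering step shrinks the sample and the clustering step groups what remains, so that manual content analysis can then be applied group by group.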

The application of mixed methodologies will continue to develop the techniques integral to facilitating the oncoming age of computational social science (Lazer et al., 2009) or “New Social Science”. While this is the case, it is vitally important that this readily available source of data is not exploited in a way that could be damaging to the medium as a whole, and that good research practice is maintained concerning the ethics associated with consumer privacy. As a final aside, I would like to remind everyone that this data is hugely fascinating and rich beyond all belief, but there are dangers associated with quantifying social life, and this should be at the front of our minds before, during and after conducting research online (Boyd & Crawford, 2012; Oboler, Welsh & Cruz, 2012).


Boyd, d. & Crawford, K. (2012). Critical questions for Big Data: Provocations for a cultural, technological, and scholarly phenomenon. Information, Communication & Society, 15 (5), 662–679.

Burgess, J., & Bruns, A. (2012). (Not) the Twitter election: The dynamics of the #ausvotes conversation in relation to the Australian media ecology. Journalism Practice, 6 (3), 384–402.

Howard, P. (2011). The digital origins of dictatorship and democracy: Information technology and political Islam. London, UK: Oxford University Press.

Lazer, D., Pentland, A., Adamic, L., Aral, S., Barabási, A., Brewer, D., Christakis, N., Contractor, N., Fowler, J., Gutmann, M., Jebara, T., King, G., Macy, M., Roy, D. & Van Alstyne, M. (2009). Life in the network: The coming age of computational social science. Science, 323 (5915), 721–723.

Lewis, S. C., Zamith, R., & Hermida, A. (2013). Content Analysis in an Era of Big Data: A Hybrid Approach to Computational and Manual Methods. Journal of Broadcasting & Electronic Media, 57 (1), 34–52.

Manovich, L. (2012). Trending: The promises and the challenges of big social data. In M. K. Gold (Ed.), Debates in the Digital Humanities (pp. 460–475). Minneapolis, MN: University of Minnesota Press.

Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C., & Byers, A. H. (2011). Big data: The next frontier for innovation, competition, and productivity. McKinsey Global Institute.

Oboler, A., Welsh, K., & Cruz, L. (2012). The danger of big data: Social media as computational social science. First Monday, 17 (7-2). Retrieved from

Papacharissi, Z. (2010). A private sphere: Democracy in a digital age. Cambridge, England: Polity Press.

