Since social media sites such as “Facebook” burst onto the scene 10 years ago, researchers and market analysts have been looking for a way to tap into the content on these sites. In recent years there have been several attempts to do this, some more successful than others (Lewis, Zamith & Hermida, 2013), particularly with regard to the scale of the medium in question. For the uninitiated (apologies to those already familiar), the term “Big Data” is the catch-all for the enormous trails of information generated by consumers going about their day in an increasingly digitized world (Manyika et al., 2011). It is this sheer volume of information that poses the first hurdle to be overcome when conducting research online. For example, earlier this year I was collecting data on the European Parliamentary Election and gathered over 16,000 tweets in about three weeks. Bearing in mind that on average a tweet contains approximately 12 words in 1.5 sentences (Twitter, 2013), those three weeks left me with roughly 196,500 words or 24,500 sentences to come to terms with. That is a lot of data for one person to deal with alone, especially if applying only manual techniques such as content analysis.
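For readers who want to check the back-of-envelope estimate, it can be reproduced in a few lines of Python. This is only a sketch: the function name `estimate_volume` is made up for illustration, and the per-tweet averages are the approximate ones quoted above (12 words, 1.5 sentences). Since the exact tweet count is given only as "over 16,000", the round figure used here gives totals slightly below those quoted.

```python
def estimate_volume(n_tweets, words_per_tweet=12, sentences_per_tweet=1.5):
    """Return (words, sentences) implied by a tweet count and per-tweet averages.

    The default averages are the approximate figures quoted in the text;
    they are assumptions, not measurements of any particular data set.
    """
    return n_tweets * words_per_tweet, n_tweets * sentences_per_tweet


# 16,000 is the rounded-down tweet count; the real collection was larger.
words, sentences = estimate_volume(16_000)
print(f"{words:,} words, {sentences:,.0f} sentences")
```

Even at this lower bound the corpus runs to nearly 200,000 words, which is the point of the example.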
So ultimately you have to ask two questions. Firstly, how many undergraduates or interns chained to computers running basic content analysis will it take to complete the analysis in a reasonable space of time, and will that analysis be reliable between analysts? Secondly, while computational methods save time on analysis, can you guarantee the same level of depth as with manual content analysis? Considering that content analysis goes beyond basic frequency statistics, which can be collected simply from Twitter’s own search engine, I advocate the use of computer-mediated techniques in which the data collected are first reduced using filters that remove retweets and spam responses, and then structured using hierarchical cluster analysis, among other methods, or at least conceptualised along a number of important factors. Both Howard (2011) and Papacharissi (2010) utilise this mixed methods approach, as do Lewis, Zamith and Hermida (2013), whose method I adapted to my own work and applied as described above. Furthermore, these individual pieces of research suggest the value of the medium overall as a source of data, due to its role as one of the primary news disseminators when access to mainstream news media is blocked, such as during the 2011 Arab Spring events. Burgess and Bruns (2012) have conducted additional research looking at the 2010 federal election campaign in Australia, advising the use of computational methods to reduce the sample so that manual methods remain feasible, thereby maintaining depth during content analysis. As can be imagined, Lewis, Zamith and Hermida (2013) and Manovich (2012) both support the methodologies utilized by the studies above and advocate making the most of the technical advances that have allowed the content in question to be organized and harnessed in an efficient way.
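To make the filter-then-cluster idea concrete, here is a minimal sketch in plain Python. It is not the procedure used in my own work or in any of the studies cited; it only illustrates the general shape of such a pipeline under several assumptions of my own. Retweets are detected by a leading "RT @user", exact duplicates stand in for spam, similarity is measured as Jaccard distance between word sets, and clusters are merged by average linkage until the closest pair is farther apart than an arbitrary threshold of 0.7. All function names and the sample tweets are invented for illustration.

```python
import re
from itertools import combinations


def is_retweet(text):
    """Heuristic (an assumption): text starting with 'RT @user' is a retweet."""
    return re.match(r"\s*RT\s+@\w+", text) is not None


def tokens(text):
    """Lowercased set of words, with URLs and @mentions removed."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return frozenset(re.findall(r"[a-z0-9#']+", text))


def filter_tweets(tweets):
    """Drop retweets, empty tweets and exact duplicates (a crude spam proxy)."""
    seen, kept = set(), []
    for tweet in tweets:
        key = tokens(tweet)
        if is_retweet(tweet) or not key or key in seen:
            continue
        seen.add(key)
        kept.append(tweet)
    return kept


def jaccard_distance(a, b):
    """1 minus the share of words two token sets have in common."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def cluster(tweets, threshold=0.7):
    """Average-linkage agglomerative clustering on Jaccard distance.

    Merging stops once the two closest clusters are farther apart than
    `threshold` (an arbitrary illustrative value).
    """
    sets = [tokens(t) for t in tweets]
    clusters = [[i] for i in range(len(tweets))]

    def linkage(c1, c2):
        total = sum(jaccard_distance(sets[i], sets[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        if linkage(clusters[a], clusters[b]) > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]  # a < b, so index a stays valid
        del clusters[b]
    return [[tweets[i] for i in c] for c in clusters]


# Invented sample data, for illustration only.
sample = [
    "Polls open across Europe for the parliament election #EP2014",
    "RT @newsdesk: Polls open across Europe for the parliament election #EP2014",
    "Polls open across Europe for the parliament election #EP2014",
    "Turnout in the parliament election looks low across Europe",
    "Great goal in the cup final tonight",
    "What a goal in the cup final",
]

kept = filter_tweets(sample)
groups = cluster(kept)
for number, group in enumerate(groups, start=1):
    print(number, group)
```

On a real corpus of thousands of tweets this pairwise approach would be far too slow, and an established implementation such as the hierarchical clustering routines in SciPy would be the more sensible choice. The reduced and grouped sample is then what gets handed over for manual content analysis.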
The application of mixed methodologies will continue to develop the techniques integral to facilitating the oncoming age of computational social science (Lazer et al., 2009), or “New Social Science”. While this is the case, it is vitally important that this readily available source of data is not exploited in a way that could damage the medium as a whole, and that good research practice is maintained concerning the ethics of consumer privacy. As a final aside, I would like to remind everyone that this data is hugely fascinating and rich beyond all belief, but there are dangers associated with quantifying social life, and these should be at the front of our minds before, during and after conducting research online (Boyd & Crawford, 2012; Oboler, Welsh & Cruz, 2012).
Boyd, d. & Crawford, K. (2012). Critical questions for Big Data: Provocations for a cultural, technological, and scholarly phenomenon. Information, Communication & Society, 15 (5), 662–679.
Burgess, J., & Bruns, A. (2012). (Not) the Twitter election: The dynamics of the #ausvotes conversation in relation to the Australian media ecology. Journalism Practice, 6 (3), 384–402.
Lazer, D., Pentland, A., Adamic, L., Aral, S., Barabási, A., Brewer, D., Christakis, N., Contractor, N., Fowler, J., Gutmann, M., Jebara, T., King, G., Macy, M., Roy, D. & Van Alstyne, M. (2009). Life in the network: The coming age of computational social science. Science, 323 (5915), 721–723.
Lewis, S. C., Zamith, R., & Hermida, A. (2013). Content Analysis in an Era of Big Data: A Hybrid Approach to Computational and Manual Methods. Journal of Broadcasting & Electronic Media, 57 (1), 34–52.
Manovich, L. (2012). Trending: The promises and the challenges of big social data. In M. K. Gold (Ed.), Debates in the Digital Humanities (pp. 460–475). Minneapolis, MN: University of Minnesota Press.
Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C., & Byers, A. H. (2011). Big data: The next frontier for innovation, competition, and productivity. McKinsey Global Institute.
Oboler, A., Welsh, K., & Cruz, L. (2012). The danger of big data: Social media as computational social science. First Monday, 17 (7-2). Retrieved from http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/3993/3269.
Papacharissi, Z. (2010). A private sphere: Democracy in a digital age. Cambridge, England: Polity Press.