Friday, 12 December 2014

NSMNSS event January 12th, London: Using social media for academic work - opportunities and challenges - book now

NSMNSS knowledge exchange event: Using social media for academic work - opportunities and challenges
Following on from the successful launch of our book of blogs on social media research, we will be holding a new knowledge exchange event on January 12th at NatCen in central London with two of the blog authors sharing their experiences of using social media for research.

We're delighted to be joined by Dr Deborah Lupton, Centenary Research Professor at the University of Canberra, who is making a flying visit to the UK and also by Dr Luke Sloan from the COSMOS team at Cardiff University.

As well as the two main presentations there will be an opportunity for networking, lunch and the chance for you to share your research experiences in an open space forum where you can ask for advice and input on challenges or dilemmas you may be facing with your own research projects.

A draft agenda for the event is below; places are free but limited to 25 people so please book your place by clicking this link as soon as possible. 

If you would like to share your challenges, dilemmas or experiences from your own research during the informal open space session, please email Kandy Woodfield. You won't need to prepare a presentation, but we will ask you to talk for about 5 minutes about your research project or proposal and the issue you'd like input on.

Using social media for academic work - possibilities, benefits and risks

January 12th 2015, NatCen Social Research, 35 Northampton Square, London EC1V 0AX

10am             Arrival, tea & coffee and networking

10.30am       Using social media for academic work - possibilities, benefits and risks, Dr Deborah Lupton (@DALupton), University of Canberra

11.40am       Q&As

Midday         Open space forum

1pm              Lunch

1.45pm         Investigating Social Phenomena Through COSMOS: A Case Study of the Horsemeat Scandal, Dr Luke Sloan, COSMOS team, Cardiff University

2.45pm         Q&As

3pm              Finish

Thursday, 11 December 2014

Share your ideas for tweetchat topics in the new year!

Do you have a burning question about social sciences and social media? Want to learn something specific from our cross-country, cross-discipline members? Now is your chance!

We are thinking ahead about the topics of our monthly tweetchats for 2015. Let us know in the comments what you want to discuss and learn more about and we will expand on the idea by coming up with questions to pose to the network to discuss.

Looking forward to your thoughts!

Wednesday, 10 December 2014

Missed out? Read the twitter feed from our chat on changing role of researchers in a social media world

See below for the Twitter feed from our tweetchat on 9th December 2014, all about the changing role of researchers involved in online and social media research. Scroll to the bottom and work your way up to follow the conversation in the order it occurred.

A summary of the questions posed to those taking part is included below:

Q1: How is social media impacting upon you as a researcher? E.g. identity, work/life balance, ethics
Q2: How is social media changing our identities as researchers?
Q3 from : How does insider-outsider position influence your role, identity, access or objectivity?
Q4 from : What does it mean to be ‘virtually ethical’?
Q5: What are the key issues around social media for researchers going forward?
Q6: What topics shall we chat about in the new year?

Wednesday, 19 November 2014

Making the most out of big data: computer mediated methods

Patrick Readshaw is a Media and Cultural Studies Doctoral Candidate at Canterbury Christ Church University. Patrick is interested in social media as an alternative and empowering source of information on current events, free from the constraints of other agenda-setting media forms. You can contact Patrick by email on  

When I was asked to write a blog for NSMNSS I was certainly excited and, this being my first post of this kind, suitably anxious about the prospect. However, my ongoing thesis has never ceased to provide interesting discussions with individuals in linked or parallel fields relating to social media. The main caveat in these discussions is that I often have to try not to overcomplicate things. With that in mind, and my ham-fisted introduction out of the way, I want to take some time to break down the value of so-called “new media systems” like Twitter and how I personally go about dealing with the data I collect.

Since Social Media sites such as “Facebook” burst onto the scene 10 years ago, researchers and market analysts have been looking for a way to tap into the content on these sites. In recent years, there have been several attempts to do this with some being more successful than others (Lewis, Zamith & Hermida, 2013), particularly with regards to the scale of the medium in question. For those uninitiated (apologies to those that are) the term “Big Data” is the catch-all for the enormous trails of information generated by consumers going about their day in an increasingly digitized world (Manyika et al., 2011). It is this sheer volume of information that poses the first hurdle to be overcome when conducting research online. For example, earlier this year I was collecting data on the European Parliamentary Election and generated over 16,000 tweets in about three weeks. Bearing in mind that on average a tweet contains approximately 12 words in 1.5 sentences (Twitter, 2013), for those three weeks I had 196,500 words or 24,500 sentences to come to terms with. That is a lot of data for one person to deal with alone, especially if only applying manual techniques such as content analysis. 
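The volume arithmetic above can be sketched in a few lines. This is only an illustration: the per-tweet averages are the approximate figures quoted in the paragraph, and the function names are hypothetical. Note that a round 16,000 tweets gives slightly lower totals than the 196,500 words and 24,500 sentences quoted, which is consistent with the collection being somewhat "over 16,000" tweets (roughly 16,300 to 16,400).

```python
# Rough corpus-size estimate from a tweet count.
# The two averages are the approximate figures cited in the post.
WORDS_PER_TWEET = 12
SENTENCES_PER_TWEET = 1.5


def estimate_volume(n_tweets: int) -> tuple[int, int]:
    """Return (estimated words, estimated sentences) for n_tweets."""
    words = n_tweets * WORDS_PER_TWEET
    sentences = round(n_tweets * SENTENCES_PER_TWEET)
    return words, sentences


def tweets_implied_by_words(total_words: int) -> float:
    """Invert the estimate: how many tweets a given word total implies."""
    return total_words / WORDS_PER_TWEET


if __name__ == "__main__":
    print(estimate_volume(16_000))           # (192000, 24000)
    print(tweets_implied_by_words(196_500))  # 16375.0
```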

So ultimately you have to ask two questions. Firstly, how many undergraduates or interns chained to computers running basic content analysis will it take to complete the analysis in a reasonable space of time, and will that analysis be reliable between analysts? Secondly, while computational methods save time on analysis, can you guarantee the same level of depth as with manual content analysis? Considering that content analysis goes beyond basic frequency statistics, which can be collected simply from Twitter’s own search engine, I advocate the use of computer-mediated techniques in which the data collected are first reduced using filters to remove retweets or spam responses, and then structured using hierarchical cluster analysis, among other techniques, or at least conceptualised along a number of important factors. Both Howard (2011) and Papacharissi (2010) utilise this mixed methods approach, as do Lewis, Zamith and Hermida (2013), whose method I adapted to my own work and applied as described above. Furthermore, these individual pieces of research suggest the value of the medium overall as a source of data, due to its role as one of the primary news disseminators when access to mainstream news media is blocked, such as during the 2011 Arab Spring events. Burgess and Bruns (2012) have conducted additional research looking at the 2010 federal election campaign in Australia, advising the use of computational methods to reduce the sample so that manual methods remain feasible, thereby maintaining depth during content analysis. As can be imagined, Lewis, Zamith and Hermida (2013) and Manovich (2012) both support the methodologies utilized by the studies above and advocate making the most of the technical advances that have allowed the content in question to be organized and harnessed in an efficient way.
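As a rough illustration of the two-stage idea described above (filter first, then cluster), here is a minimal sketch in plain Python. It is not the procedure used in any of the cited studies: the "RT @" retweet test, the use of exact duplicates as a stand-in for spam, the Jaccard word-overlap distance, the average-linkage rule, the 0.8 threshold and the sample tweets are all assumptions chosen to keep the example small.

```python
import re
from itertools import combinations


def is_retweet(text: str) -> bool:
    """Treat text starting with 'RT @' as a retweet (a common convention)."""
    return text.strip().lower().startswith("rt @")


def filter_tweets(tweets: list[str]) -> list[str]:
    """Drop retweets and exact duplicates (a crude stand-in for spam removal)."""
    seen, kept = set(), []
    for t in tweets:
        key = t.strip().lower()
        if is_retweet(t) or key in seen:
            continue
        seen.add(key)
        kept.append(t)
    return kept


def tokens(text: str) -> frozenset[str]:
    """Lowercased word set, keeping hashtags and dropping URLs."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    return frozenset(re.findall(r"#?\w+", text))


def jaccard_distance(a: frozenset, b: frozenset) -> float:
    """1 minus the share of words the two sets have in common."""
    union = a | b
    return 0.0 if not union else 1.0 - len(a & b) / len(union)


def cluster(tweets: list[str], threshold: float = 0.8) -> list[list[str]]:
    """Average-linkage agglomerative clustering.

    Repeatedly merges the two closest clusters and stops once the
    closest pair is farther apart than `threshold`.
    """
    sets = [tokens(t) for t in tweets]
    clusters = [[i] for i in range(len(tweets))]

    def linkage(c1: list[int], c2: list[int]) -> float:
        total = sum(jaccard_distance(sets[i], sets[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        if linkage(clusters[x], clusters[y]) > threshold:
            break
        clusters[x] = clusters[x] + clusters[y]  # x < y, so deleting y is safe
        del clusters[y]
    return [[tweets[i] for i in c] for c in clusters]


if __name__ == "__main__":
    sample = [  # invented tweets, for illustration only
        "Polls open for the #EP2014 election today",
        "RT @someone: Polls open for the #EP2014 election today",
        "Turnout in the #EP2014 election looks low",
        "Horsemeat found in supermarket burgers",
        "Supermarket burgers recalled over horsemeat",
        "horsemeat found in supermarket burgers",
    ]
    for group in cluster(filter_tweets(sample)):
        print(group)
```

With real data of the size discussed here, an established statistics package would normally replace the hand-written clustering loop; the point of the sketch is only that the reduced, grouped output is what then goes forward to manual content analysis.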

The application of mixed methodologies will continue to develop the techniques integral to facilitating the oncoming age of computational social science (Lazer et al., 2009) or “New Social Science”. That said, it is vitally important that this readily available source of data is not exploited in a way that could be damaging to the medium as a whole, and that good research practice is maintained concerning the ethics associated with consumer privacy. As a final aside, I would like to remind everyone that this data is hugely fascinating and rich beyond all belief, but there are dangers associated with quantifying social life, and if possible this should be at the front of our minds before, during and after conducting research online (Boyd & Crawford, 2012; Oboler, Welsh & Cruz, 2012).


Boyd, d. & Crawford, K. (2012). Critical questions for Big Data: Provocations for a cultural, technological, and scholarly phenomenon. Information, Communication & Society, 15 (5), 662–679.

Burgess, J., & Bruns, A. (2012). (Not) the Twitter election: The dynamics of the #ausvotes conversation in relation to the Australian media ecology. Journalism Practice, 6 (3), 384–402.

Howard, P. (2011). The digital origins of dictatorship and democracy: Information technology and political Islam. London, UK: Oxford University Press.

Lazer, D., Pentland, A., Adamic, L., Aral, S., Barabási, A., Brewer, D., Christakis, N., Contractor, N., Fowler, J., Gutmann, M., Jebara, T., King, G., Macy, M., Roy, D. & Van Alstyne, M. (2009). Life in the network: The coming age of computational social science. Science, 323 (5915), 721-723.

Lewis, S. C., Zamith, R., & Hermida, A. (2013). Content Analysis in an Era of Big Data: A Hybrid Approach to Computational and Manual Methods. Journal of Broadcasting & Electronic Media, 57 (1), 34–52.

Manovich, L. (2012). Trending: The promises and the challenges of big social data. In M. K. Gold (Ed.), Debates in the Digital Humanities (pp. 460–475). Minneapolis, MN: University of Minnesota Press.

Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C., & Byers, A. H. (2011). Big data: The next frontier for innovation, competition, and productivity. McKinsey Global Institute.

Oboler, A., Welsh, K., & Cruz, L. (2012). The danger of big data: Social media as computational social science. First Monday, 17 (7-2). Retrieved from

Papacharissi, Z. (2010). A private sphere: Democracy in a digital age. Cambridge, England: Polity Press.

Thursday, 13 November 2014

The changing nature of who produces and owns data: How will it impact survey research?

Brian Head is a research methodologist at RTI International. This post first appeared on SurveyPost on 20 May, 2014. You can follow Brian on Twitter @BrianFHead.

Survey researchers have become interested in big data because it offers potential solutions to problems we’re experiencing with traditional methods. Much of the focus so far has been on social media (e.g., Tweets), but sensors (wearable tech) and the internet of things (IoT) are producing an increasingly rich, complex, and massive source of data. These new data sources could lead to an important change in how individuals see the data collected about them, and thus have ramifications for those interested in gathering and analyzing those data.

Who compiles data?

Quantitative data about people have been gathered for millennia. But with technological advances and identification of new purposes for it, the past 100 years have seen significant increases in the amount of data produced and collected—e.g., data on consumer patterns and other market research, probability surveys, etc.

Common to these data are three factors: 1) the data are a commodity compiled, used, or traded by third parties; 2) generally there are no direct benefits to individuals about whom data are gathered; and 3) the organizations interested in the data gather, store, and analyze it. All this is not to say that throughout history individuals haven’t collected information about themselves. Individuals have collected qualitative data in the form of diaries and biographies. They have also collected some quantitative data, but this has generally been to satisfy a third party (e.g., collecting financial information to file taxes). Now, in addition to all of the data others compile about them, new technologies like wearable technologies (sensors) and IoT devices allow people to voluntarily produce and compile massive amounts of data about themselves, and doing so can have a direct benefit to them. (Involuntary data collection through connected devices is already taking place; e.g., internet-connected devices are being used for geo-targeted advertising.)

Who owns or controls data?

Data are collected in different ways. Census data are collected periodically (intervals vary by nation) through a mandatory government data collection. Surveys generally operate under the requirement of voluntary participation, although there are exceptions.  Much of the consumer data gathered now is done surreptitiously. Examples include browser cookies that collect information about the websites we visit, search engines that collect information about the internet searches people conduct, email providers that scan emails, and apps that use geodata to market goods and services to prospective clients.

It seems the public is increasingly aware of and concerned with the sum of these data collections. According to a recent Robert Wood Johnson Foundation (RWJF) study, large majorities of self-tracking app/device users think they do own (84%) or want to own (75%) the data that are collected with the device. There have been attempts to limit data collection, such as the recent attempt to limit the data the U.S. government collects on citizens. Advocates of efforts like this tend to cite concerns over burden and privacy. The exponential growth of data collected both voluntarily and involuntarily through apps, sensors, and the IoT may cause similar (perhaps successful) attempts to change government and corporate policies to provide individuals more control over their data. In fact, market researchers are already beginning to respond to such an interest among consumers by offering to pay consumers for access to their browsing history, social network activity, and transactions they conduct online, while at the same time giving those consumers control over which data they sell to the brokers.

As the amount of data collected about us increases, there’s a good chance individuals will increasingly see their data as their own, understand the value it has to various third parties, demand more control over it, and expect to be compensated for it. At first blush that may seem concerning. However, the type of compensation individuals desire for data will likely depend on how the data will be used. For example, consumers are likely to continue to trade data for convenience in services (see thesis # 12). And the RWJF report cited above suggests the usual leverages used to gain survey participation (e.g., topic salience and altruism) may work in gaining access to big data when the purpose of the study is for “public good research.”

Need for further research

Further research is needed in this area of big data to answer questions like: 1) to what extent, and how soon, will a larger proportion of the population begin to voluntarily use sensor and IoT devices; 2) will the general public continue to tolerate involuntary data collection when those data are collected by connected devices; 3) will the general public have opinions similar to those of early adopters in the RWJF study about sharing personal data from connected devices with survey researchers; 4) will the leverages that work for gaining survey participation work for gaining access to personal big data, or will new or additional leverages be needed; 5) will we be able to use techniques similar to those used to access administrative record data, or will we need to develop new protocols for seeking permission to access these data? I look forward to seeing and contributing toward the research to answer these questions. What are your thoughts?

Thursday, 6 November 2014

You Are What You Tweet: An Exploration of Tweets as an Auxiliary Data Source

Ashley Richards is a survey methodologist at RTI International. This post first appeared on SurveyPost on 29 July 2014.

Last fall at MAPOR, Joe Murphy presented the findings of a fun study he did with our colleague, Justin Landwehr, and me. We asked survey respondents if we could look at their recent Tweets and combine them with their survey data. We took a subset of those respondents and masked their responses on six categorical variables. We then had three human coders and a machine algorithm try to predict the masked responses by reviewing the respondents’ Tweets and guessing how they would have responded on the survey. The coders looked for any clues in the Tweets, while the algorithm used a subset of Tweets and survey responses to find patterns in the way words were used. We found that both the humans and machine were better than random in predicting values of most of the variables.
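To give a feel for what "finding patterns in the way words were used" can look like, here is a minimal sketch of a word-count classifier. The post does not say which algorithm the study used; a naive Bayes model with add-one smoothing is assumed here purely for illustration, and the training tweets and the "younger"/"older" labels are invented.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Count words per label from (tweet_text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts


def predict(text, word_counts, label_counts):
    """Return the label with the highest naive Bayes log score."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(label_counts.values())
    # Words never seen in training carry no information, so skip them.
    words = [w for w in text.lower().split() if w in vocab]
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)  # add-one smoothing
        score = math.log(n_docs / total_docs)
        for w in words:
            score += math.log((counts[w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    training = [  # invented examples, not study data
        ("haha love this song :)", "younger"),
        ("omg haha so funny", "younger"),
        ("quarterly report due before the meeting", "older"),
        ("meeting ran long again today", "older"),
    ]
    wc, lc = train(training)
    print(predict("haha so funny", wc, lc))
    print(predict("report due today", wc, lc))
```

In a setup like the one described, the model would be trained on respondents whose survey answers are known and then applied to the Tweets of respondents whose answers were masked.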

We recently took this research a step further and compared the accuracy of these approaches to multiple imputation, with the help of our colleague Darryl Creel. Imputation is the approach traditionally used to account for missing data and we wanted to see how the nontraditional approaches stack up. Furthermore, we wanted to check out these approaches because imputation cannot be used in the case where survey questions are not asked. This commonly occurs because of space limitations, the desire to reduce respondent burden, or other factors. I will be presenting on this research at the upcoming Joint Statistical Meetings (JSM), in early August. I’ll give a brief summary here, but if you’d like more details on it please check out my presentation or email me for a copy of the paper.

Income was the only variable for which imputation was the most accurate approach, but the differences between imputation and the other approaches were not statistically significant. Imputation correctly predicted income 32% of the time, compared to 25% for human coders and 26% for the machine algorithm. Considering that there were four income categories and a person would have a 25% chance of randomly selecting the correct response, I am unimpressed with these success rates of 25%-32%.
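The comparison with random guessing can be made concrete with a small calculation. The sketch below computes the chance baseline for a variable with a given number of categories and the probability of getting at least a given number of predictions right by guessing alone. The 25 cases and 8 correct predictions in the usage example are hypothetical round numbers (8 of 25 is 32%), not the study's actual counts.

```python
from math import comb


def chance_accuracy(n_categories: int) -> float:
    """Expected accuracy when guessing uniformly among n_categories."""
    return 1.0 / n_categories


def p_at_least(successes: int, trials: int, p: float) -> float:
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


if __name__ == "__main__":
    baseline = chance_accuracy(4)  # four income categories -> 0.25
    # Hypothetical: 8 correct out of 25 cases (32% accuracy).
    print(baseline, p_at_least(8, 25, baseline))
```

With so few cases, a success rate only a few points above the baseline is quite likely to arise from guessing alone, which is why rates in the 25%-32% range are hard to get excited about.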

Human coders outperformed imputation on the other demographic items (age and sex), but imputation was more accurate than the machine algorithm. For these variables, the human coders picked up on clues in respondents’ Tweets. I was one of the coders and found myself jumping to conclusions, but I did so with a pretty good rate of success. For instance, if a Tweeter said “haha” a lot or used smiley faces, I was more likely to guess the person was young and/or female. These are tendencies that I’ve observed personally but I’ve read about them too.

As a coder I struggled to predict respondents’ health and depression statuses, and this was evident in the results. Imputation was better than humans at predicting these, but the machine algorithm was even more accurate. The machine was also best at predicting who respondents voted for in the previous presidential election, with human coders in second place and imputation in last place. As a coder I found that predicting voting was fairly simple among the subset of respondents who Tweeted about politics. Many Tweeters avoided the subject altogether, but those who Tweeted about politics tended to make it obvious who they supported.

So what does this all mean? We found that even with a small set of respondents, Tweets can be used to produce estimates with accuracy in the same range as, or better than,[1] imputation procedures. There is quite a bit of room for improvement in our methods that could make them even more accurate. For example, we could use a larger sample of Tweets to train the machine algorithm and we could select human coders who are especially perceptive and detail-oriented. The finding that Tweets are as good as or better than imputation is important because imputation cannot be used in the case where survey questions were not asked.

As interesting as these findings may be, they need to be taken with a grain of salt, especially because of our small sample size (n=29).[2] Relying on Twitter data is challenging because many respondents are not on Twitter, and those who are on Twitter are not representative of the general population and may not be willing to share their Tweets for these purposes. Another challenge is the variation in Tweet content. For example, as I mentioned earlier, some people Tweet their political views while others stay away from the topic on Twitter.

Despite these limitations, Twitter may represent an important resource for estimating values that are desired but not asked for in a survey. Many of our survey respondents are dropping clues about these values across the Internet, and now it’s time to decide if and how to use them. How many clues have you dropped about yourself online? Is your online identity revealing of your true characteristics?!?

[1] Even if approaches using Tweets may be more accurate than imputation, they require more time and money and in many cases may not be worth the tradeoff. As discussed later, these findings need to be taken with a grain of salt.

[2] We had more than 2,000 respondents, but our sample size for this portion of the study was greatly reduced after excluding respondents who don’t use Twitter, respondents who did not authorize our use of their Tweets, and respondents whose Tweets were not in English. Furthermore, half of the remaining respondents’ Tweets were used to train the machine algorithm.

Thursday, 30 October 2014

Innovations in knowledge sharing: creating our book of blogs

Kandy Woodfield is the Learning and Enterprise Director at NatCen Social Research, and the co-founder of the NSMNSS network. You can reach Kandy on Twitter @jess1ecat.

Yesterday the NSMNSS network published its first ebook, a collection of over fifty blogs penned by researchers from around the world who are using social media in their social research. To the best of our knowledge this is the first book of blogs in the social sciences.  It draws on the insights of experienced and well-known commentators on social media research through to the thoughts of researchers new to the field.

Why did we choose to publish a book of blogs rather than a textbook or peer-reviewed article?

In my view there is space in the academic publishing world for both peer-reviewed works and self-published books. We chose to publish a book of blogs rather than a traditional academic tome because we wanted to create something quickly which reflected the concerns and voices of our members. Creating a digital text built on people’s experiences and use of social media seemed an obvious choice. Many of our network members were already blogging about their use of social media for research; for those who weren’t, this was an opportunity to write something short and have their voices heard.

Unlike other fields of social research, social media research is not yet populated with established authors and leading writers. The constant state of flux of the field means it is unlikely ever to settle in quite the same way as, say, ethnography or survey research. The tools, platforms and approaches to studying them are constantly changing. In this context, works which are published quickly to feed the plentiful discussions about the methods, ethics and practicalities of social media research seem an important counterpoint to more scholarly articles and texts.

How did we do it?

Step 1 – Create a call for action: We used social media channels to publicise the call for authors, posting tweets with links to the network blog which gave authors a clear brief on what we were looking for. Within less than a fortnight we had over 40 authors signed up.

Step 2 – Decide on the editorial control you want to have: We let authors know that we were not peer reviewing content; if someone was prepared to contribute we would accept that contribution unless it was off theme. In the end we used every submitted blog with one exception. This was an important principle for us: the network is member-led and we wanted this book to reflect the concerns of our members, not those of an editor or peer-review panel. The core team at NatCen undertook light-touch editing of formatting and spelling but otherwise the contributions are unadulterated. We also organised the contributions into themes to make it easier for readers to navigate.

Step 3 – Manage your contributions: We used Google Drive to host an author sign-up spreadsheet asking for contact information and an indication of the blog title and content. This saved time because we did not have to create a database ourselves and was invaluable when it came to contacting authors along the way. We also invited people to act as informal peer reviewers. Some of our less experienced authors wanted feedback and this was provided by other authors.

Step 4 – Keep a buzz going and keep in touch with authors: We found it important to keep the book of blogs uppermost in contributors’ minds. We did this through a combination of social media (using the #bookofblogs hashtag) and regular blogs and email updates to authors.

Step 5 – Set milestones: We set not just an end date for contributions but several milestones along the way to achieve 40% and 60% of contributions. This helped keep the momentum going.

Step 6 – Choose your publishing platform: There are a number of self-publishing platforms. We chose to use Pressbooks, which has a very smooth and simple user interface similar to many blogging tools like WordPress. We did this because we wanted authors to upload their own contributions, saving administrative time. By and large this worked fine, although inevitably we ended up uploading some for authors and dealing with formatting issues!

Step 7 – Decide on format and distribution channels: You will need to consider whether to have just an e-book or an e-book and a traditional book, and where to sell your book. We chose Amazon and Kindle (Mobi) format for coverage and global reach, but you can publish into various formats and there are a range of channels for selling your book.

Step 8 – Stick with it… When you’re creating a co-authored text like this with multiple authors you need to stick with it, have a clear vision of what you are trying to create and believe that you will reach your launch ready to go. And we did; we hope you enjoy it.

Watch a short video featuring a few of the authors from the Book of Blogs discussing what their pieces are about, here
Join the conversation today; Buy the e-book here!