Wednesday, 19 November 2014

Making the most out of big data: computer mediated methods

Patrick Readshaw is a Media and Cultural Studies Doctoral Candidate at Canterbury Christ Church University. Patrick is interested in social media as an alternative and empowering source of information on current events, free from the constraints of other agenda-setting media forms. You can contact Patrick by email on  

When I was asked to write a blog post for NSMNSS I was certainly excited and, this being my first post of this kind, suitably anxious about the prospect. However, my ongoing thesis has never ceased to provide interesting discussions with individuals in linked or parallel fields relating to social media. The main caveat in these discussions is that I often have to try not to overcomplicate things. With that in mind, and my ham-fisted introduction out of the way, I want to take some time to break down the value of so-called “new media systems” like Twitter and how I personally go about dealing with the data I collect.

Since social media sites such as “Facebook” burst onto the scene 10 years ago, researchers and market analysts have been looking for a way to tap into the content on these sites. In recent years there have been several attempts to do this, some more successful than others (Lewis, Zamith & Hermida, 2013), particularly with regard to the scale of the medium in question. For the uninitiated (apologies to those already familiar), the term “Big Data” is the catch-all for the enormous trails of information generated by consumers going about their day in an increasingly digitized world (Manyika et al., 2011). It is this sheer volume of information that poses the first hurdle to be overcome when conducting research online. For example, earlier this year I was collecting data on the European Parliamentary Election and gathered over 16,000 tweets in about three weeks. Bearing in mind that on average a tweet contains approximately 12 words in 1.5 sentences (Twitter, 2013), for those three weeks I had roughly 196,500 words or 24,500 sentences to come to terms with. That is a lot of data for one person to deal with alone, especially if only applying manual techniques such as content analysis.
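
The arithmetic behind those totals can be reproduced in a few lines. This is only a sketch using the per-tweet averages quoted above; the function name is made up for illustration, and the exact tweet count is an assumption, since the text says only "over 16,000".

```python
WORDS_PER_TWEET = 12         # average quoted in the text
SENTENCES_PER_TWEET = 1.5    # average quoted in the text


def corpus_size(n_tweets):
    """Return (estimated words, estimated sentences) for n_tweets tweets."""
    return n_tweets * WORDS_PER_TWEET, n_tweets * SENTENCES_PER_TWEET


# Lower bound: exactly 16,000 tweets (assumed; the text says "over 16,000").
words, sentences = corpus_size(16_000)
print(words, sentences)
```

At exactly 16,000 tweets this gives 192,000 words and 24,000 sentences; the slightly higher totals quoted in the text are consistent with a collection of a few hundred tweets more than that.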

So ultimately you have to ask two questions. Firstly, how many undergraduates or interns chained to computers running basic content analysis will it take to complete the analysis in a reasonable space of time, and will that analysis be reliable across analysts? Secondly, while computational methods save time on analysis, can you guarantee the same level of depth as with manual content analysis? Considering that content analysis goes beyond basic frequency statistics, which can be collected simply from Twitter’s own search engine, I advocate the use of computer-mediated techniques in which the data collected are first reduced using filters to remove retweets or spam responses, and then structured using hierarchical cluster analysis, among other methods, or at least conceptualised along a number of important factors. Both Howard (2011) and Papacharissi (2010) utilise this mixed-methods approach, as do Lewis, Zamith and Hermida (2013), whose method I adapted to my own work and applied as described above. Furthermore, these individual pieces of research suggest the value of the medium overall as a source of data, due to its role as one of the primary news disseminators when access to mainstream news media is blocked, such as during the 2011 Arab Spring events. Burgess and Bruns (2012) have conducted additional research looking at the 2010 federal election campaign in Australia, advising the use of computational methods to reduce the sample so that manual methods remain feasible, ultimately maintaining depth during content analysis. As can be imagined, Lewis, Zamith and Hermida (2013) and Manovich (2012) both support the methodologies utilized by the studies above and advocate making the most of the technical advances that have allowed the content in question to be organized and harnessed in an efficient way.
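
The two-stage idea described above (filter first, then cluster) might look something like the following. This is a minimal sketch and not the pipeline used in the studies cited: the "RT @user" retweet test, the duplicate check standing in for spam filtering, the Jaccard word-overlap distance, the average-linkage rule and the 0.85 threshold are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import re
from itertools import combinations


def is_retweet(text):
    """Treat tweets that start with 'RT @user' as retweets (assumed convention)."""
    return re.match(r"\s*RT\s+@\w+", text) is not None


def tokens(text):
    """Lower-cased set of words, ignoring URLs and @mentions."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return frozenset(re.findall(r"[a-z#']+", text))


def filter_tweets(tweets):
    """Drop retweets and tweets whose word set was already seen
    (a crude stand-in for spam filtering)."""
    seen, kept = set(), []
    for tweet in tweets:
        key = tokens(tweet)
        if is_retweet(tweet) or key in seen:
            continue
        seen.add(key)
        kept.append(tweet)
    return kept


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|; 0.0 means identical word sets."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def average_linkage_clusters(tweets, threshold=0.85):
    """Agglomerative clustering: repeatedly merge the two closest clusters
    until the smallest average distance exceeds `threshold`."""
    bags = [tokens(t) for t in tweets]
    clusters = [[i] for i in range(len(tweets))]

    def dist(c1, c2):
        pairs = [(i, j) for i in c1 for j in c2]
        return sum(jaccard_distance(bags[i], bags[j]) for i, j in pairs) / len(pairs)

    while len(clusters) > 1:
        (x, y), d = min(
            (((a, b), dist(clusters[a], clusters[b]))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > threshold:
            break
        clusters[x] = clusters[x] + clusters[y]   # x < y, so deleting y is safe
        del clusters[y]
    return [[tweets[i] for i in c] for c in clusters]
```

A typical call would be `average_linkage_clusters(filter_tweets(raw_tweets))`, after which each cluster can be sampled for manual content analysis. For a real corpus of thousands of tweets, an established clustering library would be a more practical choice than this quadratic-time loop.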

The application of mixed methodologies will continue to develop the techniques integral to facilitating the oncoming age of computational social science (Lazer et al., 2009) or “New Social Science”. While this is the case, it is vitally important that this readily available source of data is not exploited in a way that could be damaging to the medium as a whole, and that good research practice is maintained concerning the ethics associated with consumer privacy. As a final aside, I would like to remind everyone that this data is hugely fascinating and rich beyond all belief, but there are dangers associated with quantifying social life, and if possible this should be at the front of our minds before, during and after conducting research online (Boyd & Crawford, 2012; Oboler, Welsh & Cruz, 2012).


Boyd, d. & Crawford, K. (2012). Critical questions for Big Data: Provocations for a cultural, technological, and scholarly phenomenon. Information, Communication & Society, 15 (5), 662–679.

Burgess, J., & Bruns, A. (2012). (Not) the Twitter election: The dynamics of the #ausvotes conversation in relation to the Australian media ecology. Journalism Practice, 6 (3), 384–402.

Howard, P. (2011). The digital origins of dictatorship and democracy: Information technology and political Islam. London, UK: Oxford University Press.

Lazer, D., Pentland, A., Adamic, L., Aral, S., Barabási, A., Brewer, D., Christakis, N., Contractor, N., Fowler, J., Gutmann, M., Jebara, T., King, G., Macy, M., Roy, D. & Van Alstyne, M. (2009). Life in the network: The coming age of computational social science. Science, 323 (5915), 721–723.

Lewis, S. C., Zamith, R., & Hermida, A. (2013). Content analysis in an era of Big Data: A hybrid approach to computational and manual methods. Journal of Broadcasting & Electronic Media, 57 (1), 34–52.

Manovich, L. (2012). Trending: The promises and the challenges of big social data. In M. K. Gold (Ed.), Debates in the Digital Humanities (pp. 460–475). Minneapolis, MN: University of Minnesota Press.

Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C., & Byers, A. H. (2011). Big data: The next frontier for innovation, competition, and productivity. McKinsey Global Institute.

Oboler, A., Welsh, K., & Cruz, L. (2012). The danger of big data: Social media as computational social science. First Monday, 17 (7-2). Retrieved from

Papacharissi, Z. (2010). A private sphere: Democracy in a digital age. Cambridge, England: Polity Press.

Thursday, 13 November 2014

The changing nature of who produces and owns data: How will it impact survey research?

Brian Head is a research methodologist at RTI International. This post first appeared on SurveyPost on 20 May, 2014. You can follow Brian on Twitter @BrianFHead.

Survey researchers have become interested in big data because it offers potential solutions to problems we’re experiencing with traditional methods. Much of the focus so far has been on social media (e.g., Tweets), but sensors (wearable tech) and the internet of things (IoT) are producing an increasingly rich, complex, and massive source of data. These new data sources could lead to an important change in how individuals see the data collected about them, and thus have ramifications for those interested in gathering and analyzing those data.

Who compiles data?

Quantitative data about people have been gathered for millennia. But with technological advances and identification of new purposes for it, the past 100 years have seen significant increases in the amount of data produced and collected—e.g., data on consumer patterns and other market research, probability surveys, etc.

Common to these data are three factors: 1) the data are a commodity compiled, used, or traded by third parties; 2) generally there are no direct benefits to individuals about whom data are gathered; and 3) the organizations interested in the data gather, store, and analyze it. All this is not to say that throughout history individuals haven’t collected information about themselves. Individuals have collected qualitative data in the form of diaries and biographies. They have also collected some quantitative data, but this has generally been to satisfy a third party (e.g., collecting financial information to file taxes). Now, in addition to all of the data others compile about them, new technologies like wearable technologies (sensors) and IoT devices allow people to voluntarily produce and compile massive amounts of data about themselves, and doing so can have a direct benefit to them. (Involuntary data collection through connected devices is already taking place; e.g., internet-connected devices are being used for geo-targeted advertising.)

Who owns or controls data?

Data are collected in different ways. Census data are collected periodically (intervals vary by nation) through a mandatory government data collection. Surveys generally operate under the requirement of voluntary participation, although there are exceptions.  Much of the consumer data gathered now is done surreptitiously. Examples include browser cookies that collect information about the websites we visit, search engines that collect information about the internet searches people conduct, email providers that scan emails, and apps that use geodata to market goods and services to prospective clients.

It seems the public is increasingly aware of and concerned with the sum of these data collections. According to a recent Robert Wood Johnson Foundation (RWJF) study, large majorities of self-tracking app/device users think they do (84%) or want to (75%) own the data that are collected with the device. There have been attempts to limit data collection, such as the recent attempt to limit the data the U.S. government collects on citizens. Advocates of efforts like this tend to cite concerns over burden and privacy. The exponential growth of data collected both voluntarily and involuntarily through apps, sensors, and the IoT may cause similar (perhaps successful) attempts to change government and corporate policies to provide individuals more control over their data. In fact, market researchers are already beginning to respond to such an interest among consumers by offering to pay consumers for access to their browsing history, social network activity, and transactions they conduct online, while at the same time giving those consumers control over which data they sell to the brokers.

As the amount of data collected about us increases, there’s a good chance individuals will increasingly see their data as their own, understand the value it has to various third parties, demand more control over it, and expect to be compensated for it. At first blush that may seem concerning. However, the type of compensation individuals desire for data will likely depend on how the data will be used. For example, consumers are likely to continue to trade data for convenience in services (see thesis # 12). And the RWJF report cited above suggests the usual leverages used to gain survey participation (e.g., topic salience and altruism) may work in gaining access to big data when the purpose of the study is for “public good research.”

Need for further research

Further research is needed in this area of big data to answer questions like: 1) to what extent, and how soon, will a larger proportion of the population begin to voluntarily use sensor and IoT devices; 2) will the general public continue to tolerate involuntary data collection when those data are collected by connected devices; 3) will the general public have opinions similar to early adopters in the RWJF about sharing personal data from connected devices with survey researchers; 4) will the leverages that work for gaining survey participation work for gaining access to personal big data or will new/additional leverages be needed; 5) will we be able to use techniques similar to those used to access administrative record data or will we need to develop new protocol for seeking permission to access these data? I look forward to seeing and contributing toward the research to answer these questions. What are your thoughts?

Thursday, 6 November 2014

You Are What You Tweet: An Exploration of Tweets as an Auxiliary Data Source

Ashley Richards is a survey methodologist at RTI International. This post first appeared on SurveyPost on 29 July 2014.

Last fall at MAPOR, Joe Murphy presented the findings of a fun study he did with our colleague, Justin Landwehr, and me. We asked survey respondents if we could look at their recent Tweets and combine them with their survey data. We took a subset of those respondents and masked their responses on six categorical variables. We then had three human coders and a machine algorithm try to predict the masked responses by reviewing the respondents’ Tweets and guessing how they would have responded on the survey. The coders looked for any clues in the Tweets, while the algorithm used a subset of Tweets and survey responses to find patterns in the way words were used. We found that both the humans and the machine were better than random in predicting values of most of the variables.
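
To give a feel for what "finding patterns in the way words were used" can mean in practice, here is a small sketch of one common approach. The post does not say which algorithm was actually used; a multinomial naive Bayes classifier with add-one smoothing is assumed here purely for illustration, and the class and function names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def words(text):
    """Split a Tweet into lower-cased word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)    # label -> word -> count
        self.label_counts = Counter(labels)        # label -> number of training texts
        for text, label in zip(texts, labels):
            self.word_counts[label].update(words(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)            # log prior for this label
            for w in words(text):
                score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

The model would be trained on the Tweets of respondents whose survey answers are known (`fit`) and then asked to guess the masked answer for the remaining respondents (`predict`), which mirrors the train-and-predict split described above.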

We recently took this research a step further and compared the accuracy of these approaches to multiple imputation, with the help of our colleague Darryl Creel. Imputation is the approach traditionally used to account for missing data and we wanted to see how the nontraditional approaches stack up. Furthermore, we wanted to check out these approaches because imputation cannot be used in the case where survey questions are not asked. This commonly occurs because of space limitations, the desire to reduce respondent burden, or other factors. I will be presenting on this research at the upcoming Joint Statistical Meetings (JSM), in early August. I’ll give a brief summary here, but if you’d like more details on it please check out my presentation or email me for a copy of the paper.

Income was the only variable for which imputation was the most accurate approach, but the differences between imputation and the other approaches were not statistically significant. Imputation correctly predicted income 32% of the time, compared to 25% for human coders and 26% for the machine algorithm. Considering that there were four income categories and a person would have a 25% chance of randomly selecting the correct response, I am unimpressed with these success rates of 25%-32%.
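
The comparison with chance can be made explicit. The sketch below takes the accuracy figures reported above and expresses each as a gap, in percentage points, over a uniform random guess among four categories; the function names are made up for illustration, and the uniform-guess baseline is the same assumption used in the paragraph above.

```python
def chance_accuracy(n_categories):
    """Expected accuracy of a uniform random guess among n_categories."""
    return 1.0 / n_categories


def lift_over_chance(reported, n_categories):
    """Percentage-point gap between each reported accuracy and chance."""
    baseline = chance_accuracy(n_categories)
    return {method: round((acc - baseline) * 100, 1)
            for method, acc in reported.items()}


# Income accuracy figures as reported in the post (four income categories).
reported_income_accuracy = {
    "imputation": 0.32,
    "human coders": 0.25,
    "machine algorithm": 0.26,
}

for method, lift in lift_over_chance(reported_income_accuracy, 4).items():
    print(f"{method}: {lift:+.1f} points vs chance")
```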

Human coders outperformed imputation on the other demographic items (age and sex), but imputation was more accurate than the machine algorithm. For these variables, the human coders picked up on clues in respondents’ Tweets. I was one of the coders and found myself jumping to conclusions, but I did so with a pretty good rate of success. For instance, if a Tweeter said “haha” a lot or used smiley faces, I was more likely to guess the person was young and/or female. These are tendencies that I’ve observed personally but I’ve read about them too.

As a coder I struggled to predict respondents’ health and depression statuses, and this was evident in the results. Imputation was better than humans at predicting these, but the machine algorithm was even more accurate. The machine was also best at predicting who respondents voted for in the previous presidential election, with human coders in second place and imputation in last place. As a coder I found that predicting voting was fairly simple among the subset of respondents who Tweeted about politics. Many Tweeters avoided the subject altogether, but those who Tweeted about politics tended to make it obvious who they supported.

So what does this all mean? We found that even with a small set of respondents, Tweets can be used to produce estimates with accuracy in the same range as, or better than,[1] imputation procedures. There is quite a bit of room for improvement in our methods that could make them even more accurate. For example, we could use a larger sample of Tweets to train the machine algorithm and we could select human coders who are especially perceptive and detail-oriented. The finding that Tweets are as good as or better than imputation is important because imputation cannot be used in the case where survey questions were not asked.

As interesting as these findings may be, they need to be taken with a grain of salt, especially because of our small sample size (n=29).[2] Relying on Twitter data is challenging because many respondents are not on Twitter, and those who are on Twitter are not representative of the general population and may not be willing to share their Tweets for these purposes. Another challenge is the variation in Tweet content. For example, as I mentioned earlier, some people Tweet their political views while others stay away from the topic on Twitter.

Despite these limitations, Twitter may represent an important resource for estimating values that are desired but not asked for in a survey. Many of our survey respondents are dropping clues about these values across the Internet, and now it’s time to decide if and how to use them. How many clues have you dropped about yourself online? Is your online identity revealing of your true characteristics?!?

[1] Even if approaches using Tweets may be more accurate than imputation, they require more time and money and in many cases may not be worth the tradeoff. As discussed later, these findings need to be taken with a grain of salt.

[2] We had more than 2,000 respondents, but our sample size for this portion of the study was greatly reduced after excluding respondents who don’t use Twitter, respondents who did not authorize our use of their Tweets, and respondents whose Tweets were not in English. Furthermore, half of the remaining respondents’ Tweets were used to train the machine algorithm.

Thursday, 30 October 2014

Innovations in knowledge sharing: creating our book of blogs

Kandy Woodfield is the Learning and Enterprise Director at NatCen Social Research, and the co-founder of the NSMNSS network. You can reach Kandy on Twitter @jess1ecat.

Yesterday the NSMNSS network published its first ebook, a collection of over fifty blogs penned by researchers from around the world who are using social media in their social research. To the best of our knowledge this is the first book of blogs in the social sciences.  It draws on the insights of experienced and well-known commentators on social media research through to the thoughts of researchers new to the field.

Why did we choose to publish a book of blogs rather than a textbook or peer-reviewed article?

In my view there is space in the academic publishing world for both peer-reviewed works and self-published books. We chose to publish a book of blogs rather than a traditional academic tome because we wanted to create something quickly which reflected the concerns and voices of our members. Creating a digital text built on people’s experiences and use of social media seemed an obvious choice. Many of our network members were already blogging about their use of social media for research; for those who weren’t, this was an opportunity to write something short and have their voices heard.

Unlike other fields of social research, social media research is not yet populated with established authors and leading writers, and the constant state of flux of the field means it is unlikely ever to settle in quite the same way as, say, ethnography or survey research. The tools, platforms and approaches to studying them are constantly changing. In this context, works which are published quickly and continue to feed the plentiful discussions about the methods, ethics and practicalities of social media research seem an important counterpoint to more scholarly articles and texts.

How did we do it?

Step 1 – Create a call for action: We used social media channels to publicise the call for authors, posting tweets with links to the network blog which gave authors a clear brief on what we were looking for. Within less than a fortnight we had over 40 authors signed up.

Step 2 – Decide on the editorial control you want to have: We let authors know that we were not peer reviewing content; if someone was prepared to contribute, we would accept that contribution unless it was off theme. In the end we used every submitted blog with one exception. This was an important principle for us: the network is member-led and we wanted this book to reflect the concerns of our members, not those of an editor or peer-review panel. The core team at NatCen undertook light-touch editing of formatting and spelling but otherwise the contributions are unadulterated. We also organised the contributions into themes to make it easier for readers to navigate.

Step 3 – Manage your contributions: We used Google Drive to host an authors’ sign-up spreadsheet asking for contact information and an indication of the blog title and content. This saved time because we did not have to create a database ourselves, and it was invaluable when it came to contacting authors along the way. We also invited people to act as informal peer reviewers. Some of our less experienced authors wanted feedback and this was provided by other authors.

Step 4 – Keep a buzz going and keep in touch with authors: We found it important to keep the book of blogs uppermost in contributors’ minds. We did this through a combination of social media (using the #bookofblogs hashtag) and regular blogs and email updates to authors.

Step 5 – Set milestones: We set not just an end date for contributions but several milestones along the way to achieve 40% and 60% of contributions. This helped keep the momentum going.

Step 6 – Choose your publishing platform: There are a number of self-publishing platforms. We chose to use Press Books, which has a very smooth and simple user interface similar to many blogging tools like WordPress. We did this because we wanted authors to upload their own contributions, saving administrative time. By and large this worked fine, although inevitably we ended up uploading some for authors and dealing with formatting issues!

Step 7 – Decide on format and distribution channels: You will need to consider whether to have just an e-book or an e-book and a traditional book, and where to sell your book. We chose Amazon and Kindle (Mobi) format for coverage and global reach, but you can publish into various formats and there are a range of channels for selling your book.

Step 8 – Stick with it: When you’re creating a text like this with multiple authors you need to stick with it, have a clear vision of what you are trying to create, and believe that you will reach your launch ready to go. And we did; we hope you enjoy it.

Watch a short video featuring a few of the authors from the Book of Blogs discussing what their pieces are about, here
Join the conversation today; Buy the e-book here!

Tuesday, 21 October 2014

It started with a tweet...


Kandy Woodfield is the Learning and Enterprise Director at NatCen Social Research, and the co-founder of the NSMNSS network. You can reach Kandy on Twitter @jess1ecat.

It started with a tweet, a blog post and a nervous laugh. Three months later I found myself looking at a book of blogs. How did that happen?! Being involved in the NSMNSS network since its beginning has been an ongoing delight for me. It's full of researchers who aren't afraid to push the boundaries, question established thinking and break down a few silos. When I began my social research career, mobile phones were suitcase-sized and collecting your data meant lugging a tape recorder and tapes around with you. That world is gone: the smartphone most of us carry in our pockets now replaces most of the researcher's kitbag, and one single device is our street atlas, translator, digital recorder, video camera and so much more. Our research world today is a different place from 20 years ago; social media are common and we don't bat an eyelid at running a virtual focus group or online survey. We navigate and manage our social relationships using a plethora of tools, apps and platforms, and the worlds we inhabit physically no longer limit our ability to make connections.

Social research as a craft, a profession, is all about making sense of the worlds and networks we and others live in. How strange would it be, then, if the methods and tools we use to navigate these new social worlds were not also changing and flexing? Our network set out to give researchers a space to reflect on how social media and new forms of data were challenging conventional research practice and how we engage with research participants and audiences. If we had found little to discuss and little change it would have been worrying. I am relieved to report the opposite: researchers have been eager to share their experiences, dissect their success at using new methods and explore knotty questions about robustness, ethics and methods.

Our forthcoming book of blogs is our members' take on what that changing methodological world feels like to them. It's about where the boundaries are blurring between disciplines and methods, roles and realities. It is not a peer-reviewed collection and it's not meant to be used as a textbook; what we hope it offers is a series of challenging, interesting, topical perspectives on how social research is adapting, or not, in the face of huge technological and social change.

We are holding a launch event on Wednesday 29th October at NatCen Social Research. If you would like more details, please contact us.

I want to thank every single author from the established bloggers to the new writers who have shared their thoughts with us in this volume. I hope you enjoy the book as much as I have enjoyed curating it. Remember you can follow the network and join in the discussion @NSMNSS, #NSMNSS or at our

Thursday, 16 October 2014

Analytics, Social Media Management and Research Impact

Sebastian Stevens is an Associate Lecturer and Research Assistant at Plymouth University. He teaches research methods to social science students specialising in quantitative methods. He is on twitter @sebstevens99 and has a blog site at 

A key benefit that social media can bring to social science research is through impact and engagement. Demonstrating how a research project will achieve impact and engage the public is a key requirement of most social science research bids today, with many funders no longer regarding the traditional conference and journal article as sufficient. Funders today want to see not only how your research will contribute to the current body of knowledge, but also how it could impact other areas of academia, engage the public, and deliver economic and society-wide benefits.

To promote your research to the widest possible audience, it is often necessary to use a number of Social Media platforms in order to access different populations. It is also now possible to measure this level of engagement through the use of web analytics with the two most common social media platforms (Facebook and Twitter) both providing free access to analytic software for their users. Managing the content and evaluating the impact of a number of social media platforms can however become tiresome and laborious, an issue overcome by the use of a Social Media Management System (SMMS).

The benefits of using a SMMS are vast: for a reasonable yearly subscription, it takes the hassle out of managing multiple social media platforms for your research. There are many SMMS on the market today; one that I am currently using on a project is Hootsuite. This particular SMMS provides a research team with the benefits of:

1.    Scheduling – Researchers are busy people and have little time to manage multiple social media accounts. With a SMMS you can schedule posts to be sent to multiple social media platforms at times of the day known to deliver the largest impact.

2.    Enhanced analytics – The standard analytics of the accounts included in the SMMS are available in one place, alongside extra features including Google Analytics and Klout scores.  

3.    Streams – These provide the opportunity to keep up to date with features of your accounts such as your newsfeeds, retweets, mentions, hashtag usage plus many others.

4.    Multiple Authors – Multiple authors can be added to the system taking the responsibility away from one member of the team.

5.    RSS/Atom feeds – You can keep up with updates of other websites related to your research by adding the RSS/Atom feeds to the system.

By adopting the use of a SMMS a research team has a centralised, hassle free dashboard in which to create and post content alongside evaluating its impact. Each management system comes at a different price and includes different features, however most will take the hassle out of managing your social media platforms and provide greater opportunities to evaluate your research impact.




Thursday, 9 October 2014

Sentiment And Semantic Analysis

Michalis founded DigitalMR in 2010 following a corporate career in market research with Synovate and MEMRB since 1991. This post was first published on the DigitalMR blog. Explore the blog here:

It took a bit longer than anticipated to write Part 3 of a series of posts about the content proliferation around social media research and social media marketing. In the previous two parts, we talked about Enterprise Feedback Management (December 2013) and Short (event-driven) Intercept Surveys (February 2014). This post is about sentiment and semantic analysis: two interrelated terms in the “race” to reach the highest sentiment accuracy that a social media monitoring tool can achieve. From where we sit, this seems to be a race that DigitalMR is running on its own, competing against its best score.

The best academic institution in this field, Stanford University, announced a few months ago that they had reached 80% sentiment accuracy; they have since raised it to 85%, but this has only been achieved in the English language, based on comments for one vertical, namely movies, a rather straightforward case of “I liked the movie” or “I did not like it and here is why…”. That is not to say that there will not be people sitting on the fence with their opinion about a movie, but even neutral comments in this case will have less ambiguity than in other product categories or subjects. The DigitalMR team of data scientists has been consistently achieving over 85% sentiment accuracy in multiple languages and multiple product categories since September 2013; this is when a few brilliant scientists (engineers and psychologists mainly) cracked the code of multilingual sentiment accuracy!

Let’s dive into sentiment and semantics to take a closer look at why these two types of analysis are important and useful for next-generation market research.

Sentiment Analysis

The sentiment accuracy from most automated social media monitoring tools (we know of about 300 of them) is lower than 60%. This means that if you take 100 posts that are supposed to be positive about a brand, no more than 60 of them will actually be positive; the rest will be neutral, negative or irrelevant. This is almost like the flip of a coin, so why do companies subscribe to SaaS tools with such unacceptable data quality? Does anyone know? The caveat around sentiment accuracy is that the maximum achievable accuracy using an automated method is not 100% but rather 90% or even less. This is because when humans are asked to annotate the sentiment of a number of comments, they will disagree at least 1 time in 10. DigitalMR has achieved 91% in the German language, but the accuracy was established by 3 specific DigitalMR curators. If we were to have 3 different people curate the comments we might come up with a different accuracy; sarcasm, and more generally ambiguity, is the main reason for this disagreement. Some studies of large numbers of tweets (such as the one mentioned in the paper Semi-Supervised Recognition of Sarcastic Sentences in Online Product Reviews) have shown that less than 5% of the total number of tweets reviewed were sarcastic. The question is: does it make sense to solve the problem of sarcasm in machine learning-based sentiment analysis? We think it does and we find it exciting that no-one else has solved it yet.
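
The link between human disagreement and the accuracy ceiling can be illustrated with a few lines of code. This is a sketch under stated assumptions, not DigitalMR's method: it assumes three annotators per comment, uses simple pairwise agreement as the agreement measure and a majority vote as the reference label, and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pairwise_agreement(annotations):
    """Share of annotator pairs that agree, pooled over all items.
    `annotations` holds one list of labels per item, one label per annotator."""
    agree = total = 0
    for labels in annotations:
        for a, b in combinations(labels, 2):
            agree += a == b
            total += 1
    return agree / total


def majority_labels(annotations):
    """Most frequent label per item (ties go to the label seen first)."""
    return [Counter(labels).most_common(1)[0][0] for labels in annotations]


def accuracy(predicted, reference):
    """Share of automated labels that match the reference labels."""
    return sum(p == r for p, r in zip(predicted, reference)) / len(reference)
```

Given per-comment labels from three curators, `accuracy(tool_output, majority_labels(annotations))` scores a tool against the curators' consensus, while `pairwise_agreement(annotations)` shows how much the curators agree among themselves. A different set of curators would produce a different reference, which is why the reported accuracy depends on who did the annotating.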

Automated sentiment analysis allows us to create structure around large amounts of unstructured data without having to read each document or post one by one. We can analyse sentiment by brand, topic, sub-topic, attribute, topic within brands and so on; this is when social analytics becomes a very useful source of insights for brand performance. The WWW is the largest focus group in the world and it is always on. We just need a good way to turn qualitative information into robust contextualised quantitative information.

Semantic Analysis

Some describe semantic analysis as “keyword analysis”, which could also be referred to as “topic analysis”; as described above, we can even drill down to report on sub-topics and attributes.

Semantics is the study of meaning and of how language is understood. As researchers we need to provide the context that goes along with the sentiment, because without the right context the intended meaning can easily be misunderstood. Ambiguity makes this type of analytics difficult. For example, when we say “apple”, do we mean the brand or the fruit? When we say “mine”, do we mean the possessive pronoun, the explosive device, or the place from which we extract useful raw materials?
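
One very simple way to resolve the “apple” ambiguity is to look at the words surrounding it. The sketch below is a toy illustration rather than a description of any production system: the cue-word lists are invented for the example, and real tools would rely on far richer context than keyword overlap.

```python
import re

# Hypothetical cue words for each sense of "apple" (assumed for illustration).
SENSE_CUES = {
    "brand": {"iphone", "ipad", "mac", "macbook", "ios", "store", "launch"},
    "fruit": {"pie", "juice", "orchard", "eat", "ate", "tree", "crumble"},
}


def disambiguate(post, cues=SENSE_CUES, default="unknown"):
    """Pick the sense whose cue words overlap most with the post;
    return `default` when no cue matches or the top score is tied."""
    tokens = set(re.findall(r"[a-z]+", post.lower()))
    scores = {sense: len(tokens & cue_words) for sense, cue_words in cues.items()}
    best = max(scores, key=scores.get)
    top = scores[best]
    if top == 0 or list(scores.values()).count(top) > 1:
        return default
    return best
```

For instance, a post mentioning a store, an iPhone and a launch would be tagged as the brand, while one mentioning a pie would be tagged as the fruit; a post with no cue words at all is left as "unknown" rather than guessed.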

Semantic analysis can help:
  • extract relevant and useful information from large bodies of unstructured data, i.e. text;
  • find an answer to a question without having to ask anyone;
  • discover the meaning of colloquial speech in online posts; and
  • uncover specific meanings of words used in foreign languages mixed with our own.

What does high-accuracy sentiment and semantic analysis of social media listening posts mean for market research? It means that a 50 billion US$ industry can finally divert some of its spending from asking questions of a sample, using long and boring questionnaires, to listening to the unsolicited opinions of the whole universe (census data) of their product category’s users.

This is big data analytics at its best, and once there is confidence that sentiment and semantics are accurate, the sky is the limit for social analytics. Think about detection and scoring of specific emotions and not just varying degrees of sentiment; think automated relevance ranking of posts in order to allocate them to vertical reports correctly; think rating purchase intent and thus identifying hot leads. After all, accuracy was the only reason why Google beat Yahoo and became the most used search engine in the world.