Social measurement depends on data quantity and quality



POINT OF VIEW
Anne Czernek, Senior Research Analyst, Emerging Media Lab, Millward Brown/Dynamic Logic

As social platforms proliferate, enthusiastic users are generating more data than ever. Social media data are fast becoming the hottest commodity in market research.

But can social data yield measurements that are comparable to those from other, more established forms of research? Is it really possible for brand managers to tap into these data streams to gain insight into brand equity?

We believe it is too early to say for sure, even though social data have been used effectively by PR and marketing departments for years. For crisis management and on-the-fly campaign assessments, social monitoring involves watching a wide stream of updates in real time and using those to gauge immediate next steps. For these purposes, a qualitative sense of the consumer mood is adequate; the precision of quantitative research is not required.

But increasingly, insight and brand strategy teams are interested in using social data, and they would like to place social measurement alongside other types of brand metrics (attitudinal, behavioral, and so on). In this context, social data must be treated with the same rigor we expect of more traditional forms of measurement. Therefore, Millward Brown’s Emerging Media Lab has conducted tests across 60 brands and more than 30 million online conversations to determine the most appropriate methodologies for working with social measurement from a brand perspective. Our conclusion? The future of actionable social media measurement is only as strong as its standards for data quality.

The “Social” Voice: How Is It Different?

Social data are fundamentally different from traditional brand measurement data. We can think of consumers speaking in two different voices. As Figure 1 shows, the “survey” voice of consumers is captured under structured and replicable conditions, while their “social” voice is observed in a fluid state.

FIGURE 1: SOCIAL VOICE VS. SURVEY VOICE
Survey voice: guided, replicable, quantifiable, structured
Social voice: unsolicited, fluid, observational, unmoderated
The challenge is first making these two types of data structurally comparable and then establishing linkages across them. Methodological issues arise out of the nature of the samples available to us in these two data sources. Traditional brand tracking and brand equity measurement rely on observing a statistically similar group of people over time. Individuals in a quantitative dataset may be treated differently (their opinions weighted more or less heavily) to ensure that the sample is representative and consistent.

But the social sample is a rolling sample of active, not necessarily representative, voices. Activists may overamplify topics they care deeply about, while people having positive but ordinary experiences with a brand may not feel compelled to speak up at all. It is not possible to weight these responses effectively because the information needed to profile respondents is not consistently available. Thus it is difficult, if not impossible, to ascertain whether those who post, tweet, or comment on a brand are representative of the population of interest.

We expect that social profiling will improve over time as we’re able to derive more of these attributes from implicit relationships. But even with improved profiling, we anticipate ongoing concerns as basic as user duplication. For example, how can we ascertain whether one individual has posted five times (once each on Facebook, Twitter, Tumblr, Blogger, and a forum) or whether five different individuals posted similar content?

Defining a Social Universe to Measure Influence

In the current environment, the only way to address the issues of user duplication and lack of profiling information is to put boundaries on our dataset. We need to make some choices.

Our collective eagerness to be perpetually connected has spawned an ever-expanding ecosystem of technology platforms, and as a result, the prevailing methodology for social media listening is to capture as much social data as possible, from Facebook, Twitter, Instagram, Pinterest, Tumblr, and so on. But should comments on news articles be considered? What about reviews on sites like Yelp? How do we define the borders of the social universe?

It seems that the edges of the social universe are as murky as those of the real universe; in both cases, the boundaries are expanding at an apparently increasing rate. Without an agreed-upon definition of the limits of the universe (or even of what “social” means), it’s difficult to know whether we are truly capturing all of the relevant data.

Of course, there are also significant differences in the type of data available across platforms. Blog and forum discussion tends to be more composed and “conversationally” oriented than Twitter and Facebook updates, which tend to be short and posted on the fly. And while, to consumers, it may seem like social data points are easily accessible and browsable, from a research perspective, platforms control how those data streams are syndicated at scale. Twitter has monetized its “firehose” and charges for full access to it; Yelp, in contrast, prohibits collection of its reviews for research purposes. Table 1 shows the data fields that are available (or generally inferable) across different types of platforms.

TABLE 1: TYPES OF INFORMATION AVAILABLE ACROSS SOCIAL MEDIA PLATFORMS
Platforms compared: Twitter (tweets); Facebook personal pages*; blogs/forums (posts)
Fields compared: full data stream; full text availability; majority public profiles; historical data; user-level data; user profiles; location; gender
*Facebook is a special case: Because data are only released in aggregate, none of the measures above are publicly available for research at a user level.
Twitter, the Best Social Research Source for Now

Because of all the issues we have mentioned thus far, we believe it is necessary to rely on a single source for social data, and to us, Twitter seems to be the strongest candidate. It’s open, it’s mobile, and it’s the world’s largest information platform. And because the Twitter firehose flows with a good deal of tweet-level information (including the full text, user ID, and timestamp), we’re able to ascertain more not only about the text, but also about the users. Thus Twitter meets our requirement for respondent-level information in a bounded dataset.

Moreover, because of the way it is used and perceived by users, Twitter seems to us to be most representative of the broader social sphere. Dynamic Logic’s 2010 AdReaction study found that consumers viewed Facebook as being about connecting to friends and family, whereas Twitter was seen as an information platform for discovering, sharing, and learning. Though much has changed in social media in the intervening years, those observations still hold true. Facebook is still the central connection platform, even though consumers are also using smaller, more interest-based social networks to share content related to cooking, photography, sports, or other specialized interests. When the wealth of data from these sites is shared into the broader social stream, it usually comes through Twitter. Thus Twitter seems to us the best aggregation of discovery and sharing.

Data Issues

But having chosen a data source, our work is not done. Further processing of social data is needed. Even if Twitter is the best source of data for social measurement, not all of the data within it are clean and usable; far from it. Our tests of over 30 million tweets show that up to 60 percent of Twitter data must be removed from the dataset before it is ready for analysis.

Twitter tends to have two main issues that can degrade data quality:

1. Keyword ambiguity
Social data are generally collected through keyword searches, so when a brand name is also a common word, a large proportion of content returned will not be about the brand. For example, collecting data for the sandwich chain “Subway” returns many mentions of the New York City transit system. This not only distorts the themes of conversation topics tracked, but can also wreak havoc on other metrics, like sentiment. (In this case, a delayed subway train might generate many complaints that are irrelevant to $5 foot-long sandwiches.)

2. Spam
Spam is widespread on Twitter. As Figure 2 shows, half of the social media mentions of a particular CPG brand were spam. When spam comments are removed, “net sentiment” tends to go down. (We obtain our measure of net sentiment by adding the percentages of positive and neutral content, and from that sum subtracting the percentage of negative comments.) This happens because most spam is neutral in tone (e.g., “Check out this coupon”), so removing it leaves negative sentiment proportionally higher, as illustrated by Figure 3.

FIGURE 2: SPAM PROPORTION OF TOTAL BRAND KEYWORD MENTIONS
(Weekly tweet counts over 13 weeks, on a scale of 0 to 30,000 tweets; series: spam, CPG brand)

FIGURE 3: EFFECT OF CLEANING ON SENTIMENT
(Net sentiment* for a CPG brand, uncleaned vs. cleaned; vertical axis 0% to 35%)
*Net Sentiment: (Positive + Neutral) - Negative

The Future of Social Measurement

Using social measurement effectively will require unstinting attention to the quality of data we consider. The current generation of technology can help aggregate the dataset (through techniques like Natural Language Processing and applying Bayesian rules to cleaning), but human discretion is still needed to evaluate its source, quality, and worth.

While we believe social data have value for measuring brand performance, further work is needed to understand the exact relationship, if any, to brand equity. Looking forward, Millward Brown and Dynamic Logic are examining how our proprietary measure of brand performance in social media, which we call “social vitality,” relates to brand equity. We’re assessing its value as a leading indicator of brand performance, as well as its usefulness in understanding the influence of events and media on brand perceptions. This research is under way now, and we anticipate sharing results in the coming months.

The first step in harnessing the brand insights contained in social data is to create a dataset that can be managed by the same principles that govern established, trusted methodologies. We need to be assured that we are working with respondent-level data from a platform that has both breadth and depth, and we need to cleanse the dataset of spam as well as irrelevant references. Only when these steps have been accomplished can brands be confident that they are making defensible decisions based on reliable data.

The Emerging Media Lab is Dynamic Logic’s specialty practice dedicated to research innovation across new media platforms, namely mobile, gaming and social media.

To read more about social media, please visit www.mb-blog.com.

If you enjoyed “Social Measurement Depends on Data Quantity and Quality,” you might also be interested in:
“Facebook: Not an Ad Platform but an Ecosystem”
“Are You Getting Your Fair Digital Share?”
“Social Media: Fans and Followers are an End, not a Means”
“How Social Technologies Drive Business Success”
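
A worked illustration of the net sentiment measure defined earlier, (Positive + Neutral) - Negative, may help show why removing neutral spam lowers the score. The Python sketch below is only an illustration under stated assumptions: the function name net_sentiment, the labels, and the post counts are hypothetical and are not taken from the Emerging Media Lab's tests. Because the three shares sum to 100 percent, net sentiment equals 100 minus twice the negative share, so anything that raises the negative share (such as deleting neutral spam) must reduce it.

```python
from collections import Counter


def net_sentiment(labels):
    """Return net sentiment in percentage points: (positive + neutral) - negative."""
    total = len(labels)
    if total == 0:
        raise ValueError("no posts to score")
    counts = Counter(labels)

    def share(label):
        # Percentage of all posts carrying this sentiment label.
        return 100.0 * counts[label] / total

    return share("positive") + share("neutral") - share("negative")


# Hypothetical sample of 100 brand mentions as (sentiment label, is_spam) pairs.
# Half are neutral spam, mirroring the "half of mentions were spam" case.
posts = (
    [("neutral", True)] * 50       # spam such as "Check out this coupon"
    + [("positive", False)] * 30
    + [("neutral", False)] * 10
    + [("negative", False)] * 10
)

uncleaned = net_sentiment([label for label, _ in posts])
cleaned = net_sentiment([label for label, is_spam in posts if not is_spam])

print(f"uncleaned: {uncleaned:.0f}%, cleaned: {cleaned:.0f}%")
```

With these made-up counts, the negative share doubles from 10 percent to 20 percent once the spam is removed, so net sentiment falls from 80 percent to 60 percent even though no new negative comments appeared.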