Linguistic challenges associated with monitoring social media

Social media monitoring tools such as Radian6, Sysomos and Scout have tremendous capabilities for pulling in highly targeted conversations taking place around a topic, person or brand from all social media platforms. These tools also enable a corporation or researcher to find the influencers around a topic.

While the targeting potential is astonishing, variations in languages, slang, regional idioms, misspellings and nicknames for topics and brands make accurate targeting difficult. What’s more, the influencers around a brand or topic are the most likely to use a nickname, slang term or personal parlance known only to their social circle.

Linguistic challenges associated with monitoring social media Document Transcript

  • 1. Linguistic Challenges Associated with Monitoring Social Media. Dave Linabury, Jason Macemore, Gary Olson. Campbell-Ewald | October 2010
  • 2. Copyright ©2010 Linabury, Macemore and Olson. Edited by Christopher Moritz and Helena Dobbins. All rights reserved. Radian6, Chevrolet, OnStar, the United States Navy and Google retain the copyrights of their respective corporations.
    Campbell-Ewald uses the services of both Radian6 and Google, along with an internally developed, proprietary set of tools, for gathering data on social media for its clients as well as its own internal data. Campbell-Ewald, part of the Interpublic Group of agencies, has been in the business of marketing, communications and strategy since 1911.
  • 3. Executive Summary
    Social media monitoring tools such as Radian6, Sysomos and Scout have tremendous capabilities for pulling in highly targeted conversations taking place around a topic, person or brand from all social media platforms. These tools also enable a corporation or researcher to find the influencers around a topic.
    While the targeting potential is astonishing, variations in languages, slang, regional idioms, misspellings and nicknames for topics and brands make accurate targeting difficult. What’s more, the influencers around a brand or topic are the most likely to use a nickname, slang term or personal parlance known only to their social circle. This makes understanding these language variants critical.
    Campbell-Ewald’s Social Media team has addressed these linguistic challenges in their own client monitoring projects over the past five years by:
      • Determining current search trends around a topic. This not only identifies how users are searching (which indicates intent), but also aids in the identification of misspellings and relevant associated topics.
      • Determining the age and gender of the writer through the use of various tools, knowledge of generational writing patterns and comparing regional variations against reliable reference sources of slang.
      • Identifying the influencers and recording their linguistic patterns.
      • Identifying emoticons and comparing them to known regional and generational variants.
    This paper details these challenges, which are largely unknown to most users of these monitoring tools, in the hope that their own monitoring will become more accurate and complete.
  • 4. Background
    History
    Campbell-Ewald has been an active participant in social media since early 2006. Their lead social media planners, Dave Linabury and Jason Macemore, were among the first to develop social media monitoring tools, such as Fat Pipe and Sentimentor. It was the development of these early tools (created to meet their own needs as researchers) that led to understanding the challenges raised in this paper.
    Linabury and Macemore quickly discovered that monitoring tools were not always able to spider all of the conversations that were known to exist. At first they theorized that conversations weren’t being pulled in because different coding methods and naming conventions for Web site sections made it difficult for the tools to parse data. As new technologies made parsing data easier, the initial theory proved to be incorrect. It was ascertained in mid-2007 that linguistic variants were the cause.
    Successes
    Since the discovery of the linguistic variant sets, Campbell-Ewald has become the nation’s leader in social media monitoring. They have been tasked with monitoring data for several United States government agencies, including the United States Navy, the United States Mint, the United States Naval Academy, the FBI, the Centers for Disease Control and Prevention (CDC) and the Environmental Protection Agency (EPA).
    In addition to government clients and projects, Campbell-Ewald’s social media team, under the leadership of Linabury, also provides monitoring for dozens of Fortune 500 clients, while garnering numerous awards, such as a Gold Echo Award, a Silver Effie, Best Military Site of 2009 and Best Social Media Strategy, among others.
  • 5. Target Market
    Based on usage trends, the target audience for social media monitoring applications can be divided into two main segments: internal and external.
    Corporations and public relations firms tend to use monitoring tools for internally driven ends. These typically include reputation management, crisis management and use as a clipping service to capture media mentions. Keyword strategies for these approaches are typically limited to formal brand names, the CEO’s name and associated marketing terminology. They rarely take into consideration linguistic variations, context or subtle sentiment variations.
    Conversely, advertising agencies, researchers and social media agencies tend to retain an external focus in their monitoring efforts, concentrating on sentiment analysis, brand perception and marketing effectiveness and awareness. External monitoring tends to consider contextual relevance far more than PR firms do, but most practitioners still do not incorporate (or even know of) the language variants that need to be considered for accurate and inclusive brand monitoring in the social space.
  • 6. Business Challenge
    The User Base Grows Annually
    Nearly 40% of corporations are turning to social monitoring to keep abreast of what’s being said about their brand. Many take this on internally, but most hire outside social media companies or agencies. However, virtually none of them are aware that they are not seeing the entire conversation; they blindly trust their chosen monitoring tool to fulfill their needs and find all of the relevant online discussions about their brand, product or services.
    This is not the case. The tools are limited by the thoroughness of the tool’s operator and by how much time is spent determining appropriate keywords. Most administrators assume that the terms they use as marketing descriptors (e.g., marketing copy, search terms and PR copy) are enough. Many monitoring tools are set up for the corporation by the tool manufacturer. It is highly unlikely that the tool creator could understand a brand as well as the employees, agencies or long-standing vendors of the corporation.
    The reality is that marketing descriptors are generally one-sided, somewhat aspirational and rarely match customer expectations and perceptions. Few companies use keywords describing themselves as “cheap”, “average”, “acceptable”, “poor”, “pathetic”, “good enough”, etc.; however, those are precisely the terms consumers use with respect to brands. For proof of this, one need only see how brands are described at BrandTags.net, where tens of thousands of consumers have used those exact terms to describe hundreds of corporations in ever-growing tag clouds of user-generated terms.
    [Figure: Tagcloud about Chrysler from BrandTags.net]
  • 7. The Problem with Monitoring Language
    Languages constantly evolve. They evolve nationally, regionally and hyper-locally. For example, a popular phrase among teens nationally to describe something amazing is “off the hook.” Regional variants, such as “off the chain” [Detroit] and “off the heezy” [Brooklyn], exist as well. Hyper-locally, a neighborhood may have yet another variant, shared among friends, but not generally known outside that block.
    This presents unique challenges to the researcher who is using social media monitoring tools. If a phrase is known, it can be entered as a key search term for the tool. If, however, more people are using lesser-known regional variants, the tool loses effectiveness.
    1337speak
    There exist several linguistic phenomena online that do not exist offline. One is the well-known variant known as hacker speak or “1337speak” (Elite speak). This variation goes back more than a decade online. It was developed by computer hackers in an effort to make their messages to each other difficult for outsiders to read. Words are deconstructed to their visual elements and replaced with alphanumeric and punctuation equivalents that bear a passing resemblance to the original letter form. For example, a capital ‘T’ may be replaced with the number 7 or a + sign. The word ‘at’ will be replaced with the @ sign. Capital ‘E’ becomes a 3, and so on.
    There is no fixed system to the replacements; it is simply a matter of finding letters, numbers and punctuation that can be substituted. Indeed, cleverness is praised, and while online “1337speak generators” exist which “translate” text back and forth between English and 1337speak, each hacker has her own style of writing and will make personal substitutions that others may or may not choose to adopt.
    Here is a sample sentence in English first, then 1337speak:
    “Time Magazine’s reporter had no idea what we were after.”
    “71M3 M464z1n3’5 |23p0|273|2 H4|} n0 1|}34 wH47 w3 w3|23 4f73|2.”
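The substitution idea described on this slide can be partly reversed before keyword matching. The following is a minimal Python sketch, not a feature of any monitoring tool named in this paper. The substitution table is an assumption reverse-engineered from the sample sentence above plus the 7, +, @ and 3 examples in the text; the function names `normalize_leet` and `mentions` are hypothetical. Because each writer invents personal substitutions, any fixed table will miss some variants.

```python
# Multi-character glyphs are replaced first so that "|2" is not split apart.
# Both tables are assumptions drawn from the sample sentence in the paper.
MULTI_CHAR = {"|2": "r", "|}": "d"}
SINGLE_CHAR = str.maketrans({
    "0": "o", "1": "i", "3": "e", "4": "a", "5": "s",
    "6": "g", "7": "t", "+": "t", "@": "at",
})


def normalize_leet(text: str) -> str:
    """Return a lowercase, best-effort plain-English reading of 1337speak."""
    result = text.lower()
    for glyph, letter in MULTI_CHAR.items():
        result = result.replace(glyph, letter)
    return result.translate(SINGLE_CHAR)


def mentions(text: str, keyword: str) -> bool:
    """Check for a keyword after normalizing the text."""
    return keyword.lower() in normalize_leet(text)


sample = "71M3 M464z1n3’5 |23p0|273|2 H4|} n0 1|}34 wH47 w3 w3|23 4f73|2."
print(normalize_leet(sample))
print(mentions("1n73l pr0c3550r", "intel"))
```

One trade-off to keep in mind: this normalization also rewrites genuine digits (a model name such as "Mini 3" becomes "mini e"), so it is better run as a second pass alongside matching on the raw text than as a replacement for it.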
  • 8. If hackers were discussing a new Intel processor in 1337speak, no monitoring tool would be able to pick up that conversation, as no complete English words exist in hacker speak for the tool to pick up.
    LOLCATS
    Sites such as General Mayhem, 4chan.org and ICANHASCHEEZBURGER are responsible for spreading one of the more popular slang variants, known as LOLCATS (pronounced “LAHL cats”). The meme originated as a series of cute pictures of kittens doing things, with the accompanying text purported to be the voice of the cat. Cats, according to the meme, have unique spellings of English, poor grammar, and prefer the “Impact” font. Eventually kids began using LOLCATS as an accepted form of writing in text messages, instant messages, email and even speech.
    Like 1337speak, LOLCATS speak can be difficult, if not impossible, for monitoring tools to parse as plain English. Consider the sentence used for the 1337speak example in English, then in LOLCATS:
    “Time Magazine’s reporter had no idea what we were after.”
    “TIEM MAGAZEENZ REPORTR IZ R NO IDEAZ WUT WE R AFTERZ.”
    Intentional Misspellings
    Finally, Generation Y generally does not spell correctly, sometimes out of laziness and sometimes, like hackers, to intentionally disguise messages from authority figures. This may not matter to a company monitoring the conversations of senior citizens, but if the target audience is the highly sought-after 18-24 crowd, it is an issue that must be understood. Here is a real example, found on MySpace, from a 16-year-old girl to her friends:
    “HAY GUISE LOL WUT CHARGIN LAZOR LOLZ SHOOP DA WHOOP THIS KID TOOK MY LUNCH MONEY CALL HIM AND SAY BAD THINGS HERES HIS NUMBER LOLZ 696 696 6969 BUT BECAREFUL HE DOSNT AFRAID OF ANYTHING”
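Misspellings such as "REPORTR" or "DOSNT" defeat exact keyword matching, but many sit only a character or two away from the intended word. The sketch below shows one possible way to catch them with an edit-distance threshold. It is an illustration only, not the method used by the authors or by any tool named here; the function names and the threshold of 2 are assumptions.

```python
import re


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def fuzzy_hits(text: str, keywords: set, max_distance: int = 2) -> set:
    """Return the keywords that some token in the text nearly matches."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return {
        keyword
        for keyword in keywords
        for token in tokens
        if edit_distance(token, keyword) <= max_distance
    }


lolcats = "TIEM MAGAZEENZ REPORTR IZ R NO IDEAZ WUT WE R AFTERZ."
print(fuzzy_hits(lolcats, {"reporter", "time"}))
```

A fixed threshold produces false positives for short keywords (almost any three-letter token is within two edits of another), so in practice the allowed distance would probably need to scale with keyword length.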
  • 9. Generational Differences in Emoticons
    Microblogging platforms like Twitter and Foursquare, which necessitate short messaging, seem almost devoid of emoticons. It is our theory that hashtags (short linked codes preceded by the pound sign, #) take the place of emoticons on microblogging platforms, as many hashtags are used sarcastically, such as #whatever or #ilovemylife.
    There are distinct differences between the types of emoticons created by the different birth generations in the United States. Notice that with each generation, the “faces” become slightly more realistic.
      • The so-called Silent Generation (1925-1945) are the least likely to use emoticons in speech other than the most basic (smiles and frowns). Silent Generation wink: ;)
      • The Baby Boomers (1946-1963) use emoticons sparingly, but nevertheless use more than just the :-) and :-( symbols. They will include others such as :- (unsure), :-O (surprised) and ;-) (wink). Notice the addition of a nose formed with the hyphen key. Baby Boomer wink: ;-)
      • Generation Xers (1964-1980) use the most emoticons of the older three generations. They include unusual emoticons, such as >-}}}}-(°> (dead fish) and :^p (sticking out tongue), and even emoticons meant sexually, such as (o) for breasts. Noses are often present, usually with a caret ^ in place of a hyphen, although hyphens are prevalent as well. Generation X wink: ;^)
      • It is with Generation Y (1981-2000) that we see the greatest change in emoticons, where the “faces” move from sideways to forward-facing, taken from the Japanese kaomoji. Compare the symbol for wink between Generations X and Y: ;^) and (0_-). Generation Y wink: (O_-)
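The stylistic markers listed above (no nose, hyphen nose, caret nose, forward-facing face) can be detected with simple patterns. The Python sketch below tags text with the emoticon styles it contains, using the paper's generational labels. The regular expressions are assumptions that cover only the examples given on this slide, and a match is at most a weak hint about the writer, not a determination of age.

```python
import re

# Ordered from most to least distinctive. Labels follow the paper's
# generational groupings; the patterns themselves are assumptions.
STYLES = [
    ("forward-facing (Generation Y)", re.compile(r"\([^\s()]_[^\s()]\)")),
    ("caret nose (Generation X)", re.compile(r"[:;]\^[)(pPoO]")),
    ("hyphen nose (Baby Boomer)", re.compile(r"[:;]-[)(oO]")),
    ("no nose (Silent Generation)", re.compile(r"[:;][)(]")),
]


def emoticon_styles(text: str) -> list:
    """Return the label of every emoticon style found in the text."""
    return [label for label, pattern in STYLES if pattern.search(text)]


for message in ["see you later ;)", "nice one ;-)", "lol ;^)", "whatever (O_-)"]:
    print(message, "->", emoticon_styles(message))
```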
  • 10. Gender Analysis
    Gender analysis may be unfamiliar to most, and many may even question why it is necessary. The reason is simple. Comments may arise where either the screen name of the writer is ambiguous, or the writing style of a known individual seems to change drastically and suddenly. In the latter case, there is the distinct possibility of profile fraud. Some individuals may pretend to be the opposite gender for various reasons: to pretend to be another person for a prank, to assume the identity of another for fraudulent reasons, to pretend to be the opposite gender for sexual reasons, or to pretend to be another for undercover work, as in vice-squad or detective work.
    Solutions
    Relying solely on internal industry and marketing keywords will not suffice. It is crucial to take additional steps. The following sources can be used to determine additional relevant keywords:
      • The Urban Dictionary: http://urbandictionary.com Continually updated, the Urban Dictionary is easily the largest source of regional, national and international slang on the Internet. It is excellent for typing in industry terms to see whether variations exist and where, regionally, they are used.
      • Google Insights: http://google.com/insights/search/ Google Insights allows searches to go from global down to individual cities, with timeframes from the last 30 days as far back as 2004. It provides trends on rising search patterns based on the root key term, maps indicating geo-density, forecasts and news headlines, plotted on trend lines.
  • 11. Solutions (continued)
      • Google AdWords: http://adwords.google.com/ AdWords is a free tool from Google designed to assist companies in making better choices when selecting keywords for paid search buys. The tool can also be used to help select better keywords for social media monitoring. Keywords are shown by the latest search patterns, with search quantities displayed.
      • Influencers: Ask active and influential customers for terms, nicknames, etc. If your company does not have a personal relationship with its influencers, find and read their blogs and tweets, paying close attention to the responses from their audiences. Flag unusual words, spellings and abbreviations.
      • Gender Genie: http://bookblog.net/gender/genie.php A free tool that can identify the gender of the writer when text is pasted into a field and the algorithm is run.
    With these additional keywords, misspellings, slang, nicknames and regional variants, the new keyword list will not only yield more data, but will finally tell the whole consumer story surrounding the brand.
    Benefits
    It is no longer an option to be naïve enough to actually believe that no one is talking. All brands are being discussed by someone. Only through the proper configuration of professional-grade monitoring tools like Radian6, and preferably under the guidance of a social media agency that specializes in monitoring and analysis, can a company expect to truly know what is being said about its brand.
    Not knowing how your brand is being discussed and described means that the brand is not getting the entire picture, as is the case with the reports from PR agencies and most internal social media monitoring. By applying these techniques and using these additional tools, a brand can be certain of seeing the full picture and can glean far more learnings from its customer base.
  • 12. Case Study: Chevrolet Cobalt
    The phenomenon of linguistic variants was first noticed and described to General Motors by Linabury and Macemore in 2007, while they were monitoring conversations pertaining to the Chevrolet Cobalt, a small car that young males were customizing (along with Honda Accords) into street rods (known regionally as Rice Rods, Rice Burners, Rice Rockets, etc.).
    The assignment was to find out what these young men were saying about the Cobalt, as they were deemed by Chevrolet to be influencers of non-Chevrolet owners. Campbell-Ewald’s monitoring was confined geographically to the Great Lakes states.
    During the course of the monitoring, Macemore noticed that some of the Chicago and Ohio conversations in forums were referring to the Cobalt as a “Balt”. Linabury noticed that conversations on the west side of Michigan referred to it as a “C-Car” or “C-Balt”. C-Car was the internal name of the vehicle used by engineers, but in Michigan (where the car is produced), it is possible that engineering names are known externally.
    Macemore then theorized that these terms were surfacing often enough that they should be added to the keywords the monitoring tool was using to spider conversations. After adding the new terms, the number of conversations found by the tool increased by 53%. This led to speculation that the influential members of a social circle may be more likely to have internal nicknames than those outside that circle, and that these names needed to be identified at the outset of any social media monitoring assignment to ensure accurate monitoring and the largest possible data set.
    Result: By adding the additional terms that were manually identified, the conversational data set increased by more than 50%, and the client gained insight and learnings into how their vehicles were referred to by the most influential purchasers of their product.
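The effect of adding nicknames to a keyword set can be measured as a percentage lift in matched conversations. The Python sketch below illustrates the calculation with the nicknames reported in this case study. The sample posts are invented purely for illustration and are not data from the study; the function names are hypothetical, and the matching logic is an assumption about how a keyword spider might behave.

```python
import re


def compile_keywords(keywords):
    """Build one case-insensitive pattern matching any keyword as a whole term."""
    ordered = sorted(keywords, key=len, reverse=True)
    alternation = "|".join(re.escape(keyword) for keyword in ordered)
    return re.compile(rf"(?<!\w)(?:{alternation})(?!\w)", re.IGNORECASE)


def count_hits(posts, keywords):
    """Count the posts that mention at least one keyword."""
    pattern = compile_keywords(keywords)
    return sum(1 for post in posts if pattern.search(post))


def percent_lift(before, after):
    """Percentage increase from the original count to the expanded count."""
    return 100 * (after - before) / before


# Invented sample posts, for illustration only.
posts = [
    "Just lowered my Cobalt",
    "my balt needs new rims",
    "anyone tuning a C-Balt in Grand Rapids?",
    "C-Car swap thread",
    "Accord vs Civic",
]
formal = {"cobalt", "chevrolet cobalt"}
expanded = formal | {"balt", "c-car", "c-balt"}

before = count_hits(posts, formal)
after = count_hits(posts, expanded)
print(before, after, percent_lift(before, after))
```

With this formula, a feed that grows from 100 matched conversations to 153 shows the 53% lift described above.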
  • 13. Case Study: OnStar™
    OnStar™ is a multimillion dollar company that produces a telematics system for vehicles. As the system is responsible for saving the lives of hundreds of people involved in motor vehicle accidents, OnStar™’s corporate marketing team wanted up-to-the-minute reports on what their subscribers, their detractors and the media were saying.
    In 2007, OnStar™ hired Campbell-Ewald’s Social Media Team to monitor conversations and report back weekly with findings, and daily with any outstanding conversations or topics.
    Campbell-Ewald’s Social Media Team quickly discovered there would be a few barriers to accurate monitoring. For example, people discussing certain television shows were appearing in the feed, in sentences like “Did you see what happened on Star Search last night?” or “There was one episode on Star Trek where…” These false positives were quickly weeded out through exclusionary phrases added to the keyword set.
    The team also discovered linguistic variants of OnStar™ appearing in the conversations of loyal fans and influencers, which included several hackers. Some hackers were tweaking OnStar™ at home (similar to the jail-breaking of iPhones) for fun. We found that they used numerous variants of OnStar™, including: On*, On Star, On_Star, NOnStar, ON.Star, OffStar, On-Star, OnsStar and BlondeStar (in reference to a YouTube parody of OnStar™).
    Result: By adding the additional terms that were manually identified, the conversational data set increased by more than 109%, and the client gained insight and learnings into how OnStar was being referred to by the most influential purchasers of their product and by an unexpected fan base: hackers.
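The combination of brand variants and exclusionary phrases described in this case study can be sketched as a two-step filter. The Python example below uses the variants and the two television-show phrases listed above. The filtering logic and the name `is_relevant` are assumptions for illustration, not the configuration the team actually used.

```python
import re

# Variants and exclusion phrases as listed in the case study.
VARIANTS = [
    "OnStar", "On*", "On Star", "On_Star", "NOnStar",
    "ON.Star", "OffStar", "On-Star", "OnsStar", "BlondeStar",
]
EXCLUSIONS = ["Star Search", "Star Trek"]


def _any_term(terms):
    """Compile a case-insensitive pattern matching any term as a whole unit."""
    ordered = sorted(terms, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in ordered)
    return re.compile(rf"(?<!\w)(?:{alternation})(?!\w)", re.IGNORECASE)


VARIANT_PATTERN = _any_term(VARIANTS)
EXCLUSION_PATTERN = _any_term(EXCLUSIONS)


def is_relevant(post: str) -> bool:
    """Keep a post that names a variant and contains no exclusion phrase."""
    if EXCLUSION_PATTERN.search(post):
        return False
    return VARIANT_PATTERN.search(post) is not None


print(is_relevant("Did you see what happened on Star Search last night?"))
print(is_relevant("Got my On_Star unit talking to the laptop"))
```

The design trade-off is that a post mentioning both OnStar and an excluded show would be dropped; a stricter variant could remove the excluded phrase from the text first and then test what remains.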
  • 14. Technical Specs
    Assigning new keywords to any social monitoring tool is simple. Finding the keywords is the challenge. The following demonstration shows how to add new keywords to an existing set using the popular social media monitoring tool, Radian6.
    In this example, the new Dell Mini 3 cellphone has been chosen as a topic to monitor.
    Narrowing the feed to cell phones and removing “noise” about Dell laptops makes the results more accurate. By adding the keyword ‘cellphone’ and the exclusionary keyword ‘laptop’, the feed examples are more targeted.
    A search on Google Insights for ‘Dell Mini 3’ shows us that consumers are also searching for it as a ‘cellular dell’, ‘dell android’, ‘dell android phone’, ‘dell smartphone’, and ‘dell mini 5’ (a different model). A look at the Urban Dictionary indicates any cellphone may be referred to as a “cellie” by youth.
    These additional keywords (except perhaps the Mini 5) should be added to Radian6’s keywords as they represent the intent of users. That these keywords are listed by Google as “Breakouts” is significant; breakouts represent a recent increase in search volume of more than 5,000%.
    [Screenshots on this slide: Radian6 (two), Google Insights, Urban Dictionary]
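The Dell Mini 3 keyword set assembled on this slide can be written down as a small data structure to make the matching logic explicit. The Python sketch below is not Radian6's interface or query syntax; the class name `KeywordProfile`, its fields and its matching rule are all assumptions for illustration. It treats the product phrases as direct matches, the generic terms 'cellphone' and 'cellie' as context that only counts alongside the brand name, and 'laptop' as an exclusion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeywordProfile:
    """Hypothetical keyword set; not the format of any real monitoring tool."""
    brand: str
    product_phrases: frozenset  # phrases that match on their own
    context_terms: frozenset    # generic terms that need the brand nearby
    exclusions: frozenset       # terms that disqualify a post

    def matches(self, post: str) -> bool:
        text = post.lower()
        if any(term in text for term in self.exclusions):
            return False
        if any(phrase in text for phrase in self.product_phrases):
            return True
        has_context = any(term in text for term in self.context_terms)
        return self.brand in text and has_context


dell_mini_3 = KeywordProfile(
    brand="dell",
    product_phrases=frozenset({
        "dell mini 3", "cellular dell", "dell android",
        "dell android phone", "dell smartphone",
    }),
    context_terms=frozenset({"cellphone", "cellie"}),
    exclusions=frozenset({"laptop"}),
)

print(dell_mini_3.matches("my new dell cellie rocks"))
print(dell_mini_3.matches("Dell Mini 3 vs my old laptop"))
```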
  • 15. Summary
    Campbell-Ewald has been an active participant in social media since early 2006. Their lead social media researchers, Dave Linabury and Jason Macemore, were among the first to develop social media monitoring software tools. It was the development of these early tools, created to meet their own needs as researchers, that led to understanding the linguistic challenges raised in this paper.
    Campbell-Ewald’s Social Media team addressed these linguistic challenges in their own client monitoring projects over the past five years using the following approaches:
      • Determining current search trends around a topic
      • Determining the age and gender of the writer
      • Identifying the influencers and recording their linguistic patterns
      • Identifying emoticons and comparing them to known regional and generational variants
    It is critical in monitoring to understand that internal marketing descriptors and paid search terms are not enough to effectively crawl all of the conversations taking place around a brand. Nor is it enough to rely on basic tools like Google Alerts. Accurate monitoring is done with professional-grade tools like Radian6, under the guidance of experienced monitoring teams, like those at Campbell-Ewald.
    The monitor must use the Urban Dictionary to determine any industry or brand slang, check Google AdWords for misspellings and current search trends, and check Google Insights for regional interest. Finally, the researcher must either directly contact influential fans of the brand or, failing that, spend time reading blog posts by influencers and the responses to their content from their audience. Only then can a keyword set be considered accurate and comprehensive.
  • 16. Contact
    Dave Linabury, Group Director, Social Media: Dave.Linabury@c-e.com
    Jason Macemore, Digital Strategist: Jason.Macemore@c-e.com
    Gary Olson, Senior Social Media Planner: Gary.Olson@c-e.com
    Campbell-Ewald, 30400 Van Dyke Ave., Warren, Michigan 48093
    +1 (586) 574-3400
    http://c-e.com