Social media & sentiment analysis splunk conf2012


This presentation was delivered at Splunk's User Conference (conf2012). It covers social media data, how to index and use it with Splunk, and sentiment analysis.



  1. 1. Social Media & Sentiment Analysis: How I learned to stop worrying and love the internets. @michaelwilde | David Carasso, Chief Mouth | Chief Mind. Copyright © 2012 Splunk Inc.
  2. 2. "What is social data?" 2
  3. 3. "What is social data?" Data generated from human activity on social networks. 3
  4. 4. "What is social data?" Oh yeah, right: Twitter. But I work in IT… so, who cares, right? 4
  5. 5. Social Data Should be in Splunk! • Easy to analyze with fields created • Easy to create real-time/historical dashboards and views • Easy to translate many word problems into questions. Example check-in event (JSON): checkin: { badges: [], created: 1345093539, geolat: "41.7686007592", geolong: "-72.621648", mayor: { type: "nochange" }, primarycategory: { fullpathname: "Food:Mexican Restaurants", iconurl: "mexican_32.png", id: "4bf58dd8d48988d1c1941735", nodename: "Mexican Restaurants" }, timezone: "America/New_York", user: { gender: "female" }, venue: { … } } 5
  6. 6. "What is social data?" Wilde, we just said we work in IT and don't care about Twitter! 6
  7. 7. "What is social data?" Except when we search on the words "site" AND "is down" 7
  8. 8. "What  is  social  data"?   8  
  9. 9. "What is social data?" Except when I search on the words "site" AND "is down". IT and the brand collide at times. 9
  10. 10. Getting Social Data. Network: 3rd parties. Method: push or pull. Frequency: real-time or scheduled. 10
  11. 11. Best thing about Social Data? It's almost always structured JSON! 11
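Because the payload is structured JSON, field extraction is a dictionary lookup rather than a regex hunt. A minimal Python sketch, using a hypothetical check-in event shaped like the foursquare-style example on slide 5 (field names and values are illustrative):

```python
import json

# A hypothetical check-in event, shaped like the example on slide 5.
raw = '''{"checkin": {"created": 1345093539,
                      "geolat": "41.7686007592",
                      "geolong": "-72.621648",
                      "primarycategory": {"nodename": "Mexican Restaurants"},
                      "user": {"gender": "female"}}}'''

event = json.loads(raw)

# Structured data means each field is a direct lookup.
category = event["checkin"]["primarycategory"]["nodename"]
gender = event["checkin"]["user"]["gender"]
print(category, gender)  # Mexican Restaurants female
```

In Splunk itself the equivalent would be automatic field extraction (or `spath`) over the indexed JSON event; the sketch just shows why structure makes that cheap.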
  12. 12. What can you do with it? Map Conversations. Analyze People. 12
  13. 13. What can you do with it? Enrich it with lookups. Track Olympians. 13
  14. 14. Indexing the social mother lode: a single stream of big data. @itayNeeman's curl splitter scripted input (TBR). Multiple forwarders installed on a single server, streaming to multiple indexers. 14
  15. 15. Sir Bill, I believe the demos cometh… whoa. 15
  16. 16. The Double Rainbow. When it comes to "numbers", the search language rocks! In social, what people "mean" matters. For that you'll need some new tools that understand words and language. "…what does it mean?!" 16
  17. 17. Analyzing Sentiment: extract linguistic, subjective information about opinions, attitudes, emotions, and perspectives. 17
  18. 18. …and there are perspectives 18  
  19. 19. …and there are perspectives 19  
  20. 20. Understanding brings… Empathy with customers and prospects Intelligent business and design decisions 20  
  21. 21. Brand Perception Impacts Stock. In 2011, our friends at Netflix announced that subscription prices would increase. The feedback on its Facebook page was outrage, and the impact on its stock price was dramatic. 21
  22. 22. Sentiment complements and informs. "We analyze several surveys on consumer confidence and political opinion over the 2008 to 2009 period, and find they correlate to sentiment word frequencies in contemporaneous Twitter messages… …as high as 80%, and capture important large-scale trends. The results highlight the potential of text streams as a substitute and supplement for traditional polling." From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series (CMU: O'Connor, Balasubramanyan, Routledge, and Smith, 2010) 22
  23. 23. Twitter vs. Traditional Polling 23  
  24. 24. Box Office Revenue Forecasting. "We use the chatter from Twitter.com to forecast box-office revenues for movies. We show that a simple model built from the rate at which tweets are created about particular topics can outperform market-based predictors. We further demonstrate how sentiments extracted from Twitter can be further utilized to improve the forecasting power of social media." (Asur and Huberman, 2010) 24
  25. 25. Easy  25  
  26. 26. What's in a word? Terms have many context-dependent meanings: they depend on the writer, the reader, and their relationship, history, goals, and preferences. "Unpredictable" is bad in general, but good in movie reviews. "Jobs" data was affected by the iPhone release. 26
  27. 27. How are you feeling right now? Plutchik's Wheel of Emotions. Ekman's Six Basic Emotions. 27
  28. 28. Sentiment analysis gone wrong. When Anne Hathaway is mentioned, it's almost always in a positive context, and as a result some trading algorithms seem to purchase Berkshire Hathaway. When she is mentioned in the news, the stock goes up. 28
  29. 29. 29  
  30. 30. Bags of Words and Phrases. Many sentiment words and expressions are not directly influenced by what is around them: "That was fun :)" But certainly they can be! "They said it would be wonderful, but they were wrong." "This 'wonderful' movie turned out to be boring." 30
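The bag-of-words idea can be sketched in a few lines of Python. The positive/negative word lists below are invented for illustration (not from the talk or any standard lexicon); note how the slide's contextual example fools the scorer:

```python
# Minimal bag-of-words sentiment scorer with illustrative word lists:
# count hits against positive and negative lexicons, ignoring context.
POSITIVE = {"fun", "wonderful", "great", "good"}
NEGATIVE = {"boring", "awful", "bad", "wrong"}

def bow_score(text):
    words = text.lower().replace(".", " ").replace(",", " ").split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

print(bow_score("That was fun :)"))  # 1 (positive)
# "wonderful" and "boring" cancel out, missing the clearly negative meaning:
print(bow_score("This wonderful movie turned out to be boring."))  # 0
```

The second example is exactly the failure mode the slide warns about: the surrounding words flip the meaning, but a bag of words can't see that.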
  31. 31. Human Engineering vs. Machine Learning. Hand-built expert systems and parse rules; similarly, human-engineered lists of good and bad words (e.g., "good", "great", "bad", "awful"). Natural Language Processing & Speech Understanding: statistical and data-driven. Sentiment analysis generally uses statistics and training sets. 31
  32. 32. Machine Learning Choices. Learning type: Supervised (+ straightforward; – lots of training data), Unsupervised (+ no training data; – may not find what you want), Semi-supervised (+ small initial training data; – interactive feedback). Algorithms: Naïve Bayes (+ simplest probabilistic classifier model; – assumes words are independent), EM (+ performs better, doesn't assume independence; – more complicated, and overfitting is a problem). 32
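A toy supervised Naïve Bayes classifier, written from scratch under the slide's assumption that words are independent. The training data and labels here are invented for illustration; a real model would train on hundreds or thousands of labeled documents:

```python
import math
from collections import Counter

# Invented labeled training data (supervised learning).
train = [("pos", "great fun loved it"),
         ("pos", "wonderful and good"),
         ("neg", "awful boring bad"),
         ("neg", "bad and wrong")]

counts = {"pos": Counter(), "neg": Counter()}  # word counts per class
docs = Counter()                               # document counts per class
for label, text in train:
    docs[label] += 1
    counts[label].update(text.split())

vocab = set(w for c in counts.values() for w in c)

def classify(text):
    best, best_lp = None, float("-inf")
    for label in counts:
        # Log prior + sum of log likelihoods (independence assumption).
        lp = math.log(docs[label] / sum(docs.values()))
        total = sum(counts[label].values())
        for w in text.split():
            # Add-one (Laplace) smoothing so unseen words don't zero out.
            lp += math.log((counts[label][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = label, lp
    return best

print(classify("good fun"))    # pos
print(classify("boring bad"))  # neg
```

Despite its "dirt simple" per-word independence assumption, this is the same model family the app uses, which the deck argues is generally within a few percent of the best algorithms.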
  33. 33. Supervised Learning. Labeled training data → learn model. Labeled test data → validate model. Unlabeled new data → model predicts labels → new labeled data. 33
  34. 34. The Effect of Negation. "The food was not good." Strategy: negate sentiment for all terms up to a breaking punctuation (i.e., comma or sentence end). The negation effect is dependent on the term. • Mild words negate to about the same strength: not bad ≈ good. • Extreme words negate towards neutral: not horrible ≈ average. 34
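The negation-scoping strategy on this slide can be sketched as follows. The negator list and the punctuation set are my own choices, not from the talk; the marked `NOT_` tokens would then feed a scorer that flips or dampens their sentiment:

```python
import re

# Mark every term between a negation word and the next breaking
# punctuation (comma or sentence end), as the slide describes.
NEGATORS = {"not", "no", "never"}

def mark_negation(text):
    out, negating = [], False
    for token in re.findall(r"[\w']+|[,.!?]", text.lower()):
        if token in NEGATORS:
            negating = True
        elif token in ",.!?":
            negating = False  # negation scope ends at breaking punctuation
        else:
            out.append("NOT_" + token if negating else token)
    return out

print(mark_negation("The food was not good, but the service was fine."))
# ['the', 'food', 'was', 'NOT_good', 'but', 'the', 'service', 'was', 'fine']
```

Here "good" is negated but "fine" is not, because the comma closed the negation scope.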
  35. 35. Learning Bias. A common feature of online user-supplied reviews is that the positive reviews vastly outnumber the negative ones. Movie reviews at IMDB: more occurrences of "bad" in 10-star reviews than in 2-star ones. Normalize by accounting for relative frequencies. 35
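The normalization step amounts to dividing by class size. A tiny sketch with invented counts: raw occurrences of "bad" favor the much larger 10-star class, while relative frequencies reverse the picture:

```python
# Invented counts for illustration: positive reviews vastly outnumber
# negative ones, so raw word counts mislead.
reviews = {"10-star": 9000, "2-star": 500}    # reviews per class
bad_count = {"10-star": 1800, "2-star": 400}  # occurrences of "bad"

# Raw counts make "bad" look like a 10-star word...
assert bad_count["10-star"] > bad_count["2-star"]

# ...but relative frequency tells the real story:
rel = {k: bad_count[k] / reviews[k] for k in reviews}
print(rel)  # {'10-star': 0.2, '2-star': 0.8}
```

With these numbers "bad" appears in 20% of 10-star reviews but 80% of 2-star ones, so after normalization it correctly reads as a negative signal.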
  36. 36. Sentiment in Social Media. Emoticons: :-) ;( :/ – Reliable measure of sentiment. – A simple regex can cover more than 95% of emoticons on Twitter. – Ignores complex emotions. Lengthening: "This talk is greeeeeat! David is the beeeeeeest! Ahhhhhhhhh!" – In English, 3 or more of the same character in a row doesn't occur, except for 7 obscure terms in the unix dict. – Can indicate heightened emotion, but actual lengths are probably not meaningful. – Useful to normalize because of how common they are (hiiii → hi). 36
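Both tricks can be sketched with simple regexes in Python. The emoticon pattern and the collapse-to-one normalization rule below are my own approximations, not the ones Splunk uses:

```python
import re

# Rough emoticon pattern: an eye character, optional nose, one mouth.
EMOTICON = re.compile(r"[:;=][-']?[)(/\\dDpP]")

def normalize_lengthening(word):
    # Collapse runs of 3+ identical characters, since run length is
    # probably not meaningful (hiiii -> hi, greeeeeat -> great).
    return re.sub(r"(.)\1{2,}", r"\1", word)

print(EMOTICON.findall("great talk :) but the demo :/ ;("))  # [':)', ':/', ';(']
print(normalize_lengthening("greeeeeat"))  # great
print(normalize_lengthening("hiiii"))      # hi
```

Collapsing to a single character is safe for English because legitimate doubles ("good", "beet") are untouched, and triples essentially never occur, per the slide.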
  37. 37. Maybe it's not so hard? "We are only interested in aggregate sentiment. A high error rate merely implies the sentiment detector is a noisy measurement instrument. With a fairly large number of measurements, these errors will cancel out relative to the quantity we are interested in estimating…" From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series 37
  38. 38. Splunk Sentiment Analysis App   38  
  39. 39. Design Decisions • Use supervised learning. Why? It doesn't require interactive feedback, and learners get almost as good as they are going to get with only a few hundred or perhaps a few thousand documents. • Use Naïve Bayes. Why? Dirt simple and understandable. The difference between the best algorithms and a simple Naïve Bayes is generally only a few percent. 39
  40. 40. Design Decisions • Handle lengthening. Greeeat! • Ignore negation. In the aggregate it won't matter much. • Supply multiple trained models: movie reviews (using IMDB ratings), tweets (using emoticons to create training sets). Please suggest more. 40
  41. 41. Summary • Sentiment analysis helps you understand your customers and marketplace. • True sentiment analysis is hard. • Aggregate sentiment analysis is easier, but still very valuable. • The simplest algorithms work almost as well as the most complex, given a few thousand training points. • Splunk has a Sentiment App: download it and give feedback, integrate social data into your existing corporate data, and share your trained models with others. 41
  42. 42. “splunk now knows when you’ve “I actually learned something! Not.” been naughty or nice #sentiment” “#splunk #sentiment niiice.” Teh End If you’re reading this, start“keep-it-simple sentiment works #conf2012” clapping. The talk is over. “Worst talk. Ever.” Golf clapping at #sentiment_talk 42