Big Data Predictive Analytics


Using social media (Twitter, Facebook, fan pages) to predict the results of Dancing with the Stars, in order to answer the question: is unstructured social media big data credible, and can it be used to accurately predict future events?

Published in: Technology

  1. Big Data Predictive Analytics: Using social media to predict the results of Dancing with the Stars. Rick Kawamura, @r_kawamura
  2. The Value of Big Data. The question: Is unstructured social media data credible, and can it be used to accurately predict future events?
  3. The Value of Big Data. The test: Collect data from Twitter, Facebook, and various fan sites. Cleanse data. Apply sentiment analysis. Organize, graph, and analyze. Determine who will be eliminated from the show the following day. [Diagram label: Semantic Analysis]
  4. Fascinating: Kate's Story of Survival. Kate Gosselin (from Jon & Kate Plus 8) was the least talented of the 12 dancers, but survived 5 weeks before being eliminated. For 5 weeks, Kate stole the headlines, not for her dancing, but for her meltdowns, fights with her partner, and how she continued to survive despite poor dancing performances. While common sense would lead one to believe she was sure to be eliminated each week, the data revealed a completely different (and more accurate) story. Many comments on Twitter and Facebook showed viewers' disdain for Kate and a serious credibility problem for ABC and DWTS. How could the worst dancer continue to survive? "The show must be fixed." "ABC is keeping her on for the ratings." Yet week after week, the data showed she was safe: America was voting to keep her on.
  5. How the data showed Kate was safe
     1. The week before she was eliminated, as in most other weeks, Kate received the lowest score from the judges.
     2. Of all negative comments, Kate received close to 80%. The negative sentiment was strong.
     3. Positive sentiment, the best predictor of fan votes, showed Kate clearly had more support than four other contestants despite her large volume of negative comments.
     4. Combining the judges' scores with positive fan sentiment, it was clear Kate would be safe.
     [Chart: % of Total Comments]
  6. The week Kate was eliminated: data never lies
     1. Every week, Kate had the lowest score from the judges. This week was no different.
     2. Kate alone received 40% of all comments in social media, but 90% of them were negative.
     3. In previous weeks, Kate had more positive comments than several of her competitors. This week, however, while her total volume remained high, her percent of positive comments dropped significantly.
     4. Given that Kate had the lowest judges' score and the lowest number of positive comments, it was clear this week that she would be eliminated.
     [Charts: % of Total Comments; Judges' Scores; % of Positive Comments]
  7. Key Takeaways
     • Social, Unstructured Big Data is Credible: Social data contains true sentiment that can be applied to data models to provide insight and intelligence.
     • Clarity of Data: In some cases, the answer is obvious. Other times, the data gives only a general sense or trend and may not pinpoint the exact target.
     • Sentiment Analysis: Sentiment analysis is a valuable technology, but fine-tuning the "degree of sentiment" can be a challenge. Consider how you would rate the following: "I love Nicole". "I voted for Chad". "Erin is gorgeous".
     • Predicting Future Events: As evidenced with Kate, the results clearly demonstrated the value of social media data in helping to predict future results.
     • Data Veracity: Who better represents America's sentiment? Those who cast their votes by calling in or texting? Or those who express their views via social media?
  8. Extracting Value from Social Media: 5 Tips
     • Data Trumps Conventional Wisdom: Think of Kate. Despite the overwhelming volume of negative sentiment, her percent of positive sentiment still dwarfed that of many contestants who lacked any drama.
     • Timing is Critical: Working with data as close to an event as possible is most valuable. Utilizing data in real time can provide a competitive advantage.
     • Don't Be Blind to the Noise Factor: There is a significant amount of non-essential noise in social media data that needs to be cleansed. Not all of it is fluff, but much of it may not pertain to the question you are trying to answer.
     • Not All Social Media Sentiment is Created Equal: Not all data is needed or equal in weight. Is one tweet equal to one blog post? Is negative sentiment as important as positive sentiment?
     • Don't Look at Data in a Vacuum: Context around the question you are trying to answer plays an important role. Knowing to disregard negative sentiment, because votes are cast only to keep contestants on the show, is critical.
  9. Thanks for Viewing. @r_kawamura
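
The approach described in slides 3 to 6 (classify comments, compute each contestant's share of positive comments, then combine that share with the judges' scores) can be sketched roughly as follows. This is a minimal illustration, not the author's actual model. The word lists, the functions classify, positive_share, and predict_elimination, the equal weighting of the two signals, and the contestants and numbers in the demo are all assumptions invented for the example.

```python
from collections import defaultdict

# Hypothetical toy lexicons (assumption); a real system would use a trained
# sentiment model rather than a handful of keywords.
POSITIVE = {"love", "voted", "gorgeous", "amazing", "best"}
NEGATIVE = {"worst", "hate", "fixed", "awful", "terrible"}


def classify(text):
    """Label a comment 'pos', 'neg', or 'neutral' by counting lexicon hits."""
    words = {w.strip(".,!?\"'").lower() for w in text.split()}
    pos = len(words & POSITIVE)
    neg = len(words & NEGATIVE)
    if pos > neg:
        return "pos"
    if neg > pos:
        return "neg"
    return "neutral"


def positive_share(comments):
    """Return each contestant's share of all positive comments.

    comments is an iterable of (contestant, text) pairs. Negative and neutral
    comments are ignored, since votes are cast only to keep someone on.
    """
    counts = defaultdict(int)
    for contestant, text in comments:
        if classify(text) == "pos":
            counts[contestant] += 1
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()} if total else {}


def predict_elimination(judge_scores, shares):
    """Combine judges' score share and positive-comment share.

    Both signals are weighted equally (an assumption). The contestant with
    the lowest combined value is the predicted elimination.
    """
    total_score = sum(judge_scores.values())
    combined = {
        c: score / total_score + shares.get(c, 0.0)
        for c, score in judge_scores.items()
    }
    return min(combined, key=combined.get), combined


# Made-up demo data: contestant A has the lowest judges' score and the most
# negative comments, but still draws some positive support.
demo_comments = [
    ("A", "I love A"),
    ("A", "A is the worst"),
    ("A", "The show must be fixed"),
    ("B", "I voted for B"),
    ("B", "B is gorgeous"),
    ("C", "C was fine"),
]
demo_scores = {"A": 15, "B": 24, "C": 21}

demo_shares = positive_share(demo_comments)
predicted, demo_combined = predict_elimination(demo_scores, demo_shares)
print(predicted, demo_combined)
```

In this toy data, the contestant with the lowest judges' score is not the one predicted to leave, because negative comments are ignored and only positive share counts toward the combined value. Per-source weights (for example, counting a blog post differently from a tweet) could be added inside positive_share, though the slides do not say how, or whether, the author weighted sources.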