Why Twitter Is All the Rage: A Data Miner's Perspective



A presentation on data mining with Twitter, originally given as an O'Reilly webinar. See http://oreillynet.com/pub/e/2928 for the archived webinar video.

Published in: Technology


  1. 1. 1 Why Twitter Is All The Rage: A Data Miner's Perspective Matthew A. Russell O'Reilly Webcast 15 Oct 2013
  2. 2. 2 Hello, My Name Is ... Matthew Educated as a Computer Scientist CTO @ Digital Reasoning Systems Data mining; machine learning Author @ O'Reilly Media 5 published books on technology Principal @ Zaffra Selective boutique consulting
  3. 3. 3 Transforming Curiosity Into Insight An open source software (OSS) project http://bit.ly/MiningTheSocialWeb2E A book http://bit.ly/135dHfs Accessible to (virtually) everyone Virtual machine with turn-key coding templates for data science experiments Think of the book as "premium" support for the OSS project
  4. 4. 4 Overview Background Twitter as a data science platform Politics, influence, world events Data science tools for mining Twitter Q&A
  5. 5. 5 Background
  6. 6. 6 Data Science Data => Actionable information Highly interdisciplinary Nascent Necessary http://wikipedia.org/wiki/Data_science
  7. 7. 7 Digital Signal Explosion A model for the world: signal and sinks Growth in data exhaust is accelerating Digital fingerprints "Software is eating the world" Data mining opportunities galore...
  8. 8. 8 Digital Data Stats 100 terabytes of data are uploaded daily to Facebook. Brands and organizations on Facebook receive 34,722 Likes every minute of the day. According to Twitter’s own research in early 2012, it sees roughly 175 million tweets every day. 30 billion pieces of content are shared on Facebook every month. Data production will be 44 times greater in 2020 than it was in 2009. According to estimates, the volume of business data worldwide, across all companies, doubles every 1.2 years. See http://wikibon.org/blog/big-data-statistics
  9. 9. 9 Social Media Is All the Rage World population: ~7B people Facebook: 1.15B users Twitter: 500M users Google+ 343M users LinkedIn: 238M users ~200M+ blogs (conservative estimate)
  10. 10. 10 Why Does Social Media Matter? It's the frontier for predictive analytics Understanding world events Swaying political elections Modeling human behavior Analyzing sentiment Making intelligent recommendations
  11. 11. 11 Twitter Is All the Rage It satisfies fundamental human desires We want to be heard We want to satisfy our curiosity We want it easy We want it now Accessible, rich, and (mostly) "open" data RESTful APIs and JSON responses Great proving ground for predictive analytics
  12. 12. 12 Twitter's Network Dynamics 500M curious users 100M curious users actively engaging Real-time communication Short, sweet, ... and fast Asymmetric Following Model An interest graph
  13. 13. 13 Twitter as a data science platform
  14. 14. 14 What's in a Tweet? 140 Characters ... ... Plus ~5KB of metadata! Authorship Time & location Tweet "entities" Replying, retweeting, favoriting, etc.
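
To make the "140 characters plus metadata" point from slide 14 concrete, here is a minimal sketch that pulls authorship, time, location, and entities out of a tweet. The payload is a hypothetical, heavily trimmed example written for illustration; its field names follow the layout of Twitter's REST API v1.1 JSON as we understand it, and the `summarize_tweet` helper is our own name, not part of any library.

```python
import json

# Hypothetical, heavily trimmed tweet payload (illustration only).
# Field names are assumed to mirror Twitter's v1.1 JSON responses.
SAMPLE_TWEET_JSON = """
{
  "created_at": "Tue Oct 15 17:00:00 +0000 2013",
  "text": "Mining #Twitter data, see http://t.co/example cc @example_account",
  "user": {"screen_name": "example_user", "followers_count": 42},
  "coordinates": null,
  "retweet_count": 3,
  "entities": {
    "hashtags": [{"text": "Twitter"}],
    "user_mentions": [{"screen_name": "example_account"}],
    "urls": [{"expanded_url": "http://example.com/"}]
  }
}
"""


def summarize_tweet(tweet):
    """Collect the metadata categories named on the slide into one dict."""
    entities = tweet.get("entities", {})
    return {
        "author": tweet["user"]["screen_name"],
        "created_at": tweet["created_at"],
        "has_location": tweet.get("coordinates") is not None,
        "hashtags": [h["text"] for h in entities.get("hashtags", [])],
        "mentions": [m["screen_name"] for m in entities.get("user_mentions", [])],
        "urls": [u["expanded_url"] for u in entities.get("urls", [])],
        "retweets": tweet.get("retweet_count", 0),
    }


tweet = json.loads(SAMPLE_TWEET_JSON)
summary = summarize_tweet(tweet)
print(json.dumps(summary, indent=2))
```

The text itself is a small fraction of the payload; most analyses start from the structured fields rather than from parsing the 140 characters.
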
  15. 15. 15 Twitter and Facebook Compared (Twitter vs. Facebook) Account Types: "Anything" vs. People & Pages; "Following" Relationships vs. Mutual Connections; Favorites vs. "Likes"; Retweets vs. "Shares"; Replies vs. "Comments"; (Almost) No Privacy Controls vs. Extensive Privacy Controls
  16. 16. 16 Social Network Mechanics (diagram; nodes: Roberto, Mercedes, Jorge, Nina, Ana)
  17. 17. 17 Interest Graph Mechanics (diagram; nodes: U2, Juan Luis Guerra, Roberto, Mercedes, Ana, Jorge, Nina)
  18. 18. 18 A (Social) Interest Graph (diagram; nodes: U2, Juan Luis Guerra, Roberto, Mercedes, Ana, Jorge, Nina)
  19. 19. 19 A (Political) Interest Graph (diagram; nodes: Johnny Araya, Rodolfo Hernández, Roberto, Mercedes, Ana, Jorge, Nina)
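
The difference between a mutual-connection social network and Twitter's asymmetric following model can be sketched with a small directed graph. The node names come from the slides, but the edges below are hypothetical (the diagrams' actual edges are not recoverable from the transcript), and the helper names are ours.

```python
# Directed "follows" relation: follower -> set of accounts followed.
# Node names are from the slides; every edge here is a hypothetical example.
follows = {
    "Roberto": {"U2", "Juan Luis Guerra", "Mercedes"},
    "Mercedes": {"Juan Luis Guerra", "Roberto"},
    "Ana": {"U2", "Jorge"},
    "Jorge": {"U2", "Ana", "Nina"},
    "Nina": {"Juan Luis Guerra"},
    "U2": set(),
    "Juan Luis Guerra": set(),
}


def followers_of(account):
    """Everyone who follows `account` (the inbound edges)."""
    return {user for user, followed in follows.items() if account in followed}


def mutual_follows(user):
    """Accounts that `user` follows and that follow `user` back."""
    return {other for other in follows[user] if user in follows.get(other, set())}


def one_way_interests(user):
    """Accounts that `user` follows without reciprocation: the interest graph."""
    return follows[user] - mutual_follows(user)


print(sorted(followers_of("U2")))
print(sorted(mutual_follows("Roberto")))
print(sorted(one_way_interests("Roberto")))
```

In this toy data, reciprocal edges behave like a Facebook-style friendship, while the one-way edges to U2 and Juan Luis Guerra express interest without any social tie in return.
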
  20. 20. 20 Costa Rican Presidential Candidates @ElDoctor2014 @Johnny_Araya
  21. 21. 21 ~3 Months on Twitter (followers: Aug 2013 / Sept 2013 / % Change) Johnny Araya: 14,573 / 15,506 / 6.40%; Otto Guevara Guth: 114 / 159 / 39.47%; José María Villalta Florez-Estrada: 8,160 / 8,990 / 10.17%; Dr. Rodolfo Hernández: 745 / 858 / 15.17%; Luis Guillermo Solís Rivera: 1,192 / 1,487 / 24.75%
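
The "% Change" column on slide 21 is plain relative growth, (September - August) / August. The sketch below recomputes it from the follower counts on the slide. One assumption to flag: the slide text lists the last two candidates' names apart from their numbers, so the pairing of 745/858 with Dr. Rodolfo Hernández and 1,192/1,487 with Luis Guillermo Solís Rivera is inferred from the order of the names and from the ~750 followers quoted for Hernández on slide 25.

```python
# Follower counts (Aug 2013, Sept 2013) as listed on the slide.
# The name pairing for the last two rows is an inference, see the note above.
followers = {
    "Johnny Araya": (14573, 15506),
    "Otto Guevara Guth": (114, 159),
    "José María Villalta Florez-Estrada": (8160, 8990),
    "Dr. Rodolfo Hernández": (745, 858),
    "Luis Guillermo Solís Rivera": (1192, 1487),
}


def percent_change(before, after):
    """Relative growth from `before` to `after`, in percent."""
    return (after - before) / before * 100


changes = {name: percent_change(aug, sept) for name, (aug, sept) in followers.items()}

for name, pct in changes.items():
    print(f"{name}: {pct:.2f}%")
```

Percentages alone can mislead: the fastest-growing account in relative terms gained 45 followers, while the slowest gained 933.
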
  22. 22. 22 Who are Candidates Following?
  23. 23. 23 What are Candidates Tweeting?
  24. 24. 24 Potential Influence
  25. 25. 25 Potential Twitter Influence (Araya vs. Hernández) Followers: ~14k vs. ~750; Theoretical Reach: ~40M vs. ~550k; Reach (10): 490 vs. 673; Reach (100): 289 vs. 702; Reach (1000): 2782 vs. X; Reach (10,000): 2832 vs. X; "Suspect" Followers: 3,246 vs. 94. See also http://wp.me/p3QiJd-2a
  26. 26. 26 Considerations for Measuring Influence: Spam bot accounts that are effectively zombies and can’t be harnessed for any utility at all; inactive or abandoned accounts that can’t influence or be influenced since they are not in use; accounts that follow so many other accounts that the likelihood of getting noticed (and thus influencing) is practically zero; the network effects of retweets by accounts that are active and can be influenced to spread a message. See also http://wp.me/p3QiJd-2a
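
One way to act on the considerations from slide 26 is to bucket followers before counting reach. The sketch below is a crude illustration only, not the method behind the numbers in the deck: the thresholds, the bot heuristic, and the `days_since_last_tweet` field are all assumptions made up for this example, while `statuses_count`, `friends_count`, and `default_profile_image` are meant to mirror fields of Twitter's v1.1 user object.

```python
from collections import Counter

# Hypothetical cutoffs; tune them against real data before trusting any result.
MAX_DAYS_INACTIVE = 90   # beyond this, treat the account as abandoned
MAX_FRIENDS = 2000       # beyond this, the follower is unlikely to notice a tweet


def classify_follower(follower):
    """Assign a follower to one rough bucket based on simple heuristics."""
    # Crude bot heuristic (an assumption): never tweeted, default avatar.
    if follower["statuses_count"] == 0 and follower["default_profile_image"]:
        return "possible_bot"
    if follower["days_since_last_tweet"] > MAX_DAYS_INACTIVE:
        return "inactive"
    if follower["friends_count"] > MAX_FRIENDS:
        return "follows_too_many"
    return "reachable"


# Hypothetical follower records; days_since_last_tweet is a precomputed,
# made-up field rather than something the API returns directly.
sample_followers = [
    {"statuses_count": 0, "default_profile_image": True,
     "days_since_last_tweet": 9999, "friends_count": 1500},
    {"statuses_count": 120, "default_profile_image": False,
     "days_since_last_tweet": 400, "friends_count": 80},
    {"statuses_count": 5000, "default_profile_image": False,
     "days_since_last_tweet": 1, "friends_count": 25000},
    {"statuses_count": 800, "default_profile_image": False,
     "days_since_last_tweet": 2, "friends_count": 300},
]

buckets = Counter(classify_follower(f) for f in sample_followers)
print(buckets)
```

Only the "reachable" bucket is a plausible starting point for estimating retweet network effects; the other three mostly inflate a raw follower count.
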
  27. 27. 27 Social Media Popularity: Araya vs Hernández (charts: Twitter Popularity and Facebook Popularity, each showing Araya % and Hernández %)
  28. 28. 28 Realtime Analysis: #Syria Monitor Twitter's firehose for realtime data using filters such as #Syria. Keep in mind the sheer volume of data can be considerable. Analysis at MiningTheSocialWeb.com
  29. 29. 29 #Syria: Who? See http://wp.me/p3QiJd-1I
  30. 30. 30 #Syria: Who? See http://wp.me/p3QiJd-1I
  31. 31. 31 #Syria: Who? See http://wp.me/p3QiJd-1I
  32. 32. 32 #Syria: What? See http://wp.me/p3QiJd-1I
  33. 33. 33 #Syria: What? See http://wp.me/p3QiJd-1I
  34. 34. 34 #Syria: Where? See http://wp.me/p3QiJd-1I
  35. 35. 35 #Syria: When? See http://wp.me/p3QiJd-1I
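
The who / what / where / when questions on the #Syria slides mostly reduce to frequency counts over tweet metadata. Here is a minimal sketch using three hypothetical tweets (the accounts, locations, and timestamps are invented); the field names and the `created_at` timestamp format are assumed to follow Twitter's v1.1 JSON.

```python
from collections import Counter
from datetime import datetime

# Hypothetical tweets collected with a #Syria filter (illustration only).
tweets = [
    {"user": {"screen_name": "alice", "location": "London"},
     "created_at": "Wed Aug 28 14:05:00 +0000 2013",
     "entities": {"hashtags": [{"text": "Syria"}, {"text": "UN"}]}},
    {"user": {"screen_name": "bob", "location": "Washington, DC"},
     "created_at": "Wed Aug 28 14:40:00 +0000 2013",
     "entities": {"hashtags": [{"text": "Syria"}]}},
    {"user": {"screen_name": "alice", "location": "London"},
     "created_at": "Wed Aug 28 15:10:00 +0000 2013",
     "entities": {"hashtags": [{"text": "syria"}, {"text": "Obama"}]}},
]

# Assumed timestamp layout for the created_at field.
TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"

who = Counter(t["user"]["screen_name"] for t in tweets)
what = Counter(h["text"].lower() for t in tweets for h in t["entities"]["hashtags"])
where = Counter(t["user"]["location"] for t in tweets if t["user"]["location"])
when = Counter(
    datetime.strptime(t["created_at"], TIME_FORMAT).strftime("%Y-%m-%d %H:00")
    for t in tweets
)

for label, counts in [("Who", who), ("What", what), ("Where", where), ("When", when)]:
    print(label, counts.most_common(3))
```

Counting gets you as far as "When?"; the "Why?" on the next slide is where a human has to read the results.
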
  36. 36. 36 #Syria: Why? That's for you (as the data scientist) to decide. Quantitative automation can amplify human intelligence. Qualitative analysis still requires human intelligence.
  37. 37. 37 Data science tools for mining Twitter
  38. 38. 38 MTSW Virtual Machine Experience Goal: Make it easy to transform curiosity into insight Vagrant-based virtual machine VirtualBox or AWS IPython Notebook User Experience Point-and-click GUI 100+ turn-key examples and templates Social web mining for the masses
  39. 39. 39 Social Media Analysis Framework A memorable four step process to guide data science experiments: Aspire Acquire Analyze Summarize
  44. 44. 44 Free Resources Mining the Social Web 2E Chapter 1 (Chimera): http://bit.ly/13XgNWR; Source Code (GitHub): http://bit.ly/MiningTheSocialWeb2E and http://bit.ly/1fVf5ej (numbered examples); Screencasts (Vimeo): http://bit.ly/mtsw2e-screencasts; http://MiningTheSocialWeb.com
  45. 45. 45 Q&A