Applications of Twitter Data Extraction For Business and Research John Conroy [email_address]
This talk: Twitter 101; Twitter data: open, plentiful, real-time; Twitter size, growth and user profile; acquiring Twitter data (the easy and the hard way); applications of Twitter data analysis (business, research); limitations of Twitter data: demographics and spam.
Twitter 101: Post short messages (tweets), which can contain hyperlinks. Follow other users, i.e. subscribe to see their tweets when you log in.
Twitter 101: retweets, @replies…
Twitter Data is Open, Plentiful, Real-time. Open attitude to data: most users’ tweets are public (>90%). Channels: API, Twitter Search. Data is plentiful: ~100m tweets per day by Nov. ’10. Data is real-time: 140-character posts + retweets = wildfire dissemination of news and viral content.
Iran protests ‘09: retweets
Twitter Size, Growth. Size: 105m users by April ’10; 2.1m new users per week; 600m search queries/day. Source: Williams (CEO), Chirp, April ’10. User growth: 155% p.a.; daily tweets growing at 550% p.a.; ~100m tweets per day by Nov. ’10. Source: Conroy/Griffith, June ’10 (6 months’ data).
Twitter User Profile: even male-female split; brand knowledge now ubiquitous; 1/6 as many users as Facebook; age: 1/3 between 25 and 34 years old; better educated, earn more. Source: Edison Research (U.S.-oriented research), http://www.edisonresearch.com/twitter_usage_2010.php
Twitter User Profile: age. Source: Edison Research (U.S.-oriented research), http://www.edisonresearch.com/twitter_usage_2010.php
Acquiring Twitter Data. The Easy Way: Twitter Search (http://search.twitter.com), for anybody. The Hard Way: Twitter APIs (REST, Search, Streaming) plus code (Python/PHP/Java etc…).
Acquiring Twitter Data- Twitter Search
Things to do with Twitter Search: find business opportunities; gather intel on competitors; community-building: answer “Does anyone know…?” queries in your segment; find gripes/compliments on your service; find anything else people are saying about you … etc…
Acquiring Data from Twitter APIs. REST API: find out about users (how many friends they have, how often they tweet, their last N tweets, whether they are active, etc.). Search API: programmatic access to Twitter Search. Streaming API: ‘firehose’ of tweets from everyone.
What can we do with this data? Model the social graph of sub-groups and find the most influential users (retweets, replies, follower/friend quotient), e.g. Modelling the Irish Twittersphere (Conroy, Griffith, 2010). Find the ‘true’ social graph described by conversations; find authoritative users and broadcasters. Compute user engagement metrics (how often they tweet etc.). Find similar users based on graph theory. Study viral news propagation through the sub-group. Find super-users (with a view to engaging them).
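The influence signals listed above can be sketched in a few lines of Python. This is only an illustration, not the method used in the Irish Twittersphere study: the function names, the `(author, replied_to)` pair format and the sample user names are all assumptions, and the data is presumed to have been collected already.

```python
from collections import Counter


def follower_friend_quotient(followers, friends):
    """Followers divided by friends (accounts followed).

    Returns None when the user follows nobody, to avoid dividing by zero.
    """
    if friends == 0:
        return None
    return followers / friends


def count_replies_received(reply_pairs):
    """Count how many replies each user received.

    reply_pairs is an iterable of (author, replied_to) tuples, a hypothetical
    format; replied_to is None for tweets that are not replies.
    Self-replies are ignored.
    """
    counts = Counter()
    for author, replied_to in reply_pairs:
        if replied_to is not None and replied_to != author:
            counts[replied_to] += 1
    return counts


def rank_by_replies(reply_pairs, top_n=3):
    """Return the top_n most replied-to users as (user, count) tuples."""
    return count_replies_received(reply_pairs).most_common(top_n)


if __name__ == "__main__":
    # Invented sample data, for illustration only.
    sample = [("alice", "bob"), ("carol", "bob"), ("bob", "alice"), ("dave", None)]
    print(rank_by_replies(sample))
    print(follower_friend_quotient(200, 50))
```

Counting replies received, rather than followers alone, is one way to approximate the conversational graph the slide calls the ‘true’ social graph.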
Irish users: time since last tweet (c. 23k users)
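A time-since-last-tweet breakdown like this one can be computed from each user's most recent tweet timestamp. The sketch below is an assumption about how such a chart might be built: the bucket boundaries and function names are hypothetical, not taken from the original analysis.

```python
from datetime import datetime, timedelta

# Hypothetical bucket labels, ordered from most to least recently active.
BUCKETS = ["under 1 day", "1-6 days", "7-29 days", "30+ days"]


def bucket_inactivity(days):
    """Map a whole number of days since the last tweet to a bucket label."""
    if days < 1:
        return "under 1 day"
    if days < 7:
        return "1-6 days"
    if days < 30:
        return "7-29 days"
    return "30+ days"


def inactivity_histogram(last_tweet_times, now):
    """Count users per inactivity bucket.

    last_tweet_times is an iterable of datetime objects, one per user;
    now is the reference datetime the measurement is taken at.
    """
    histogram = {label: 0 for label in BUCKETS}
    for last_tweet_at in last_tweet_times:
        days = (now - last_tweet_at).days
        histogram[bucket_inactivity(days)] += 1
    return histogram


if __name__ == "__main__":
    # Invented timestamps, for illustration only.
    now = datetime(2010, 3, 31, 12, 0)
    times = [now - timedelta(hours=3), now - timedelta(days=10)]
    print(inactivity_histogram(times, now))
```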
Most replied-to by Irish users, Feb-March ’10: 93k replies from 23k users; also c. 7k retweets (?)
Predictive Modelling / Business Intelligence. Non-Twitter example: satellite images of Wal-Mart car parks used to predict earnings. Smart but expensive!
Using Twitter for Predictive Modelling. E.g. 1: holiday destinations
Using Twitter for Predictive Modelling. E.g. 2: movie pre-launch “buzz” and marketing budget (not real figures!)
Brand Sentiment Analysis. Sentiment analysis of Super Bowl commercials 2010 (Conroy and Griffith, 2010). 300k tweets collected during the game. Probabilistic classification models and machine learning: Naïve Bayes, Maximum Entropy, (S.V.M.). Goal: find out which commercials were the most popular. Hard!! Human language is complex…
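To make the Naïve Bayes approach concrete, here is a minimal sketch of a multinomial Naïve Bayes text classifier with add-one (Laplace) smoothing. It is not the classifier used in the Super Bowl study; the class name, the whitespace tokenizer and the four training tweets are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


class NaiveBayesSentiment:
    """Multinomial Naive Bayes with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter()            # label -> number of examples
        self.vocab = set()

    def train(self, examples):
        """examples is an iterable of (text, label) pairs."""
        for text, label in examples:
            self.label_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def predict(self, text):
        """Return the label with the highest log-probability for text."""
        total_examples = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_examples in self.label_counts.items():
            score = math.log(n_examples / total_examples)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    # Invented training tweets, for illustration only.
    training = [
        ("loved that ad so funny", "pos"),
        ("great commercial loved it", "pos"),
        ("worst ad ever boring", "neg"),
        ("boring commercial hated it", "neg"),
    ]
    model = NaiveBayesSentiment()
    model.train(training)
    print(model.predict("loved the ad"))
```

A bag-of-words model like this ignores word order, negation and sarcasm, which is one reason the slide calls the task hard.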
Sentiment Analysis of Super Bowl Commercials: Results. Note: initial manual verification of these results is disappointing… the research continues.
What else can we do with this data?
Limitations of Twitter Data. Twitter < Facebook for “knowing your customer”: Facebook has demographics (age, sex, etc.). Demographic skewed towards 25-34-year-olds and the tech-savvy; not ubiquitous. Spam: the game can be rigged.

John Conroy
