Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

I gave this presentation at Workshop on Interactive Language Learning, Visualization, and Interfaces / ACL 2014 in Baltimore, MD on June 27, 2014.

http://nlp.stanford.edu/events/illvi2014/index.html

ABSTRACT
Every day on Twitter, millions of thoughts are captured and shared with the world in the form of 140-character messages, or Tweets. There are many things we could learn from these thoughts if we could figure out a way to digest this gigantic dataset. Visualization is one of the many ways to extract information from these Tweets. In this presentation, I will talk about several visualizations based on Tweets, as well as share experiences and challenges from working with Tweet data.

Usage Rights

CC Attribution-NonCommercial License

Presentation Transcript

  • Making Sense of Millions of Thoughts: Finding patterns in the Tweets. Krist Wongsuphasawat. “Knowing comes from learning, finding from seeking.” (50 characters) “What we call chaos is just patterns we haven't recognized.” (58 characters) “I am looking for a needle in the haystack.” (42 characters) “140-character text messages, called Tweets” (42 characters)
  • X-Men
  • Prof. X Ability: Telepathy (mind reading)
  • Cerebro Enhance telepathy Prof. X
  • Cerebro
  • With this power…
  • What are you thinking?
  • What are people thinking about x? Product Event Person etc.
  • Reality
  • Cerebro
  • Internet
  • Platform thought thought thought thought thought crowdsourcing social networks Data
  • Twitter tweet tweet tweet tweet tweet Tweets
  • Tweets • 140 characters • text + media • geo • time
  • Twitter tweet tweet tweet tweet tweet Tweets
  • What can we learn from these Tweets?
  • visual-insights@twitter @miguelrios @philogb @trebor @kristw
  • World Cup Election Oscars Pure Curiosity Grammy TV Shows New Year Breaking news Earthquake
  • Insights, Stories (Tweets) DATA with limited time Audience: general public
  • Tools • Hadoop • Apache Pig • Vertica • node.js, python • d3 & co.
  • Pig
  • Insights, Stories (Tweets) DATA
  • Insights, Stories (Tweets) Filter DATA
  • Having all Tweets • How people think I feel. • How I really feel.
  • Filter data • Want only relevant Tweets • Good news: Have all Tweets • Bad news: Too many Tweets
  • Filter data (2) • #hashtags (e.g. #world-cup) • easy to filter • hashtags must be present • typo? • keywords (e.g. goal) • broader • can be ambiguous
  • Filter data (3) • Combine with other attributes • Time • during the first half of World Cup final • Location • Tweets from Brazil • Not every Tweet is geotagged.
  • Filter data (4) • Languages • Sometimes use only English Tweets • Future • Translation?
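The filtering slides above can be sketched in a few lines of Python. This is only an illustration, not the pipeline used in the talk (which names Hadoop, Pig and Vertica as tools). The Tweet records are plain dictionaries, and the field names `text`, `created_at`, `country` and `lang` are assumptions made for this example.

```python
import re
from datetime import datetime

HASHTAG_RE = re.compile(r"#(\w+)")
WORD_RE = re.compile(r"\w+")


def select(tweets, hashtags=(), keywords=(), start=None, end=None,
           country=None, lang=None):
    """Return the Tweets that pass every filter that was supplied.

    Field names (text, created_at, country, lang) are hypothetical.
    """
    wanted_tags = {h.lstrip("#").lower() for h in hashtags}
    wanted_words = {k.lower() for k in keywords}
    selected = []
    for tweet in tweets:
        tags = {t.lower() for t in HASHTAG_RE.findall(tweet["text"])}
        words = set(WORD_RE.findall(tweet["text"].lower()))
        # Topic filter: any wanted hashtag OR any wanted keyword.
        if (wanted_tags or wanted_words) and not (
                tags & wanted_tags or words & wanted_words):
            continue
        # Time window is half-open: start <= created_at < end.
        if start is not None and tweet["created_at"] < start:
            continue
        if end is not None and tweet["created_at"] >= end:
            continue
        # Tweets without a location are dropped when a country is requested,
        # which is why "not every Tweet is geotagged" matters.
        if country is not None and tweet.get("country") != country:
            continue
        if lang is not None and tweet.get("lang") != lang:
            continue
        selected.append(tweet)
    return selected


tweets = [
    {"text": "GOAL!!! #WorldCup", "created_at": datetime(2014, 6, 12, 20, 30),
     "country": "BR", "lang": "pt"},
    {"text": "My goal is to sleep early", "created_at": datetime(2014, 6, 12, 20, 31),
     "country": None, "lang": "en"},
    {"text": "Watching #worldcup at home", "created_at": datetime(2014, 6, 12, 22, 5),
     "country": "BR", "lang": "en"},
]
by_tag = select(tweets, hashtags=["#WorldCup"])
by_word = select(tweets, keywords=["goal"])
```

In this toy data the keyword filter also picks up "My goal is to sleep early", which is the ambiguity the slides warn about; the hashtag filter avoids it but misses Tweets that carry no hashtag.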
  • Insights, Stories (Tweets) Filter Clean DATA
  • Clean data • Typo (Mobile input) • Abbreviation (due to 140-character limit) • Exaggeration (e.g. GOOOOALLLL) • Twitter specific e.g., Old-style retweet “RT …” • Inappropriate content
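Two of the cleaning steps listed above, exaggerated spelling and old-style retweets, lend themselves to a small sketch. The regular expressions below are assumptions chosen for illustration, not rules taken from the talk, and collapsing repeated letters is a lossy heuristic: with `keep=1` a word such as "coool" becomes "col".

```python
import re

# Three or more repeats of the same letter (digits and punctuation are left alone).
ELONGATED = re.compile(r"([^\W\d_])\1{2,}")
# Old-style retweet prefix such as "RT @user: ".
OLD_STYLE_RT = re.compile(r"^RT @\w+:?\s*")


def squeeze_elongation(text, keep=1):
    """Collapse runs of 3+ identical letters down to `keep` letters."""
    return ELONGATED.sub(lambda m: m.group(1) * keep, text)


def split_old_style_retweet(text):
    """Return (is_retweet, text_without_the_RT_prefix)."""
    match = OLD_STYLE_RT.match(text)
    if match:
        return True, text[match.end():]
    return False, text


def clean(text):
    """Strip an old-style RT prefix, then squeeze exaggerated spelling."""
    is_retweet, body = split_old_style_retweet(text.strip())
    return is_retweet, squeeze_elongation(body)


example = clean("RT @fan: GOOOOALLLL!!!")
```

Typos, abbreviations and inappropriate content need dictionaries or classifiers and are outside what a regular expression can reasonably handle.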
  • Insights, Stories (Tweets) Filter Clean Visualize DATA
  • (+ media) photos, videos What? Where? When? GEO TIME TEXT DATA
  • What? Where? When? GEO TIME TEXT Visualize Data
  • TIME Tweets/second
  • TIME Tweets/second + Annotation http://www.flickr.com/photos/twitteroffice/5681263084/
  • TIME Tweets/second + Annotation • Manual • To automate: Top Tweets (most Retweets, Favs)
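A Tweets-per-second series like the one on these slides can be sketched by truncating each timestamp to the second and counting. The input here is assumed to be a list of `datetime` objects. Picking the busiest seconds is one possible starting point for automating annotations; it is an assumption of this sketch, not the method described in the talk.

```python
from collections import Counter
from datetime import datetime, timedelta


def tweets_per_second(timestamps):
    """Count Tweets in each one-second bucket."""
    return Counter(ts.replace(microsecond=0) for ts in timestamps)


def busiest_seconds(counts, top=3):
    """Return the `top` (second, count) pairs with the highest volume."""
    return counts.most_common(top)


base = datetime(2014, 7, 13, 21, 0, 0)
timestamps = [
    base + timedelta(seconds=0.1),
    base + timedelta(seconds=0.5),
    base + timedelta(seconds=0.9),
    base + timedelta(seconds=1.2),
    base + timedelta(seconds=3.0),
]
counts = tweets_per_second(timestamps)
peak_second, peak_count = busiest_seconds(counts, top=1)[0]
```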
  • What? Where? When? GEO TIME TEXT Visualize Data
  • GEO Heatmap Low density High density
  • GEO New York City flickr.com/photos/twitteroffice/8798020541
  • GEO San Francisco flickr.com/photos/twitteroffice/8798020541
  • GEO San Francisco Rebuild the world based on tweet volumes twitter.github.io/interactive/andes/
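The density behind a heatmap can be approximated by snapping each geotagged Tweet to a latitude/longitude grid cell and counting per cell. The sketch below assumes points arrive as `(lat, lon)` tuples, with `None` for Tweets that have no location; the cell size is an arbitrary choice for illustration.

```python
import math
from collections import Counter


def grid_cell(lat, lon, cell_deg=1.0):
    """Map a coordinate to the integer index of its grid cell."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def density(points, cell_deg=1.0):
    """Count Tweets per grid cell, skipping Tweets without a location."""
    return Counter(
        grid_cell(lat, lon, cell_deg)
        for point in points
        if point is not None
        for lat, lon in [point]
    )


points = [(40.7, -74.0), (40.9, -73.5), None, (37.77, -122.42)]
cells = density(points, cell_deg=1.0)
```

Cells with higher counts would be drawn in the "high density" colour. Equal-angle cells cover less ground area away from the equator, so a real map would likely need an area correction.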
  • What? Where? When? GEO TIME TEXT Visualize Data
  • TIME + GEO blog.twitter.com/2011/global-pulse youtu.be/SybWjN9pKQk Japan Earthquake 2011
  • TIME + GEO Tweet pattern [Rios & Lin 2012] (legend: Night, Late night, Daytime)
  • What? Where? When? GEO TIME TEXT Visualize Data
  • TEXT Trends
  • TEXT www.wordle.net Some samples from World Cup
  • TEXT Word cloud of Tweets right after the 1st goal www.wordle.net
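The input to a word cloud is essentially a table of term weights. The sketch below counts words across a set of Tweets and scales each count relative to the most frequent term. The stopword list is a tiny placeholder, and how a tool such as Wordle turns weights into font sizes is not covered here.

```python
import re
from collections import Counter

# Placeholder stopword list for illustration only.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "rt"}


def term_weights(texts, top=50):
    """Return {term: weight} where the most frequent term has weight 1.0."""
    counts = Counter(
        word
        for text in texts
        for word in re.findall(r"\w+", text.lower())
        if word not in STOPWORDS
    )
    most_common = counts.most_common(top)
    if not most_common:
        return {}
    peak = most_common[0][1]
    return {word: count / peak for word, count in most_common}


weights = term_weights(["Goal by Brazil", "what a goal", "Brazil goal"])
```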
  • TEXT WordTree [Wattenberg & Viégas 2008] www.jasondavies.com/wordtree
  • TEXT • Now • Derived information: Sentiment, Topic • Combine with other information (geo & time) + context • Future • Better technique + involves more NLP e.g. key phrases, etc.
  • TEXT Descriptive Keyphrases [Chuang et al. 2012]
  • TEXT • Challenge • Scale
  • What? Where? When? GEO TIME TEXT Visualize Data
  • GEO + TEXT Real-time Tweet map most frequent term
  • GEO + TEXT Real-time Tweet map Gmail went down Jan 24, 2014
  • GEO + TEXT Real-time Tweet map Nelson Mandela passed away Dec 5, 2013
  • GEO + TEXT Real-time Tweet map • Next: • Involves more NLP • Tokenization - Languages without spaces between words • etc. • Challenge: • Real-time
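A batch version of the "most frequent term" map can be sketched by keeping one word counter per grid cell. The input format of `(lat, lon, text)` tuples and the cell size are assumptions for this example, and it ignores the real-time requirement. Splitting on `\w+` also shows why tokenization is listed as future work: it does not segment languages written without spaces between words.

```python
import math
import re
from collections import Counter, defaultdict


def top_term_per_cell(tweets, cell_deg=1.0, stopwords=frozenset()):
    """Return {grid_cell: most frequent term} for (lat, lon, text) tuples."""
    per_cell = defaultdict(Counter)
    for lat, lon, text in tweets:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        per_cell[cell].update(
            word for word in re.findall(r"\w+", text.lower())
            if word not in stopwords
        )
    return {
        cell: counts.most_common(1)[0][0]
        for cell, counts in per_cell.items()
        if counts
    }


tweets = [
    (37.7, -122.4, "Gmail down"),
    (37.8, -122.3, "is gmail down again? gmail!"),
    (40.7, -74.0, "snow snow cold"),
]
top_terms = top_term_per_cell(tweets, stopwords=frozenset({"is"}))
```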
  • GEO + TEXT www.yelp.com/wordmap Yelp Wordmap
  • What? Where? When? GEO TIME TEXT Visualize Data
  • TIME + TEXT http://www.babynamewizard.com/voyager Baby Name Voyager
  • TIME + TEXT UEFA Champions League Biggest Tournament for European soccer clubs Many Tweets during the matches
  • TIME + TEXT UEFA Champions League Dortmund Bayern Munich Count Tweets mentioning the teams every minute Team 1 Team 2
  • TIME + TEXT UEFA Champions League
  • TIME + TEXT UEFA Champions League + “goal” count + context
  • TIME + TEXT UEFA Champions League + “offside”
  • TIME + TEXT UEFA Champions League + players
  • A B C D A C C Competition Tree + = uclfinal.twitter.com vs vs vs
  • TIME + TEXT UEFA Champions League • Challenges • Filter relevant Tweets • Multiple matches at the same time • Ambiguous words: “goal”, “red”, “yellow” • Tweets mentioning both teams, e.g. “#GER 2-2 #GHA”
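Counting Tweets that mention each team every minute can be sketched as follows. The input format of `(timestamp, text)` pairs and the search terms per team are assumptions for illustration, and substring matching is deliberately crude. A Tweet that mentions both teams is counted once for each team here, which is only one possible answer to the challenge listed above.

```python
from collections import Counter, defaultdict
from datetime import datetime


def mentions_per_minute(tweets, team_terms):
    """Return {team: Counter({minute: count})} for (timestamp, text) pairs."""
    series = defaultdict(Counter)
    for timestamp, text in tweets:
        minute = timestamp.replace(second=0, microsecond=0)
        lowered = text.lower()
        for team, terms in team_terms.items():
            # any() ensures a Tweet counts at most once per team.
            if any(term in lowered for term in terms):
                series[team][minute] += 1
    return series


# Illustrative search terms; a real filter would need a curated list.
team_terms = {
    "Dortmund": ["dortmund", "#bvb"],
    "Bayern Munich": ["bayern", "#fcb"],
}
tweets = [
    (datetime(2013, 5, 25, 20, 0, 10), "Come on Dortmund! #BVB"),
    (datetime(2013, 5, 25, 20, 0, 40), "Bayern looking strong"),
    (datetime(2013, 5, 25, 20, 1, 5), "Dortmund vs Bayern, what a final"),
]
series = mentions_per_minute(tweets, team_terms)
```

The same per-minute counter could be run for words such as "goal" or "offside", with the caveat from the slide that those words are ambiguous when several matches are played at the same time.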
  • What? Where? When? GEO TIME TEXT Visualize Data
  • TIME + GEO + TEXT State of the Union twitter.github.io/interactive/sotu2014
  • TIME + GEO + TEXT State of the Union 1) timeline + topic from Tweets 4) Density map of Tweets about selected topic 3) Volume of Tweets by topics during selected part of the SOTU 2) context (speech) twitter.github.io/interactive/sotu2014
  • TIME + GEO + TEXT New Year 2014 twitter.github.io/interactive/newyear2014/
  • Recap
  • What can we learn from these Tweets? many, many things.
  • better the examples in this talk imagine… DATA (Tweets)
  • Insights, Stories (Tweets) Filter Clean Visualize DATA
  • (Tweets) Insights, Stories Filter Clean Process & Visualize DATA
  • (Tweets) Insights, Stories Filter Clean Process & Visualize DATA NLP
  • TEXT What? Where? When? GEO TIME Visualize data
  • (Tweets) Insights, Stories Filter Clean Process & Visualize DATA Research
  • Working together Raw data Human Aggregated information Ignored information Processed information VIS Help people consume information. Computer (One machine, Cloud, MapReduce, etc.) NLP Make computers think more like humans. HCI User interactions or Provide feedback Bridge the gap. Connect human & computer.
  • Advanced techniques vs. Scalability
  • LifeFlow (Research) => Flying Sessions (System at Twitter)
  • Summary • Thoughts are captured in the Tweets: what, where, when • Finding patterns from: text + geo + time • Opportunities for NLP + HCI + VIS collaboration • Better technique vs. Scalability + Real-time @kristw / interactive.twitter.com
  • Questions?
  • Thank you