Making Sense of Millions of Thoughts: Finding Patterns in the Tweets

I gave this presentation at the Workshop on Interactive Language Learning, Visualization, and Interfaces at ACL 2014 in Baltimore, MD, on June 27, 2014.
http://nlp.stanford.edu/events/illvi2014/index.html

ABSTRACT
Every day on Twitter, millions of thoughts are captured and shared with the world in the form of 140-character messages, or Tweets. There are many things we could learn from these thoughts if we could figure out a way to digest this gigantic dataset. Visualization is one of the many ways to extract information from these Tweets. In this presentation, I will talk about several visualizations based on Tweets, and share experiences and challenges from working with Tweet data.

Transcript of "Making Sense of Millions of Thoughts: Finding Patterns in the Tweets"

  1. 1. Making Sense of Millions of Thoughts: Finding patterns in the Tweets. Krist Wongsuphasawat. “Knowing comes from learning, finding from seeking.” (50 characters) “What we call chaos is just patterns we haven't recognized.” (58 characters) “I am looking for a needle in the haystack.” (42 characters) “140-character text messages, called Tweets” (42 characters)
  2. 2. X-Men
  3. 3. Prof. X Ability: Telepathy (mind reading)
  4. 4. Cerebro Enhance telepathy Prof. X
  5. 5. Cerebro
  6. 6. With this power…
  7. 7. What are you thinking?
  8. 8. What are people thinking about x? Product Event Person etc.
  9. 9. Reality
  10. 10. Cerebro
  11. 11. Internet
  12. 12. Platform thought thought thought thought thought crowdsourcing social networks Data
  13. 13. Twitter tweet tweet tweet tweet tweet Tweets
  14. 14. Tweets • 140 characters • text + media • geo • time
  15. 15. Twitter tweet tweet tweet tweet tweet Tweets
  16. 16. What can we learn from these Tweets?
  17. 17. visual-insights@twitter @miguelrios @philogb @trebor @kristw
  18. 18. World Cup Election Oscars Pure Curiosity Grammy TV Shows New Year Breaking news Earthquake
  19. 19. Insights, Stories (Tweets) DATA with limited time Audience: general public
  20. 20. Tools • Hadoop • Apache Pig • Vertica • node.js, python • d3 & co.
  21. 21. Pig
  22. 22. Insights, Stories (Tweets) DATA
  23. 23. Insights, Stories (Tweets) Filter DATA
  24. 24. Having all Tweets How people think I feel.
  25. 25. Having all Tweets How people think I feel. How I really feel.
  26. 26. Filter data Good news: Bad news: Want only relevant Tweets Have all Tweets Too many Tweets
  27. 27. Filter data (2) • #hashtags (e.g. #world-cup) • easy to filter • hashtags must be present • typo?
  28. 28. Filter data (2) • #hashtags (e.g. #world-cup) • easy to filter • hashtags must be present • keywords (e.g. goal) • broader • can be ambiguous
  29. 29. Filter data (3) • Combine with other attributes • Time • during the first half of World Cup final
  30. 30. Filter data (3) • Combine with other attributes • Time • during the first half of World Cup final • Location • Tweets from Brazil • Not every Tweet is geotagged.
  31. 31. Filter data (4) • Languages • Sometimes use only English Tweets • Future • Translation?
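The filters on slides 27 to 31 (hashtag, keyword, time, location, language) can be sketched as a single predicate. This is a minimal illustration in Python, not the pipeline used for the talk, which lists Hadoop and Pig among its tools. The `Tweet` record, its field names, and the `matches` function are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Tweet:
    # Hypothetical record; real Tweet payloads carry many more fields.
    text: str
    created_at: datetime
    lang: str
    country: Optional[str] = None  # None when the Tweet is not geotagged


def matches(tweet, hashtags=frozenset(), keywords=frozenset(),
            start=None, end=None, country=None, lang=None):
    """Return True if the Tweet passes every filter that was supplied."""
    words = tweet.text.lower().split()
    # Hashtags keep their leading '#'; plain words have punctuation removed.
    tags = {w.strip(".,!?:;") for w in words if w.startswith("#")}
    plain = {w.strip(".,!?:;#") for w in words}
    # A hashtag hit or a keyword hit is enough (keywords are the broader net).
    if (hashtags or keywords) and not (tags & hashtags or plain & keywords):
        return False
    if start is not None and tweet.created_at < start:
        return False
    if end is not None and tweet.created_at > end:
        return False
    # A location filter drops every Tweet that has no geotag.
    if country is not None and tweet.country != country:
        return False
    if lang is not None and tweet.lang != lang:
        return False
    return True
```

Hashtags are passed in lowercase with the leading `#`, for example `matches(t, hashtags={"#worldcup"}, lang="en")`. The sketch makes the trade-offs from the slides visible: a location filter silently discards Tweets that are not geotagged, and a keyword such as "goal" matches regardless of context.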
  32. 32. Insights, Stories (Tweets) Filter Clean DATA
  33. 33. Clean data • Typo (Mobile input) • Abbreviation (due to 140-character limit) • Exaggeration (e.g. GOOOOALLLL) • Twitter specific e.g., Old-style retweet “RT …” • Inappropriate content
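Two of the cleaning steps on slide 33, exaggerated spelling and old-style retweets, lend themselves to a short sketch. The regular expressions and the `clean` function below are illustrative assumptions, not the rules used by the author.

```python
import re

# Three or more repeats of the same character, e.g. "OOOO" in "GOOOOALLLL".
_REPEATS = re.compile(r"(.)\1{2,}")
# Old-style retweet prefix such as "RT @user: original text".
_OLD_RT = re.compile(r"^RT\s+@\w+:?\s*", re.IGNORECASE)


def clean(text):
    """Strip an old-style retweet prefix and squeeze exaggerated spelling."""
    text = _OLD_RT.sub("", text.strip())
    # Collapse long runs to a single character: "GOOOOALLLL" -> "GOAL".
    # Runs of exactly two are left alone so words like "too" survive.
    text = _REPEATS.sub(r"\1", text)
    return text.lower()
```

Collapsing to a single character is a crude heuristic: a word that legitimately contains a double letter, stretched further ("gooood"), comes out wrong, so a dictionary lookup would be needed for better results. Typos and abbreviations are not handled here.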
  34. 34. Insights, Stories (Tweets) Filter Clean Visualize DATA
  35. 35. (+ media) photos, videos What? Where? When? GEO TIME TEXT DATA
  36. 36. What? Where? When? GEO TIME TEXT Visualize Data
  37. 37. What? Where? When? GEO TIME TEXT Visualize Data
  38. 38. TIME Tweets/second
  39. 39. TIME Tweets/second
  40. 40. TIME Tweets/second + Annotation http://www.flickr.com/photos/twitteroffice/5681263084/
  41. 41. TIME Tweets/second + Annotation Manual To automate Top tweets (most Retweets, Favs)
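The Tweets/second series, and one possible way to automate where annotations go, can be sketched as follows. The peak rule (a second whose count is at least `factor` times the mean) is an assumption made for illustration; the slides only say that annotation was manual and should be automated.

```python
from collections import Counter
from datetime import datetime


def tweets_per_second(timestamps):
    """Count Tweets in each one-second bucket."""
    return Counter(ts.replace(microsecond=0) for ts in timestamps)


def peaks(counts, factor=2.0):
    """Return seconds whose count is at least `factor` times the mean.

    The mean is taken over seconds that have at least one Tweet.
    """
    if not counts:
        return []
    mean = sum(counts.values()) / len(counts)
    return sorted(sec for sec, n in counts.items() if n >= factor * mean)
```

The seconds returned by `peaks` are candidates for annotation. As slide 41 suggests, the top Tweets (most Retweets and Favs) around each peak could then supply the label.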
  42. 42. What? Where? When? GEO TIME TEXT Visualize Data
  43. 43. GEO Heatmap Low density High density
  44. 44. GEO New York City flickr.com/photos/twitteroffice/8798020541
  45. 45. GEO San Francisco flickr.com/photos/twitteroffice/8798020541
  46. 46. GEO San Francisco Rebuild the world based on tweet volumes twitter.github.io/interactive/andes/
  47. 47. What? Where? When? GEO TIME TEXT Visualize Data
  48. 48. TIME + GEO blog.twitter.com/2011/global-pulse youtu.be/SybWjN9pKQk Japan Earthquake 2011
  49. 49. TIME + GEO Tweet pattern [Rios & Lin 2012] Night Late night Daytime Night Late night Daytime
  50. 50. What? Where? When? GEO TIME TEXT Visualize Data
  51. 51. TEXT Trends
  52. 52. TEXT www.wordle.net Some samples from World Cup
  53. 53. TEXT Word cloud of Tweets right after the 1st goal www.wordle.net
  54. 54. TEXT WordTree [Wattenberg & Viégas 2008] www.jasondavies.com/wordtree
  55. 55. TEXT • Now • Derived information: Sentiment, Topic • Combine with other information (geo & time) + context • Future • Better technique + involves more NLP e.g. key phrases, etc.
  56. 56. TEXT Descriptive Keyphrases [Chuang et al. 2012]
  57. 57. TEXT • Challenge • Scale
  58. 58. What? Where? When? GEO TIME TEXT Visualize Data
  59. 59. GEO + TEXT Real-time Tweet map
  60. 60. GEO + TEXT Real-time Tweet map
  61. 61. GEO + TEXT Real-time Tweet map most frequent term
  62. 62. GEO + TEXT Real-time Tweet map Gmail went down Jan 24, 2014
  63. 63. GEO + TEXT Real-time Tweet map Nelson Mandela passed away Dec 5, 2013
  64. 64. GEO + TEXT Real-time Tweet map • Next: • Involves more NLP • Tokenization - Languages without space between words • etc. • Challenge: • Real-time
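The idea behind the real-time Tweet map, showing the most frequent term in each area, can be sketched with a simple latitude/longitude grid. The grid size, the tiny stop-word list, and the regular-expression tokenizer are all assumptions for illustration; the actual system is not described in the slides.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "the", "is", "at", "in", "on", "to", "and", "of", "rt"}
# Simple tokenizer: words, #hashtags and @mentions.
TOKEN = re.compile(r"[#@]?\w+")


def top_term_per_cell(tweets, cell_deg=1.0):
    """Map each lat/lon grid cell to its most frequent non-stop-word term.

    `tweets` is an iterable of (lat, lon, text) tuples. A cell is identified
    by the integer index of its south-west corner.
    """
    terms = defaultdict(Counter)
    for lat, lon, text in tweets:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        for token in TOKEN.findall(text.lower()):
            if token not in STOP_WORDS:
                terms[cell][token] += 1
    return {cell: counts.most_common(1)[0][0]
            for cell, counts in terms.items() if counts}
```

This tokenizer relies on spaces and punctuation between words, which is exactly the limitation slide 64 points out: languages written without spaces between words need a proper word segmenter before counting terms. The batch loop also sidesteps the real-time challenge named on the same slide.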
  65. 65. GEO + TEXT www.yelp.com/wordmap Yelp Wordmap
  66. 66. What? Where? When? GEO TIME TEXT Visualize Data
  67. 67. TIME + TEXT http://www.babynamewizard.com/voyager Baby Name Voyager
  68. 68. TIME + TEXT http://www.babynamewizard.com/voyager Baby Name Voyager
  69. 69. TIME + TEXT UEFA Champions League Biggest Tournament for European soccer clubs Many Tweets during the matches
  70. 70. TIME + TEXT UEFA Champions League Dortmund Bayern Munich Count Tweets mentioning the teams every minute Team 1 Team 2
  71. 71. TIME + TEXT UEFA Champions League
  72. 72. TIME + TEXT UEFA Champions League + “goal” count + context
  73. 73. TIME + TEXT UEFA Champions League + “offside”
  74. 74. TIME + TEXT UEFA Champions League + players
  75. 75. A B C D A C C Competition Tree vs vs vs
  76. 76. A B C D A C C Competition Tree + vs vs vs
  77. 77. A B C D A C C Competition Tree + = uclfinal.twitter.com vs vs vs
  78. 78. TIME + TEXT UEFA Champions League • Challenges • Filter relevant Tweets • Multiple matches at the same time • Ambiguous words: “goal”, “red”, “yellow” • Tweets mentioning both teams e.g. “#GER 2-2 #GHA”
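Counting the Tweets that mention each team every minute (slide 70) can be sketched as below. The alias lists and the `mentions_per_minute` function are hypothetical, and the choice to count a Tweet that names both teams once for each team is one assumption among several reasonable ones.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical alias lists; a real deployment would curate these per match.
TEAM_ALIASES = {
    "Dortmund": {"dortmund", "bvb", "#bvb"},
    "Bayern Munich": {"bayern", "#fcbayern"},
}


def mentions_per_minute(tweets, aliases=TEAM_ALIASES):
    """Count Tweets mentioning each team in one-minute buckets.

    `tweets` is an iterable of (timestamp, text) pairs. A Tweet that
    mentions both teams is counted once for each team.
    """
    series = defaultdict(Counter)
    for ts, text in tweets:
        words = {w.strip(".,!?:;") for w in text.lower().split()}
        minute = ts.replace(second=0, microsecond=0)
        for team, names in aliases.items():
            if words & names:
                series[team][minute] += 1
    return series
```

The same bucketing could be reused for the "goal" and "offside" overlays on slides 72 and 73 by adding those words as extra alias sets. It does nothing about the ambiguity listed among the challenges on slide 78: a word like "goal" or "red" is counted whatever its context.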
  79. 79. What? Where? When? GEO TIME TEXT Visualize Data
  80. 80. TIME + GEO + TEXT State of the Union twitter.github.io/interactive/sotu2014
  81. 81. TIME + GEO + TEXT State of the Union 1) timeline + topic from Tweets 4) Density map of Tweets about selected topic 3) Volume of Tweets by topics during selected part of the SOTU 2) context (speech) twitter.github.io/interactive/sotu2014
  82. 82. TIME + GEO + TEXT New Year 2014
  83. 83. TIME + GEO + TEXT New Year 2014
  84. 84. TIME + GEO + TEXT New Year 2014 twitter.github.io/interactive/newyear2014/
  85. 85. Recap
  86. 86. What can we learn from these Tweets? many, many things.
  87. 87. better the examples in this talk imagine… DATA (Tweets)
  88. 88. Insights, Stories (Tweets) Filter Clean Visualize DATA
  89. 89. (Tweets) Insights, Stories Filter Clean Process & Visualize DATA
  90. 90. (Tweets) Insights, Stories Filter Clean Process & Visualize DATA NLP
  91. 91. TEXT What? Where? When? GEO TIME Visualize data
  92. 92. (Tweets) Insights, Stories Filter Clean Process & Visualize DATA Research
  93. 93. Working together Raw data Human
  94. 94. Working together Raw data Human Computer (One machine, Cloud, MapReduce, etc.)
  95. 95. Working together Raw data Human Ignored information Processed information Computer (One machine, Cloud, MapReduce, etc.)
  96. 96. Working together Raw data Human Aggregated information Ignored information Processed information Computer (One machine, Cloud, MapReduce, etc.)
  97. 97. Working together Raw data Human Aggregated information Ignored information Processed information Computer (One machine, Cloud, MapReduce, etc.) NLP Make computers think more like humans.
  98. 98. Working together Raw data Human Aggregated information Ignored information Processed information VIS Help people consume information. Computer (One machine, Cloud, MapReduce, etc.) NLP Make computers think more like humans.
  99. 99. Working together Raw data Human Aggregated information Ignored information Processed information VIS Help people consume information. Computer (One machine, Cloud, MapReduce, etc.) NLP Make computers think more like humans. HCI User interactions or Provide feedback Bridge the gap. Connect human & computer.
  100. 100. Advanced techniques vs. Scalability
  101. 101. LifeFlow => Flying Sessions Research System at Twitter
  102. 102. Summary • Thoughts are captured in the Tweets: what, where, when • Finding patterns from: text + geo + time • Opportunities for NLP + HCI + VIS collaboration • Better technique vs. Scalability + Real-time @kristw / interactive.twitter.com
  103. 103. Questions?
  104. 104. Thank you