Adventure in Data: A tour of visualization projects at Twitter

Krist Wongsuphasawat / @kristw
Computer Engineer, Bangkok, Thailand
PhD in Computer Science, Univ. of Maryland (Information Visualization)
IBM
Microsoft
Data Visualization Scientist, Twitter
Adventure in data
A whirlwind tour of visualization projects at Twitter
Get data
Having all Tweets
How people think I feel. How I really feel.
Challenges
• Too much data. Want only relevant Tweets
  • hashtag: #BRA
  • keywords: “goal”
• Need to aggregate & reduce size
• Long processing time (hours)
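A minimal sketch of the "want only relevant Tweets" step, assuming Tweets are available as plain text strings. The function names and the sample Tweets are made up for illustration; at Twitter scale this filtering would run on the cluster, not in a local script.

```python
import re

def is_relevant(text, hashtags, keywords):
    """Keep a Tweet if it carries a wanted hashtag or contains a wanted keyword."""
    tags = {tag.lower() for tag in re.findall(r"#(\w+)", text)}
    lowered = text.lower()
    return bool(tags & hashtags) or any(word in lowered for word in keywords)

def count_relevant(texts, hashtags, keywords):
    """Aggregate: reduce a list of Tweet texts to a single count."""
    return sum(is_relevant(t, hashtags, keywords) for t in texts)

sample = ["What a match #BRA", "GOAL!!!", "lunch time"]  # made-up Tweets
print(count_relevant(sample, {"bra"}, {"goal"}))  # 2
```

Plain substring matching is a simplification: "goal" also matches "goalkeeper".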
Workflow
Data Storage
Hadoop Cluster, Tool: Pig / Scalding (slow)
Smaller dataset
Your laptop, Tool: node.js / python / excel (fast)
Final dataset
Cleaning data
a story of my life
Projects
Storytelling: To understand the world and share the stories
Analytics Tools: To understand Twitter users and improve the service
Creative: To showcase the data and inspire
1. Storytelling
Events: World Cup, Election, Oscars, TV Shows, New Year, Earthquake, Super Bowl, Protest, …
Behaviors: Sleeping, Daylight saving, Language, Fasting, Information spread, Commute, …
So many things
we could learn
from Twitter data
Give us interesting vis
about xxxx by Nov 10
Challenge accepted
Tweets (+ media: photos, videos)
Visualize Tweets
What? TEXT
Where? GEO
When? TIME
Time: Tweets/second + Annotation
http://www.flickr.com/photos/twitteroffice/5681263084/
Geo: Heatmap (low density to high density)
San Francisco
flickr.com/photos/twitteroffice/8798020541
Rebuild the world
based on
tweet volumes
twitter.github.io/interactive/andes/
Text
Some experiments during World Cup
Word cloud of Tweets right after the 1st goal (www.wordle.net)
It was an “own” goal.
WordTree [Wattenberg & Viégas 2008]
www.jasondavies.com/wordtree
Text: word/phrase/hashtag count, topic
Time + Geo: Tweet pattern [Rios & Lin 2012]
Daytime, Night, Late night
Geo + Text: Real-time Tweet map
most frequent term
Gmail was down, Jan 24, 2014
Nelson Mandela passed away, Dec 5, 2013
Time + Text: UEFA Champions League
Biggest tournament for European soccer clubs
Many Tweets during the matches
Team 1: Dortmund, Team 2: Bayern Munich
Count Tweets mentioning the teams every minute
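The per-minute counting could be sketched as below, assuming each Tweet is a (timestamp, text) pair. The keyword lists, function name, and sample Tweets are invented for illustration and are not the pipeline used for the actual visualization.

```python
from collections import Counter
from datetime import datetime

# Hypothetical keyword lists; a real list would include nicknames and hashtags.
TEAMS = {"Dortmund": ["dortmund"], "Bayern Munich": ["bayern"]}

def mentions_per_minute(tweets, teams):
    """Count Tweets mentioning each team, bucketed by minute."""
    counts = Counter()
    for timestamp, text in tweets:
        minute = timestamp.replace(second=0, microsecond=0)
        lowered = text.lower()
        for team, words in teams.items():
            if any(word in lowered for word in words):
                counts[(minute, team)] += 1
    return counts

SAMPLE = [  # made-up Tweets
    (datetime(2013, 5, 25, 20, 45, 10), "Come on Dortmund!"),
    (datetime(2013, 5, 25, 20, 45, 50), "Bayern looking strong"),
    (datetime(2013, 5, 25, 20, 46, 5), "Bayern vs Dortmund, what a game"),
]
counts = mentions_per_minute(SAMPLE, TEAMS)
```

A Tweet that mentions both teams is counted once for each team.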
+ “goal” count
+ context
+ “offside”
+ players
Competition Tree
[Diagram: A vs B and C vs D, then A vs C, winner C]
uclfinal.twitter.com
Time + Text + Geo: State of the Union
twitter.github.io/interactive/sotu2014
1) timeline + topic from Tweets
2) context (speech)
3) Volume of Tweets by topics during selected part of the SOTU
4) Density map of Tweets about selected topic
Time + Text + Geo: World Cup 2014
interactive.twitter.com/wccompetitree
Visualize Tweets
What? TEXT
Where? GEO
When? TIME
+ Non-Twitter data: CONTEXT
Time + Text New Year 2014
Time + Text + Geo (c) New Year 2014
twitter.github.io/interactive/newyear2014/
2. Analytics Tools
[Diagram: Data sources, get, explore, analyze, present, Output; annotations: ad-hoc scripts, tools for exploration]
User activity logs
[Diagram: Users, Twitter, Engineers, Product Managers (curious), Log data in Hadoop; arrows labeled Use, Instrument, Write]
What is being logged?
activities
• tweet
  • tweet from home timeline on twitter.com
  • tweet from search page on iPhone
• sign up
• log in
• retweet
• etc.
Organize?
log event a.k.a. “client event” [Lee et al. 2012]
client : page : section : component : element : action
web : home : timeline : tweet_box : button : tweet
1) User ID
2) Timestamp
3) Event name
4) Event detail
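A small sketch of how the six-part event name could be split into named fields. The class and function names are hypothetical; only the field order comes from the slide.

```python
from typing import NamedTuple

class ClientEvent(NamedTuple):
    client: str
    page: str
    section: str
    component: str
    element: str
    action: str

def parse_event_name(name):
    """Split 'client:page:section:component:element:action' into fields."""
    parts = [part.strip() for part in name.split(":")]
    if len(parts) != 6:
        raise ValueError(f"expected 6 parts, got {len(parts)}: {name!r}")
    return ClientEvent(*parts)

event = parse_event_name("web : home : timeline : tweet_box : button : tweet")
print(event.component)  # tweet_box
```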
Log data: bigger than Tweet data
[Diagram: Users, Twitter, Engineers, Data Scientists, Product Managers (curious), Log data in Hadoop; arrows labeled Use, Instrument, Write, Ask, Monitor, and Find, Clean, Analyze; markers 1 and 2]
Part I
Find & Monitor
Client Events
Motivation
Engineers & Data Scientists
Log data in Hadoop: billions of rows
Aggregate into the client event collection: 10,000+ event types
date      client  page  section  comp.  elem.  action      count
20141011  web     home  home     -      -      impression  100
20141011  web     home  wtf      -      -      click       20
wtf = Who-to-Follow
Find: Search the client event collection
client page section component element action
section? component? element?
Query: web home * * impression*
Return: Results
web : home : home : - : - : impression
web : home : wtf : - : - : impression
search can be better
10,000+ event types: not everybody knows
What are all sections under web:home?
one graph / event x 10,000
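A rough sketch of wildcard search over event names, matching field by field with Python's standard fnmatch module. The colon-separated six-field pattern is an assumption made for this example; the slide does not show how the actual tool parses a query.

```python
from fnmatch import fnmatchcase

def search(event_names, pattern):
    """Return event names whose fields each match the pattern's field."""
    wanted = pattern.split(":")
    results = []
    for name in event_names:
        fields = name.split(":")
        if len(fields) == len(wanted) and all(
            fnmatchcase(field, glob) for field, glob in zip(fields, wanted)
        ):
            results.append(name)
    return results

names = [
    "web:home:home:-:-:impression",
    "web:home:wtf:-:-:impression",
    "web:home:wtf:-:-:click",
    "iphone:home:-:-:-:impression",
]
matches = search(names, "web:home:*:*:*:impression*")
```

Matching per field keeps a `*` from spilling across the colon separators.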
Goals
• Search for client events
• Explore client event collection
• Monitor changes
Design
Engineers & Data Scientists see the client event collection
Interactions: search box => filter, to narrow down
How to visualize?
client : page : section : component : element : action
Client event hierarchy
iphone:home:-:-:-:impression
iphone:home:-:tweet:tweet:click
[Tree: iphone, home, -, then two branches: - - impression and tweet tweet click]
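The hierarchy can be sketched as a tree keyed by each part of the event name, with counts rolled up at every level. The nested-dict layout, the function name, and the counts are assumptions for illustration.

```python
def build_hierarchy(event_counts):
    """Turn {event name: count} into a tree with rolled-up counts per node."""
    root = {"count": 0, "children": {}}
    for name, count in event_counts.items():
        node = root
        node["count"] += count
        for part in name.split(":"):
            node = node["children"].setdefault(part, {"count": 0, "children": {}})
            node["count"] += count
    return root

tree = build_hierarchy({
    "iphone:home:-:-:-:impression": 100,  # made-up counts
    "iphone:home:-:tweet:tweet:click": 20,
})
section = tree["children"]["iphone"]["children"]["home"]["children"]["-"]
print(sorted(section["children"]))  # ['-', 'tweet']
```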
Detect changes
TODAY compared to 7 DAYS AGO
Calculate changes: DIFF
[Tree for iphone home -, with branches - - impression and tweet tweet click, annotated per node with changes such as +5%, +10%, -5%]
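The comparison can be sketched as a percentage change per event between today's count and the count 7 days ago. The function name and the numbers are made up; events with no baseline are reported as None in this sketch.

```python
def percent_changes(today, week_ago):
    """Percentage change per event, or None when there is no baseline."""
    changes = {}
    for event, count in today.items():
        before = week_ago.get(event, 0)
        if before == 0:
            changes[event] = None  # new event: no baseline to compare against
        else:
            changes[event] = 100.0 * (count - before) / before
    return changes

today = {"iphone:home:-:-:-:impression": 105, "iphone:home:-:tweet:tweet:click": 90}
week_ago = {"iphone:home:-:-:-:impression": 100, "iphone:home:-:tweet:tweet:click": 100}
diff = percent_changes(today, week_ago)
```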
Display changes
[Tree: iphone, home, -, with branches - - impression and tweet tweet click]
Map of the Market [Wattenberg 1999], StemView [Guerra-Gomez et al. 2013]
Demo
Scribe Radar
Twitter for Banana
Part II
Analysis
Count page visits
home page: banana : home : - : - : - : impression
Funnel analysis
home page, then profile page: 1 job, 1 hour
banana : home : - : - : - : impression
banana : profile : - : - : - : impression
home page, then profile page or search page: 2 jobs, 2 hours
banana : search : - : - : - : impression
Specify all funnels manually! n jobs, n hours
Goal
1 job => all funnels, visualized
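A hand-specified funnel might be counted as in the sketch below. It assumes a session is a list of event names and that the funnel steps must occur in order but not necessarily back to back; that definition and all names here are assumptions.

```python
HOME = "banana:home:-:-:-:impression"
PROFILE = "banana:profile:-:-:-:impression"
SEARCH = "banana:search:-:-:-:impression"

def reached(session, steps):
    """True if the session contains the steps in order (gaps allowed)."""
    remaining = iter(session)
    return all(step in remaining for step in steps)

def funnel(sessions, steps):
    """Number of sessions that reached each successive step of the funnel."""
    return [
        sum(reached(session, steps[:k]) for session in sessions)
        for k in range(1, len(steps) + 1)
    ]

sessions = [[HOME, PROFILE], [HOME], [PROFILE, HOME], [SEARCH]]  # made-up sessions
print(funnel(sessions, [HOME, PROFILE]))  # [3, 1]
```

Each new funnel needs another call with its own steps, which mirrors the "specify all funnels manually" problem on the slide.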
Related work
• Visualize an overview of event sequences [Wongsuphasawat et al. 2011, Monroe et al. 2013, …]
• Big data? eBay checkout sequences [Shen et al. 2013]
User sessions
Session#1: start, A, B, end
Session#2: start, A, B, end
Session#3: start, A, C, end
Session#4: start, A, end
Aggregate
4 sessions
[Tree: start, A, then three branches: B end, C end, and end]
4,000,000 sessions
[Same tree shape at larger scale]
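The aggregation of sessions into one tree can be sketched as a prefix tree with a session count on every node. This illustrates the idea only and is not the code behind the tool; the dict layout is an assumption.

```python
def aggregate(sessions):
    """Merge sessions that share a prefix; each node counts its sessions."""
    root = {"count": 0, "children": {}}  # root plays the role of "start"
    for session in sessions:
        node = root
        node["count"] += 1
        for event in session:
            node = node["children"].setdefault(event, {"count": 0, "children": {}})
            node["count"] += 1
    return root

SESSIONS = [
    ["A", "B", "end"],  # Session#1
    ["A", "B", "end"],  # Session#2
    ["A", "C", "end"],  # Session#3
    ["A", "end"],       # Session#4
]
tree = aggregate(SESSIONS)
a = tree["children"]["A"]
print({event: child["count"] for event, child in a["children"].items()})
# {'B': 2, 'C': 1, 'end': 1}
```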
Twitter for Banana
try with sample data (~millions of sessions, 10,000+ event types)
original paper (100,000 sessions, ~10 event types)
fail…
How to make it work?
Reduce # of unique sequences
1. Reduce event types (ask users for input)
   10,000 types
   select: tweet, sign up, log out
   merge: tweet from home timeline, tweet from search page, tweet … = tweet
2. Reduce sequence length (ask users for input)
   session: 1000 events
   10 events after (window size & direction) visit home page (alignment)
3. More aggregation on Hadoop
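Steps 1 and 2 could look like the sketch below, assuming a session is a list of event names. The merge table, the selected set, and the function names are hypothetical stand-ins for the input the user provides.

```python
MERGE = {  # hypothetical mapping from raw events to merged types
    "tweet from home timeline": "tweet",
    "tweet from search page": "tweet",
}
SELECTED = {"tweet", "sign up", "log out", "visit home page"}

def reduce_types(events, merge, selected):
    """Merge similar events into one type, then keep only selected types."""
    merged = (merge.get(event, event) for event in events)
    return [event for event in merged if event in selected]

def window(events, alignment, size, direction="after"):
    """Up to `size` events after (or before) the first alignment event."""
    if alignment not in events:
        return None  # session never hits the alignment point
    i = events.index(alignment)
    if direction == "after":
        return events[i + 1 : i + 1 + size]
    return events[max(0, i - size) : i]

raw = ["sign up", "visit home page", "scroll", "tweet from home timeline",
       "tweet from search page", "log out"]
short = window(reduce_types(raw, MERGE, SELECTED), "visit home page", 2)
print(short)  # ['tweet', 'tweet']
```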
Collapse events
e.g. tweet, tweet, tweet, … = tweet
Sequence   Collapsed
ABBBCCCC   ABC
ABBCC      ABC
ABC        ABC
ABCCCC     ABC
ABCD       ABCD
ABCCCD     ABCD
ABCCE      ABCE
ABCDF      ABCDF
ABCDG      ABCDG
ABCDH      ABCDH
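Collapsing runs of the same event is a short operation with itertools.groupby from the standard library. A minimal sketch, with single letters standing in for event names:

```python
from itertools import groupby

def collapse(events):
    """Replace each run of identical consecutive events with one event."""
    return [event for event, _run in groupby(events)]

print("".join(collapse("ABBBCCCC")))  # ABC
print(collapse(["tweet", "tweet", "tweet"]))  # ['tweet']
```

Only consecutive repeats are merged, so A, B, A stays A, B, A.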
Group & Count
Sequence  Count
ABC       2000
ABCD      80
ABCE      20
ABCDF     1
ABCDG     1
ABCDH     1
…         …
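Grouping identical sequences and counting them maps onto collections.Counter. The sketch below uses the ten collapsed sequences from the earlier slide, so its counts are much smaller than the illustrative 2000 / 80 / 20 above.

```python
from collections import Counter

def group_and_count(sequences):
    """Count how many sessions share each exact sequence of events."""
    return Counter(tuple(sequence) for sequence in sequences)

collapsed = ["ABC", "ABC", "ABC", "ABC", "ABCD", "ABCD",
             "ABCE", "ABCDF", "ABCDG", "ABCDH"]
counts = group_and_count(collapsed)
print(counts.most_common(1))  # [(('A', 'B', 'C'), 4)]
```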
Group & Count
Sequence  Count
ABC       2000
ABCD      80
ABCE      20
ABCDF     1
ABCDG     1
ABCDH     1
ABCDI     1
ABCDJK    1
ABCDJL    1
rare sequences (count < threshold)

Truncate: Replace last event with x (…)
Sequence  Count
ABC       2000
ABCD      80
ABCE      20
ABCDx     1
ABCDx     1
ABCDx     1
ABCDx     1
ABCDJx    1
ABCDJx    1

Group & Count
Sequence  Count
ABC       2000
ABCD      80
ABCE      20
ABCDx     4
ABCDJx    2

Truncate more
Sequence  Count
ABC       2000
ABCD      80
ABCE      20
ABCDx     4
ABCDx     2

Group & Count
Sequence  Count
ABC       2000
ABCD      80
ABCE      20
ABCDx     6
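The truncate-and-regroup loop can be sketched as below. The threshold of 3 is an assumed value that happens to reproduce the tables on these slides, and the handling of sequences that already end in x (drop the event before the marker) is inferred from the "Truncate more" step, not stated in the source.

```python
from collections import Counter

MARKER = "x"  # stands for "(…)": a rare continuation was cut off here

def truncate(sequence):
    """Drop the last real event and end the sequence with the marker."""
    events = sequence[:-1] if sequence[-1] == MARKER else sequence
    return events[:-1] + (MARKER,)

def truncate_rare(counts, threshold):
    """Truncate rare sequences and regroup until none is rare."""
    def is_rare(sequence, count):
        return count < threshold and len(sequence) > 1

    counts = Counter(counts)
    while any(is_rare(s, c) for s, c in counts.items()):
        merged = Counter()
        for sequence, count in counts.items():
            key = truncate(sequence) if is_rare(sequence, count) else sequence
            merged[key] += count
        counts = merged
    return counts

raw = {"ABC": 2000, "ABCD": 80, "ABCE": 20, "ABCDF": 1, "ABCDG": 1,
       "ABCDH": 1, "ABCDI": 1, "ABCDJK": 1, "ABCDJL": 1}
result = truncate_rare({tuple(s): c for s, c in raw.items()}, threshold=3)
print({"".join(s): c for s, c in result.items()})
# {'ABC': 2000, 'ABCD': 80, 'ABCE': 20, 'ABCDx': 6}
```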
Final process
1. Define set of events
2. Pick alignment, direction and window size
3. Run Hadoop job (with more aggregation)
4. Wait for it… (2+ hrs)
5. Visualize
gazillion patterns (TBs) => ~100,000 patterns (10MB)
Demo
Flying Sessions
Summary
• Large-scale User Activity Logs + Visual Analytics
• Used in day-to-day operations at Twitter
• Generalize to smaller systems
Challenge
big data: aggregate & sacrifice
small data: visualize & interact
3. Creative
Data sources, …, Output
https://medium.com/@kristw/designing-the-game-of-tweets-7f87c30dc5a2
Demo / Game of Tweets
Oh no…. NOT AGAIN
Projects
Storytelling: To understand the world and share the stories
Analytics Tools: To understand Twitter users and improve the service
Creative: To showcase the data and inspire
Reusable Toolkits: To implement once and for all
Coming soon
Demo / Labella.js
https://github.com/twitter/d3kit
Demo / d3Kit
http://www.slideshare.net/kristw/d3kit
Conclusions
Data are everywhere.
Many applications: Journalism, Product development, Art, etc.
Combine visualization with other skills: HCI, Design, Stats, ML, etc.
Don’t repeat yourself.
Krist Wongsuphasawat / @kristw
interactive.twitter.com kristw.yellowpigz.com
Acknowledgement
@philogb @trebor @miguelrios
@smrogers @lintool @linuslee @chuangl4
and many other colleagues at @twitter
Thank you
Questions?