Adventure in Data: A tour of visualization projects at Twitter
3.
Computer Engineer
Bangkok, Thailand
PhD in Computer Science
Univ. of Maryland
Information Visualization
IBM
Microsoft
Data Visualization Scientist
Twitter
Krist Wongsuphasawat / @kristw
4.
Krist Wongsuphasawat / @kristw
Adventure in data
A whirlwind tour of visualization projects at Twitter
7.
How people think I feel. How I really feel.
Having all Tweets
8.
Challenges
• Too much data. Want only relevant Tweets
  • hashtag: #BRA
  • keywords: “goal”
• Need to aggregate & reduce size
• Long processing time (hours)
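The filter-then-aggregate idea on this slide can be sketched in a few lines. This is only an illustration: the sample records, the field names `time` and `text`, and the helper names are assumptions, and the substring match is deliberately naive (it would also match "goalkeeper").

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample records; real Tweets carry many more fields.
tweets = [
    {"time": "2014-07-08 20:11:05", "text": "GOAL! #BRA #GER"},
    {"time": "2014-07-08 20:11:40", "text": "What a goal by Germany"},
    {"time": "2014-07-08 20:12:10", "text": "Dinner time"},
    {"time": "2014-07-08 20:12:30", "text": "Come on #BRA"},
]


def is_relevant(text, hashtags=("#bra",), keywords=("goal",)):
    """Keep a Tweet if it mentions any tracked hashtag or keyword."""
    lowered = text.lower()
    return any(term in lowered for term in hashtags + keywords)


def count_per_minute(records):
    """Reduce the relevant Tweets to one count per minute."""
    counts = Counter()
    for record in records:
        if is_relevant(record["text"]):
            stamp = datetime.strptime(record["time"], "%Y-%m-%d %H:%M:%S")
            counts[stamp.strftime("%H:%M")] += 1
    return dict(counts)


per_minute = count_per_minute(tweets)
print(per_minute)
```

At full scale the same two steps would run as a distributed job rather than a loop in memory, which is where the hours of processing time come from.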
16.
Projects
• Storytelling: to understand the world and share the stories
• Analytics Tools: to understand Twitter users and improve the service
• Creative: to showcase the data and inspire
17.
Storytelling (1)
Events
• World Cup
• Election
• Oscars
• TV Shows
• New Year
• Earthquake
• Super Bowl
• Protest
• …
Behaviors
• Sleeping
• Daylight saving
• Language
• Fasting
• Information spread
• Commute
• …
18.
So many things
we could learn
from Twitter data
55.
Competition Tree
[Diagram: a bracket built from match-ups. A vs B and C vs D in the first round, then A vs C, and C wins.]
uclfinal.twitter.com
56.
Visualize Tweets
What? TEXT
Where? GEO
When? TIME
57.
State of the Union: Time + Text + Geo
twitter.github.io/interactive/sotu2014
58.
State of the Union: Time + Text + Geo
1) Timeline + topics from Tweets
2) Context (speech)
3) Volume of Tweets by topic during the selected part of the SOTU
4) Density map of Tweets about the selected topic
twitter.github.io/interactive/sotu2014
59.
World Cup 2014: Time + Text
interactive.twitter.com/wccompetitree
60.
World Cup 2014: Time + Text + Geo
interactive.twitter.com/wccompetitree
62.
Visualize Tweets
What? TEXT
Where? GEO
When? TIME
+ Non-Twitter data: CONTEXT
91.
Log data
in Hadoop
Engineers & Data Scientists
billions of rows
92.
Log data
in Hadoop
Aggregate
10,000+ event types
date      client  page  section  component  element  action      count
20141011  web     home  home     -          -        impression  100
20141011  web     home  wtf      -          -        click       20
Engineers & Data Scientists
Client event collection
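The Aggregate step turns raw log rows into one count per event type per day. A minimal sketch, assuming each raw row has already been parsed into a tuple of (date, client, page, section, component, element, action); the sample rows are made up, and the production job runs on Hadoop rather than in memory.

```python
from collections import Counter

# Hypothetical raw log rows, one per logged client event.
raw_events = [
    ("20141011", "web", "home", "home", "-", "-", "impression"),
    ("20141011", "web", "home", "home", "-", "-", "impression"),
    ("20141011", "web", "home", "home", "-", "-", "impression"),
    ("20141011", "web", "home", "wtf", "-", "-", "click"),
    ("20141011", "web", "home", "wtf", "-", "-", "click"),
]


def aggregate(rows):
    """Collapse identical rows into (row, count) pairs."""
    return Counter(rows)


daily_counts = aggregate(raw_events)
for row, count in sorted(daily_counts.items()):
    print(*row, count)
```

Billions of rows shrink to one row per distinct event type per day, which is small enough to query interactively.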
93.
Log data
in Hadoop
Aggregate
10,000+ event types
date      client  page  section  component  element  action      count
20141011  web     home  home     -          -        impression  100
20141011  web     home  wtf      -          -        click       20
Engineers & Data Scientists
Client event collection
(wtf = Who-to-Follow)
94.
Log data
in Hadoop
Aggregate
Client event collection
Engineers & Data Scientists
95.
Log data
in Hadoop
Aggregate
Find
client page section component element action
Search
Client event collection
Engineers & Data Scientists
98.
client page section component element action
Search
Find
Log data
in Hadoop
Aggregate
web home * * impression*
Client event collection
Engineers & Data Scientists
99.
client page section component element action
Search
Find
Query
Return
Log data
in Hadoop
Results
web : home : home : - : - : impression
web : home : wtf : - : - : impression
Aggregate
web home * * impression*
Client event collection
Engineers & Data Scientists
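The wildcard search over six-part event names could look like the sketch below. It assumes names are stored as colon-separated strings and that an omitted field means "match anything"; the `search` function and its keyword arguments are illustrative, not the actual tool's interface.

```python
from fnmatch import fnmatchcase

FIELDS = ("client", "page", "section", "component", "element", "action")

# Hypothetical catalogue of event names.
event_names = [
    "web:home:home:-:-:impression",
    "web:home:wtf:-:-:impression",
    "web:home:wtf:-:-:click",
    "iphone:home:home:-:-:impression",
]


def search(names, **query):
    """Return the names whose six fields match the given wildcard patterns."""
    patterns = [query.get(field, "*") for field in FIELDS]
    matches = []
    for name in names:
        parts = name.split(":")
        if len(parts) == len(FIELDS) and all(
            fnmatchcase(part, pattern) for part, pattern in zip(parts, patterns)
        ):
            matches.append(name)
    return matches


results = search(event_names, client="web", page="home", action="impression*")
print(results)
```

Matching field by field keeps a `*` in one position from swallowing a neighbouring field.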
100.
client page section component element action
Search
Find
Query
Return
Log data
in Hadoop
Results
web : home : home : - : - : impression
web : home : wtf : - : - : impression
Aggregate
search can be better
Client event collection
Engineers & Data Scientists
101.
client page section component element action
Search
Find
Query
Return
Log data
in Hadoop
Results
web : home : home : - : - : impression
web : home : wtf : - : - : impression
Aggregate
10,000+ event types
search can be better
Client event collection
Engineers & Data Scientists
102.
client page section component element action
Search
Find
Query
Return
Log data
in Hadoop
Results
web : home : home : - : - : impression
web : home : wtf : - : - : impression
Aggregate
search can be better
10,000+ event types
not everybody knows
What are all sections under web:home?
Client event collection
Engineers & Data Scientists
103.
client page section component element action
Search
Find
Query
Return
Log data
in Hadoop
Results
web : home : home : - : - : impression
Aggregate
search can be better
one graph / event
10,000+ event types
not everybody knows
What are all sections under web:home?
Client event collection
Engineers & Data Scientists
104.
client page section component element action
Search
Find
Query
Return
Log data
in Hadoop
Results
web : home : home : - : - : impression
Aggregate
search can be better
one graph / event
x 10,000
10,000+ event types
not everybody knows
What are all sections under web:home?
Client event collection
Engineers & Data Scientists
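A question such as "What are all sections under web:home?" becomes easy once the event names are arranged as a hierarchy. A minimal sketch, assuming colon-separated names; the sample names and the helper functions are hypothetical.

```python
# Hypothetical catalogue of event names.
event_names = [
    "web:home:home:-:-:impression",
    "web:home:wtf:-:-:impression",
    "web:home:wtf:-:-:click",
    "web:profile:tweets:-:-:impression",
]


def build_tree(names):
    """Nest the names as client -> page -> section -> ... -> action."""
    tree = {}
    for name in names:
        node = tree
        for part in name.split(":"):
            node = node.setdefault(part, {})
    return tree


def children(tree, *prefix):
    """List the values found directly under the given prefix."""
    node = tree
    for part in prefix:
        node = node.get(part, {})
    return sorted(node)


tree = build_tree(event_names)
sections = children(tree, "web", "home")
print(sections)
```

The same tree can drive a visual overview, so nobody has to memorise 10,000+ event names to find the right one.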
107.
Client event collection
Engineers & Data Scientists
108.
See
Client event collection
Engineers & Data Scientists
109.
See
Interactions
search box => filter
Client event collection
narrow down
Engineers & Data Scientists
110.
See
How to visualize?
narrow down
Client event collection
Engineers & Data Scientists
Interactions
search box => filter
111.
See
How to visualize?
narrow down
Client event collection
Engineers & Data Scientists
client : page : section : component : element : action
Interactions
search box => filter
124.
Funnel analysis
home page: banana : home : - : - : - : impression
profile page: banana : profile : - : - : - : impression
search page: banana : search : - : - : - : impression
Specify all funnels manually!
n jobs
n hours
125.
Goal
banana : home : - : - : - : impression
… ……
1 job => all funnels, visualized
home page
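The goal of "1 job => all funnels" amounts to counting every page-to-page transition in a single pass instead of specifying each funnel by hand. A rough sketch, assuming sessions are ordered lists of page names; the sample sessions are invented.

```python
from collections import Counter, defaultdict

# Hypothetical sessions: the pages a user viewed, in order.
sessions = [
    ["home", "profile"],
    ["home", "search", "profile"],
    ["home", "search"],
    ["home"],
]


def next_step_counts(all_sessions):
    """Count, for every page, which page users viewed next."""
    counts = defaultdict(Counter)
    for pages in all_sessions:
        for current, following in zip(pages, pages[1:]):
            counts[current][following] += 1
    return counts


funnels = next_step_counts(sessions)
for page, followers in funnels.items():
    print(page, dict(followers))
```

One pass produces the next-step counts for every starting page, so any funnel can be read from the result without launching another job.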
126.
Related work
• Visualize an overview of event sequences [Wongsuphasawat et al. 2011, Monroe et al. 2013, …]
• Big data? eBay checkout sequences [Shen et al. 2013]
127.
User sessions
Session#1: start → A → B → end
Session#2: start → A → B → end
Session#3: start → A → C → end
Session#4: start → A → end
128.
Aggregate
4 sessions
[Diagram, built up over slides 128-132: the four sessions are merged into one tree by collapsing the steps they share. start → A (4 sessions); after A: B → end (2 sessions), C → end (1 session), end (1 session).]
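Merging sessions that share a prefix is a prefix-tree (trie) aggregation. The sketch below applies it to the four sessions from the User sessions slide; the dictionary layout and function names are assumptions made for illustration.

```python
def aggregate_sessions(sessions):
    """Merge sessions into a prefix tree with a session count on each node."""
    root = {"count": 0, "children": {}}
    for events in sessions:
        node = root
        node["count"] += 1
        # Append an explicit "end" marker so early exits stay visible.
        for event in list(events) + ["end"]:
            child = node["children"].setdefault(event, {"count": 0, "children": {}})
            child["count"] += 1
            node = child
    return root


def render(node, name="start", depth=0):
    """Return indented text lines, one per tree node."""
    lines = [f"{'  ' * depth}{name} ({node['count']})"]
    for child_name, child in sorted(node["children"].items()):
        lines.extend(render(child, child_name, depth + 1))
    return lines


# Sessions 1-4 from the slide, without the implicit start and end.
sessions = [["A", "B"], ["A", "B"], ["A", "C"], ["A"]]
tree = aggregate_sessions(sessions)
print("\n".join(render(tree)))
```

Each node's count is the number of sessions passing through it, which is what the widths in the aggregated view encode.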
159.
Final process
1. Define set of events
2. Pick alignment, direction and window size
3. Run Hadoop job (with more aggregation)
4. Wait for it… (2+ hrs)
5. Visualize
gazillion patterns (TBs) => ~100,000 patterns (10MB)
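Steps 1 and 2 can be read as follows: align every session on a chosen event, look only before or after it, and keep at most a fixed number of steps. That reading is an assumption, since the slide does not define the terms, and the function names and sample sessions below are hypothetical.

```python
from collections import Counter


def extract_pattern(events, align_on, direction="after", window=2):
    """Cut a session down to the window around the first alignment event."""
    if align_on not in events:
        return None
    index = events.index(align_on)
    if direction == "after":
        return tuple(events[index:index + window + 1])
    return tuple(events[max(0, index - window):index + 1])


def count_patterns(sessions, align_on, direction="after", window=2):
    """Count how many sessions share each truncated pattern."""
    patterns = Counter()
    for events in sessions:
        pattern = extract_pattern(events, align_on, direction, window)
        if pattern is not None:
            patterns[pattern] += 1
    return patterns


# Hypothetical sessions of page views.
sessions = [
    ["home", "search", "profile", "tweet"],
    ["home", "search", "profile"],
    ["profile", "home", "search"],
    ["search", "profile"],
]
after_home = count_patterns(sessions, "home", direction="after", window=2)
print(after_home)
```

Truncating to a window is one way to trade detail for size: long, unique sessions collapse into a much smaller set of shared patterns.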
161.
Summary
• Large-scale User Activity Logs + Visual Analytics
• Used in day-to-day operations at Twitter
• Generalize to smaller systems
Challenge: big data => aggregate & sacrifice => small data => visualize & interact
163.
https://medium.com/@kristw/designing-the-game-of-tweets-7f87c30dc5a2
Demo / Game of Tweets
166.
Projects
• Storytelling: to understand the world and share the stories
• Analytics Tools: to understand Twitter users and improve the service
• Creative: to showcase the data and inspire
• Reusable Toolkits: to implement once and for all
169.
Conclusions
Data are everywhere.
Many applications:
Journalism, Product development, Art, etc.
Combine visualization with other skills:
HCI, Design, Stats, ML, etc.
Don’t repeat yourself.
Krist Wongsuphasawat / @kristw
interactive.twitter.com kristw.yellowpigz.com
170.
@philogb @trebor @miguelrios
@smrogers @lintool @linuslee @chuangl4
and many other colleagues at @twitter
Acknowledgement