Guest lecture for Prof. David Gotz's INLS 690 Visual Analytics class at UNC Chapel Hill, given remotely on Nov 10, 2015.
Many demos can also be accessed at interactive.twitter.com and kristw.yellowpigz.com.
Computer Engineer (Bangkok, Thailand)
PhD in Computer Science, Univ. of Maryland (Information Visualization)
IBM
Microsoft
Data Visualization Scientist, Twitter
Krist Wongsuphasawat / @kristw
Adventure in data
A whirlwind tour of visualization projects at Twitter
Challenges
• Too much data; want only relevant Tweets, e.g. filtered by
  • hashtag: #BRA
  • keywords: “goal”
• Need to aggregate & reduce size
• Long processing time (hours)
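The filter-then-aggregate step above can be sketched in a few lines. This is a minimal illustration, not the pipeline used at Twitter: the in-memory list of (timestamp, text) pairs, the tracked sets `TRACKED_HASHTAGS` / `TRACKED_KEYWORDS`, and the per-minute bucket are all assumptions chosen for the example, and the sample Tweets are made up.

```python
import re
from collections import Counter
from datetime import datetime

# Hypothetical filter sets; a real job would read these from a config.
TRACKED_HASHTAGS = {"#bra"}
TRACKED_KEYWORDS = {"goal"}

def tokens(text):
    # Lowercased words, keeping a leading '#' so hashtags stay distinct from plain words.
    return set(re.findall(r"#?\w+", text.lower()))

def is_relevant(text):
    # Whole-token match, so "goal" matches "GOAL!!!" but not "goalkeeper".
    toks = tokens(text)
    return bool(toks & TRACKED_HASHTAGS or toks & TRACKED_KEYWORDS)

def volume_per_minute(tweets):
    """Filter, then aggregate: reduce raw Tweets to one count per minute."""
    counts = Counter()
    for ts, text in tweets:
        if is_relevant(text):
            counts[ts.replace(second=0, microsecond=0)] += 1
    return dict(sorted(counts.items()))

# Made-up sample input.
sample = [
    (datetime(2014, 7, 8, 20, 11, 5), "GOAL!!! #BRA"),
    (datetime(2014, 7, 8, 20, 11, 40), "what a goal"),
    (datetime(2014, 7, 8, 20, 12, 3), "lunch time"),
    (datetime(2014, 7, 8, 20, 12, 30), "Come on #bra"),
]
print(volume_per_minute(sample))
```

The output is a handful of (minute, count) pairs instead of every matching Tweet, which is the kind of size reduction that makes the data small enough to visualize.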
Projects
• Storytelling: to understand the world and share the stories
• Analytics Tools: to understand Twitter users and improve the service
• Creative: to showcase the data and inspire
Storytelling
Events: World Cup, Election, Oscars, TV Shows, New Year, Earthquake, Super Bowl, Protest, …
Behaviors: Sleeping, Daylight saving, Language, Fasting, Information spread, Commute, …
Time + Text + Geo: State of the Union
twitter.github.io/interactive/sotu2014
1) Timeline + topics from Tweets
2) Context (speech)
3) Volume of Tweets by topic during the selected part of the SOTU
4) Density map of Tweets about the selected topic
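Panels 3 and 4 boil down to two aggregations over the same Tweets: a count per topic within a selected time range, and a spatial binning of one topic's geotagged Tweets. The sketch below assumes a hypothetical record layout (seconds since the speech began, topic label, latitude, longitude), made-up topic labels, and a simple fixed-size lat/lon grid; the actual SOTU page may compute its density map differently.

```python
import math
from collections import Counter

# Hypothetical record: (seconds since the speech began, topic, latitude, longitude).
# Topic labels are assumed to come from an upstream classifier; these are made up.
def volume_by_topic(tweets, start, end):
    """Count Tweets per topic within the selected part of the speech, [start, end)."""
    return Counter(topic for t, topic, _lat, _lon in tweets if start <= t < end)

def density_grid(tweets, topic, cell_deg=1.0):
    """Bin geotagged Tweets about one topic into cell_deg x cell_deg lat/lon cells."""
    grid = Counter()
    for _t, tp, lat, lon in tweets:
        if tp == topic:
            # floor (not int) so negative longitudes land in the correct cell
            grid[(math.floor(lat / cell_deg), math.floor(lon / cell_deg))] += 1
    return grid

tweets = [
    (30, "economy", 38.9, -77.0),
    (45, "economy", 38.5, -76.6),
    (50, "healthcare", 40.7, -74.0),
    (400, "economy", 34.0, -118.2),
]
print(volume_by_topic(tweets, 0, 60))
print(density_grid(tweets, "economy"))
```

Selecting a different part of the speech only changes `start`/`end`, and selecting a topic only changes the `topic` argument, which is what lets the panels update together.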
Client event collection
Log data in Hadoop → Aggregate → 10,000+ event types → Engineers & Data Scientists

date      client  page  section  component  element  action      count
20141011  web     home  home     -          -        impression  100
20141011  web     home  wtf      -          -        click       20
(wtf = Who-to-Follow)
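The aggregation that produces a table like the one above can be sketched as a group-by over the six event levels plus the date. This is a single-machine illustration only: the whitespace-separated raw log format with a trailing user id is a hypothetical stand-in for the real Hadoop logs, and `ClientEvent`, `parse`, `aggregate`, and `print_table` are illustrative names.

```python
from collections import Counter
from typing import NamedTuple

class ClientEvent(NamedTuple):
    date: str       # yyyymmdd
    client: str
    page: str
    section: str
    component: str
    element: str
    action: str

def parse(line):
    # Hypothetical raw format: 7 event fields followed by extra fields (e.g. a user id).
    fields = line.split()
    return ClientEvent(*fields[:7])  # drop the trailing fields before aggregating

def aggregate(lines):
    """Group identical (date, client, page, section, component, element, action) keys and count them."""
    return Counter(parse(line) for line in lines)

def print_table(counts):
    print(" ".join(ClientEvent._fields), "count")
    for event, n in sorted(counts.items()):
        print(" ".join(event), n)

raw = [
    "20141011 web home home - - impression u1",
    "20141011 web home home - - impression u2",
    "20141011 web home wtf - - click u1",
]
print_table(aggregate(raw))
```

In a MapReduce setting the same idea maps naturally to emitting the event key with a count of 1 and summing per key.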
Finding events: search
Event names have six levels: client : page : section : component : element : action
Engineers & Data Scientists search the aggregated client event log data in Hadoop; a query returns the matching events.
Query: web home * * impression*
Results:
web : home : home : - : - : impression
web : home : wtf : - : - : impression
Problems:
• Search can be better: with 10,000+ event types, not everybody knows what to search for (What are all sections under web:home?)
• One graph per event, x 10,000 events
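A wildcard search over event names can be sketched with per-level glob matching. The colon-separated, six-part pattern syntax used here (`web:home:*:*:*:impression`) is a hypothetical stand-in for the query shown on the slide, and the event list is made up; `fnmatchcase` from Python's standard library does the matching for each level.

```python
from fnmatch import fnmatchcase

LEVELS = ("client", "page", "section", "component", "element", "action")

def matches(event_name, pattern):
    """Glob-match each level of client:page:section:component:element:action separately."""
    parts = event_name.split(":")
    pats = pattern.split(":")
    return len(parts) == len(pats) == len(LEVELS) and all(
        fnmatchcase(part, pat) for part, pat in zip(parts, pats)
    )

def search(event_names, pattern):
    return sorted(name for name in event_names if matches(name, pattern))

# Made-up event names in the six-level format.
events = [
    "web:home:home:-:-:impression",
    "web:home:wtf:-:-:impression",
    "web:home:wtf:-:-:click",
    "iphone:home:home:-:-:impression",
]
print(search(events, "web:home:*:*:*:impression"))
```

The sketch also shows the limitation listed above: the user must already know enough of the name to write the pattern, and each result still needs its own graph.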
See
How to visualize? Narrow down through the event hierarchy:
client : page : section : component : element : action
Interactions: search box => filter
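One way to support "narrow down" plus "search box => filter" is to store event names as a tree (one level per name part) and prune it by the search text while keeping the ancestors of every match. This is a minimal sketch under assumptions: the nested-dict tree, the substring match, and the names `build_tree` / `filter_tree` are illustrative, not the actual tool's implementation.

```python
def build_tree(event_names):
    """Nested dicts: client -> page -> section -> component -> element -> action."""
    tree = {}
    for name in event_names:
        node = tree
        for part in name.split(":"):
            node = node.setdefault(part, {})
    return tree

def filter_tree(tree, text):
    """Keep a branch if its label contains text or any descendant does."""
    kept = {}
    for label, child in tree.items():
        if text in label:
            kept[label] = child  # keep the whole subtree under a matching node
        else:
            sub = filter_tree(child, text)
            if sub:
                kept[label] = sub
    return kept

events = [
    "web:home:home:-:-:impression",
    "web:home:wtf:-:-:impression",
    "web:home:wtf:-:-:click",
    "iphone:home:home:-:-:impression",
]
tree = build_tree(events)
# "What are all sections under web:home?"
print(sorted(tree["web"]["home"]))
print(filter_tree(tree, "wtf"))
```

Reading the children of a node answers hierarchy questions directly, and the filtered tree is what a view would redraw as the user types.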
Funnel analysis
Example funnel steps: home page, profile page, search page
banana : home : - : - : - : impression
banana : profile : - : - : - : impression
banana : search : - : - : - : impression
Problem: all funnels must be specified manually (n jobs, n hours).
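A single manually specified funnel can be sketched as counting how many users reach each step in order. The assumptions here are loud: each user's events are already sorted by time in memory, and a step counts as reached if it occurs after the previous step (other events may happen in between); a production funnel might instead require consecutive steps or a time limit. Each new funnel means rerunning this over the full logs, which is where "n jobs, n hours" comes from.

```python
HOME = "banana:home:-:-:-:impression"
PROFILE = "banana:profile:-:-:-:impression"
SEARCH = "banana:search:-:-:-:impression"

def funnel(sequences, steps):
    """Number of users reaching each step of one funnel, in order."""
    reached = [0] * len(steps)
    for events in sequences.values():
        i = 0  # index of the next step this user must hit
        for event in events:
            if i < len(steps) and event == steps[i]:
                i += 1
        for k in range(i):  # the user reached steps 0..i-1
            reached[k] += 1
    return reached

# Made-up per-user event sequences, already in time order.
sequences = {
    "u1": [HOME, PROFILE, SEARCH],
    "u2": [HOME, SEARCH],
    "u3": [HOME, PROFILE],
    "u4": [PROFILE, HOME],
}
print(funnel(sequences, [HOME, PROFILE, SEARCH]))
```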
Goal
1 job => all funnels, visualized
Starting from, e.g., the home page (banana : home : - : - : - : impression) → …
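The "1 job => all funnels" idea can be approximated by counting every path prefix that starts at an anchor event in a single pass; any funnel beginning at the anchor can then be read off the counts without another job. This sketch assumes consecutive steps (a path is the events immediately following the user's first anchor event) and a hypothetical `max_len` cap on path length; short event names stand in for full six-level names.

```python
from collections import Counter

def all_funnels(sequences, anchor, max_len=3):
    """One pass over all users: counts[path] = number of users whose events,
    starting at their first anchor event, begin with exactly that path."""
    counts = Counter()
    for events in sequences.values():
        if anchor not in events:
            continue
        start = events.index(anchor)
        path = tuple(events[start:start + max_len])
        for k in range(1, len(path) + 1):  # count every prefix of the path
            counts[path[:k]] += 1
    return counts

# Made-up sequences; "home" etc. abbreviate names like banana:home:-:-:-:impression.
sequences = {
    "u1": ["home", "profile", "search"],
    "u2": ["home", "search"],
    "u3": ["home", "profile"],
    "u4": ["profile", "home"],
}
counts = all_funnels(sequences, "home")
print(counts[("home",)], counts[("home", "profile")], counts[("home", "profile", "search")])
```

The prefix counts form a tree rooted at the anchor, which is a natural shape to visualize as a flow or icicle diagram.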
Related work
• Visualize an overview of event sequences [Wongsuphasawat et al. 2011, Monroe et al. 2013, …]
• Big data? eBay checkout sequences [Shen et al. 2013]
Final process
1. Define the set of events
2. Pick alignment, direction and window size
3. Run Hadoop job (with more aggregation): gazillion patterns (TBs) → ~100,000 patterns (10MB)
4. Wait for it… (2+ hrs)
5. Visualize
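Steps 2 and 3 can be sketched as: for each user, cut a window of events on one side of an alignment event, count identical windows, then keep only the most frequent patterns and lump the rest together. The parameter names (`align`, `direction`, `window`, `top_n`) are hypothetical labels for the choices on the slide, the windows here include the anchor event itself, and the sample data is made up; the real Hadoop job's aggregation rules are not shown in the source.

```python
from collections import Counter

def extract_pattern(events, anchor, direction="after", window=3, align="first"):
    """Window of events on one side of the anchor (anchor included), or None if absent."""
    if anchor not in events:
        return None
    if align == "first":
        idx = events.index(anchor)
    else:  # align on the last occurrence of the anchor
        idx = len(events) - 1 - events[::-1].index(anchor)
    if direction == "after":
        return tuple(events[idx:idx + window])
    return tuple(events[max(0, idx - window + 1):idx + 1])

def aggregate_patterns(sequences, anchor, top_n, **opts):
    """Count patterns, keep the top_n, and report how many users were sacrificed."""
    counts = Counter()
    for events in sequences.values():
        pattern = extract_pattern(events, anchor, **opts)
        if pattern is not None:
            counts[pattern] += 1
    kept = dict(counts.most_common(top_n))
    other = sum(counts.values()) - sum(kept.values())
    return kept, other

sequences = {
    "u1": ["home", "profile", "search"],
    "u2": ["home", "search", "home", "profile"],
    "u3": ["login", "home", "profile"],
    "u4": ["search"],
}
print(aggregate_patterns(sequences, "home", top_n=1, window=2))
```

Capping at `top_n` is the "aggregate & sacrifice" trade-off in miniature: the long tail of rare patterns is reduced to a single count so the remainder fits in a browser.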
Summary
• Large-scale user activity logs + visual analytics
• Used in day-to-day operations at Twitter
• Generalizes to smaller systems
Challenge: big data → aggregate & sacrifice → small data → visualize & interact
Projects, revisited: Storytelling, Analytics Tools, Creative, plus
• Reusable Toolkits: to implement once and for all
Conclusions
• Data are everywhere.
• Many applications: journalism, product development, art, etc.
• Combine visualization with other skills: HCI, design, stats, ML, etc.
• Don’t repeat yourself.
Krist Wongsuphasawat / @kristw
interactive.twitter.com kristw.yellowpigz.com