7.
Krist Wongsuphasawat /@kristw
Computer Engineer
Bangkok, Thailand
M.S. in Computer Science
Univ. of Maryland
8.
Krist Wongsuphasawat /@kristw
Computer Engineer
Bangkok, Thailand
PhD in Computer Science
Univ. of Maryland
Information Visualization
9.
Krist Wongsuphasawat /@kristw
Computer Engineer
Bangkok, Thailand
PhD in Computer Science
Univ. of Maryland
Information Visualization
IBM
Microsoft
10.
Krist Wongsuphasawat /@kristw
Computer Engineer
Bangkok, Thailand
PhD in Computer Science
Univ. of Maryland
Information Visualization
IBM
Microsoft
Data Visualization Scientist
Twitter
15.
Krist Wongsuphasawat & Jimmy Lin
@kristw
Using visualizations
to monitor changes and harvest insights
from log data at Twitter
@lintool
IEEE VAST 2014
37.
Log data
in Hadoop
Engineers & Data Scientists
billions of rows
38.
Log data
in Hadoop
Aggregate
10,000+ event types
date      client  page  section  comp.  elem.  action      count
20141011  web     home  home     -      -      impression  100
20141011  web     home  wtf      -      -      click       20
Engineers & Data Scientists
Client event collection
39.
Log data
in Hadoop
Aggregate
10,000+ event types
date      client  page  section  comp.  elem.  action      count
20141011  web     home  home     -      -      impression  100
20141011  web     home  wtf      -      -      click       20
Engineers & Data Scientists
Client event collection
(Who-to-Follow)
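The aggregation step above can be sketched roughly as follows. This is a minimal in-memory illustration, not the Hadoop job used at Twitter: the `FIELDS` tuple, the `aggregate` function, and the dictionary-shaped raw events are all assumptions made for the example. It only shows how raw rows collapse into one count per date and six-level event name.

```python
from collections import Counter

# The six levels of a client event name, as listed on the slide.
FIELDS = ("client", "page", "section", "component", "element", "action")

def aggregate(raw_events):
    """Count raw events per (date, six-level event name).

    Missing levels are written as "-", matching the table on the slide.
    The input format (a list of dicts) is hypothetical.
    """
    counts = Counter()
    for event in raw_events:
        name = tuple(event.get(field) or "-" for field in FIELDS)
        counts[(event["date"], name)] += 1
    return counts

# Hypothetical raw log rows, far fewer than the billions in the real logs.
raw = [
    {"date": "20141011", "client": "web", "page": "home",
     "section": "home", "action": "impression"},
    {"date": "20141011", "client": "web", "page": "home",
     "section": "home", "action": "impression"},
    {"date": "20141011", "client": "web", "page": "home",
     "section": "wtf", "action": "click"},
]

daily = aggregate(raw)
for (date, name), count in sorted(daily.items()):
    print(date, " : ".join(name), count)
```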
40.
Log data
in Hadoop
Aggregate
Client event collection
Engineers & Data Scientists
41.
Log data
in Hadoop
Aggregate
Find
client page section component element action
Search
Client event collection
Engineers & Data Scientists
44.
client page section component element action
Search
Find
Log data
in Hadoop
Aggregate
web home * * impression*
Client event collection
Engineers & Data Scientists
45.
client page section component element action
Search
Find
Query
Return
Log data
in Hadoop
Results
web : home : home : - : - : impression
web : home : wtf : - : - : impression
Aggregate
web home * * impression*
Client event collection
Engineers & Data Scientists
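A wildcard search of this kind could be sketched as below. The `matches` and `search` helpers, the colon-separated pattern syntax, and the sample event list are assumptions for illustration; the slides do not say how the search is actually implemented. The sketch uses shell-style wildcards from Python's standard `fnmatch` module, applied level by level.

```python
from fnmatch import fnmatchcase

def matches(event_name, pattern):
    """Return True if every level of the event name matches the pattern.

    Both arguments are colon-separated strings with one entry per level.
    "*" matches any value, and "impression*" matches any action that
    starts with "impression".
    """
    event_parts = [part.strip() for part in event_name.split(":")]
    pattern_parts = [part.strip() for part in pattern.split(":")]
    if len(event_parts) != len(pattern_parts):
        return False
    return all(fnmatchcase(e, p) for e, p in zip(event_parts, pattern_parts))

def search(event_names, pattern):
    """Return the event names that match the pattern, in input order."""
    return [name for name in event_names if matches(name, pattern)]

# Hypothetical event names, modeled on the examples in the slides.
events = [
    "web : home : home : - : - : impression",
    "web : home : wtf : - : - : impression",
    "web : home : wtf : - : - : click",
    "web : profile : - : - : - : impression",
]

results = search(events, "web : home : * : * : * : impression*")
for name in results:
    print(name)
```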
46.
client page section component element action
Search
Find
Query
Return
Log data
in Hadoop
Results
web : home : home : - : - : impression
web : home : wtf : - : - : impression
Aggregate
search can be better
Client event collection
Engineers & Data Scientists
47.
client page section component element action
Search
Find
Query
Return
Log data
in Hadoop
Results
web : home : home : - : - : impression
web : home : wtf : - : - : impression
Aggregate
10,000+ event types
search can be better
Client event collection
Engineers & Data Scientists
48.
client page section component element action
Search
Find
Query
Return
Log data
in Hadoop
Results
web : home : home : - : - : impression
web : home : wtf : - : - : impression
Aggregate
search can be better
10,000+ event types
not everybody knows
What are all sections under web:home?
Client event collection
Engineers & Data Scientists
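A question such as "What are all sections under web:home?" becomes easy to answer once the event names are treated as a hierarchy. The following is a minimal sketch, assuming the names fit in memory; `build_tree`, `children`, and the sample events are hypothetical and are not taken from the talk.

```python
def build_tree(event_names):
    """Build a nested dict with one level per part of the event name."""
    root = {}
    for name in event_names:
        node = root
        for part in (p.strip() for p in name.split(":")):
            node = node.setdefault(part, {})
    return root

def children(tree, *prefix):
    """List the values found directly under the given prefix, sorted."""
    node = tree
    for part in prefix:
        node = node.get(part, {})
    return sorted(node)

# Hypothetical event names, modeled on the examples in the slides.
events = [
    "web : home : home : - : - : impression",
    "web : home : wtf : - : - : impression",
    "web : home : wtf : - : - : click",
    "web : profile : - : - : - : impression",
]

tree = build_tree(events)
sections = children(tree, "web", "home")
print(sections)
```

The same tree can back an overview visualization, since each node's children are exactly the options a user would otherwise have to know in advance.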
49.
client page section component element action
Search
Find
Query
Return
Log data
in Hadoop
Results
web : home : home : - : - : impression
Aggregate
search can be better
one graph / event
10,000+ event types
not everybody knows
What are all sections under web:home?
Client event collection
Engineers & Data Scientists
50.
client page section component element action
Search
Find
Query
Return
Log data
in Hadoop
Results
web : home : home : - : - : impression
Aggregate
search can be better
one graph / event
x 10,000
10,000+ event types
not everybody knows
What are all sections under web:home?
Client event collection
Engineers & Data Scientists
54.
Client event collection
Engineers & Data Scientists
55.
See
Client event collection
Engineers & Data Scientists
56.
See
Interactions
search box => filter
Client event collection
narrow down
Engineers & Data Scientists
57.
See
How to visualize?
narrow down
Client event collection
Engineers & Data Scientists
Interactions
search box => filter
58.
See
How to visualize?
narrow down
Client event collection
Engineers & Data Scientists
client : page : section : component : element : action
Interactions
search box => filter
73.
Funnel analysis
banana : home : - : - : - : impression
banana : profile : - : - : - : impression
banana : search : - : - : - : impression
home page
profile page
search page
Specify all funnels manually!
n jobs
n hours
74.
Goal
banana : home : - : - : - : impression
… ……
1 job => all funnels, visualized
home page
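The idea of computing all funnels in one pass, instead of one job per manually specified funnel, can be sketched as follows. This is an illustrative in-memory version under assumed inputs: `count_paths`, the `max_depth` parameter, and the session lists are hypothetical, and the real system runs as a Hadoop job. Each session contributes to every prefix of its page sequence, so the count for any funnel can be read from the result without a separate job.

```python
from collections import Counter

def count_paths(sessions, max_depth=3):
    """Count how many sessions start with each sequence of pages.

    Every prefix of a session (up to max_depth steps) is counted once,
    so the result covers all funnels of that depth at the same time.
    """
    counts = Counter()
    for session in sessions:
        for depth in range(1, min(len(session), max_depth) + 1):
            counts[tuple(session[:depth])] += 1
    return counts

# Hypothetical sessions, each an ordered list of visited pages.
sessions = [
    ["home", "profile"],
    ["home", "search", "profile"],
    ["home", "search"],
    ["home"],
]

paths = count_paths(sessions)
for path, count in sorted(paths.items()):
    print(" > ".join(path), count)
```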
75.
• Visualize an overview of event sequences
Related work
[Wongsuphasawat et al. 2011, Monroe et al. 2013, …]
76.
• Visualize an overview of event sequences
• Big data? eBay checkout sequences
One funnel at a time
Checkout > Payment > Confirm > Success
Related work
[Wongsuphasawat et al. 2011, Monroe et al. 2013, …]
[Shen et al. 2013]
109.
1. Define set of events
2. Pick alignment, direction and window size
3. Run Hadoop job (with more aggregation)
4. Wait for it… (2+ hrs)
5. Visualize
Final process
~100,000 patterns (10MB)
gazillion patterns (TBs)
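Steps 1 to 3 of the process above can be sketched on a small scale as follows. Everything here is an assumption for illustration: the function names, the choice to align on the first occurrence of the alignment event, and the convention that the window includes the alignment event itself. The slides do not specify these details, and the real aggregation runs as a Hadoop job.

```python
from collections import Counter

def extract_pattern(session, event_set, align_on, direction="after", window=3):
    """Return the window of events around the alignment event, or None.

    Events outside event_set are dropped first (step 1). The window
    starts or ends at the first occurrence of align_on (step 2).
    """
    events = [e for e in session if e in event_set]
    if align_on not in events:
        return None
    i = events.index(align_on)
    if direction == "after":
        return tuple(events[i:i + window])
    start = max(0, i - window + 1)
    return tuple(events[start:i + 1])

def count_patterns(sessions, event_set, align_on, direction="after", window=3):
    """Aggregate identical patterns into counts (step 3)."""
    counts = Counter()
    for session in sessions:
        pattern = extract_pattern(session, event_set, align_on, direction, window)
        if pattern is not None:
            counts[pattern] += 1
    return counts

# Hypothetical sessions and event set.
sessions = [
    ["home", "scroll", "search", "profile", "tweet"],
    ["login", "home", "search", "profile"],
    ["login", "profile"],
]
event_set = {"login", "home", "search", "profile"}

after = count_patterns(sessions, event_set, "home", "after", window=3)
before = count_patterns(sessions, event_set, "home", "before", window=2)
print(after)
print(before)
```

Restricting the event set and the window size is what keeps the number of distinct patterns small enough to visualize.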
111.
• Since Jan 2013
• Fewer users, but more in-depth ad-hoc analysis
• Initial meeting to provide support
Deployment
112.
• What did users do when they visit Twitter? (in demo)
• Where did users give up in the sign up process?
• more in the paper
Case studies
113.
• Large-scale User Activity Logs + Visual Analytics
Conclusions & Future work
114.
• Large-scale User Activity Logs + Visual Analytics
• Find, Monitor & Explore
+ Anomaly detection & automatic alert
• Funnel Analysis
+ More interactivity & data / reduce wait time
• Used in day-to-day operations at Twitter
Conclusions & Future work
115.
Conclusions & Future work
Challenge
big data
small data
visualize & interact
• Large-scale User Activity Logs + Visual Analytics
• Find, Monitor & Explore
+ Anomaly detection & automatic alert
• Funnel Analysis
+ More interactivity & data / reduce wait time
• Used in day-to-day operations at Twitter
aggregate
& sacrifice
116.
• Large-scale User Activity Logs + Visual Analytics
• Find, Monitor & Explore
+ Anomaly detection & automatic alert
• Funnel Analysis
+ More interactivity & data / reduce wait time
• Used in day-to-day operations at Twitter
• Generalize to smaller systems
Conclusions & Future work
Challenge
big data
small data
visualize & interact
aggregate
& sacrifice
117.
• Data Scientists & Engineers @Twitter — Linus Lee, Chuang Liu
• Feedback from reviewers, Ben Shneiderman & Catherine Plaisant
Acknowledgement
118.
Conclusions & Future work
kristw@twitter.com / @kristw