Using visualizations 
to monitor changes and harvest insights 
from log data at Twitter 
Krist Wongsuphasawat & Jimmy Lin 
@kristw 
@lintool
Logging user activities 
& data analysis
Use Users 
Curious 
Log data 
in Hadoop Write Twitter 
Instrument 
Engineers 
Product Managers
What is being logged? 
activities 
tweet 
tweet from home timeline on twitter.com 
tweet from search page on iPhone 
sign up 
log in 
retweet 
etc.
Organize?
log event a.k.a. “client event” 
1) User ID 
2) Timestamp 
3) Event name 
client : page : section : component : element : action 
web : home : timeline : tweet_box : button : tweet 
4) Event detail 
[Lee et al. 2012]
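A minimal sketch of the four-part client event above. The class and field names (`EventName`, `ClientEvent`, `user_id`, `detail`) are illustrative assumptions, not Twitter's actual schema; only the six-part event name comes from the slide.

```python
from dataclasses import dataclass

# The six levels of an event name, in the order shown on the slide.
FIELDS = ("client", "page", "section", "component", "element", "action")


@dataclass(frozen=True)
class EventName:
    client: str
    page: str
    section: str
    component: str
    element: str
    action: str

    @classmethod
    def parse(cls, name: str) -> "EventName":
        """Split 'client : page : ... : action' into its six parts."""
        parts = [part.strip() for part in name.split(":")]
        if len(parts) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} parts, got {len(parts)}")
        return cls(*parts)


@dataclass(frozen=True)
class ClientEvent:
    user_id: int      # 1) User ID
    timestamp: int    # 2) Timestamp (epoch seconds, an assumption)
    name: EventName   # 3) Event name
    detail: dict      # 4) Event detail (free-form here, an assumption)


name = EventName.parse("web : home : timeline : tweet_box : button : tweet")
event = ClientEvent(user_id=1, timestamp=1413000000, name=name, detail={})
```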
Log data
Use Users 
Curious 
Log data 
in Hadoop Twitter 
Instrument 
Engineers 
Write 
Product Managers 
bigger than 
Tweet data
Users 
Log data 
in Hadoop 
Find, Clean, Analyze 
Data Scientists Engineers 
Use 
Monitor 
Ask 
Curious 
1 2 
Twitter 
Instrument 
Write 
Product Managers
Part I 
Find & Monitor 
Client Events
Motivation
Log data 
in Hadoop 
Engineers & Data Scientists 
billions of rows
Log data 
in Hadoop 
Aggregate 
Client event collection 
10,000+ event types 
date      client  page  section  comp.  elem.  action      count 
20141011  web     home  home     -      -      impression  100 
20141011  web     home  wtf      -      -      click       20 
(wtf = Who-to-Follow) 
Engineers & Data Scientists
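The aggregation step can be pictured as a group-and-count over (date, event name). This is a toy in-memory sketch, assuming raw log rows are already reduced to `(date, event_name)` pairs; the real aggregation runs on Hadoop, and the function name `daily_counts` is made up for illustration.

```python
from collections import Counter


def daily_counts(rows):
    """Count occurrences of each (date, event name) pair.

    rows: iterable of (date, event_name) tuples taken from the raw log.
    """
    return Counter((date, name) for date, name in rows)


rows = [
    ("20141011", "web:home:home:-:-:impression"),
    ("20141011", "web:home:home:-:-:impression"),
    ("20141011", "web:home:wtf:-:-:click"),
]
counts = daily_counts(rows)
for (date, name), count in sorted(counts.items()):
    print(date, name, count)
```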
Log data 
in Hadoop 
Aggregate 
Client event collection 
client page section component element action 
Find 
Search 
Engineers & Data Scientists
section? 
component? 
element?
Client event collection 
Search 
Query 
client page section component element action 
Find 
Aggregate 
Return 
Log data 
in Hadoop 
Results 
web : home : home : - : - : impression 
web : home : wtf : - : - : impression 
web home * * * impression 
Engineers & Data Scientists
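A wildcard query such as `web home * * * impression` can be sketched as a level-by-level comparison. The matching rule here (`*` matches any single level) is an assumption based on the example query and its results; the function names are hypothetical.

```python
def matches(pattern: str, event_name: str) -> bool:
    """Return True if every level of the event name matches the pattern.

    pattern: six space-separated levels, where '*' matches anything.
    event_name: six levels separated by ':'.
    """
    wanted = pattern.split()
    actual = [level.strip() for level in event_name.split(":")]
    if len(wanted) != len(actual):
        return False
    return all(w == "*" or w == a for w, a in zip(wanted, actual))


def search(pattern: str, event_names):
    """Filter the client event collection down to the matching names."""
    return [name for name in event_names if matches(pattern, name)]


collection = [
    "web : home : home : - : - : impression",
    "web : home : wtf : - : - : impression",
    "web : home : wtf : - : - : click",
    "iphone : home : - : - : - : impression",
]
results = search("web home * * * impression", collection)
```

With the sample collection, `results` holds the two `web : home` impression events from the slide.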
Client event collection 
Search 
Query 
client page section component element action 
Find 
Aggregate 
Return 
Log data 
in Hadoop 
Results 
web : home : home : - : - : impression 
search can be better 
one graph / event 
x 10,000 
10,000+ event types 
not everybody knows 
What are all sections under web:home? 
Engineers & Data Scientists
Goals 
• Search for client events 
• Explore client event collection 
• Monitor changes
Related work 
• Session analysis 
[Lam et al. 2007, Shen et al. 2013] 
• Monitor network logs, not user activity logs 
[Ghoniem et al. 2013]
Design
Interactions client : page : section : component : element : action 
search box => filter 
See 
How to visualize? 
narrow down 
Client event collection 
Engineers & Data Scientists
Client event hierarchy 
iphone:home:-:-:-:impression 
iphone:home:-:tweet:tweet:click 
iphone home - 
- - impression 
tweet tweet click
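The hierarchy can be sketched as a nested dictionary with one level per part of the event name. This is an illustrative structure, not necessarily how Scribe Radar stores it; `build_hierarchy` is a made-up name.

```python
def build_hierarchy(event_names):
    """Build a nested dict: client -> page -> section -> ... -> action."""
    root = {}
    for name in event_names:
        node = root
        for level in name.split(":"):
            # Descend into the child for this level, creating it if needed.
            node = node.setdefault(level, {})
    return root


tree = build_hierarchy([
    "iphone:home:-:-:-:impression",
    "iphone:home:-:tweet:tweet:click",
])
# The two events share the prefix iphone > home > - and then branch.
```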
Detect changes 
TODAY compared to 7 DAYS AGO 
iphone home - 
- - impression 
tweet tweet click 
Calculate changes 
+5% +5% +5% 
+10% +10% +10% 
-5% -5% -5% 
DIFF
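The DIFF can be sketched as a percentage change per event between today's count and the count from 7 days ago. Skipping events with no earlier count is an assumption made here to avoid dividing by zero; the slides do not say how new events are handled.

```python
def percent_change(today: dict, week_ago: dict) -> dict:
    """Return {event name: % change} for events present on both days."""
    changes = {}
    for name, now in today.items():
        before = week_ago.get(name)
        if before:  # skip events that are new or had a zero count
            changes[name] = 100.0 * (now - before) / before
    return changes


today = {"iphone:home:-:-:-:impression": 105,
         "iphone:home:-:tweet:tweet:click": 95}
week_ago = {"iphone:home:-:-:-:impression": 100,
            "iphone:home:-:tweet:tweet:click": 100}
diff = percent_change(today, week_ago)
```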
Display changes 
iphone home - 
- - impression 
tweet tweet click 
Map of the Market [Wattenberg 1999], StemView [Guerra-Gomez et al. 2013]
Demo 
Scribe Radar
Twitter for Banana
Deployment 
• Since Dec 2013 
• 500 unique users, 10 users / day 
• No training
Use cases 
Users: PMs, Data Scientists, Engineers 
• Search 
• Monitor 
• See effects after major product launch 
read the paper :)
Part II 
Analysis
Count page visits 
home page 
banana : home : - : - : - : impression
Funnel 
home page 
profile page
Funnel analysis 
home page: banana : home : - : - : - : impression 
profile page: banana : profile : - : - : - : impression 
1 job, 1 hour
Funnel analysis 
home page: banana : home : - : - : - : impression 
profile page: banana : profile : - : - : - : impression 
search page: banana : search : - : - : - : impression 
2 jobs, 2 hours 
Specify all funnels manually! 
n jobs, n hours
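A single manually specified funnel can be sketched as follows: for each session, count how far it gets through an ordered list of steps. This in-memory version only illustrates the logic of one funnel job; the session format and the name `funnel_counts` are assumptions.

```python
def funnel_counts(sessions, steps):
    """Count sessions reaching each funnel step, in order.

    sessions: list of event-name lists, one list per session.
    steps: ordered event names defining the funnel.
    """
    counts = [0] * len(steps)
    for session in sessions:
        position = 0  # index of the next step this session must reach
        for event in session:
            if position < len(steps) and event == steps[position]:
                counts[position] += 1
                position += 1
    return counts


sessions = [
    ["home", "profile"],
    ["home", "search"],
    ["home"],
]
home_to_profile = funnel_counts(sessions, ["home", "profile"])
home_to_search = funnel_counts(sessions, ["home", "search"])
```

Each additional funnel needs its own call, which mirrors the "n jobs, n hours" problem.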
Goal 
home page 
banana : home : - : - : - : impression 
… … … 
1 job => all funnels, visualized
Related work 
• Visualize an overview of event sequences 
[Wongsuphasawat et al. 2011, Monroe et al. 2013, …] 
• Big data? eBay checkout sequences 
[Shen et al. 2013] 
One funnel at a time 
Checkout > Payment > Confirm > Success
LifeFlow (simplified) 
[CHI2011]
User sessions 
Session#1: start A B end 
Session#2: start A B end 
Session#3: start A C end 
Session#4: start A end
Aggregate 
4 sessions 
start 
A 
B C end 
end end
Aggregate 
4,000,000 sessions 
start 
A 
B C end 
end end
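The aggregation can be sketched as a prefix tree in which sessions sharing the same leading events share nodes, and each node counts the sessions passing through it. This is a simplified reading of the LifeFlow idea, not the original implementation.

```python
def aggregate_sessions(sessions):
    """Merge sessions into a prefix tree with per-node session counts."""
    root = {"count": 0, "children": {}}
    for session in sessions:
        node = root
        node["count"] += 1
        for event in session:
            children = node["children"]
            # Reuse the child for this event if an earlier session made it.
            node = children.setdefault(event, {"count": 0, "children": {}})
            node["count"] += 1
    return root


sessions = [
    ["start", "A", "B", "end"],
    ["start", "A", "B", "end"],
    ["start", "A", "C", "end"],
    ["start", "A", "end"],
]
tree = aggregate_sessions(sessions)
after_a = tree["children"]["start"]["children"]["A"]["children"]
# after_a has three branches: B (2 sessions), C (1) and end (1).
```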
try with sample data 
(~millions of sessions, 10,000+ event types) 
original paper 
(100,000 sessions, ~10 event types)
not meaningful 
small slice of data 
but huge file
How to make it work?
# of unique sequences
Reduce # of unique sequences 
1. Reduce event types 
10,000 types 
select: tweet, sign up, log out 
merge: tweet from home timeline, tweet from search page, tweet … = tweet 
2. Reduce sequence length 
session: 1000 events 
10 events after (window size & direction) visit home page (alignment) 
} 1 & 2: Ask users for input 
3. More aggregation on Hadoop
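Steps 1 and 2 can be sketched on a single session: a user-supplied map selects and merges event types, then the session is cut to a window of events after an alignment event. The map contents, the "after" direction and the function names are illustrative assumptions.

```python
def reduce_events(session, merge_map):
    """Step 1: keep only selected events, renamed to their merged type."""
    return [merge_map[event] for event in session if event in merge_map]


def window_after(session, alignment, size):
    """Step 2: up to `size` events after the first `alignment` event.

    Returns None when the session never contains the alignment event.
    """
    if alignment not in session:
        return None
    start = session.index(alignment) + 1
    return session[start:start + size]


merge_map = {
    "visit home page": "home",
    "tweet from home timeline": "tweet",
    "tweet from search page": "tweet",
    "sign up": "sign up",
}
session = ["sign up", "scroll", "visit home page",
           "tweet from home timeline", "hover", "tweet from search page"]
reduced = reduce_events(session, merge_map)
windowed = window_after(reduced, alignment="home", size=10)
```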
Collapse events 
e.g. tweet, tweet, tweet, … = tweet 
Sequence  => Collapsed 
ABBBCCCC  => ABC 
ABBCC     => ABC 
ABC       => ABC 
ABCCCC    => ABC 
ABCD      => ABCD 
ABCCCD    => ABCD 
ABCCE     => ABCE 
ABCDF     => ABCDF 
ABCDG     => ABCDG 
ABCDH     => ABCDH
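Collapsing consecutive repeats can be sketched with `itertools.groupby`, which groups adjacent equal items. Sequences are shown as strings of single-letter events only for brevity.

```python
from itertools import groupby


def collapse(sequence):
    """Replace each run of identical consecutive events with one event."""
    return [event for event, _run in groupby(sequence)]


collapsed = "".join(collapse("ABBBCCCC"))
assert collapsed == "ABC"
```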
Group & Count 
Sequence  Count 
ABC       2000 
ABCD      80 
ABCE      20 
ABCDF     1 
ABCDG     1 
ABCDH     1 
ABCDI     1 
ABCDJK    1 
ABCDJL    1 
ABCDF … ABCDJL: rare sequences (count < threshold)
Truncate 
Replace last event with x (…) 
Sequence  Count 
ABC       2000 
ABCD      80 
ABCE      20 
ABCDx     1 
ABCDx     1 
ABCDx     1 
ABCDx     1 
ABCDJx    1 
ABCDJx    1
Group & Count 
Sequence  Count 
ABC       2000 
ABCD      80 
ABCE      20 
ABCDx     4 
ABCDJx    2
Truncate more 
Sequence  Count 
ABC       2000 
ABCD      80 
ABCE      20 
ABCDx     4 
ABCDx     2
Group & Count 
Sequence  Count 
ABC       2000 
ABCD      80 
ABCE      20 
ABCDx     6
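The group, truncate and regroup loop can be sketched as below. The threshold of 3 and the rule for truncating a sequence that already ends in x (drop the event before x) are assumptions inferred from the example counts; the sketch also assumes no real event is named "x". The actual job runs on Hadoop.

```python
from collections import Counter

RARE = "x"  # placeholder for "something else happened" (assumed name)


def truncate(seq):
    """Shorten a rare sequence by one real event, ending it with x."""
    if seq[-1] == RARE:
        return seq[:-2] + (RARE,)   # ABCDJx -> ABCDx
    return seq[:-1] + (RARE,)       # ABCDF  -> ABCDx


def aggregate_rare(counts, threshold):
    """Repeatedly truncate and regroup sequences with count < threshold."""
    counts = Counter(counts)
    while True:
        rare = [s for s, c in counts.items() if c < threshold and len(s) > 1]
        if not rare:
            return counts
        for seq in rare:
            count = counts.pop(seq)
            counts[truncate(seq)] += count


raw = {tuple(s): c for s, c in [
    ("ABC", 2000), ("ABCD", 80), ("ABCE", 20),
    ("ABCDF", 1), ("ABCDG", 1), ("ABCDH", 1), ("ABCDI", 1),
    ("ABCDJK", 1), ("ABCDJL", 1),
]}
result = aggregate_rare(raw, threshold=3)
for seq, count in result.items():
    print("".join(seq), count)
```

With these inputs the loop runs two truncation rounds and ends with ABCDx at a count of 6, matching the tables above.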
Final process 
1. Define set of events 
2. Pick alignment, direction and window size 
3. Run Hadoop job (with more aggregation) 
4. Wait for it… (2+ hrs) 
5. Visualize 
gazillion patterns (TBs) => ~100,000 patterns (10MB)
Demo 
Flying Sessions
Deployment 
• Since Jan 2013 
• Fewer users, but more in-depth ad-hoc analysis 
• Initial meeting to provide support
Case studies 
• What did users do when they visit Twitter? (in demo) 
• Where did users give up in the sign up process? 
click on “sign up”, fill personal info, import address book, etc. 
• more in the paper 
read the paper :)
Conclusions & Future work 
• Large-scale User Activity Logs + Visual Analytics 
• Find, Monitor & Explore 
+ Anomaly detection & automatic alert 
• Funnel Analysis 
+ More interactivity & data / reduce wait time / latency study? 
• Used in day-to-day operations at Twitter 
• Generalize to smaller systems 
Challenge 
big data => aggregate & sacrifice => small data => visualize & interact
Acknowledgement 
• Data Scientists & Engineers @Twitter — Linus Lee, Chuang Liu 
• Feedback from reviewers, Ben Shneiderman & Catherine Plaisant
kristw@twitter.com / @kristw
Questions?
Thank you

Using Visualizations to Monitor Changes and Harvest Insights from a Global-scale Logging Infrastructure at Twitter
