Using Visualizations to Monitor Changes and Harvest Insights from a Global-scale Logging Infrastructure at Twitter

Slides from my talk at the IEEE Conference on Visual Analytics Science and Technology (VAST) 2014 in Paris, France.

ABSTRACT
Logging user activities is essential to data analysis for internet products and services.
Twitter has built a unified logging infrastructure that captures user activities across all clients it owns, making it one of the largest datasets in the organization.
This paper describes challenges and opportunities in applying information visualization to log analysis at this massive scale, and shows how various visualization techniques can be adapted to help data scientists extract insights.
In particular, we focus on two scenarios: (1) monitoring and exploring a large collection of log events, and (2) performing visual funnel analysis on log data with tens of thousands of event types.
Two interactive visualizations were developed for these purposes.
We discuss design choices and the implementation of these systems, along with case studies of how they are being used in day-to-day operations at Twitter.

Published in: Data & Analytics

  1. 1. Using visualizations to monitor changes and harvest insights from log data at Twitter Krist Wongsuphasawat & Jimmy Lin @kristw @lintool
  2. 2. Logging user activities & data analysis
  3. 3. Twitter Use Users
  4. 4. Use Users Curious Twitter Product Managers
  5. 5. Use Users Curious Log data in Hadoop Write Twitter Instrument Engineers Product Managers
  6. 6. What is being logged? activities tweet
  7. 7. What is being logged? activities tweet from home timeline on twitter.com tweet from search page on iPhone
  8. 8. What is being logged? activities tweet from home timeline on twitter.com tweet from search page on iPhone sign up log in retweet etc.
  9. 9. Organize?
  10. 10. log event a.k.a. “client event” [Lee et al. 2012]
  11. 11. log event a.k.a. “client event” 1) User ID 2) Timestamp 3) Event name client : page : section : component : element : action web : home : timeline : tweet_box : button : tweet 4) Event detail [Lee et al. 2012]
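The six-part event name on slide 11 can be handled as a small record. The following is a minimal Python sketch, assuming names are colon-separated with optional spaces around each colon; `ClientEventName` and `parse_event_name` are illustrative names chosen here, not part of Twitter's logging code.

```python
from typing import NamedTuple


class ClientEventName(NamedTuple):
    """The six levels of a client event name, from broadest to most specific."""
    client: str
    page: str
    section: str
    component: str
    element: str
    action: str


def parse_event_name(name: str) -> ClientEventName:
    """Split 'client:page:section:component:element:action' into its parts."""
    parts = [part.strip() for part in name.split(":")]
    if len(parts) != 6:
        raise ValueError(f"expected 6 colon-separated parts, got {len(parts)}")
    return ClientEventName(*parts)


# Example taken from the slide.
event = parse_event_name("web : home : timeline : tweet_box : button : tweet")
```

Here `event.client` is `"web"` and `event.action` is `"tweet"`; unused levels would simply hold `"-"`, as in the event names shown later in the talk.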
  12. 12. Log data
  13. 13. Use Users Curious Log data in Hadoop Twitter Instrument Engineers Write Product Managers bigger than Tweet data
  14. 14. Use Users Curious Engineers Log data in Hadoop Data Scientists Ask Twitter Instrument Write Product Managers
  15. 15. Use Users Curious Engineers Log data in Hadoop Find Data Scientists Ask Twitter Instrument Write Product Managers
  16. 16. Log data
  17. 17. Use Users Curious Engineers Log data in Hadoop Find, Clean Data Scientists Ask Twitter Instrument Write Product Managers
  18. 18. Use Users Curious Engineers Log data in Hadoop Find, Clean Data Scientists Monitor Ask Twitter Instrument Write Product Managers
  19. 19. Use Users Curious Engineers Log data in Hadoop Find, Clean, Analyze Data Scientists Monitor Ask Twitter Instrument Write Product Managers
  20. 20. Log data Users in Hadoop Find, Clean, Analyze Data Scientists Engineers Use Monitor Ask Curious 1 2 Twitter Instrument Write Product Managers
  21. 21. Part I Find & Monitor Client Events
  22. 22. Motivation
  23. 23. Log data in Hadoop Engineers & Data Scientists billions of rows
  24. 24. Log data in Hadoop Aggregate Client event collection 10,000+ event types Engineers & Data Scientists
        date      client  page  section  comp.  elem.  action      count
        20141011  web     home  home     -      -      impression  100
        20141011  web     home  wtf      -      -      click       20
  25. 25. Log data in Hadoop Aggregate Client event collection 10,000+ event types Engineers & Data Scientists
        date      client  page  section  comp.  elem.  action      count
        20141011  web     home  home     -      -      impression  100
        20141011  web     home  wtf      -      -      click       20
        (wtf = Who-to-Follow)
  26. 26. Log data in Hadoop Aggregate Client event collection Engineers & Data Scientists
  27. 27. Log data in Hadoop Aggregate Client event collection client page section component element action Find Search Engineers & Data Scientists
  28. 28. Log data in Hadoop Aggregate Client event collection client page section component element action Find Search Engineers & Data Scientists
  29. 29. section? component? element?
  30. 30. Client event collection Search client page section component element action Find Log data in Hadoop Aggregate web home * * * impression Engineers & Data Scientists
  31. 31. Client event collection Search Query client page section component element action Find Aggregate Return Log data in Hadoop Results web : home : home : - : - : impression web : home : wtf : - : - : impression web home * * * impression Engineers & Data Scientists
  32. 32. Client event collection Search Query client page section component element action Find Aggregate Return Log data in Hadoop Results web : home : home : - : - : impression web : home : wtf : - : - : impression search can be better Engineers & Data Scientists
  33. 33. Client event collection Search Query client page section component element action Find Aggregate Return Log data in Hadoop Results web : home : home : - : - : impression web : home : wtf : - : - : impression 10,000+ event types search can be better Engineers & Data Scientists
  34. 34. Client event collection 10,000+ event types What are all sections under web:home? Search Query not everybody knows client page section component element action Find Aggregate Return Log data in Hadoop Results web : home : home : - : - : impression web : home : wtf : - : - : impression search can be better Engineers & Data Scientists
  35. 35. Client event collection Search Query client page section component element action Find Aggregate Return Log data in Hadoop Results web : home : home : - : - : impression search can be better one graph / event 10,000+ event types not everybody knows What are all sections under web:home? Engineers & Data Scientists
  36. 36. Client event collection Search Query client page section component element action Find Aggregate Return Log data in Hadoop Results web : home : home : - : - : impression search can be better one graph / event x 10,000 10,000+ event types not everybody knows What are all sections under web:home? Engineers & Data Scientists
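The wildcard query shown on slides 30 and 31 (`web home * * * impression`) could be evaluated along the following lines. This is a sketch, not the actual search implementation: `matches`, `search`, and `sections_under` are hypothetical helpers, and the event list contains only names that appear on the slides.

```python
def matches(query: str, event_name: str) -> bool:
    """True if every non-wildcard query field equals the event's field."""
    fields = query.split()
    parts = event_name.split(":")
    return len(fields) == len(parts) and all(
        field == "*" or field == part for field, part in zip(fields, parts)
    )


def search(query: str, events: list[str]) -> list[str]:
    """Return the event names that satisfy the query, in input order."""
    return [name for name in events if matches(query, name)]


def sections_under(client: str, page: str, events: list[str]) -> list[str]:
    """Answer 'What are all sections under client:page?' from the collection."""
    return sorted(
        {name.split(":")[2] for name in events if name.split(":")[:2] == [client, page]}
    )


EVENTS = [
    "web:home:home:-:-:impression",
    "web:home:wtf:-:-:impression",
    "web:home:wtf:-:-:click",
]

hits = search("web home * * * impression", EVENTS)
sections = sections_under("web", "home", EVENTS)
```

The second helper illustrates why a flat search box falls short: it answers the "not everybody knows" question only if the user already knows which level to ask about.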
  37. 37. Goals • Search for client events • Explore client event collection • Monitor changes
  38. 38. Related work • Session analysis [Lam et al. 2007, Shen et al. 2013] • Monitor network logs, not user activity logs [Ghoniem et al. 2013]
  39. 39. Design
  40. 40. Client event collection Engineers & Data Scientists
  41. 41. Client event collection See Engineers & Data Scientists
  42. 42. narrow down See Interactions search box => filter Client event collection Engineers & Data Scientists
  43. 43. See How to visualize? narrow down Client event collection Engineers & Data Scientists Interactions search box => filter
  44. 44. Interactions client : page : section : component : element : action search box => filter See How to visualize? narrow down Client event collection Engineers & Data Scientists
  45. 45. Client event hierarchy iphone:home:-:-:-:impression iphone:home:-:tweet:tweet:click iphone home - - - impression tweet tweet click
  46. 46. Detect changes iphone home - - - impression tweet tweet click iphone home - - - impression tweet tweet click TODAY 7 DAYS AGO compared to
  47. 47. Calculate changes +5% +5% +5% +10% +10% +10% -5% -5% -5% DIFF
  48. 48. Display changes iphone home - - - impression tweet tweet click Map of the Market [Wattenberg 1999], StemView [Guerra-Gomez et al. 2013]
  49. 49. Display changes home - - - impression tweet tweet click iphone
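Slides 45 to 47 (hierarchy, comparison with 7 days ago, DIFF) can be sketched as a roll-up followed by a percent change per node. The sketch assumes that a node's count is the sum of the counts of the events beneath it, which the slides do not state; the counts are made-up illustration values, and `roll_up` and `percent_changes` are hypothetical names.

```python
from collections import defaultdict


def roll_up(counts: dict[str, int]) -> dict[tuple[str, ...], int]:
    """Sum event counts into every prefix of the event name (each tree node)."""
    totals: dict[tuple[str, ...], int] = defaultdict(int)
    for name, count in counts.items():
        parts = tuple(name.split(":"))
        for depth in range(1, len(parts) + 1):
            totals[parts[:depth]] += count
    return dict(totals)


def percent_changes(today: dict[str, int], week_ago: dict[str, int]) -> dict[tuple[str, ...], float]:
    """Percent change of each node relative to the same node 7 days ago.

    Nodes that did not exist 7 days ago are skipped, since the change
    relative to zero is undefined.
    """
    now, before = roll_up(today), roll_up(week_ago)
    return {
        node: 100.0 * (now.get(node, 0) - old) / old
        for node, old in before.items()
        if old > 0
    }


# Illustrative counts only; the event names come from slide 45.
TODAY = {"iphone:home:-:-:-:impression": 1050, "iphone:home:-:tweet:tweet:click": 220}
WEEK_AGO = {"iphone:home:-:-:-:impression": 1000, "iphone:home:-:tweet:tweet:click": 200}

changes = percent_changes(TODAY, WEEK_AGO)
```

Each entry of `changes` corresponds to one node in the hierarchy, so the values can be mapped directly to a color scale in the display.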
  50. 50. Demo Scribe Radar
  51. 51. Twitter for Banana
  52. 52. Deployment • Since Dec 2013 • 500 unique users, 10 users / day • No training
  53. 53. Use cases Users: PMs, Data Scientists, Engineers • Search • Monitor • See effects after major product launch read the paper :)
  54. 54. Part II Analysis
  55. 55. Count page visits home page banana : home : - : - : - : impression
  56. 56. Funnel home page profile page
  57. 57. Funnel analysis banana : home : - : - : - : impression banana : profile : - : - : - : impression home page 1 job profile page 1 hour
  58. 58. Funnel analysis home page banana : home : - : - : - : impression profile page search page 2 jobs 2 hours banana : profile : - : - : - : impression banana : search : - : - : - : impression
  59. 59. Funnel analysis home page banana : home : - : - : - : impression profile page search page banana : profile : - : - : - : impression banana : search : - : - : - : impression Specify all funnels manually! n jobs n hours
  60. 60. Goal home page banana : home : - : - : - : impression … … … 1 job => all funnels, visualized
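For reference, a single manually specified funnel of the kind shown on slides 56 to 59 can be counted as below. This is a sketch under assumptions: a session counts toward a step if the steps occur in order, not necessarily back to back, and the sessions are invented for illustration. `funnel_counts` is a hypothetical name; the real jobs ran on Hadoop.

```python
def funnel_counts(sessions: list[list[str]], steps: list[str]) -> list[int]:
    """For each funnel step, count the sessions that reached it in order."""
    counts = [0] * len(steps)
    for events in sessions:
        position = 0  # index of the next funnel step this session must hit
        for event in events:
            if position < len(steps) and event == steps[position]:
                counts[position] += 1
                position += 1
    return counts


HOME = "banana:home:-:-:-:impression"
PROFILE = "banana:profile:-:-:-:impression"
SEARCH = "banana:search:-:-:-:impression"

# Invented sessions, each a time-ordered list of event names.
SESSIONS = [
    [HOME, PROFILE, SEARCH],
    [HOME, SEARCH],
    [PROFILE, HOME],
    [SEARCH],
]

home_to_profile = funnel_counts(SESSIONS, [HOME, PROFILE])
```

Every new funnel needs another call with a different `steps` list, which mirrors the "n jobs, n hours" problem that motivates computing all funnels in one job.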
  61. 61. Related work • Visualize an overview of event sequences [Wongsuphasawat et al. 2011, Monroe et al. 2013, …]
  62. 62. Related work • Visualize an overview of event sequences [Wongsuphasawat et al. 2011, Monroe et al. 2013, …] • Big data? eBay checkout sequences [Shen et al. 2013] One funnel at a time Checkout > Payment > Confirm > Success
  63. 63. LifeFlow [CHI2011] (simplified)
  64. 64. User sessions Session#1 start A B end Session#4 start A end Session#2 start A B end Session#3 start A C end
  65. 65. Aggregate 4 sessions A start A A B B C end end end A end
  66. 66. Aggregate start A B B C end end end end 4 sessions
  67. 67. Aggregate C start end end end end A B 4 sessions
  68. 68. Aggregate C start end end end end A B 4 sessions
  69. 69. Aggregate 4 sessions B C end start A end end end
  70. 70. Aggregate 4 sessions B C end start A end end
  71. 71. Aggregate 4 sessions B C end start A end end
  72. 72. Aggregate 4 sessions start A B C end end end
  73. 73. Aggregate 4,000,000 sessions start A B C end end end
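The aggregation animated on slides 64 to 72 amounts to merging sessions into a prefix tree in which each node counts the sessions passing through it. The following is a minimal sketch using the four sessions from slide 64; the nested-dict layout and the name `aggregate` are choices made here, not the LifeFlow implementation.

```python
def aggregate(sessions: list[list[str]]) -> dict:
    """Merge sessions into a prefix tree; each node counts sessions through it."""
    root = {"count": 0, "children": {}}
    for events in sessions:
        root["count"] += 1
        node = root
        for event in events:
            node = node["children"].setdefault(event, {"count": 0, "children": {}})
            node["count"] += 1
    return root


# The four sessions from slide 64.
SESSIONS = [
    ["start", "A", "B", "end"],
    ["start", "A", "B", "end"],
    ["start", "A", "C", "end"],
    ["start", "A", "end"],
]

tree = aggregate(SESSIONS)
node_a = tree["children"]["start"]["children"]["A"]
```

In this tree `node_a` has count 4 and three children (B with 2, C with 1, end with 1), matching the bar heights in the aggregated view. The number of nodes grows with the number of distinct sequences, not the number of sessions, which is why the later slides focus on reducing unique sequences.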
  74. 74. Try with sample data (millions of sessions, 10,000+ event types) vs. original paper (100,000 sessions, ~10 event types)
  75. 75. Not meaningful: small slice of data but huge file
  76. 76. How to make it work?
  77. 77. # of unique sequences
  78. 78. Reduce # of unique sequences 1. Reduce event types
  79. 79. Reduce # of unique sequences 1. Reduce event types 10,000 types select tweet sign up log out
  80. 80. Reduce # of unique sequences 1. Reduce event types 10,000 types select tweet sign up log out
  81. 81. Reduce # of unique sequences 1. Reduce event types 10,000 types select merge tweet from home timeline tweet from search page tweet … = tweet
  82. 82. Reduce # of unique sequences 1. Reduce event types 2. Reduce sequence length
  83. 83. Reduce # of unique sequences 1. Reduce event types 2. Reduce sequence length session 1000 events
  84. 84. Reduce # of unique sequences 1. Reduce event types 2. Reduce sequence length: session of 1000 events => 10 events (window size) after (direction) visit home page (alignment)
  85. 85. Reduce # of unique sequences 1. Reduce event types 2. Reduce sequence length } Ask users for input
  86. 86. Reduce # of unique sequences } Ask users for input 1. Reduce event types 2. Reduce sequence length 3. More aggregation on Hadoop
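The two user-driven reductions (select and merge event types, then cut each session to a window around an alignment event) can be sketched as follows. Everything here is an assumption for illustration: the function names, the choice to include the alignment event in the window, and the event names marked as hypothetical in the comments.

```python
def reduce_event_types(events: list[str], merge: dict[str, str]) -> list[str]:
    """Keep only the events listed in `merge`, renamed to their merged label."""
    return [merge[event] for event in events if event in merge]


def align_window(events: list[str], align_on: str, size: int, direction: str = "after"):
    """Return the alignment event plus `size` events after (or before) it.

    Returns None when the session never contains the alignment event.
    """
    if align_on not in events:
        return None
    index = events.index(align_on)  # first occurrence
    if direction == "after":
        return events[index : index + size + 1]
    if direction == "before":
        return events[max(0, index - size) : index + 1]
    raise ValueError("direction must be 'after' or 'before'")


MERGE = {
    "web:home:home:-:-:impression": "visit home page",
    "web:home:timeline:tweet_box:button:tweet": "tweet",
    "iphone:search:-:tweet_box:button:tweet": "tweet",  # hypothetical event name
}

session = [
    "web:front:-:-:-:impression",  # hypothetical event name, not selected
    "web:home:home:-:-:impression",
    "web:home:timeline:tweet_box:button:tweet",
    "web:home:wtf:-:-:click",  # not selected
    "iphone:search:-:tweet_box:button:tweet",
]

reduced = reduce_event_types(session, MERGE)
window = align_window(reduced, "visit home page", size=1)
```

With a window size of 10 instead of 1, a 1000-event session shrinks to at most 11 labels, which bounds the depth of the aggregated tree.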
  87. 87. Collapse events e.g. tweet, tweet, tweet, … = tweet Sequence ABBBCCCC ABBCC ABC ABCCCC ABCD ABCCCD ABCCE ABCDF ABCDG ABCDH
  88. 88. Collapse events Sequence ABC ABC ABC ABC ABCD ABCD ABCE ABCDF ABCDG ABCDH
  89. 89. Group & Count Sequence Count ABC 2000 ABCD 80 ABCE 20 ABCDF 1 ABCDG 1 ABCDH 1 … …
  90. 90. Group & Count Sequence Count ABC 2000 ABCD 80 ABCE 20 ABCDF 1 ABCDG 1 ABCDH 1 ABCDI 1 ABCDJK 1 ABCDJL 1 rare sequences (count < threshold)
  91. 91. Truncate Replace last event with x (…) Sequence Count ABC 2000 ABCD 80 ABCE 20 ABCDx 1 ABCDx 1 ABCDx 1 ABCDx 1 ABCDJx 1 ABCDJx 1
  92. 92. Group & Count Sequence Count ABC 2000 ABCD 80 ABCE 20 ABCDx 4 ABCDJx 2
  93. 93. Truncate more Sequence Count ABC 2000 ABCD 80 ABCE 20 ABCDx 4 ABCDx 2
  94. 94. Group & Count Sequence Count ABC 2000 ABCD 80 ABCE 20 ABCDx 6
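The collapse, group-and-count, and truncate steps on slides 87 to 94 can be reproduced with a short single-machine sketch. It assumes single-character event labels, uses `x` as the "…" marker (so no real event may be named `x`), and picks a threshold of 3, which is a guess that happens to reproduce the counts on the slides; the production version ran as a Hadoop job.

```python
from collections import Counter
from itertools import groupby

MARKER = "x"  # stands for "…"; assumes no real event uses this label


def collapse(sequence: str) -> str:
    """Merge runs of the same event: 'ABBBCCCC' becomes 'ABC'."""
    return "".join(event for event, _ in groupby(sequence))


def body(sequence: str) -> str:
    """The sequence without a trailing marker."""
    return sequence[:-1] if sequence.endswith(MARKER) else sequence


def truncate(sequence: str) -> str:
    """Replace the last real event with the marker: 'ABCDJx' becomes 'ABCDx'."""
    return body(sequence)[:-1] + MARKER


def reduce_patterns(counts: dict[str, int], threshold: int) -> dict[str, int]:
    """Repeatedly truncate rare sequences and regroup until none are rare."""
    grouped = Counter(counts)
    while True:
        rare = [s for s, c in grouped.items() if c < threshold and len(body(s)) > 1]
        if not rare:
            return dict(grouped)
        for sequence in rare:
            count = grouped.pop(sequence)
            grouped[truncate(sequence)] += count


# Counts from slide 90.
COUNTS = {
    "ABC": 2000, "ABCD": 80, "ABCE": 20,
    "ABCDF": 1, "ABCDG": 1, "ABCDH": 1, "ABCDI": 1,
    "ABCDJK": 1, "ABCDJL": 1,
}

patterns = reduce_patterns(COUNTS, threshold=3)
```

The first pass yields `ABCDx` with 4 and `ABCDJx` with 2; the second pass folds `ABCDJx` into `ABCDx`, giving the final table on slide 94 (ABC 2000, ABCD 80, ABCE 20, ABCDx 6).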
  95. 95. Final process 1. Define set of events 2. Pick alignment, direction and window size 3. Run Hadoop job (with more aggregation) 4. Wait for it… (2+ hrs) 5. Visualize: gazillion patterns (TBs) reduced to ~100,000 patterns (10MB)
  96. 96. Demo Flying Sessions
  97. 97. Deployment • Since Jan 2013 • Fewer users, but more in-depth ad-hoc analysis • Initial meeting to provide support
  98. 98. Case studies • What do users do when they visit Twitter? (in demo) • Where do users give up in the sign-up process? • more in the paper
  99. 99. Case studies click on “sign up” fill personal info import address book etc. • What do users do when they visit Twitter? (in demo) • Where do users give up in the sign-up process? • more in the paper
  100. 100. Case studies • What do users do when they visit Twitter? (in demo) • Where do users give up in the sign-up process? • more in the paper read the paper :)
  101. 101. Conclusions & Future work • Large-scale User Activity Logs + Visual Analytics
  102. 102. Conclusions & Future work • Large-scale User Activity Logs + Visual Analytics • Find, Monitor & Explore + Anomaly detection & automatic alert • Funnel Analysis + More interactivity & data / reduce wait time / latency study? • Used in day-to-day operations at Twitter
  103. 103. Conclusions & Future work Challenge big data small data visualize & interact • Large-scale User Activity Logs + Visual Analytics • Find, Monitor & Explore + Anomaly detection & automatic alert • Funnel Analysis + More interactivity & data / reduce wait time / latency study? • Used in day-to-day operations at Twitter aggregate & sacrifice
  104. 104. Conclusions & Future work • Large-scale User Activity Logs + Visual Analytics • Find, Monitor & Explore + Anomaly detection & automatic alert • Funnel Analysis + More interactivity & data / reduce wait time / latency study? • Used in day-to-day operations at Twitter • Generalize to smaller systems Challenge big data aggregate & sacrifice small data visualize & interact
  105. 105. Acknowledgement • Data Scientists & Engineers @Twitter — Linus Lee, Chuang Liu • Feedback from reviewers, Ben Shneiderman & Catherine Plaisant
  106. 106. Conclusions & Future work • Large-scale User Activity Logs + Visual Analytics • Find, Monitor & Explore + Anomaly detection & automatic alert • Funnel Analysis + More interactivity & data / reduce wait time / latency study? • Used in day-to-day operations at Twitter • Generalize to smaller systems Challenge big data aggregate & sacrifice small data visualize & interact kristw@twitter.com / @kristw
  107. 107. Questions?
  108. 108. Thank you
