
Logs & Visualizations at Twitter

Published in: Data & Analytics

  1. 1. Logs & visualization at Twitter. Krist Wongsuphasawat / @kristw
  2. 2. Krist Wongsuphasawat /@kristw
  3. 3. Krist Wongsuphasawat /@kristw Computer Engineer Bangkok, Thailand Chulalongkorn University
  4. 4. Krist Wongsuphasawat /@kristw Computer Engineer Bangkok, Thailand Programming + Soccer
  5. 5. Krist Wongsuphasawat /@kristw Computer Engineer Bangkok, Thailand Programming + Soccer
  6. 6. Krist Wongsuphasawat /@kristw Computer Engineer Bangkok, Thailand Programming + Soccer
  7. 7. Krist Wongsuphasawat /@kristw Computer Engineer Bangkok, Thailand M.S. in Computer Science Univ. of Maryland
  8. 8. Krist Wongsuphasawat /@kristw Computer Engineer Bangkok, Thailand PhD in Computer Science Univ. of Maryland Information Visualization
  9. 9. Krist Wongsuphasawat /@kristw Computer Engineer Bangkok, Thailand PhD in Computer Science Univ. of Maryland Information Visualization IBM Microsoft
  10. 10. Krist Wongsuphasawat /@kristw Computer Engineer Bangkok, Thailand PhD in Computer Science Univ. of Maryland Information Visualization IBM Microsoft Data Visualization Scientist Twitter
  11. 11. internal tools public-facing
  12. 12. public-facing
  13. 13. interactive.twitter.com
  14. 14. internal tools
  15. 15. Krist Wongsuphasawat & Jimmy Lin @kristw Using visualizations to monitor changes and harvest insights from log data at Twitter @lintool IEEE VAST 2014
  16. 16. Logging user activities & data analysis
  17. 17. Users use Twitter
  18. 18. Users use Twitter; Product Managers are curious
  19. 19. Users use Twitter; Engineers instrument Twitter and write log data in Hadoop; Product Managers are curious
  20. 20. What is being logged? Activities: tweet
  21. 21. What is being logged? Activities: tweet from home timeline on twitter.com; tweet from search page on iPhone
  22. 22. What is being logged? Activities: tweet from home timeline on twitter.com; tweet from search page on iPhone; sign up; log in; retweet; etc.
  23. 23. Organize?
  24. 24. log event a.k.a. “client event” [Lee et al. 2012]
  25. 25. log event a.k.a. “client event” [Lee et al. 2012]. Fields: 1) User ID 2) Timestamp 3) Event name 4) Event detail. Event name format: client : page : section : component : element : action, e.g. web : home : timeline : tweet_box : button : tweet
  26. 26. Log data
  27. 27. Users use Twitter; Engineers instrument Twitter and write log data in Hadoop (bigger than Tweet data); Product Managers are curious
  28. 28. Users use Twitter; Engineers instrument Twitter and write log data in Hadoop; Product Managers are curious and ask Data Scientists
  29. 29. Users use Twitter; Engineers instrument Twitter and write log data in Hadoop; Product Managers are curious and ask Data Scientists, who find the log data
  30. 30. Log data
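  31. 31. Users use Twitter; Engineers instrument Twitter and write log data in Hadoop; Product Managers are curious and ask Data Scientists, who find and clean the log data
  32. 32. Users use Twitter; Engineers instrument Twitter and write log data in Hadoop; Product Managers are curious and ask Data Scientists, who find and clean the log data; the log data is also monitored
  33. 33. Users use Twitter; Engineers instrument Twitter and write log data in Hadoop; Product Managers are curious and ask Data Scientists, who find, clean and analyze the log data; the log data is also monitored
  34. 34. Users use Twitter; Engineers instrument Twitter and write log data in Hadoop; Product Managers are curious and ask Data Scientists, who find, clean and analyze the log data; the log data is also monitored. Two parts of this workflow are marked 1 and 2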
  35. 35. Part I Find & Monitor Client Events
  36. 36. Motivation
  37. 37. Log data in Hadoop Engineers & Data Scientists billions of rows
  38. 38. Log data in Hadoop > Aggregate > Client event collection (10,000+ event types), for Engineers & Data Scientists. Columns: date, client, page, section, comp., elem., action, count. Rows: (20141011, web, home, home, -, -, impression, 100); (20141011, web, home, wtf, -, -, click, 20)
  39. 39. Log data in Hadoop > Aggregate > Client event collection (10,000+ event types), for Engineers & Data Scientists. Columns: date, client, page, section, comp., elem., action, count. Rows: (20141011, web, home, home, -, -, impression, 100); (20141011, web, home, wtf, -, -, click, 20), where wtf = Who-to-Follow
  40. 40. Log data in Hadoop > Aggregate > Client event collection, for Engineers & Data Scientists
  41. 41. Log data in Hadoop > Aggregate > Client event collection. Engineers & Data Scientists find events by searching on client, page, section, component, element, action
  42. 42. Log data in Hadoop > Aggregate > Client event collection. Engineers & Data Scientists find events by searching on client, page, section, component, element, action
  43. 43. section? component? element?
  44. 44. Log data in Hadoop > Aggregate > Client event collection. Engineers & Data Scientists find events by searching on client, page, section, component, element, action. Query: web : home : * : * : * : impression
  45. 45. Log data in Hadoop > Aggregate > Client event collection. Engineers & Data Scientists find events by searching on client, page, section, component, element, action. Query: web : home : * : * : * : impression. Returned results: web : home : home : - : - : impression; web : home : wtf : - : - : impression
  46. 46. Searching the client event collection on client, page, section, component, element, action returns results such as web : home : home : - : - : impression and web : home : wtf : - : - : impression. Search can be better
  47. 47. Searching the client event collection on client, page, section, component, element, action returns results such as web : home : home : - : - : impression and web : home : wtf : - : - : impression. Search can be better: 10,000+ event types
  48. 48. Searching the client event collection on client, page, section, component, element, action returns results such as web : home : home : - : - : impression and web : home : wtf : - : - : impression. Search can be better: 10,000+ event types, and not everybody knows them. What are all sections under web:home?
  49. 49. Searching the client event collection on client, page, section, component, element, action returns a result such as web : home : home : - : - : impression, shown as one graph / event. Search can be better: 10,000+ event types, and not everybody knows them. What are all sections under web:home?
  50. 50. Searching the client event collection on client, page, section, component, element, action returns a result such as web : home : home : - : - : impression, shown as one graph / event x 10,000. Search can be better: 10,000+ event types, and not everybody knows them. What are all sections under web:home?
  51. 51. Goals: • Search for client events • Explore client event collection • Monitor changes
  52. 52. Related work: • Session analysis [Lam et al. 2007, Shen et al. 2013] • Monitor network logs, not user activity logs [Ghoniem et al. 2013]
  53. 53. Design
  54. 54. Client event collection; Engineers & Data Scientists
  55. 55. Engineers & Data Scientists see the client event collection
  56. 56. Engineers & Data Scientists see the client event collection. Interactions: search box => filter, to narrow down
  57. 57. Engineers & Data Scientists see the client event collection. Interactions: search box => filter, to narrow down. How to visualize?
  58. 58. Engineers & Data Scientists see the client event collection. Interactions: search box => filter, to narrow down. How to visualize? client : page : section : component : element : action
  59. 59. Client event hierarchy: iphone > home > - > - > - > impression (iphone:home:-:-:-:impression); iphone > home > - > tweet > tweet > click (iphone:home:-:tweet:tweet:click)
  60. 60. Detect changes: the hierarchy TODAY (iphone > home > - > - > - > impression; iphone > home > - > tweet > tweet > click) compared to 7 DAYS AGO
  61. 61. Calculate changes: DIFF at each node of the hierarchy (+5%, +10%, -5% in the example)
  62. 62. Display changes on the hierarchy (iphone > home > - > - > - > impression; iphone > home > - > tweet > tweet > click). Related: Map of the Market [Wattenberg 1999], StemView [Guerra-Gomez et al. 2013]
  63. 63. Display changes on the hierarchy (iphone > home > - > - > - > impression; iphone > home > - > tweet > tweet > click)
  64. 64. Demo Scribe Radar
  65. 65. Twitter for Banana
  66. 66. Deployment: • Since Dec 2013 • 500 unique users, 10 users / day • No training
  67. 67. Use cases. Users: PMs, Data Scientists, Engineers • Search • Monitor • See effects after major product launch. More information in the paper
  68. 68. Part II Analysis
  69. 69. Count page visits: home page = banana : home : - : - : - : impression
  70. 70. Funnel: home page > profile page
  71. 71. Funnel analysis: home page (banana : home : - : - : - : impression) > profile page (banana : profile : - : - : - : impression). 1 job, 1 hour
  72. 72. Funnel analysis: home page (banana : home : - : - : - : impression) > profile page (banana : profile : - : - : - : impression) or search page (banana : search : - : - : - : impression). 2 jobs, 2 hours
  73. 73. Funnel analysis: home page (banana : home : - : - : - : impression) > profile page (banana : profile : - : - : - : impression) or search page (banana : search : - : - : - : impression). Specify all funnels manually! n jobs, n hours
  74. 74. Goal: 1 job => all funnels, visualized, starting from the home page (banana : home : - : - : - : impression) … ……
  75. 75. Related work: • Visualize an overview of event sequences [Wongsuphasawat et al. 2011, Monroe et al. 2013, …]
  76. 76. Related work: • Visualize an overview of event sequences [Wongsuphasawat et al. 2011, Monroe et al. 2013, …] • Big data? eBay checkout sequences, one funnel at a time: Checkout > Payment > Confirm > Success [Shen et al. 2013]
  77. 77. LifeFlow [CHI2011] (simplified)
  78. 78. User sessions: Session#1: start > A > B > end; Session#2: start > A > B > end; Session#3: start > A > C > end; Session#4: start > A > end
  79. 79. Aggregate 4 sessions (animation frame: the sessions start > A > B > end, start > A > B > end, start > A > C > end and start > A > end are stacked)
  80. 80. Aggregate 4 sessions (animation frame)
  81. 81. Aggregate 4 sessions (animation frame)
  82. 82. Aggregate 4 sessions (animation frame)
  83. 83. Aggregate 4 sessions (animation frame)
  84. 84. Aggregate 4 sessions (animation frame)
  85. 85. Aggregate 4 sessions (animation frame)
  86. 86. Aggregate 4 sessions into one tree: start > A, followed by B > end, C > end, or end
  87. 87. Aggregate 4,000,000 sessions into the same kind of tree: start > A, followed by B > end, C > end, or end
  88. 88. Original paper: 100,000 sessions, ~10 event types. Try with “sample” data: ~millions of sessions, 10,000+ event types
  89. 89. Result: not meaningful; a small slice of data, but a huge file
  90. 90. How to make it work?
  91. 91. # of unique sequences
  92. 92. Reduce # of unique sequences: 1. Reduce event types
  93. 93. Reduce # of unique sequences: 1. Reduce event types (10,000 types > select: tweet, sign up, log out)
  94. 94. Reduce # of unique sequences: 1. Reduce event types (10,000 types > select: tweet, sign up, log out)
  95. 95. Reduce # of unique sequences: 1. Reduce event types (10,000 types > select > merge: tweet from home timeline, tweet from search page, tweet … = tweet)
  96. 96. Reduce # of unique sequences: 1. Reduce event types 2. Reduce sequence length
  97. 97. Reduce # of unique sequences: 1. Reduce event types 2. Reduce sequence length (session: 1000 events)
  98. 98. Reduce # of unique sequences: 1. Reduce event types 2. Reduce sequence length (session: 1000 events > 10 events after (window size & direction) visit home page (alignment))
  99. 99. Reduce # of unique sequences: 1. Reduce event types 2. Reduce sequence length (for 1 and 2: ask users for input)
  100. 100. Reduce # of unique sequences: 1. Reduce event types 2. Reduce sequence length (for 1 and 2: ask users for input) 3. More aggregation on Hadoop
  101. 101. Collapse events (e.g. tweet, tweet, tweet, … = tweet). Sequences: ABBBCCCC, ABBCC, ABC, ABCCCC, ABCD, ABCCCD, ABCCE, ABCDF, ABCDG, ABCDH
  102. 102. Collapse events. Sequences after collapsing: ABC, ABC, ABC, ABC, ABCD, ABCD, ABCE, ABCDF, ABCDG, ABCDH
  103. 103. Group & Count. Sequence (count): ABC (2000), ABCD (80), ABCE (20), ABCDF (1), ABCDG (1), ABCDH (1), …
  104. 104. Group & Count. Sequence (count): ABC (2000), ABCD (80), ABCE (20), then the rare sequences (count < threshold): ABCDF (1), ABCDG (1), ABCDH (1), ABCDI (1), ABCDJK (1), ABCDJL (1)
  105. 105. Truncate: replace last event with x (…). Sequence (count): ABC (2000), ABCD (80), ABCE (20), ABCDx (1), ABCDx (1), ABCDx (1), ABCDx (1), ABCDJx (1), ABCDJx (1)
  106. 106. Group & Count. Sequence (count): ABC (2000), ABCD (80), ABCE (20), ABCDx (4), ABCDJx (2)
  107. 107. Truncate more. Sequence (count): ABC (2000), ABCD (80), ABCE (20), ABCDx (4), ABCDx (2)
  108. 108. Group & Count. Sequence (count): ABC (2000), ABCD (80), ABCE (20), ABCDx (6)
  109. 109. Final process: 1. Define set of events 2. Pick alignment, direction and window size 3. Run Hadoop job (with more aggregation) 4. Wait for it… (2+ hrs) 5. Visualize. Gazillion patterns (TBs) reduced to ~100,000 patterns (10MB)
  110. 110. Demo Flying Sessions
  111. 111. Deployment: • Since Jan 2013 • Fewer users, but more in-depth ad-hoc analysis • Initial meeting to provide support
  112. 112. Case studies: • What did users do when they visited Twitter? (in demo) • Where did users give up in the sign-up process? • More in the paper
  113. 113. Conclusions & Future work: • Large-scale User Activity Logs + Visual Analytics
  114. 114. Conclusions & Future work: • Large-scale User Activity Logs + Visual Analytics • Find, Monitor & Explore + Anomaly detection & automatic alert • Funnel Analysis + More interactivity & data / reduce wait time • Used in day-to-day operations at Twitter
  115. 115. Conclusions & Future work: • Large-scale User Activity Logs + Visual Analytics • Find, Monitor & Explore + Anomaly detection & automatic alert • Funnel Analysis + More interactivity & data / reduce wait time • Used in day-to-day operations at Twitter. Challenge: big data > aggregate & sacrifice > small data > visualize & interact
  116. 116. Conclusions & Future work: • Large-scale User Activity Logs + Visual Analytics • Find, Monitor & Explore + Anomaly detection & automatic alert • Funnel Analysis + More interactivity & data / reduce wait time • Used in day-to-day operations at Twitter • Generalize to smaller systems. Challenge: big data > aggregate & sacrifice > small data > visualize & interact
  117. 117. Acknowledgement: • Data Scientists & Engineers @Twitter: Linus Lee, Chuang Liu • Feedback from reviewers, Ben Shneiderman & Catherine Plaisant
  118. 118. Conclusions & Future work: • Large-scale User Activity Logs + Visual Analytics • Find, Monitor & Explore + Anomaly detection & automatic alert • Funnel Analysis + More interactivity & data / reduce wait time • Used in day-to-day operations at Twitter • Generalize to smaller systems. Challenge: big data > aggregate & sacrifice > small data > visualize & interact. kristw@twitter.com / @kristw
  119. 119. One more thing …
  120. 120. Questions?
  121. 121. Thank you
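
Sketch for Part I (slides 25, 45 and 59 to 61). The slides describe a six-part client event name, a wildcard search over the client event collection, and a comparison of today's counts with those from 7 days ago at every node of the event hierarchy. A minimal Python sketch of those three ideas follows. The function names, the use of shell-style wildcards through `fnmatch`, and the counts are assumptions made for illustration; the slides do not say how Scribe Radar implements any of this.

```python
from collections import Counter
from fnmatch import fnmatchcase

# Field order taken from slide 25.
FIELDS = ("client", "page", "section", "component", "element", "action")


def parse_event_name(name):
    """Split 'web : home : - : - : - : impression' into a 6-tuple of fields."""
    parts = tuple(p.strip() for p in name.split(":"))
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}: {name!r}")
    return parts


def matches(pattern, name):
    """Field-by-field wildcard match (assumed semantics: '*' matches any value)."""
    pairs = zip(parse_event_name(pattern), parse_event_name(name))
    return all(fnmatchcase(value, pat) for pat, value in pairs)


def rollup(counts):
    """Sum leaf counts into every prefix of the hierarchy, e.g. ('iphone', 'home')."""
    totals = Counter()
    for name, count in counts.items():
        parts = parse_event_name(name)
        for depth in range(1, len(parts) + 1):
            totals[parts[:depth]] += count
    return totals


def percent_change(today, week_ago):
    """Percent change per node; None when there is no baseline to compare with."""
    changes = {}
    for node in today.keys() | week_ago.keys():
        old, new = week_ago.get(node, 0), today.get(node, 0)
        changes[node] = None if old == 0 else 100.0 * (new - old) / old
    return changes


# Hypothetical counts, not real Twitter data.
today = {"iphone:home:-:-:-:impression": 110, "iphone:home:-:tweet:tweet:click": 19}
week_ago = {"iphone:home:-:-:-:impression": 100, "iphone:home:-:tweet:tweet:click": 20}
diff = percent_change(rollup(today), rollup(week_ago))
```

Rolling counts up to every prefix is one way to get a change value for inner nodes such as `iphone > home`, which is what a treemap-style display of the hierarchy would need.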

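Sketch for the simplified LifeFlow aggregation (slides 78 to 86). The slides merge four sessions into a single tree in which shared prefixes are drawn once. One possible way to express that is a prefix tree with a session count on every node, shown below in Python. The nested-dict layout and the function name are assumptions for illustration, not the LifeFlow or Flying Sessions implementation.

```python
def aggregate_sessions(sessions):
    """Merge event sequences into a prefix tree with a session count per node.

    Each node is a dict: {"count": int, "children": {event_name: node}}.
    An explicit "end" event is appended so that sessions of different
    lengths stay distinguishable, as in the slides.
    """
    root = {"count": 0, "children": {}}
    for events in sessions:
        node = root
        node["count"] += 1
        for event in list(events) + ["end"]:
            node = node["children"].setdefault(event, {"count": 0, "children": {}})
            node["count"] += 1
    return root


# The four sessions from slide 78.
sessions = [["A", "B"], ["A", "B"], ["A", "C"], ["A"]]
tree = aggregate_sessions(sessions)
after_a = tree["children"]["A"]["children"]
# after_a has three branches: B (2 sessions), C (1 session), end (1 session)
```

The number of nodes in such a tree grows with the number of unique sequences, not with the number of sessions, which is why the later slides concentrate on reducing unique sequences.
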
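
Sketch for the sequence reduction steps (slides 101 to 108). The slides collapse repeated events, group and count identical sequences, then repeatedly truncate rare sequences and regroup. The Python sketch below follows those steps on the slide's own example. It assumes one character per event, the marker `x` for truncated tails, and a threshold of 3; the slides only say "count < threshold", and the production version runs as a Hadoop job rather than in memory.

```python
from collections import Counter
from itertools import groupby


def collapse(sequence):
    """Collapse runs of the same event: 'ABBBCCCC' becomes 'ABC'."""
    return "".join(event for event, _ in groupby(sequence))


def truncate_rare(counts, threshold, marker="x"):
    """Replace the last real event of each rare sequence with the marker, then regroup."""
    out = Counter()
    for seq, n in counts.items():
        if n < threshold:
            real = seq[:-1] if seq.endswith(marker) else seq
            if len(real) > 1:  # keep at least one real event
                seq = real[:-1] + marker
        out[seq] += n
    return out


def reduce_patterns(counts, threshold, marker="x"):
    """Truncate and regroup until no further change happens."""
    counts = Counter(counts)
    while True:
        reduced = truncate_rare(counts, threshold, marker)
        if reduced == counts:
            return reduced
        counts = reduced


# Counts from slide 104.
grouped = {
    "ABC": 2000, "ABCD": 80, "ABCE": 20,
    "ABCDF": 1, "ABCDG": 1, "ABCDH": 1, "ABCDI": 1,
    "ABCDJK": 1, "ABCDJL": 1,
}
patterns = reduce_patterns(grouped, threshold=3)
```

With these inputs the first pass yields `ABCDx` (4) and `ABCDJx` (2), and the second pass folds `ABCDJx` into `ABCDx`, matching the progression on slides 106 to 108.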