
What to expect when you are visualizing

  1. WHAT TO EXPECT WHEN YOU ARE VISUALIZING. Krist Wongsuphasawat / @kristw. Based on true stories: forever querying, never-ending cleaning, hopelessly prototyping, last-minute coding, and many more…
  2. Krist Wongsuphasawat / @kristw: Computer Engineer (Bangkok, Thailand); PhD in Computer Science, Information Visualization (Univ. of Maryland); IBM; Microsoft; Data Visualization Scientist (Twitter)
  3. VISUALIZE DATA
  4. INPUT (DATA) + YOU = OUTPUT (VIS)
  5. EXPECT THE MISMATCHES
  6. INPUT (DATA): what clients think they have
  7. INPUT (DATA): what clients think they have vs. what they usually have
  8. YOU: what clients think you are
  9. YOU: what clients think you are vs. what they will get
  10. OUTPUT (VIS): what clients ask for
  11. OUTPUT (VIS): what clients ask for vs. what they really need
  12. I need this. Take this.
  13. I need this. Here you are. I need this. Take this.
  14. EXPECT THESE TASKS
  15. INPUT (DATA) + YOU = OUTPUT (VIS)
  16. INPUT (DATA) + YOU = OUTPUT (VIS), through (1) Get data & wrangle and (2) Analyze & visualize
  17. GET DATA & WRANGLE (1)
  18. DATA SOURCES: open data (publicly available); internal data (private, owned by the client’s organization); self-collected data (manual collection, site scraping, etc.); or a combination of the above
  19. MANY FORMS OF DATA: standalone files (txt, csv, tsv, json, Google Docs, …, pdf*); APIs (better quality, but more overhead); databases (which doesn’t necessarily mean the data is organized); big data (bigger pain)
  20. HAVING ALL TWEETS: how people think I feel.
  21. HAVING ALL TWEETS: how people think I feel vs. how I really feel.
  22. CHALLENGES: get relevant Tweets (hashtag: #oscars; keywords: “spotlight”, a movie name); too big (need to aggregate and reduce size); slow (long processing time, hours)
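The first two challenges (finding relevant Tweets and shrinking the data) can be sketched in Python as a streaming filter. This is a minimal sketch, not the Pig/Scalding job used in the talk; the input is assumed to be newline-delimited JSON with hypothetical `text` and `hashtags` fields, and the hashtag and keyword sets are illustrative.

```python
import json
from typing import Iterable, Iterator

# Hypothetical relevance rules for an #oscars project (illustrative only).
HASHTAGS = {"oscars"}
KEYWORDS = {"spotlight"}


def is_relevant(tweet: dict) -> bool:
    """Keep a Tweet if it carries a target hashtag or mentions a target keyword."""
    tags = {tag.lower() for tag in tweet.get("hashtags", [])}
    text = tweet.get("text", "").lower()
    return bool(tags & HASHTAGS) or any(word in text for word in KEYWORDS)


def reduce_tweets(lines: Iterable[str]) -> Iterator[dict]:
    """Stream newline-delimited JSON, keep relevant Tweets, and drop unused fields."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        tweet = json.loads(line)
        if is_relevant(tweet):
            # Keeping only the fields the chart needs shrinks the dataset further.
            yield {"id": tweet["id"], "created_at": tweet["created_at"]}
```

Because it is a generator, the filter reads one line at a time and never holds the full input in memory.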
  23. GETTING BIG DATA: data storage (Hadoop cluster)
  24. GETTING BIG DATA: data storage (Hadoop cluster) → tool: Pig / Scalding (slow)
  25. GETTING BIG DATA: data storage (Hadoop cluster) → tool: Pig / Scalding (slow)
  26. GETTING BIG DATA: data storage (Hadoop cluster) → tool: Pig / Scalding (slow) → smaller dataset on your laptop
  27. GETTING BIG DATA: data storage (Hadoop cluster) → tool: Pig / Scalding (slow) → smaller dataset on your laptop → tool: node.js / python / excel (fast) → final dataset
  28. EXPECT TO WAIT FOR (BIG) DATA
  29. DATA WRANGLING: Clean (a clean dataset? Joking, right?); Filter (less is more); Parse, format, correct, etc. (e.g. change country codes from 3-letter to 2-letter, correct the time of day based on the user’s timezone)
  30. EXPECT TO SPEND A LOT OF TIME ON DATA WRANGLING: 70-80% of the time, as a “data janitor”
  31. RECOMMENDATIONS: Always assume you will have to do it again (document the process, automate it). Reusable scripts: break a gigantic do-it-all function into smaller ones. Reusable data: keep it for future projects.
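A minimal Python sketch of the parse/correct/filter steps, written as small single-purpose functions. The three-letter to two-letter lookup is a tiny hand-picked subset of ISO 3166-1 (a real project would load the full table), and each row is assumed to carry the user’s UTC offset in seconds under a hypothetical `utc_offset` field.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Tiny illustrative subset of ISO 3166-1 alpha-3 -> alpha-2 codes (not the full table).
ALPHA3_TO_ALPHA2 = {"THA": "TH", "USA": "US", "GBR": "GB", "JPN": "JP"}


def to_alpha2(code3: str) -> Optional[str]:
    """Convert a 3-letter country code to its 2-letter form (None if unknown)."""
    return ALPHA3_TO_ALPHA2.get(code3.strip().upper())


def local_hour(utc_iso: str, utc_offset_seconds: int) -> int:
    """Shift a UTC timestamp by the user's offset and return the local hour of day."""
    utc = datetime.fromisoformat(utc_iso.replace("Z", "+00:00"))
    return utc.astimezone(timezone(timedelta(seconds=utc_offset_seconds))).hour


def clean_row(row: dict) -> Optional[dict]:
    """Parse, correct, and filter one raw row; return None to drop it."""
    country = to_alpha2(row.get("country", ""))
    if country is None:  # filter: less is more
        return None
    return {"country": country, "local_hour": local_hour(row["created_at"], row["utc_offset"])}
```

For example, a Tweet at 20:30 UTC from a user at UTC+7 (25200 seconds) lands at local hour 3 the next morning.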
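For the reusable-data recommendation, one simple pattern is to save each expensive intermediate result to disk and reload it on the next run. The helper below is a hypothetical sketch, not a tool from the talk; it assumes the data is JSON-serializable.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable


def load_or_compute(path: Path, compute: Callable[[], Any]) -> Any:
    """Reuse a saved intermediate dataset if present; otherwise compute and save it."""
    if path.exists():
        return json.loads(path.read_text())
    data = compute()
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data))
    return data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp) / "oscars" / "counts_per_minute.json"
        first = load_or_compute(cache, lambda: {"00:01": 12, "00:02": 30})
        again = load_or_compute(cache, lambda: {"never": "called"})
        print(first == again)  # the second call reads the file instead of recomputing
```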
  32. ANALYZE & VISUALIZE (2)
  33. EXPECT DIFFERENT REQUIREMENTS
  34. TYPE OF PROJECTS: Storytelling (explanatory): general public, to see what data can tell us; Analytics tools (exploratory): PMs and data scientists, to understand product usage; Inspirations: general public, to get inspired
  35. TYPE OF PROJECTS: Storytelling (explanatory): general public, to see what data can tell us; Analytics tools (exploratory): PMs and data scientists, to understand product usage; Inspirations: general public, to get inspired
  36. So many things we could learn from Twitter data
  37. Give us interesting vis about xxxx by Nov 10
  38. STORYTELLING: WHAT TO EXPECT. Timely: deadlines are strict, and unexpected events can happen. Wide audience: easy to explain and understand, multi-device support. One-off projects. Content screening.
  39. STORYTELLING: WHO/WHAT (user/content), WHERE (location), WHEN (time)
  40. STORYTELLING: WHO/WHAT (user/content), WHERE (location), WHEN (time)
  41. TIME : TWEETS/SECOND by Miguel Rios
  42. TIME : TWEETS/SECOND by Miguel Rios
  43. TIME : TWEETS/SECOND + ANNOTATION http://www.flickr.com/photos/twitteroffice/5681263084/ by Miguel Rios
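Counting Tweets per second and picking the busiest seconds is one way to find candidate spots for annotations. A minimal sketch, assuming ISO 8601 timestamps in UTC; simply taking the top k seconds is an illustrative choice, not necessarily how the chart above was annotated.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List, Tuple


def tweets_per_second(timestamps: Iterable[str]) -> Counter:
    """Count Tweets per one-second bucket; timestamps are ISO 8601 strings in UTC."""
    counts: Counter = Counter()
    for ts in timestamps:
        t = datetime.fromisoformat(ts.replace("Z", "+00:00"))
        counts[t.replace(microsecond=0)] += 1  # truncate to the second
    return counts


def peaks_to_annotate(counts: Counter, k: int = 1) -> List[Tuple[datetime, int]]:
    """Return the k busiest seconds, i.e. candidate spots for a text annotation."""
    return counts.most_common(k)
```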
  44. IT DOESN’T HAVE TO BE COMPLEX.
  45. STORYTELLING: WHO/WHAT (user/content), WHERE (location), WHEN (time)
  46. LOCATION: Tweet density, from low to high, by Miguel Rios
  47. LOCATION: San Francisco, from low to high density, by Miguel Rios (flickr.com/photos/twitteroffice/8798020541)
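A density map like this starts by binning geotagged Tweets into grid cells. The sketch below uses square latitude/longitude cells and a log scale for color intensity; both the cell size and the log scale are assumptions for illustration.

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def density_grid(points: Iterable[Tuple[float, float]], cell_deg: float = 1.0) -> Counter:
    """Bin (lat, lon) points into square cells of cell_deg degrees and count per cell."""
    grid: Counter = Counter()
    for lat, lon in points:
        grid[(math.floor(lat / cell_deg), math.floor(lon / cell_deg))] += 1
    return grid


def intensity(count: int, max_count: int) -> float:
    """Map a cell count to 0..1 on a log scale so sparse areas stay visible."""
    return math.log1p(count) / math.log1p(max_count) if max_count > 0 else 0.0
```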
  48. Rebuild the world based on tweet density twitter.github.io/interactive/andes/ by Nicolas Garcia Belmonte
  49. STORYTELLING: WHO/WHAT (user/content), WHERE (location), WHEN (time)
  50. CONTENT : US ELECTION 2016
  51. CONTENT : #MUSEUMWEEK
  52. CONTENT : #MUSEUMWEEK
  53. STORYTELLING: WHO/WHAT (user/content), WHERE (location), WHEN (time)
  54. TIME + LOCATION : TWEET TIME BY CITY (daytime, night, late night), by Miguel Rios & Jimmy Lin
  55. TIME + LOCATION : TWEET TIME BY CITY (daytime, night, late night), by Miguel Rios & Jimmy Lin
  56. TIME + LOCATION : TWEET TIME BY CITY (daytime, night, late night), by Miguel Rios & Jimmy Lin
  57. TIME + LOCATION : TWEET TIME BY CITY (daytime, night, late night), by Miguel Rios & Jimmy Lin
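To compare cities, each Tweet’s local hour can be bucketed into a period of the day and normalized per city. The hour boundaries below (late night 0-5, daytime 6-17, night 18-23) are hypothetical; the talk does not state the exact cut-offs.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def period(local_hour: int) -> str:
    """Hypothetical split of the local day into three periods."""
    if local_hour < 6:
        return "late night"
    if local_hour < 18:
        return "daytime"
    return "night"


def share_by_city(rows: Iterable[Tuple[str, int]]) -> Dict[str, Dict[str, float]]:
    """Given (city, local_hour) pairs, return each city's share of Tweets per period."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for city, hour in rows:
        counts[city][period(hour)] += 1
    return {
        city: {p: n / sum(c.values()) for p, n in c.items()}
        for city, c in counts.items()
    }
```

Normalizing per city makes cities with very different Tweet volumes comparable on the same chart.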
  58. STORYTELLING: WHO/WHAT (user/content), WHERE (location), WHEN (time)
  59. CONTENT + LOCATION : TWEET MAP by Robert Harris
  60. CONTENT + LOCATION : TWEET MAP by Robert Harris
  61. CONTENT + LOCATION : TWEET MAP, showing the most frequent term, by Robert Harris
  62. CONTENT + LOCATION : TWEET MAP (Gmail was down, Jan 24, 2014), by Robert Harris
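The “most frequent term” view can be approximated by counting words per grid cell. A minimal sketch with a naive tokenizer and a tiny illustrative stopword list; a production version would need better tokenization and stopwords.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

STOPWORDS = {"a", "and", "in", "is", "of", "the", "to"}  # tiny illustrative list


def terms(text: str) -> List[str]:
    """Lowercase, keep runs of letters, digits, #, @ and apostrophes, drop stopwords."""
    return [w for w in re.findall(r"[a-z0-9#@']+", text.lower()) if w not in STOPWORDS]


def top_term_per_cell(
    tweets: Iterable[Tuple[float, float, str]], cell_deg: float = 1.0
) -> Dict[Tuple[int, int], str]:
    """Most frequent term in each (lat, lon) grid cell."""
    counts: Dict[Tuple[int, int], Counter] = defaultdict(Counter)
    for lat, lon, text in tweets:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[cell].update(terms(text))
    return {cell: c.most_common(1)[0][0] for cell, c in counts.items() if c}
```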
  63. USER + LOCATION : FAN MAP interactive.twitter.com/nfl_followers2014
  64. USER + LOCATION : FAN MAP interactive.twitter.com/nba_followers
  65. USER + LOCATION : FAN MAP interactive.twitter.com/premierleague
  66. STORYTELLING: WHO/WHAT (user/content), WHERE (location), WHEN (time)
  67. CONTENT + TIME : STREAMGRAPH
  68. CONTENT + TIME : MATCH SUMMARY Biggest tournament for European soccer clubs
  69. CONTENT + TIME : MATCH SUMMARY. Count Tweets mentioning each team (Team 1: Dortmund, Team 2: Bayern Munich) every minute, from the beginning to the end of the match
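The per-minute counting step might look like the following sketch. The keyword lists in `TEAMS` are hypothetical, and plain substring matching is a simplification that can over-count (for example, short tokens inside longer words).

```python
import math
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple

# Hypothetical keyword lists for the two teams.
TEAMS = {"Dortmund": ["dortmund", "bvb"], "Bayern Munich": ["bayern", "fcb"]}


def minute_series(
    tweets: Iterable[Tuple[datetime, str]], begin: datetime, end: datetime
) -> Dict[str, List[int]]:
    """Count Tweets mentioning each team per minute, from begin (inclusive) to end (exclusive)."""
    n_minutes = math.ceil((end - begin).total_seconds() / 60)
    series = {team: [0] * n_minutes for team in TEAMS}
    for ts, text in tweets:
        if not begin <= ts < end:
            continue
        minute = int((ts - begin).total_seconds() // 60)
        lowered = text.lower()
        for team, words in TEAMS.items():
            if any(word in lowered for word in words):
                series[team][minute] += 1  # one Tweet can count for both teams
    return series
```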
  70. CONTENT + TIME : MATCH SUMMARY
  71. CONTENT + TIME : MATCH SUMMARY + goals
  72. CONTENT + TIME : MATCH SUMMARY + goals + players
  73. CONTENT + TIME : COMPETITION SUMMARY: match summaries combined into a bracket (A vs B and C vs D, then A vs C, won by C), uclfinal.twitter.com
  74. STORYTELLING: WHO/WHAT (user/content), WHERE (location), WHEN (time)
  75. CONTENT + TIME + LOCATION : NEW YEAR 2014 twitter.github.io/interactive/newyear2014/
  76. BEHIND THE SCENES
  77. https://interactive.twitter.com/tenyears Project / Twitter 10 years
  78. REQUEST
  79. EXPECT FUNNY REQUESTS
  80. DESIGN & PROTOTYPE: engagements in the first minute (0-60s), first hour (0-60m), first day (0-24h), and first week (0-7d)
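The four panels count engagements within a growing window after a Tweet is posted. A hypothetical helper that computes those cumulative counts, assuming engagement timestamps are available as `datetime` objects:

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable

# Windows matching the four panels: first minute, hour, day, and week.
WINDOWS = {
    "first minute": timedelta(minutes=1),
    "first hour": timedelta(hours=1),
    "first day": timedelta(days=1),
    "first week": timedelta(weeks=1),
}


def engagements_by_window(created_at: datetime, engagement_times: Iterable[datetime]) -> Dict[str, int]:
    """Cumulative number of engagements within each window after the Tweet was posted."""
    deltas = [t - created_at for t in engagement_times]
    return {name: sum(1 for d in deltas if timedelta(0) <= d < w) for name, w in WINDOWS.items()}
```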
  81. EXPECT REVISIONS
  82. Visualization is an important piece, but not the entire experience. DON’T FORGET THE BIG PICTURE.
  83. https://interactive.twitter.com/tenyears Demo / Twitter 10 years
  84. WORKFLOW: Requested / identify needs → Design & prototype → Refine → Mobile, embed → Logging → Release
  85. EXPECT THE UNEXPECTED
  86. WORKFLOW: Requested / identify needs → Design & prototype → Refine → Mobile, embed → Logging → Translations → Release
  87. TYPE OF PROJECTS: Storytelling (explanatory): general public, to see what data can tell us; Analytics tools (exploratory): PMs and data scientists, to understand product usage; Inspirations: general public, to get inspired
  88. Data sources → get → explore → analyze → present → Output
  89. Data sources → get → explore → analyze → present → Output (ad-hoc scripts)
  90. Data sources → get → explore → analyze → present → Output (ad-hoc scripts; tools for exploration)
  91. ANALYTICS TOOLS : WHAT TO EXPECT. Richer: more features to support exploration of complex data. More technical audience: product managers, engineers, data scientists. Accuracy. Designed for dynamic input. Long-term projects.
  92. USER ACTIVITY LOGS
  93. Users use Twitter.
  94. Users use Twitter. Product Managers are curious.
  95. Users use Twitter. Product Managers are curious. Engineers instrument Twitter and write log data in Hadoop.
  96. WHAT IS BEING LOGGED? Activities: tweet
  97. WHAT IS BEING LOGGED? Activities: tweet from the home timeline on twitter.com, tweet from the search page on iPhone
  98. WHAT IS BEING LOGGED? Activities: tweet from the home timeline on twitter.com, tweet from the search page on iPhone, sign up, log in, retweet, etc.
  99. ORGANIZE?
  100. LOG EVENT A.K.A. “CLIENT EVENT” [Lee et al. 2012]
  101. LOG EVENT A.K.A. “CLIENT EVENT” [Lee et al. 2012]: event names follow client : page : section : component : element : action (e.g. web : home : timeline : tweet_box : button : tweet). Each record holds 1) user ID, 2) timestamp, 3) event name, 4) event details.
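Because every event name has the same six parts, it can be parsed into a small record type. The class and field names below are hypothetical, and the timestamp unit in `LogRecord` is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

FIELDS = ("client", "page", "section", "component", "element", "action")


@dataclass(frozen=True)
class ClientEvent:
    client: str
    page: str
    section: str
    component: str
    element: str
    action: str

    @classmethod
    def parse(cls, name: str) -> "ClientEvent":
        """Split 'client:page:section:component:element:action' into its six parts."""
        parts = [part.strip() for part in name.split(":")]
        if len(parts) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} parts, got {len(parts)}: {name!r}")
        return cls(*parts)

    def name(self) -> str:
        """Re-join the parts into the canonical colon-separated event name."""
        return ":".join(getattr(self, f) for f in FIELDS)


@dataclass
class LogRecord:
    user_id: int
    timestamp_ms: int  # assumption: milliseconds since the Unix epoch
    event: ClientEvent
    details: Dict[str, Any] = field(default_factory=dict)
```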
  102. LOG DATA
  103. Users use Twitter. Product Managers are curious. Engineers instrument Twitter and write log data in Hadoop, which is bigger than Tweet data.
  104. Users use Twitter. Curious Product Managers ask Data Scientists. Engineers instrument Twitter and write log data in Hadoop.
  105. Users use Twitter. Curious Product Managers ask Data Scientists, who find the log data that Engineers instrument and write in Hadoop.
  106. LOG DATA
  107. Users use Twitter. Curious Product Managers ask Data Scientists, who find and clean the log data that Engineers instrument and write in Hadoop.
  108. Users use Twitter. Curious Product Managers ask Data Scientists, who find and clean the log data that Engineers instrument and write in Hadoop. Monitor.
  109. Users use Twitter. Curious Product Managers ask Data Scientists, who find, clean, and analyze the log data that Engineers instrument and write in Hadoop. Monitor.
  110. The same diagram, with two steps marked 1 and 2.
  111. Scribe Radar Project / Find & Monitor client events
  112. Log data in Hadoop (billions of rows), used by Engineers & Data Scientists
  113. Log data in Hadoop → aggregate → client event counts, for Engineers & Data Scientists
  114. Log data in Hadoop → aggregate → client event counts; Engineers & Data Scientists find events by searching on client, page, section, component, element, action
  115. Log data in Hadoop → aggregate → client event counts; Engineers & Data Scientists find events by searching on client, page, section, component, element, action
  116. SECTION? COMPONENT? ELEMENT?
  117. Engineers & Data Scientists find events by searching the client event counts (aggregated from log data in Hadoop) on client, page, section, component, element, action, e.g. client web, page home, wildcards (*) elsewhere, action impression
  118. The query returns results such as web : home : home : - : - : impression and web : home : wtf : - : - : impression
  119. Same query and results. Search can be better.
  120. Same query and results. Search can be better: there are 10,000+ event types.
  121. Same query and results. Search can be better: there are 10,000+ event types, and not everybody knows them (What are all sections under web:home?).
  122. Same query and results, plus one graph per event.
  123. Same query and results, plus one graph per event x 10,000.
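Wildcard search and the “what are all sections under web:home?” question can both be answered from the list of event names. A minimal sketch, assuming `*` matches any single part and nothing else:

```python
from typing import Iterable, List, Set


def matches(query: str, event_name: str) -> bool:
    """True if each query part equals the event's part or is the wildcard '*'."""
    q = [p.strip() for p in query.split(":")]
    e = [p.strip() for p in event_name.split(":")]
    return len(q) == len(e) and all(a == "*" or a == b for a, b in zip(q, e))


def search(query: str, event_names: Iterable[str]) -> List[str]:
    """All event names matching a six-part query such as 'web:home:*:*:*:impression'."""
    return [name for name in event_names if matches(query, name)]


def sections_under(client: str, page: str, event_names: Iterable[str]) -> Set[str]:
    """Answer questions like 'what are all sections under web:home?'."""
    hits = search(f"{client}:{page}:*:*:*:*", event_names)
    return {name.split(":")[2].strip() for name in hits}
```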
  124. GOALS: search for client events; explore the client event collection; monitor changes
  125. DESIGN
  126. Client event collection; Engineers & Data Scientists
  127. Engineers & Data Scientists see the client event collection
  128. Engineers & Data Scientists see the client event collection and narrow it down (interaction: search box => filter)
  129. HOW TO VISUALIZE? Engineers & Data Scientists see the client event collection and narrow it down (interaction: search box => filter)
  130. HOW TO VISUALIZE? Engineers & Data Scientists see the client event collection, named client : page : section : component : element : action, and narrow it down (interaction: search box => filter)
  131. CLIENT EVENT HIERARCHY: iphone:home:-:-:-:impression and iphone:home:-:tweet:tweet:click share the path iphone → home → -, then branch into - → - → impression and tweet → tweet → click
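The hierarchy is essentially a trie over the six name parts. A minimal sketch that builds it from event names and renders it as an indented outline (function names are illustrative):

```python
from typing import Dict, Iterable, List

Tree = Dict[str, "Tree"]


def build_hierarchy(event_names: Iterable[str]) -> Tree:
    """Nest names as client > page > section > component > element > action."""
    root: Tree = {}
    for name in event_names:
        node = root
        for part in name.split(":"):
            node = node.setdefault(part.strip(), {})
    return root


def render(tree: Tree, depth: int = 0) -> List[str]:
    """Flatten the tree into indented lines, children sorted alphabetically."""
    lines: List[str] = []
    for key in sorted(tree):
        lines.append("  " * depth + key)
        lines.extend(render(tree[key], depth + 1))
    return lines
```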
  132. DETECT CHANGES: compare today’s client event hierarchy with the one from 7 days ago
  133. CALCULATE CHANGES: the difference (DIFF) for every node, e.g. +5%, +10%, -5%
  134. DISPLAY CHANGES: the iphone hierarchy with changes shown per node; related work: Map of the Market [Wattenberg 1999], StemView [Guerra-Gomez et al. 2013]
  135. DISPLAY CHANGES: the iphone → home hierarchy (impression and tweet → tweet → click branches) with its changes
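One way to compute these changes is to roll event counts up to every prefix of the hierarchy, then compare today with 7 days ago node by node. This sketch uses percentage change and returns `None` for nodes with no count a week earlier; both choices are assumptions.

```python
from collections import Counter
from typing import Dict, Mapping, Optional


def rollup(counts: Mapping[str, int]) -> Counter:
    """Add each event's count to every prefix: 'iphone', 'iphone:home', ..."""
    totals: Counter = Counter()
    for name, n in counts.items():
        parts = [p.strip() for p in name.split(":")]
        for i in range(1, len(parts) + 1):
            totals[":".join(parts[:i])] += n
    return totals


def percent_changes(today: Mapping[str, int], week_ago: Mapping[str, int]) -> Dict[str, Optional[float]]:
    """Percentage change per node, today vs. 7 days ago (None if absent a week ago)."""
    now, before = rollup(today), rollup(week_ago)
    return {
        node: None if before[node] == 0 else 100.0 * (now[node] - before[node]) / before[node]
        for node in now.keys() | before.keys()
    }
```

Rolling up first means a parent node reflects the combined change of its children, e.g. +10% and -5% on two equally sized leaves give +2.5% at `iphone`.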
  136. Demo / Scribe Radar
  137. Twitter for Banana
  138. Flying Sessions Project / Funnel Analysis
  139. COUNT PAGE VISITS: home page (banana : home : - : - : - : impression)
  140. FUNNEL: home page → profile page
  141. FUNNEL ANALYSIS: home page (banana : home : - : - : - : impression) → profile page (banana : profile : - : - : - : impression): 1 job, 1 hour
  142. FUNNEL ANALYSIS: home page, profile page, and search page (banana : search : - : - : - : impression): 2 jobs, 2 hours
  143. FUNNEL ANALYSIS: specify all funnels manually! n jobs. Time to find a new job.
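For contrast, counting a single hand-specified funnel looks like this. The sketch assumes a step counts when it appears after the previous step in the same session, not necessarily immediately; event names follow the hypothetical banana client.

```python
from typing import Iterable, List, Sequence


def funnel_counts(sessions: Iterable[Sequence[str]], funnel: Sequence[str]) -> List[int]:
    """Number of sessions reaching each funnel step, steps matched in order."""
    counts = [0] * len(funnel)
    for events in sessions:
        step = 0
        for event in events:
            if step < len(funnel) and event == funnel[step]:
                step += 1
        for i in range(step):  # a session that reached step k also reached steps < k
            counts[i] += 1
    return counts
```

Each new funnel means another call (or, at scale, another job), which is the cost the slides point out.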
  144. GOAL: 1 job => all funnels, visualized, starting from the home page (banana : home : - : - : - : impression → … → ……)
  145. USER SESSIONS: Session #1: Start → A → B → end; Session #2: Start → A → B → end; Session #3: Start → A → C → end; Session #4: Start → A → end
  146-152. AGGREGATE (4 sessions): the sessions are merged step by step, combining shared steps into a single tree
  153. AGGREGATE (4 sessions): Start → A, which then leads to B → end, C → end, or end
  154. AGGREGATE: the same tree built from 4,000,000 sessions
  155. TRY WITH SAMPLE DATA (~millions of sessions, 10,000+ event types)
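The aggregation in the previous slides amounts to merging sessions into a prefix tree in one pass, so every funnel from Start is available at once. A minimal in-memory sketch; at millions of sessions the same idea would have to run as a distributed job, and the node layout here is illustrative.

```python
from typing import Any, Dict, Iterable, Sequence

Node = Dict[str, Any]


def new_node() -> Node:
    return {"count": 0, "children": {}}


def aggregate_sessions(sessions: Iterable[Sequence[str]]) -> Node:
    """Merge sessions into one prefix tree rooted at Start; counts are sessions per node."""
    root = new_node()
    for events in sessions:
        root["count"] += 1
        node = root
        for event in [*events, "end"]:
            node = node["children"].setdefault(event, new_node())
            node["count"] += 1
    return root
```

With the four sessions from the USER SESSIONS slide, node A gets count 4, and its children are B (2), C (1), and end (1).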
  156. FAIL…
  157. EXPECT TRIALS AND ERRORS: keep trying to make it work
  158. HOW TO MAKE IT WORK? Read the details in Krist Wongsuphasawat and Jimmy Lin, “Using Visualizations to Monitor Changes and Harvest Insights from a Global-Scale Logging Infrastructure at Twitter,” Proc. IEEE Conference on Visual Analytics Science and Technology (VAST), 2014.
  159. Demo / Flying Sessions
  160. WORKFLOW: Requested / identify needs → Design & prototype → Make it work for a sample dataset → Refine & generalize → Productionize → Document & release → Maintain & support (keep it running, feature requests & bug fixes)
  161. TYPE OF PROJECTS: Storytelling (explanatory): general public, to see what data can tell us; Analytics tools (exploratory): PMs and data scientists, to understand product usage; Inspirations: general public, to get inspired
  162. https://medium.com/@kristw/designing-the-game-of-tweets-7f87c30dc5a2 Project / Game of Tweets
  163. EXPECT HARDWARE COMPLICATIONS
  164. INPUT (DATA) + YOU = OUTPUT (VIS), through (1) Get data & wrangle and (2) Analyze & visualize
  165. INPUT (DATA) + YOU = OUTPUT (VIS), through (1) Get data & wrangle and (2) Analyze & visualize
  166. EXPECT TO IMPROVE
  167. HOW TO BE BETTER? Time is limited. Grow the team; expand skills; improve tooling (solve a problem once and for all, automate repetitive tasks)
  168. http://twitter.github.io/labella.js Demo / Labella.js
  169. https://github.com/twitter/d3kit Demo / d3Kit http://www.slideshare.net/kristw/d3kit
  170. yeoman.io Demo / Yeoman
  171. TO SUM UP
  172. INPUT (DATA) + YOU = OUTPUT (VIS), through (1) Get data & wrangle and (2) Analyze & visualize
  173. TYPE OF PROJECTS: Storytelling (explanatory): general public, to see what data can tell us; Analytics tools (exploratory): PMs and data scientists, to understand product usage; Inspirations: general public, to get inspired
  174. TAKE-AWAY: Getting and wrangling data are time-consuming. Different projects have different requirements (storytelling, product insights, art, etc.). Combine visualization with other skills (HCI, design, stats, ML, etc.). Expect the unexpected. Learn and improve: do more with less time by growing the team, expanding skills, and improving tooling. Krist Wongsuphasawat / @kristw, kristw.yellowpigz.com
  175. ACKNOWLEDGEMENT: Nicolas Garcia Belmonte, Robert Harris, Miguel Rios, Simon Rogers, Jimmy Lin, Linus Lee, Chuang Liu, and many colleagues at Twitter. Lastly, to my wife for taking care of our 3-month-old baby, so I had time to prepare these slides.
  176. RESOURCES Images Banana phone http://goo.gl/GmcMPq Bar chart https://goo.gl/1G1GBg Boss https://goo.gl/gcY8Kw Champions League http://goo.gl/DjtNKE Database http://goo.gl/5N7zZz Fishing shark http://goo.gl/2fp4zW Globe visualization http://goo.gl/UiGMMj Harry Potter http://goo.gl/Q9Cy64 Holding phone http://goo.gl/It2TzH Kiwi orange http://goo.gl/ejQ73y Kiwi http://goo.gl/9yk7o5 Library https://goo.gl/HVeE6h Library earthquake http://goo.gl/rBqBrs Minion http://goo.gl/I19Ijg NBA http://goo.gl/p7HBdG NFL http://goo.gl/feQMZs Orange & Apple http://goo.gl/NG6RIL Pile of paper http://goo.gl/mGLQTx Premier League http://goo.gl/AqIINO Scrooge McDuck https://goo.gl/aKv8D7 The Sound of Music https://goo.gl/dqHlzj Trash pile http://goo.gl/OsFfo3 Tyrion http://goo.gl/WaBonl Watercolor Map by Stamen Design
  177. THANK YOU