6 things to expect when you are visualizing

1,198 views

This talk was prepared as a note to my future self for future projects. I reflect on the tasks commonly involved in crafting visualizations, point out common things to expect and pitfalls to avoid, and provide recommendations. Along the way I include examples of 3 different applications of information/data visualization, with details on how each project was started and developed.

These slides are from my guest lectures at:
(1) InfoVis class at UC Berkeley iSchool on Feb 27, 2017. Thank you Prof. Marti Hearst for the invitation.
(2) DataVis class at GATech on Apr 5, 2017. Thank you Prof. Rahul C. Basole for the invitation.

Published in: Data & Analytics

  1. 1. Krist Wongsuphasawat / @kristw 6 THINGS TO EXPECT WHEN YOU ARE VISUALIZING
  2. 2. 6 THINGS TO EXPECT WHEN YOU ARE VISUALIZING Krist Wongsuphasawat / @kristw
  3. 3. Computer Engineer Bangkok, Thailand Chulalongkorn University Krist Wongsuphasawat / @kristw
  4. 4. Programming + Soccer Computer Engineer Bangkok, Thailand Krist Wongsuphasawat / @kristw
  5. 5. Programming + Soccer Computer Engineer Bangkok, Thailand Krist Wongsuphasawat / @kristw
  6. 6. (P.S. These are actually not my robots, but our competitors’.) Krist Wongsuphasawat / @kristw Computer Engineer Bangkok, Thailand
  7. 7. Krist Wongsuphasawat / @kristw Computer Engineer Bangkok, Thailand PhD in Computer Science Information Visualization Univ. of Maryland
  8. 8. Krist Wongsuphasawat / @kristw Computer Engineer Bangkok, Thailand IBM Microsoft PhD in Computer Science Information Visualization Univ. of Maryland
  9. 9. PhD in Computer Science Information Visualization Univ. of Maryland IBM Microsoft Data Visualization Scientist Twitter Krist Wongsuphasawat / @kristw Computer Engineer Bangkok, Thailand
  10. 10. #interactive visualizations Open-source projects Visual Analytics Tools
  11. 11. ME + DATA = VIS
  12. 12. Data, I’m ready!
  13. 13. Data, I’m ready! Here I come!
  14. 14. WHAT TO EXPECT?
  15. 15. 1. EXPECT TO FIND THE REAL NEED
  16. 16. INPUT (DATA) What clients think they have
  17. 17. INPUT (DATA) What clients think they have What they usually have
  18. 18. YOU What clients think you are
  19. 19. YOU What clients think you are What they will get
  20. 20. OUTPUT (VIS) What clients ask for
  21. 21. OUTPUT (VIS) What clients ask for What they really need
  22. 22. COMMUNICATE
  23. 23. GOALS Present data Communicate information effectively Analyze data Exploratory data analysis Tools to analyze data Reusable tools for exploration Enjoy Combination of above
  24. 24. GOALS Present data Communicate information effectively Analyze data Exploratory data analysis Tools to analyze data Reusable tools for exploration Enjoy Combination of above Who is the audience? What do you want to tell? What are the questions? Who will use this? What would they use this for? Who is the audience?
  25. 25. I need this. Take this.
  26. 26. I need this. Here you are. I need this. Take this.
  27. 27. & COMPROMISE
  28. 28. 2. EXPECT TO CLEAN DATA
  29. 29. 2. EXPECT TO CLEAN DATA A LOT
  30. 30. 70-80% of time cleaning data “DATA JANITOR”
  31. 31. Collect + Clean + Transform DATA WRANGLING
  32. 32. WHY DOES IT TAKE SO MUCH TIME?
  33. 33. 2.1 Many sources and data formats
  34. 34. DATA SOURCES Open data Publicly available Internal data Private, owned by clients’ organization Self-collected data Manual, site scraping, etc. Combine the above
  35. 35. DATA FORMAT Standalone files txt, csv, tsv, json, Google Docs, …, pdf* Databases doesn’t necessarily mean they are organized API better quality with more overhead Website Big data*
  36. 36. NEED TO… Change format e.g. tsv => json Combine data Resolve multiple sources of truth
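A format change such as tsv => json can usually be scripted in a few lines. The sketch below is only an illustration using Python's standard library; the column names and sample rows are made up, and every value is kept as a string.

```python
import csv
import io
import json


def tsv_to_json(tsv_text):
    """Parse tab-separated text (first row is the header) into a JSON array string."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return json.dumps(list(reader), indent=2)


# Made-up sample input, not real data.
sample = "user\trestaurant\trating\nA\tIHOP\t4\nB\tSUBWAY\t4\n"
print(tsv_to_json(sample))
```

Keeping a conversion like this in a small function makes it easy to rerun when the source file is updated.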
  37. 37. 2.2 Data transformation is needed.
  38. 38. EXAMPLES Convert latitude/longitude into zip code Change country code from 3-letter (USA) to 2-letter (US) Correct time of day based on users’ timezone etc.
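Two of the transformations above can be sketched as follows. This is a minimal illustration, not the code used in the talk: the country table is a tiny hand-written subset (a real project would load the full ISO 3166-1 list), and the timezone is assumed to be available as a fixed UTC offset in hours, which ignores daylight saving time.

```python
from datetime import datetime, timedelta, timezone

# Illustrative subset only; load the full ISO 3166-1 table in practice.
ALPHA3_TO_ALPHA2 = {"USA": "US", "THA": "TH", "GBR": "GB"}


def to_alpha2(code):
    """Convert a 3-letter country code to its 2-letter form."""
    return ALPHA3_TO_ALPHA2[code.upper()]


def local_hour(utc_dt, utc_offset_hours):
    """Return the hour of day in the user's timezone for a UTC datetime."""
    user_tz = timezone(timedelta(hours=utc_offset_hours))
    return utc_dt.astimezone(user_tz).hour


event_time = datetime(2017, 2, 27, 20, 0, tzinfo=timezone.utc)
print(to_alpha2("USA"))           # US
print(local_hour(event_time, 7))  # 3 (Bangkok is UTC+7)
```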
  39. 39. 2.3 Data collection issues
  40. 40. EXAMPLES Typos Incorrect values Incorrect timestamps Missing data
  41. 41. 2.4 Definition of “clean” data
  42. 42. IS THIS CLEAN?
      USER  RESTAURANT  RATING
      ========================
      A     MCDONALD’S  3
      B     MCDONALDS   3
      C     MCDONALD    4
      D     MCDONALDS   5
      E     IHOP        4
      F     SUBWAY      4
  43. 43. IS THIS CLEAN?
      USER  RESTAURANT  RATING
      ========================
      A     MCDONALD’S  3
      B     MCDONALDS   3
      C     MCDONALD    4
      D     MCDONALDS   5
      E     IHOP        4
      F     SUBWAY      4
      How many reviews are there? Clean.
      How many restaurants are there? Not clean. McDonald, McDonald’s, McDonalds
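The point that "clean" depends on the question can be checked directly on the table above. In this sketch the `CANONICAL` alias table is a hypothetical, hand-maintained mapping; the normalization rule (uppercase, drop punctuation) is one simple choice among many.

```python
import re

reviews = [
    ("A", "MCDONALD’S", 3),
    ("B", "MCDONALDS", 3),
    ("C", "MCDONALD", 4),
    ("D", "MCDONALDS", 5),
    ("E", "IHOP", 4),
    ("F", "SUBWAY", 4),
]

# Hypothetical alias table for spellings that punctuation stripping cannot fix.
CANONICAL = {"MCDONALD": "MCDONALDS"}


def normalize(name):
    """Uppercase, drop punctuation such as apostrophes, then apply known aliases."""
    key = re.sub(r"[^A-Z0-9 ]", "", name.upper())
    return CANONICAL.get(key, key)


num_reviews = len(reviews)                                   # counting rows needs no cleaning
raw_restaurants = {name for _, name, _ in reviews}           # 5 spellings
clean_restaurants = {normalize(name) for _, name, _ in reviews}  # 3 restaurants
print(num_reviews, len(raw_restaurants), len(clean_restaurants))
```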
  44. 44. 2.5 Bigger data, bigger problems
  45. 45. HAVING ALL TWEETS How people think I feel.
  46. 46. How people think I feel. How I really feel. HAVING ALL TWEETS
  47. 47. Hadoop Cluster GETTING BIG DATA Data Storage
  48. 48. Scalding (slow) GETTING BIG DATA Hadoop Cluster Data Storage Tool
  49. 49. Scalding (slow) GETTING BIG DATA Hadoop Cluster Data Storage Tool Your laptop Smaller dataset
  50. 50. Hadoop Cluster Scalding (slow) Data Storage Tool Final dataset Tool node.js / python / excel (fast) Your laptop GETTING BIG DATA Smaller dataset
  51. 51. CHALLENGES Slow Long processing time (hours) Get relevant Tweets hashtag: #oscars keywords: “moonlight” (movie name) Too big Need to aggregate & reduce size Harder to spot problems
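The filter-then-aggregate idea can be sketched on a small sample before running anything on the cluster. The tweets below are made up, and representing each tweet as an (hour, text) pair is an assumption for illustration only.

```python
import re
from collections import Counter


def is_relevant(text, hashtags, keywords):
    """True if the text contains one of the hashtags or one of the keywords."""
    lowered = text.lower()
    tags = set(re.findall(r"#(\w+)", lowered))
    return bool(tags & hashtags) or any(word in lowered for word in keywords)


def count_per_hour(tweets, hashtags, keywords):
    """Reduce raw tweets to a small table: number of relevant tweets per hour."""
    counts = Counter()
    for hour, text in tweets:
        if is_relevant(text, hashtags, keywords):
            counts[hour] += 1
    return counts


# Made-up sample tweets, not real data.
tweets = [
    (20, "Can't wait for the #Oscars tonight"),
    (20, "Moonlight deserved it"),
    (21, "what a #oscars twist"),
    (21, "making dinner"),
]
hourly = count_per_hour(tweets, {"oscars"}, ["moonlight"])
print(dict(hourly))
```

Plain substring matching on keywords also matches unrelated uses of the word, which is one reason problems are harder to spot at scale.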
  52. 52. RAMSAY & RAMSEY
  53. 53. 2.6 New issues can show up any time.
  54. 54. RECOMMENDATIONS Always think that you will have to do it again document the process, automation Reusable scripts break a gigantic do-it-all function into smaller ones Reusable data keep for future project
  55. 55. 3. EXPECT TRIALS AND ERRORS
  56. 56. It’s gonna be legen-
  57. 57. Celebrate your trials #D3BrokeAndMadeArt
  58. 58. When your vis starts working
  59. 59. “Necessity is the mother of invention.” — English Proverb
  60. 60. “Necessity is the mother of invention.” — English Proverb DEADLINE
  61. 61. EXAMPLE PROJECTS
  62. 62. PROJECT 1: GAME OF THRONES #INTERACTIVE
  63. 63. INTERACTIVE.TWITTER.COM
  64. 64. WHAT TO EXPECT timely Deadline is strict. Also can be unexpected events. wide audience easy to explain and understand, multi-device support one-off project scope analyze data to find stories and find best way to present them
  65. 65. from fans’ conversations Reveal the talking points of every episode of
  66. 66. Problem is coming. CHAPTER I
  67. 67. Problem Want to know what the audience says about a TV show, based on Tweets
  68. 68. HBO’s Game of Thrones Based on a book series “A Song of Ice and Fire” Medieval Fantasy. Knights, magic and dragons.
  69. 69. Brief Story
  70. 70. A King dies.  A lot of contenders wage a war to reclaim the throne.
  71. 71. Minor characters with no claim to the throne set their own plans in action to gain power when all the major characters end up killing each other.
  72. 72. Brave/Honest/Honorable characters die. Intelligent but shady characters and characters who know nothing continue to live.
  73. 73. While humans are busy killing each other, ice zombies “White walkers” are invading from the North. The only group who seems to care about this is a neutral group called the Night’s Watch.
  74. 74. HBO’s Game of Thrones Based on a book series “A Song of Ice and Fire” Medieval Fantasy. Knights, magic and dragons. Many characters. Anybody can die. 6 seasons (60 episodes) so far Multiple storylines in each episode
  75. 75. Problem Want to know what the audience says about a TV show, based on Tweets
  76. 76. Ideas Common words Too much noise
  77. 77. Ideas Common words Too much noise Characters How often was each character mentioned?
  78. 78. I demand a trial by prototyping. CHAPTER II
  79. 79. Prototyping Pull sample data from Twitter API Entity recognition and counting naive approach
  80. 80. List of names Daenerys Targaryen,Khaleesi Jon Snow Sansa Stark Tyrion Lannister Arya Stark Cersei Lannister Khal Drogo Gregor Clegane,Mountain Margaery Tyrell Joffrey Baratheon Bran Stark Theon Greyjoy Jaime Lannister Brienne Eddard Stark,Ned Stark Ramsay Bolton Sandor Clegane,Hound Ygritte Stannis Baratheon Petyr Baelish,Little Finger Robb Stark Bronn Varys Catelyn Stark Oberyn Martell Daario Naharis Davos Seaworth Jorah Mormont Melisandre Myrcella Baratheon Tywin Lannister Tommen Baratheon Grey Worm Tyene Sand Rickon Stark Missandei Roose Bolton Robert Baratheon Jojen Reed Jeor Mormont Tormund Giantsbane Lysa Arryn Yara Greyjoy,Asha Greyjoy Samwell Tarly,Sam Hodor Victarion Greyjoy High Sparrow Dragon Winter Dothraki
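A naive counting approach over a name list like the one above might look like the sketch below. It assumes one character per line with comma-separated aliases (the first alias is treated as the canonical name), uses a shortened list, and runs on made-up tweets. It is not the actual prototype code.

```python
import re
from collections import Counter

# Shortened, illustrative version of the name list (one character per line).
NAME_LIST = """Daenerys Targaryen,Khaleesi
Jon Snow
Eddard Stark,Ned Stark
Hodor"""


def build_patterns(name_list):
    """Map each canonical name to a case-insensitive regex over all its aliases."""
    patterns = {}
    for line in name_list.splitlines():
        aliases = [alias.strip() for alias in line.split(",") if alias.strip()]
        alternation = "|".join(re.escape(alias) for alias in aliases)
        patterns[aliases[0]] = re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)
    return patterns


def count_mentions(tweets, patterns):
    """Count, per character, the number of tweets that mention them at least once."""
    counts = Counter()
    for text in tweets:
        for name, pattern in patterns.items():
            if pattern.search(text):
                counts[name] += 1
    return counts


# Made-up tweets, not real data.
tweets = [
    "Hodor! Hodor hodor.",
    "Khaleesi and Jon Snow need to meet already",
    "I miss Ned Stark",
    "jon snow knows nothing",
]
counts = count_mentions(tweets, build_patterns(NAME_LIST))
print(counts.most_common())
```

Exact alias matching misses first-name-only mentions and misspellings, which is why the alias list tends to grow over time.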
  81. 81. Sample Tweet
  82. 82. Sample Tweet
  83. 83. Sample data
      Character   Count
      Hodor       10000
      Jon Snow    5000
      Daenerys    4000
      Bran Stark  3000
      …           …
      *These numbers are made up for presentation, not real data.
  84. 84. When you play the game of vis, you iterate or you die. CHAPTER III
  85. 85. Where to go from here?
  86. 86. + episodes The Guardian & Google Trends
 http://www.theguardian.com/news/datablog/ng-interactive/2016/apr/22/game-of-thrones-the-most-googled-characters-episode-by-episode
  87. 87. + emotion
  88. 88. + connections
  89. 89. + connections
  90. 90. Gain insights from a single episode emotion & connections
  91. 91. Sample data
      INDIVIDUALS (+ top emojis)
      Character   Count
      Hodor       10000
      Jon Snow    5000
      Daenerys    4000
      …           …
      CONNECTIONS (+ top emojis)
      Character         Count
      Jon Snow+Sansa    1000
      Tormund+Brienne   500
      Bran Stark+Hodor  300
      …                 …
      *These numbers are made up for presentation, not real data.
  92. 92. Graph
      NODES (+ top emojis)
      Character   Count
      Hodor       1000
      Jon Snow    500
      Daenerys    400
      …           …
      LINKS (+ top emojis)
      Character         Count
      Jon Snow+Sansa    1000
      Tormund+Brienne   500
      Bran Stark+Hodor  300
      …                 …
      *These numbers are made up for presentation, not real data.
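Node and link counts of this shape can be derived from per-tweet mentions. The sketch below assumes each tweet has already been reduced to the list of characters it mentions, and that two characters are connected when they appear in the same tweet. That co-occurrence rule is an assumption; the slides do not spell it out.

```python
from collections import Counter
from itertools import combinations


def build_graph(mentions_per_tweet):
    """Return (nodes, links): tweet counts per character and per character pair."""
    nodes = Counter()
    links = Counter()
    for names in mentions_per_tweet:
        unique = sorted(set(names))  # sort so (a, b) and (b, a) share one key
        nodes.update(unique)
        links.update(combinations(unique, 2))
    return nodes, links


# Made-up mentions, not real data.
mentions = [
    ["Jon Snow", "Sansa"],
    ["Sansa", "Jon Snow"],
    ["Tormund", "Brienne"],
    ["Hodor"],
]
nodes, links = build_graph(mentions)
print(nodes.most_common())
print(links.most_common())
```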
  93. 93. Network Visualization Node-link diagram Force-directed layout http://blockbuilder.org/kristw/762b680690e4b2b2666dfec15838a384
  94. 94. Issue: Hairball
  95. 95. Issue: Occlusions
  96. 96. Tried: Fixed positions
  97. 97. + Collision Detection http://blockbuilder.org/kristw/2850f65d6329c5fef6d5c9118f1de6e6
  98. 98. + Community Detection https://github.com/upphiminn/jLouvain
  99. 99. + Collision Detection (with clusters) https://bl.ocks.org/mbostock/7881887
  100. 100. Tormund + Brienne
  101. 101. Issue: Convex hull http://bl.ocks.org/mbostock/4341699
  102. 102. x & y only, no radius
  103. 103. Example
  104. 104. Fix it
  105. 105. Fix it
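The slides do not say how the hull was fixed. One possible approach, shown here purely as an assumption, is to sample points on each node's circle and compute the hull over those points instead of over the centers, so the hull accounts for radius. The hull routine is a standard monotone chain implementation.

```python
import math


def cross(o, a, b):
    """Z-component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def hull_around_circles(circles, samples=16):
    """Hull over points sampled on each (x, y, r) circle, not just the centers."""
    points = []
    for x, y, r in circles:
        for i in range(samples):
            angle = 2 * math.pi * i / samples
            points.append((x + r * math.cos(angle), y + r * math.sin(angle)))
    return convex_hull(points)


circles = [(0, 0, 10), (50, 0, 20), (25, 40, 5)]  # made-up node positions and radii
center_hull = convex_hull([(x, y) for x, y, _ in circles])
padded_hull = hull_around_circles(circles)
print(max(p[0] for p in center_hull), max(p[0] for p in padded_hull))
```

More samples per circle give a rounder outline at the cost of more hull points.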
  106. 106. Let’s get other episodes.
  107. 107. Hadoop remembers. CHAPTER IV
  108. 108. More data Hadoop Rewrite the scripts in Scalding to get archived data
  109. 109. How much data do we need? Whole week? 5 days? 2 days? A day? etc.
  110. 110. How much data do we need?
  111. 111. Transitions
  112. 112. Changing episode
  113. 113. After switching episode 1. Store old positions for existing characters. 2. Assign positions for new characters.
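The two steps can be sketched as below. Placing new characters at random positions within the canvas is an assumption made for illustration; the slides do not say how new positions were assigned, and the function and parameter names are hypothetical.

```python
import random


def carry_over_positions(old_positions, new_characters, width, height, rng=None):
    """Keep positions of characters seen before; place unseen ones at random."""
    rng = rng or random.Random(0)  # seeded so the layout is reproducible
    positions = {}
    for name in new_characters:
        if name in old_positions:
            positions[name] = old_positions[name]  # step 1: reuse old position
        else:
            # step 2: assumed strategy, random point inside the canvas
            positions[name] = (rng.uniform(0, width), rng.uniform(0, height))
    return positions


old = {"Jon Snow": (100.0, 200.0), "Hodor": (300.0, 50.0)}
positions = carry_over_positions(old, ["Jon Snow", "Daenerys"], 800, 600)
print(positions)
```

Starting the layout from the previous positions keeps returning characters roughly where the viewer last saw them.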
  114. 114. Community transition t=0 t=1
  115. 115. Smoother t=0 t=0.5 t=0.51 t=1
  116. 116. Colors Default: D3 category10 Distinct but nothing about the context Custom palette Colors related to the groups/houses. Black = Night’s Watch Blue = North Red = Daenerys Gold = Lannister …
  117. 117. Hold the vis. CHAPTER V
  118. 118. The vis is not enough.
  119. 119. Legend
  120. 120. Navigation
  121. 121. Top 3
  122. 122. Adjust threshold
  123. 123. Recap
  124. 124. Filtered Recap Tooltip
  125. 125. Demo https://interactive.twitter.com/game-of-thrones
  126. 126. Mobile Support
  127. 127. A visualizer always evaluates his work. CHAPTER VI
  128. 128. Self & Peer Does it solve the problem?
  129. 129. Google Analytics Pageviews Visitors Actions Referrals Sites/Social
  130. 130. Feedback
  131. 131. Feedback
  132. 132. PROJECT 2: VISUAL ANALYTICS TOOLS FOR LOGGING
  133. 133. WHAT TO EXPECT richer, more features to support exploration of complex data more technical audience product managers, engineers, data scientists accuracy designed for dynamic input long-term projects
  134. 134. Data sources Output explore analyze present get * *
  135. 135. Data sources Output explore analyze present get * * ad-hoc scripts
  136. 136. Data sources Output explore analyze present get * * ad-hoc scripts tools for exploration
  137. 137. USER ACTIVITY LOGS
  138. 138. Users Use Twitter
  139. 139. Users Use Product Managers Curious Twitter
  140. 140. Users Use Curious Engineers Log data in Hadoop Write Twitter Instrument Product Managers
  141. 141. WHAT IS BEING LOGGED? tweet Activities
  142. 142. WHAT IS BEING LOGGED? tweet from home timeline on twitter.com tweet from search page on iPhone Activities
  143. 143. WHAT IS BEING LOGGED? tweet from home timeline on twitter.com tweet from search page on iPhone sign up log in retweet etc. Activities
  144. 144. ORGANIZE?
  145. 145. LOG EVENT A.K.A. “CLIENT EVENT” [Lee et al. 2012]
  146. 146. LOG EVENT A.K.A. “CLIENT EVENT” client : page : section : component : element : action web : home : timeline : tweet_box : button : tweet 1) User ID 2) Timestamp 3) Event name 4) Event detail [Lee et al. 2012]
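The six-part event name above maps naturally onto a small record type. This sketch only illustrates the naming scheme shown on the slide; the `ClientEvent` class and `parse_event_name` function are hypothetical names, not part of any Twitter library.

```python
from typing import NamedTuple


class ClientEvent(NamedTuple):
    client: str
    page: str
    section: str
    component: str
    element: str
    action: str


def parse_event_name(name: str) -> ClientEvent:
    """Split 'client:page:section:component:element:action' into its six parts."""
    parts = [part.strip() for part in name.split(":")]
    if len(parts) != 6:
        raise ValueError(f"expected 6 components, got {len(parts)}: {name!r}")
    return ClientEvent(*parts)


event = parse_event_name("web : home : timeline : tweet_box : button : tweet")
print(event.page, event.action)  # home tweet
```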
  147. 147. LOG DATA
  148. 148. Users Use Curious Engineers Log data in Hadoop Twitter Instrument Write Product Managers bigger than Tweet data
  149. 149. Users Use Curious Engineers Log data in Hadoop Data Scientists Ask Twitter Instrument Write Product Managers
  150. 150. Users Use Curious Engineers Log data in Hadoop Data Scientists Find Ask Twitter Instrument Write Product Managers
  151. 151. LOG DATA
  152. 152. Users Use Curious Engineers Log data in Hadoop Data Scientists Find, Clean Ask Twitter Instrument Write Product Managers
  153. 153. Users Use Curious Engineers Log data in Hadoop Data Scientists Find, Clean Ask Monitor Twitter Instrument Write Product Managers
  154. 154. Users Use Curious Engineers Log data in Hadoop Data Scientists Find, Clean, Analyze Ask Monitor Twitter Instrument Write Product Managers
  155. 155. Users Use Curious Engineers Log data in Hadoop Data Scientists Find, Clean, Analyze Ask Monitor 1 2 Twitter Instrument Write Product Managers
  156. 156. client page section component element action Event 50,000+ event types
  157. 157. client page section component element action Event 50,000+ event types one graph / event x 50,000
  158. 158. DESIGN
  159. 159. See Client event collection Engineers & Data Scientists
  160. 160. See Client event collection Engineers & Data Scientists narrow down Interactions search box => filter
  161. 161. See HOW TO VISUALIZE? narrow down Client event collection Engineers & Data Scientists Interactions search box => filter
  162. 162. See Client event collection Engineers & Data Scientists client : page : section : component : element : action HOW TO VISUALIZE? narrow down Interactions search box => filter
  163. 163. CLIENT EVENT HIERARCHY iphone home - - - impression tweet tweet click iphone:home:-:-:-:impression iphone:home:-:tweet:tweet:click
  164. 164. DETECT CHANGES iphone home - - - impression tweet tweet click iphone home - - - impression tweet tweet click TODAY 7 DAYS AGO compared to
  165. 165. CALCULATE CHANGES +5% +5% +5% +10% +10% +10% -5% -5% -5% DIFF
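The detect-and-calculate steps can be sketched as a roll-up over the event hierarchy followed by a percent difference against the same day one week earlier. The counts are made up, and treating every name prefix as a hierarchy node is an assumption about how the levels aggregate.

```python
from collections import defaultdict


def rollup(counts):
    """Sum event counts at every level of the hierarchy (every name prefix)."""
    totals = defaultdict(int)
    for name, count in counts.items():
        parts = name.split(":")
        for depth in range(1, len(parts) + 1):
            totals[":".join(parts[:depth])] += count
    return totals


def percent_changes(today, week_ago):
    """Percent change per hierarchy node, today compared to 7 days ago."""
    now, before = rollup(today), rollup(week_ago)
    changes = {}
    for prefix, old in before.items():
        if old:  # skip nodes with a zero baseline to avoid dividing by zero
            changes[prefix] = 100.0 * (now.get(prefix, 0) - old) / old
    return changes


# Made-up counts, not real data.
week_ago = {"iphone:home:-:-:-:impression": 1000, "iphone:home:-:tweet:tweet:click": 200}
today = {"iphone:home:-:-:-:impression": 1050, "iphone:home:-:tweet:tweet:click": 190}
changes = percent_changes(today, week_ago)
print(changes["iphone:home:-:-:-:impression"], changes["iphone:home:-:tweet:tweet:click"])
```

Events that appear only today have no baseline and are left out of this sketch; a real tool would need to decide how to show them.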
  166. 166. DISPLAY CHANGES iphone home - - - impression tweet tweet click Map of the Market [Wattenberg 1999], StemView [Guerra-Gomez et al. 2013]
  167. 167. DISPLAY CHANGES home - - - impression tweet tweet click iphone
  168. 168. Demo / Scribe Radar
  169. 169. Twitter for Banana
  170. 170. PROJECT 3: VISUAL ANALYTICS TOOLS FOR EXPERIMENTATION
  171. 171. A/B TESTING
  172. 172. RUN AN EXPERIMENT Develop feature Track metrics 1. No. of Tweets read 2. No. of Tweets sent 3. No. of Users 4. … Set bucket size How many users?
  173. 173. RETROSPECTIVE ANALYSIS Data scientist analyzed 100+ past experiments. Many useful insights. - We could move metric A by X% on average. - Experiment 18 moved metric A the most - Which team was able to move metric A successfully? - etc.
  174. 174. RETROSPECTIVE ANALYSIS Data scientist analyzed 100+ past experiments. Many useful insights. - We could move metric A by X% on average. - Experiment 18 moved metric A the most - Which team was able to move metric A successfully? - etc. Amount of knowledge transfer = slide deck + wiki page. Reproduce for recent experiments? Manually.
  175. 175. RETROSPECTIVE ANALYSIS Data scientist analyzed 100+ past experiments. Many useful insights. - We could move metric A by X% on average. - Experiment 18 moved metric A the most - Which team was able to move metric A successfully? - etc. Amount of knowledge transfer = slide deck + wiki page. Reproduce for recent experiments? Manually. Make results more accessible and convenient to use.
  176. 176. RETROSPECTIVE ANALYSIS Data scientist analyzed 100+ past experiments. Many useful insights. - We could move metric A by X% on average. - Experiment 18 moved metric A the most - Which team was able to move metric A successfully? - etc. Amount of knowledge transfer = slide deck + wiki page. Reproduce for recent experiments? Manually. Make results more accessible and convenient to use. Automatic
  177. 177. Metric Mover I like to move it, move it Krist Wongsuphasawat, Joseph Liu, Matthew Schreiner, Andy Schlaikjer, Lucile Lu and Busheng Lou
  178. 178. Set OKRs Process # of posts
  179. 179. Implement a feature Set OKRs Process Setup experiment # of posts # of posts
  180. 180. Implement a feature Set OKRs Interpret results Process Run experiment +1.0% Setup experiment # of posts # of posts
  181. 181. Implement a feature Set OKRs Interpret results Process Run experiment +1.0% Setup experiment How easy/hard is it to move this metric? How much change to aim for? Challenges # of posts # of posts
  182. 182. Implement a feature Set OKRs Interpret results Process Run experiment +1.0% How much to expect from one experiment? What were the successful features? Who had experience with this? Setup experiment How easy/hard is it to move this metric? How much change to aim for? Challenges # of posts # of posts
  183. 183. Implement a feature Set OKRs Interpret results Process Run experiment +1.0% How much to expect from one experiment? What were the successful features? Who had experience with this? Setup experiment How easy/hard is it to move this metric? How much change to aim for? How good is this? Challenges # of posts # of posts
  184. 184. Past experiments Metric Mover
  185. 185. Exp. 1 Exp. 2 Exp. 3 Exp. 4 Metric: No. of Posts
  186. 186. Exp. 1 Exp. 2 Exp. 3 Exp. 4 Metric: No. of Posts Control buckets
  187. 187. Exp. 1 Exp. 2 Exp. 3 Exp. 4 Metric: No. of Posts
  188. 188. Exp. 1 Exp. 2 Exp. 3 Exp. 4 Metric: No. of Posts Insignificant buckets
  189. 189. Exp. 1 Exp. 2 Exp. 3 Exp. 4 Metric: No. of Posts
  190. 190. Metric: No. of Posts
  191. 191. Metric: No. of Posts % change -2% -1% 0 1% 2%
  192. 192. Metric: No. of Posts % change -2% -1% 0 1% 2% |scaled impact| 100,000,000 1,000,000 10,000 100
  193. 193. Users who watch cat GIFs Users who like cat GIFs Users who post cat GIFs **These are fake data.**
  194. 194. WORKFLOW Identify needs Design and prototype Make it work for sample dataset Refine, generalize and productionize Make it work for other cases Document and release Maintain and support Keep it running, Feature requests & Bug fixes
  195. 195. What separates good and great work 4. EXPECT TIME FOR REFINEMENT
  196. 196. REFINE & POLISH UX / UI + Mobile Support Color Animation / Transition Performance Loading time, Data file size “The little of visualisation design” by Andy Kirk http://www.visualisingdata.com/2016/03/little-visualisation-design/
  197. 197. “The first 90% of the code accounts for the first 90% of the development time. The remaining 10% of the code accounts for the other 90% of the development time.” — Tom Cargill, Bell Labs
  198. 198. or find ways to get some 5. EXPECT FEEDBACK
  199. 199. “Feedback is the breakfast of champions.” — Ken Blanchard
  200. 200. FEEDBACK During development Feedback sessions with clients/potential users After release Logging User study Forum, User group Office hours
  201. 201. 6. EXPECT TO IMPROVE
  202. 202. HOW TO BE BETTER? Time is limited.
 Learn from the past Expand skills Get help / Grow the team Improve tooling Solve a problem once and for all Automate repetitive tasks
  203. 203. http://twitter.github.io/labella.js Demo / Labella.js
  204. 204. https://github.com/twitter/d3kit Demo / d3Kit http://www.slideshare.net/kristw/d3kit
  205. 205. yeoman.io Demo / Yeoman
  206. 206. SUMMARY
  207. 207. EXPECT… 1. to find the real need 2. to clean data a lot 3. trials and errors 4. time for refinement 5. feedback 6. to improve Krist Wongsuphasawat / @kristw kristw.yellowpigz.com
  208. 208. THANK YOU
  209. 209. QUESTIONS?
  210. 210. My colleagues at Twitter for their collaboration and support in these projects; and my wife for taking care of the baby while I make these slides. ACKNOWLEDGEMENT
  211. 211. RESOURCES Images Banana phone http://goo.gl/GmcMPq Bar chart https://goo.gl/1G1GBg Boss https://goo.gl/gcY8Kw Champions League http://goo.gl/DjtNKE Database http://goo.gl/5N7zZz Fishing shark http://goo.gl/2fp4zW Frustrated programmer https://goo.gl/ZLDNny Globe visualization http://goo.gl/UiGMMj Harry Potter http://goo.gl/Q9Cy64 Holding phone http://goo.gl/It2TzH Jon Snow https://goo.gl/CACWxE Jon Snow lightsaber https://goo.gl/CJt1Tn Kiwi orange http://goo.gl/ejQ73y Kiwi http://goo.gl/9yk7o5 Library https://goo.gl/HVeE6h Library earthquake http://goo.gl/rBqBrs Minion http://goo.gl/I19Ijg Nemo https://goo.gl/m0pmzC Orange & Apple http://goo.gl/NG6RIL Pile of paper http://goo.gl/mGLQTx Scrooge McDuck https://goo.gl/aKv8D7 Trash pile http://goo.gl/OsFfo3 Watercolor Map by Stamen Design Yes GIF https://goo.gl/agvlAE
