What I tell myself before visualizing

Krist Wongsuphasawat
@kristw

  1. 1. Krist Wongsuphasawat @kristw WHAT I TELL MYSELF BEFORE VISUALIZING
  2. 2. WHAT I TELL MYSELF BEFORE VISUALIZING Krist Wongsuphasawat @kristw
  3. 3. Computer Engineer Bangkok, Thailand Chulalongkorn University Krist Wongsuphasawat / @kristw
  4. 4. Programming & Football Computer Engineer Bangkok, Thailand Krist Wongsuphasawat / @kristw
  5. 5. Krist Wongsuphasawat / @kristw Programming & Football Computer Engineer Bangkok, Thailand
  6. 6. (P.S. These are actually not my robots, but our competitors’.) Krist Wongsuphasawat / @kristw Computer Engineer Bangkok, Thailand
  7. 7. PhD in Computer Science Information Visualization Univ. of Maryland Krist Wongsuphasawat / @kristw Computer Engineer Bangkok, Thailand
  8. 8. IBM Data Scientist Analytics, Experiment Twitter Microsoft Krist Wongsuphasawat / @kristw PhD in Computer Science Information Visualization Univ. of Maryland Computer Engineer Bangkok, Thailand
  9. 9. IBM Krist Wongsuphasawat / @kristw Engineering Manager Data Experience Airbnb Microsoft Twitter PhD in Computer Science Information Visualization Univ. of Maryland Computer Engineer Bangkok, Thailand
  10. 10. Public-facing visualizations: interactive.twitter.com. Open-source projects: Apache Superset (31,000+ ⭐), labella.js (3,000+ ⭐), visx (10,000+ ⭐), react-vega, encodable. Visual analytics tools: internal tools, academic papers. kristw.yellowpigz.com
  11. 11. DATA = ME + VIS
  12. 12. Data, I’m ready!
  13. 13. Data, I’m ready! Here I come!
  14. 14. WHAT TO EXPECT?
  15. 15. 1. EXPECT TO FIND THE REAL NEED
  16. 16. INPUT (DATA) What clients think they have
  17. 17. INPUT (DATA) What clients think they have What they usually have
  18. 18. YOU What clients think you are
  19. 19. YOU What clients think you are What they will get
  20. 20. OUTPUT (VIS) What clients ask for
  21. 21. OUTPUT (VIS) What clients ask for What they really need
  22. 22. COMMUNICATE
  23. 23. GOALS: Present data (communicate information effectively); Analyze data (exploratory data analysis); Tools to analyze data (reusable tools for exploration); Enjoy; Combination of above
  24. 24. GOALS: Present data (communicate information effectively); Analyze data (exploratory data analysis); Tools to analyze data (reusable tools for exploration); Enjoy; Combination of above. Questions: Who is the audience? What do you want to tell? What are the questions? Who will use this? What would they use this for? Who is the audience?
  25. 25. I need this. Take this.
  26. 26. I need this. Here you are. I need this. Take this.
  27. 27. & COMPROMISE
  28. 28. 2. EXPECT TO CLEAN DATA
  29. 29. 2. EXPECT TO CLEAN DATA A LOT
  30. 30. 70-80% of time cleaning data “DATA JANITOR”
  31. 31. Collect + Clean + Transform DATA WRANGLING
  32. 32. WHY DOES IT TAKE SO MUCH TIME?
  33. 33. 2.1 Many sources and formats
  34. 34. DATA SOURCES: Open data (publicly available); Internal data (private, owned by clients’ organization); Self-collected data (manual, site scraping, etc.); Combine the above
  35. 35. DATA FORMAT: Standalone files (txt, csv, tsv, json, Google Docs, …, pdf*); Databases (doesn’t necessarily mean they are organized); API (better quality with more overhead); Big data*
  36. 36. NEED TO… Change format (e.g. tsv => json); Combine data; Resolve multiple sources of truth
  37. 37. 2.2 Data transformation
  38. 38. EXAMPLES: Convert latitude/longitude into country; Change country code from 3-letter (USA) to 2-letter (US); Correct time of day based on users’ timezone; etc.
  39. 39. 2.3 Data collection issues
  40. 40. EXAMPLES: Typos; Incorrect values; Incorrect timestamps; Missing data
  41. 41. 2.4 Definition of “clean” data
  42. 42. IS THIS CLEAN?
      USER  RESTAURANT  RATING
      ========================
      A     MCDONALD’S  3
      B     MCDONALDS   3
      C     MCDONALD    4
      D     MCDONALDS   5
      E     IHOP        4
      F     SUBWAY      4
  43. 43. IS THIS CLEAN?
      USER  RESTAURANT  RATING
      ========================
      A     MCDONALD’S  3
      B     MCDONALDS   3
      C     MCDONALD    4
      D     MCDONALDS   5
      E     IHOP        4
      F     SUBWAY      4
      How many reviews are there? Clean.
      How many restaurants are there? Not clean: McDonald, McDonald’s, McDonalds
  44. 44. 2.5 Bigger data, bigger problems
  45. 45. HAVING ALL TWEETS How people think I feel.
  46. 46. How people think I feel. How I really feel. HAVING ALL TWEETS
  47. 47. GETTING BIG DATA: Data Warehouse (lots of machines)
  48. 48. GETTING BIG DATA: Data Warehouse (lots of machines) => Tool: Spark, Hadoop, etc. (slow)
  49. 49. GETTING BIG DATA: Data Warehouse (lots of machines) => Tool: Spark, Hadoop, etc. (slow) => Smaller dataset (your laptop)
  50. 50. GETTING BIG DATA: Data Warehouse (lots of machines) => Tool: Spark, Hadoop, etc. (slow) => Smaller dataset (your laptop) => Tool: node.js / python / excel (fast) => Final dataset
  51. 51. CHALLENGES: Slow (long processing time, hours); Get relevant Tweets (keywords: “parasite”, a movie name); Too big (need to aggregate & reduce size); Harder to spot problems
  52. 52. CHALLENGES: Slow (long processing time, hours); Get relevant Tweets (keywords: “parasite”, a movie name); Too big (need to aggregate & reduce size); Harder to spot problems
  53. 53. 2.6 Issues can show up any time.
  54. 54. RECOMMENDATIONS: Always assume that you will have to do it again (document the process, automate); Reusable scripts (break a large script into smaller ones); Reusable data (keep for future projects)
  55. 55. 3. PREPARE TO ITERATE AGAIN & AGAIN
  56. 56. It was a great idea … until I actually tried it.
  57. 57. Celebrate your failures #D3BrokeAndMadeArt https://twitter.com/enjalot/status/1313159226995466240?s=20
  58. 58. TIPS: Don’t give up. If stuck, take a break. Look for inspiration. The vis that gives you insights may or may not be the vis for sharing (exploration vs. communication). Keep it as simple as possible but not simpler.
  59. 59. “Necessity is the mother of invention.” — Old proverb
  60. 60. “Necessity is the mother of invention.” DEADLINE — Old proverb
  61. 61. TIPS: Don’t give up. If stuck, take a break. Look for inspiration. The vis that gives you insights may or may not be the vis for sharing (exploration vs. communication). Keep it as simple as possible but not simpler. Set deadlines.
  62. 62. BOBA SCIENCE PROJECT /
  63. 63. https://medium.com/s/story/boba-science-how-can-i-drink-a-bubble-tea-to-ensure-that-i-dont-finish-the-tea-before-the-bobas-7fc5fd0e442d
  64. 64. LOTS OF ITERATIONS
  65. 65. GAME OF THRONES PROJECT /
  66. 66. Reveal the talking points of every episode of Game of Thrones from fans’ conversations
  67. 67. PROBLEM: Understand what the audience says about a TV show from Tweets
  68. 68. HBO’S GAME OF THRONES: Based on the book series “A Song of Ice and Fire”. Medieval fantasy. Knights, magic and dragons.
  69. 69. HBO’S GAME OF THRONES: Based on the book series “A Song of Ice and Fire”. Medieval fantasy. Knights, magic and dragons. Many characters. Anybody can die. 8 seasons. Multiple storylines in each episode.
  70. 70. IDEAS: Common words (too much noise)
  71. 71. IDEAS: Common words (too much noise); Characters (how often was each character mentioned?)
  72. 72. PROTOTYPING: Pull sample data from the Twitter API; Count characters (naive approach)
  73. 73. LIST OF NAMES: Daenerys Targaryen,Khaleesi; Jon Snow; Sansa Stark; Tyrion Lannister; Arya Stark; Cersei Lannister; Khal Drogo; Gregor Clegane,Mountain; Margaery Tyrell; Joffrey Baratheon; Bran Stark; Theon Greyjoy; Jaime Lannister; Brienne; Eddard Stark,Ned Stark; Ramsay Bolton; Sandor Clegane,Hound; Ygritte; Stannis Baratheon; Petyr Baelish,Little Finger; Robb Stark; Bronn; Varys; Catelyn Stark; Oberyn Martell; Daario Naharis; Davos Seaworth; Jorah Mormont; Melisandre; Myrcella Baratheon; Tywin Lannister; Tommen Baratheon; Grey Worm; Tyene Sand; Rickon Stark; Missandei; Roose Bolton; Robert Baratheon; Jojen Reed; Jeor Mormont; Tormund Giantsbane; Lysa Arryn; Yara Greyjoy,Asha Greyjoy; Samwell Tarly,Sam; Hodor; Victarion Greyjoy; High Sparrow; Dragon; Winter; Dothraki
  74. 74. SAMPLE TWEET
  75. 75. SAMPLE TWEET
  76. 76. SAMPLE DATA
      Character   Count
      Hodor       10000
      Jon Snow    5000
      Daenerys    4000
      Bran Stark  3000
      …           …
      *These numbers are made up for presentation, not real data.
  77. 77. WHERE TO GO FROM HERE?
  78. 78. + EMOTION
  79. 79. + CONNECTIONS
  80. 80. + CONNECTIONS
  81. 81. FOCUS ON EMOTION & CONNECTIONS WITHIN EPISODE
  82. 82. SAMPLE DATA
      INDIVIDUALS (+ top emojis)
      Character        Count
      Hodor            10000
      Jon Snow         5000
      Daenerys         4000
      …                …
      CONNECTIONS (+ top emojis)
      Character        Count
      Jon Snow+Sansa   1000
      Tormund+Brienne  500
      Bran Stark+Hodor 300
      …                …
      *These numbers are made up for presentation, not real data.
  83. 83. GRAPH
      NODES (+ top emojis)
      Character        Count
      Hodor            10000
      Jon Snow         5000
      Daenerys         4000
      …                …
      LINKS (+ top emojis)
      Character        Count
      Jon Snow+Sansa   1000
      Tormund+Brienne  500
      Bran Stark+Hodor 300
      …                …
      *These numbers are made up for presentation, not real data.
  84. 84. ISSUE: HAIRBALL
  85. 85. TRIED: MANUAL POSITIONS
  86. 86. + COLLISION DETECTION http://blockbuilder.org/kristw/2850f65d6329c5fef6d5c9118f1de6e6
  87. 87. + COMMUNITY DETECTION https://github.com/upphiminn/jLouvain
  88. 88. + COLLISION DETECTION (WITH CLUSTERS) https://bl.ocks.org/mbostock/7881887
  89. 89. LET’S GET OTHER EPISODES.
  90. 90. MORE DATA Hadoop Rewrite the scripts to get archived data
  91. 91. HOW MUCH DATA DO WE NEED? Whole week? 5 days? 2 days? A day? etc.
  92. 92. HOW MUCH DATA DO WE NEED?
  93. 93. THE VIS IS NOT ENOUGH.
  94. 94. Legend
  95. 95. Navigation
  96. 96. Top 3
  97. 97. Adjust threshold
  98. 98. Recap
  99. 99. Filtered Recap Tooltip
  100. 100. DEMO https://interactive.twitter.com/game-of-thrones
  101. 101. MOBILE SUPPORT
  102. 102. 4. RESERVE TIME FOR REFINEMENT
  103. 103. “The first 90% of the code accounts for the first 90% of the development time. The remaining 10% of the code accounts for the other 90% of the development time.” — Tom Cargill, Bell Labs
  104. 104. REFINE & POLISH: Color; UX / UI + mobile support; Animation / transition; Metadata for SEO; Social media preview images; Performance (loading time, data file size)
  105. 105. FANDOM MAPS PROJECT /
  106. 106. FAN MAP - NFL interactive.twitter.com/
  107. 107. FAN MAP - NBA interactive.twitter.com
  108. 108. FAN MAP - ENGLISH PREMIER LEAGUE interactive.twitter.com
  109. 109. FAN MAP - ENGLISH PREMIER LEAGUE
  110. 110. INTERACTIVE.TWITTER.COM
  111. 111. GAME OF THRONES PROJECT /
  112. 112. TRANSITIONS
  113. 113. CHANGING EPISODE (BEFORE)
  114. 114. CHANGING EPISODE (AFTER)
  115. 115. EXAMPLE
  116. 116. ISSUE: CONVEX HULL http://bl.ocks.org/mbostock/4341699
  117. 117. X & Y ONLY, NO RADIUS
  118. 118. FIX IT
  119. 119. 5. PLAN FOR FEEDBACK
  120. 120. “Feedback is the breakfast of champions.” — Ken Blanchard
  121. 121. FEEDBACK: During development (feedback sessions with clients/potential users); After release (logging, user study, office hours)
  122. 122. FEEDBACK
  123. 123. FEEDBACK
  124. 124. 6. LOOK BACK TO MOVE FORWARD
  125. 125. WHAT COULD HAVE BEEN BETTER? If I knew how to do XXX… (learning opportunities). If I had someone who can do XXX… (look for help, grow the team). If I did not have to do the same tasks again… (reusable components, automate repetitive tasks).
  126. 126. LABELLA.JS PROJECT /
  127. 127. VISX = REACT + D3 PROJECT /
  128. 128. SUMMARY
  129. 129. WHAT I TELL MYSELF BEFORE VISUALIZING: 1. Expect to find the real need 2. Expect to clean data a lot 3. Prepare to iterate again & again 4. Reserve time for refinement 5. Plan for feedback 6. Look back to move forward. Krist Wongsuphasawat / @kristw, kristw.yellowpigz.com
  130. 130. ACKNOWLEDGEMENT: My former and current colleagues at Twitter and Airbnb for their collaboration and support in these projects; and my wife for taking care of our two kids while I make these slides.
  131. 131. WHAT I TELL MYSELF BEFORE VISUALIZING: 1. Expect to find the real need 2. Expect to clean data a lot 3. Prepare to iterate again & again 4. Reserve time for refinement 5. Plan for feedback 6. Look back to move forward. Krist Wongsuphasawat / @kristw, kristw.yellowpigz.com
  132. 132. THANK YOU
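
Slides 36 and 38 mention changing format (tsv => json) and converting 3-letter country codes to 2-letter ones. Both steps might look roughly like the Python sketch below. It assumes a made-up TSV with `user` and `country` columns; the function names and the three-entry lookup table are illustrative and not from the talk, and a real project would need a complete ISO 3166-1 mapping.

```python
import csv
import io
import json

# Illustrative subset only (assumption); a real project needs the full ISO 3166-1 table.
ALPHA3_TO_ALPHA2 = {"USA": "US", "THA": "TH", "GBR": "GB"}


def tsv_to_records(tsv_text):
    """Parse TSV text (first row is the header) into a list of dicts."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [dict(row) for row in reader]


def to_alpha2(records, field="country"):
    """Return copies of the records with 3-letter codes replaced by 2-letter codes.

    Unknown codes are left unchanged so they can be inspected later.
    """
    converted = []
    for record in records:
        record = dict(record)
        code = record.get(field, "")
        record[field] = ALPHA3_TO_ALPHA2.get(code.upper(), code)
        converted.append(record)
    return converted


# Made-up sample input, not real data.
sample_tsv = "user\tcountry\nA\tUSA\nB\tTHA\nC\tXXX\n"
records = to_alpha2(tsv_to_records(sample_tsv))
print(json.dumps(records, indent=2))
```

Leaving unknown codes such as `XXX` untouched, instead of dropping the row, keeps data collection issues visible for a later cleaning pass.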
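
Slide 43 points out that the same table is clean for one question (how many reviews?) and not clean for another (how many restaurants?). The sketch below reproduces that with the slide's six rows. The normalization rule (strip punctuation, then map known variants to one spelling) and the `CANONICAL` table are assumptions for illustration, not the author's method.

```python
import re

# Rows from slide 42: (user, restaurant, rating).
reviews = [
    ("A", "MCDONALD’S", 3),
    ("B", "MCDONALDS", 3),
    ("C", "MCDONALD", 4),
    ("D", "MCDONALDS", 5),
    ("E", "IHOP", 4),
    ("F", "SUBWAY", 4),
]

# Hypothetical table mapping known variants to one canonical spelling.
CANONICAL = {"MCDONALD": "MCDONALDS"}


def normalize(name):
    """Uppercase, drop everything except A-Z and 0-9, then apply known aliases."""
    key = re.sub(r"[^A-Z0-9]", "", name.upper())
    return CANONICAL.get(key, key)


# Question 1: how many reviews? The raw rows are already good enough.
num_reviews = len(reviews)

# Question 2: how many restaurants? The raw names overcount.
raw_names = {restaurant for _, restaurant, _ in reviews}
clean_names = {normalize(restaurant) for _, restaurant, _ in reviews}

print(num_reviews, len(raw_names), len(clean_names))
```

Counting distinct raw names gives five restaurants, while the normalized names give three, which is the gap the slide illustrates.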
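
Slides 51 and 52 list two big-data chores: getting relevant Tweets by keyword and aggregating to reduce size. A minimal sketch of both, assuming each Tweet is a dict with hypothetical `created_at` (ISO 8601 string) and `text` fields; the sample Tweets are made up for illustration.

```python
import re
from collections import Counter

# Whole-word, case-insensitive match for the keyword from the slide.
KEYWORD = re.compile(r"\bparasite\b", re.IGNORECASE)


def daily_counts(tweets):
    """Keep Tweets that mention the keyword and aggregate them into counts per day."""
    counts = Counter()
    for tweet in tweets:
        if KEYWORD.search(tweet["text"]):
            day = tweet["created_at"][:10]  # "YYYY-MM-DD" prefix of the timestamp
            counts[day] += 1
    return dict(counts)


# Made-up sample Tweets, not real data.
tweets = [
    {"created_at": "2020-02-09T20:01:00Z", "text": "Parasite wins Best Picture!"},
    {"created_at": "2020-02-09T21:15:00Z", "text": "Rewatching parasite tonight"},
    {"created_at": "2020-02-10T08:00:00Z", "text": "My dog has parasites, off to the vet"},
    {"created_at": "2020-02-10T09:30:00Z", "text": "PARASITE deserved it"},
]

daily = daily_counts(tweets)
print(daily)
```

The word boundary skips "parasites", but a Tweet about an actual parasite in the singular would still be counted. This is one reason problems are harder to spot once the data has been aggregated.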
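
Slides 72 to 83 describe counting character mentions with a naive approach, then counting pairs of characters as connections (nodes and links). The sketch below assumes the `name,alias` line format shown on slide 73 and uses plain substring matching. The function names and sample Tweets are made up, and the actual project's matching rules are not given in the slides.

```python
from collections import Counter
from itertools import combinations

# A few lines in the "name,alias" format of slide 73.
NAME_LINES = [
    "Daenerys Targaryen,Khaleesi",
    "Jon Snow",
    "Sansa Stark",
    "Bran Stark",
    "Hodor",
]


def build_alias_map(lines):
    """Map every lowercase alias to the first (canonical) name on its line."""
    alias_to_name = {}
    for line in lines:
        aliases = [alias.strip() for alias in line.split(",")]
        for alias in aliases:
            alias_to_name[alias.lower()] = aliases[0]
    return alias_to_name


def mentioned(text, alias_to_name):
    """Naive approach: a character is mentioned if any alias is a substring."""
    lowered = text.lower()
    return sorted({name for alias, name in alias_to_name.items() if alias in lowered})


def count_mentions(texts, alias_to_name):
    """Count individuals (nodes) and co-mentioned pairs (links) across Tweets."""
    individuals, pairs = Counter(), Counter()
    for text in texts:
        names = mentioned(text, alias_to_name)
        individuals.update(names)
        pairs.update(combinations(names, 2))  # names are sorted, so pairs are stable
    return individuals, pairs


# Made-up sample Tweets, not real data.
sample_texts = [
    "Hodor! Hold the door, Bran Stark",
    "Khaleesi and Jon Snow finally meet",
    "Jon Snow knows nothing",
    "HODOR",
]

individuals, pairs = count_mentions(sample_texts, build_alias_map(NAME_LINES))
print(individuals.most_common())
print(pairs.most_common())
```

Substring matching is deliberately crude: a short alias such as "Sam" would also match "same", which is the kind of noise a naive first prototype tends to surface.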
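
Slides 116 to 118 note that a convex hull computed from node centers uses x and y only and ignores each node's radius, so the hull can cut through the circles. The slides do not say how this was fixed. One possible workaround, sketched here as an assumption, is to sample points on each circle's perimeter and compute the hull of those points. The hull itself uses Andrew's monotone chain algorithm; all names are illustrative.

```python
import math


def circle_points(cx, cy, r, n=16):
    """Sample n evenly spaced points on the perimeter of a circle."""
    return [
        (cx + r * math.cos(2 * math.pi * i / n), cy + r * math.sin(2 * math.pi * i / n))
        for i in range(n)
    ]


def cross(o, a, b):
    """Z component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def hull_around_circles(circles, n=16):
    """Hull that encloses whole circles given as (cx, cy, r), not just their centers."""
    points = [p for cx, cy, r in circles for p in circle_points(cx, cy, r, n)]
    return convex_hull(points)


# Two made-up nodes with radius 10.
circles = [(0, 0, 10), (100, 0, 10)]
center_hull = convex_hull([(cx, cy) for cx, cy, _ in circles])
padded_hull = hull_around_circles(circles)
print(len(center_hull), len(padded_hull))
```

With only the two centers the hull degenerates to a line segment, while the sampled version extends out to each circle's edge. More sample points per circle give a smoother outline at the cost of a larger hull input.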
