Weaving Stories from Data / Storytelling with Data

A talk given at Chulalongkorn University at an event hosted by the Faculty of Communication Arts and the Faculty of Engineering on Sep 29, 2016.

Published in: Data & Analytics

  1. 1. Krist Wongsuphasawat / @kristw Weaving Stories from Data / STORYTELLING WITH DATA
  2. 2. First, an introduction
  3. 3. Krist Wongsuphasawat / @kristw: Computer Engineer, Chulalongkorn University; PhD in Computer Science (Information Visualization), Univ. of Maryland; IBM; Microsoft; Data Scientist, Twitter
  4. 4. Data
  5. 5. Fishing
  6. 6. 400
  7. 7. Collecting data (Time, Location, Type): 12:00, Paragon, Magikarp; 12:05, Siam Dis, Magikarp; 12:40, CTW, Magikarp; …
  8. 8. Time [chart: number of fish (y-axis) against time of day (x-axis), 00:00 to 00:00, with ticks at 6:00, 12:00 and 18:00]
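
The step from the collected rows on slide 7 to the fish-per-time chart on slide 8 can be sketched in Python as below. This is only an illustration under assumptions: the three rows are the ones shown on the slide, while the function name `count_by_hour` and the hourly bucket size are hypothetical choices, not taken from the talk.

```python
from collections import Counter

# The three sightings shown on slide 7 (time, location, type).
sightings = [
    ("12:00", "Paragon", "Magikarp"),
    ("12:05", "Siam Dis", "Magikarp"),
    ("12:40", "CTW", "Magikarp"),
]


def count_by_hour(rows):
    """Count sightings per hour of day (0-23) from 'HH:MM' timestamps."""
    return Counter(int(time.split(":")[0]) for time, _location, _type in rows)


hourly = count_by_hour(sightings)
for hour in range(24):
    # One '#' per sighting gives a rough text histogram.
    print(f"{hour:02d}:00 {'#' * hourly[hour]}")
```

Bucketing by hour is one possible resolution; a real chart might use finer or coarser bins depending on how many rows were collected.
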
  9. 9. DATA VISUALIZATION: turning data into pictures
  10. 10. History
  11. 11. data
  12. 12. Number of Napoleon's troops, Distance, Temperature, Latitude and Longitude, Direction of travel, Location (relative to specific dates) 2 dimensions 6 types of data
  13. 13. DATA VISUALIZATION. Explanatory: communicate known information. Exploratory: explore data to reveal insights.
  14. 14. Where do the data come from?
  15. 15. DATA SOURCES. Open data: publicly available. Private data: owned by an organization, not available to the public. Self-collected data: manual, site scraping, etc. Combination of the above.
  16. 16. OPEN DATA
  17. 17. OPEN DATA
  18. 18. You can also collect it yourself
  19. 19. Data at Twitter. Tweets: Text, Time, Location, Media. User information: Age, Country, etc. Follows. User interactions: Navigation, Views.
  20. 20. MANY FORMS OF DATA. Standalone files: txt, csv, tsv, json, excel, Google Docs, …, pdf*. APIs: better quality with more overhead. Databases: doesn’t necessarily mean they are organized. Big data: bigger pain.
  21. 21. HAVING ALL TWEETS How people think I feel.
  22. 22. How people think I feel. How I really feel. HAVING ALL TWEETS
  23. 23. CHALLENGES. Get relevant Tweets: hashtag (#oscars), keywords (“goal” for football). Too big: need to aggregate & reduce size. Slow: long processing time (hours).
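
Two of the challenges on slide 23, picking out relevant Tweets and aggregating them to reduce size, can be sketched as follows. This is a minimal sketch under assumptions: the sample records, their field names (`time`, `text`), and the helper names are hypothetical, and simple case-insensitive substring matching stands in for whatever filtering was really used.

```python
from collections import Counter

# Hypothetical sample records; real Tweets would come from an API or archive.
tweets = [
    {"time": "2016-02-28 20:01:10", "text": "Watching the #Oscars tonight"},
    {"time": "2016-02-28 20:01:45", "text": "Best dress at the #oscars so far"},
    {"time": "2016-02-28 20:02:05", "text": "What a GOAL!"},
    {"time": "2016-02-28 20:02:30", "text": "Cooking dinner"},
]


def is_relevant(text, terms):
    """True if any term appears in the text, ignoring case."""
    lowered = text.lower()
    return any(term.lower() in lowered for term in terms)


def tweets_per_minute(records, terms):
    """Keep relevant Tweets, then reduce them to one count per minute."""
    counts = Counter()
    for record in records:
        if is_relevant(record["text"], terms):
            minute = record["time"][:16]  # 'YYYY-MM-DD HH:MM'
            counts[minute] += 1
    return counts


oscars = tweets_per_minute(tweets, ["#oscars"])
goals = tweets_per_minute(tweets, ["goal"])
print(dict(oscars))
print(dict(goals))
```

Substring matching is deliberately crude: "goal" would also match "goalkeeper", which is part of why getting relevant Tweets is listed as a challenge.
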
  24. 24. GETTING BIG DATA. Data storage: Hadoop cluster.
  25. 25. GETTING BIG DATA. Data storage: Hadoop cluster. Tool: Pig / Scalding (slow).
  26. 26. GETTING BIG DATA. Data storage: Hadoop cluster. Tool: Pig / Scalding (slow).
  27. 27. GETTING BIG DATA. Data storage: Hadoop cluster. Tool: Pig / Scalding (slow). Smaller dataset: your laptop.
  28. 28. GETTING BIG DATA. Data storage: Hadoop cluster. Tool: Pig / Scalding (slow). Smaller dataset: your laptop. Tool: node.js / python / excel (fast). Final dataset.
  29. 29. What can we do with the data?
  30. 30. APPLICATIONS OF DATA. Personal analytics: anyone. Product analytics: Product Manager, Engineer. Data journalism: News, Magazine, Company’s Public Relations. …
  31. 31. NEW YORK TIMES GRAPHICS http://www.nytimes.com/interactive/2014/08/13/upshot/where-people-in-each-state-were-born.html?abt=0002&abg=0#New_York
  32. 32. THE GUARDIAN
  33. 33. NEWS New York Times The Guardian Washington Post Wall Street Journal FiveThirtyEight etc.
  34. 34. GOOGLE TRENDS https://www.google.com/trends/story/US_cu_XRyhKlcBAACrtM_en
  35. 35. GOOGLE TRENDS https://www.google.com/trends/story/US_cu_XRyhKlcBAACrtM_en
  36. 36. UBER https://newsroom.uber.com/a-day-in-the-life-of-uber/
  37. 37. Example work
  38. 38. What do people tweet?
  39. 39. The most talked-about Pokémon
  40. 40. When do people tweet?
  41. 41. Tweets per minute
  42. 42. Tweets per minute interactive.twitter.com/euro2016
  43. 43. Where do people tweet?
  44. 44. LOCATION (legend: low density to high density) by Miguel Rios
  45. 45. LOCATION (legend: low density to high density) by Miguel Rios
  46. 46. LOCATION: San Francisco (legend: low density to high density) by Miguel Rios flickr.com/photos/twitteroffice/8798020541
  47. 47. Rebuild the world based on tweet density twitter.github.io/interactive/andes/ by Nicolas Garcia Belmonte
  48. 48. What do people tweet? Where? When?
  49. 49. HAPPY NEW YEAR
  50. 50. New Year 2013 twitter.github.io/interactive/newyear2014/
  51. 51. Where are the USERS?
  52. 52. USER + LOCATION : FAN MAP interactive.twitter.com/nfl_followers2014
  53. 53. USER + LOCATION : FAN MAP interactive.twitter.com/nba_followers
  54. 54. USER + LOCATION : FAN MAP interactive.twitter.com/premierleague
  55. 55. interactive.twitter.com
  56. 56. What are the steps?
  57. 57. Data analysis steps: Collect, Clean, Explore*, Analyze, Present*
  58. 58. Data analysis steps: Collect, Clean, Explore*, Analyze, Present*
  59. 59. CASE STUDY: GAME OF THRONES
  60. 60. Problem is coming. CHAPTER I
  61. 61. “Problem first, not solution backward” — Brian Caffo (via Ron Brookmeyer)
  62. 62. “If all you have is a hammer, everything looks like a nail.” — Abraham Maslow
  63. 63. Problem: Want to know what the audience says about a TV show
  64. 64. Problem: Want to know what the audience says about a TV show from Tweets
  65. 65. HBO’s Game of Thrones Based on a book series “A Song of Ice and Fire” Medieval Fantasy. Knights, magic and dragons.
  66. 66. Brief Story
  67. 67. A King dies. A lot of contenders wage a war to claim the throne.
  68. 68. Minor characters with no claim to the throne set their own plans in motion to gain power while the major characters end up killing each other.
  69. 69. Brave/Honest/Honorable characters die. Intelligent but shady characters and characters who know nothing continue to live.
  70. 70. While humans are busy killing each other, ice zombies (the “White Walkers”) are invading from the North. The only group that seems to care about this is a neutral group called the Night’s Watch.
  71. 71. HBO’s Game of Thrones. Based on a book series, “A Song of Ice and Fire”. Medieval fantasy: knights, magic and dragons. Many characters. Anybody can die. 6 seasons (57 episodes) so far. Multiple storylines in each episode.
  72. 72. Problem: Want to know what the audience says about a TV show from Tweets
  73. 73. Ideas. Common words: too much noise.
  74. 74. Ideas. Common words: too much noise. Characters: how often was each character mentioned?
  75. 75. I demand a trial by prototyping. CHAPTER II
  76. 76. Prototyping. Pull sample data from the Twitter API. Character recognition and counting (naive approach).
  77. 77. Sample Tweet
  78. 78. Sample Tweet
  79. 79. List of names (aliases in parentheses): Daenerys Targaryen (Khaleesi); Jon Snow; Sansa Stark; Tyrion Lannister; Arya Stark; Cersei Lannister; Khal Drogo; Gregor Clegane (Mountain); Margaery Tyrell; Joffrey Baratheon; Bran Stark; Theon Greyjoy; Jaime Lannister; Brienne; Eddard Stark (Ned Stark); Ramsay Bolton; Sandor Clegane (Hound); Ygritte; Stannis Baratheon; Petyr Baelish (Little Finger); Robb Stark; Bronn; Varys; Catelyn Stark; Oberyn Martell; Daario Naharis; Davos Seaworth; Jorah Mormont; Melisandre; Myrcella Baratheon; Tywin Lannister; Tommen Baratheon; Grey Worm; Tyene Sand; Rickon Stark; Missandei; Roose Bolton; Robert Baratheon; Jojen Reed; Jeor Mormont; Tormund Giantsbane; Lysa Arryn; Yara Greyjoy (Asha Greyjoy); Samwell Tarly (Sam); Hodor; Victarion Greyjoy; High Sparrow; Dragon; Winter; Dothraki
  80. 80. Sample data (Character: Count). Hodor: 10000; Jon Snow: 5000; Daenerys: 4000; Bran Stark: 3000; … *These numbers are made up for presentation, not real data.
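
The naive character recognition and counting step from slides 76 to 80 could look roughly like the Python sketch below. It is an illustration only: the talk does not show its code, so the dictionary layout, the function names, and the case-insensitive substring matching are assumptions, and the sample Tweets are made up.

```python
from collections import Counter

# A few entries from the name list on slide 79: canonical name -> aliases.
CHARACTERS = {
    "Daenerys Targaryen": ["Daenerys Targaryen", "Khaleesi"],
    "Jon Snow": ["Jon Snow"],
    "Eddard Stark": ["Eddard Stark", "Ned Stark"],
    "Hodor": ["Hodor"],
}


def find_characters(text, characters=CHARACTERS):
    """Return the set of characters whose name or alias appears in the text."""
    lowered = text.lower()
    return {
        name
        for name, aliases in characters.items()
        if any(alias.lower() in lowered for alias in aliases)
    }


def count_mentions(texts, characters=CHARACTERS):
    """Count, per character, how many texts mention them at least once."""
    counts = Counter()
    for text in texts:
        counts.update(find_characters(text, characters))
    return counts


# Made-up Tweets for illustration only.
sample = [
    "HODOR!!! I can't stop crying",
    "Jon Snow and Khaleesi should meet already",
    "Hold the door... poor Hodor",
]
print(count_mentions(sample).most_common())
```

Matching on full names and a short alias list misses first-name-only mentions such as "Jon", and short aliases such as "Sam" would produce false positives with plain substring search, so a real version would need tighter matching rules.
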
  81. 81. When you play the game of vis, you iterate or you die. CHAPTER III
  82. 82. Where to go from here?
  83. 83. + emotion
  84. 84. + connections
  85. 85. + connections
  86. 86. Gain insights from a single episode emotion & connections
  87. 87. Sample data. INDIVIDUALS (Character: Count, + top emojis): Hodor: 10000; Jon Snow: 5000; Daenerys: 4000; … CONNECTIONS (Character: Count, + top emojis): Jon Snow+Sansa: 1000; Tormund+Brienne: 500; Bran Stark+Hodor: 300; … *These numbers are made up for presentation, not real data.
  88. 88. Graph. NODES (Character: Count, + top emojis): Hodor: 10000; Jon Snow: 5000; Daenerys: 4000; … EDGES (Character: Count, + top emojis): Jon Snow+Sansa: 1000; Tormund+Brienne: 500; Bran Stark+Hodor: 300; … *These numbers are made up for presentation, not real data.
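
Turning per-Tweet character mentions into the nodes and edges of slide 88 can be sketched as below. This is a hedged illustration: the input data, the function names `build_graph` and `filter_edges`, and the idea of a minimum co-mention threshold are assumptions for the example, not the talk's actual implementation.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: the set of characters detected in each Tweet.
mentions_per_tweet = [
    {"Jon Snow", "Sansa Stark"},
    {"Tormund Giantsbane", "Brienne"},
    {"Jon Snow", "Sansa Stark", "Hodor"},
    {"Hodor"},
]


def build_graph(mention_sets):
    """Return (nodes, edges): mention counts and co-mention counts."""
    nodes = Counter()
    edges = Counter()
    for characters in mention_sets:
        nodes.update(characters)
        # Sort so that (A, B) and (B, A) are counted as the same edge.
        for pair in combinations(sorted(characters), 2):
            edges[pair] += 1
    return nodes, edges


def filter_edges(edges, threshold):
    """Keep only edges with at least `threshold` co-mentions."""
    return {pair: n for pair, n in edges.items() if n >= threshold}


nodes, edges = build_graph(mentions_per_tweet)
print(nodes)
print(filter_edges(edges, threshold=2))
```

The resulting node and edge counts are the kind of input a force-directed node-link layout needs; dropping weak edges keeps the diagram readable.
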
  89. 89. Network Visualization: node-link diagram, force-directed layout. http://blockbuilder.org/kristw/762b680690e4b2b2666dfec15838a384
  90. 90. + Collision Detection http://blockbuilder.org/kristw/2850f65d6329c5fef6d5c9118f1de6e6
  91. 91. + Community Detection https://github.com/upphiminn/jLouvain
  92. 92. + Collision Detection (with clusters) https://bl.ocks.org/mbostock/7881887
  93. 93. Let’s get other episodes.
  94. 94. (More) data are coming. CHAPTER IV
  95. 95. More data: 1 episode (1 day) => all episodes (6 years). Rewrite the scripts to get archived data.
  96. 96. How much data do we need? Whole week? 5 days? 2 days? A day? etc.
  97. 97. How much data do we need?
  98. 98. Hold the vis. CHAPTER V
  99. 99. The vis is not enough.
  100. 100. Legend
  101. 101. Navigation
  102. 102. Top 3
  103. 103. Adjust threshold
  104. 104. Recap
  105. 105. Filtered Recap Tooltip
  106. 106. Demo https://interactive.twitter.com/game-of-thrones
  107. 107. Mobile Support
  108. 108. A visualizer always evaluates his work. CHAPTER VI
  109. 109. “Feedback is the breakfast of champions.” — Ken Blanchard
  110. 110. Self & Peer Does it solve the problem?
  111. 111. Tormund + Brienne
  112. 112. Google Analytics: Pageviews, Visitors, Actions, Referrals (Sites/Social)
  113. 113. Feedback
  114. 114. Feedback
  115. 115. Summary. Data are around us and come from many sources. Open data are valuable. Telling stories from data is one possible application: News, Magazine, Company PR. It takes time and iterations with many trials and errors. Start with a problem, collect the data, explore, find a story and present it. Krist Wongsuphasawat / @kristw kristw.yellowpigz.com
  116. 116. The Reading Room 2 Silom Soi 19, Bangkok, Thailand 10500
  117. 117. Thank you
