Krist Wongsuphasawat / @kristw
Weaving Stories from Data
STORYTELLING WITH DATA
First, an introduction
Computer Engineer
Chulalongkorn University
PhD in Computer Science
Information Visualization
Univ. of Maryland
IBM
Microsoft
Data Scientist
Twitter
Data
Fishing
400
Collect data
Time Location Type
12:00 Paragon Magikarp
12:05 Siam Dis Magikarp
12:40 CTW Magikarp
… … …
[Chart: number of fish by time of day; x-axis: time (00:00, 06:00, 12:00, 18:00, 00:00), y-axis: number of fish]
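A minimal sketch of how the collected rows could be turned into the fish-per-hour counts behind such a chart. It uses the three rows shown on the slide; the function name `count_by_hour` is hypothetical and not from the talk.

```python
from collections import Counter

# Rows in the slide's (time, location, type) shape.
sightings = [
    ("12:00", "Paragon", "Magikarp"),
    ("12:05", "Siam Dis", "Magikarp"),
    ("12:40", "CTW", "Magikarp"),
]

def count_by_hour(rows):
    """Count rows per hour of day, taking the hour from an 'HH:MM' string."""
    return Counter(int(time.split(":")[0]) for time, _location, _type in rows)

hourly = count_by_hour(sightings)
print(hourly[12])  # number of fish logged between 12:00 and 12:59
```

Hours with no sightings simply report zero, which keeps the time axis complete when plotting.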
DATA VISUALIZATION
Turning data into pictures
History
data
Number of Napoleon's troops,
Distance, Temperature,
Latitude and Longitude,
Direction of travel,
Location (relative to specific dates)
2 dimensions
6 types of data
DATA VISUALIZATION
Explanatory
Communicate known information
Exploratory
Explore data to reveal insights
Where does data come from?
DATA SOURCES
Open data
Publicly available
Private data
Owned by an organization, not available to the public
Self-collected data
Manual, site scraping, etc.
Combination of the above
OPEN DATA
You can also collect it yourself
Data at Twitter
Tweets
Text, Time, Location, Media
User information
Age, Country, etc.
Follows
User interactions
Navigation, Views
MANY FORMS OF DATA
Standalone files
txt, csv, tsv, json, excel, Google Docs, …, pdf*
APIs
better quality with more overhead
Databases
doesn’t necessarily mean the data are organized
Big data
bigger pain
HAVING ALL TWEETS
How people think I feel. How I really feel.
CHALLENGES
Get relevant Tweets
hashtag: #oscars
keywords: “goal” (football)
Too big
Need to aggregate & reduce size
Slow
Long processing time (hours)
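A rough sketch of the "get relevant Tweets" step, assuming relevance is decided by a hashtag match or a whole-word keyword match. The function `is_relevant` and the sample texts are hypothetical; the hashtag and keyword are the ones on the slide.

```python
import re

def is_relevant(text, hashtags=(), keywords=()):
    """Return True if the text contains any given hashtag or whole-word keyword."""
    lowered = text.lower()
    tags = set(re.findall(r"#(\w+)", lowered))
    if any(tag.lower().lstrip("#") in tags for tag in hashtags):
        return True
    words = set(re.findall(r"\w+", lowered))
    return any(keyword.lower() in words for keyword in keywords)

# Made-up tweets for illustration only.
tweets = [
    "What a night at the #Oscars!",
    "GOAL! What a strike",
    "My goalkeeper gloves arrived",
    "Lunch was great",
]
relevant = [t for t in tweets if is_relevant(t, hashtags=["#oscars"], keywords=["goal"])]
print(relevant)
```

Whole-word matching keeps "goalkeeper" out of the "goal" results, but a keyword alone still cannot tell football goals from any other kind, which is part of why this step is listed as a challenge.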
GETTING BIG DATA
Hadoop Cluster
Data Storage => Tool: Pig / Scalding (slow) => Smaller dataset
Your laptop
Smaller dataset => Tool: node.js / python / excel (fast) => Final dataset
What can you do with data?
APPLICATIONS OF DATA
Personal analytics
Anyone
Product analytics
Product Manager, Engineer
Data Journalism
News, magazines, corporate public relations
…
NEW YORK TIMES GRAPHICS
http://www.nytimes.com/interactive/2014/08/13/upshot/where-people-in-each-state-were-born.html?abt=0002&abg=0#New_York
THE GUARDIAN
NEWS
New York Times
The Guardian
Washington Post
Wall Street Journal
FiveThirtyEight
etc.
GOOGLE TRENDS
https://www.google.com/trends/story/US_cu_XRyhKlcBAACrtM_en
UBER
https://newsroom.uber.com/a-day-in-the-life-of-uber/
Example work
What do people tweet about?
Most-mentioned Pokémon
When do people tweet?
Tweets per minute
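Tweets per minute can be computed by truncating each timestamp to the minute and counting. The sketch below assumes timestamps arrive as `YYYY-MM-DD HH:MM:SS` strings; both that format and the sample values are assumptions for illustration.

```python
from collections import Counter
from datetime import datetime

def tweets_per_minute(timestamps):
    """Count timestamps per minute by dropping the seconds."""
    counts = Counter()
    for ts in timestamps:
        minute = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").replace(second=0)
        counts[minute.strftime("%Y-%m-%d %H:%M")] += 1
    # Sort by minute so the result is ready to plot as a time series.
    return dict(sorted(counts.items()))

# Made-up timestamps for illustration only.
sample = ["2016-07-10 21:00:05", "2016-07-10 21:00:40", "2016-07-10 21:01:10"]
per_minute = tweets_per_minute(sample)
print(per_minute)
```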
interactive.twitter.com/euro2016
Where do people tweet?
LOCATION
Low density
High density
by Miguel Rios
LOCATION
flickr.com/photos/twitteroffice/8798020541
San Francisco
Rebuild the world based on tweet density
twitter.github.io/interactive/andes/
by Nicolas Garcia Belmonte
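Tweet density can be approximated by binning coordinates into a grid and counting points per cell. This is only a sketch of that general idea, not the method used in the linked visualizations; the cell size and the sample coordinates are assumptions.

```python
import math
from collections import Counter

def density_grid(points, cell_size=1.0):
    """Count (lat, lon) points per square grid cell of cell_size degrees."""
    grid = Counter()
    for lat, lon in points:
        cell = (math.floor(lat / cell_size), math.floor(lon / cell_size))
        grid[cell] += 1
    return grid

# Made-up points: two near Bangkok, one near San Francisco.
points = [(13.75, 100.50), (13.72, 100.53), (37.77, -122.42)]
grid = density_grid(points)
print(grid.most_common(1))  # the densest cell and its count
```

Each cell's count can then drive a color scale from low to high density, or a height value in a terrain-like rendering.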
What do people tweet? Where? When?
HAPPY NEW YEAR
สวัสดีปีใหม่ ("Happy New Year" in Thai)
New Year 2013
twitter.github.io/interactive/newyear2014/
Where are the users?
USER + LOCATION : FAN MAP
interactive.twitter.com/nfl_followers2014
USER + LOCATION : FAN MAP
interactive.twitter.com/nba_followers
USER + LOCATION : FAN MAP
interactive.twitter.com/premierleague
interactive.twitter.com
What are the steps?
Data analysis steps
Collect
Clean
Explore*
Analyze
Present*
CASE STUDY:
GAME OF THRONES
Problem is coming.
CHAPTER I
“Problem first, not solution backward”
— Brian Caffo (via Ron Brookmeyer)
“If all you have is a hammer,
everything looks like a nail.”
— Abraham Maslow
Problem
Want to know what the audience
says about a TV show
from Tweets
HBO’s Game of Thrones
Based on the book series “A Song of Ice and Fire”
Medieval Fantasy. Knights, magic and dragons.
Brief Story
A King dies. 
A lot of contenders wage a war
to reclaim the throne.
Minor characters with no claim to the throne
set their own plans in action to gain power
when all the major characters end up killing each other.
Brave/Honest/Honorable characters die.
Intelligent but shady characters
and characters who know nothing
continue to live.
While humans are busy killing each other,
ice zombies, the “White Walkers”, are invading from the North.
The only group that seems to care about this
is a neutral group called the Night’s Watch.
HBO’s Game of Thrones
Many characters.
Anybody can die.
6 seasons (57 episodes) so far
Multiple storylines in each episode
Problem
Want to know what the audience
says about a TV show
from Tweets
Ideas
Common words
Too much noise
Characters
How often was each character mentioned?
I demand a trial by prototyping.
CHAPTER II
Prototyping
Pull sample data
from Twitter API
Character recognition and counting
naive approach
Sample Tweet
List of names
Daenerys Targaryen,Khaleesi
Jon Snow
Sansa Stark
Tyrion Lannister
Arya Stark
Cersei Lannister
Khal Drogo
Gregor Clegane,Mountain
Margaery Tyrell
Joffrey Baratheon
Bran Stark
Theon Greyjoy
Jaime Lannister
Brienne
Eddard Stark,Ned Stark
Ramsay Bolton
Sandor Clegane,Hound
Ygritte
Stannis Baratheon
Petyr Baelish,Little Finger
Robb Stark
Bronn
Varys
Catelyn Stark
Oberyn Martell
Daario Naharis
Davos Seaworth
Jorah Mormont
Melisandre
Myrcella Baratheon
Tywin Lannister
Tommen Baratheon
Grey Worm
Tyene Sand
Rickon Stark
Missandei
Roose Bolton
Robert Baratheon
Jojen Reed
Jeor Mormont
Tormund Giantsbane
Lysa Arryn
Yara Greyjoy,Asha Greyjoy
Samwell Tarly,Sam
Hodor
Victarion Greyjoy
High Sparrow
Dragon
Winter
Dothraki
Sample data
Character Count
Hodor 10000
Jon Snow 5000
Daenerys 4000
Bran Stark 3000
… …
*These numbers are made up for presentation, not real data.
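The naive approach can be sketched as plain name matching: each line of the name list gives a character followed by comma-separated aliases, and a tweet counts once for every character it mentions. The code below uses four entries from the list; the function names and sample tweets are hypothetical, and counting each character at most once per tweet is an assumption.

```python
import re
from collections import Counter

# A few entries from the name list: first name is canonical, the rest are aliases.
NAME_LIST = """Daenerys Targaryen,Khaleesi
Jon Snow
Eddard Stark,Ned Stark
Hodor"""

def build_patterns(name_list):
    """Map each canonical name to a case-insensitive regex over its aliases."""
    patterns = {}
    for line in name_list.splitlines():
        aliases = [a.strip() for a in line.split(",") if a.strip()]
        alternation = "|".join(re.escape(a) for a in aliases)
        patterns[aliases[0]] = re.compile(r"\b(?:%s)\b" % alternation, re.IGNORECASE)
    return patterns

def count_characters(tweets, patterns):
    """Count tweets mentioning each character (at most once per tweet)."""
    counts = Counter()
    for text in tweets:
        for canonical, pattern in patterns.items():
            if pattern.search(text):
                counts[canonical] += 1
    return counts

# Made-up tweets for illustration only.
tweets = [
    "Hold the door! HODOR!",
    "Khaleesi and Jon Snow should meet",
    "I miss Ned Stark",
    "hodor hodor hodor",
]
counts = count_characters(tweets, build_patterns(NAME_LIST))
print(counts.most_common())
```

Exact matching misses misspellings and nicknames that are not in the list, and generic entries such as "Winter" or "Dragon" will also match unrelated tweets.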
When you play the game of vis,
you iterate or you die.
CHAPTER III
Where to go from here?
+ emotion
+ connections
Gain insights from a single episode
emotion & connections
Sample data
INDIVIDUALS (+ top emojis)
Character Count
Hodor 10000
Jon Snow 5000
Daenerys 4000
… …
CONNECTIONS (+ top emojis)
Pair Count
Jon Snow+Sansa 1000
Tormund+Brienne 500
Bran Stark+Hodor 300
… …
*These numbers are made up for presentation, not real data.
Graph
NODES: individuals (+ top emojis)
EDGES: connections (+ top emojis)
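One way to obtain the nodes and edges is to count how often two characters are mentioned in the same tweet. The slides do not define "connection" precisely, so treating it as co-mention within a tweet is an assumption, and the sample mentions below are made up.

```python
from collections import Counter
from itertools import combinations

def count_connections(mentions_per_tweet):
    """Count how often each pair of characters appears in the same tweet."""
    edges = Counter()
    for mentioned in mentions_per_tweet:
        # Sort so (A, B) and (B, A) are counted as the same undirected edge.
        for pair in combinations(sorted(set(mentioned)), 2):
            edges[pair] += 1
    return edges

# Made-up input: the characters recognized in each tweet.
mentions = [
    ["Jon Snow", "Sansa Stark"],
    ["Sansa Stark", "Jon Snow"],
    ["Tormund Giantsbane", "Brienne"],
    ["Hodor"],
]
edges = count_connections(mentions)
nodes = Counter(name for mentioned in mentions for name in set(mentioned))
print(nodes.most_common(), edges.most_common())
```

Node counts can size the circles and edge counts can set link thickness in the network view.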
Network Visualization
Node-link diagram
Force-directed layout
http://blockbuilder.org/kristw/762b680690e4b2b2666dfec15838a384
+ Collision Detection
http://blockbuilder.org/kristw/2850f65d6329c5fef6d5c9118f1de6e6
+ Community Detection
https://github.com/upphiminn/jLouvain
+ Collision Detection (with clusters)
https://bl.ocks.org/mbostock/7881887
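To show the idea behind a force-directed layout, here is a small stand-alone sketch in which every pair of nodes repels and linked nodes attract. It is a simplified Fruchterman-Reingold-style simulation written for illustration, not the D3 force layout used in the linked examples; all parameter names and defaults are assumptions, and it omits collision and community detection.

```python
import math
import random

def force_layout(nodes, edges, iterations=200, k=1.0, max_step=0.05, seed=0):
    """Return {node: (x, y)} from a tiny force-directed simulation."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for n in nodes}
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair of nodes, weaker with distance.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                force = k * k / dist
                disp[a][0] += dx / dist * force
                disp[a][1] += dy / dist * force
                disp[b][0] -= dx / dist * force
                disp[b][1] -= dy / dist * force
        # Attraction along each edge, stronger with distance.
        for a, b in edges:
            dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 1e-9
            force = dist * dist / k
            disp[a][0] -= dx / dist * force
            disp[a][1] -= dy / dist * force
            disp[b][0] += dx / dist * force
            disp[b][1] += dy / dist * force
        # Move each node along its net force, capped at max_step per iteration.
        for n in nodes:
            length = math.hypot(disp[n][0], disp[n][1]) or 1e-9
            scale = min(length, max_step) / length
            pos[n][0] += disp[n][0] * scale
            pos[n][1] += disp[n][1] * scale
    return {n: tuple(p) for n, p in pos.items()}

nodes = ["Jon Snow", "Sansa Stark", "Tormund Giantsbane", "Brienne"]
edges = [("Jon Snow", "Sansa Stark"), ("Tormund Giantsbane", "Brienne")]
layout = force_layout(nodes, edges)
for name, (x, y) in layout.items():
    print(name, round(x, 2), round(y, 2))
```

Because this sketch has no centering force, unconnected groups keep drifting apart; production layouts usually add one, along with collision handling so that circles do not overlap.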
Let’s get other episodes.
(More) data are coming.
CHAPTER IV
More data
1 episode (1 day) => all episodes (6 years)
Rewrite the scripts
to get archived data
How much data do we need?
Whole week?
5 days?
2 days?
A day?
etc.
Hold the vis.
CHAPTER V
The vis is not enough.
Legend
Navigation
Top 3
Adjust threshold
Recap
Filtered Recap
Tooltip
Demo
https://interactive.twitter.com/game-of-thrones
Mobile Support
A visualizer always evaluates his work.
CHAPTER VI
“Feedback is the breakfast of champions.”
— Ken Blanchard
Self & Peer
Does it solve the problem?
Tormund + Brienne
Google Analytics
Pageviews
Visitors
Actions
Referrals
Sites/Social
Feedback
Summary
Data are around us and come from many sources.
Open data are valuable.
Telling stories with data is one possible application.
News, magazines, company PR.
It takes time and iteration,
with much trial and error.
Start with a problem, collect the data, explore, find a story and present it.
Krist Wongsuphasawat / @kristw
kristw.yellowpigz.com
The Reading Room
2 Silom Soi 19,
Bangkok, Thailand 10500
Thank you
