WHAT TO EXPECT WHEN YOU ARE VISUALIZING
Krist Wongsuphasawat / @kristw
Based on true stories
Forever querying
Never-ending cleaning
Hopelessly prototyping
Last minute coding
and many more…
Computer Engineer
Bangkok, Thailand
PhD in Computer Science
Information Visualization
Univ. of Maryland
IBM
Microsoft
Data Visualization Scientist
Twitter
VISUALIZE DATA
INPUT (DATA) + YOU = OUTPUT (VIS)
EXPECT THE MISMATCHES
INPUT (DATA)
What clients think they have vs. what they usually have
YOU
What clients think you are vs. what they will get
OUTPUT (VIS)
What clients ask for vs. what they really need
I need this. Take this.
I need this. Here you are.
I need this. Take this.
EXPECT THESE TASKS
INPUT (DATA) + YOU = OUTPUT (VIS)
1 Get data & Wrangle
2 Analyze & Visualize
1 GET DATA & WRANGLE
DATA SOURCES
Open data
Publicly available
Internal data
Private, owned by clients’ organization
Self-collected data
Manual, site scraping, etc.
Combine the above
MANY FORMS OF DATA
Standalone files
txt, csv, tsv, json, Google Docs, …, pdf*
APIs
better quality with more overhead
Databases
doesn’t necessarily mean it is organized
Big data
bigger pain
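A minimal sketch of normalizing the standalone file formats above into one list of records, using only Python's standard library. It assumes each file is small enough to load into memory and that its format is known; `load_records` is a hypothetical helper name, and PDF (marked with an asterisk above) would need a separate extraction step that is not shown.

```python
import csv
import io
import json


def load_records(text: str, fmt: str) -> list[dict]:
    """Parse csv, tsv, or json text into a list of dicts (one per row/object)."""
    if fmt == "csv":
        return list(csv.DictReader(io.StringIO(text)))
    if fmt == "tsv":
        return list(csv.DictReader(io.StringIO(text), delimiter="\t"))
    if fmt == "json":
        data = json.loads(text)
        # Accept either a single object or an array of objects.
        return data if isinstance(data, list) else [data]
    raise ValueError(f"unsupported format: {fmt}")


rows = load_records("country,count\nTHA,3\nUSA,5\n", "csv")
print(rows)  # [{'country': 'THA', 'count': '3'}, {'country': 'USA', 'count': '5'}]
```

Note that csv and tsv values come back as strings, so numeric columns still need converting during wrangling.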
HAVING ALL TWEETS
How people think I feel vs. how I really feel.
CHALLENGES
Get relevant Tweets
hashtag: #oscars
keywords: “spotlight” (movie name)
Too big
Need to aggregate & reduce size
Slow
Long processing time (hours)
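The filtering challenge can be sketched as a simple matching rule. This is an illustration under assumptions: the tracked hashtag and keyword sets are taken from the example above, tokenization is a plain regular expression, and at Twitter scale the same logic would run on the cluster rather than on an in-memory list. `is_relevant` is a hypothetical name.

```python
import re

# Tracked terms from the example above (lowercase for case-insensitive matching).
HASHTAGS = {"#oscars"}
KEYWORDS = {"spotlight"}


def is_relevant(text: str) -> bool:
    """True if the tweet contains a tracked hashtag or keyword (case-insensitive)."""
    lowered = text.lower()
    tags = set(re.findall(r"#\w+", lowered))
    words = set(re.findall(r"\w+", lowered))
    return bool(tags & HASHTAGS) or bool(words & KEYWORDS)


tweets = [
    "Watching the #Oscars tonight!",
    "Spotlight wins Best Picture",
    "Lunch time",
]
relevant = [t for t in tweets if is_relevant(t)]
print(relevant)  # ['Watching the #Oscars tonight!', 'Spotlight wins Best Picture']
```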
GETTING BIG DATA
Data storage: Hadoop Cluster
Tool: Pig / Scalding (slow)
-> Smaller dataset, moved to your laptop
Tool: node.js / python / excel (fast)
-> Final dataset
EXPECT TO WAIT FOR (BIG) DATA
DATA WRANGLING
Clean
A clean dataset? Joking, right?
Filter
Less is more
Parse, Format, Correct, etc.
Change country code from 3-letter to 2-letter
Correct time of day based on users’ timezone
etc.
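Two of the parsing tasks above, sketched in Python under stated assumptions: the country table is deliberately partial (a real job would load the full ISO 3166-1 list), and each user is assumed to carry a fixed UTC offset in seconds (`utc_offset_seconds` is a hypothetical field). A fixed offset ignores daylight saving time; a timezone database such as Python's `zoneinfo` is the safer choice when named zones are available.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Partial ISO 3166-1 alpha-3 -> alpha-2 table for illustration only.
ALPHA3_TO_ALPHA2 = {"THA": "TH", "USA": "US", "GBR": "GB", "JPN": "JP"}


def to_alpha2(code: str) -> Optional[str]:
    """Convert a 3-letter country code to its 2-letter form, or None if unknown."""
    return ALPHA3_TO_ALPHA2.get(code.strip().upper())


def local_hour(utc_ts: datetime, utc_offset_seconds: int) -> int:
    """Hour of day in the user's timezone.

    Assumes a fixed UTC offset per user (hypothetical field), so daylight
    saving time is ignored.
    """
    tz = timezone(timedelta(seconds=utc_offset_seconds))
    return utc_ts.astimezone(tz).hour


ts = datetime(2016, 2, 29, 3, 0, tzinfo=timezone.utc)
print(to_alpha2("tha"))           # TH
print(local_hour(ts, 7 * 3600))   # 10 (UTC+7, e.g. Bangkok)
print(local_hour(ts, -8 * 3600))  # 19 on the previous day (UTC-8)
```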
EXPECT A LOT OF TIME WITH DATA WRANGLING
70-80% of the time
“Data Janitor”
RECOMMENDATIONS
Always assume you will have to do it again
document the process, automate
Reusable scripts
break a gigantic do-it-all function into smaller ones
Reusable data
keep for future projects
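One way to follow the reusable-scripts advice is to write each wrangling step as a small function and chain them, so the whole process can be rerun when the data arrives again. A minimal sketch; the step names and record fields are hypothetical.

```python
from typing import Callable, Iterable

Record = dict
Step = Callable[[Iterable[Record]], Iterable[Record]]


def drop_empty_text(rows: Iterable[Record]) -> Iterable[Record]:
    """Filter step: keep only rows whose text is not blank."""
    return (r for r in rows if r.get("text", "").strip())


def normalize_lang(rows: Iterable[Record]) -> Iterable[Record]:
    """Format step: lowercase the language code."""
    return ({**r, "lang": r.get("lang", "").lower()} for r in rows)


def run_pipeline(rows: Iterable[Record], steps: list[Step]) -> list[Record]:
    """Apply each small step in order; the steps can be reused in other projects."""
    for step in steps:
        rows = step(rows)
    return list(rows)


raw = [{"text": "hi", "lang": "EN"}, {"text": "  ", "lang": "TH"}]
print(run_pipeline(raw, [drop_empty_text, normalize_lang]))  # [{'text': 'hi', 'lang': 'en'}]
```

Keeping each step independent also makes it easy to test one step without rerunning the whole job.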
2 ANALYZE & VISUALIZE
EXPECT DIFFERENT REQUIREMENTS
TYPES OF PROJECTS
Explanatory <-> Exploratory
Storytelling: general public; see what data can tell us
Analytics Tools: PMs, data scientists; understand product usage
Inspirations: general public; get inspired
“So many things we could learn from Twitter data. Give us interesting vis about xxxx by Nov 10.”
STORYTELLING : WHAT TO EXPECT
timely
Deadlines are strict, and unexpected events can come up.
wide audience
easy to explain and understand, multi-device support
one-off projects
content screening
STORYTELLING: WHO/WHAT (user/content), WHERE (location), WHEN (time)
TIME : TWEETS/SECOND
by Miguel Rios
TIME : TWEETS/SECOND + ANNOTATION
http://www.flickr.com/photos/twitteroffice/5681263084/
by Miguel Rios
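A tweets-per-second chart starts from a simple count per second. The sketch below also flags unusually busy seconds as candidates for annotation; the threshold rule (more than three times the mean) is an assumption for illustration, not the method used for the chart above.

```python
from collections import Counter


def tweets_per_second(timestamps: list[float]) -> Counter:
    """Count tweets per whole second (timestamps are Unix epoch seconds)."""
    return Counter(int(t) for t in timestamps)


def spikes(counts: Counter, factor: float = 3.0) -> list[int]:
    """Seconds whose count exceeds `factor` times the mean count.

    The mean is taken over seconds that have at least one tweet.
    """
    if not counts:
        return []
    mean = sum(counts.values()) / len(counts)
    return sorted(s for s, c in counts.items() if c > factor * mean)


stamps = [100.2, 100.7, 101.1, 101.5] + [102.0] * 30 + [103.3, 103.9]
counts = tweets_per_second(stamps)
print(counts[102])     # 30
print(spikes(counts))  # [102]
```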
IT DOESN’T HAVE TO BE COMPLEX.
LOCATION
flickr.com/photos/twitteroffice/8798020541
San Francisco
Legend: low density to high density
by Miguel Rios
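A density map like this can be approximated by binning geotagged tweets into a grid and counting per cell. A minimal sketch, assuming points are (latitude, longitude) pairs and a hypothetical cell size of 0.01 degrees; cells are equal in degrees rather than in area, which is a simplification.

```python
import math
from collections import Counter


def density_grid(points: list[tuple[float, float]], cell_deg: float = 0.01) -> Counter:
    """Count geotagged tweets per (lat, lon) grid cell of `cell_deg` degrees."""
    return Counter(
        (math.floor(lat / cell_deg), math.floor(lon / cell_deg)) for lat, lon in points
    )


points = [(37.7749, -122.4194), (37.7751, -122.4190), (37.8044, -122.2712)]
grid = density_grid(points)
print(len(grid), max(grid.values()))  # 2 2
```

Each cell's count can then be mapped to a color scale from low to high density.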
Rebuild the world based on tweet density
twitter.github.io/interactive/andes/
by Nicolas Garcia Belmonte
CONTENT : US ELECTION 2016
CONTENT : #MUSEUMWEEK
TIME + LOCATION : TWEET TIME BY CITY
by Miguel Rios & Jimmy Lin
Legend: daytime, night, late night
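Once each tweet has a local hour (computed from its timezone), the per-city breakdown is a grouped count. The bucket boundaries below are hypothetical; the chart's actual definitions of daytime, night, and late night are not given here.

```python
from collections import Counter, defaultdict


def time_bucket(local_hour: int) -> str:
    """Map a local hour (0-23) to a bucket; the boundaries are hypothetical."""
    if 6 <= local_hour < 18:
        return "daytime"
    if local_hour >= 18:
        return "night"
    return "late night"  # 0:00 to 5:59


def share_by_city(tweets: list[tuple[str, int]]) -> dict[str, dict[str, float]]:
    """For each city, the fraction of its tweets that fall in each bucket."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for city, hour in tweets:
        counts[city][time_bucket(hour)] += 1
    return {
        city: {bucket: n / sum(c.values()) for bucket, n in c.items()}
        for city, c in counts.items()
    }


shares = share_by_city([("Bangkok", 10), ("Bangkok", 23), ("Bangkok", 2), ("Bangkok", 14)])
print(shares["Bangkok"])  # {'daytime': 0.5, 'night': 0.25, 'late night': 0.25}
```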
CONTENT + LOCATION : TWEET MAP
by Robert Harris
Shows the most frequent term
Example: Gmail was down, Jan 24, 2014
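The "most frequent term" label can be sketched as a term count per region. This assumes tweets are already assigned to regions and uses a tiny illustrative stopword list; raw counts favor common words, so a fuller version might weight terms by how unusual they are in each area.

```python
import re
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "is", "and", "to", "of"}  # tiny illustrative list


def top_term_by_region(tweets: list[tuple[str, str]]) -> dict[str, str]:
    """For each region, return its most frequent non-stopword term."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for region, text in tweets:
        for term in re.findall(r"\w+", text.lower()):
            if term not in STOPWORDS:
                counts[region][term] += 1
    return {region: c.most_common(1)[0][0] for region, c in counts.items()}


tweets = [
    ("SF", "gmail down"),
    ("SF", "gmail again"),
    ("NYC", "snow again"),
    ("NYC", "so much snow"),
]
print(top_term_by_region(tweets))  # {'SF': 'gmail', 'NYC': 'snow'}
```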
USER + LOCATION : FAN MAP
interactive.twitter.com/nfl_followers2014
USER + LOCATION : FAN MAP
interactive.twitter.com/nba_followers
USER + LOCATION : FAN MAP
interactive.twitter.com/premierleague
CONTENT + TIME : STREAMGRAPH
CONTENT + TIME : MATCH SUMMARY
Biggest tournament for European soccer clubs
Count Tweets mentioning the teams every minute
Team 1 (Dortmund) vs. Team 2 (Bayern Munich), plotted over time from the beginning to the end of the match
+ goals
+ players
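The per-minute counting of team mentions can be sketched as follows. The keyword lists for each team are hypothetical, matching is a plain substring test, and a Tweet that mentions both teams is counted for both.

```python
from collections import Counter

# Hypothetical keyword lists for each team.
TEAMS = {
    "Dortmund": ["dortmund", "bvb"],
    "Bayern Munich": ["bayern", "fcbayern"],
}


def mentions_per_minute(tweets: list[tuple[int, str]], kickoff: int) -> dict[str, Counter]:
    """Count Tweets mentioning each team in each minute since kickoff.

    Timestamps are Unix epoch seconds; a Tweet naming both teams counts for both.
    """
    series = {team: Counter() for team in TEAMS}
    for ts, text in tweets:
        minute = (ts - kickoff) // 60
        lowered = text.lower()
        for team, keywords in TEAMS.items():
            if any(k in lowered for k in keywords):
                series[team][minute] += 1
    return series


tweets = [(30, "BVB!"), (45, "Go Bayern"), (70, "Dortmund goal"), (75, "Bayern vs Dortmund")]
series = mentions_per_minute(tweets, kickoff=0)
print(series["Dortmund"][1], series["Bayern Munich"][1])  # 2 1
```

Each team's Counter becomes one line in the match summary; goals and players can be layered on as annotations.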
CONTENT + TIME : COMPETITION SUMMARY
Match summaries combined along the bracket: A vs B and C vs D, then A vs C, with C as the winner
uclfinal.twitter.com
CONTENT + TIME + LOCATION : NEW YEAR 2014
twitter.github.io/interactive/newyear2014/
BEHIND THE SCENES
https://interactive.twitter.com/tenyears
Project / Twitter 10 years
REQUEST
EXPECT FUNNY REQUESTS
DESIGN & PROTOTYPE
Engagements
First minute (0-60s), first hour (0-60m), first day (0-24h), first week (0-7d)
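Preparing data for these four views might start from counting how many engagements fall inside each window after a Tweet is posted. A minimal sketch, assuming engagement and creation times are Unix epoch seconds; the window names mirror the slide, and `engagements_in_windows` is a hypothetical helper.

```python
# Window lengths in seconds for the four views.
WINDOWS = {
    "first minute": 60,
    "first hour": 3600,
    "first day": 86400,
    "first week": 7 * 86400,
}


def engagements_in_windows(tweet_created: int, engagement_times: list[int]) -> dict[str, int]:
    """Count engagements that happen within each window after the Tweet was posted."""
    # Ignore any engagement timestamped before the Tweet itself.
    elapsed = [t - tweet_created for t in engagement_times if t >= tweet_created]
    return {name: sum(1 for e in elapsed if e < length) for name, length in WINDOWS.items()}


print(engagements_in_windows(0, [10, 59, 60, 3000, 90000]))
# {'first minute': 2, 'first hour': 4, 'first day': 4, 'first week': 5}
```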
EXPECT REVISIONS
Visualization is an important piece, but not the entire experience.
DON’T FORGET THE BIG PICTURE.
https://interactive.twitter.com/tenyears
Demo / Twitter 10 years
WORKFLOW
Requested / Identify needs
Design & Prototype
Refine
Mobile, Embed
Logging
Release
EXPECT THE UNEXPECTED
WORKFLOW
Requested / Identify needs
Design & Prototype
Refine
Mobile, Embed
Logging
Translations
Release
TYPES OF PROJECTS: ANALYTICS TOOLS
Data sources -> get -> explore -> analyze -> present -> Output
From ad-hoc scripts to tools for exploration
ANALYTICS TOOLS : WHAT TO EXPECT
richer, more features
to support exploration of complex data
more technical audience
product managers, engineers, data scientists
accuracy
designed for dynamic input
long-term projects
USER ACTIVITY LOGS
Users use Twitter.
Engineers write Twitter and instrument it to record log data in Hadoop.
Product managers are curious about how Twitter is used.
WHAT IS BEING LOGGED?
activities, such as:
tweet
tweet from home timeline on twitter.com
tweet from search page on iPhone
sign up
log in
retweet
etc.
ORGANIZE?
LOG EVENT A.K.A. “CLIENT EVENT” [Lee et al. 2012]
client : page : section : component : element : action
e.g., web : home : timeline : tweet_box : button : tweet
Each event has:
1) User ID
2) Timestamp
3) Event name
4) Event detail
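Because the event name is a fixed six-level string, it can be split into named fields. A minimal sketch, assuming fields are separated by `:` and never contain a colon themselves; `ClientEvent` and `parse_event_name` are hypothetical names, not Twitter's internal API.

```python
from dataclasses import dataclass

FIELDS = ("client", "page", "section", "component", "element", "action")


@dataclass(frozen=True)
class ClientEvent:
    client: str
    page: str
    section: str
    component: str
    element: str
    action: str


def parse_event_name(name: str) -> ClientEvent:
    """Split 'client:page:section:component:element:action' into its six parts."""
    parts = [p.strip() for p in name.split(":")]
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} parts, got {len(parts)}: {name!r}")
    return ClientEvent(*parts)


event = parse_event_name("web : home : timeline : tweet_box : button : tweet")
print(event.component, event.action)  # tweet_box tweet
```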
LOG DATA
Users use Twitter. Engineers write and instrument Twitter, which records log data in Hadoop; the log data is bigger than the Tweet data.
Curious product managers ask; engineers and data scientists find, clean, and analyze the log data, and monitor it.
Scribe Radar
Project / Find & Monitor client events
Log data in Hadoop (billions of rows) -> Aggregate -> Client events count
Engineers & Data Scientists find events by searching client : page : section : component : element : action
But which SECTION? COMPONENT? ELEMENT?
Query: web : home : * : * : * : impression
Results:
web : home : home : - : - : impression
web : home : wtf : - : - : impression
Problems:
10,000+ event types, and not everybody knows them (What are all sections under web:home?), so search can be better.
Monitoring means one graph per event, x 10,000.
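The search problem can be sketched as per-field wildcard matching, plus a helper that answers "What are all sections under web:home?" by listing distinct values. This runs over an in-memory list of well-formed event names and is an illustration, not how Scribe Radar is implemented.

```python
def split_event(name: str) -> list[str]:
    """Split an event name on ':' and trim spaces around each field."""
    return [p.strip() for p in name.split(":")]


def matches(event_name: str, query: str) -> bool:
    """True if each query field is '*' or equals the event's field at that level."""
    ev, q = split_event(event_name), split_event(query)
    return len(ev) == len(q) and all(qf in ("*", ef) for qf, ef in zip(q, ev))


def sections_under(events: list[str], client: str, page: str) -> set[str]:
    """Distinct section values logged under a given client and page."""
    return {
        parts[2]
        for parts in map(split_event, events)
        if parts[0] == client and parts[1] == page
    }


events = [
    "web : home : home : - : - : impression",
    "web : home : wtf : - : - : impression",
    "iphone : home : - : - : - : impression",
]
print([e for e in events if matches(e, "web:home:*:*:*:impression")])
print(sorted(sections_under(events, "web", "home")))  # ['home', 'wtf']
```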
GOALS
Search for client events
Explore client event collection
Monitor changes
DESIGN
Engineers & Data Scientists see the client event collection and narrow it down.
Interactions: search box => filter
HOW TO VISUALIZE?
client : page : section : component : element : action
CLIENT EVENT HIERARCHY
Event names form a tree; these two events share the path iphone > home > - and then branch:
iphone:home:-:-:-:impression
iphone:home:-:tweet:tweet:click
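The hierarchy can be represented by rolling every event's count up to each prefix of its name, so a node's value is the total of all events beneath it. A minimal sketch with hypothetical counts; summing descendants into parents is an assumption about how node sizes are computed.

```python
from collections import defaultdict


def build_hierarchy(counts: dict[str, int]) -> dict[tuple[str, ...], int]:
    """Roll each event's count up to every prefix of its name.

    A key such as ('iphone', 'home') holds the total count of all events
    whose names start with iphone:home.
    """
    totals: dict[tuple[str, ...], int] = defaultdict(int)
    for name, n in counts.items():
        parts = tuple(p.strip() for p in name.split(":"))
        for depth in range(1, len(parts) + 1):
            totals[parts[:depth]] += n
    return dict(totals)


counts = {
    "iphone:home:-:-:-:impression": 100,  # hypothetical daily counts
    "iphone:home:-:tweet:tweet:click": 30,
}
tree = build_hierarchy(counts)
print(tree[("iphone", "home")])                # 130
print(tree[("iphone", "home", "-", "tweet")])  # 30
```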
DETECT CHANGES
Compare the hierarchy TODAY to 7 DAYS AGO.
CALCULATE CHANGES
DIFF per node, e.g., +5%, +10%, -5%
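The diff step can be sketched as a percent change per event between the two snapshots. This assumes counts are keyed by event name; an event with no count 7 days ago has no defined percent change and is returned as `None`.

```python
from typing import Optional


def percent_change(today: dict[str, int], week_ago: dict[str, int]) -> dict[str, Optional[float]]:
    """Percent change of each event's count versus 7 days ago.

    Returns None for events that had no count 7 days ago (e.g. new events).
    """
    changes: dict[str, Optional[float]] = {}
    for name in today.keys() | week_ago.keys():
        old, new = week_ago.get(name, 0), today.get(name, 0)
        changes[name] = None if old == 0 else (new - old) * 100 / old
    return changes


changes = percent_change(
    {"a": 105, "b": 110, "c": 95, "d": 7},  # today (hypothetical counts)
    {"a": 100, "b": 100, "c": 100},         # 7 days ago
)
print(changes["a"], changes["b"], changes["c"], changes["d"])  # 5.0 10.0 -5.0 None
```

An event that disappeared entirely shows up as -100%, which is often exactly the kind of change worth flagging.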
DISPLAY CHANGES
Map of the Market [Wattenberg 1999], StemView [Guerra-Gomez et al. 2013]
Demo / Scribe Radar
Twitter for Banana
Flying Sessions
Project / Funnel Analysis
COUNT PAGE VISITS
home page: banana : home : - : - : - : impression
FUNNEL
home page -> profile page
FUNNEL ANALYSIS
home page -> profile page: 1 job, 1 hour
banana : home : - : - : - : impression
banana : profile : - : - : - : impression
Add search page (banana : search : - : - : - : impression): 2 jobs, 2 hours
Specify all funnels manually! n jobs. Time to find a new job.
GOAL
1 job => all funnels, visualized
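Each manually specified funnel amounts to one pass like the following, counting how many sessions reach each step in order. A minimal sketch, assuming sessions are already lists of page names (e.g. `home` for banana : home : - : - : - : impression) and that other pages may occur between funnel steps.

```python
def funnel_conversion(sessions: list[list[str]], steps: list[str]) -> list[int]:
    """Number of sessions reaching each step of the funnel, in order.

    A session reaches step k if steps[0..k] appear in it in that order
    (other pages may occur in between).
    """
    reached = [0] * len(steps)
    for pages in sessions:
        k = 0
        for page in pages:
            if k < len(steps) and page == steps[k]:
                k += 1
        for i in range(k):
            reached[i] += 1
    return reached


sessions = [["home", "profile"], ["home", "search"], ["home", "search", "profile"], ["search"]]
print(funnel_conversion(sessions, ["home", "profile"]))  # [3, 2]
```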
USER SESSIONS
Session#1: Start -> A -> B -> end
Session#2: Start -> A -> B -> end
Session#3: Start -> A -> C -> end
Session#4: Start -> A -> end
AGGREGATE
4 sessions merged into one tree: Start -> A (4); A -> B (2) -> end; A -> C (1) -> end; A -> end (1)
Real data: ~millions of sessions (e.g., 4,000,000) and 10,000+ event types
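The aggregation can be sketched as a prefix tree over session paths, stored as a count per path prefix. One pass over the sessions yields counts for every funnel that starts at the beginning of a session, which is what "1 job => all funnels" asks for. This is a minimal in-memory illustration that reproduces the four-session example, not the millions-scale job.

```python
from collections import Counter


def aggregate_sessions(sessions: list[list[str]]) -> Counter:
    """Merge sessions into a prefix tree stored as {path prefix: session count}.

    Every path starts at 'Start' and finishes at 'end', so each node's count
    is the number of sessions sharing that prefix.
    """
    tree: Counter = Counter()
    for events in sessions:
        path = ("Start",) + tuple(events) + ("end",)
        for depth in range(1, len(path) + 1):
            tree[path[:depth]] += 1
    return tree


sessions = [["A", "B"], ["A", "B"], ["A", "C"], ["A"]]
tree = aggregate_sessions(sessions)
print(tree[("Start", "A")])         # 4
print(tree[("Start", "A", "B")])    # 2
print(tree[("Start", "A", "end")])  # 1
```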
TRY WITH SAMPLE DATA
FAIL…
Keep trying to make it work
EXPECT TRIAL AND ERROR
Read the details in
Krist Wongsuphasawat and Jimmy Lin.
“Using Visualizations to Monitor Changes and Harvest Insights from a Global-Scale Logging Infrastructure at Twitter.”
Proc. IEEE Conference on Visual Analytics Science and Technology (VAST) 2014
HOW TO MAKE IT WORK?
Demo / Flying Sessions
WORKFLOW
Requested / Identify needs
Design & Prototype
Make it work for sample dataset
Refine & Generalize
Productionize
Document & Release
Maintain & Support
Keep it running, handle feature requests & fix bugs
TYPES OF PROJECTS: INSPIRATIONS
https://medium.com/@kristw/designing-the-game-of-tweets-7f87c30dc5a2
Project / Game of Tweets
EXPECT HARDWARE COMPLICATIONS
INPUT (DATA) + YOU = OUTPUT (VIS)
1 Get data & Wrangle
2 Analyze & Visualize
EXPECT TO IMPROVE
HOW TO BE BETTER?
Time is limited.

Grow the team
Expand skills
Improve tooling
Solve a problem once and for all
Automate repetitive tasks
http://twitter.github.io/labella.js
Demo / Labella.js
https://github.com/twitter/d3kit
Demo / d3Kit
http://www.slideshare.net/kristw/d3kit
yeoman.io
Demo / Yeoman
TO SUM UP
INPUT (DATA) + YOU = OUTPUT (VIS)
1 Get data & Wrangle
2 Analyze & Visualize
TYPES OF PROJECTS
Explanatory <-> Exploratory
Storytelling: general public; see what data can tell us
Analytics Tools: PMs, data scientists; understand product usage
Inspirations: general public; get inspired
TAKE-AWAY
Getting data and data wrangling are time-consuming.
Different projects, different requirements
Storytelling, Product insights, Art, etc.
Combine visualization with other skills
HCI, Design, Stats, ML, etc.
Expect the unexpected
Learn and improve
do more with less time
grow the team, expand skills, improve tooling
Krist Wongsuphasawat / @kristw
kristw.yellowpigz.com
ACKNOWLEDGEMENT
Nicolas Garcia Belmonte, Robert Harris, Miguel Rios,
Simon Rogers, Jimmy Lin, Linus Lee, Chuang Liu,
and many colleagues at Twitter.
Lastly, to my wife for taking care of our 3-month-old baby so that I had time to prepare these slides.
RESOURCES
Images
Banana phone http://goo.gl/GmcMPq
Bar chart https://goo.gl/1G1GBg
Boss https://goo.gl/gcY8Kw
Champions League http://goo.gl/DjtNKE
Database http://goo.gl/5N7zZz
Fishing shark http://goo.gl/2fp4zW
Globe visualization http://goo.gl/UiGMMj
Harry Potter http://goo.gl/Q9Cy64
Holding phone http://goo.gl/It2TzH
Kiwi orange http://goo.gl/ejQ73y
Kiwi http://goo.gl/9yk7o5
Library https://goo.gl/HVeE6h
Library earthquake http://goo.gl/rBqBrs
Minion http://goo.gl/I19Ijg
NBA http://goo.gl/p7HBdG
NFL http://goo.gl/feQMZs
Orange & Apple http://goo.gl/NG6RIL
Pile of paper http://goo.gl/mGLQTx
Premier League http://goo.gl/AqIINO
Scrooge McDuck https://goo.gl/aKv8D7
The Sound of Music https://goo.gl/dqHlzj
Trash pile http://goo.gl/OsFfo3
Tyrion http://goo.gl/WaBonl
Watercolor Map by Stamen Design
THANK YOU
