In this talk, I reflect on the tasks commonly involved in crafting visualizations and show examples of different applications of information/data visualization. Along the way, I share my workflow, point out common pitfalls, and provide recommendations.
These slides are from my guest lecture in the InfoVis class at the UC Berkeley iSchool on Apr 11, 2016. Thank you to Prof. Marti Hearst for inviting me.
1.
WHAT TO EXPECT
WHEN YOU ARE
VISUALIZING
Krist Wongsuphasawat / @kristw
Based on true stories
Forever querying
Never-ending cleaning
Hopelessly prototyping
Last minute coding
and many more…
2.
Computer Engineer
Bangkok, Thailand
PhD in Computer Science
Information Visualization
Univ. of Maryland
IBM
Microsoft
Data Visualization Scientist
Twitter
Krist Wongsuphasawat / @kristw
18.
DATA SOURCES
Open data
Publicly available
Internal data
Private, owned by clients’ organization
Self-collected data
Manual, site scraping, etc.
Combine the above
19.
MANY FORMS OF DATA
Standalone files
txt, csv, tsv, json, Google Docs, …, pdf*
APIs
better quality with more overhead
Databases
doesn’t necessarily mean they are organized
Big data
bigger pain
21.
How people think I feel. How I really feel.
HAVING ALL TWEETS
22.
CHALLENGES
Get relevant Tweets
hashtag: #oscars
keywords: “spotlight” (movie name)
Too big
Need to aggregate & reduce size
Slow
Long processing time (hours)
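The two challenges above (picking out relevant Tweets, then shrinking them to something manageable) can be sketched in a few lines. This is a minimal illustration, not the pipeline used at Twitter: the record layout (`text`, `created_at`), the function names, and the sample records are all assumptions made up for the example, and the keyword test is a naive substring match that would also catch unrelated uses of "spotlight".

```python
from collections import Counter
from datetime import datetime

def is_relevant(text, hashtags=("#oscars",), keywords=("spotlight",)):
    """Return True if the text mentions any tracked hashtag or keyword."""
    lowered = text.lower()
    return any(term in lowered for term in hashtags + keywords)

def count_per_minute(tweets):
    """Aggregate relevant tweets into one count per minute to reduce size."""
    counts = Counter()
    for tweet in tweets:
        if is_relevant(tweet["text"]):
            minute = tweet["created_at"].replace(second=0, microsecond=0)
            counts[minute] += 1
    return counts

# Hypothetical sample records, invented for illustration only.
sample = [
    {"text": "Watching Spotlight tonight", "created_at": datetime(2016, 1, 1, 21, 0, 12)},
    {"text": "What a night #Oscars", "created_at": datetime(2016, 1, 1, 21, 0, 48)},
    {"text": "Making dinner", "created_at": datetime(2016, 1, 1, 21, 1, 5)},
]
per_minute = count_per_minute(sample)
```

Aggregating early, before any visualization code runs, is what keeps the later steps fast; the raw records only need to be scanned once.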
29.
DATA WRANGLING
Clean
A clean dataset? Joking, right?
Filter
Less is more
Parse, Format, Correct, etc.
Change country code from 3-letter to 2-letter
Correct time of day based on users’ timezone
etc.
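The two corrections named above can be sketched as small helper functions. This is only an illustration under stated assumptions: the lookup table is deliberately partial, the helper names are hypothetical, and the user's timezone is assumed to be available as a UTC offset in seconds.

```python
from datetime import datetime, timedelta

# Partial ISO 3166-1 alpha-3 to alpha-2 lookup; a real table covers every country.
ISO3_TO_ISO2 = {"USA": "US", "THA": "TH", "GBR": "GB"}

def to_iso2(code3):
    """Convert a 3-letter country code to 2 letters, or None if unknown."""
    return ISO3_TO_ISO2.get(code3.strip().upper())

def local_hour(utc_time, utc_offset_seconds):
    """Return the hour of day in the user's local time.

    utc_time is a naive datetime interpreted as UTC; utc_offset_seconds
    is an assumed per-user field giving the offset from UTC.
    """
    return (utc_time + timedelta(seconds=utc_offset_seconds)).hour

print(to_iso2("tha"))                                       # TH
print(local_hour(datetime(2016, 1, 1, 23, 30), 7 * 3600))   # 6
```

Returning `None` for unknown codes, rather than guessing, makes the uncorrectable rows easy to find and filter out later.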
30.
EXPECT A LOT OF TIME
WITH DATA WRANGLING
70-80% of time
“Data Janitor”
31.
RECOMMENDATIONS
Always assume that you will have to do it again
document the process, automate
Reusable scripts
break a gigantic do-it-all function into smaller ones
Reusable data
keep for future projects
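As one way to picture the "smaller functions" recommendation, the sketch below splits a wrangling job into single-purpose steps and composes them at the end. The column names (`user`, `country`) and every function name are hypothetical, chosen only for this example.

```python
import csv
import io

def parse_rows(csv_text):
    """Step 1: parse raw CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def drop_incomplete(rows, required):
    """Step 2: keep only rows where every required field is non-empty."""
    return [row for row in rows if all(row.get(key) for key in required)]

def normalize_country(rows):
    """Step 3: trim whitespace and upper-case the country column."""
    return [{**row, "country": row["country"].strip().upper()} for row in rows]

def wrangle(csv_text):
    """Compose the small steps; each one can be reused or tested alone."""
    rows = parse_rows(csv_text)
    rows = drop_incomplete(rows, required=("user", "country"))
    return normalize_country(rows)

raw = "user,country\nalice, th \nbob,\n"
cleaned = wrangle(raw)
```

Because each step takes rows in and returns rows out, a step can be swapped or rerun when the same job comes back with a new dataset.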
34.
TYPE OF PROJECTS
Explanatory Exploratory
Storytelling Analytics Tools Inspirations
x x
General Public / PMs, Data Scientists / General Public
Understand
product usage
See what data
can tell us
Get inspired
36.
So many things
we could learn
from Twitter data
38.
STORYTELLING : WHAT TO EXPECT
timely
Deadlines are strict, and events can be unexpected.
wide audience
easy to explain and understand, multi-device support
one-off projects
content screening
39.
WHO/WHAT
STORYTELLING
WHERE WHEN
location time
user/content
87.
TYPE OF PROJECTS
Explanatory Exploratory
Storytelling Analytics Tools Inspirations
x x
General Public / PMs, Data Scientists / General Public
Understand
product usage
See what data
can tell us
Get inspired
90.
Data sources
Output
explore
analyze
present
get
*
*
ad-hoc scripts
tools for exploration
91.
ANALYTICS TOOLS : WHAT TO EXPECT
richer, more features
to support exploration of complex data
more technical audience
product managers, engineers, data scientists
accuracy
designed for dynamic input
long-term projects
117.
client page section component element action
Search
Find
Log data
in Hadoop
Aggregate
web : home : * : * : * : impression
Client events count
Engineers & Data Scientists
118.
client page section component element action
Search
Find
Query
Return
Log data
in Hadoop
Results
web : home : home : - : - : impression
web : home : wtf : - : - : impression
Aggregate
web : home : * : * : * : impression
Client events count
Engineers & Data Scientists
119.
client page section component element action
Search
Find
Query
Return
Log data
in Hadoop
Results
web : home : home : - : - : impression
web : home : wtf : - : - : impression
Aggregate
search can be better
Client events count
Engineers & Data Scientists
120.
client page section component element action
Search
Find
Query
Return
Log data
in Hadoop
Results
web : home : home : - : - : impression
web : home : wtf : - : - : impression
Aggregate
10,000+ event types
search can be better
Client events count
Engineers & Data Scientists
121.
client page section component element action
Search
Find
Query
Return
Log data
in Hadoop
Results
web : home : home : - : - : impression
web : home : wtf : - : - : impression
Aggregate
search can be better
10,000+ event types
not everybody knows
What are all sections under web:home?
Client events count
Engineers & Data Scientists
122.
client page section component element action
Search
Find
Query
Return
Log data
in Hadoop
Results
web : home : home : - : - : impression
Aggregate
one graph / event
10,000+ event types
not everybody knows
What are all sections under web:home?
Client events count
Engineers & Data Scientists
search can be better
123.
client page section component element action
Search
Find
Query
Return
Log data
in Hadoop
Results
web : home : home : - : - : impression
Aggregate
one graph / event
x 10,000
10,000+ event types
not everybody knows
What are all sections under web:home?
Client events count
Engineers & Data Scientists
search can be better
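The search-and-aggregate flow on these slides can be sketched as wildcard matching over the six-part event name. This is an assumption-laden illustration, not the actual tool: the counts are invented, the `click` event is hypothetical, names are written without spaces around the colons, and `matches`, `total_count`, and `sections_under` are names made up for the example.

```python
FIELDS = ("client", "page", "section", "component", "element", "action")

def matches(event_name, pattern):
    """True if every field equals the pattern's field or the pattern has '*'."""
    parts = event_name.split(":")
    wanted = pattern.split(":")
    if not (len(parts) == len(wanted) == len(FIELDS)):
        return False
    return all(w in ("*", p) for w, p in zip(wanted, parts))

def total_count(event_counts, pattern):
    """Aggregate the counts of all events matching a wildcard pattern."""
    return sum(c for name, c in event_counts.items() if matches(name, pattern))

def sections_under(event_counts, client, page):
    """Answer 'what are all sections under client:page?'."""
    return sorted({name.split(":")[2] for name in event_counts
                   if name.split(":")[:2] == [client, page]})

# Made-up counts for illustration only.
counts = {
    "web:home:home:-:-:impression": 120,
    "web:home:wtf:-:-:impression": 30,
    "web:home:home:-:-:click": 7,
}
print(total_count(counts, "web:home:*:*:*:impression"))  # 150
print(sections_under(counts, "web", "home"))             # ['home', 'wtf']
```

With 10,000+ event types, a query like this still requires knowing which names exist, which is the gap the slides point out.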
126.
Client event collection
Engineers & Data Scientists
127.
See
Client event collection
Engineers & Data Scientists
128.
See
Client event collection
Engineers & Data Scientists
narrow down
Interactions
search box => filter
129.
See
HOW TO VISUALIZE?
narrow down
Client event collection
Engineers & Data Scientists
Interactions
search box => filter
130.
See
Client event collection
Engineers & Data Scientists
client : page : section : component : element : action
HOW TO VISUALIZE?
narrow down
Interactions
search box => filter
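One plausible way to prepare the client : page : section : component : element : action names for a hierarchical view is to fold them into a tree with counts rolled up at every level, and to treat the search box as a simple filter over names. The sketch below assumes exactly that; the counts, the `iphone` client, and both function names are invented for illustration and are not taken from the actual tool.

```python
def build_tree(event_counts):
    """Fold colon-separated event names into a nested tree of rolled-up counts."""
    root = {"count": 0, "children": {}}
    for name, count in event_counts.items():
        node = root
        node["count"] += count
        for part in name.split(":"):
            node = node["children"].setdefault(part, {"count": 0, "children": {}})
            node["count"] += count
    return root

def filter_events(event_counts, query):
    """Search box => filter: keep events whose name contains the query."""
    q = query.lower()
    return {name: c for name, c in event_counts.items() if q in name.lower()}

# Made-up counts for illustration only.
counts = {
    "web:home:home:-:-:impression": 120,
    "web:home:wtf:-:-:impression": 30,
    "iphone:home:home:-:-:impression": 50,
}
tree = build_tree(counts)
narrowed = build_tree(filter_events(counts, "wtf"))
```

Rebuilding the tree from the filtered events gives the "narrow down" behavior: the same view, restricted to what the search matched.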
157.
EXPECT TRIAL AND ERROR
Keep trying to make it work
158.
HOW TO MAKE IT WORK?
Read the details in
Krist Wongsuphasawat and Jimmy Lin.
“Using Visualizations to Monitor Changes and Harvest Insights from a Global-Scale Logging Infrastructure at Twitter.”
Proc. IEEE Conference on Visual Analytics Science and Technology (VAST), 2014
160.
WORKFLOW
Requested / Identify needs
Design & Prototype
Make it work for sample dataset
Refine & Generalize
Productionize
Document & Release
Maintain & Support
Keep it running, feature requests & bug fixes
161.
TYPE OF PROJECTS
Explanatory Exploratory
Storytelling Analytics Tools Inspirations
x x
General Public / PMs, Data Scientists / General Public
Understand
product usage
See what data
can tell us
Get inspired
162.
https://medium.com/@kristw/designing-the-game-of-tweets-7f87c30dc5a2
Project / Game of Tweets
172.
INPUT (DATA) + YOU = OUTPUT (VIS)
(1) Get data & Wrangle
(2) Analyze & Visualize
173.
TYPE OF PROJECTS
Explanatory Exploratory
Storytelling Analytics Tools Inspirations
x x
General Public / PMs, Data Scientists / General Public
Understand
product usage
See what data
can tell us
Get inspired
174.
TAKE-AWAY
Getting data and data wrangling are time-consuming.
Different projects, different requirements
Storytelling, Product insights, Art, etc.
Combine visualization with other skills
HCI, Design, Stats, ML, etc.
Expect the unexpected
Learn and improve
do more with less time
grow the team, expand skills, improve tooling
Krist Wongsuphasawat / @kristw
kristw.yellowpigz.com
175.
ACKNOWLEDGEMENT
Nicolas Garcia Belmonte, Robert Harris, Miguel Rios,
Simon Rogers, Jimmy Lin, Linus Lee, Chuang Liu,
and many colleagues at Twitter.
Lastly, to my wife for taking care of our 3-month-old baby, so I had time to prepare these slides.