This document discusses expectations and challenges when visualizing data. The key points are:
1. Expect to find the real need by understanding the audience and goals better than the client. Expect to clean data, which can take a significant amount of time due to multiple sources and formats.
2. Prepare to iterate as the initial visualization may not meet needs or deadlines. Celebrate failures as learning opportunities.
3. Visualization projects include storytelling projects with strict deadlines and analytical tools to support data exploration by technical teams over the long term. The project lifecycle involves identifying needs, prototyping, refining, and maintaining the visualization.
This talk was prepared as a note to my future self for future projects. I reflect on the tasks commonly involved in crafting visualizations, point out common things to expect and pitfalls, and provide recommendations. Along the way I include examples of 3 different applications of information/data visualization and details on how each project was started and developed.
These slides are from my guest lectures at:
(1) InfoVis class at UC Berkeley iSchool on Feb 27, 2017. Thank you Prof. Marti Hearst for the invitation.
(2) DataVis class at GATech on Apr 5, 2017. Thank you Prof. Rahul C. Basole for the invitation.
In this talk, I reflect on the tasks commonly involved in crafting visualizations and show examples of different applications of information/data visualization. Along this ride I will share my workflow, point out the common pitfalls and provide recommendations.
These slides are from my guest lecture in the InfoVis class at UC Berkeley iSchool on Apr 11, 2016. Thank you Prof. Marti Hearst for the invitation.
“Which visualization library should I use?” Typically, making this decision is not about whether one library is “better” than another, but whether the specific library is more suitable for what the developer is trying to achieve. To answer this question thoroughly, we need to better understand the design space of visualization libraries. The talk will give a tour of many kinds of visualization libraries on the web across the design space, while explaining the framework and design philosophy that the audience can learn along the way. The audience will expand their horizons and become more aware of the wide universe of libraries. The next time they come across a new package, they can use this framework as a lens to analyze its offerings and how it is different from or similar to the libraries that they already know.
Encodable: Configurable Grammar for Visualization Components, by Krist Wongsuphasawat
There are so many libraries of visualization components nowadays, and their APIs often differ from one another. Could these components be more similar, both in terms of the APIs and common functionalities? For someone who is developing a new visualization component, what should the API look like? This work drew inspiration from visualization grammar, decoupled the grammar from its rendering engine and adapted it into a configurable grammar for individual components called Encodable. Encodable helps component authors define grammar for their components, and parse encoding specifications from users into utility functions for the implementation.
More Related Content
Similar to 6 things to expect when you are visualizing (2020 Edition)
Slides from the VIS in practice panel "Increasing the Impact of Visualization Research" during IEEE VIS 2017 in Phoenix, AZ. http://www.visinpractice.rwth-aachen.de/panel.html
Reveal the talking points of every episode of Game of Thrones from fans' conv... by Krist Wongsuphasawat
You may not be sure how Lord Varys collects information from his little birds, but in this talk you will hear how we can collect information from our little birds.
@kristw shares a behind-the-scenes view of his latest data visualization project, which shows how each #GameOfThrones episode was discussed on Twitter. Using data visualization, we can extract and reveal the stories of every episode from fans’ Tweets.
https://interactive.twitter.com/game-of-thrones
These slides are from a talk given at Bay Area d3 User Group meetup on June 9, 2016.
http://www.meetup.com/Bay-Area-d3-User-Group/events/231281298
Adventure in Data: A tour of visualization projects at Twitter, by Krist Wongsuphasawat
Guest lecture at Prof. David Gotz's UNC Chapel Hill INLS 690 Visual Analytics class (Given remotely) on Nov 10, 2015.
Many demos can also be accessed from interactive.twitter.com and kristw.yellowpigz.com
d3Kit is a set of tools to speed D3 related project development. It is a lightweight library to help you do the basic groundwork tasks you need when building visualization with d3.
Using Visualizations to Monitor Changes and Harvest Insights from a Global-sc... by Krist Wongsuphasawat
Slides from my talk at the IEEE Conference on Visual Analytics Science and Technology (VAST) 2014 in Paris, France.
ABSTRACT
Logging user activities is essential to data analysis for internet products and services. Twitter has built a unified logging infrastructure that captures user activities across all clients it owns, making it one of the largest datasets in the organization. This paper describes challenges and opportunities in applying information visualization to log analysis at this massive scale, and shows how various visualization techniques can be adapted to help data scientists extract insights. In particular, we focus on two scenarios: (1) monitoring and exploring a large collection of log events, and (2) performing visual funnel analysis on log data with tens of thousands of event types. Two interactive visualizations were developed for these purposes: we discuss design choices and the implementation of these systems, along with case studies of how they are being used in day-to-day operations at Twitter.
Making Sense of Millions of Thoughts: Finding Patterns in the Tweets, by Krist Wongsuphasawat
I gave this presentation at Workshop on Interactive Language Learning, Visualization, and Interfaces / ACL 2014 in Baltimore, MD on June 27, 2014.
http://nlp.stanford.edu/events/illvi2014/index.html
ABSTRACT
Every day on Twitter, millions of thoughts are captured and shared with the world in the form of 140-character messages, or Tweets. There are many things we could learn from these thoughts if we could figure out a way to digest this gigantic dataset. Visualization is one of the many ways to extract information from these Tweets. In this presentation, I will talk about several visualizations based on Tweets, as well as share experiences and challenges from working with Tweet data.
A talk at Data Visualization Summit 2014 in Santa Clara, CA
ABSTRACT: What is the thought process that transforms data into visualizations? In this presentation, I will talk about guidelines that will help you when starting with raw data, walk through standard techniques, and also discuss things to keep in mind when making design decisions.
My talk at the Data Visualization Summit in San Francisco April 11, 2013
http://theinnovationenterprise.com/summits/data-visualization-sf
ABSTRACT: Many aspects of our lives can be captured and described as series of events, or event sequences. These event sequences can be keys to understanding many things: medical services, logistics, sports, user behavior, etc. In this presentation, I will talk about techniques for visualizing event sequences, from simple to advanced, and also show examples that demonstrate the power of visualizations in exploring and understanding event sequences.
Outflow: Exploring Flow, Factors and Outcome of Temporal Event Sequences, by Krist Wongsuphasawat
My presentation at IEEE VisWeek 2012 in Seattle, WA
ABSTRACT
Event sequence data is common in many domains, ranging from electronic medical records (EMRs) to sports events. Moreover, such sequences often result in measurable outcomes (e.g., life or death, win or loss). Collections of event sequences can be aggregated together to form event progression pathways. These pathways can then be connected with outcomes to model how alternative chains of events may lead to different results. This paper describes the Outflow visualization technique, designed to (1) aggregate multiple event sequences, (2) display the aggregate pathways through different event states with timing and cardinality, (3) summarize the pathways’ corresponding outcomes, and (4) allow users to explore external factors that correlate with specific pathway state transitions. Results from a user study with twelve participants show that users were able to learn how to use Outflow easily with limited training and perform a range of tasks both accurately and rapidly.
6. (P.S. These are actually not my robots, but our competitors’.)
Krist Wongsuphasawat / @kristw
Computer Engineer
Bangkok, Thailand
PhD in Computer Science
Information Visualization
Univ. of Maryland
IBM
Microsoft
Data Scientist
Analytics, Experiment
Twitter
Engineering Manager
Data Experience
Airbnb
24. GOALS
Present data
Communicate information effectively
Analyze data
Exploratory data analysis
Tools to analyze data
Reusable tools for exploration
Enjoy
Combination of above
Who are the audience?
What do you want to tell?
What are the questions?
Who will use this?
What would they use this for?
Who are the audience?
35. DATA SOURCES
Open data
Publicly available
Internal data
Private, owned by clients’ organization
Self-collected data
Manual, site scraping, etc.
Combine the above
36. DATA FORMAT
Standalone files
txt, csv, tsv, json, Google Docs, …, pdf*
Databases
doesn’t necessarily mean they are organized
API
better quality with more overhead
Website
Big data*
43. IS THIS CLEAN?
USER  RESTAURANT  RATING
========================
A     MCDONALD’S  3
B     MCDONALDS   3
C     MCDONALD    4
D     MCDONALDS   5
E     IHOP        4
F     SUBWAY      4
How many reviews are there?
Clean.
How many restaurants are there?
Not clean.
McDonald, McDonald’s, McDonalds
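The two questions above can be checked with a few lines of code. The following is a minimal sketch, assuming the six rows from the slide; the `ALIASES` table and the `normalize_name` helper are hypothetical and would in practice be built by hand after inspecting the data.

```python
import re
from collections import Counter

# Rows copied from the slide: (user, restaurant, rating).
reviews = [
    ("A", "MCDONALD’S", 3),
    ("B", "MCDONALDS", 3),
    ("C", "MCDONALD", 4),
    ("D", "MCDONALDS", 5),
    ("E", "IHOP", 4),
    ("F", "SUBWAY", 4),
]

# Hypothetical alias table: spelling variants that punctuation
# stripping alone cannot merge.
ALIASES = {"MCDONALD": "MCDONALDS"}


def normalize_name(name: str) -> str:
    """Uppercase, drop apostrophes and other punctuation, then apply aliases."""
    key = re.sub(r"[^A-Z0-9 ]", "", name.upper()).strip()
    return ALIASES.get(key, key)


# "How many reviews are there?" needs no cleaning at all.
review_count = len(reviews)

# "How many restaurants are there?" depends on the name column being clean.
raw_names = {name for _, name, _ in reviews}
clean_names = {normalize_name(name) for _, name, _ in reviews}
reviews_per_restaurant = Counter(normalize_name(name) for _, name, _ in reviews)

print(review_count, len(raw_names), len(clean_names))
```

Whether a dataset is "clean" therefore depends on the question: the raw names overcount restaurants, while the review count is unaffected.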
51. GETTING BIG DATA
Hadoop Cluster: Data Storage -> Tool: Scalding (slow) -> Smaller dataset
Your laptop: Smaller dataset -> Tool: node.js / python / excel (fast) -> Final dataset
52. CHALLENGES
Slow
Long processing time (hours)
Get relevant Tweets
hashtag: #oscars
keywords: “parasite” (movie name)
Too big
Need to aggregate & reduce size
Harder to spot problems
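As a rough illustration of "get relevant Tweets" and "aggregate & reduce size", here is a small sketch. It is not the Scalding job used in the talk: the matching rules, the record fields (`created_at`, `text`) and the sample Tweets are all assumptions made up for this example.

```python
from collections import Counter
from typing import Iterable

# Filter terms taken from the slide; how they are matched is an assumption.
HASHTAGS = {"#oscars"}
KEYWORDS = {"parasite"}


def is_relevant(text: str) -> bool:
    """True if the text contains a tracked hashtag token or keyword substring."""
    lowered = text.lower()
    tokens = set(lowered.split())
    return bool(tokens & HASHTAGS) or any(k in lowered for k in KEYWORDS)


def count_per_minute(tweets: Iterable[dict]) -> Counter:
    """Reduce raw Tweets to a (minute -> count) table small enough for a laptop."""
    counts = Counter()
    for tweet in tweets:
        if is_relevant(tweet["text"]):
            counts[tweet["created_at"][:16]] += 1  # keep "YYYY-MM-DDTHH:MM"
    return counts


# Made-up sample records, not real data.
sample = [
    {"created_at": "2020-02-09T20:01:05", "text": "Watching the #Oscars tonight"},
    {"created_at": "2020-02-09T20:01:40", "text": "Parasite deserves best picture"},
    {"created_at": "2020-02-09T20:02:10", "text": "What is for dinner?"},
    {"created_at": "2020-02-09T20:02:30", "text": "A parasite infected my plant"},
]
per_minute = count_per_minute(sample)
print(per_minute)
```

The last sample record matches the keyword although it is not about the movie. Once the data is aggregated, such false positives are no longer visible, which is one way problems become harder to spot.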
57. RECOMMENDATIONS
Always think that you will have to do it again
document the process, automation
Reusable scripts
break a gigantic do-it-all function into smaller ones
Reusable data
keep for future project
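One possible reading of "break a gigantic do-it-all function into smaller ones" is sketched below. The step names (`load_rows`, `clean_rows`, `aggregate`) and the CSV columns are hypothetical; the point is only that each step can be tested, rerun, and reused on its own.

```python
import csv
import io
from collections import Counter


def load_rows(csv_text: str) -> list[dict]:
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def clean_rows(rows: list[dict]) -> list[dict]:
    """Trim and title-case names, convert counts to int, drop rows without a name."""
    return [
        {"character": r["character"].strip().title(), "count": int(r["count"])}
        for r in rows
        if r["character"].strip()
    ]


def aggregate(rows: list[dict]) -> Counter:
    """Sum counts per character."""
    totals = Counter()
    for r in rows:
        totals[r["character"]] += r["count"]
    return totals


def run_pipeline(csv_text: str) -> Counter:
    """The former do-it-all function is now just a composition of small steps."""
    return aggregate(clean_rows(load_rows(csv_text)))


# Made-up input for illustration.
RAW = "character,count\njon snow,3\n Jon Snow ,2\nhodor,5\n,1\n"
totals = run_pipeline(RAW)
print(totals)
```

When the data has to be processed again, only the step that changed needs to be touched, and the intermediate outputs can be kept for future projects.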
61. TIPS
Don’t give up.
If stuck, look for inspiration.
The vis that gives you insights may or may not be the best vis for sharing.
Exploration vs. Communication
Keep it as simple as possible
but not simpler.
Set milestones and deadlines.
66. STORYTELLING PROJECTS
timely
The deadline is strict. There can also be unexpected events.
wide audience
easy to explain and understand, multi-device support
one-off project
scope
analyze data to find stories and find the best way to present them
85. While humans are busy killing each other,
ice zombies “White walkers” are invading from the North.
The only group who seems to care about this
is a neutral group called the Night’s Watch.
86. HBO’s Game of Thrones
Based on the book series “A Song of Ice and Fire”
Medieval Fantasy. Knights, magic and dragons.
Many characters.
Anybody can die.
8 seasons
Multiple storylines in each episode
94. Sample data
Character Count
Hodor 10000
Jon Snow 5000
Daenerys 4000
Bran Stark 3000
… …
*These numbers are made up for presentation, not real data.
96. + episodes
The Guardian & Google Trends
http://www.theguardian.com/news/datablog/ng-interactive/2016/apr/22/game-of-thrones-the-most-googled-characters-episode-by-episode
101. Sample data
INDIVIDUALS (+ top emojis)
Character Count
Hodor 10000
Jon Snow 5000
Daenerys 4000
… …
CONNECTIONS (+ top emojis)
Character Count
Jon Snow+Sansa 1000
Tormund+Brienne 500
Bran Stark+Hodor 300
… …
*These numbers are made up for presentation, not real data.
102. Graph
NODES (+ top emojis)
Character Count
Hodor 1000
Jon Snow 500
Daenerys 400
… …
LINKS (+ top emojis)
Character Count
Jon Snow+Sansa 1000
Tormund+Brienne 500
Bran Stark+Hodor 300
… …
*These numbers are made up for presentation, not real data.
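The node and link tables on this slide map naturally onto a graph structure. The following Python sketch shows one possible conversion; the dictionary layout (`id`, `source`, `target`, `count`) is an assumption chosen for illustration, and the counts are the made-up numbers from the slide.

```python
# Made-up counts copied from the slide.
individual_counts = {"Hodor": 1000, "Jon Snow": 500, "Daenerys": 400}
connection_counts = {
    "Jon Snow+Sansa": 1000,
    "Tormund+Brienne": 500,
    "Bran Stark+Hodor": 300,
}

# One node per character that has an individual count.
nodes = {
    name: {"id": name, "count": count}
    for name, count in individual_counts.items()
}

links = []
for pair, count in connection_counts.items():
    source, target = pair.split("+")
    # A character may appear in a connection without an individual count;
    # add it with count 0 so that every link endpoint exists as a node.
    for name in (source, target):
        nodes.setdefault(name, {"id": name, "count": 0})
    links.append({"source": source, "target": target, "count": count})

node_list = list(nodes.values())
print(len(node_list), "nodes,", len(links), "links")
```

Filling in missing endpoints is a design choice; dropping links whose endpoints have no individual count would be an equally valid alternative.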
119. Colors
Default: D3 category10
Distinct but nothing about the context
Custom palette
Colors related to the groups/houses.
Black = Night’s Watch
Blue = North
Red = Daenerys
Gold = Lannister
…
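A custom palette like the one above amounts to a lookup from group to color with a fallback for everything else. The sketch below is in Python for illustration; the hex values, the `FALLBACK_COLOR` and the `color_for` helper are assumptions, not the palette used in the project.

```python
from typing import Optional

# Hex values are illustrative placeholders, not the talk's actual palette.
GROUP_COLORS = {
    "Night's Watch": "#000000",  # black
    "North": "#1f77b4",          # blue
    "Daenerys": "#d62728",       # red
    "Lannister": "#ffd700",      # gold
}
FALLBACK_COLOR = "#999999"  # gray for characters without a known group


def color_for(group: Optional[str]) -> str:
    """Return the color tied to a group, or the fallback if the group is unknown."""
    return GROUP_COLORS.get(group, FALLBACK_COLOR)


print(color_for("North"))
print(color_for("Unknown house"))
```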
135. VISUAL ANALYTICS TOOL PROJECTS
richer, more features
to support exploration of complex data
more technical audience
product managers, engineers, data scientists
accuracy
designed for dynamic input
long-term projects
136. PROJECT LIFECYCLE
Identify needs
Design and prototype
Make it work for sample dataset
Refine, generalize and productionize
Make it work for other cases
Document and release
Maintain and support
Keep it running, handle feature requests and bug fixes
169. “The first 90% of the code
accounts for the first 90% of the development time.
The remaining 10% of the code
accounts for the other 90% of the development time.”
— Tom Cargill, Bell Labs
170. REFINE & POLISH
UX / UI + Mobile Support
Color
Animation / Transition
Metadata for SEO
Social media preview images
Performance
Loading time, Data file size
“The little of visualisation design” by Andy Kirk
http://www.visualisingdata.com/2016/03/little-visualisation-design/
177. THE ORIGIN
From a paper, “Interim pre-pandemic planning guidance: community strategy for pandemic influenza mitigation in the United States: early, targeted, layered use of nonpharmaceutical interventions”,
published in 2007 by the CDC
https://stacks.cdc.gov/view/cdc/11425
178. REVIVAL
Rosamund Pearce, a data journalist at The Economist, rebuilt it for a piece about COVID-19.
Changed the labeling scheme to assist colorblind readers.
https://www.economist.com/briefing/2020/02/29/covid-19-is-now-in-50-countries-and-things-will-get-worse
179. THE LINE
Drew Harris, an assistant professor at Thomas Jefferson University, came across the graphic in The Economist.
He recalled using it a decade earlier as a pandemic preparedness trainer.
So he added the dotted line “healthcare system capacity”
https://www.nytimes.com/article/flatten-curve-coronavirus.html
188. HOW TO BE BETTER?
Retrospective
What could have been better?
Wishlist
Expand skillset
Learning opportunities
Get help
Grow the team
Improve tooling
Solve a problem once and for all
Automate repetitive tasks
200. 6 STEPS
1. Expect to find the real need
2. Expect to clean data a lot
3. Prepare to iterate
4. Reserve time for refinement
5. Plan for feedback
6. Look back for improvement
Krist Wongsuphasawat / @kristw
kristw.yellowpigz.com
201. ACKNOWLEDGEMENT
My former and current colleagues at Twitter and Airbnb
for their collaboration and support in these projects;
and my wife for taking care of our two kids
while I make these slides.