A Methodic Approach to
Good Data Visualization
Luca Candela - @luckymethod
Luca Candela
DataPad Inc. // UX Eye // @luckymethod
Men of great rank, or active business, can only
pay attention to particulars of use […] it is hoped
that with the assistance of these Charts,
information will be got, without the fatigue and
trouble of studying the particulars [...]
William Playfair - Commercial and Political Atlas, 1786
Data visualization is the art of
*reducing information in a data set while
preserving the knowledge contained in it.
*we can talk about what “reducing information” means in this case...
Conceptual data analysis workflow:
Data Preparation -> Data Visualization -> Discovery of knowledge
Hadley Wickham popularized a concept called
split-apply-combine
as a way of thinking about data querying.
http://www.jstatsoft.org/v40/i01/paper
For the four countries that generate the most revenue,
what are the three venue types that generate the
most revenue in each?
Country        Venue Type   Sum Revenue
United States  Fast Food    $16
               Street       $10
               Restaurant   $9
France         Cafe         $18
               Pub          $12
               Restaurant   $2
Canada         Cafe         $10
               Fast Food    $4
               Street       $3
Japan          Street       $5
               Fast Food    $4
               Pub          $1
The basics of split-apply-combine
1. split by country: Canada, United States, Germany, France, Japan
2. apply: Sum Revenue
   Canada         $36
   United States  $83
   Germany        $8
   France         $42
   Japan          $18
3. combine: sort descending by Sum Revenue, limit 4
   Country        Sum Revenue
   United States  $83
   France         $42
   Canada         $36
   Japan          $18
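The three steps can be sketched in plain Python. This is only an illustration: the function names are made up, and the input rows are the per-venue figures from the question slide plus one invented Germany row, so the totals are smaller than the ones on the slide (which come from the full data set).

```python
from collections import defaultdict

# (country, venue type, revenue). Per-venue figures from the question slide;
# the Germany row uses a made-up venue type because the deck only gives its total.
ROWS = [
    ("United States", "Fast Food", 16), ("United States", "Street", 10),
    ("United States", "Restaurant", 9),
    ("France", "Cafe", 18), ("France", "Pub", 12), ("France", "Restaurant", 2),
    ("Canada", "Cafe", 10), ("Canada", "Fast Food", 4), ("Canada", "Street", 3),
    ("Japan", "Street", 5), ("Japan", "Fast Food", 4), ("Japan", "Pub", 1),
    ("Germany", "Park", 8),  # hypothetical venue type
]

def split(rows, key_index):
    """Split: group rows by the value found at key_index."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row)
    return groups

def apply_sum(groups, value_index):
    """Apply: reduce every group to the sum of one of its columns."""
    return {key: sum(r[value_index] for r in rows) for key, rows in groups.items()}

def combine(sums, limit):
    """Combine: sort descending by the summed value and keep the first `limit`."""
    return sorted(sums.items(), key=lambda kv: kv[1], reverse=True)[:limit]

top_countries = combine(apply_sum(split(ROWS, 0), 2), limit=4)
for country, revenue in top_countries:
    print(f"{country}: ${revenue}")
```

Germany has the smallest total, so the limit of 4 drops it, as on the slide.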
The basics of split-apply-combine
The data is split by country (Canada, United States, Germany, France, Japan),
and each country's rows are split again by venue type
(bus stop, fastfood, park, restaurant, hair salon, pub, street, cafe, ...).
Country        Venue type   Sum Revenue
United States  fastfood     $16
               street       $10
               restaurant   $9
France         cafe         $18
               pub          $12
               restaurant   $2
Canada         cafe         $10
               fastfood     $4
               park         $3
Japan          street       $5
               fastfood     $4
               pub          $1
...
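Nesting one split-apply-combine inside another answers the original question (top three venue types within the top four countries). The sketch below assumes the same per-venue figures as the question slide; the Germany row and the extra United States row are invented so that both limits have something to drop, and all function names are hypothetical.

```python
from collections import defaultdict

ROWS = [
    {"country": "United States", "venue": "Fast Food", "revenue": 16},
    {"country": "United States", "venue": "Street", "revenue": 10},
    {"country": "United States", "venue": "Restaurant", "revenue": 9},
    {"country": "United States", "venue": "Park", "revenue": 2},  # hypothetical
    {"country": "France", "venue": "Cafe", "revenue": 18},
    {"country": "France", "venue": "Pub", "revenue": 12},
    {"country": "France", "venue": "Restaurant", "revenue": 2},
    {"country": "Canada", "venue": "Cafe", "revenue": 10},
    {"country": "Canada", "venue": "Fast Food", "revenue": 4},
    {"country": "Canada", "venue": "Street", "revenue": 3},
    {"country": "Japan", "venue": "Street", "revenue": 5},
    {"country": "Japan", "venue": "Fast Food", "revenue": 4},
    {"country": "Japan", "venue": "Pub", "revenue": 1},
    {"country": "Germany", "venue": "Park", "revenue": 8},  # hypothetical
]

def sum_by(rows, key):
    """Split on `key` and apply a sum over revenue in one pass."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row["revenue"]
    return totals

def top(totals, n):
    """Combine: sort descending by total and keep the first n entries."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]

def top_venues_in_top_countries(rows, n_countries=4, n_venues=3):
    result = {}
    # Outer pass: pick the countries. Inner pass: rank venues within each one.
    for country, _ in top(sum_by(rows, "country"), n_countries):
        country_rows = [r for r in rows if r["country"] == country]
        result[country] = top(sum_by(country_rows, "venue"), n_venues)
    return result

result = top_venues_in_top_countries(ROWS)
for country, venues in result.items():
    print(country, venues)
```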
Split-apply-combine thinking translates to visualizations
Rows (Country: United States, France, Canada, Japan):
  split by country,
  combine by sorting desc. on Sum Revenue,
  map to the vertical axis using an ordinal scale,
  add labels (use `Country` as label).
Bars (Sum Revenue):
  apply: sum revenue, call it Sum Revenue,
  plot rectangles and map length to the horizontal axis using a linear scale,
  color with #45808E.
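The mapping described on this slide can be imitated with a text-only bar chart. This is a rough sketch, not the tool used in the talk: the totals are the ones from the combined table earlier in the deck, `linear_scale` and `bar_chart` are made-up names, and the fill color is left out because plain text has none.

```python
# Country totals from the combined table shown earlier in the deck.
TOTALS = {"United States": 83, "France": 42, "Canada": 36, "Japan": 18}

def linear_scale(domain_max, range_max):
    """Return a function that maps 0..domain_max onto 0..range_max characters."""
    def scale(value):
        return round(value / domain_max * range_max)
    return scale

def bar_chart(totals, width=40):
    # Combine: sort descending on the summed value.
    ordered = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    scale = linear_scale(ordered[0][1], width)
    label_width = max(len(name) for name in totals)
    lines = []
    # Ordinal scale: the rank of a country decides which row it is drawn on.
    for name, value in ordered:
        bar = "#" * scale(value)  # linear scale: revenue -> bar length
        lines.append(f"{name:<{label_width}} | {bar} ${value}")
    return lines

chart = bar_chart(TOTALS)
print("\n".join(chart))
```

Only two mappings are used here: country to vertical position and revenue to bar length.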
Nested split-apply-combine underpins more complex visualizations
1. split on state
   apply: sum population
   combine: sort desc. by population; limit 6
2. split on age (bin by 5 years)
   apply: sum population
   combine: sort by age
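A minimal sketch of the two nested passes, assuming a table of (state, age, population) records. The records, state codes and function names below are invented for illustration; the slide does not show its data.

```python
from collections import defaultdict

# Hypothetical records: (state, age in years, population).
RECORDS = [
    ("CA", 3, 100), ("CA", 7, 120), ("CA", 12, 90),
    ("TX", 4, 80), ("TX", 8, 70),
    ("NY", 2, 60), ("NY", 11, 50),
]

def age_bin(age, width=5):
    """Return the inclusive (low, high) age bin that contains `age`."""
    low = age // width * width
    return (low, low + width - 1)

def top_states(records, limit):
    """Pass 1: split on state, sum population, sort descending, limit."""
    totals = defaultdict(int)
    for state, _, population in records:
        totals[state] += population
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:limit]

def population_by_age(records, state):
    """Pass 2: within one state, split on age bin, sum population, sort by age."""
    totals = defaultdict(int)
    for rec_state, age, population in records:
        if rec_state == state:
            totals[age_bin(age)] += population
    return sorted(totals.items())

states = top_states(RECORDS, limit=2)
pyramids = {state: population_by_age(RECORDS, state) for state, _ in states}
print(states)
print(pyramids)
```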
Data visualization can be thought of as a
visual mapping function applied
during the *Apply and Combine steps.
*although it can also be thought of as applied exclusively during the Combine step…
Name    Operation   Lines
Vadim   Added       100
Luca    Removed     34
Vadim   Added       65
Vadim   Removed     5
Luca    Added       24
Vadim   Removed     71
Luca    Removed     45
Vadim   Added       7
...     ...         ...
“plot”
[Plot: Additions and Deletions per person (Luca, Vadim); bar labels 1531, 739, -321, -960; vertical axis from -2k to 2k]
Reduce information, preserve knowledge...
Question: Mapping of what, to what?
Types of data
ID       Timestamp            Location       Name   Operation  Lines  Pass Test?
0000001  11-05-2013 10.45 am  San Francisco  Vadim  Added      100    Yes
0000002  11-05-2013 11.12 am  San Bruno      Luca   Removed    34     Yes
0000003  11-05-2013 11.30 am  San Francisco  Vadim  Added      65     Yes
0000004  11-05-2013 11.34 am  San Francisco  Vadim  Removed    5      Yes
0000005  11-05-2013 11.43 am  San Bruno      Luca   Added      24     No
0000006  11-05-2013 11.45 am  San Francisco  Vadim  Removed    71     Yes
0000007  11-05-2013 12.51 pm  San Francisco  Luca   Removed    45     Yes
0000008  11-05-2013 12.55 pm  San Francisco  Vadim  Added      7      No
...      ...                  ...            ...    ...        ...    ...
Data types annotated on the columns: Categorical, # Discrete, # Continuous, Boolean
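One way to make the four types concrete is to classify a record's fields by their Python type. This is a loose sketch under stated assumptions: the rule that integers are discrete, floats are continuous and strings are categorical is a simplification chosen for illustration, not the classification drawn on the slide, and `classify` is a made-up helper.

```python
def classify(value):
    """Guess the data type of a single value (simplified, assumed rules)."""
    if isinstance(value, bool):   # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "discrete"
    if isinstance(value, float):
        return "continuous"
    return "categorical"

# First row of the table, with "Yes" stored as True.
row = {
    "ID": "0000001",
    "Location": "San Francisco",
    "Name": "Vadim",
    "Operation": "Added",
    "Lines": 100,
    "Pass Test?": True,
}

types = {column: classify(value) for column, value in row.items()}
for column, kind in types.items():
    print(f"{column}: {kind}")
```

The order of the checks matters: testing for `int` before `bool` would label the Pass Test? column as discrete.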
There are other ways to classify data,
but this one will get you very far.
pick up a good statistics book and just start reading...
Types of variables
1. Independent: a variable that isn't changed by the other
   variables you are trying to measure. It
   usually goes on the x axis.
2. Dependent: a variable that changes depending on
   other variable(s). It usually goes on the y
   axis.
[Plot: Additions and Deletions per person (Luca, Vadim), annotated: the person on the x axis is the Independent Variable, the line count on the y axis is the Dependent Variable]
Variables of a visualization
1. Position (x,y)
2. Size (big, small…)
3. Value (bright, dark…)
4. Texture (hatched, dotted…)
5. Color (blue, red…)
6. Orientation (degree)
7. Shape (triangle, circle…)
Optimal mappings by type
[Figure: four x-y plots, one per data type: # Discrete, # Continuous, Categorical, Boolean]
[Plot: Added and Removed lines per person (Luca, Vadim), built from the Name / Operation / Lines table; bar labels 1531, 739, -321, -960; vertical axis from -2k to 2k]
Split on Name
Split on Operation
Apply Sum(Added)
Apply Sum(Removed)
Combine: map -Removed to Red, value to size
Combine: map Added to Green, value to size
Combine: map Name to x axis
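The steps listed for this chart can be sketched as follows, using only the eight rows visible in the table. Because the table is truncated with "...", the totals here are much smaller than the values on the plot. The `summarize` and `to_marks` helpers and the dictionary layout of a mark are assumptions made for the example.

```python
from collections import defaultdict

# The eight rows shown in the Name / Operation / Lines table.
LOG = [
    ("Vadim", "Added", 100), ("Luca", "Removed", 34),
    ("Vadim", "Added", 65), ("Vadim", "Removed", 5),
    ("Luca", "Added", 24), ("Vadim", "Removed", 71),
    ("Luca", "Removed", 45), ("Vadim", "Added", 7),
]

def summarize(log):
    """Split on Name and Operation, then apply Sum(Lines) to each group."""
    totals = defaultdict(lambda: {"Added": 0, "Removed": 0})
    for name, operation, lines in log:
        totals[name][operation] += lines
    return dict(totals)

def to_marks(totals):
    """Combine: turn each total into a mark with an x position, color and size."""
    marks = []
    for name in sorted(totals):
        # Added lines become a green bar above the axis.
        marks.append({"x": name, "color": "green", "size": totals[name]["Added"]})
        # Removed lines are negated so the red bar points below the axis.
        marks.append({"x": name, "color": "red", "size": -totals[name]["Removed"]})
    return marks

totals = summarize(LOG)
marks = to_marks(totals)
for mark in marks:
    print(mark)
```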
Apply the minimum number of mappings
that illustrates the underlying question
you are trying to answer.
Choosing the right viz...
Golden rules - Part 1
1. Label your axes
2. Include measurement units
3. Explain your encodings (add a legend)
4. Remove redundant information
5. Don’t distort the axis, especially with time series
Golden rules - Part 2
1. If you are trying to visualize a rate of change, plot the rate of change itself
2. Remove outliers, but know they are there
3. Tools have their own biases and quirks, know them.
4. Bar charts and histograms solve 80% of your problems
5. Data Tables are visualizations too
...there are thousands of good rules, but the best one is still “keep it simple”
Some examples
this is going to be fun...
Example 1
Simple bar chart, linear scale
Missing bucket (4.8 - 4.9)
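A bucket can go missing when a chart is built only from the values that occur, so an empty range is never drawn. The sketch below shows one way to keep empty buckets, assuming that is the problem this example points at. The measurements are invented and stored as integer hundredths (461 means 4.61) to avoid floating-point bucket edges.

```python
def histogram(values, low, high, width):
    """Count integer values per bucket, creating every bucket up front."""
    counts = {start: 0 for start in range(low, high, width)}
    for value in values:
        if low <= value < high:
            bucket = (value - low) // width * width + low
            counts[bucket] += 1
    return counts

# Hypothetical measurements in hundredths; nothing falls between 4.80 and 4.89.
VALUES = [461, 465, 472, 478, 479, 495, 499]

counts = histogram(VALUES, low=460, high=500, width=10)
for start, count in counts.items():
    print(f"{start / 100:.1f} - {(start + 10) / 100:.1f} | {'#' * count}")
```

The 4.8 - 4.9 row is printed with an empty bar, so the gap in the data stays visible.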
Example 2
Example 2 - better
No - Human
Yes - Robot
Example 3
Example 4
Example 5
OK, this is comically bad, I was just going for a good collective giggle...
Books you should read
everybody knows about Tufte, so please don’t bring it up
The Semiology of Graphics, 1967
Jacques Bertin
The Elements of Graphing Data, 1985
&
Visualizing Data, 1993
William S. Cleveland
www.datapad.io
Thank you!
for questions, tweet me at @luckymethod

Visualize data using the split-apply-combine approach
