SlideShare a Scribd company logo
Visualizing and
Communicating
High-dimensional Data
Dr. Stefan Kühn
Lead Data Scientist
Data Natives Berlin - 26.10.2016
Data Visualization Basics
2
The Modes of Perception
3
X
X
X
X
X
X
X
O
O
O O
O
O
O
O
X
X
X
X
O
O
O
O
X
X
X
X
X
X
X
X
O
O
O O
O
O
O
O
X
X X
X
X
O
O
O
Fast Slow
Find the outlier
The Modes of Perception
4
• Pre-attentive
• fast
• parallel processing
• effortless
• Pattern recognition
• semi-fast
• governed by laws of Gestalt
• Attentive
• slow
• sequential
• high effort (attention is a very limited resource)
Main Properties of Graphics
5
Category Example
Position
Shape
Size
Color
Orientation (Line)
Length (Line)
Type and Size (Line)
Brightness
Main Properties of Graphics Humans
6
Category Amount of pre-attentive information
Position very high
Shape ———
Size approx. 4
Color approx. 8
Orientation (Line) approx. 4
Length (Line) ———
Type and Size (Line) ———
Brightness approx. 8
Pre-attentive perception
7
• Position
• fast
• effective
• high number of different positions
• Color
• use with care
• Shape
• Orientation
Pre-attentive perception is effortless.
Exploit this as much as you can.
Pattern detection
8
„It is interesting to note that our brain […]
subconsciously always prefers meaningful
situations and objects.“
• Emergence
• Reiification
• Multi-stability
• Invariance
Pattern detection can be trained.
Exploit this for frequent visualizations.
What is this?
9
What is this?
10
What is this?
11
Laws of Gestalt
12
„It is interesting to note that our brain, in
accordance with the laws of Gestalt,
subconsciously always prefers meaningful
situations and objects.“
Accuracy of Graphics
13
Square Pie vs Stacked Bar vs Pie vs Donut
What do you think?
https://eagereyes.org/blog/2016/a-reanalysis-of-a-study-about-square-pie-charts-from-2009
Why do we visualize data?
14
• Explore
• use different techniques
• avoid „construction“ bias
• be careful with „aesthetics“
• challenge findings -> use attentive mode
• Explain
• focus on message or „story“
• use pre-attentive mode
Don’t trust graphics that you have not falsified yourself.
Beyond the Basics
15
Pair Plots
16
Correlation Plots
17
Fundamental Problems
• No accurate method in higher dimensions
• Approximation methods
• „Simulated“ dimensions (color, size, shape)
• Animations?
• No notion of quality or accuracy for
Visualizations
• Information Theory?
• „Stability“?
All Visualizations are wrong, but some are useful.
18
Approximation methods
• Pair Plots
• Axis-aligned projections
• Interpretable in terms of original variables
• Singular Value Decomposition
• Optimal with respect to 2-norm (Euclidean norm)
and supremum norm
• Comes with an error estimate
• Other methods
• Stochastic Neighbor Embedding ((t-)SNE)
• „Manifold Learning“
19
Manifold Learning Methods
• Locally Linear Embedding
• Neighborhood-preserving embedding
• Isomap
• quasi-isometric
• Multi-dimensional scaling
• quasi-isometric
• Spectral Embedding
• Spectral clustering based on similarity
• Stochastic Neighbor Embedding (SNE, t-SNE)
• preserves conditional probabilities for similarity
• Local Tangent Space Alignement (LTSA)
20
Manifold Learning
21
Manifold Learning
22
Manifold Learning
23
Local Tangent Space Alignement
24
Principal Components and Curves
• Principal Component Analysis
• orthogonal decomposition based on SVD
• linear in all variables
• tries to preserve variance
• Principal Curves
• minimize the Sum of Squared Errors with respect
to all variables (as PCA, preserve variance)
• nonlinear
• smooth
25
Principal Components and Curves
26
Parallel Coordinates
• Parallel Coordinates
• especially useful for high-dimensional data
• depends on ordering and scaling
27
The Grand Tour
• Animated sequence of 2-D projections
• https://en.wikipedia.org/wiki/
Grand_Tour_(data_visualisation)
• Asimov (1985): The grand tour: a tool for viewing
multidimensional data.
• Underlying idea
• Randomly generate 2-D projections (random
walk)
• Over time generate a dense subset of all
possible 2-D projections
• Optional: Follow a given path / guided tour
28
The Grand Tour
29
The Grand Tour
30
31
Thanks a lot!
www.codecentric.de
blog.codecentric.de
stefan.kuehn@codecentric.de
datascience@codecentric.de

More Related Content

Similar to Visualizing and Communicating High-dimensional Data

chi03-tutorial.ppt
chi03-tutorial.pptchi03-tutorial.ppt
chi03-tutorial.ppt
KumarVijay54
 
ODSC India 2018: Topological space creation & Clustering at BigData scale
ODSC India 2018: Topological space creation & Clustering at BigData scaleODSC India 2018: Topological space creation & Clustering at BigData scale
ODSC India 2018: Topological space creation & Clustering at BigData scale
Kuldeep Jiwani
 
Talk at MCubed London about Manifold Learning and Applications
Talk at MCubed London about Manifold Learning and ApplicationsTalk at MCubed London about Manifold Learning and Applications
Talk at MCubed London about Manifold Learning and Applications
Stefan Kühn
 
How Humans See Data - Google - November 2017
How Humans See Data  - Google - November 2017How Humans See Data  - Google - November 2017
How Humans See Data - Google - November 2017
John Rauser
 
04 data viz concepts
04 data viz concepts04 data viz concepts
04 data viz conceptsPaul Kahn
 
How Humans See Data - Amazon Cut
How Humans See Data - Amazon CutHow Humans See Data - Amazon Cut
How Humans See Data - Amazon Cut
John Rauser
 
How Humans See Data
How Humans See DataHow Humans See Data
How Humans See Data
John Rauser
 
Making sense of data visually: A modern look at datavisualization
Making sense of data visually: A modern look at datavisualizationMaking sense of data visually: A modern look at datavisualization
Making sense of data visually: A modern look at datavisualization
Vladimir Milev
 
Intro to data visualization
Intro to data visualizationIntro to data visualization
Intro to data visualization
Jan Aerts
 
Building maps with analysis
Building maps with analysisBuilding maps with analysis
Building maps with analysis
LindaBeale
 
AMIA 2015 Visual Analytics in Healthcare Tutorial Part 1
AMIA 2015 Visual Analytics in Healthcare Tutorial Part 1AMIA 2015 Visual Analytics in Healthcare Tutorial Part 1
AMIA 2015 Visual Analytics in Healthcare Tutorial Part 1
David Gotz
 
Data Visualization dataviz superpower
Data Visualization dataviz superpowerData Visualization dataviz superpower
Data Visualization dataviz superpower
Jen Stirrup
 
Creativity and Curiosity - The Trial and Error of Data Science
Creativity and Curiosity - The Trial and Error of Data ScienceCreativity and Curiosity - The Trial and Error of Data Science
Creativity and Curiosity - The Trial and Error of Data Science
DamianMingle
 
Enhancing Parallel Coordinates with Curves
Enhancing Parallel Coordinates with CurvesEnhancing Parallel Coordinates with Curves
Enhancing Parallel Coordinates with Curves
martinjgraham
 
FaceRecognition.ppt
FaceRecognition.pptFaceRecognition.ppt
FaceRecognition.ppt
Swapnil Sahoo
 
FaceRecognition.ppt
FaceRecognition.pptFaceRecognition.ppt
FaceRecognition.ppt
RaviKiran911228
 
FaceRecognition.ppt
FaceRecognition.pptFaceRecognition.ppt
FaceRecognition.ppt
ssuserd2027d1
 
Manifold Learning and Data Visualization
Manifold Learning and Data VisualizationManifold Learning and Data Visualization
Manifold Learning and Data Visualization
Stefan Kühn
 
Data Visualization in Data Science
Data Visualization in Data ScienceData Visualization in Data Science
Data Visualization in Data Science
Maloy Manna, PMP®
 

Similar to Visualizing and Communicating High-dimensional Data (20)

chi03-tutorial.ppt
chi03-tutorial.pptchi03-tutorial.ppt
chi03-tutorial.ppt
 
ODSC India 2018: Topological space creation & Clustering at BigData scale
ODSC India 2018: Topological space creation & Clustering at BigData scaleODSC India 2018: Topological space creation & Clustering at BigData scale
ODSC India 2018: Topological space creation & Clustering at BigData scale
 
Talk at MCubed London about Manifold Learning and Applications
Talk at MCubed London about Manifold Learning and ApplicationsTalk at MCubed London about Manifold Learning and Applications
Talk at MCubed London about Manifold Learning and Applications
 
How Humans See Data - Google - November 2017
How Humans See Data  - Google - November 2017How Humans See Data  - Google - November 2017
How Humans See Data - Google - November 2017
 
Reza talk
Reza talkReza talk
Reza talk
 
04 data viz concepts
04 data viz concepts04 data viz concepts
04 data viz concepts
 
How Humans See Data - Amazon Cut
How Humans See Data - Amazon CutHow Humans See Data - Amazon Cut
How Humans See Data - Amazon Cut
 
How Humans See Data
How Humans See DataHow Humans See Data
How Humans See Data
 
Making sense of data visually: A modern look at datavisualization
Making sense of data visually: A modern look at datavisualizationMaking sense of data visually: A modern look at datavisualization
Making sense of data visually: A modern look at datavisualization
 
Intro to data visualization
Intro to data visualizationIntro to data visualization
Intro to data visualization
 
Building maps with analysis
Building maps with analysisBuilding maps with analysis
Building maps with analysis
 
AMIA 2015 Visual Analytics in Healthcare Tutorial Part 1
AMIA 2015 Visual Analytics in Healthcare Tutorial Part 1AMIA 2015 Visual Analytics in Healthcare Tutorial Part 1
AMIA 2015 Visual Analytics in Healthcare Tutorial Part 1
 
Data Visualization dataviz superpower
Data Visualization dataviz superpowerData Visualization dataviz superpower
Data Visualization dataviz superpower
 
Creativity and Curiosity - The Trial and Error of Data Science
Creativity and Curiosity - The Trial and Error of Data ScienceCreativity and Curiosity - The Trial and Error of Data Science
Creativity and Curiosity - The Trial and Error of Data Science
 
Enhancing Parallel Coordinates with Curves
Enhancing Parallel Coordinates with CurvesEnhancing Parallel Coordinates with Curves
Enhancing Parallel Coordinates with Curves
 
FaceRecognition.ppt
FaceRecognition.pptFaceRecognition.ppt
FaceRecognition.ppt
 
FaceRecognition.ppt
FaceRecognition.pptFaceRecognition.ppt
FaceRecognition.ppt
 
FaceRecognition.ppt
FaceRecognition.pptFaceRecognition.ppt
FaceRecognition.ppt
 
Manifold Learning and Data Visualization
Manifold Learning and Data VisualizationManifold Learning and Data Visualization
Manifold Learning and Data Visualization
 
Data Visualization in Data Science
Data Visualization in Data ScienceData Visualization in Data Science
Data Visualization in Data Science
 

More from Stefan Kühn

data2day2023_SKuehn_DataPlatformFallacy.pdf
data2day2023_SKuehn_DataPlatformFallacy.pdfdata2day2023_SKuehn_DataPlatformFallacy.pdf
data2day2023_SKuehn_DataPlatformFallacy.pdf
Stefan Kühn
 
data2day2022_SKuehn_DataValueChain.pdf
data2day2022_SKuehn_DataValueChain.pdfdata2day2022_SKuehn_DataValueChain.pdf
data2day2022_SKuehn_DataValueChain.pdf
Stefan Kühn
 
Data Science - Cargo Cult - Organizational Change
Data Science - Cargo Cult - Organizational ChangeData Science - Cargo Cult - Organizational Change
Data Science - Cargo Cult - Organizational Change
Stefan Kühn
 
Interactive Dashboards with R
Interactive Dashboards with RInteractive Dashboards with R
Interactive Dashboards with R
Stefan Kühn
 
Talk at PyData Berlin about Manifold Learning and Applications
Talk at PyData Berlin about Manifold Learning and ApplicationsTalk at PyData Berlin about Manifold Learning and Applications
Talk at PyData Berlin about Manifold Learning and Applications
Stefan Kühn
 
Bridging the gap
Bridging the gapBridging the gap
Bridging the gap
Stefan Kühn
 
The Machinery behind Deep Learning
The Machinery behind Deep LearningThe Machinery behind Deep Learning
The Machinery behind Deep Learning
Stefan Kühn
 
Becoming Data-driven - Machine Learning @ XING Marketing Solutions
Becoming Data-driven - Machine Learning @ XING Marketing SolutionsBecoming Data-driven - Machine Learning @ XING Marketing Solutions
Becoming Data-driven - Machine Learning @ XING Marketing Solutions
Stefan Kühn
 
Learning To Rank data2day 2017
Learning To Rank data2day 2017Learning To Rank data2day 2017
Learning To Rank data2day 2017
Stefan Kühn
 
Deep Learning and Optimization Methods
Deep Learning and Optimization MethodsDeep Learning and Optimization Methods
Deep Learning and Optimization Methods
Stefan Kühn
 
Data quality - The True Big Data Challenge
Data quality - The True Big Data ChallengeData quality - The True Big Data Challenge
Data quality - The True Big Data Challenge
Stefan Kühn
 
SKuehn_MachineLearningAndOptimization_2015
SKuehn_MachineLearningAndOptimization_2015SKuehn_MachineLearningAndOptimization_2015
SKuehn_MachineLearningAndOptimization_2015
Stefan Kühn
 
SKuehn_Talk_FootballAnalytics_data2day2015
SKuehn_Talk_FootballAnalytics_data2day2015SKuehn_Talk_FootballAnalytics_data2day2015
SKuehn_Talk_FootballAnalytics_data2day2015
Stefan Kühn
 

More from Stefan Kühn (13)

data2day2023_SKuehn_DataPlatformFallacy.pdf
data2day2023_SKuehn_DataPlatformFallacy.pdfdata2day2023_SKuehn_DataPlatformFallacy.pdf
data2day2023_SKuehn_DataPlatformFallacy.pdf
 
data2day2022_SKuehn_DataValueChain.pdf
data2day2022_SKuehn_DataValueChain.pdfdata2day2022_SKuehn_DataValueChain.pdf
data2day2022_SKuehn_DataValueChain.pdf
 
Data Science - Cargo Cult - Organizational Change
Data Science - Cargo Cult - Organizational ChangeData Science - Cargo Cult - Organizational Change
Data Science - Cargo Cult - Organizational Change
 
Interactive Dashboards with R
Interactive Dashboards with RInteractive Dashboards with R
Interactive Dashboards with R
 
Talk at PyData Berlin about Manifold Learning and Applications
Talk at PyData Berlin about Manifold Learning and ApplicationsTalk at PyData Berlin about Manifold Learning and Applications
Talk at PyData Berlin about Manifold Learning and Applications
 
Bridging the gap
Bridging the gapBridging the gap
Bridging the gap
 
The Machinery behind Deep Learning
The Machinery behind Deep LearningThe Machinery behind Deep Learning
The Machinery behind Deep Learning
 
Becoming Data-driven - Machine Learning @ XING Marketing Solutions
Becoming Data-driven - Machine Learning @ XING Marketing SolutionsBecoming Data-driven - Machine Learning @ XING Marketing Solutions
Becoming Data-driven - Machine Learning @ XING Marketing Solutions
 
Learning To Rank data2day 2017
Learning To Rank data2day 2017Learning To Rank data2day 2017
Learning To Rank data2day 2017
 
Deep Learning and Optimization Methods
Deep Learning and Optimization MethodsDeep Learning and Optimization Methods
Deep Learning and Optimization Methods
 
Data quality - The True Big Data Challenge
Data quality - The True Big Data ChallengeData quality - The True Big Data Challenge
Data quality - The True Big Data Challenge
 
SKuehn_MachineLearningAndOptimization_2015
SKuehn_MachineLearningAndOptimization_2015SKuehn_MachineLearningAndOptimization_2015
SKuehn_MachineLearningAndOptimization_2015
 
SKuehn_Talk_FootballAnalytics_data2day2015
SKuehn_Talk_FootballAnalytics_data2day2015SKuehn_Talk_FootballAnalytics_data2day2015
SKuehn_Talk_FootballAnalytics_data2day2015
 

Recently uploaded

The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
balafet
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
g4dpvqap0
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
jerlynmaetalle
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
apvysm8
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
Nanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdfNanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdf
eddie19851
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
dwreak4tg
 
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdfUnleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Enterprise Wired
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
AnirbanRoy608946
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
sameer shah
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
manishkhaire30
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 

Recently uploaded (20)

The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
Nanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdfNanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdf
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
 
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdfUnleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 

Visualizing and Communicating High-dimensional Data

  • 1. Visualizing and Communicating High-dimensional Data Dr. Stefan Kühn Lead Data Scientist Data Natives Berlin - 26.10.2016
  • 3. The Modes of Perception 3 X X X X X X X O O O O O O O O X X X X O O O O X X X X X X X X O O O O O O O O X X X X X O O O Fast Slow Find the outlier
  • 4. The Modes of Perception 4 • Pre-attentive • fast • parallel processing • effortless • Pattern recognition • semi-fast • governed by laws of Gestalt • Attentive • slow • sequential • high effort (attention is a very limited resource)
  • 5. Main Properties of Graphics 5 Category Example Position Shape Size Color Orientation (Line) Length (Line) Type and Size (Line) Brightness
  • 6. Main Properties of Graphics Humans 6 Category Amount of pre-attentive information Position very high Shape ——— Size approx. 4 Color approx. 8 Orientation (Line) approx. 4 Length (Line) ——— Type and Size (Line) ——— Brightness approx. 8
  • 7. Pre-attentive perception 7 • Position • fast • effective • high number of different positions • Color • use with care • Shape • Orientation Pre-attentive perception is effortless. Exploit this as much as you can.
  • 8. Pattern detection 8 „It is interesting to note that our brain […] subconsciously always prefers meaningful situations and objects.“ • Emergence • Reiification • Multi-stability • Invariance Pattern detection can be trained. Exploit this for frequent visualizations.
  • 12. Laws of Gestalt 12 „It is interesting to note that our brain, in accordance with the laws of Gestalt, subconsciously always prefers meaningful situations and objects.“
  • 13. Accuracy of Graphics 13 Square Pie vs Stacked Bar vs Pie vs Donut What do you think? https://eagereyes.org/blog/2016/a-reanalysis-of-a-study-about-square-pie-charts-from-2009
  • 14. Why do we visualize data? 14 • Explore • use different techniques • avoid „construction“ bias • be careful with „aesthetics“ • challenge findings -> use attentive mode • Explain • focus on message or „story“ • use pre-attentive mode Don’t trust graphics that you have not falsified yourself.
  • 18. Fundamental Problems • No accurate method in higher dimensions • Approximation methods • „Simulated“ dimensions (color, size, shape) • Animations? • No notion of quality or accuracy for Visualizations • Information Theory? • „Stability“? All Visualizations are wrong, but some are useful. 18
  • 19. Approximation methods • Pair Plots • Axis-aligned projections • Interpretable in terms of original variables • Singular Value Decomposition • Optimal with respect to 2-norm (Euclidean norm) and supremum norm • Comes with an error estimate • Other methods • Stochastic Neighbor Embedding ((t-)SNE) • „Manifold Learning“ 19
  • 20. Manifold Learning Methods • Locally Linear Embedding • Neighborhood-preserving embedding • Isomap • quasi-isometric • Multi-dimensional scaling • quasi-isometric • Spectral Embedding • Spectral clustering based on similarity • Stochastic Neighbor Embedding (SNE, t-SNE) • preserves conditional probabilities for similarity • Local Tangent Space Alignement (LTSA) 20
  • 24. Local Tangent Space Alignement 24
  • 25. Principal Components and Curves • Principal Component Analysis • orthogonal decomposition based on SVD • linear in all variables • tries to preserve variance • Principal Curves • minimize the Sum of Squared Errors with respect to all variables (as PCA, preserve variance) • nonlinear • smooth 25
  • 27. Parallel Coordinates • Parallel Coordinates • especially useful for high-dimensional data • depends on ordering and scaling 27
  • 28. The Grand Tour • Animated sequence of 2-D projections • https://en.wikipedia.org/wiki/ Grand_Tour_(data_visualisation) • Asimov (1985): The grand tour: a tool for viewing multidimensional data. • Underlying idea • Randomly generate 2-D projections (random walk) • Over time generate a dense subset of all possible 2-D projections • Optional: Follow a given path / guided tour 28