Visualizing High-dimensional Data
Dr. Stefan Kühn
Lead Data Scientist
code.talks - 30.09.2016
Overview
• Data Visualization as you might know it
  • Main Properties of Graphics (and Humans)
  • A short story about Charts
  • Pair Plots and Correlations
• Data Visualization as you might not know it
  • Fundamental Problems
  • SVD, t-SNE and other approximations
  • Principal Components and Principal Curves
  • Parallel Coordinates
  • The Grand Tour
Takeaways - hopefully ;-)
• Data Visualization is complicated
  • There is always an approximation
  • There is always a bias
  • There is always a misinterpretation
• Data Visualization is simple
  • Lots of packages available
  • Lots of studies + literature
  • Lots of examples to learn from
Data Visualization Basics
The Modes of Perception
[Figure: two panels of scattered X and O marks, labeled "Fast" and "Slow". Task: find the outlier.]
• Pre-attentive
  • fast
  • parallel processing
  • effortless
• Pattern recognition
  • semi-fast
  • governed by laws of perception
• Attentive
  • slow
  • sequential
  • high effort (attention is a very limited resource)
Main Properties of Graphics
Category Example
Position
Shape
Size
Color
Orientation (Line)
Length (Line)
Type and Size (Line)
Brightness
Main Properties of Graphics (and Humans)
Category               Amount of pre-attentive information
Position               very high
Shape                  ---
Size                   approx. 4
Color                  approx. 8
Orientation (Line)     approx. 4
Length (Line)          ---
Type and Size (Line)   ---
Brightness             approx. 8
Pre-attentive perception
• Position
  • fast
  • effective
  • high number of different positions
• Color
  • use with care
• Shape
• Orientation
Pre-attentive perception is effortless.
Exploit this as much as you can.
Pattern detection
„It is interesting to note that our brain […]
subconsciously always prefers meaningful
situations and objects.“
• Emergence
• Reification
• Multi-stability
• Invariance
Pattern detection can be trained.
Exploit this for frequent visualizations.
What is this?
Laws of Gestalt
„It is interesting to note that our brain, in
accordance with the laws of Gestalt,
subconsciously always prefers meaningful
situations and objects.“
Accuracy of Graphics
Square Pie vs Stacked Bar vs Pie vs Donut
What do you think?
https://eagereyes.org/blog/2016/a-reanalysis-of-a-study-about-square-pie-charts-from-2009
Demo
Beyond the Basics
Fundamental Problems
• No accurate method in higher dimensions
  • Approximation methods
  • „Simulated“ dimensions (color, size, shape)
  • Animations?
• No notion of quality or accuracy for visualizations
  • Information Theory?
  • „Stability“?
All Visualizations are wrong, but some are useful.
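One way to read the „simulated“ dimensions idea: variables beyond the two plotted positions are binned into a small number of visual levels, which is already an approximation. The following is a minimal sketch, not code from the talk. The names `quantize` and `encode_point` are hypothetical, and the level counts are assumptions taken from the "approx. 8" (color) and "approx. 4" (size) figures on the earlier slide.

```python
MAX_COLOR_LEVELS = 8  # assumption: "approx. 8" distinguishable colors
MAX_SIZE_LEVELS = 4   # assumption: "approx. 4" distinguishable sizes


def quantize(value, lo, hi, levels):
    """Map a value in [lo, hi] to an integer bin 0..levels-1 (equal-width bins)."""
    if hi <= lo:
        return 0
    fraction = (value - lo) / (hi - lo)
    # Clamp so that value == hi lands in the last bin instead of overflowing.
    return min(levels - 1, max(0, int(fraction * levels)))


def encode_point(point, ranges):
    """Encode a 4-D point as position (x, y) plus a color bin and a size bin.

    point:  (x, y, c, s) raw values
    ranges: [(lo, hi), (lo, hi)] for the third and fourth variable
    """
    x, y, c, s = point
    return {
        "x": x,
        "y": y,
        "color_bin": quantize(c, *ranges[0], MAX_COLOR_LEVELS),
        "size_bin": quantize(s, *ranges[1], MAX_SIZE_LEVELS),
    }


encoded = encode_point((1.0, 2.0, 0.99, 0.0), [(0.0, 1.0), (0.0, 1.0)])
```

Two points whose third variable falls into the same bin become indistinguishable in color, so the encoding loses information by construction.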
Approximation methods
• Pair Plots
  • Axis-aligned projections
  • Interpretable in terms of original variables
• Singular Value Decomposition
  • Optimal low-rank approximation with respect to the spectral norm (2-norm) and the Frobenius norm
  • Comes with an error estimate
• Other methods
  • t-distributed Stochastic Neighbor Embedding (t-SNE)
  • „Manifold Learning“
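The error estimate that comes with the SVD can be made concrete. By the Eckart-Young theorem, truncating the SVD after k singular values gives the best rank-k approximation in the spectral and Frobenius norms, and the error is determined by the discarded singular values. The sketch below assumes NumPy and uses synthetic random data (both are assumptions, not part of the talk).

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))        # synthetic data: 200 points in 6 dimensions
Xc = X - X.mean(axis=0)              # center each column

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 2
coords_2d = U[:, :k] * s[:k]         # 2-D coordinates one would plot
X_k = coords_2d @ Vt[:k]             # rank-k reconstruction in the original space

# Approximation errors, measured directly from the reconstruction.
spectral_error = np.linalg.norm(Xc - X_k, ord=2)
frobenius_error = np.linalg.norm(Xc - X_k, ord="fro")

# The same errors predicted from the discarded singular values alone.
predicted_spectral = s[k]
predicted_frobenius = np.sqrt(np.sum(s[k:] ** 2))
```

The measured and predicted errors agree, so the quality of a 2-D SVD plot can be judged from the singular values before looking at the picture.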
Demo
Manifold Learning Methods
• Locally Linear Embedding
  • Neighborhood-preserving embedding
• Isomap
  • quasi-isometric
• Multi-dimensional scaling
  • quasi-isometric
• Spectral Embedding
  • Spectral clustering based on similarity
• Stochastic Neighbor Embedding (SNE, t-SNE)
  • preserves probabilities
• Local Tangent Space Alignment (LTSA)
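To illustrate what "quasi-isometric" means for multi-dimensional scaling, here is a minimal sketch of classical (metric) MDS, assuming NumPy. It is only the simplest member of the family listed above and not the demo from the talk. The helper names and the synthetic data are hypothetical.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance matrix between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def classical_mds(D, k=2):
    """Embed points in k dimensions from a matrix D of pairwise distances."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)       # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]        # indices of the k largest
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale


rng = np.random.default_rng(1)
flat = rng.normal(size=(50, 2))                # intrinsically 2-D data
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))   # random orthogonal map in 5-D
X = np.hstack([flat, np.zeros((50, 3))]) @ Q   # the same data hidden in 5-D

Y = classical_mds(pairwise_distances(X), k=2)
```

Because the synthetic data lies on a flat 2-D plane, the embedding reproduces all pairwise distances. For curved data the distances are only preserved approximately, which is where methods such as Isomap differ in the distances they try to keep.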
Local Tangent Space Alignment
Principal Components and Curves
• Principal Component Analysis
  • orthogonal decomposition based on SVD
  • linear in all variables
  • tries to preserve variance
• Principal Curves
  • minimize the Sum of Squared Errors with respect to all variables (like PCA, preserve variance)
  • nonlinear
  • smooth
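The link between "preserve variance" and "minimize the Sum of Squared Errors" can be checked numerically for PCA. The sketch below assumes NumPy and synthetic data with one dominant direction (both assumptions). It covers only the linear case. Principal curves are not implemented here.

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic 3-D data with very different spread per axis (assumption).
X = rng.normal(size=(300, 3)) * np.array([5.0, 1.0, 0.2])
n = len(X)

Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

components = Vt                              # rows: orthonormal principal directions
explained_variance = s ** 2 / (n - 1)        # variance along each component

k = 2
scores = Xc @ components[:k].T               # coordinates in the first k components
residual = Xc - scores @ components[:k]      # what the projection throws away

total_variance = explained_variance.sum()
retained_variance = explained_variance[:k].sum()
sse = (residual ** 2).sum()                  # Sum of Squared Errors of the projection
```

Retained variance plus `sse / (n - 1)` equals the total variance, so maximizing one is the same as minimizing the other.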
Parallel Coordinates
• Parallel Coordinates
  • especially useful for high-dimensional data
  • depends on ordering and scaling
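The dependence on ordering and scaling shows up as soon as the polylines are computed. This is a minimal sketch in plain Python. The function names, the min-max scaling choice, and the tiny data set are all assumptions for illustration, not the method used in the demo.

```python
def min_max_scale(rows):
    """Rescale every column of a row-major table to the range [0, 1]."""
    scaled_columns = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        span = hi - lo
        # A constant column carries no information; place it mid-axis.
        scaled_columns.append([0.5 if span == 0 else (v - lo) / span for v in col])
    return [list(row) for row in zip(*scaled_columns)]


def polylines(rows, axis_order):
    """Return one polyline per row: [(axis_position, scaled_value), ...]."""
    scaled = min_max_scale(rows)
    return [[(pos, row[col]) for pos, col in enumerate(axis_order)] for row in scaled]


# Three variables on very different scales.
data = [
    (1.0, 200.0, 0.25),
    (2.0, 100.0, 0.75),
    (3.0, 300.0, 0.5),
]
lines = polylines(data, axis_order=[2, 0, 1])
```

Without the scaling step the second variable would dominate the plot, and a different `axis_order` changes which line crossings are visible between neighboring axes.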
Demo
The Grand Tour
• Animated sequence of 2-D projections
  • https://en.wikipedia.org/wiki/Grand_Tour_(data_visualisation)
  • Asimov (1985): The grand tour: a tool for viewing multidimensional data.
• Underlying idea
  • Randomly generate 2-D projections (random walk)
  • Over time generate a dense subset of all possible 2-D projections
  • Optional: Follow a given path / guided tour
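The underlying idea can be sketched as a random walk over orthonormal 2-frames, assuming NumPy. This is a simplified illustration and not Asimov's original construction. The function names, the step size, and the synthetic data are hypothetical.

```python
import numpy as np


def orthonormalize(A):
    """Orthonormalize the columns of A via QR, keeping their orientation."""
    Q, R = np.linalg.qr(A)
    # Fix the sign ambiguity of QR so small perturbations give small changes.
    return Q * np.sign(np.diag(R))


def random_frame(dim, rng):
    """Random orthonormal 2-frame in R^dim; its columns span the projection plane."""
    return orthonormalize(rng.normal(size=(dim, 2)))


def step_frame(frame, step_size, rng):
    """One random-walk step: perturb the frame, then re-orthonormalize."""
    return orthonormalize(frame + step_size * rng.normal(size=frame.shape))


def grand_tour(X, n_frames, step_size=0.1, seed=0):
    """Yield a sequence of 2-D projections of X, one per animation frame."""
    rng = np.random.default_rng(seed)
    frame = random_frame(X.shape[1], rng)
    for _ in range(n_frames):
        yield X @ frame
        frame = step_frame(frame, step_size, rng)


X = np.random.default_rng(3).normal(size=(100, 5))   # synthetic 5-D data
frames = list(grand_tour(X, n_frames=10))
```

Each element of `frames` is a 100 x 2 array that could be drawn as one scatter plot of the animation. A guided tour would replace the random perturbation in `step_frame` with a step along a chosen path.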
Demo
Thanks a lot!
www.codecentric.de
blog.codecentric.de
stefan.kuehn@codecentric.de
datascience@codecentric.de

Data Visualization at codetalks 2016

  • 1. Visualizing High-dimensional Data Dr. Stefan Kühn Lead Data Scientist code.talks - 30.09.2016
  • 2. Overview • Data Visualization as you might know it • Main Properties of Graphics (and Humans) • A short story about Charts • Pair Plots and Correlations • Data Visualization as you might not know it • Fundamental Problems • SVD, t-SNE and other approximations • Principal Components and Principal Curves • Parallel Coordinates • The Grand Tour
  • 3. Takeaways - hopefully ;-) • Data Visualization is complicated • There is always an approximation • There is always a bias • There is always a misinterpretation • Data Visualization is simple • Lots of packages available • Lots of studies + literature • Lots of examples to learn from
  • 5. The Modes of Perception • [Figure: two grids of X and O marks, labeled "Fast" and "Slow". Task: find the outlier.]
  • 6. The Modes of Perception • Pre-attentive • fast • parallel processing • effortless • Pattern recognition • semi-fast • governed by laws of perception • Attentive • slow • sequential • high effort (attention is a very limited resource)
  • 7. Main Properties of Graphics • Categories, each shown with an example on the slide: Position, Shape, Size, Color, Orientation (Line), Length (Line), Type and Size (Line), Brightness
  • 8. Main Properties of Graphics (and Humans) • Amount of pre-attentive information per category: Position: very high; Shape: not given; Size: approx. 4; Color: approx. 8; Orientation (Line): approx. 4; Length (Line): not given; Type and Size (Line): not given; Brightness: approx. 8
  • 9. Pre-attentive perception • Position • fast • effective • high number of different positions • Color • use with care • Shape • Orientation • Pre-attentive perception is effortless. Exploit this as much as you can.
  • 10. Pattern detection • “It is interesting to note that our brain […] subconsciously always prefers meaningful situations and objects.” • Emergence • Reification • Multi-stability • Invariance • Pattern detection can be trained. Exploit this for frequent visualizations.
  • 14. Laws of Gestalt • “It is interesting to note that our brain, in accordance with the laws of Gestalt, subconsciously always prefers meaningful situations and objects.”
  • 15. Accuracy of Graphics • Square Pie vs Stacked Bar vs Pie vs Donut • What do you think? • https://eagereyes.org/blog/2016/a-reanalysis-of-a-study-about-square-pie-charts-from-2009
  • 18. Fundamental Problems • No accurate method in higher dimensions • Approximation methods • “Simulated” dimensions (color, size, shape) • Animations? • No notion of quality or accuracy for Visualizations • Information Theory? • “Stability”? • All Visualizations are wrong, but some are useful.
  • 19. Approximation methods • Pair Plots • Axis-aligned projections • Interpretable in terms of original variables • Singular Value Decomposition • Optimal with respect to 2-norm (Euclidean norm) and supremum norm • Comes with an error estimate • Other methods • t-distributed Stochastic Neighbor Embedding (t-SNE) • “Manifold Learning”
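The error estimate that comes with the SVD can be sketched as follows. This is a minimal illustration, not code from the talk: the helper name `truncated_svd` and the random test matrix are assumptions. It relies on the Eckart-Young theorem, which says that the 2-norm error of the best rank-k approximation equals the first discarded singular value.

```python
import numpy as np

def truncated_svd(X, k):
    """Return the rank-k SVD approximation of X and its 2-norm error bound.

    Hypothetical helper for illustration only.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Keep the k largest singular triplets.
    X_k = (U[:, :k] * s[:k]) @ Vt[:k, :]
    # Eckart-Young: the spectral-norm error is the first discarded singular value.
    err = s[k] if k < len(s) else 0.0
    return X_k, err

# Assumed toy data: 50 observations of 6 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))

X_2, err_bound = truncated_svd(X, 2)
actual_err = np.linalg.norm(X - X_2, ord=2)  # measured 2-norm error
```

Comparing `actual_err` with `err_bound` shows how much of the data a 2-D view discards before any plot is drawn.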
  • 21. Manifold Learning Methods • Locally Linear Embedding • Neighborhood-preserving embedding • Isomap • quasi-isometric • Multi-dimensional scaling • quasi-isometric • Spectral Embedding • Spectral clustering based on similarity • Stochastic Neighbor Embedding (SNE, t-SNE) • preserves probabilities • Local Tangent Space Alignment (LTSA)
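As one concrete instance of a distance-preserving embedding from this list, here is a sketch of classical (Torgerson) multi-dimensional scaling. The function names and the toy data are assumptions for illustration. The data is deliberately a flat 2-D sheet embedded in 5-D, which is the case where classical MDS reproduces all pairwise distances exactly; for curved data the result is only approximate.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def classical_mds(D, k=2):
    """Embed n points in k dimensions from their distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
    w, V = np.linalg.eigh(B)                 # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:k]            # indices of the k largest
    w_k = np.clip(w[idx], 0.0, None)         # guard against tiny negatives
    return V[:, idx] * np.sqrt(w_k)

# Assumed toy data: a 2-D point cloud rotated into 5-D space.
rng = np.random.default_rng(0)
flat = rng.normal(size=(20, 2))
basis, _ = np.linalg.qr(rng.normal(size=(5, 2)))  # orthonormal 5x2 map
X = flat @ basis.T

D = pairwise_distances(X)
Y = classical_mds(D, k=2)
```

The embedding `Y` is determined only up to rotation and reflection, so it is the distances between rows of `Y`, not the coordinates themselves, that should be compared with `D`.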
  • 22. Local Tangent Space Alignment
  • 23. Principal Components and Curves • Principal Component Analysis • orthogonal decomposition based on SVD • linear in all variables • tries to preserve variance • Principal Curves • minimize the Sum of Squared Errors with respect to all variables (like PCA, preserves variance) • nonlinear • smooth
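The link between PCA and the SVD can be sketched in a few lines. This is an illustrative assumption, not the speaker's code: `pca_svd` and the synthetic data are hypothetical. The data is centered, decomposed, and projected onto the leading right singular vectors; the squared singular values give the share of variance each component preserves.

```python
import numpy as np

def pca_svd(X, n_components=2):
    """PCA via SVD of the centered data matrix (hypothetical helper)."""
    Xc = X - X.mean(axis=0)                       # center each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]                # orthonormal directions
    scores = Xc @ components.T                    # low-dimensional coordinates
    ratio = (s ** 2) / np.sum(s ** 2)             # variance share per component
    return scores, components, ratio[:n_components]

# Assumed toy data: 200 points in 5-D with unequal spread per axis.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) * np.array([5.0, 3.0, 1.0, 0.5, 0.1])

scores, components, ratio = pca_svd(X, n_components=2)
```

Plotting `scores` gives the usual 2-D PCA view, and `ratio.sum()` states how much of the total variance that view retains.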
  • 25. Parallel Coordinates • especially useful for high-dimensional data • result depends on the ordering and scaling of the axes
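Why ordering and scaling matter can be made concrete with a small sketch that prepares data for a parallel coordinates plot without drawing it. The function name, the min-max scaling choice, and the sample values are assumptions for illustration. Each row of the result is one polyline whose j-th vertex sits at height `lines[i, j]` on the j-th axis.

```python
import numpy as np

def parallel_coordinates(data, order=None):
    """Scale each variable to [0, 1] and optionally reorder the axes.

    Hypothetical helper: min-max scaling is one choice among several.
    """
    data = np.asarray(data, dtype=float)
    if order is not None:
        data = data[:, order]                 # axis order changes the picture
    lo = data.min(axis=0)
    span = data.max(axis=0) - lo
    span[span == 0] = 1.0                     # constant columns map to 0
    return (data - lo) / span

# Made-up sample: height (cm), weight (kg), income per year.
sample = [
    [170, 65, 30000],
    [180, 80, 45000],
    [160, 50, 52000],
]

lines = parallel_coordinates(sample)
reordered = parallel_coordinates(sample, order=[2, 0, 1])
```

Without the scaling step the income axis would flatten the other two variables, and only variables on adjacent axes can be compared directly, which is why different values of `order` can suggest different relationships.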
  • 27. The Grand Tour • Animated sequence of 2-D projections • https://en.wikipedia.org/wiki/Grand_Tour_(data_visualisation) • Asimov (1985): The grand tour: a tool for viewing multidimensional data. • Underlying idea • Randomly generate 2-D projections (random walk) • Over time generate a dense subset of all possible 2-D projections • Optional: Follow a given path / guided tour
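The random-walk idea can be sketched as a generator of 2-D views. This is a simplified assumption, not Asimov's original construction: the current projection frame is perturbed with small Gaussian noise and re-orthonormalized by a QR decomposition. The names `random_frame`, `grand_tour`, and the `step` parameter are hypothetical.

```python
import numpy as np

def random_frame(p, rng):
    """Random orthonormal p x 2 frame, i.e. one 2-D projection of p-D space."""
    Q, _ = np.linalg.qr(rng.normal(size=(p, 2)))
    return Q

def grand_tour(X, n_frames=100, step=0.1, seed=0):
    """Yield a sequence of 2-D projections of X along a random walk."""
    rng = np.random.default_rng(seed)
    frame = random_frame(X.shape[1], rng)
    for _ in range(n_frames):
        yield X @ frame                       # current 2-D view of the data
        # Small random step, then restore orthonormality of the frame.
        noisy = frame + step * rng.normal(size=frame.shape)
        frame, _ = np.linalg.qr(noisy)

# Assumed toy data: 30 points in 5-D.
X = np.random.default_rng(1).normal(size=(30, 5))
views = list(grand_tour(X, n_frames=5))
```

Each element of `views` is one animation frame; a smaller `step` gives a smoother animation, while a guided tour would replace the random perturbation with steps along a chosen path.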