Visualizing High-dimensional Data
Dr. Stefan Kühn
Lead Data Scientist
code.talks - 30.09.2016
Overview
• Data Visualization as you might know it
• Main Properties of Graphics (and Humans)
• A short story about Charts
• Pair Plots and Correlations
• Data Visualization as you might not know it
• Fundamental Problems
• SVD, t-SNE and other approximations
• Principal Components and Principal Curves
• Parallel Coordinates
• The Grand Tour
Takeaways - hopefully ;-)
• Data Visualization is complicated
• There is always an approximation
• There is always a bias
• There is always a misinterpretation
• Data Visualization is simple
• Lots of packages available
• Lots of studies + literature
• Lots of examples to learn from
Data Visualization Basics
The Modes of Perception
[Figure: two grids of X and O symbols, one labelled "Fast" and one labelled "Slow"]
Find the outlier
The Modes of Perception
• Pre-attentive
• fast
• parallel processing
• effortless
• Pattern recognition
• semi-fast
• governed by the laws of perception (Gestalt)
• Attentive
• slow
• sequential
• high effort (attention is a very limited resource)
Main Properties of Graphics
Categories (the example graphics from the slide are not preserved):
• Position
• Shape
• Size
• Color
• Orientation (Line)
• Length (Line)
• Type and Size (Line)
• Brightness
Main Properties of Graphics (and Humans)
Category: amount of pre-attentive information
• Position: very high
• Shape: -
• Size: approx. 4
• Color: approx. 8
• Orientation (Line): approx. 4
• Length (Line): -
• Type and Size (Line): -
• Brightness: approx. 8
Pre-attentive perception
• Position
• fast
• effective
• high number of different positions
• Color
• use with care
• Shape
• Orientation
Pre-attentive perception is effortless.
Exploit this as much as you can.
Pattern detection
„It is interesting to note that our brain […]
subconsciously always prefers meaningful
situations and objects.“
• Emergence
• Reification
• Multi-stability
• Invariance
Pattern detection can be trained.
Exploit this for frequent visualizations.
What is this? (three slides of example images; the images are not preserved)
Laws of Gestalt
„It is interesting to note that our brain, in
accordance with the laws of Gestalt,
subconsciously always prefers meaningful
situations and objects.“
Accuracy of Graphics
Square Pie vs Stacked Bar vs Pie vs Donut
What do you think?
https://eagereyes.org/blog/2016/a-reanalysis-of-a-study-about-square-pie-charts-from-2009
Demo
Beyond the Basics
Fundamental Problems
• No accurate method in higher dimensions
• Approximation methods
• „Simulated“ dimensions (color, size, shape)
• Animations?
• No notion of quality or accuracy for visualizations
• Information Theory?
• „Stability“?
All Visualizations are wrong, but some are useful.
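One way to combine the „simulated“ dimensions idea with the pre-attentive limits listed for color (approx. 8) and size (approx. 4) is to bin the extra variables into that many levels before mapping them onto color and size. The sketch below assumes NumPy and a numeric array with exactly four columns; `encode_channels`, `n_colors` and `n_sizes` are hypothetical names, and the bin counts simply reuse the approximate figures from the slide rather than any standard.

```python
import numpy as np


def encode_channels(data, n_colors=8, n_sizes=4):
    """Map an (n, 4) array onto position, a color level and a size level.

    Columns 0 and 1 are used as x and y; columns 2 and 3 are binned into a
    small number of discrete levels (hypothetical defaults: 8 colors, 4 sizes).
    """
    data = np.asarray(data, dtype=float)
    if data.ndim != 2 or data.shape[1] != 4:
        raise ValueError("expected an array of shape (n, 4)")

    def to_levels(col, k):
        # Min-max scale to [0, 1], then cut into k equal-width bins 0..k-1.
        lo, hi = col.min(), col.max()
        scaled = np.zeros_like(col) if hi == lo else (col - lo) / (hi - lo)
        return np.minimum((scaled * k).astype(int), k - 1)

    x, y = data[:, 0], data[:, 1]
    color_level = to_levels(data[:, 2], n_colors)
    size_level = to_levels(data[:, 3], n_sizes)
    return x, y, color_level, size_level
```

With matplotlib the result could be drawn as `plt.scatter(x, y, c=color_level, s=20 * (1 + size_level))`. Binning trades precision for readability: the third and fourth variables become coarse, but each level stays distinguishable at a glance.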
Approximation methods
• Pair Plots
• Axis-aligned projections
• Interpretable in terms of original variables
• Singular Value Decomposition
• Optimal low-rank approximation with respect to the spectral norm (operator 2-norm) and the Frobenius norm
• Comes with an error estimate (given by the discarded singular values)
• Other methods
• t-distributed Stochastic Neighbor Embedding (t-SNE)
• „Manifold Learning“
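The SVD error estimate can be made concrete. By the Eckart-Young-Mirsky theorem the truncated SVD is the best rank-k approximation, and its error follows directly from the singular values that were dropped. A minimal NumPy sketch, assuming a plain numeric matrix `X`; `truncated_svd` is a hypothetical helper name. The data is not centered here, so this is a low-rank approximation, not PCA.

```python
import numpy as np


def truncated_svd(X, k):
    """Rank-k approximation of X, k-dim coordinates and the approximation errors."""
    X = np.asarray(X, dtype=float)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    X_k = (U[:, :k] * s[:k]) @ Vt[:k, :]      # best rank-k approximation
    coords = U[:, :k] * s[:k]                 # rows of X expressed in the top-k basis
    # Errors come from the discarded singular values s[k], s[k+1], ...
    frobenius_error = np.sqrt(np.sum(s[k:] ** 2))
    spectral_error = s[k] if k < len(s) else 0.0
    return X_k, coords, frobenius_error, spectral_error
```

For a 2-D plot, `coords` with `k=2` gives the scatter coordinates, and the two error values say how much of the matrix the picture leaves out.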
Demo
Manifold Learning Methods
• Locally Linear Embedding
• Neighborhood-preserving embedding
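• Isomap
• quasi-isometric (approximately preserves geodesic distances)
• Multidimensional scaling (MDS)
• quasi-isometric (approximately preserves pairwise distances)
• Spectral Embedding
• uses the same similarity-graph eigenvectors as spectral clustering
• Stochastic Neighbor Embedding (SNE, t-SNE)
• tries to preserve neighbor probabilities (pairwise similarities)
• Local Tangent Space Alignment (LTSA)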
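Multidimensional scaling is the simplest of these methods to write down. The sketch below is classical (Torgerson) MDS, assuming NumPy and a symmetric matrix `D` of pairwise Euclidean distances; `classical_mds` is a hypothetical helper, not a library API. Isomap uses the same step, applied to geodesic distances estimated on a nearest-neighbor graph instead of straight-line distances.

```python
import numpy as np


def classical_mds(D, k=2):
    """Classical (Torgerson) MDS: k-dim coordinates from pairwise distances."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)     # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]    # indices of the k largest
    lam = np.clip(eigvals[order], 0.0, None) # guard against tiny negative values
    return eigvecs[:, order] * np.sqrt(lam)
```

If the points really live in at most `k` Euclidean dimensions, the embedding reproduces all distances exactly (up to rotation and reflection); otherwise the result is only an approximation, which is where "quasi-isometric" comes from.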
Local Tangent Space Alignment
Principal Components and Curves
• Principal Component Analysis
• orthogonal decomposition based on SVD
• linear in all variables
• maximizes the variance retained in the projection
• Principal Curves
• minimize the sum of squared errors with respect to all variables (like PCA, they aim to preserve variance)
• nonlinear
• smooth
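PCA as an orthogonal decomposition based on the SVD takes only a few lines: center the data, take the SVD, then read off the projected coordinates and the share of the total variance each component keeps. A minimal NumPy sketch; `pca` is a hypothetical helper name and `X` is assumed to be an array of shape (n_samples, n_features). Principal curves have no comparably short closed form and are not covered here.

```python
import numpy as np


def pca(X, k=2):
    """PCA via the SVD of the centered data matrix."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                   # PCA requires centered data
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :k] * s[:k]                 # coordinates in the first k components
    explained = s ** 2 / np.sum(s ** 2)       # fraction of total variance per component
    return scores, Vt[:k], explained[:k]
```

Summing `explained` tells you how much variance a 2-D PCA plot actually shows; a low value is a warning that the picture hides most of the structure.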
Principal Components and Curves
Parallel Coordinates
• Parallel Coordinates
• especially useful for high-dimensional data
• depends on ordering and scaling
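Because parallel coordinates depend on ordering and scaling, it helps to make both explicit. The sketch below assumes NumPy, min-max scales each axis to [0, 1] and lets the caller choose the axis order; `parallel_coordinate_lines` and its `order` parameter are hypothetical names. pandas also provides a ready-made plotting function, `pandas.plotting.parallel_coordinates`.

```python
import numpy as np


def parallel_coordinate_lines(X, order=None):
    """Return axis positions and per-axis scaled values (one polyline per row)."""
    X = np.asarray(X, dtype=float)
    order = np.arange(X.shape[1]) if order is None else np.asarray(order)
    Xo = X[:, order]                          # ordering decides which axes are adjacent
    lo, hi = Xo.min(axis=0), Xo.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)    # avoid division by zero for constant axes
    Y = (Xo - lo) / span                      # scale every axis to [0, 1]
    xs = np.arange(Xo.shape[1])               # axis i is drawn at x = i
    return xs, Y
```

With matplotlib, `plt.plot(xs, Y.T, alpha=0.3)` draws all polylines at once. Changing `order` changes which pairs of variables sit next to each other, and therefore which correlations are visible as parallel or crossing lines.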
Demo
The Grand Tour
• Animated sequence of 2-D projections
• https://en.wikipedia.org/wiki/Grand_Tour_(data_visualisation)
• Asimov (1985): The grand tour: a tool for viewing multidimensional data.
• Underlying idea
• Randomly generate 2-D projections (random walk)
• Over time generate a dense subset of all possible 2-D projections
• Optional: Follow a given path / guided tour
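The random-walk idea can be sketched with NumPy: draw random orthonormal 2-D frames, move between them in small steps and project the data onto each intermediate frame. This is a simplification under stated assumptions: it interpolates linearly and re-orthonormalizes with a QR decomposition, whereas dedicated grand tour implementations usually interpolate along geodesics between planes. `random_frame`, `interpolate` and `grand_tour` are hypothetical names.

```python
import numpy as np


def random_frame(p, rng):
    """A random orthonormal p x 2 matrix, i.e. a random 2-D projection."""
    Q, R = np.linalg.qr(rng.normal(size=(p, 2)))
    # Force a positive diagonal in R; this makes the frame uniformly distributed.
    return Q * np.sign(np.diag(R))


def interpolate(F0, F1, t):
    """Step from frame F0 towards F1 and re-orthonormalize the result."""
    Q, R = np.linalg.qr((1.0 - t) * F0 + t * F1)
    return Q * np.sign(np.diag(R))


def grand_tour(X, n_targets=5, steps=20, seed=0):
    """Yield successive 2-D projections of X along a random walk of frames."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    current = random_frame(X.shape[1], rng)
    for _ in range(n_targets):
        target = random_frame(X.shape[1], rng)
        # endpoint=False: the target frame is the first frame of the next leg
        for t in np.linspace(0.0, 1.0, steps, endpoint=False):
            yield X @ interpolate(current, target, t)
        current = target
```

Each yielded array has shape (n_samples, 2) and can be shown as one frame of an animation; with more targets the walk visits an ever denser set of projections.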
The Grand Tour
The Grand Tour
Demo
Thanks a lot!
www.codecentric.de
blog.codecentric.de
stefan.kuehn@codecentric.de
datascience@codecentric.de
