Automatic Visualization - Leland Wilkinson, Chief Scientist, H2O.ai

Sri Ambati, CEO & Founder at H2O.ai
Chief Scientist, H2O
leland@h2o.ai
www.cs.uic.edu/~wilkinson
Automatic Visualization
Leland Wilkinson
Visualizing Big Data
• Complexity: Many functions are polynomial or exponential
• Curse of Dimensionality: distances tend toward a constant as the number of dimensions grows
• Chokepoint: Cannot send big data over the wire
• Real Estate: Cannot plot big data on the client
• Cheesy solutions in 2D
• Pixelate (too complex for higher dimensions)
• Project (usually violates triangle inequality for )
• Image maps (OK for popups and simple links, not for EDA)
• Viable solutions
• Aggregate (big n) to a few thousand rows
• Project (big p) to a few dozen columns
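A minimal sketch of projecting big p down to a few dozen columns, assuming a Gaussian random projection. The deck names random projections for large p but does not give a construction, so the function name `random_projection`, the seed, and the sample sizes below are illustrative assumptions. With entries scaled by 1/sqrt(k), squared distances are preserved in expectation, so pairwise distances are only approximately preserved.

```python
import math
import random

def random_projection(rows, k, seed=0):
    """Project p-column rows onto k columns with a Gaussian random matrix."""
    rng = random.Random(seed)
    p = len(rows[0])
    scale = 1.0 / math.sqrt(k)  # keeps squared distances unbiased
    matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(k)] for _ in range(p)]
    return [[sum(x * matrix[j][c] for j, x in enumerate(row)) for c in range(k)]
            for row in rows]

# Hypothetical data: 20 rows with p = 500 columns, reduced to k = 40.
rng = random.Random(1)
wide = [[rng.gauss(0.0, 1.0) for _ in range(500)] for _ in range(20)]
narrow = random_projection(wide, k=40)

# Ratio of one pairwise distance after and before projection (near 1).
ratio = math.dist(narrow[0], narrow[1]) / math.dist(wide[0], wide[1])
print(len(narrow[0]), round(ratio, 2))
```

The ratio varies from pair to pair; a larger k tightens it at the cost of more columns.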
Big Data
set cover (core sets)
Outliers
• An anomaly is an observation inconsistent with a set of beliefs.
• The anomaly depends on these beliefs
• An outlier is an observation inconsistent with a set of points.
• The points are presumed generated by a probabilistic process in a vector space.
• All outliers are anomalies but not all anomalies are outliers
• Some anomalies are logical or mathematical
• Outliers are probabilistic
• Outlier detection has a history of more than 200 years.
• The goal was to reduce bias in models
• The goal today is to learn interesting stuff from examining outliers
• Statisticians no longer delete outliers. They use robust methods.
Outliers
• Barnett & Lewis (1994). Outliers in Statistical Data.
• Rousseeuw & Leroy (1987). Robust Regression & Outlier Detection.
• Hartigan (1975). Clustering Algorithms.
Beauty is truth, truth beauty,—that is all
Ye know on earth, and all ye need to know.
Outliers
• Univariate outliers
• Distance from Center Rule
• Gaps Rule
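The two univariate rules are only named on the slide, so the following is one possible reading rather than the author's definition. It assumes the center rule means "far from the median in robust units" and the gaps rule means "separated from the bulk by an unusually large gap in the sorted values". The thresholds `k`, the MAD scaling, and the function names are all assumptions.

```python
import statistics

def center_rule(values, k=3.0):
    """Flag values more than k robust standard deviations from the median."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    scale = 1.4826 * mad  # MAD rescaled to estimate sigma for normal data
    return [v for v in values if abs(v - med) > k * scale]

def gaps_rule(values, k=5.0):
    """Flag values cut off from the middle by a gap over k times the median gap."""
    ordered = sorted(values)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    typical = statistics.median(gaps)
    mid = len(ordered) // 2
    lo, hi = 0, len(ordered)
    for i in range(mid, len(gaps)):        # walk up from the middle
        if gaps[i] > k * typical:
            hi = i + 1
            break
    for i in range(mid - 1, -1, -1):       # walk down from the middle
        if gaps[i] > k * typical:
            lo = i + 1
            break
    return ordered[:lo] + ordered[hi:]

# Hypothetical sample with one value far from the rest.
values = [9.8, 10.1, 10.0, 9.9, 10.2, 10.3, 9.7, 10.0, 25.0]
print(center_rule(values), gaps_rule(values))
```

Both rules degrade when many values are tied, because the MAD or the median gap can then be zero.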
Outliers
• Multivariate outliers
• Distance from Center Rule
• Gaps Rule
Outliers
1. Map categorical variables to continuous values (SVD).
2. If p large, use random projections to reduce dimensionality.
3. Normalize columns on [0, 1]
4. If n large, aggregate
• If p = 2, you could use gridding or hex binning
• But general solution is based on Hartigan’s Leader algorithm
5. Compute nearest neighbor distances between points.
6. Fit exponential distribution to largest distances.
7. Reject points in upper tail of this distribution.
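The numbered steps above can be sketched as follows. This is a simplified illustration, not the author's implementation: steps 1 and 2 are skipped, the Leader `radius`, the `tail_fraction` of distances used for the fit, and the `alpha` cutoff are assumed values, and all function names are hypothetical.

```python
import math
import random

def normalize_columns(rows):
    """Step 3: rescale every column to [0, 1]; constant columns map to 0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    spans = [(max(c) - min(c)) or 1.0 for c in columns]
    return [[(v - lo) / s for v, lo, s in zip(row, lows, spans)] for row in rows]

def leader_aggregate(rows, radius):
    """Step 4: one-pass Leader clustering. Each row joins the first leader
    within `radius`; otherwise it becomes a new leader."""
    leaders = []  # list of (leader_row, member_indices)
    for i, row in enumerate(rows):
        for leader_row, members in leaders:
            if math.dist(row, leader_row) <= radius:
                members.append(i)
                break
        else:
            leaders.append((row, [i]))
    return leaders

def nearest_neighbor_distances(rows):
    """Step 5: brute-force O(n^2) distance from each row to its nearest neighbor."""
    return [min(math.dist(a, b) for j, b in enumerate(rows) if j != i)
            for i, a in enumerate(rows)]

def exponential_tail_outliers(distances, tail_fraction=0.5, alpha=0.05):
    """Steps 6-7: fit an exponential to the largest distances and return the
    indices whose distance falls in the upper `alpha` tail of that fit."""
    ordered = sorted(distances)
    start = int(len(ordered) * (1.0 - tail_fraction))
    threshold = ordered[start]
    excesses = [d - threshold for d in ordered[start:]]
    mean_excess = sum(excesses) / len(excesses)  # MLE of the exponential scale
    cutoff = threshold + mean_excess * math.log(1.0 / alpha)
    return [i for i, d in enumerate(distances) if d > cutoff]

# Hypothetical data: 200 points in the unit square plus one distant point.
rng = random.Random(0)
data = [[rng.random(), rng.random()] for _ in range(200)] + [[10.0, 10.0]]

scaled = normalize_columns(data)
leaders = leader_aggregate(scaled, radius=0.005)
centers = [row for row, _ in leaders]
flagged_leaders = exponential_tail_outliers(nearest_neighbor_distances(centers))
flagged_rows = sorted(i for k in flagged_leaders for i in leaders[k][1])
print(flagged_rows)
```

Working on leaders rather than raw rows keeps the quadratic nearest-neighbor step affordable when n is large; every member of a flagged leader is reported.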
Outliers
• Low-dimensional projections are not reliable ways to discover
high-dimensional outliers.
Outliers
• Parallel coordinates, SPLOMs, and other multivariate visualizations
are not reliable ways to discover high-dimensional outliers.
[Figure: panels A to F, each plotting points labeled 1 to 6 on axes running from -4 to 4]
Outliers
• Popular ML algorithms are not reliable ways to identify outliers.
Scagnostics
• We characterize a scatterplot (2D point set) with nine measures.
• We base our measures on three geometric graphs.
• Convex Hull
• Alpha Shape
• Minimum Spanning Tree
Scagnostics
• Each geometric graph is a subset of the Delaunay triangulation
Scagnostics
Shape
• Convex: ratio of the area of the alpha shape to the area of the convex hull
• Skinny: ratio of perimeter to area of the alpha shape
• Stringy: ratio of the diameter of the MST to the length of the MST (similar to Skinny). The diameter of a graph is the longest shortest path between a pair of its vertices.
• Stringy is also given as the ratio of 2-degree vertices in the MST to the number of vertices with degree greater than 1
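As a rough illustration of the degree-based Stringy measure, the sketch below builds a minimum spanning tree with Prim's algorithm and counts vertex degrees. It assumes the complete Euclidean graph as input, which is simpler but slower than restricting candidate edges to the Delaunay triangulation; the function names and sample points are hypothetical.

```python
import math

def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph.
    Returns a list of (i, j, length) edges."""
    n = len(points)
    # best[v] = (shortest known distance from v to the tree, tree vertex)
    best = {j: (math.dist(points[0], points[j]), 0) for j in range(1, n)}
    edges = []
    while best:
        j = min(best, key=lambda v: best[v][0])
        length, i = best.pop(j)
        edges.append((i, j, length))
        for v in best:
            d = math.dist(points[j], points[v])
            if d < best[v][0]:
                best[v] = (d, j)
    return edges

def stringy(edges, n):
    """Degree-2 vertices divided by vertices of degree greater than 1."""
    degree = [0] * n
    for i, j, _ in edges:
        degree[i] += 1
        degree[j] += 1
    interior = sum(1 for d in degree if d > 1)
    return sum(1 for d in degree if d == 2) / interior if interior else 0.0

# Hypothetical point sets: a straight path and a star around the origin.
line_points = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
star_points = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
line_edges = minimum_spanning_tree(line_points)
star_edges = minimum_spanning_tree(star_points)
print(stringy(line_edges, len(line_points)), stringy(star_edges, len(star_points)))
```

A path-like point set scores high because every interior vertex has degree 2, while a star scores low because its only interior vertex is a hub.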
Scagnostics
Density
• Skewed: ratio (Q90 - Q50) / (Q90 - Q10), where the quantiles are taken from the MST edge lengths
• Clumpy: 1 minus the ratio of the longest edge in the largest runt (blue) to the length of the runt-cutting edge (red)
The Hartigan RUNT statistic for a node of a hierarchical clustering tree is the smaller of the number of leaves owned by each of its two children. We derive this for each vertex in the MST using an edge-cutting algorithm.
• Outlying: proportion of total MST length due to edges adjacent to outliers
Scagnostics
Density
• Sparse: 90th percentile of the distribution of edge lengths in the MST
• Striated: proportion of all vertices in the MST that are degree-2 and have a cosine between adjacent edges less than -0.75
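The Skewed, Sparse, and Striated definitions can be sketched directly from a list of MST edges. The sketch assumes the MST has already been computed, uses linear-interpolation quantiles (the deck does not say which quantile rule is used), and omits any sample-size corrections; the function names and the sample inputs are hypothetical.

```python
import math

def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_values) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])

def skewed(edge_lengths):
    """(Q90 - Q50) / (Q90 - Q10) of the MST edge lengths."""
    s = sorted(edge_lengths)
    q10, q50, q90 = (quantile(s, q) for q in (0.10, 0.50, 0.90))
    return (q90 - q50) / (q90 - q10) if q90 > q10 else 0.0

def sparse(edge_lengths):
    """90th percentile of the MST edge lengths."""
    return quantile(sorted(edge_lengths), 0.90)

def cosine_at(vertex, a, b):
    """Cosine of the angle at `vertex` between the edges to a and to b."""
    ux, uy = a[0] - vertex[0], a[1] - vertex[1]
    vx, vy = b[0] - vertex[0], b[1] - vertex[1]
    return (ux * vx + uy * vy) / (math.hypot(ux, uy) * math.hypot(vx, vy))

def striated(points, edges):
    """Share of all vertices that have degree 2 and an edge cosine below -0.75."""
    neighbors = {i: [] for i in range(len(points))}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    hits = sum(1 for v, nb in neighbors.items()
               if len(nb) == 2
               and cosine_at(points[v], points[nb[0]], points[nb[1]]) < -0.75)
    return hits / len(points)

# Hypothetical inputs: eleven edge lengths and a four-point straight path.
lengths = [float(v) for v in range(1, 12)]
path_points = [(0, 0), (1, 0), (2, 0), (3, 0)]
path_edges = [(0, 1), (1, 2), (2, 3)]
print(skewed(lengths), sparse(lengths), striated(path_points, path_edges))
```

A cosine below -0.75 means the two edges at a vertex are nearly collinear, so Striated rewards points that line up along smooth strands.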
AutoVis
Graham Wills and Leland Wilkinson. 2010. AutoVis: automatic visualization.
Information Visualization 9, 1 (March 2010), 47-69.
H2O AutoViz
Future Plans
1. Add brushing to graphics
2. Create case-weight vector for DAI (0 = exclude)
3. Suggest additional features to pass to DAI
4. Animate visualizations
5. Add natural language explanations to graphics.
Thank You!