SlideShare a Scribd company logo
Visual Diagnostics
at Scale
EuroSciPy 2019
Dr. Rebecca Bilbro
Chief Data Scientist, ICX Media
Co-creator, Scikit-Yellowbrick
Author, Applied Text Analysis with Python
@rebeccabilbro
A tale of
three datasets
Census Dataset
500K instances
50 features
(age, occupation,
education, sex, ethnicity
marital status)
Sarcasm Dataset
50K instances
5K features
(“love”, 🙄, “totally”, “best”,
“surprise”, “Sherlock”,
capitalization, timestamp)
Sensor Dataset
5M instances
15 features
(Ammonia, Acetaldehyde,
Acetone, Ethylene, Ethanol,
Toluene ppmv)
Scaling pain
points are
dataset-
specific
● Many features
● Many instances
● Feature variance
● Heteroskedasticity
● Covariance
● Noise
Logistic Regression Fit Times (seconds)
500 - 5M instances / 5 - 50 features
10 seconds
Multilayer Perceptron Fit Times (seconds)
500 - 5M instances / 5 - 50 features
5 min, 48
seconds
Support Vector Machine Fit Times (seconds)
500 - 500K instances / 5 - 50 features
5 hours, 24
seconds
Support Vector Machine Fit Times (seconds)
500 - 500K instances / 5 - 50 features
5 hours, 24
seconds
😵
How to
optimize?
● Be patient
● Be wrong
● Be rich
● Steer
Adventures in
Model Visualization
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from yellowbrick.features import ParallelCoordinates
data = load_iris()
oz = ParallelCoordinates(ax=axes[idx], fast=True)
oz.fit_transform(data.data, data.target)
oz.finalize()
Each point drawn individually
as connected line segment
With standardization
Points grouped by class, each class
drawn as single segment
Bumps
class Estimator(object):
def fit(self, X, y=None):
"""
Fits estimator to data.
"""
# set state of self
return self
def predict(self, X):
"""
Predict response of X
"""
# compute predictions pred
return pred
class Transformer(Estimator):
def transform(self, X):
"""
Transforms the input data.
"""
# transform X to X_prime
return X_prime
class Pipeline(Transfomer):
@property
def named_steps(self):
"""
Returns a sequence of estimators
"""
return self.steps
@property
def _final_estimator(self):
"""
Terminating estimator
"""
return self.steps[-1]
The scikit-learn API
self.X
class Visualizer(Estimator):
def draw(self):
"""
Draw called from scikit-learn methods.
"""
return self.ax
def finalize(self):
self.set_title()
self.legend()
def poof(self):
self.finalize()
plt.show()
import matplotlib.pyplot as plt
from yellowbrick.base import Visualizer
class MyVisualizer(Visualizer):
def __init__(self, ax=None, **kwargs):
super(MyVisualizer, self).__init__(ax, **kwargs)
def fit(self, X, y=None):
self.draw(X)
return self
def draw(self, X):
if self.ax is None:
self.ax = self.gca()
self.ax.plt(X)
def finalize(self):
self.set_title("My Visualizer")
The Yellowbrick API
Oneliners
from sklearn.linear_model import Lasso
from yellowbrick.regressor import ResidualsPlot
# Option 1: scikit-learn style
viz = ResidualsPlot(Lasso())
viz.fit(X_train, y_train)
viz.score(X_test, y_test)
viz.poof()
from sklearn.linear_model import Lasso
from yellowbrick.regressor import residuals_plot
# Option 2: Quick Method
viz = residuals_plot(
Lasso(), X_train, y_train, X_test, y_test
)
��
Visual Comparison Tests
=========================================== test session starts ============================================
platform darwin -- Python 3.7.1, pytest-5.0.0, py-1.8.0, pluggy-0.12.0
rootdir: /Users/rbilbro/pyjects/yb, inifile: setup.cfg
plugins: flakes-4.0.0, cov-2.7.1
collected 932 items
tests/__init__.py s... [ 0%]
tests/base.py s [ 0%]
tests/conftest.py s [ 0%]
tests/fixtures.py s [ 0%]
tests/images.py s [ 0%]
tests/rand.py s [ 0%]
tests/test_base.py s............ [ 2%]
...........................................................................................................
...........................................................................................................
...........................................................................................................
...........................................................................................................
tests/test_utils/test_target.py s............ [ 68%]
tests/test_utils/test_timer.py s..... [ 68%]
tests/test_utils/test_types.py s.................................................................... [ 70%]
....x................................x.............................................................. [ 72%]
.... [ 73%]
tests/test_utils/test_wrapper.py s....
===================== 854 passed, 72 skipped, 6 xfailed, 33 warnings in 225.96 seconds =====================
Roadmap
Brushing and Filtering
Ok for only 5 features Not good for 23 features
Machine-learning oriented aggregation
YB (current) Seaborn
Parallelization with joblib
Elbow Curve Validation Curve
from yellowbrick.features import Rank2D
from yellowbrick.pipeline import VisualPipeline
from yellowbrick.model_selection import CVScores
from yellowbrick.regressor import PredictionError
viz_pipe = VisualPipeline([
('rank2d', Rank2D(features=features, algorithm='covariance')),
('prederr', PredictionError(model)),
('cvscores', CVScores(model, cv=cv, scoring='r2'))
])
Visual
Pipelines
Models are aggregations,
so are visualizations...
…so use visualizations
to steer model selection!
scikit-yb.org

More Related Content

What's hot

AutoML lectures (ACDL 2019)
AutoML lectures (ACDL 2019)AutoML lectures (ACDL 2019)
AutoML lectures (ACDL 2019)
Joaquin Vanschoren
 
Start machine learning in 5 simple steps
Start machine learning in 5 simple stepsStart machine learning in 5 simple steps
Start machine learning in 5 simple steps
Renjith M P
 
Exposé Ontology
Exposé OntologyExposé Ontology
Exposé Ontology
Joaquin Vanschoren
 
Part 2 Python
Part 2 PythonPart 2 Python
Part 2 Python
Mohamed Essam
 
Java performance
Java performanceJava performance
Java performance
Rajesuwer P. Singaravelu
 
VSSML16 LR1. Summary Day 1
VSSML16 LR1. Summary Day 1VSSML16 LR1. Summary Day 1
VSSML16 LR1. Summary Day 1
BigML, Inc
 
Machine Learning Overview
Machine Learning OverviewMachine Learning Overview
Machine Learning Overview
Mykhailo Koval
 
OpenML 2019
OpenML 2019OpenML 2019
OpenML 2019
Joaquin Vanschoren
 
Open and Automated Machine Learning
Open and Automated Machine LearningOpen and Automated Machine Learning
Open and Automated Machine Learning
Joaquin Vanschoren
 
Introduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnIntroduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-Learn
Benjamin Bengfort
 
Intro to programing with java-lecture 3
Intro to programing with java-lecture 3Intro to programing with java-lecture 3
Intro to programing with java-lecture 3
Mohamed Essam
 
Machine learning with scikitlearn
Machine learning with scikitlearnMachine learning with scikitlearn
Machine learning with scikitlearn
Pratap Dangeti
 
OpenML NeurIPS2018
OpenML NeurIPS2018OpenML NeurIPS2018
OpenML NeurIPS2018
Joaquin Vanschoren
 
Parallel Tuning of Machine Learning Algorithms, Thesis Proposal
Parallel Tuning of Machine Learning Algorithms, Thesis ProposalParallel Tuning of Machine Learning Algorithms, Thesis Proposal
Parallel Tuning of Machine Learning Algorithms, Thesis ProposalGianmario Spacagna
 
Robust and declarative machine learning pipelines for predictive buying at Ba...
Robust and declarative machine learning pipelines for predictive buying at Ba...Robust and declarative machine learning pipelines for predictive buying at Ba...
Robust and declarative machine learning pipelines for predictive buying at Ba...
Gianmario Spacagna
 
Intro to Machine Learning for non-Data Scientists
Intro to Machine Learning for non-Data ScientistsIntro to Machine Learning for non-Data Scientists
Intro to Machine Learning for non-Data Scientists
Parinaz Ameri
 
Le Machine Learning de A à Z
Le Machine Learning de A à ZLe Machine Learning de A à Z
Le Machine Learning de A à Z
Alexia Audevart
 
Linear Regression, Machine learning term
Linear Regression, Machine learning termLinear Regression, Machine learning term
Linear Regression, Machine learning term
S Rulez
 
Learning how to learn
Learning how to learnLearning how to learn
Learning how to learn
Joaquin Vanschoren
 
Intro to scikit-learn
Intro to scikit-learnIntro to scikit-learn
Intro to scikit-learn
AWeber
 

What's hot (20)

AutoML lectures (ACDL 2019)
AutoML lectures (ACDL 2019)AutoML lectures (ACDL 2019)
AutoML lectures (ACDL 2019)
 
Start machine learning in 5 simple steps
Start machine learning in 5 simple stepsStart machine learning in 5 simple steps
Start machine learning in 5 simple steps
 
Exposé Ontology
Exposé OntologyExposé Ontology
Exposé Ontology
 
Part 2 Python
Part 2 PythonPart 2 Python
Part 2 Python
 
Java performance
Java performanceJava performance
Java performance
 
VSSML16 LR1. Summary Day 1
VSSML16 LR1. Summary Day 1VSSML16 LR1. Summary Day 1
VSSML16 LR1. Summary Day 1
 
Machine Learning Overview
Machine Learning OverviewMachine Learning Overview
Machine Learning Overview
 
OpenML 2019
OpenML 2019OpenML 2019
OpenML 2019
 
Open and Automated Machine Learning
Open and Automated Machine LearningOpen and Automated Machine Learning
Open and Automated Machine Learning
 
Introduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnIntroduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-Learn
 
Intro to programing with java-lecture 3
Intro to programing with java-lecture 3Intro to programing with java-lecture 3
Intro to programing with java-lecture 3
 
Machine learning with scikitlearn
Machine learning with scikitlearnMachine learning with scikitlearn
Machine learning with scikitlearn
 
OpenML NeurIPS2018
OpenML NeurIPS2018OpenML NeurIPS2018
OpenML NeurIPS2018
 
Parallel Tuning of Machine Learning Algorithms, Thesis Proposal
Parallel Tuning of Machine Learning Algorithms, Thesis ProposalParallel Tuning of Machine Learning Algorithms, Thesis Proposal
Parallel Tuning of Machine Learning Algorithms, Thesis Proposal
 
Robust and declarative machine learning pipelines for predictive buying at Ba...
Robust and declarative machine learning pipelines for predictive buying at Ba...Robust and declarative machine learning pipelines for predictive buying at Ba...
Robust and declarative machine learning pipelines for predictive buying at Ba...
 
Intro to Machine Learning for non-Data Scientists
Intro to Machine Learning for non-Data ScientistsIntro to Machine Learning for non-Data Scientists
Intro to Machine Learning for non-Data Scientists
 
Le Machine Learning de A à Z
Le Machine Learning de A à ZLe Machine Learning de A à Z
Le Machine Learning de A à Z
 
Linear Regression, Machine learning term
Linear Regression, Machine learning termLinear Regression, Machine learning term
Linear Regression, Machine learning term
 
Learning how to learn
Learning how to learnLearning how to learn
Learning how to learn
 
Intro to scikit-learn
Intro to scikit-learnIntro to scikit-learn
Intro to scikit-learn
 

Similar to EuroSciPy 2019: Visual diagnostics at scale

Visual diagnostics at scale
Visual diagnostics at scaleVisual diagnostics at scale
Visual diagnostics at scale
Rebecca Bilbro
 
Spark for Reactive Machine Learning: Building Intelligent Agents at Scale
Spark for Reactive Machine Learning: Building Intelligent Agents at ScaleSpark for Reactive Machine Learning: Building Intelligent Agents at Scale
Spark for Reactive Machine Learning: Building Intelligent Agents at Scale
Jeff Smith
 
Art & music vs Google App Engine
Art & music vs Google App EngineArt & music vs Google App Engine
Art & music vs Google App Engine
thomas alisi
 
ScalaCheck
ScalaCheckScalaCheck
ScalaCheckBeScala
 
обзор Python
обзор Pythonобзор Python
обзор Python
Yehor Nazarkin
 
Intro to computer vision in .net
Intro to computer vision in .netIntro to computer vision in .net
Intro to computer vision in .net
Stephen Lorello
 
Doubt Truth to be a Liar: Non Triviality of Type Safety for Machine Learning ...
Doubt Truth to be a Liar: Non Triviality of Type Safety for Machine Learning ...Doubt Truth to be a Liar: Non Triviality of Type Safety for Machine Learning ...
Doubt Truth to be a Liar: Non Triviality of Type Safety for Machine Learning ...
Matthew Tovbin
 
Information from pixels
Information from pixelsInformation from pixels
Information from pixels
Dave Snowdon
 
Pythonic Graphics
Pythonic GraphicsPythonic Graphics
Pythonic Graphics
Kirby Urner
 
Coding in Style
Coding in StyleCoding in Style
Coding in Style
scalaconfjp
 
Pythonで機械学習入門以前
Pythonで機械学習入門以前Pythonで機械学習入門以前
Pythonで機械学習入門以前
Kimikazu Kato
 
Building an ML Platform with Ray and MLflow
Building an ML Platform with Ray and MLflowBuilding an ML Platform with Ray and MLflow
Building an ML Platform with Ray and MLflow
Databricks
 
Akka with Scala
Akka with ScalaAkka with Scala
Akka with Scala
Oto Brglez
 
Spock
SpockSpock
I want my model to be deployed ! (another story of MLOps)
I want my model to be deployed ! (another story of MLOps)I want my model to be deployed ! (another story of MLOps)
I want my model to be deployed ! (another story of MLOps)
AZUG FR
 
Ember
EmberEmber
Ember
mrphilroth
 
Image classification using cnn
Image classification using cnnImage classification using cnn
Image classification using cnn
Debarko De
 
Viktor Tsykunov: Azure Machine Learning Service
Viktor Tsykunov: Azure Machine Learning ServiceViktor Tsykunov: Azure Machine Learning Service
Viktor Tsykunov: Azure Machine Learning Service
Lviv Startup Club
 
Automating Machine Learning Workflows: A Report from the Trenches - Jose A. O...
Automating Machine Learning Workflows: A Report from the Trenches - Jose A. O...Automating Machine Learning Workflows: A Report from the Trenches - Jose A. O...
Automating Machine Learning Workflows: A Report from the Trenches - Jose A. O...
PAPIs.io
 
Static code analysis: what? how? why?
Static code analysis: what? how? why?Static code analysis: what? how? why?
Static code analysis: what? how? why?
Andrey Karpov
 

Similar to EuroSciPy 2019: Visual diagnostics at scale (20)

Visual diagnostics at scale
Visual diagnostics at scaleVisual diagnostics at scale
Visual diagnostics at scale
 
Spark for Reactive Machine Learning: Building Intelligent Agents at Scale
Spark for Reactive Machine Learning: Building Intelligent Agents at ScaleSpark for Reactive Machine Learning: Building Intelligent Agents at Scale
Spark for Reactive Machine Learning: Building Intelligent Agents at Scale
 
Art & music vs Google App Engine
Art & music vs Google App EngineArt & music vs Google App Engine
Art & music vs Google App Engine
 
ScalaCheck
ScalaCheckScalaCheck
ScalaCheck
 
обзор Python
обзор Pythonобзор Python
обзор Python
 
Intro to computer vision in .net
Intro to computer vision in .netIntro to computer vision in .net
Intro to computer vision in .net
 
Doubt Truth to be a Liar: Non Triviality of Type Safety for Machine Learning ...
Doubt Truth to be a Liar: Non Triviality of Type Safety for Machine Learning ...Doubt Truth to be a Liar: Non Triviality of Type Safety for Machine Learning ...
Doubt Truth to be a Liar: Non Triviality of Type Safety for Machine Learning ...
 
Information from pixels
Information from pixelsInformation from pixels
Information from pixels
 
Pythonic Graphics
Pythonic GraphicsPythonic Graphics
Pythonic Graphics
 
Coding in Style
Coding in StyleCoding in Style
Coding in Style
 
Pythonで機械学習入門以前
Pythonで機械学習入門以前Pythonで機械学習入門以前
Pythonで機械学習入門以前
 
Building an ML Platform with Ray and MLflow
Building an ML Platform with Ray and MLflowBuilding an ML Platform with Ray and MLflow
Building an ML Platform with Ray and MLflow
 
Akka with Scala
Akka with ScalaAkka with Scala
Akka with Scala
 
Spock
SpockSpock
Spock
 
I want my model to be deployed ! (another story of MLOps)
I want my model to be deployed ! (another story of MLOps)I want my model to be deployed ! (another story of MLOps)
I want my model to be deployed ! (another story of MLOps)
 
Ember
EmberEmber
Ember
 
Image classification using cnn
Image classification using cnnImage classification using cnn
Image classification using cnn
 
Viktor Tsykunov: Azure Machine Learning Service
Viktor Tsykunov: Azure Machine Learning ServiceViktor Tsykunov: Azure Machine Learning Service
Viktor Tsykunov: Azure Machine Learning Service
 
Automating Machine Learning Workflows: A Report from the Trenches - Jose A. O...
Automating Machine Learning Workflows: A Report from the Trenches - Jose A. O...Automating Machine Learning Workflows: A Report from the Trenches - Jose A. O...
Automating Machine Learning Workflows: A Report from the Trenches - Jose A. O...
 
Static code analysis: what? how? why?
Static code analysis: what? how? why?Static code analysis: what? how? why?
Static code analysis: what? how? why?
 

More from Rebecca Bilbro

Data Structures for Data Privacy: Lessons Learned in Production
Data Structures for Data Privacy: Lessons Learned in ProductionData Structures for Data Privacy: Lessons Learned in Production
Data Structures for Data Privacy: Lessons Learned in Production
Rebecca Bilbro
 
Conflict-Free Replicated Data Types (PyCon 2022)
Conflict-Free Replicated Data Types (PyCon 2022)Conflict-Free Replicated Data Types (PyCon 2022)
Conflict-Free Replicated Data Types (PyCon 2022)
Rebecca Bilbro
 
Anti-Entropy Replication for Cost-Effective Eventual Consistency
Anti-Entropy Replication for Cost-Effective Eventual ConsistencyAnti-Entropy Replication for Cost-Effective Eventual Consistency
Anti-Entropy Replication for Cost-Effective Eventual Consistency
Rebecca Bilbro
 
The Promise and Peril of Very Big Models
The Promise and Peril of Very Big ModelsThe Promise and Peril of Very Big Models
The Promise and Peril of Very Big Models
Rebecca Bilbro
 
Beyond Off the-Shelf Consensus
Beyond Off the-Shelf ConsensusBeyond Off the-Shelf Consensus
Beyond Off the-Shelf Consensus
Rebecca Bilbro
 
Steering Model Selection with Visual Diagnostics: Women in Analytics 2019
Steering Model Selection with Visual Diagnostics: Women in Analytics 2019Steering Model Selection with Visual Diagnostics: Women in Analytics 2019
Steering Model Selection with Visual Diagnostics: Women in Analytics 2019
Rebecca Bilbro
 
A Visual Exploration of Distance, Documents, and Distributions
A Visual Exploration of Distance, Documents, and DistributionsA Visual Exploration of Distance, Documents, and Distributions
A Visual Exploration of Distance, Documents, and Distributions
Rebecca Bilbro
 
Words in space
Words in spaceWords in space
Words in space
Rebecca Bilbro
 
The Incredible Disappearing Data Scientist
The Incredible Disappearing Data ScientistThe Incredible Disappearing Data Scientist
The Incredible Disappearing Data Scientist
Rebecca Bilbro
 
Camlis
CamlisCamlis
Learning machine learning with Yellowbrick
Learning machine learning with YellowbrickLearning machine learning with Yellowbrick
Learning machine learning with Yellowbrick
Rebecca Bilbro
 
Data Intelligence 2017 - Building a Gigaword Corpus
Data Intelligence 2017 - Building a Gigaword CorpusData Intelligence 2017 - Building a Gigaword Corpus
Data Intelligence 2017 - Building a Gigaword Corpus
Rebecca Bilbro
 
Building a Gigaword Corpus (PyCon 2017)
Building a Gigaword Corpus (PyCon 2017)Building a Gigaword Corpus (PyCon 2017)
Building a Gigaword Corpus (PyCon 2017)
Rebecca Bilbro
 
NLP for Everyday People
NLP for Everyday PeopleNLP for Everyday People
NLP for Everyday People
Rebecca Bilbro
 
Commerce Data Usability Project
Commerce Data Usability ProjectCommerce Data Usability Project
Commerce Data Usability Project
Rebecca Bilbro
 

More from Rebecca Bilbro (15)

Data Structures for Data Privacy: Lessons Learned in Production
Data Structures for Data Privacy: Lessons Learned in ProductionData Structures for Data Privacy: Lessons Learned in Production
Data Structures for Data Privacy: Lessons Learned in Production
 
Conflict-Free Replicated Data Types (PyCon 2022)
Conflict-Free Replicated Data Types (PyCon 2022)Conflict-Free Replicated Data Types (PyCon 2022)
Conflict-Free Replicated Data Types (PyCon 2022)
 
Anti-Entropy Replication for Cost-Effective Eventual Consistency
Anti-Entropy Replication for Cost-Effective Eventual ConsistencyAnti-Entropy Replication for Cost-Effective Eventual Consistency
Anti-Entropy Replication for Cost-Effective Eventual Consistency
 
The Promise and Peril of Very Big Models
The Promise and Peril of Very Big ModelsThe Promise and Peril of Very Big Models
The Promise and Peril of Very Big Models
 
Beyond Off the-Shelf Consensus
Beyond Off the-Shelf ConsensusBeyond Off the-Shelf Consensus
Beyond Off the-Shelf Consensus
 
Steering Model Selection with Visual Diagnostics: Women in Analytics 2019
Steering Model Selection with Visual Diagnostics: Women in Analytics 2019Steering Model Selection with Visual Diagnostics: Women in Analytics 2019
Steering Model Selection with Visual Diagnostics: Women in Analytics 2019
 
A Visual Exploration of Distance, Documents, and Distributions
A Visual Exploration of Distance, Documents, and DistributionsA Visual Exploration of Distance, Documents, and Distributions
A Visual Exploration of Distance, Documents, and Distributions
 
Words in space
Words in spaceWords in space
Words in space
 
The Incredible Disappearing Data Scientist
The Incredible Disappearing Data ScientistThe Incredible Disappearing Data Scientist
The Incredible Disappearing Data Scientist
 
Camlis
CamlisCamlis
Camlis
 
Learning machine learning with Yellowbrick
Learning machine learning with YellowbrickLearning machine learning with Yellowbrick
Learning machine learning with Yellowbrick
 
Data Intelligence 2017 - Building a Gigaword Corpus
Data Intelligence 2017 - Building a Gigaword CorpusData Intelligence 2017 - Building a Gigaword Corpus
Data Intelligence 2017 - Building a Gigaword Corpus
 
Building a Gigaword Corpus (PyCon 2017)
Building a Gigaword Corpus (PyCon 2017)Building a Gigaword Corpus (PyCon 2017)
Building a Gigaword Corpus (PyCon 2017)
 
NLP for Everyday People
NLP for Everyday PeopleNLP for Everyday People
NLP for Everyday People
 
Commerce Data Usability Project
Commerce Data Usability ProjectCommerce Data Usability Project
Commerce Data Usability Project
 

Recently uploaded

The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
roli9797
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
manishkhaire30
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
dwreak4tg
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
u86oixdj
 
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdfUnleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Enterprise Wired
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
jerlynmaetalle
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
balafet
 

Recently uploaded (20)

The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
 
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdfUnleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
 

EuroSciPy 2019: Visual diagnostics at scale

  • 2. Dr. Rebecca Bilbro Chief Data Scientist, ICX Media Co-creator, Scikit-Yellowbrick Author, Applied Text Analysis with Python @rebeccabilbro
  • 3. A tale of three datasets
  • 4. Census Dataset 500K instances 50 features (age, occupation, education, sex, ethnicity marital status) Sarcasm Dataset 50K instances 5K features (“love”, 🙄, “totally”, “best”, “surprise”, “Sherlock”, capitalization, timestamp) Sensor Dataset 5M instances 15 features (Ammonia, Acetaldehyde, Acetone, Ethylene, Ethanol, Toluene ppmv)
  • 5. Scaling pain points are dataset- specific ● Many features ● Many instances ● Feature variance ● Heteroskedasticity ● Covariance ● Noise
  • 6. Logistic Regression Fit Times (seconds) 500 - 5M instances / 5 - 50 features 10 seconds
  • 7. Multilayer Perceptron Fit Times (seconds) 500 - 5M instances / 5 - 50 features 5 min, 48 seconds
  • 8. Support Vector Machine Fit Times (seconds) 500 - 500K instances / 5 - 50 features 5 hours, 24 seconds
  • 9. Support Vector Machine Fit Times (seconds) 500 - 500K instances / 5 - 50 features 5 hours, 24 seconds 😵
  • 10. How to optimize? ● Be patient ● Be wrong ● Be rich ● Steer
  • 11.
  • 13.
  • 14.
  • 15. import matplotlib.pyplot as plt from sklearn.datasets import load_iris from yellowbrick.features import ParallelCoordinates data = load_iris() oz = ParallelCoordinates(ax=axes[idx], fast=True) oz.fit_transform(data.data, data.target) oz.finalize() Each point drawn individually as connected line segment With standardization Points grouped by class, each class drawn as single segment
  • 16.
  • 17. Bumps
  • 18. class Estimator(object): def fit(self, X, y=None): """ Fits estimator to data. """ # set state of self return self def predict(self, X): """ Predict response of X """ # compute predictions pred return pred class Transformer(Estimator): def transform(self, X): """ Transforms the input data. """ # transform X to X_prime return X_prime class Pipeline(Transfomer): @property def named_steps(self): """ Returns a sequence of estimators """ return self.steps @property def _final_estimator(self): """ Terminating estimator """ return self.steps[-1] The scikit-learn API self.X
  • 19. class Visualizer(Estimator): def draw(self): """ Draw called from scikit-learn methods. """ return self.ax def finalize(self): self.set_title() self.legend() def poof(self): self.finalize() plt.show() import matplotlib.pyplot as plt from yellowbrick.base import Visualizer class MyVisualizer(Visualizer): def __init__(self, ax=None, **kwargs): super(MyVisualizer, self).__init__(ax, **kwargs) def fit(self, X, y=None): self.draw(X) return self def draw(self, X): if self.ax is None: self.ax = self.gca() self.ax.plt(X) def finalize(self): self.set_title("My Visualizer") The Yellowbrick API
  • 20. Oneliners from sklearn.linear_model import Lasso from yellowbrick.regressor import ResidualsPlot # Option 1: scikit-learn style viz = ResidualsPlot(Lasso()) viz.fit(X_train, y_train) viz.score(X_test, y_test) viz.poof() from sklearn.linear_model import Lasso from yellowbrick.regressor import residuals_plot # Option 2: Quick Method viz = residuals_plot( Lasso(), X_train, y_train, X_test, y_test ) ��
  • 22. =========================================== test session starts ============================================ platform darwin -- Python 3.7.1, pytest-5.0.0, py-1.8.0, pluggy-0.12.0 rootdir: /Users/rbilbro/pyjects/yb, inifile: setup.cfg plugins: flakes-4.0.0, cov-2.7.1 collected 932 items tests/__init__.py s... [ 0%] tests/base.py s [ 0%] tests/conftest.py s [ 0%] tests/fixtures.py s [ 0%] tests/images.py s [ 0%] tests/rand.py s [ 0%] tests/test_base.py s............ [ 2%] ........................................................................................................... ........................................................................................................... ........................................................................................................... ........................................................................................................... tests/test_utils/test_target.py s............ [ 68%] tests/test_utils/test_timer.py s..... [ 68%] tests/test_utils/test_types.py s.................................................................... [ 70%] ....x................................x.............................................................. [ 72%] .... [ 73%] tests/test_utils/test_wrapper.py s.... ===================== 854 passed, 72 skipped, 6 xfailed, 33 warnings in 225.96 seconds =====================
  • 24. Brushing and Filtering Ok for only 5 features Not good for 23 features
  • 26. Parallelization with joblib Elbow Curve Validation Curve
  • 27. from yellowbrick.features import Rank2D from yellowbrick.pipeline import VisualPipeline from yellowbrick.model_selection import CVScores from yellowbrick.regressor import PredictionError viz_pipe = VisualPipeline([ ('rank2d', Rank2D(features=features, algorithm='covariance')), ('prederr', PredictionError(model)), ('cvscores', CVScores(model, cv=cv, scoring='r2')) ]) Visual Pipelines
  • 28. Models are aggregations, so are visualizations...
  • 29. …so use visualizations to steer model selection!