Common pitfalls leading to wrongly estimated model performance
Jan van der Vegt
Cubonacci
PyData Eindhoven – November 30th 2019
Overview
▪ Problem statement
▪ Generalization
▪ Assumptions
▪ Validation schemes
▪ Other pitfalls
Problem statement
Generalization
How well does our model adapt to new, previously unseen data drawn
from the same distribution as the data used to create the model?
How well does our model predict on unseen data
How well is our model going to predict in production
How much value is this model going to generate
Evaluation
How well does our model predict on unseen data
$I[f_n] = \int_{X \times Y} V(f_n(x), y)\, \rho(x, y)\, dx\, dy$
Evaluation
Model (train) | Unseen (validation)
$I[f_n] = \int_{X \times Y} V(f_n(x), y)\, \rho(x, y)\, dx\, dy$
$I_S[f_n] = \frac{1}{n} \sum_{i=1}^{n} V(f_n(x_i), y_i) = I[f_n] + \varepsilon$
$\mathbb{E}[\varepsilon] = 0$
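The claim that the held-out estimate equals the true risk plus zero-mean noise can be checked with a small simulation. This is a minimal sketch under assumptions that are not in the slides: the data distribution (x ~ N(0, 1), y = 2x + noise), the fixed model `f`, and squared loss as V are all chosen here purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Hypothetical fixed model, assumed to have been trained elsewhere.
    return 1.5 * x

def empirical_risk(n):
    # Fresh unseen sample from the assumed rho(x, y):
    # x ~ N(0, 1), y = 2x + N(0, 1) noise.
    x = rng.normal(size=n)
    y = 2.0 * x + rng.normal(size=n)
    return np.mean((f(x) - y) ** 2)  # squared loss plays the role of V

# For this setup the true risk is I[f] = E[(0.5 x + noise)^2] = 0.25 + 1.
true_risk = 1.25

# Repeat the evaluation on many independent unseen samples of size 50.
estimates = np.array([empirical_risk(50) for _ in range(5000)])
mean_epsilon = estimates.mean() - true_risk

print(f"mean I_S = {estimates.mean():.3f}, mean epsilon = {mean_epsilon:+.3f}")
print(f"spread of a single estimate (std) = {estimates.std():.3f}")
```

Individual estimates scatter around the true risk, but their average error should be close to zero as long as the model was fixed before the unseen sample was drawn.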
Thought experiment: $y = ax + b$
$(y - \hat{y})^2$
a b MSE
0.7 0.1 4.5
0.2 0.8 0.4
-0.6 1.6 2.6
0.1 0.4 1.9
Thought experiment
$I_S[f_n] = \frac{1}{n} \sum_{i=1}^{n} V(f_n(x_i), y_i) = I[f_n] + \varepsilon$
Best model
Fair evaluation
$\mathbb{E}[\varepsilon] < 0$
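The thought experiment can be simulated: guess many random (a, b) pairs, keep the one with the lowest validation MSE, and compare that score with the same model's MSE on a large fresh sample. The sketch below is illustrative only; the data-generating process, the sample sizes, and the range of the random guesses are assumptions, not values from the talk.

```python
import numpy as np

rng = np.random.default_rng(42)

def sample(n):
    # Assumed data-generating process: y = 0.3 x + 0.5 + N(0, 1) noise.
    x = rng.uniform(-1, 1, size=n)
    y = 0.3 * x + 0.5 + rng.normal(scale=1.0, size=n)
    return x, y

def mse(a, b, x, y):
    return np.mean((a * x + b - y) ** 2)

gaps = []
for _ in range(500):
    x_val, y_val = sample(20)      # small validation set used for selection
    x_new, y_new = sample(5000)    # large fresh sample, stands in for I[f_n]
    candidates = rng.uniform(-2, 2, size=(50, 2))  # random (a, b) guesses
    val_scores = [mse(a, b, x_val, y_val) for a, b in candidates]
    best = int(np.argmin(val_scores))
    a, b = candidates[best]
    # epsilon for the selected model: validation score minus fresh score.
    gaps.append(val_scores[best] - mse(a, b, x_new, y_new))

mean_gap = float(np.mean(gaps))
print(f"mean (validation MSE - fresh MSE) of the selected model: {mean_gap:+.3f}")
```

Each candidate on its own is evaluated fairly, but the winner is picked partly because the validation noise happened to favour it, so its validation score is optimistic on average.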
Train | validate | test
Model (train) | Unseen (validation) | Unseen (test)
$I_S[f_n] = \frac{1}{n} \sum_{i=1}^{n} V(f_n(x_i), y_i) = I[f_n] + \varepsilon$
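A three-way split could look like the sketch below, using scikit-learn. The synthetic dataset, the 60/20/20 proportions, and the choice of `Ridge` with a small grid of `alpha` values are assumptions made for the example; the slides do not prescribe any of them.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data, for illustration only.
X, y = make_regression(n_samples=600, n_features=20, noise=10.0, random_state=0)

# Roughly 60% train, 20% validation, 20% test.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, random_state=0
)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0
)

# Model selection looks at the validation set only.
val_mse = {}
for alpha in [0.01, 0.1, 1.0, 10.0, 100.0]:
    model = Ridge(alpha=alpha).fit(X_train, y_train)
    val_mse[alpha] = mean_squared_error(y_val, model.predict(X_val))
best_alpha = min(val_mse, key=val_mse.get)

# The test set is touched once, for the selected model.
final_model = Ridge(alpha=best_alpha).fit(X_train, y_train)
test_mse = mean_squared_error(y_test, final_model.predict(X_test))

print(f"best alpha = {best_alpha}")
print(f"validation MSE = {val_mse[best_alpha]:.1f}, test MSE = {test_mse:.1f}")
```

The validation score of the winner has been used for selection and is no longer a fair estimate; the test score has not, so it can be reported.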
Validation schemes
Nested cross-validation
Model A
Model B
Model A
4.5
2.5
3.5
3.5
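One way to set up nested cross-validation with scikit-learn is to pass a `GridSearchCV` object to `cross_val_score`: the inner loop selects hyperparameters, the outer loop scores the whole selection procedure on data it never saw. The dataset, estimator, parameter grid, and fold counts below are assumptions for the sake of a runnable sketch.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

# Synthetic data, for illustration only.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

inner_cv = KFold(n_splits=3, shuffle=True, random_state=0)  # model selection
outer_cv = KFold(n_splits=4, shuffle=True, random_state=1)  # evaluation

# The search is refit from scratch inside every outer training fold.
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    cv=inner_cv,
)

# Each outer score comes from data the inner search never touched.
outer_scores = cross_val_score(search, X, y, cv=outer_cv)
print("outer fold accuracy:", outer_scores.round(3))
print("mean:", outer_scores.mean().round(3))
```

Note that the outer scores describe the procedure "search, then fit", not one specific hyperparameter setting; different outer folds may select different values.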
(Broken) assumptions
Independent and identically distributed (i.i.d.)
X1 X2 y
1 2 0
2 1 1
3 5 1
4 3 0
5 4 0
$F(X_i < x) = F(X_j < x)$
$F(X_i < x) = F(X_i < x \mid X_j = x_j)$
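When the independence assumption is broken, the validation scheme has to respect the dependence. As one example (an assumption here, since the transcript does not say which cases the talk covered), suppose several rows belong to the same entity, such as a customer or a patient. A group-aware splitter like scikit-learn's `GroupKFold` keeps all rows of one entity on the same side of the split. The data and group sizes below are made up.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)

# Hypothetical setup: 10 entities with 5 rows each.
n_groups, rows_per_group = 10, 5
groups = np.repeat(np.arange(n_groups), rows_per_group)
X = rng.normal(size=(len(groups), 2))
y = rng.integers(0, 2, size=len(groups))

overlaps = []
for train_idx, val_idx in GroupKFold(n_splits=5).split(X, y, groups=groups):
    # Count entities that appear on both sides of this split.
    shared = set(groups[train_idx]) & set(groups[val_idx])
    overlaps.append(len(shared))

print(overlaps)  # [0, 0, 0, 0, 0]
```

For data ordered in time, `TimeSeriesSplit` from the same module plays a similar role by only validating on rows that come after the training rows.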
Validation schemes
Training size bias
0% 80%
Duplication
X1 X2 y
2 3 0
3 4 1
1 1 1
4 2 0
4 2 0
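With the table above, the last two rows are identical, so a random split can put one copy in training and the other in validation. A minimal pandas sketch for detecting this is shown below; the particular split (first four rows for training, last row for validation) is chosen by hand for illustration.

```python
import pandas as pd

# The table from the slide.
df = pd.DataFrame(
    {"X1": [2, 3, 1, 4, 4], "X2": [3, 4, 1, 2, 2], "y": [0, 1, 1, 0, 0]}
)

# How many rows are exact copies of an earlier row?
n_duplicates = int(df.duplicated().sum())

# Hand-picked split that separates the two identical rows.
train, val = df.iloc[:4], df.iloc[4:]

# Validation rows that also occur verbatim in the training set
# (merge joins on all shared columns by default).
leaked = val.merge(train.drop_duplicates(), how="inner")

deduped = df.drop_duplicates().reset_index(drop=True)

print(n_duplicates, len(leaked), len(deduped))  # 1 1 4
```

Whether to drop duplicates or keep them together on one side of the split depends on whether the same duplication is expected in production data.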
Leakage
X1 X2 y
2 3 0
3 4 1
1 1 1
4 2 0
X1 X2 y
2 3 0
3 4 1
1 1 1
4 2 0
y
0
1
1
0
Leakage
X1 X2 y
2 Cat A 0
3 Cat A 1
1 Cat B 1
4 Cat B 1
X1 CatTargetAvg
2 0.5
3 0.5
1 1
4 1
y
0
1
1
1
Leakage
X1 X2 y
2 Cat A 0
3 Cat A 1
1 Cat B 1
4 Cat B 1
X1 CatTargetAvg
2 0.5
3 0.5
1 1
4 1
X1 CatTargetAvg
2 1
3 0
1 1
4 1
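The two `CatTargetAvg` tables can be reproduced with pandas. The first is the plain per-category mean of `y`, which includes each row's own label. The second matches what a leave-one-out average would give (each row's own label excluded); reading the slide that way is an interpretation, not something the transcript states.

```python
import pandas as pd

# The table from the slide.
df = pd.DataFrame(
    {
        "X1": [2, 3, 1, 4],
        "X2": ["Cat A", "Cat A", "Cat B", "Cat B"],
        "y": [0, 1, 1, 1],
    }
)

grouped = df.groupby("X2")["y"]

# Plain category average: every row sees its own target value.
naive = grouped.transform("mean")

# Leave-one-out average: subtract the row's own target before averaging.
# (Assumes every category has at least two rows, as in this table.)
loo = (grouped.transform("sum") - df["y"]) / (grouped.transform("count") - 1)

print(naive.tolist())  # [0.5, 0.5, 1.0, 1.0]
print(loo.tolist())    # [1.0, 0.0, 1.0, 1.0]
```

In both variants the encoding is derived from labels, so for a fair estimate the averages applied to validation rows should be computed from training rows only.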
Takeaway
Estimating performance is about simulating real life
Think about:
▪ Assumptions
▪ Data generation
▪ Data availability
▪ The goal
Thank you for your time
https://www.linkedin.com/in/jan-van-der-vegt/
jan.vandervegt@cubonacci.com
