SlideShare a Scribd company logo
Model Selection
Techniques
- Swati (19013030)
Let’s start !
MODEL SELECTION TECHNIQUES
● Multiple models are fitted and evaluated to choose the best. The best approach to model selection requires
“sufficient” data, which may be nearly infinite depending on the complexity of the problem.
● There are three ways of selecting your ML model in which two are the fields of probability and sampling.
1. Random Train/Test Split
 Random Splits are used to randomly sample a percentage of data
into training, testing, and preferably validation sets.
 Data to be passed in model is divided into train and test, ratio wise.
This can be achieved using train_test_split function of scikit-learn
python library.
 On re-running of train_test_split data code, results come out to be
different on each run of code. So, you aren’t sure how exactly your
model will perform on unseen data.
 The advantage of this method is that there is a good chance that the
original population is well represented in all the three sets. In more
formal terms, random splitting will prevent a biased sampling of data.
2. Resampling
 In resampling technique of model selection, for a set of
iterations, data is resampled into train/test followed by
training on train and evaluation on test set.
 Model chosen from this technique is assessed based on
performance, not the model complexity.
 Performance is computed on out-of-sample data.
Resampling techniques estimate the error by evaluating
out-of-sample data aka unseen data.
- Strategies of resampling are such as K-Fold,
StratifiedK-Fold etc.
Types of Resampling
1. Time-Based Split 2. K-Fold Cross-
Validation
4. Bootstrap
3. Stratified K-Fold
Time – Based Split
 There are some types of data where random splits are
not possible.
 For example, if we have to train a model for weather
forecasting, we cannot randomly divide the data into
training and testing sets. This will jumble up the seasonal
pattern! Such data is often referred to by the term – Time
Series.
 In such cases, a time-wise split is used. The training set
can have data for the last three years and 10 months of
the present year. The last two months can be reserved
for the testing or validation set.
 However, the drawback of time-series data is that the
events or data points are not mutually independent.
One event might affect every data input that follows
after.
K-Fold Cross-Validation
 The cross-validation technique works by randomly shuffling the
dataset and then splitting it into k groups. Thereafter, on iterating
over each group, the group needs to be considered as a test set
while all other groups are clubbed together into the training set.
 The model is tested on the test group and the process continues for
k groups.
 Thus, by the end of the process, one has k different results on k
different test groups. The best model can then be selected easily by
choosing the one with the highest score.
Stratified K-Fold
 The process for stratified K-Fold is similar to that of K-
Fold cross-validation with one single point of difference –
unlike in k-fold cross-validation, the values of the
target variable is taken into consideration in
stratified k-fold.
 If for instance, the target variable is a categorical variable
with 2 classes, then stratified k-fold ensures that each
test fold gets an equal ratio of the two classes when
compared to the training set.
 This makes the model evaluation more accurate and the
model training less biased.
Bootstrap
 Bootstrap is one of the most powerful ways to obtain a stabilized model.
 The first step is to select a sample size (which is usually equal to the size of
the original dataset). Thereafter, a sample data point must be randomly
selected from the original dataset and added to the bootstrap sample. After
the addition, the sample needs to be put back into the original sample. This
process needs to be repeated for N times, where N is the sample size.
 Therefore, it is a resampling technique that creates the bootstrap sample
by sampling data points from the original dataset with replacement.
 The model is trained on the bootstrap sample and then evaluated on all
those data points that did not make it to the bootstrapped sample. These
are called the out-of-bag samples.
3. Probabilistic measures
 Probabilistic Measures do not just take into account the model
performance but also the model complexity.
 Model complexity is the measure of the model’s ability to capture
the variance in the data.
 For example, a highly biased model like the linear regression
algorithm is less complex and on the other hand, a neural network is
very high on complexity.
 A fair bit of disadvantage however lies in the fact that probabilistic
measures do not consider the uncertainty of the models and has a
chance of selecting simpler models over complex models.
e
o SRM tries to balance out the
model’s complexity against its
success at fitting on the data.
o MDL or the minimum description
length is the minimum number of
such bits required to represent the
model.
o BIC penalizes the model for its
complexity and is preferably used
when the size of the dataset is not
very small .
o AIC is the measure of information
loss.
2. Bayesian Information Criterion (BIC)
1. Akaike Information Criterion (AIC)
4. Structural Risk Minimization (SRM)
3. Minimum Description Length (MDL)
Types of Probabilistic measures
Thank You !!
For more Information visit on :
(https://neptune.ai/blog/the-ultimate-guide-to-evaluation-and-selection-of-
models-in-machine-learning)

More Related Content

What's hot

K-Folds Cross Validation Method
K-Folds Cross Validation MethodK-Folds Cross Validation Method
K-Folds Cross Validation Method
SHUBHAM GUPTA
 
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Md. Main Uddin Rony
 
Linear models for classification
Linear models for classificationLinear models for classification
Linear models for classification
Sung Yub Kim
 
3.7 outlier analysis
3.7 outlier analysis3.7 outlier analysis
3.7 outlier analysis
Krish_ver2
 
2.4 rule based classification
2.4 rule based classification2.4 rule based classification
2.4 rule based classification
Krish_ver2
 
Missing data handling
Missing data handlingMissing data handling
Missing data handling
QuantUniversity
 
Machine Learning-Linear regression
Machine Learning-Linear regressionMachine Learning-Linear regression
Machine Learning-Linear regression
kishanthkumaar
 
Classification and Clustering
Classification and ClusteringClassification and Clustering
Classification and Clustering
Eng Teong Cheah
 
CART – Classification & Regression Trees
CART – Classification & Regression TreesCART – Classification & Regression Trees
CART – Classification & Regression Trees
Hemant Chetwani
 
Instance based learning
Instance based learningInstance based learning
Instance based learning
swapnac12
 
Data Preprocessing
Data PreprocessingData Preprocessing
Model selection
Model selectionModel selection
Model selection
Animesh Kumar
 
2.2 decision tree
2.2 decision tree2.2 decision tree
2.2 decision tree
Krish_ver2
 
Cross-validation Tutorial: What, how and which?
Cross-validation Tutorial: What, how and which?Cross-validation Tutorial: What, how and which?
Cross-validation Tutorial: What, how and which?
Pradeep Redddy Raamana
 
Lect6 Association rule & Apriori algorithm
Lect6 Association rule & Apriori algorithmLect6 Association rule & Apriori algorithm
Lect6 Association rule & Apriori algorithm
hktripathy
 
Association rule mining and Apriori algorithm
Association rule mining and Apriori algorithmAssociation rule mining and Apriori algorithm
Association rule mining and Apriori algorithm
hina firdaus
 
Classification and Regression
Classification and RegressionClassification and Regression
Classification and Regression
Megha Sharma
 
Neural nw k means
Neural nw k meansNeural nw k means
Neural nw k means
Eng. Dr. Dennis N. Mwighusa
 
Exploratory data analysis
Exploratory data analysisExploratory data analysis
Exploratory data analysis
Vishwas N
 
Fuzzy Clustering(C-means, K-means)
Fuzzy Clustering(C-means, K-means)Fuzzy Clustering(C-means, K-means)
Fuzzy Clustering(C-means, K-means)
Fellowship at Vodafone FutureLab
 

What's hot (20)

K-Folds Cross Validation Method
K-Folds Cross Validation MethodK-Folds Cross Validation Method
K-Folds Cross Validation Method
 
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
Data Analysis: Evaluation Metrics for Supervised Learning Models of Machine L...
 
Linear models for classification
Linear models for classificationLinear models for classification
Linear models for classification
 
3.7 outlier analysis
3.7 outlier analysis3.7 outlier analysis
3.7 outlier analysis
 
2.4 rule based classification
2.4 rule based classification2.4 rule based classification
2.4 rule based classification
 
Missing data handling
Missing data handlingMissing data handling
Missing data handling
 
Machine Learning-Linear regression
Machine Learning-Linear regressionMachine Learning-Linear regression
Machine Learning-Linear regression
 
Classification and Clustering
Classification and ClusteringClassification and Clustering
Classification and Clustering
 
CART – Classification & Regression Trees
CART – Classification & Regression TreesCART – Classification & Regression Trees
CART – Classification & Regression Trees
 
Instance based learning
Instance based learningInstance based learning
Instance based learning
 
Data Preprocessing
Data PreprocessingData Preprocessing
Data Preprocessing
 
Model selection
Model selectionModel selection
Model selection
 
2.2 decision tree
2.2 decision tree2.2 decision tree
2.2 decision tree
 
Cross-validation Tutorial: What, how and which?
Cross-validation Tutorial: What, how and which?Cross-validation Tutorial: What, how and which?
Cross-validation Tutorial: What, how and which?
 
Lect6 Association rule & Apriori algorithm
Lect6 Association rule & Apriori algorithmLect6 Association rule & Apriori algorithm
Lect6 Association rule & Apriori algorithm
 
Association rule mining and Apriori algorithm
Association rule mining and Apriori algorithmAssociation rule mining and Apriori algorithm
Association rule mining and Apriori algorithm
 
Classification and Regression
Classification and RegressionClassification and Regression
Classification and Regression
 
Neural nw k means
Neural nw k meansNeural nw k means
Neural nw k means
 
Exploratory data analysis
Exploratory data analysisExploratory data analysis
Exploratory data analysis
 
Fuzzy Clustering(C-means, K-means)
Fuzzy Clustering(C-means, K-means)Fuzzy Clustering(C-means, K-means)
Fuzzy Clustering(C-means, K-means)
 

Similar to Model Selection Techniques

UNIT-II-Machine-Learning.pptx Machine Learning Different AI Models
UNIT-II-Machine-Learning.pptx Machine Learning Different AI ModelsUNIT-II-Machine-Learning.pptx Machine Learning Different AI Models
UNIT-II-Machine-Learning.pptx Machine Learning Different AI Models
JVSTHARUNSAI
 
Modelling and evaluation
Modelling and evaluationModelling and evaluation
Modelling and evaluation
eShikshak
 
Cross validation.pptx
Cross validation.pptxCross validation.pptx
Cross validation.pptx
YouKnowwho28
 
ENACTMENT RANKING OF SUPERVISED ALGORITHMS DEPENDENCE OF DATA SPLITTING ALGOR...
ENACTMENT RANKING OF SUPERVISED ALGORITHMS DEPENDENCE OF DATA SPLITTING ALGOR...ENACTMENT RANKING OF SUPERVISED ALGORITHMS DEPENDENCE OF DATA SPLITTING ALGOR...
ENACTMENT RANKING OF SUPERVISED ALGORITHMS DEPENDENCE OF DATA SPLITTING ALGOR...
ijnlc
 
Enactment Ranking of Supervised Algorithms Dependence of Data Splitting Algor...
Enactment Ranking of Supervised Algorithms Dependence of Data Splitting Algor...Enactment Ranking of Supervised Algorithms Dependence of Data Splitting Algor...
Enactment Ranking of Supervised Algorithms Dependence of Data Splitting Algor...
AIRCC Publishing Corporation
 
Post Graduate Admission Prediction System
Post Graduate Admission Prediction SystemPost Graduate Admission Prediction System
Post Graduate Admission Prediction System
IRJET Journal
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptx
HaritikaChhatwal1
 
Statistical Learning and Model Selection (1).pptx
Statistical Learning and Model Selection (1).pptxStatistical Learning and Model Selection (1).pptx
Statistical Learning and Model Selection (1).pptx
rajalakshmi5921
 
crossvalidation.pptx
crossvalidation.pptxcrossvalidation.pptx
crossvalidation.pptx
PriyadharshiniG41
 
Barga Data Science lecture 10
Barga Data Science lecture 10Barga Data Science lecture 10
Barga Data Science lecture 10
Roger Barga
 
ML MODULE 5.pdf
ML MODULE 5.pdfML MODULE 5.pdf
ML MODULE 5.pdf
Shiwani Gupta
 
Identifying and classifying unknown Network Disruption
Identifying and classifying unknown Network DisruptionIdentifying and classifying unknown Network Disruption
Identifying and classifying unknown Network Disruption
jagan477830
 
6 Evaluating Predictive Performance and ensemble.pptx
6 Evaluating Predictive Performance and ensemble.pptx6 Evaluating Predictive Performance and ensemble.pptx
6 Evaluating Predictive Performance and ensemble.pptx
mohammedalherwi1
 
Handling Imbalanced Data: SMOTE vs. Random Undersampling
Handling Imbalanced Data: SMOTE vs. Random UndersamplingHandling Imbalanced Data: SMOTE vs. Random Undersampling
Handling Imbalanced Data: SMOTE vs. Random Undersampling
IRJET Journal
 
Machine learning module 2
Machine learning module 2Machine learning module 2
Machine learning module 2
Gokulks007
 
Data Partitioning for Ensemble Model Building
Data Partitioning for Ensemble Model BuildingData Partitioning for Ensemble Model Building
Data Partitioning for Ensemble Model Building
neirew J
 
DATA PARTITIONING FOR ENSEMBLE MODEL BUILDING
DATA PARTITIONING FOR ENSEMBLE MODEL BUILDINGDATA PARTITIONING FOR ENSEMBLE MODEL BUILDING
DATA PARTITIONING FOR ENSEMBLE MODEL BUILDING
ijccsa
 
Bank loan purchase modeling
Bank loan purchase modelingBank loan purchase modeling
Bank loan purchase modeling
Saleesh Satheeshchandran
 
Lecture 9: Machine Learning in Practice (2)
Lecture 9: Machine Learning in Practice (2)Lecture 9: Machine Learning in Practice (2)
Lecture 9: Machine Learning in Practice (2)
Marina Santini
 
A Study of Efficiency Improvements Technique for K-Means Algorithm
A Study of Efficiency Improvements Technique for K-Means AlgorithmA Study of Efficiency Improvements Technique for K-Means Algorithm
A Study of Efficiency Improvements Technique for K-Means Algorithm
IRJET Journal
 

Similar to Model Selection Techniques (20)

UNIT-II-Machine-Learning.pptx Machine Learning Different AI Models
UNIT-II-Machine-Learning.pptx Machine Learning Different AI ModelsUNIT-II-Machine-Learning.pptx Machine Learning Different AI Models
UNIT-II-Machine-Learning.pptx Machine Learning Different AI Models
 
Modelling and evaluation
Modelling and evaluationModelling and evaluation
Modelling and evaluation
 
Cross validation.pptx
Cross validation.pptxCross validation.pptx
Cross validation.pptx
 
ENACTMENT RANKING OF SUPERVISED ALGORITHMS DEPENDENCE OF DATA SPLITTING ALGOR...
ENACTMENT RANKING OF SUPERVISED ALGORITHMS DEPENDENCE OF DATA SPLITTING ALGOR...ENACTMENT RANKING OF SUPERVISED ALGORITHMS DEPENDENCE OF DATA SPLITTING ALGOR...
ENACTMENT RANKING OF SUPERVISED ALGORITHMS DEPENDENCE OF DATA SPLITTING ALGOR...
 
Enactment Ranking of Supervised Algorithms Dependence of Data Splitting Algor...
Enactment Ranking of Supervised Algorithms Dependence of Data Splitting Algor...Enactment Ranking of Supervised Algorithms Dependence of Data Splitting Algor...
Enactment Ranking of Supervised Algorithms Dependence of Data Splitting Algor...
 
Post Graduate Admission Prediction System
Post Graduate Admission Prediction SystemPost Graduate Admission Prediction System
Post Graduate Admission Prediction System
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptx
 
Statistical Learning and Model Selection (1).pptx
Statistical Learning and Model Selection (1).pptxStatistical Learning and Model Selection (1).pptx
Statistical Learning and Model Selection (1).pptx
 
crossvalidation.pptx
crossvalidation.pptxcrossvalidation.pptx
crossvalidation.pptx
 
Barga Data Science lecture 10
Barga Data Science lecture 10Barga Data Science lecture 10
Barga Data Science lecture 10
 
ML MODULE 5.pdf
ML MODULE 5.pdfML MODULE 5.pdf
ML MODULE 5.pdf
 
Identifying and classifying unknown Network Disruption
Identifying and classifying unknown Network DisruptionIdentifying and classifying unknown Network Disruption
Identifying and classifying unknown Network Disruption
 
6 Evaluating Predictive Performance and ensemble.pptx
6 Evaluating Predictive Performance and ensemble.pptx6 Evaluating Predictive Performance and ensemble.pptx
6 Evaluating Predictive Performance and ensemble.pptx
 
Handling Imbalanced Data: SMOTE vs. Random Undersampling
Handling Imbalanced Data: SMOTE vs. Random UndersamplingHandling Imbalanced Data: SMOTE vs. Random Undersampling
Handling Imbalanced Data: SMOTE vs. Random Undersampling
 
Machine learning module 2
Machine learning module 2Machine learning module 2
Machine learning module 2
 
Data Partitioning for Ensemble Model Building
Data Partitioning for Ensemble Model BuildingData Partitioning for Ensemble Model Building
Data Partitioning for Ensemble Model Building
 
DATA PARTITIONING FOR ENSEMBLE MODEL BUILDING
DATA PARTITIONING FOR ENSEMBLE MODEL BUILDINGDATA PARTITIONING FOR ENSEMBLE MODEL BUILDING
DATA PARTITIONING FOR ENSEMBLE MODEL BUILDING
 
Bank loan purchase modeling
Bank loan purchase modelingBank loan purchase modeling
Bank loan purchase modeling
 
Lecture 9: Machine Learning in Practice (2)
Lecture 9: Machine Learning in Practice (2)Lecture 9: Machine Learning in Practice (2)
Lecture 9: Machine Learning in Practice (2)
 
A Study of Efficiency Improvements Technique for K-Means Algorithm
A Study of Efficiency Improvements Technique for K-Means AlgorithmA Study of Efficiency Improvements Technique for K-Means Algorithm
A Study of Efficiency Improvements Technique for K-Means Algorithm
 

Recently uploaded

HYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generationHYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generation
Robbie Edward Sayers
 
Gen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdfGen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdf
gdsczhcet
 
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxCFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
R&R Consult
 
block diagram and signal flow graph representation
block diagram and signal flow graph representationblock diagram and signal flow graph representation
block diagram and signal flow graph representation
Divya Somashekar
 
Event Management System Vb Net Project Report.pdf
Event Management System Vb Net  Project Report.pdfEvent Management System Vb Net  Project Report.pdf
Event Management System Vb Net Project Report.pdf
Kamal Acharya
 
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
bakpo1
 
Immunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary AttacksImmunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary Attacks
gerogepatton
 
weather web application report.pdf
weather web application report.pdfweather web application report.pdf
weather web application report.pdf
Pratik Pawar
 
COLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdf
COLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdfCOLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdf
COLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdf
Kamal Acharya
 
power quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptxpower quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptx
ViniHema
 
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdfAKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
SamSarthak3
 
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdfWater Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation & Control
 
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdfTop 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Teleport Manpower Consultant
 
Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024
Massimo Talia
 
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
AJAYKUMARPUND1
 
Cosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdfCosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdf
Kamal Acharya
 
Standard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - NeometrixStandard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - Neometrix
Neometrix_Engineering_Pvt_Ltd
 
DESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docxDESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docx
FluxPrime1
 
ethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.pptethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.ppt
Jayaprasanna4
 
The Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdfThe Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdf
Pipe Restoration Solutions
 

Recently uploaded (20)

HYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generationHYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generation
 
Gen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdfGen AI Study Jams _ For the GDSC Leads in India.pdf
Gen AI Study Jams _ For the GDSC Leads in India.pdf
 
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxCFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptx
 
block diagram and signal flow graph representation
block diagram and signal flow graph representationblock diagram and signal flow graph representation
block diagram and signal flow graph representation
 
Event Management System Vb Net Project Report.pdf
Event Management System Vb Net  Project Report.pdfEvent Management System Vb Net  Project Report.pdf
Event Management System Vb Net Project Report.pdf
 
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
一比一原版(SFU毕业证)西蒙菲莎大学毕业证成绩单如何办理
 
Immunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary AttacksImmunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary Attacks
 
weather web application report.pdf
weather web application report.pdfweather web application report.pdf
weather web application report.pdf
 
COLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdf
COLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdfCOLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdf
COLLEGE BUS MANAGEMENT SYSTEM PROJECT REPORT.pdf
 
power quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptxpower quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptx
 
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdfAKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
 
Water Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdfWater Industry Process Automation and Control Monthly - May 2024.pdf
Water Industry Process Automation and Control Monthly - May 2024.pdf
 
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdfTop 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
 
Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024
 
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
 
Cosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdfCosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdf
 
Standard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - NeometrixStandard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - Neometrix
 
DESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docxDESIGN A COTTON SEED SEPARATION MACHINE.docx
DESIGN A COTTON SEED SEPARATION MACHINE.docx
 
ethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.pptethical hacking in wireless-hacking1.ppt
ethical hacking in wireless-hacking1.ppt
 
The Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdfThe Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdf
 

Model Selection Techniques

  • 3. MODEL SELECTION TECHNIQUES ● Multiple models are fitted and evaluated to choose the best. The best approach to model selection requires “sufficient” data, which may be nearly infinite depending on the complexity of the problem. ● There are three ways of selecting your ML model in which two are the fields of probability and sampling.
  • 4. 1. Random Train/Test Split  Random Splits are used to randomly sample a percentage of data into training, testing, and preferably validation sets.  Data to be passed in model is divided into train and test, ratio wise. This can be achieved using train_test_split function of scikit-learn python library.  On re-running of train_test_split data code, results come out to be different on each run of code. So, you aren’t sure how exactly your model will perform on unseen data.  The advantage of this method is that there is a good chance that the original population is well represented in all the three sets. In more formal terms, random splitting will prevent a biased sampling of data.
  • 5. 2. Resampling  In resampling technique of model selection, for a set of iterations, data is resampled into train/test followed by training on train and evaluation on test set.  Model chosen from this technique is assessed based on performance, not the model complexity.  Performance is computed on out-of-sample data. Resampling techniques estimate the error by evaluating out-of-sample data aka unseen data. - Strategies of resampling are such as K-Fold, StratifiedK-Fold etc.
  • 6. Types of Resampling 1. Time-Based Split 2. K-Fold Cross- Validation 4. Bootstrap 3. Stratified K-Fold
  • 7. Time – Based Split  There are some types of data where random splits are not possible.  For example, if we have to train a model for weather forecasting, we cannot randomly divide the data into training and testing sets. This will jumble up the seasonal pattern! Such data is often referred to by the term – Time Series.  In such cases, a time-wise split is used. The training set can have data for the last three years and 10 months of the present year. The last two months can be reserved for the testing or validation set.  However, the drawback of time-series data is that the events or data points are not mutually independent. One event might affect every data input that follows after.
  • 8. K-Fold Cross-Validation  The cross-validation technique works by randomly shuffling the dataset and then splitting it into k groups. Thereafter, on iterating over each group, the group needs to be considered as a test set while all other groups are clubbed together into the training set.  The model is tested on the test group and the process continues for k groups.  Thus, by the end of the process, one has k different results on k different test groups. The best model can then be selected easily by choosing the one with the highest score.
  • 9. Stratified K-Fold  The process for stratified K-Fold is similar to that of K- Fold cross-validation with one single point of difference – unlike in k-fold cross-validation, the values of the target variable is taken into consideration in stratified k-fold.  If for instance, the target variable is a categorical variable with 2 classes, then stratified k-fold ensures that each test fold gets an equal ratio of the two classes when compared to the training set.  This makes the model evaluation more accurate and the model training less biased.
  • 10. Bootstrap  Bootstrap is one of the most powerful ways to obtain a stabilized model.  The first step is to select a sample size (which is usually equal to the size of the original dataset). Thereafter, a sample data point must be randomly selected from the original dataset and added to the bootstrap sample. After the addition, the sample needs to be put back into the original sample. This process needs to be repeated for N times, where N is the sample size.  Therefore, it is a resampling technique that creates the bootstrap sample by sampling data points from the original dataset with replacement.  The model is trained on the bootstrap sample and then evaluated on all those data points that did not make it to the bootstrapped sample. These are called the out-of-bag samples.
  • 11. 3. Probabilistic measures  Probabilistic Measures do not just take into account the model performance but also the model complexity.  Model complexity is the measure of the model’s ability to capture the variance in the data.  For example, a highly biased model like the linear regression algorithm is less complex and on the other hand, a neural network is very high on complexity.  A fair bit of disadvantage however lies in the fact that probabilistic measures do not consider the uncertainty of the models and has a chance of selecting simpler models over complex models. e
  • 12. o SRM tries to balance out the model’s complexity against its success at fitting on the data. o MDL or the minimum description length is the minimum number of such bits required to represent the model. o BIC penalizes the model for its complexity and is preferably used when the size of the dataset is not very small . o AIC is the measure of information loss. 2. Bayesian Information Criterion (BIC) 1. Akaike Information Criterion (AIC) 4. Structural Risk Minimization (SRM) 3. Minimum Description Length (MDL) Types of Probabilistic measures
  • 13. Thank You !! For more Information visit on : (https://neptune.ai/blog/the-ultimate-guide-to-evaluation-and-selection-of- models-in-machine-learning)