SlideShare a Scribd company logo
1 of 7
SMOTE
Synthetic Minority Over-sampling Technique – a
methodology to handle class imbalance problems.
• Synthesis of new minority class instances
• Over-sampling of minority class
• Under-sampling of majority class
• Tweaking the cost function to make misclassification of
minority instances more important than misclassification of
majority instances
SMOTE
SMOTE
The trade-off between recall (percent of truly positive instances that were classified as such) and
precision (percent of positive classifications that are truly positive). In situations where we want to
detect instances of a minority class, we are usually concerned more so with recall than precision
K-Fold Cross-validation
Cross-validation is a resampling procedure used to evaluate
machine learning models on a limited data sample.
1. Shuffle the dataset randomly
2. Split the dataset into k groups
3. For each unique group:
a. Take the group as a hold out or test data set
b. Take the remaining groups as a training data set
c. Fit a model on the training set and evaluate it on the test set
d. Retain the evaluation score and discard the model
4. Summarize the skill of the model using the sample of model
evaluation scores
Think of this as the leave one out technique
K-Fold Cross-validation
Choosing k
A poorly chosen value for k may result in a misrepresentative
idea of the skill of the model, such as a score with a high
variance (that may change a lot based on the data used to fit
the model), or a high bias, (such as an overestimate of the skill
of the model).
K-Fold Cross-validation
Basic example
Let’s assume we have six observations:
[1, 2, 3, 4, 5, 6]
Now, let’s use k = 3, so we need to create three random folds:
fold 1: [5, 2]
fold 2: [4, 6]
fold 3: [3, 1]
K-Fold Cross-validation
Basic example
So our train and test sets become:
train: [5, 2, 4, 6], test: [3, 1]
train: [4, 6, 3, 1], test: [5, 2]
train: [3, 1, 5, 2], test: [4, 6]
fold 1: [5, 2]
fold 2: [4, 6]
fold 3: [3, 1]

More Related Content

Similar to SMOTE and K-Fold Cross Validation-Presentation.pptx

Machine learning (5)
Machine learning (5)Machine learning (5)
Machine learning (5)
NYversity
 
Probability density estimation using Product of Conditional Experts
Probability density estimation using Product of Conditional ExpertsProbability density estimation using Product of Conditional Experts
Probability density estimation using Product of Conditional Experts
Chirag Gupta
 
Learning On The Border:Active Learning in Imbalanced classification Data
Learning On The Border:Active Learning in Imbalanced classification DataLearning On The Border:Active Learning in Imbalanced classification Data
Learning On The Border:Active Learning in Imbalanced classification Data
萍華 楊
 
slides.pptx
slides.pptxslides.pptx
slides.pptx
kunwarpratap8055
 

Similar to SMOTE and K-Fold Cross Validation-Presentation.pptx (20)

Modelling and evaluation
Modelling and evaluationModelling and evaluation
Modelling and evaluation
 
Machine learning (5)
Machine learning (5)Machine learning (5)
Machine learning (5)
 
evaluation and credibility-Part 1
evaluation and credibility-Part 1evaluation and credibility-Part 1
evaluation and credibility-Part 1
 
18.1 combining models
18.1 combining models18.1 combining models
18.1 combining models
 
Cmpe 255 cross validation
Cmpe 255 cross validationCmpe 255 cross validation
Cmpe 255 cross validation
 
Probability density estimation using Product of Conditional Experts
Probability density estimation using Product of Conditional ExpertsProbability density estimation using Product of Conditional Experts
Probability density estimation using Product of Conditional Experts
 
Classification using L1-Penalized Logistic Regression
Classification using L1-Penalized Logistic RegressionClassification using L1-Penalized Logistic Regression
Classification using L1-Penalized Logistic Regression
 
Barga Data Science lecture 10
Barga Data Science lecture 10Barga Data Science lecture 10
Barga Data Science lecture 10
 
Learning On The Border:Active Learning in Imbalanced classification Data
Learning On The Border:Active Learning in Imbalanced classification DataLearning On The Border:Active Learning in Imbalanced classification Data
Learning On The Border:Active Learning in Imbalanced classification Data
 
Ensemble hybrid learning technique
Ensemble hybrid learning techniqueEnsemble hybrid learning technique
Ensemble hybrid learning technique
 
Lecture 9: Machine Learning in Practice (2)
Lecture 9: Machine Learning in Practice (2)Lecture 9: Machine Learning in Practice (2)
Lecture 9: Machine Learning in Practice (2)
 
4.1.pptx
4.1.pptx4.1.pptx
4.1.pptx
 
in5490-classification (1).pptx
in5490-classification (1).pptxin5490-classification (1).pptx
in5490-classification (1).pptx
 
Ensemble methods in Machine learning technology
Ensemble methods in Machine learning technologyEnsemble methods in Machine learning technology
Ensemble methods in Machine learning technology
 
Cross-validation aggregation for forecasting
Cross-validation aggregation for forecastingCross-validation aggregation for forecasting
Cross-validation aggregation for forecasting
 
slides.pptx
slides.pptxslides.pptx
slides.pptx
 
DMTM Lecture 06 Classification evaluation
DMTM Lecture 06 Classification evaluationDMTM Lecture 06 Classification evaluation
DMTM Lecture 06 Classification evaluation
 
Statistical Learning and Model Selection module 2.pptx
Statistical Learning and Model Selection module 2.pptxStatistical Learning and Model Selection module 2.pptx
Statistical Learning and Model Selection module 2.pptx
 
Post Graduate Admission Prediction System
Post Graduate Admission Prediction SystemPost Graduate Admission Prediction System
Post Graduate Admission Prediction System
 
LR 9 Estimation.pdf
LR 9 Estimation.pdfLR 9 Estimation.pdf
LR 9 Estimation.pdf
 

More from HaritikaChhatwal1

Epigeum at ntunive singapore MED900.pptx
Epigeum at ntunive singapore MED900.pptxEpigeum at ntunive singapore MED900.pptx
Epigeum at ntunive singapore MED900.pptx
HaritikaChhatwal1
 
MED 900 Correlational Studies online safety sake.pptx
MED 900 Correlational Studies online safety sake.pptxMED 900 Correlational Studies online safety sake.pptx
MED 900 Correlational Studies online safety sake.pptx
HaritikaChhatwal1
 

More from HaritikaChhatwal1 (20)

Visualization IN DATA ANALYTICS IN TIME SERIES
Visualization IN DATA ANALYTICS IN TIME SERIESVisualization IN DATA ANALYTICS IN TIME SERIES
Visualization IN DATA ANALYTICS IN TIME SERIES
 
TS Decomposition IN data Time Series Fore
TS Decomposition IN data Time Series ForeTS Decomposition IN data Time Series Fore
TS Decomposition IN data Time Series Fore
 
TIMES SERIES FORECASTING ON HISTORICAL DATA IN R
TIMES SERIES FORECASTING ON HISTORICAL DATA  IN RTIMES SERIES FORECASTING ON HISTORICAL DATA  IN R
TIMES SERIES FORECASTING ON HISTORICAL DATA IN R
 
Factor Analysis-Presentation DATA ANALYTICS
Factor Analysis-Presentation DATA ANALYTICSFactor Analysis-Presentation DATA ANALYTICS
Factor Analysis-Presentation DATA ANALYTICS
 
Additional Reading material-Probability.ppt
Additional Reading material-Probability.pptAdditional Reading material-Probability.ppt
Additional Reading material-Probability.ppt
 
Decision Tree_Loan Delinquent_Problem Statement.pdf
Decision Tree_Loan Delinquent_Problem Statement.pdfDecision Tree_Loan Delinquent_Problem Statement.pdf
Decision Tree_Loan Delinquent_Problem Statement.pdf
 
Frequency Based Classification Algorithms_ important
Frequency Based Classification Algorithms_ importantFrequency Based Classification Algorithms_ important
Frequency Based Classification Algorithms_ important
 
M2W1 - FBS - Descriptive Statistics_Mentoring Presentation.pptx
M2W1 - FBS - Descriptive Statistics_Mentoring Presentation.pptxM2W1 - FBS - Descriptive Statistics_Mentoring Presentation.pptx
M2W1 - FBS - Descriptive Statistics_Mentoring Presentation.pptx
 
Introduction to R _IMPORTANT FOR DATA ANALYTICS
Introduction to R _IMPORTANT FOR DATA ANALYTICSIntroduction to R _IMPORTANT FOR DATA ANALYTICS
Introduction to R _IMPORTANT FOR DATA ANALYTICS
 
BUSINESS ANALYTICS WITH R SOFTWARE DIAST
BUSINESS ANALYTICS WITH R SOFTWARE DIASTBUSINESS ANALYTICS WITH R SOFTWARE DIAST
BUSINESS ANALYTICS WITH R SOFTWARE DIAST
 
SESSION 1-2 [Autosaved] [Autosaved].pptx
SESSION 1-2 [Autosaved] [Autosaved].pptxSESSION 1-2 [Autosaved] [Autosaved].pptx
SESSION 1-2 [Autosaved] [Autosaved].pptx
 
WORKSHEET INTRO AND TVM _SESSION 1 AND 2.pdf
WORKSHEET INTRO AND TVM _SESSION 1 AND 2.pdfWORKSHEET INTRO AND TVM _SESSION 1 AND 2.pdf
WORKSHEET INTRO AND TVM _SESSION 1 AND 2.pdf
 
Epigeum at ntunive singapore MED900.pptx
Epigeum at ntunive singapore MED900.pptxEpigeum at ntunive singapore MED900.pptx
Epigeum at ntunive singapore MED900.pptx
 
MED 900 Correlational Studies online safety sake.pptx
MED 900 Correlational Studies online safety sake.pptxMED 900 Correlational Studies online safety sake.pptx
MED 900 Correlational Studies online safety sake.pptx
 
Nw Microsoft PowerPoint Presentation.pptx
Nw Microsoft PowerPoint Presentation.pptxNw Microsoft PowerPoint Presentation.pptx
Nw Microsoft PowerPoint Presentation.pptx
 
HOWs CORRELATIONAL STUDIES are performed
HOWs CORRELATIONAL STUDIES are performedHOWs CORRELATIONAL STUDIES are performed
HOWs CORRELATIONAL STUDIES are performed
 
DIAS PRESENTATION.pptx
DIAS PRESENTATION.pptxDIAS PRESENTATION.pptx
DIAS PRESENTATION.pptx
 
FULLTEXT01.pdf
FULLTEXT01.pdfFULLTEXT01.pdf
FULLTEXT01.pdf
 
CHAPTER 4 -TYPES OF BUSINESS.pptx
CHAPTER 4 -TYPES OF BUSINESS.pptxCHAPTER 4 -TYPES OF BUSINESS.pptx
CHAPTER 4 -TYPES OF BUSINESS.pptx
 
CHAPTER 3-ENTREPRENEURSHIP [Autosaved].pptx
CHAPTER 3-ENTREPRENEURSHIP [Autosaved].pptxCHAPTER 3-ENTREPRENEURSHIP [Autosaved].pptx
CHAPTER 3-ENTREPRENEURSHIP [Autosaved].pptx
 

Recently uploaded

Fuzzy Sets decision making under information of uncertainty
Fuzzy Sets decision making under information of uncertaintyFuzzy Sets decision making under information of uncertainty
Fuzzy Sets decision making under information of uncertainty
RafigAliyev2
 
一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理
pyhepag
 
一比一原版纽卡斯尔大学毕业证成绩单如何办理
一比一原版纽卡斯尔大学毕业证成绩单如何办理一比一原版纽卡斯尔大学毕业证成绩单如何办理
一比一原版纽卡斯尔大学毕业证成绩单如何办理
cyebo
 
一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理
cyebo
 
Exploratory Data Analysis - Dilip S.pptx
Exploratory Data Analysis - Dilip S.pptxExploratory Data Analysis - Dilip S.pptx
Exploratory Data Analysis - Dilip S.pptx
DilipVasan
 

Recently uploaded (20)

Slip-and-fall Injuries: Top Workers' Comp Claims
Slip-and-fall Injuries: Top Workers' Comp ClaimsSlip-and-fall Injuries: Top Workers' Comp Claims
Slip-and-fall Injuries: Top Workers' Comp Claims
 
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPsWebinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
Webinar One View, Multiple Systems No-Code Integration of Salesforce and ERPs
 
AI Imagen for data-storytelling Infographics.pdf
AI Imagen for data-storytelling Infographics.pdfAI Imagen for data-storytelling Infographics.pdf
AI Imagen for data-storytelling Infographics.pdf
 
Atlantic Grupa Case Study (Mintec Data AI)
Atlantic Grupa Case Study (Mintec Data AI)Atlantic Grupa Case Study (Mintec Data AI)
Atlantic Grupa Case Study (Mintec Data AI)
 
Fuzzy Sets decision making under information of uncertainty
Fuzzy Sets decision making under information of uncertaintyFuzzy Sets decision making under information of uncertainty
Fuzzy Sets decision making under information of uncertainty
 
How I opened a fake bank account and didn't go to prison
How I opened a fake bank account and didn't go to prisonHow I opened a fake bank account and didn't go to prison
How I opened a fake bank account and didn't go to prison
 
Pre-ProductionImproveddsfjgndflghtgg.pptx
Pre-ProductionImproveddsfjgndflghtgg.pptxPre-ProductionImproveddsfjgndflghtgg.pptx
Pre-ProductionImproveddsfjgndflghtgg.pptx
 
2024 Q2 Orange County (CA) Tableau User Group Meeting
2024 Q2 Orange County (CA) Tableau User Group Meeting2024 Q2 Orange County (CA) Tableau User Group Meeting
2024 Q2 Orange County (CA) Tableau User Group Meeting
 
一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理一比一原版西悉尼大学毕业证成绩单如何办理
一比一原版西悉尼大学毕业证成绩单如何办理
 
Machine Learning for Accident Severity Prediction
Machine Learning for Accident Severity PredictionMachine Learning for Accident Severity Prediction
Machine Learning for Accident Severity Prediction
 
how can i exchange pi coins for others currency like Bitcoin
how can i exchange pi coins for others currency like Bitcoinhow can i exchange pi coins for others currency like Bitcoin
how can i exchange pi coins for others currency like Bitcoin
 
Supply chain analytics to combat the effects of Ukraine-Russia-conflict
Supply chain analytics to combat the effects of Ukraine-Russia-conflictSupply chain analytics to combat the effects of Ukraine-Russia-conflict
Supply chain analytics to combat the effects of Ukraine-Russia-conflict
 
一比一原版纽卡斯尔大学毕业证成绩单如何办理
一比一原版纽卡斯尔大学毕业证成绩单如何办理一比一原版纽卡斯尔大学毕业证成绩单如何办理
一比一原版纽卡斯尔大学毕业证成绩单如何办理
 
Easy and simple project file on mp online
Easy and simple project file on mp onlineEasy and simple project file on mp online
Easy and simple project file on mp online
 
Artificial_General_Intelligence__storm_gen_article.pdf
Artificial_General_Intelligence__storm_gen_article.pdfArtificial_General_Intelligence__storm_gen_article.pdf
Artificial_General_Intelligence__storm_gen_article.pdf
 
一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理一比一原版麦考瑞大学毕业证成绩单如何办理
一比一原版麦考瑞大学毕业证成绩单如何办理
 
How can I successfully sell my pi coins in Philippines?
How can I successfully sell my pi coins in Philippines?How can I successfully sell my pi coins in Philippines?
How can I successfully sell my pi coins in Philippines?
 
2024 Q1 Tableau User Group Leader Quarterly Call
2024 Q1 Tableau User Group Leader Quarterly Call2024 Q1 Tableau User Group Leader Quarterly Call
2024 Q1 Tableau User Group Leader Quarterly Call
 
basics of data science with application areas.pdf
basics of data science with application areas.pdfbasics of data science with application areas.pdf
basics of data science with application areas.pdf
 
Exploratory Data Analysis - Dilip S.pptx
Exploratory Data Analysis - Dilip S.pptxExploratory Data Analysis - Dilip S.pptx
Exploratory Data Analysis - Dilip S.pptx
 

SMOTE and K-Fold Cross Validation-Presentation.pptx

  • 1. SMOTE Synthetic Minority Over-sampling Technique – a methodology to handle class imbalance problems. • Synthesis of new minority class instances • Over-sampling of minority class • Under-sampling of majority class • Tweaking the cost function to make misclassification of minority instances more important than misclassification of majority instances
  • 3. SMOTE The trade-off between recall (percent of truly positive instances that were classified as such) and precision (percent of positive classifications that are truly positive). In situations where we want to detect instances of a minority class, we are usually concerned more so with recall than precision
  • 4. K-Fold Cross-validation Cross-validation is a resampling procedure used to evaluate machine learning models on a limited data sample. 1. Shuffle the dataset randomly 2. Split the dataset into k groups 3. For each unique group: a. Take the group as a hold out or test data set b. Take the remaining groups as a training data set c. Fit a model on the training set and evaluate it on the test set d. Retain the evaluation score and discard the model 4. Summarize the skill of the model using the sample of model evaluation scores Think of this as the leave one out technique
  • 5. K-Fold Cross-validation Choosing k A poorly chosen value for k may result in a misrepresentative idea of the skill of the model, such as a score with a high variance (that may change a lot based on the data used to fit the model), or a high bias, (such as an overestimate of the skill of the model).
  • 6. K-Fold Cross-validation Basic example Let’s assume we have six observations: [1, 2, 3, 4, 5, 6] Now, let’s use k = 3, so we need to create three random folds: fold 1: [5, 2] fold 2: [4, 6] fold 3: [3, 1]
  • 7. K-Fold Cross-validation Basic example So our train and test sets become: train: [5, 2, 4, 6], test: [3, 1] train: [4, 6, 3, 1], test: [5, 2] train: [3, 1, 5, 2], test: [4, 6] fold 1: [5, 2] fold 2: [4, 6] fold 3: [3, 1]