Using data collected from existing customers to build a model that helps the marketing team achieve a higher hit ratio
~Pranov Shobhan Mishra
Application of Ensemble Techniques: Predict likely customers for a bank product
- A comparative study of linear models & tree-based models
Refer to the GitHub link for the code:
https://github.com/Pranov1984/Application-of-Ensemble-Techniques-Predict-likely-customers-for-a
Executive Summary
Overview:
A bank is trying to increase the number of customers who subscribe to its term deposit product. The marketing team wants to use its resources judiciously by targeting customers who have a high probability of subscribing. The bank holds historical data from past campaigns along with the corresponding customer details.
Problem Statement:
Currently, the effort to increase the number of term deposit subscribers is manual. The hit ratio is abysmal, and the resources used are very high for very little gain. The marketing team needs help identifying customers who have a higher chance of subscribing when contacted.
Goal Statement:
Using the data collected from existing customers, build a model that helps the marketing team identify potential customers who are relatively more likely to subscribe to the term deposit, and thus increase the hit ratio.
Approach:
EDA
•Exploration & Visualization
•Class imbalance noticed in the dependent variable
Data Preparation
•Assign appropriate data types
•One-hot encoding
•Minority oversampling as contrast
Model Approach
•Train Decision Tree, GLM & Ensembles on original data
•Tune the hyperparameters
•Use minority oversampling and train models
Evaluation Metric
•Finalize appropriate evaluation metric
•AUC, TPR, TNR, F1 score
Model Building
•Decision Tree
•Gradient Boosting
•XGBoost
•Stacked Ensemble Models
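The evaluation metrics named in the approach (AUC, TPR, TNR, F1 score) can be sketched in plain Python as follows. This is only an illustrative sketch, not the notebook's code: the function names and the toy labels and scores are assumptions, and label 1 is assumed to mean "subscribed".

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = subscribed, an assumption)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def tpr_tnr_f1(y_true, y_pred):
    """Return (TPR, TNR, F1); a ratio with a zero denominator is reported as 0.0."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    tpr = tp / (tp + fn) if tp + fn else 0.0        # recall / sensitivity
    tnr = tn / (tn + fp) if tn + fp else 0.0        # specificity
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * tpr / (precision + tpr) if precision + tpr else 0.0
    return tpr, tnr, f1


def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 3 subscribers among 10 customers.
y_true = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.2, 0.4, 0.6, 0.1, 0.3, 0.8, 0.05, 0.15, 0.35]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(tpr_tnr_f1(y_true, y_pred))
print(auc(y_true, scores))
```

Accuracy is left out on purpose: with an 89:11 class split, a model that predicts "no subscription" for everyone would be 89% accurate while having a TPR and F1 score of zero.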
• Extensive experimentation with more than 10 different models was done to identify the model that best predicts customer behaviour, so that the bank can take proactive steps to increase the number of term deposit subscriptions.
• The data was imbalanced (majority class : minority class = 89:11), so an appropriate model evaluation metric had to be chosen. A combination of F1 score (the harmonic mean of precision and recall), sensitivity and Area Under the Curve (AUC) was used to select the best model.
• Models tried in arriving at the best one:
  - Simple models such as logistic regression, with different classification thresholds
  - A decision tree followed by various ensemble models, trained on the original dataset and compared
  - Since the data was imbalanced, minority oversampling was used to improve the ratio to 67:33
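The move from an 89:11 to a 67:33 class ratio can be illustrated with a minimal sketch. The slides do not say which oversampling method was used, so plain random duplication of minority rows is assumed here (a synthetic method such as SMOTE is equally possible); the function name `oversample_minority` and the toy data are hypothetical.

```python
import random


def oversample_minority(rows, labels, minority_label=1, target_share=0.33, seed=0):
    """Duplicate random minority rows until they form about target_share of the data.

    Random duplication is an assumption; the original method is not stated.
    """
    if not 0 < target_share < 1:
        raise ValueError("target_share must be strictly between 0 and 1")
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    if not minority:
        raise ValueError("no minority rows to sample from")
    majority_count = len(labels) - len(minority)
    # Solve m / (m + majority_count) = target_share for the minority count m.
    needed = round(target_share * majority_count / (1 - target_share))
    extra = max(0, needed - len(minority))
    rng = random.Random(seed)
    picks = [rng.choice(minority) for _ in range(extra)]
    new_rows = list(rows) + [rows[i] for i in picks]
    new_labels = list(labels) + [labels[i] for i in picks]
    return new_rows, new_labels


# Hypothetical data with the 89:11 split described above.
rows = list(range(1000))
labels = [1] * 110 + [0] * 890
new_rows, new_labels = oversample_minority(rows, labels)
share = sum(new_labels) / len(new_labels)
print(len(new_labels), round(share, 2))
```

Oversampling is normally applied to the training split only, so that the test set keeps the original class distribution; whether the notebook does this is not stated in the slides.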
Results:
• With the default classification threshold, logistic regression did not give satisfactory results: recall (sensitivity/true positive rate) and F1-score were too poor to proceed with the model. When the threshold was tuned to 25%, performance improved considerably (recall = 61%, F1-score = 57%). This was better than any of the tree-based models trained on the original data.
• The tree-based algorithms, on the original data, gave poor recall (sensitivity/true positive rate). Their results were poorer than those of the linear model (logistic regression).
• The accuracy scores for the tree-based models were higher than for logistic regression even though their recall was lower, indicating that the imbalance in the target variable was biasing them toward the majority class. The tree-based models needed more examples of the minority class to learn well and generalize to unseen data.
• Recall and F1-scores for the decision tree and bagging algorithms were the worst, indicating that individual trees and correlated trees can give unstable or unreliable results. Ensemble models with de-correlated trees generalize better.
• When the class imbalance was treated by changing the majority class : minority class ratio to 67:33, all the tree-based algorithms outperformed logistic regression by a big margin. This suggests that the relationship between X and Y is not linear and that, given an adequate number of minority observations, the tree-based algorithms learned robustly.
• With the oversampled data, random forest and stacked ensemble models gave the best results. See the comparison on the next slide and in the Jupyter notebook.
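The effect of lowering the classification threshold from the default 50% to 25% can be sketched as below. The probabilities are made-up toy values, not outputs of the author's logistic regression, and the helper names are hypothetical; the point is only that a lower cut-off trades some precision for recall on the minority class.

```python
def classify(probabilities, threshold=0.5):
    """Label a customer 1 (likely subscriber) when the probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


def recall_and_f1(y_true, y_pred):
    """Return (recall, F1) for the positive class; undefined ratios are reported as 0.0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, f1


# Hypothetical predicted probabilities: 4 subscribers followed by 8 non-subscribers.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
probs = [0.70, 0.40, 0.30, 0.10, 0.35, 0.20, 0.15, 0.10, 0.05, 0.05, 0.02, 0.01]

for threshold in (0.5, 0.25):
    recall, f1 = recall_and_f1(y_true, classify(probs, threshold))
    print(f"threshold={threshold:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In practice the threshold would be chosen on a validation set rather than on the data used for the final evaluation.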