SlideShare a Scribd company logo
1 of 18
Bank Marketing Data Science Exploration
Advanced Data Science Course - Capstone Project
Sasha Lazarevic, Dec 2019
https://www.linkedin.com/in/LZRVC/
Data Scientist‘s Deck
Problem Statement
• A Bank would like to understand which customers will accept the
bank‘s offer for a loan. A realistic dataset has been gathered as
presented with 21 columns and 41188 rows, which can be used to
train our model.
• Some customers have accepted the offer, but majority not. What
features are important ? Can we generate or imagine to add some
new features that will imrove the quality of our prediction? What
algorithm would be the best?
Methodology
• My data science project will follow a
standard data science project
methodology with phases like
• Initial data exploration
• Exrtact, Transform, Load
• Feature Creation
• Model definition
• Model training
• Model evaluation
• Model deployment
Toolset
• I will use Watson Studio as Data Science platform with the following tools:
• Data Refinery, with which I will explore my data set and try to combine data and add some features
• Jupyter Notebook , which which I will explore the data, and develop machine learning models
• Watson Machine Learning, which will allow me to train and deploy the models
• AutoAI, which will allow me to automatically train the models and optimize the hyperparameters
Dataset
• Dataset is in csv format, uploaded in my Watson Studio Project. Here is the description of the columns
• 1 - age (numeric)
2 - job : type of job (categorical: 'admin.','blue-collar','entrepreneur','housemaid','management','retired','self-
employed','services','student','technician','unemployed','unknown')
3 - marital : marital status (categorical: 'divorced','married','single','unknown'; note: 'divorced' means divorced or widowed)
4 - education (categorical: basic.4y','basic.6y','basic.9y','high.school','illiterate','professional.course','university.degree','unknown')
5 - default: has credit in default? (categorical: 'no','yes','unknown')
6 - housing: has housing loan? (categorical: 'no','yes','unknown')
7 - loan: has personal loan? (categorical: 'no','yes','unknown')
• related with the last contact of the current campaign:
8 - contact: contact communication type (categorical: 'cellular','telephone')
9 - month: last contact month of year (categorical: 'jan', 'feb', 'mar', ..., 'nov', 'dec')
10 - day_of_week: last contact day of the week (categorical: 'mon','tue','wed','thu','fri')
11 - duration: last contact duration, in seconds (numeric). Important note: this attribute highly affects the output target (e.g., if
duration=0 then y='no'). Yet, the duration is not known before a call is performed. Also, after the end of the call y is obviously known.
Thus, this input should only be included for benchmark purposes and should be discarded if the intention is to have a realistic
predictive model.
• other attributes:
12 - campaign: number of contacts performed during this campaign and for this client (numeric, includes last contact)
13 - pdays: number of days that passed by after the client was last contacted from a previous campaign (numeric; 999 means client
was not previously contacted)
14 - previous: number of contacts performed before this campaign and for this client (numeric)
15 - poutcome: outcome of the previous marketing campaign (categorical: 'failure','nonexistent','success')
• social and economic context attributes
16 - emp.var.rate: employment variation rate - quarterly indicator (numeric)
17 - cons.price.idx: consumer price index - monthly indicator (numeric)
18 - cons.conf.idx: consumer confidence index - monthly indicator (numeric)
19 - euribor3m: euribor 3 month rate - daily indicator (numeric)
20 - nr.employed: number of employees - quarterly indicator (numeric)
• Output variable (desired target):
21 - y - has the client subscribed a term deposit? (binary: 'yes','no')
bank-additional-full-mod.csv is the main dataset
bank-data-full-mod.csv is the dataset augmented
artificially with an experimental estimate about the
clients‘ revenues
Data Exploration
• Exploration of the data , shape, number and types of columns
• Exploration of output labels, categories and counts per category
• Exploration of some features and if we can gather any intuition about their influence on the prediction outcome
• Visualization of different features in the context of outcomes
One of the graphics describing the features
Data Refinery
• Data Refinery is a tool inside Watson Studio which
helps Data Scientists visualize and refine the data
• It includes a menu with a set of operations or
additional operations can be spoecified
programatically
• Operations are then stored in Refinery Flows, with
individual steps that can be scheduled to execute as
automatic jobs
• For Bank Marketing Data Science Exploration project, I
have created the following Refinery Flow with 6 steps,
and using this flow have generated a new dataset that
I imported into my Jupyter Notebook and AutoAI
Feature Generation
• First, I have created an additional column: AGE_BINNED
• I have also added revenues_est
Feature Generation
• But these two new features were not so efficient as the artificially generated features that I got with AutoAI
Model Selection
• As the data is structured, I have used ensemble of learners and
compared them with deep learning:
• Random Forest
• XGBoost
• LightGBM
• Keras-based deep neural network (Sequential model)
Model Training
• I have converted categorical data into numeric and scaled the data from 0 to 1.
• I have splitted the dataset into training (70% ) and test (30% )
• RandomForest has a method to display the importances which is quite useful to see what features contribute the
most to the predictions
• Later I have trained XGBoost and LightGBM, and Keras sequential model with two layers and 344 parameters
Model Training
Metrics
• I’ve used ROC and Area Under Curve for the performance metrics
AutoAI
• AutoAI is the latest functionality of Watson Studio , which allows automatic machine learning of a set of models,
automatic selection of the best performing model, automatic hyperparameter tuning and feature generation
Models Comparison
• The best performing algorithm is LightGBM with ROC AUC better than what I could obtain with my selection of
RandomForest, XGBoost or Keras Sequential model
Scores:
• RandomForest ROC AUC score: 86.28%
• XGBoost ROC AUC score: 87.11%
• LGBoost ROC AUC score: 91.57%
• Keras Sequential ROC AUC score: 79.66%
• AutoAI LightGBM best ROC AUC score: 95.1%
Model Deployment
• The best performing model coming from AutoAI is saved as BMDSE_LGBMClassifierEstimator_v4 :
The model has been
successfully deployed
in Watson Machine
Learning cloud service,
which allows to use
APIs in the application
for the prediction in
production
Conclusions
• Using standard data science methodology is key to success of data science projects
• Dataset needs to be of good quality and realistic
• Data Scientist needs to spend time analyzing and exploring the dataset in order to understand the features that
could be used for the classification
• Chose of the data science platform and tools is very important for the success of the project and for the
productivity of the data scientist
• Deep Learning is not suitable for all types of AI or data science problems. Structured data in tabular format can
benefit from traditional machine learning algorithms like Random Forest , XGBoost or LightGBM
• LightGBM proved to be more performant than Random Forest or XGBoost
• AutoAI functionality of Watson Studio is very performant and can save a lot of time to Data Scientist
• Watson Studio with Watson Machine Learning can also be used for deployments, which helps integrate and
accelerate the whole process of data exploration, model development and deployments
18
Sasha Lazarevic, IBM Switzerland
https://www.linkedin.com/in/LZRVC/
LZRVC.com
ХВАЛА !
MERCI
THANK YOU
DANKE
СПАСИБО
Grazie

More Related Content

What's hot

Merging Artificial and Human Intelligence in Recruitment
Merging Artificial and Human Intelligence in RecruitmentMerging Artificial and Human Intelligence in Recruitment
Merging Artificial and Human Intelligence in RecruitmentMax Armbruster
 
AI Recruitment - How Businesses Are Winning the Race for the Talent
AI Recruitment - How Businesses Are Winning the Race for the TalentAI Recruitment - How Businesses Are Winning the Race for the Talent
AI Recruitment - How Businesses Are Winning the Race for the TalentSkyl.ai
 
An introduction to ML, AI and Analytics
An introduction to ML, AI and AnalyticsAn introduction to ML, AI and Analytics
An introduction to ML, AI and AnalyticsSpotle.ai
 
AI and Managerial Decision Making
AI and Managerial Decision MakingAI and Managerial Decision Making
AI and Managerial Decision MakingLee Schlenker
 
How AI can help the recruitment industry
How AI can help the recruitment industryHow AI can help the recruitment industry
How AI can help the recruitment industryMacawly
 
Fact or Fiction: AI in Recruitment Selection and assessments.
Fact or Fiction: AI in Recruitment Selection and assessments. Fact or Fiction: AI in Recruitment Selection and assessments.
Fact or Fiction: AI in Recruitment Selection and assessments. Gareth Jones
 
How to get on the AI journey?
How to get on the AI journey? How to get on the AI journey?
How to get on the AI journey? Aarthi Srinivasan
 
Impact of AI on Business Intelligence
Impact of AI on Business IntelligenceImpact of AI on Business Intelligence
Impact of AI on Business IntelligenceDeesha Mukherjee
 
Artificial intelligence (ai) and its impact to business
Artificial intelligence (ai) and its impact to businessArtificial intelligence (ai) and its impact to business
Artificial intelligence (ai) and its impact to businesspaul young cpa, cga
 
Naghi Prasad at AI Frontiers: Building AI systems to automate enterprise proc...
Naghi Prasad at AI Frontiers: Building AI systems to automate enterprise proc...Naghi Prasad at AI Frontiers: Building AI systems to automate enterprise proc...
Naghi Prasad at AI Frontiers: Building AI systems to automate enterprise proc...AI Frontiers
 
Data Science Pipelines in Python using Luigi
Data Science Pipelines in Python using LuigiData Science Pipelines in Python using Luigi
Data Science Pipelines in Python using LuigiShivam Bansal
 
Building Data Science Pipelines in Python using Luigi
Building Data Science Pipelines in Python using LuigiBuilding Data Science Pipelines in Python using Luigi
Building Data Science Pipelines in Python using LuigiShwet Kamal Mishra
 
AI in Talent Acquisition - Talent Connect 2017
AI in Talent Acquisition - Talent Connect 2017AI in Talent Acquisition - Talent Connect 2017
AI in Talent Acquisition - Talent Connect 2017Przemek Berendt
 
Types of Blockchain, AI and its future
Types of Blockchain, AI and its futureTypes of Blockchain, AI and its future
Types of Blockchain, AI and its futureAarthi Srinivasan
 
Ml master class cfa poland
Ml master class   cfa polandMl master class   cfa poland
Ml master class cfa polandQuantUniversity
 
Set the Hiring Managers’ Expectations: Using Big Data to answer Big Questions...
Set the Hiring Managers’ Expectations: Using Big Data to answer Big Questions...Set the Hiring Managers’ Expectations: Using Big Data to answer Big Questions...
Set the Hiring Managers’ Expectations: Using Big Data to answer Big Questions...Textkernel
 
Machine Learning Salary Trends - Great Learning
Machine Learning Salary Trends - Great LearningMachine Learning Salary Trends - Great Learning
Machine Learning Salary Trends - Great LearningGreat Learning
 
Machine Learning in Finance: 10 Things You Need to Know in 2021
Machine Learning in Finance: 10 Things You Need to Know in 2021Machine Learning in Finance: 10 Things You Need to Know in 2021
Machine Learning in Finance: 10 Things You Need to Know in 2021QuantUniversity
 

What's hot (20)

Merging Artificial and Human Intelligence in Recruitment
Merging Artificial and Human Intelligence in RecruitmentMerging Artificial and Human Intelligence in Recruitment
Merging Artificial and Human Intelligence in Recruitment
 
AI Recruitment - How Businesses Are Winning the Race for the Talent
AI Recruitment - How Businesses Are Winning the Race for the TalentAI Recruitment - How Businesses Are Winning the Race for the Talent
AI Recruitment - How Businesses Are Winning the Race for the Talent
 
An introduction to ML, AI and Analytics
An introduction to ML, AI and AnalyticsAn introduction to ML, AI and Analytics
An introduction to ML, AI and Analytics
 
AI and Managerial Decision Making
AI and Managerial Decision MakingAI and Managerial Decision Making
AI and Managerial Decision Making
 
How AI can help the recruitment industry
How AI can help the recruitment industryHow AI can help the recruitment industry
How AI can help the recruitment industry
 
Projects
ProjectsProjects
Projects
 
Fact or Fiction: AI in Recruitment Selection and assessments.
Fact or Fiction: AI in Recruitment Selection and assessments. Fact or Fiction: AI in Recruitment Selection and assessments.
Fact or Fiction: AI in Recruitment Selection and assessments.
 
How to get on the AI journey?
How to get on the AI journey? How to get on the AI journey?
How to get on the AI journey?
 
Impact of AI on Business Intelligence
Impact of AI on Business IntelligenceImpact of AI on Business Intelligence
Impact of AI on Business Intelligence
 
Artificial intelligence (ai) and its impact to business
Artificial intelligence (ai) and its impact to businessArtificial intelligence (ai) and its impact to business
Artificial intelligence (ai) and its impact to business
 
Naghi Prasad at AI Frontiers: Building AI systems to automate enterprise proc...
Naghi Prasad at AI Frontiers: Building AI systems to automate enterprise proc...Naghi Prasad at AI Frontiers: Building AI systems to automate enterprise proc...
Naghi Prasad at AI Frontiers: Building AI systems to automate enterprise proc...
 
Data Science Pipelines in Python using Luigi
Data Science Pipelines in Python using LuigiData Science Pipelines in Python using Luigi
Data Science Pipelines in Python using Luigi
 
Building Data Science Pipelines in Python using Luigi
Building Data Science Pipelines in Python using LuigiBuilding Data Science Pipelines in Python using Luigi
Building Data Science Pipelines in Python using Luigi
 
AI in Talent Acquisition - Talent Connect 2017
AI in Talent Acquisition - Talent Connect 2017AI in Talent Acquisition - Talent Connect 2017
AI in Talent Acquisition - Talent Connect 2017
 
Types of Blockchain, AI and its future
Types of Blockchain, AI and its futureTypes of Blockchain, AI and its future
Types of Blockchain, AI and its future
 
Ml master class cfa poland
Ml master class   cfa polandMl master class   cfa poland
Ml master class cfa poland
 
Wizenhire
WizenhireWizenhire
Wizenhire
 
Set the Hiring Managers’ Expectations: Using Big Data to answer Big Questions...
Set the Hiring Managers’ Expectations: Using Big Data to answer Big Questions...Set the Hiring Managers’ Expectations: Using Big Data to answer Big Questions...
Set the Hiring Managers’ Expectations: Using Big Data to answer Big Questions...
 
Machine Learning Salary Trends - Great Learning
Machine Learning Salary Trends - Great LearningMachine Learning Salary Trends - Great Learning
Machine Learning Salary Trends - Great Learning
 
Machine Learning in Finance: 10 Things You Need to Know in 2021
Machine Learning in Finance: 10 Things You Need to Know in 2021Machine Learning in Finance: 10 Things You Need to Know in 2021
Machine Learning in Finance: 10 Things You Need to Know in 2021
 

Similar to Bank Marketing Data Science Project Yields 95.1% Accurate Model

AI-900 - Fundamental Principles of ML.pptx
AI-900 - Fundamental Principles of ML.pptxAI-900 - Fundamental Principles of ML.pptx
AI-900 - Fundamental Principles of ML.pptxkprasad8
 
Business intelligence prof nikhat fatma mumtaz husain shaikh
Business intelligence  prof nikhat fatma mumtaz husain shaikhBusiness intelligence  prof nikhat fatma mumtaz husain shaikh
Business intelligence prof nikhat fatma mumtaz husain shaikhNikhat Fatma Mumtaz Husain Shaikh
 
Citizen Data Science Training using KNIME
Citizen Data Science Training using KNIMECitizen Data Science Training using KNIME
Citizen Data Science Training using KNIMEAli Raza Anjum
 
Building High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsBuilding High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsYalçın Yenigün
 
Barga Galvanize Sept 2015
Barga Galvanize Sept 2015Barga Galvanize Sept 2015
Barga Galvanize Sept 2015Roger Barga
 
1440 track 2 boire_using our laptop
1440 track 2 boire_using our laptop1440 track 2 boire_using our laptop
1440 track 2 boire_using our laptopRising Media, Inc.
 
Machine Learning With ML.NET
Machine Learning With ML.NETMachine Learning With ML.NET
Machine Learning With ML.NETDev Raj Gautam
 
AI Class Topic 3: Building Machine Learning Predictive Systems (Predictive Ma...
AI Class Topic 3: Building Machine Learning Predictive Systems (Predictive Ma...AI Class Topic 3: Building Machine Learning Predictive Systems (Predictive Ma...
AI Class Topic 3: Building Machine Learning Predictive Systems (Predictive Ma...Value Amplify Consulting
 
Practical model management in the age of Data science and ML
Practical model management in the age of Data science and MLPractical model management in the age of Data science and ML
Practical model management in the age of Data science and MLQuantUniversity
 
Customer choice probabilities
Customer choice probabilitiesCustomer choice probabilities
Customer choice probabilitiesAllan D. Butler
 
Machine intelligence data science methodology 060420
Machine intelligence data science methodology 060420Machine intelligence data science methodology 060420
Machine intelligence data science methodology 060420Jeremy Lehman
 
From Data to Artificial Intelligence with the Machine Learning Canvas — ODSC ...
From Data to Artificial Intelligence with the Machine Learning Canvas — ODSC ...From Data to Artificial Intelligence with the Machine Learning Canvas — ODSC ...
From Data to Artificial Intelligence with the Machine Learning Canvas — ODSC ...Louis Dorard
 
Barga Data Science lecture 2
Barga Data Science lecture 2Barga Data Science lecture 2
Barga Data Science lecture 2Roger Barga
 
Personal Loan Risk Assessment
Personal Loan Risk Assessment Personal Loan Risk Assessment
Personal Loan Risk Assessment Kunal Kashyap
 
Mramadhani project presentation report version 02
Mramadhani project presentation report version 02Mramadhani project presentation report version 02
Mramadhani project presentation report version 02Malinda Ramadhani
 
laptop price prediction presentation
laptop price prediction presentationlaptop price prediction presentation
laptop price prediction presentationNeerajNishad4
 
credit card fraud detection
credit card fraud detectioncredit card fraud detection
credit card fraud detectionjagan477830
 

Similar to Bank Marketing Data Science Project Yields 95.1% Accurate Model (20)

Internship Presentation.pdf
Internship Presentation.pdfInternship Presentation.pdf
Internship Presentation.pdf
 
AI-900 - Fundamental Principles of ML.pptx
AI-900 - Fundamental Principles of ML.pptxAI-900 - Fundamental Principles of ML.pptx
AI-900 - Fundamental Principles of ML.pptx
 
Business intelligence prof nikhat fatma mumtaz husain shaikh
Business intelligence  prof nikhat fatma mumtaz husain shaikhBusiness intelligence  prof nikhat fatma mumtaz husain shaikh
Business intelligence prof nikhat fatma mumtaz husain shaikh
 
Citizen Data Science Training using KNIME
Citizen Data Science Training using KNIMECitizen Data Science Training using KNIME
Citizen Data Science Training using KNIME
 
Building High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsBuilding High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning Applications
 
Barga Galvanize Sept 2015
Barga Galvanize Sept 2015Barga Galvanize Sept 2015
Barga Galvanize Sept 2015
 
1440 track 2 boire_using our laptop
1440 track 2 boire_using our laptop1440 track 2 boire_using our laptop
1440 track 2 boire_using our laptop
 
Machine Learning With ML.NET
Machine Learning With ML.NETMachine Learning With ML.NET
Machine Learning With ML.NET
 
AI Class Topic 3: Building Machine Learning Predictive Systems (Predictive Ma...
AI Class Topic 3: Building Machine Learning Predictive Systems (Predictive Ma...AI Class Topic 3: Building Machine Learning Predictive Systems (Predictive Ma...
AI Class Topic 3: Building Machine Learning Predictive Systems (Predictive Ma...
 
Practical model management in the age of Data science and ML
Practical model management in the age of Data science and MLPractical model management in the age of Data science and ML
Practical model management in the age of Data science and ML
 
Customer choice probabilities
Customer choice probabilitiesCustomer choice probabilities
Customer choice probabilities
 
Machine intelligence data science methodology 060420
Machine intelligence data science methodology 060420Machine intelligence data science methodology 060420
Machine intelligence data science methodology 060420
 
Ai in finance
Ai in financeAi in finance
Ai in finance
 
From Data to Artificial Intelligence with the Machine Learning Canvas — ODSC ...
From Data to Artificial Intelligence with the Machine Learning Canvas — ODSC ...From Data to Artificial Intelligence with the Machine Learning Canvas — ODSC ...
From Data to Artificial Intelligence with the Machine Learning Canvas — ODSC ...
 
Deep learning
Deep learningDeep learning
Deep learning
 
Barga Data Science lecture 2
Barga Data Science lecture 2Barga Data Science lecture 2
Barga Data Science lecture 2
 
Personal Loan Risk Assessment
Personal Loan Risk Assessment Personal Loan Risk Assessment
Personal Loan Risk Assessment
 
Mramadhani project presentation report version 02
Mramadhani project presentation report version 02Mramadhani project presentation report version 02
Mramadhani project presentation report version 02
 
laptop price prediction presentation
laptop price prediction presentationlaptop price prediction presentation
laptop price prediction presentation
 
credit card fraud detection
credit card fraud detectioncredit card fraud detection
credit card fraud detection
 

More from Sasha Lazarevic

Quantum Machine Learning for IBM AI
Quantum Machine Learning for IBM AIQuantum Machine Learning for IBM AI
Quantum Machine Learning for IBM AISasha Lazarevic
 
What is Quantum Computing and Why it is Important
What is Quantum Computing and Why it is ImportantWhat is Quantum Computing and Why it is Important
What is Quantum Computing and Why it is ImportantSasha Lazarevic
 
Project Risk Management - Introduction 2011
Project Risk Management - Introduction 2011Project Risk Management - Introduction 2011
Project Risk Management - Introduction 2011Sasha Lazarevic
 
Cognitive Urban Transport
Cognitive Urban TransportCognitive Urban Transport
Cognitive Urban TransportSasha Lazarevic
 
DataLive conference in Geneva 2018 - Bringing AI to the Data
DataLive conference in Geneva 2018 - Bringing AI to the DataDataLive conference in Geneva 2018 - Bringing AI to the Data
DataLive conference in Geneva 2018 - Bringing AI to the DataSasha Lazarevic
 
Deep Learning and Watson Studio
Deep Learning and Watson StudioDeep Learning and Watson Studio
Deep Learning and Watson StudioSasha Lazarevic
 
Cognitive Computing and IBM Watson Solutions in FinTech Industry - 2016
Cognitive Computing and IBM Watson Solutions in FinTech Industry - 2016Cognitive Computing and IBM Watson Solutions in FinTech Industry - 2016
Cognitive Computing and IBM Watson Solutions in FinTech Industry - 2016Sasha Lazarevic
 
What is OpenStack and the added value of IBM solutions
What is OpenStack and the added value of IBM solutionsWhat is OpenStack and the added value of IBM solutions
What is OpenStack and the added value of IBM solutionsSasha Lazarevic
 

More from Sasha Lazarevic (10)

Quantum Machine Learning for IBM AI
Quantum Machine Learning for IBM AIQuantum Machine Learning for IBM AI
Quantum Machine Learning for IBM AI
 
What is Quantum Computing and Why it is Important
What is Quantum Computing and Why it is ImportantWhat is Quantum Computing and Why it is Important
What is Quantum Computing and Why it is Important
 
AI and Blockchain
AI and BlockchainAI and Blockchain
AI and Blockchain
 
Lean IT Transformation
Lean IT TransformationLean IT Transformation
Lean IT Transformation
 
Project Risk Management - Introduction 2011
Project Risk Management - Introduction 2011Project Risk Management - Introduction 2011
Project Risk Management - Introduction 2011
 
Cognitive Urban Transport
Cognitive Urban TransportCognitive Urban Transport
Cognitive Urban Transport
 
DataLive conference in Geneva 2018 - Bringing AI to the Data
DataLive conference in Geneva 2018 - Bringing AI to the DataDataLive conference in Geneva 2018 - Bringing AI to the Data
DataLive conference in Geneva 2018 - Bringing AI to the Data
 
Deep Learning and Watson Studio
Deep Learning and Watson StudioDeep Learning and Watson Studio
Deep Learning and Watson Studio
 
Cognitive Computing and IBM Watson Solutions in FinTech Industry - 2016
Cognitive Computing and IBM Watson Solutions in FinTech Industry - 2016Cognitive Computing and IBM Watson Solutions in FinTech Industry - 2016
Cognitive Computing and IBM Watson Solutions in FinTech Industry - 2016
 
What is OpenStack and the added value of IBM solutions
What is OpenStack and the added value of IBM solutionsWhat is OpenStack and the added value of IBM solutions
What is OpenStack and the added value of IBM solutions
 

Recently uploaded

Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 

Recently uploaded (20)

Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 

Bank Marketing Data Science Project Yields 95.1% Accurate Model

  • 1. Bank Marketing Data Science Exploration Advanced Data Science Course - Capstone Project Sasha Lazarevic, Dec 2019 https://www.linkedin.com/in/LZRVC/ Data Scientist‘s Deck
  • 2. Problem Statement • A Bank would like to understand which customers will accept the bank‘s offer for a loan. A realistic dataset has been gathered as presented with 21 columns and 41188 rows, which can be used to train our model. • Some customers have accepted the offer, but majority not. What features are important ? Can we generate or imagine to add some new features that will imrove the quality of our prediction? What algorithm would be the best?
  • 3. Methodology • My data science project will follow a standard data science project methodology with phases like • Initial data exploration • Exrtact, Transform, Load • Feature Creation • Model definition • Model training • Model evaluation • Model deployment
  • 4. Toolset • I will use Watson Studio as Data Science platform with the following tools: • Data Refinery, with which I will explore my data set and try to combine data and add some features • Jupyter Notebook , which which I will explore the data, and develop machine learning models • Watson Machine Learning, which will allow me to train and deploy the models • AutoAI, which will allow me to automatically train the models and optimize the hyperparameters
  • 5. Dataset • Dataset is in csv format, uploaded in my Watson Studio Project. Here is the description of the columns • 1 - age (numeric) 2 - job : type of job (categorical: 'admin.','blue-collar','entrepreneur','housemaid','management','retired','self- employed','services','student','technician','unemployed','unknown') 3 - marital : marital status (categorical: 'divorced','married','single','unknown'; note: 'divorced' means divorced or widowed) 4 - education (categorical: basic.4y','basic.6y','basic.9y','high.school','illiterate','professional.course','university.degree','unknown') 5 - default: has credit in default? (categorical: 'no','yes','unknown') 6 - housing: has housing loan? (categorical: 'no','yes','unknown') 7 - loan: has personal loan? (categorical: 'no','yes','unknown') • related with the last contact of the current campaign: 8 - contact: contact communication type (categorical: 'cellular','telephone') 9 - month: last contact month of year (categorical: 'jan', 'feb', 'mar', ..., 'nov', 'dec') 10 - day_of_week: last contact day of the week (categorical: 'mon','tue','wed','thu','fri') 11 - duration: last contact duration, in seconds (numeric). Important note: this attribute highly affects the output target (e.g., if duration=0 then y='no'). Yet, the duration is not known before a call is performed. Also, after the end of the call y is obviously known. Thus, this input should only be included for benchmark purposes and should be discarded if the intention is to have a realistic predictive model. • other attributes: 12 - campaign: number of contacts performed during this campaign and for this client (numeric, includes last contact) 13 - pdays: number of days that passed by after the client was last contacted from a previous campaign (numeric; 999 means client was not previously contacted) 14 - previous: number of contacts performed before this campaign and for this client (numeric) 15 - poutcome: outcome of the previous marketing campaign (categorical: 'failure','nonexistent','success') • social and economic context attributes 16 - emp.var.rate: employment variation rate - quarterly indicator (numeric) 17 - cons.price.idx: consumer price index - monthly indicator (numeric) 18 - cons.conf.idx: consumer confidence index - monthly indicator (numeric) 19 - euribor3m: euribor 3 month rate - daily indicator (numeric) 20 - nr.employed: number of employees - quarterly indicator (numeric) • Output variable (desired target): 21 - y - has the client subscribed a term deposit? (binary: 'yes','no') bank-additional-full-mod.csv is the main dataset bank-data-full-mod.csv is the dataset augmented artificially with an experimental estimate about the clients‘ revenues
  • 6. Data Exploration • Exploration of the data , shape, number and types of columns • Exploration of output labels, categories and counts per category • Exploration of some features and if we can gather any intuition about their influence on the prediction outcome • Visualization of different features in the context of outcomes One of the graphics describing the features
  • 7. Data Refinery • Data Refinery is a tool inside Watson Studio which helps Data Scientists visualize and refine the data • It includes a menu with a set of operations or additional operations can be spoecified programatically • Operations are then stored in Refinery Flows, with individual steps that can be scheduled to execute as automatic jobs • For Bank Marketing Data Science Exploration project, I have created the following Refinery Flow with 6 steps, and using this flow have generated a new dataset that I imported into my Jupyter Notebook and AutoAI
  • 8. Feature Generation • First, I have created an additional column: AGE_BINNED • I have also added revenues_est
  • 9. Feature Generation • But these two new features were not so efficient as the artificially generated features that I got with AutoAI
  • 10. Model Selection • As the data is structured, I have used ensemble of learners and compared them with deep learning: • Random Forest • XGBoost • LightGBM • Keras-based deep neural network (Sequential model)
  • 11. Model Training • I have converted categorical data into numeric and scaled the data from 0 to 1. • I have splitted the dataset into training (70% ) and test (30% ) • RandomForest has a method to display the importances which is quite useful to see what features contribute the most to the predictions • Later I have trained XGBoost and LightGBM, and Keras sequential model with two layers and 344 parameters
  • 13. Metrics • I’ve used ROC and Area Under Curve for the performance metrics
  • 14. AutoAI • AutoAI is the latest functionality of Watson Studio , which allows automatic machine learning of a set of models, automatic selection of the best performing model, automatic hyperparameter tuning and feature generation
  • 15. Models Comparison • The best performing algorithm is LightGBM with ROC AUC better than what I could obtain with my selection of RandomForest, XGBoost or Keras Sequential model Scores: • RandomForest ROC AUC score: 86.28% • XGBoost ROC AUC score: 87.11% • LGBoost ROC AUC score: 91.57% • Keras Sequential ROC AUC score: 79.66% • AutoAI LightGBM best ROC AUC score: 95.1%
  • 16. Model Deployment • The best performing model coming from AutoAI is saved as BMDSE_LGBMClassifierEstimator_v4 : The model has been successfully deployed in Watson Machine Learning cloud service, which allows to use APIs in the application for the prediction in production
  • 17. Conclusions • Using standard data science methodology is key to success of data science projects • Dataset needs to be of good quality and realistic • Data Scientist needs to spend time analyzing and exploring the dataset in order to understand the features that could be used for the classification • Chose of the data science platform and tools is very important for the success of the project and for the productivity of the data scientist • Deep Learning is not suitable for all types of AI or data science problems. Structured data in tabular format can benefit from traditional machine learning algorithms like Random Forest , XGBoost or LightGBM • LightGBM proved to be more performant than Random Forest or XGBoost • AutoAI functionality of Watson Studio is very performant and can save a lot of time to Data Scientist • Watson Studio with Watson Machine Learning can also be used for deployments, which helps integrate and accelerate the whole process of data exploration, model development and deployments
  • 18. 18 Sasha Lazarevic, IBM Switzerland https://www.linkedin.com/in/LZRVC/ LZRVC.com ХВАЛА ! MERCI THANK YOU DANKE СПАСИБО Grazie