SlideShare a Scribd company logo
Bank Marketing Data Science Exploration
Advanced Data Science Course - Capstone Project
Sasha Lazarevic, Dec 2019
https://www.linkedin.com/in/LZRVC/
Data Scientist‘s Deck
Problem Statement
• A Bank would like to understand which customers will accept the
bank‘s offer for a loan. A realistic dataset has been gathered as
presented with 21 columns and 41188 rows, which can be used to
train our model.
• Some customers have accepted the offer, but majority not. What
features are important ? Can we generate or imagine to add some
new features that will imrove the quality of our prediction? What
algorithm would be the best?
Methodology
• My data science project will follow a
standard data science project
methodology with phases like
• Initial data exploration
• Exrtact, Transform, Load
• Feature Creation
• Model definition
• Model training
• Model evaluation
• Model deployment
Toolset
• I will use Watson Studio as Data Science platform with the following tools:
• Data Refinery, with which I will explore my data set and try to combine data and add some features
• Jupyter Notebook , which which I will explore the data, and develop machine learning models
• Watson Machine Learning, which will allow me to train and deploy the models
• AutoAI, which will allow me to automatically train the models and optimize the hyperparameters
Dataset
• Dataset is in csv format, uploaded in my Watson Studio Project. Here is the description of the columns
• 1 - age (numeric)
2 - job : type of job (categorical: 'admin.','blue-collar','entrepreneur','housemaid','management','retired','self-
employed','services','student','technician','unemployed','unknown')
3 - marital : marital status (categorical: 'divorced','married','single','unknown'; note: 'divorced' means divorced or widowed)
4 - education (categorical: basic.4y','basic.6y','basic.9y','high.school','illiterate','professional.course','university.degree','unknown')
5 - default: has credit in default? (categorical: 'no','yes','unknown')
6 - housing: has housing loan? (categorical: 'no','yes','unknown')
7 - loan: has personal loan? (categorical: 'no','yes','unknown')
• related with the last contact of the current campaign:
8 - contact: contact communication type (categorical: 'cellular','telephone')
9 - month: last contact month of year (categorical: 'jan', 'feb', 'mar', ..., 'nov', 'dec')
10 - day_of_week: last contact day of the week (categorical: 'mon','tue','wed','thu','fri')
11 - duration: last contact duration, in seconds (numeric). Important note: this attribute highly affects the output target (e.g., if
duration=0 then y='no'). Yet, the duration is not known before a call is performed. Also, after the end of the call y is obviously known.
Thus, this input should only be included for benchmark purposes and should be discarded if the intention is to have a realistic
predictive model.
• other attributes:
12 - campaign: number of contacts performed during this campaign and for this client (numeric, includes last contact)
13 - pdays: number of days that passed by after the client was last contacted from a previous campaign (numeric; 999 means client
was not previously contacted)
14 - previous: number of contacts performed before this campaign and for this client (numeric)
15 - poutcome: outcome of the previous marketing campaign (categorical: 'failure','nonexistent','success')
• social and economic context attributes
16 - emp.var.rate: employment variation rate - quarterly indicator (numeric)
17 - cons.price.idx: consumer price index - monthly indicator (numeric)
18 - cons.conf.idx: consumer confidence index - monthly indicator (numeric)
19 - euribor3m: euribor 3 month rate - daily indicator (numeric)
20 - nr.employed: number of employees - quarterly indicator (numeric)
• Output variable (desired target):
21 - y - has the client subscribed a term deposit? (binary: 'yes','no')
bank-additional-full-mod.csv is the main dataset
bank-data-full-mod.csv is the dataset augmented
artificially with an experimental estimate about the
clients‘ revenues
Data Exploration
• Exploration of the data , shape, number and types of columns
• Exploration of output labels, categories and counts per category
• Exploration of some features and if we can gather any intuition about their influence on the prediction outcome
• Visualization of different features in the context of outcomes
One of the graphics describing the features
Data Refinery
• Data Refinery is a tool inside Watson Studio which
helps Data Scientists visualize and refine the data
• It includes a menu with a set of operations or
additional operations can be spoecified
programatically
• Operations are then stored in Refinery Flows, with
individual steps that can be scheduled to execute as
automatic jobs
• For Bank Marketing Data Science Exploration project, I
have created the following Refinery Flow with 6 steps,
and using this flow have generated a new dataset that
I imported into my Jupyter Notebook and AutoAI
Feature Generation
• First, I have created an additional column: AGE_BINNED
• I have also added revenues_est
Feature Generation
• But these two new features were not so efficient as the artificially generated features that I got with AutoAI
Model Selection
• As the data is structured, I have used ensemble of learners and
compared them with deep learning:
• Random Forest
• XGBoost
• LightGBM
• Keras-based deep neural network (Sequential model)
Model Training
• I have converted categorical data into numeric and scaled the data from 0 to 1.
• I have splitted the dataset into training (70% ) and test (30% )
• RandomForest has a method to display the importances which is quite useful to see what features contribute the
most to the predictions
• Later I have trained XGBoost and LightGBM, and Keras sequential model with two layers and 344 parameters
Model Training
Metrics
• I’ve used ROC and Area Under Curve for the performance metrics
AutoAI
• AutoAI is the latest functionality of Watson Studio , which allows automatic machine learning of a set of models,
automatic selection of the best performing model, automatic hyperparameter tuning and feature generation
Models Comparison
• The best performing algorithm is LightGBM with ROC AUC better than what I could obtain with my selection of
RandomForest, XGBoost or Keras Sequential model
Scores:
• RandomForest ROC AUC score: 86.28%
• XGBoost ROC AUC score: 87.11%
• LGBoost ROC AUC score: 91.57%
• Keras Sequential ROC AUC score: 79.66%
• AutoAI LightGBM best ROC AUC score: 95.1%
Model Deployment
• The best performing model coming from AutoAI is saved as BMDSE_LGBMClassifierEstimator_v4 :
The model has been
successfully deployed
in Watson Machine
Learning cloud service,
which allows to use
APIs in the application
for the prediction in
production
Conclusions
• Using standard data science methodology is key to success of data science projects
• Dataset needs to be of good quality and realistic
• Data Scientist needs to spend time analyzing and exploring the dataset in order to understand the features that
could be used for the classification
• Chose of the data science platform and tools is very important for the success of the project and for the
productivity of the data scientist
• Deep Learning is not suitable for all types of AI or data science problems. Structured data in tabular format can
benefit from traditional machine learning algorithms like Random Forest , XGBoost or LightGBM
• LightGBM proved to be more performant than Random Forest or XGBoost
• AutoAI functionality of Watson Studio is very performant and can save a lot of time to Data Scientist
• Watson Studio with Watson Machine Learning can also be used for deployments, which helps integrate and
accelerate the whole process of data exploration, model development and deployments
18
Sasha Lazarevic, IBM Switzerland
https://www.linkedin.com/in/LZRVC/
LZRVC.com
ХВАЛА !
MERCI
THANK YOU
DANKE
СПАСИБО
Grazie

More Related Content

What's hot

Merging Artificial and Human Intelligence in Recruitment
Merging Artificial and Human Intelligence in RecruitmentMerging Artificial and Human Intelligence in Recruitment
Merging Artificial and Human Intelligence in Recruitment
Max Armbruster
 
AI Recruitment - How Businesses Are Winning the Race for the Talent
AI Recruitment - How Businesses Are Winning the Race for the TalentAI Recruitment - How Businesses Are Winning the Race for the Talent
AI Recruitment - How Businesses Are Winning the Race for the Talent
Skyl.ai
 
An introduction to ML, AI and Analytics
An introduction to ML, AI and AnalyticsAn introduction to ML, AI and Analytics
An introduction to ML, AI and Analytics
Spotle.ai
 
AI and Managerial Decision Making
AI and Managerial Decision MakingAI and Managerial Decision Making
AI and Managerial Decision Making
Lee Schlenker
 
How AI can help the recruitment industry
How AI can help the recruitment industryHow AI can help the recruitment industry
How AI can help the recruitment industry
Macawly
 
Projects
ProjectsProjects
Fact or Fiction: AI in Recruitment Selection and assessments.
Fact or Fiction: AI in Recruitment Selection and assessments. Fact or Fiction: AI in Recruitment Selection and assessments.
Fact or Fiction: AI in Recruitment Selection and assessments.
Gareth Jones
 
How to get on the AI journey?
How to get on the AI journey? How to get on the AI journey?
How to get on the AI journey?
Aarthi Srinivasan
 
Impact of AI on Business Intelligence
Impact of AI on Business IntelligenceImpact of AI on Business Intelligence
Impact of AI on Business Intelligence
Deesha Mukherjee
 
Artificial intelligence (ai) and its impact to business
Artificial intelligence (ai) and its impact to businessArtificial intelligence (ai) and its impact to business
Artificial intelligence (ai) and its impact to business
paul young cpa, cga
 
Naghi Prasad at AI Frontiers: Building AI systems to automate enterprise proc...
Naghi Prasad at AI Frontiers: Building AI systems to automate enterprise proc...Naghi Prasad at AI Frontiers: Building AI systems to automate enterprise proc...
Naghi Prasad at AI Frontiers: Building AI systems to automate enterprise proc...
AI Frontiers
 
Data Science Pipelines in Python using Luigi
Data Science Pipelines in Python using LuigiData Science Pipelines in Python using Luigi
Data Science Pipelines in Python using Luigi
Shivam Bansal
 
Building Data Science Pipelines in Python using Luigi
Building Data Science Pipelines in Python using LuigiBuilding Data Science Pipelines in Python using Luigi
Building Data Science Pipelines in Python using Luigi
Shwet Kamal Mishra
 
AI in Talent Acquisition - Talent Connect 2017
AI in Talent Acquisition - Talent Connect 2017AI in Talent Acquisition - Talent Connect 2017
AI in Talent Acquisition - Talent Connect 2017
Przemek Berendt
 
Types of Blockchain, AI and its future
Types of Blockchain, AI and its futureTypes of Blockchain, AI and its future
Types of Blockchain, AI and its future
Aarthi Srinivasan
 
Ml master class cfa poland
Ml master class   cfa polandMl master class   cfa poland
Ml master class cfa poland
QuantUniversity
 
Set the Hiring Managers’ Expectations: Using Big Data to answer Big Questions...
Set the Hiring Managers’ Expectations: Using Big Data to answer Big Questions...Set the Hiring Managers’ Expectations: Using Big Data to answer Big Questions...
Set the Hiring Managers’ Expectations: Using Big Data to answer Big Questions...
Textkernel
 
Machine Learning Salary Trends - Great Learning
Machine Learning Salary Trends - Great LearningMachine Learning Salary Trends - Great Learning
Machine Learning Salary Trends - Great Learning
Great Learning
 
Machine Learning in Finance: 10 Things You Need to Know in 2021
Machine Learning in Finance: 10 Things You Need to Know in 2021Machine Learning in Finance: 10 Things You Need to Know in 2021
Machine Learning in Finance: 10 Things You Need to Know in 2021
QuantUniversity
 

What's hot (20)

Merging Artificial and Human Intelligence in Recruitment
Merging Artificial and Human Intelligence in RecruitmentMerging Artificial and Human Intelligence in Recruitment
Merging Artificial and Human Intelligence in Recruitment
 
AI Recruitment - How Businesses Are Winning the Race for the Talent
AI Recruitment - How Businesses Are Winning the Race for the TalentAI Recruitment - How Businesses Are Winning the Race for the Talent
AI Recruitment - How Businesses Are Winning the Race for the Talent
 
An introduction to ML, AI and Analytics
An introduction to ML, AI and AnalyticsAn introduction to ML, AI and Analytics
An introduction to ML, AI and Analytics
 
AI and Managerial Decision Making
AI and Managerial Decision MakingAI and Managerial Decision Making
AI and Managerial Decision Making
 
How AI can help the recruitment industry
How AI can help the recruitment industryHow AI can help the recruitment industry
How AI can help the recruitment industry
 
Projects
ProjectsProjects
Projects
 
Fact or Fiction: AI in Recruitment Selection and assessments.
Fact or Fiction: AI in Recruitment Selection and assessments. Fact or Fiction: AI in Recruitment Selection and assessments.
Fact or Fiction: AI in Recruitment Selection and assessments.
 
How to get on the AI journey?
How to get on the AI journey? How to get on the AI journey?
How to get on the AI journey?
 
Impact of AI on Business Intelligence
Impact of AI on Business IntelligenceImpact of AI on Business Intelligence
Impact of AI on Business Intelligence
 
Artificial intelligence (ai) and its impact to business
Artificial intelligence (ai) and its impact to businessArtificial intelligence (ai) and its impact to business
Artificial intelligence (ai) and its impact to business
 
Naghi Prasad at AI Frontiers: Building AI systems to automate enterprise proc...
Naghi Prasad at AI Frontiers: Building AI systems to automate enterprise proc...Naghi Prasad at AI Frontiers: Building AI systems to automate enterprise proc...
Naghi Prasad at AI Frontiers: Building AI systems to automate enterprise proc...
 
Data Science Pipelines in Python using Luigi
Data Science Pipelines in Python using LuigiData Science Pipelines in Python using Luigi
Data Science Pipelines in Python using Luigi
 
Building Data Science Pipelines in Python using Luigi
Building Data Science Pipelines in Python using LuigiBuilding Data Science Pipelines in Python using Luigi
Building Data Science Pipelines in Python using Luigi
 
AI in Talent Acquisition - Talent Connect 2017
AI in Talent Acquisition - Talent Connect 2017AI in Talent Acquisition - Talent Connect 2017
AI in Talent Acquisition - Talent Connect 2017
 
Types of Blockchain, AI and its future
Types of Blockchain, AI and its futureTypes of Blockchain, AI and its future
Types of Blockchain, AI and its future
 
Ml master class cfa poland
Ml master class   cfa polandMl master class   cfa poland
Ml master class cfa poland
 
Wizenhire
WizenhireWizenhire
Wizenhire
 
Set the Hiring Managers’ Expectations: Using Big Data to answer Big Questions...
Set the Hiring Managers’ Expectations: Using Big Data to answer Big Questions...Set the Hiring Managers’ Expectations: Using Big Data to answer Big Questions...
Set the Hiring Managers’ Expectations: Using Big Data to answer Big Questions...
 
Machine Learning Salary Trends - Great Learning
Machine Learning Salary Trends - Great LearningMachine Learning Salary Trends - Great Learning
Machine Learning Salary Trends - Great Learning
 
Machine Learning in Finance: 10 Things You Need to Know in 2021
Machine Learning in Finance: 10 Things You Need to Know in 2021Machine Learning in Finance: 10 Things You Need to Know in 2021
Machine Learning in Finance: 10 Things You Need to Know in 2021
 

Similar to BMDSE v1 - Data Scientist Deck

Internship Presentation.pdf
Internship Presentation.pdfInternship Presentation.pdf
Internship Presentation.pdf
vishwajeetparmar1
 
AI-900 - Fundamental Principles of ML.pptx
AI-900 - Fundamental Principles of ML.pptxAI-900 - Fundamental Principles of ML.pptx
AI-900 - Fundamental Principles of ML.pptx
kprasad8
 
Business intelligence prof nikhat fatma mumtaz husain shaikh
Business intelligence  prof nikhat fatma mumtaz husain shaikhBusiness intelligence  prof nikhat fatma mumtaz husain shaikh
Business intelligence prof nikhat fatma mumtaz husain shaikh
Nikhat Fatma Mumtaz Husain Shaikh
 
Citizen Data Science Training using KNIME
Citizen Data Science Training using KNIMECitizen Data Science Training using KNIME
Citizen Data Science Training using KNIME
Ali Raza Anjum
 
Building High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsBuilding High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning Applications
Yalçın Yenigün
 
Barga Galvanize Sept 2015
Barga Galvanize Sept 2015Barga Galvanize Sept 2015
Barga Galvanize Sept 2015
Roger Barga
 
1440 track 2 boire_using our laptop
1440 track 2 boire_using our laptop1440 track 2 boire_using our laptop
1440 track 2 boire_using our laptop
Rising Media, Inc.
 
Machine Learning With ML.NET
Machine Learning With ML.NETMachine Learning With ML.NET
Machine Learning With ML.NET
Dev Raj Gautam
 
AI Class Topic 3: Building Machine Learning Predictive Systems (Predictive Ma...
AI Class Topic 3: Building Machine Learning Predictive Systems (Predictive Ma...AI Class Topic 3: Building Machine Learning Predictive Systems (Predictive Ma...
AI Class Topic 3: Building Machine Learning Predictive Systems (Predictive Ma...
Value Amplify Consulting
 
Practical model management in the age of Data science and ML
Practical model management in the age of Data science and MLPractical model management in the age of Data science and ML
Practical model management in the age of Data science and ML
QuantUniversity
 
Customer choice probabilities
Customer choice probabilitiesCustomer choice probabilities
Customer choice probabilities
Allan D. Butler
 
Machine intelligence data science methodology 060420
Machine intelligence data science methodology 060420Machine intelligence data science methodology 060420
Machine intelligence data science methodology 060420
Jeremy Lehman
 
Ai in finance
Ai in financeAi in finance
Ai in finance
QuantUniversity
 
From Data to Artificial Intelligence with the Machine Learning Canvas — ODSC ...
From Data to Artificial Intelligence with the Machine Learning Canvas — ODSC ...From Data to Artificial Intelligence with the Machine Learning Canvas — ODSC ...
From Data to Artificial Intelligence with the Machine Learning Canvas — ODSC ...
Louis Dorard
 
Deep learning
Deep learningDeep learning
Deep learning
Arun Shukla
 
Barga Data Science lecture 2
Barga Data Science lecture 2Barga Data Science lecture 2
Barga Data Science lecture 2
Roger Barga
 
Personal Loan Risk Assessment
Personal Loan Risk Assessment Personal Loan Risk Assessment
Personal Loan Risk Assessment
Kunal Kashyap
 
Mramadhani project presentation report version 02
Mramadhani project presentation report version 02Mramadhani project presentation report version 02
Mramadhani project presentation report version 02
Malinda Ramadhani
 
laptop price prediction presentation
laptop price prediction presentationlaptop price prediction presentation
laptop price prediction presentation
NeerajNishad4
 
credit card fraud detection
credit card fraud detectioncredit card fraud detection
credit card fraud detection
jagan477830
 

Similar to BMDSE v1 - Data Scientist Deck (20)

Internship Presentation.pdf
Internship Presentation.pdfInternship Presentation.pdf
Internship Presentation.pdf
 
AI-900 - Fundamental Principles of ML.pptx
AI-900 - Fundamental Principles of ML.pptxAI-900 - Fundamental Principles of ML.pptx
AI-900 - Fundamental Principles of ML.pptx
 
Business intelligence prof nikhat fatma mumtaz husain shaikh
Business intelligence  prof nikhat fatma mumtaz husain shaikhBusiness intelligence  prof nikhat fatma mumtaz husain shaikh
Business intelligence prof nikhat fatma mumtaz husain shaikh
 
Citizen Data Science Training using KNIME
Citizen Data Science Training using KNIMECitizen Data Science Training using KNIME
Citizen Data Science Training using KNIME
 
Building High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsBuilding High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning Applications
 
Barga Galvanize Sept 2015
Barga Galvanize Sept 2015Barga Galvanize Sept 2015
Barga Galvanize Sept 2015
 
1440 track 2 boire_using our laptop
1440 track 2 boire_using our laptop1440 track 2 boire_using our laptop
1440 track 2 boire_using our laptop
 
Machine Learning With ML.NET
Machine Learning With ML.NETMachine Learning With ML.NET
Machine Learning With ML.NET
 
AI Class Topic 3: Building Machine Learning Predictive Systems (Predictive Ma...
AI Class Topic 3: Building Machine Learning Predictive Systems (Predictive Ma...AI Class Topic 3: Building Machine Learning Predictive Systems (Predictive Ma...
AI Class Topic 3: Building Machine Learning Predictive Systems (Predictive Ma...
 
Practical model management in the age of Data science and ML
Practical model management in the age of Data science and MLPractical model management in the age of Data science and ML
Practical model management in the age of Data science and ML
 
Customer choice probabilities
Customer choice probabilitiesCustomer choice probabilities
Customer choice probabilities
 
Machine intelligence data science methodology 060420
Machine intelligence data science methodology 060420Machine intelligence data science methodology 060420
Machine intelligence data science methodology 060420
 
Ai in finance
Ai in financeAi in finance
Ai in finance
 
From Data to Artificial Intelligence with the Machine Learning Canvas — ODSC ...
From Data to Artificial Intelligence with the Machine Learning Canvas — ODSC ...From Data to Artificial Intelligence with the Machine Learning Canvas — ODSC ...
From Data to Artificial Intelligence with the Machine Learning Canvas — ODSC ...
 
Deep learning
Deep learningDeep learning
Deep learning
 
Barga Data Science lecture 2
Barga Data Science lecture 2Barga Data Science lecture 2
Barga Data Science lecture 2
 
Personal Loan Risk Assessment
Personal Loan Risk Assessment Personal Loan Risk Assessment
Personal Loan Risk Assessment
 
Mramadhani project presentation report version 02
Mramadhani project presentation report version 02Mramadhani project presentation report version 02
Mramadhani project presentation report version 02
 
laptop price prediction presentation
laptop price prediction presentationlaptop price prediction presentation
laptop price prediction presentation
 
credit card fraud detection
credit card fraud detectioncredit card fraud detection
credit card fraud detection
 

More from Sasha Lazarevic

Quantum Machine Learning for IBM AI
Quantum Machine Learning for IBM AIQuantum Machine Learning for IBM AI
Quantum Machine Learning for IBM AI
Sasha Lazarevic
 
What is Quantum Computing and Why it is Important
What is Quantum Computing and Why it is ImportantWhat is Quantum Computing and Why it is Important
What is Quantum Computing and Why it is Important
Sasha Lazarevic
 
AI and Blockchain
AI and BlockchainAI and Blockchain
AI and Blockchain
Sasha Lazarevic
 
Lean IT Transformation
Lean IT TransformationLean IT Transformation
Lean IT Transformation
Sasha Lazarevic
 
Project Risk Management - Introduction 2011
Project Risk Management - Introduction 2011Project Risk Management - Introduction 2011
Project Risk Management - Introduction 2011
Sasha Lazarevic
 
Cognitive Urban Transport
Cognitive Urban TransportCognitive Urban Transport
Cognitive Urban Transport
Sasha Lazarevic
 
DataLive conference in Geneva 2018 - Bringing AI to the Data
DataLive conference in Geneva 2018 - Bringing AI to the DataDataLive conference in Geneva 2018 - Bringing AI to the Data
DataLive conference in Geneva 2018 - Bringing AI to the Data
Sasha Lazarevic
 
Deep Learning and Watson Studio
Deep Learning and Watson StudioDeep Learning and Watson Studio
Deep Learning and Watson Studio
Sasha Lazarevic
 
Cognitive Computing and IBM Watson Solutions in FinTech Industry - 2016
Cognitive Computing and IBM Watson Solutions in FinTech Industry - 2016Cognitive Computing and IBM Watson Solutions in FinTech Industry - 2016
Cognitive Computing and IBM Watson Solutions in FinTech Industry - 2016
Sasha Lazarevic
 
What is OpenStack and the added value of IBM solutions
What is OpenStack and the added value of IBM solutionsWhat is OpenStack and the added value of IBM solutions
What is OpenStack and the added value of IBM solutions
Sasha Lazarevic
 

More from Sasha Lazarevic (10)

Quantum Machine Learning for IBM AI
Quantum Machine Learning for IBM AIQuantum Machine Learning for IBM AI
Quantum Machine Learning for IBM AI
 
What is Quantum Computing and Why it is Important
What is Quantum Computing and Why it is ImportantWhat is Quantum Computing and Why it is Important
What is Quantum Computing and Why it is Important
 
AI and Blockchain
AI and BlockchainAI and Blockchain
AI and Blockchain
 
Lean IT Transformation
Lean IT TransformationLean IT Transformation
Lean IT Transformation
 
Project Risk Management - Introduction 2011
Project Risk Management - Introduction 2011Project Risk Management - Introduction 2011
Project Risk Management - Introduction 2011
 
Cognitive Urban Transport
Cognitive Urban TransportCognitive Urban Transport
Cognitive Urban Transport
 
DataLive conference in Geneva 2018 - Bringing AI to the Data
DataLive conference in Geneva 2018 - Bringing AI to the DataDataLive conference in Geneva 2018 - Bringing AI to the Data
DataLive conference in Geneva 2018 - Bringing AI to the Data
 
Deep Learning and Watson Studio
Deep Learning and Watson StudioDeep Learning and Watson Studio
Deep Learning and Watson Studio
 
Cognitive Computing and IBM Watson Solutions in FinTech Industry - 2016
Cognitive Computing and IBM Watson Solutions in FinTech Industry - 2016Cognitive Computing and IBM Watson Solutions in FinTech Industry - 2016
Cognitive Computing and IBM Watson Solutions in FinTech Industry - 2016
 
What is OpenStack and the added value of IBM solutions
What is OpenStack and the added value of IBM solutionsWhat is OpenStack and the added value of IBM solutions
What is OpenStack and the added value of IBM solutions
 

Recently uploaded

一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
nscud
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
ewymefz
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
pchutichetpong
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
benishzehra469
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
ukgaet
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
Opendatabay
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
ewymefz
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
ArpitMalhotra16
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 

Recently uploaded (20)

一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
 
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
一比一原版(UPenn毕业证)宾夕法尼亚大学毕业证成绩单
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
 
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 
SOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape ReportSOCRadar Germany 2024 Threat Landscape Report
SOCRadar Germany 2024 Threat Landscape Report
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
standardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghhstandardisation of garbhpala offhgfffghh
standardisation of garbhpala offhgfffghh
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 

BMDSE v1 - Data Scientist Deck

  • 1. Bank Marketing Data Science Exploration Advanced Data Science Course - Capstone Project Sasha Lazarevic, Dec 2019 https://www.linkedin.com/in/LZRVC/ Data Scientist‘s Deck
  • 2. Problem Statement • A Bank would like to understand which customers will accept the bank‘s offer for a loan. A realistic dataset has been gathered as presented with 21 columns and 41188 rows, which can be used to train our model. • Some customers have accepted the offer, but majority not. What features are important ? Can we generate or imagine to add some new features that will imrove the quality of our prediction? What algorithm would be the best?
  • 3. Methodology • My data science project will follow a standard data science project methodology with phases like • Initial data exploration • Exrtact, Transform, Load • Feature Creation • Model definition • Model training • Model evaluation • Model deployment
  • 4. Toolset • I will use Watson Studio as Data Science platform with the following tools: • Data Refinery, with which I will explore my data set and try to combine data and add some features • Jupyter Notebook , which which I will explore the data, and develop machine learning models • Watson Machine Learning, which will allow me to train and deploy the models • AutoAI, which will allow me to automatically train the models and optimize the hyperparameters
  • 5. Dataset • Dataset is in csv format, uploaded in my Watson Studio Project. Here is the description of the columns • 1 - age (numeric) 2 - job : type of job (categorical: 'admin.','blue-collar','entrepreneur','housemaid','management','retired','self- employed','services','student','technician','unemployed','unknown') 3 - marital : marital status (categorical: 'divorced','married','single','unknown'; note: 'divorced' means divorced or widowed) 4 - education (categorical: basic.4y','basic.6y','basic.9y','high.school','illiterate','professional.course','university.degree','unknown') 5 - default: has credit in default? (categorical: 'no','yes','unknown') 6 - housing: has housing loan? (categorical: 'no','yes','unknown') 7 - loan: has personal loan? (categorical: 'no','yes','unknown') • related with the last contact of the current campaign: 8 - contact: contact communication type (categorical: 'cellular','telephone') 9 - month: last contact month of year (categorical: 'jan', 'feb', 'mar', ..., 'nov', 'dec') 10 - day_of_week: last contact day of the week (categorical: 'mon','tue','wed','thu','fri') 11 - duration: last contact duration, in seconds (numeric). Important note: this attribute highly affects the output target (e.g., if duration=0 then y='no'). Yet, the duration is not known before a call is performed. Also, after the end of the call y is obviously known. Thus, this input should only be included for benchmark purposes and should be discarded if the intention is to have a realistic predictive model. • other attributes: 12 - campaign: number of contacts performed during this campaign and for this client (numeric, includes last contact) 13 - pdays: number of days that passed by after the client was last contacted from a previous campaign (numeric; 999 means client was not previously contacted) 14 - previous: number of contacts performed before this campaign and for this client (numeric) 15 - poutcome: outcome of the previous marketing campaign (categorical: 'failure','nonexistent','success') • social and economic context attributes 16 - emp.var.rate: employment variation rate - quarterly indicator (numeric) 17 - cons.price.idx: consumer price index - monthly indicator (numeric) 18 - cons.conf.idx: consumer confidence index - monthly indicator (numeric) 19 - euribor3m: euribor 3 month rate - daily indicator (numeric) 20 - nr.employed: number of employees - quarterly indicator (numeric) • Output variable (desired target): 21 - y - has the client subscribed a term deposit? (binary: 'yes','no') bank-additional-full-mod.csv is the main dataset bank-data-full-mod.csv is the dataset augmented artificially with an experimental estimate about the clients‘ revenues
  • 6. Data Exploration • Exploration of the data , shape, number and types of columns • Exploration of output labels, categories and counts per category • Exploration of some features and if we can gather any intuition about their influence on the prediction outcome • Visualization of different features in the context of outcomes One of the graphics describing the features
  • 7. Data Refinery • Data Refinery is a tool inside Watson Studio which helps Data Scientists visualize and refine the data • It includes a menu with a set of operations or additional operations can be spoecified programatically • Operations are then stored in Refinery Flows, with individual steps that can be scheduled to execute as automatic jobs • For Bank Marketing Data Science Exploration project, I have created the following Refinery Flow with 6 steps, and using this flow have generated a new dataset that I imported into my Jupyter Notebook and AutoAI
  • 8. Feature Generation • First, I have created an additional column: AGE_BINNED • I have also added revenues_est
  • 9. Feature Generation • But these two new features were not so efficient as the artificially generated features that I got with AutoAI
  • 10. Model Selection • As the data is structured, I have used ensemble of learners and compared them with deep learning: • Random Forest • XGBoost • LightGBM • Keras-based deep neural network (Sequential model)
  • 11. Model Training • I have converted categorical data into numeric and scaled the data from 0 to 1. • I have splitted the dataset into training (70% ) and test (30% ) • RandomForest has a method to display the importances which is quite useful to see what features contribute the most to the predictions • Later I have trained XGBoost and LightGBM, and Keras sequential model with two layers and 344 parameters
  • 13. Metrics • I’ve used ROC and Area Under Curve for the performance metrics
  • 14. AutoAI • AutoAI is the latest functionality of Watson Studio , which allows automatic machine learning of a set of models, automatic selection of the best performing model, automatic hyperparameter tuning and feature generation
  • 15. Models Comparison • The best performing algorithm is LightGBM with ROC AUC better than what I could obtain with my selection of RandomForest, XGBoost or Keras Sequential model Scores: • RandomForest ROC AUC score: 86.28% • XGBoost ROC AUC score: 87.11% • LGBoost ROC AUC score: 91.57% • Keras Sequential ROC AUC score: 79.66% • AutoAI LightGBM best ROC AUC score: 95.1%
  • 16. Model Deployment • The best performing model coming from AutoAI is saved as BMDSE_LGBMClassifierEstimator_v4 : The model has been successfully deployed in Watson Machine Learning cloud service, which allows to use APIs in the application for the prediction in production
  • 17. Conclusions • Using standard data science methodology is key to success of data science projects • Dataset needs to be of good quality and realistic • Data Scientist needs to spend time analyzing and exploring the dataset in order to understand the features that could be used for the classification • Chose of the data science platform and tools is very important for the success of the project and for the productivity of the data scientist • Deep Learning is not suitable for all types of AI or data science problems. Structured data in tabular format can benefit from traditional machine learning algorithms like Random Forest , XGBoost or LightGBM • LightGBM proved to be more performant than Random Forest or XGBoost • AutoAI functionality of Watson Studio is very performant and can save a lot of time to Data Scientist • Watson Studio with Watson Machine Learning can also be used for deployments, which helps integrate and accelerate the whole process of data exploration, model development and deployments
  • 18. 18 Sasha Lazarevic, IBM Switzerland https://www.linkedin.com/in/LZRVC/ LZRVC.com ХВАЛА ! MERCI THANK YOU DANKE СПАСИБО Grazie