DATA SCIENCE PORTFOLIO
Jombaba.s7@gmail.com
I enjoy working with data. This portfolio highlights one of the projects I have implemented using Python-based tools and
programming platforms. The entire project is presented as five sub-projects, each capturing specific aspects of the overall work.
Imagine that a financial institution or bank wishes to solve a problem related to ‘Customer Acquisition and Customer Retention’.
As a data scientist, this is my attempt at providing a complete solution, and the series of five projects illustrates a plausible
approach to resolving the problem.
Machine Learning Project:
Purpose:
The purpose of the machine learning project is to find the best classifier for the class attribute A16 of our dataset. Essentially, we
are looking for the classifier with the highest separability measure, i.e. the one that most clearly separates the + and -
values of the class variable A16.
Dataset:
The dataset is available at https://archive.ics.uci.edu/ml/datasets/credit+approval.
The dataset comprises continuous and nominal attributes with both small and large values. For reasons of privacy, the dataset was
published with column labels A1 – A16 instead of the actual descriptive labels.
Number of instances (observations) = 690. Number of attributes = 15 (columns A1-A15). There is one class attribute (column A16):
307 instances (44.5%) have the class value “+” while 383 (55.5%) have the class value “-”.
Attribute Label    Value Type
A1                 Nominal
A2                 Continuous
A3                 Continuous
A4                 Nominal
A5                 Nominal
A6                 Nominal
A7                 Nominal
A8                 Continuous
A9                 Nominal
A10                Nominal
A11                Continuous (Integer)
A12                Nominal
A13                Nominal
A14                Continuous (Integer)
A15                Continuous (Integer)
A16                Class attribute
Process:
The pre-processing exercise of cleaning up the ‘Credit Card Application’ dataset provided a thoroughly balanced dataset for the
machine learning stage. In particular, the 67 missing values were replaced with statistically derived substitutes.
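The replacement of missing values can be sketched as follows. The text does not say which statistics were used, so this minimal example assumes the median for continuous attributes and the mode for nominal ones; the function name `impute_column` and the toy values are hypothetical.

```python
from statistics import median, mode

def impute_column(values, kind):
    """Fill None entries with the median (continuous) or the mode (nominal).

    The choice of median/mode is an assumption; the portfolio only says
    'statistically derived substitutes'.
    """
    observed = [v for v in values if v is not None]
    fill = median(observed) if kind == "continuous" else mode(observed)
    return [fill if v is None else v for v in values]

# Toy columns for illustration only, not taken from the real dataset.
a2 = [30.8, None, 24.5, 58.7]   # a continuous attribute with one gap
a1 = ["b", "a", None, "b"]      # a nominal attribute with one gap

a2_filled = impute_column(a2, "continuous")
a1_filled = impute_column(a1, "nominal")
```

Here the gap in `a2` is filled with 30.8 (the median of the observed values) and the gap in `a1` with "b" (the most frequent label).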
The dataset was further evaluated using cross-validation against the standard deviation or mean of the error. For reasons including
high correlation between columns, outliers, and variance below 0.005, some rows and columns of the dataset were eliminated.
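A rough sketch of the column filtering described above is given here. Only the variance cut-off of 0.005 comes from the text; the correlation threshold of 0.9, the function names, and the toy columns are assumptions made for illustration, and the removal of outlier rows is not covered.

```python
from statistics import pvariance

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def select_columns(columns, min_variance=0.005, max_abs_corr=0.9):
    """Drop near-constant columns, then drop any column that is highly
    correlated with a column already kept (0.9 is an assumed threshold)."""
    kept = {name: vals for name, vals in columns.items()
            if pvariance(vals) >= min_variance}
    selected = []
    for name, vals in kept.items():
        if all(abs(pearson(vals, kept[other])) <= max_abs_corr
               for other in selected):
            selected.append(name)
    return selected

# Toy numeric columns for illustration only.
toy = {
    "x1": [0.1, 0.4, 0.5, 0.9],
    "x2": [0.2, 0.8, 1.0, 1.8],      # exactly 2 * x1, so perfectly correlated
    "x3": [0.50, 0.50, 0.51, 0.50],  # variance far below 0.005
    "x4": [0.9, 0.1, 0.7, 0.2],
}
selected = select_columns(toy)
```

With these toy values, `x3` is removed for low variance and `x2` for being a rescaled copy of `x1`, leaving `x1` and `x4`.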
Hence, our analysis is based on a dataset comprising 538 records and 12 variables. Using stratified sampling, the dataset was
partitioned into an 80% training set and a 20% testing set. We developed three predictors in order to compare the fit of the Decision Tree,
Logistic Regression and Tree Ensemble models.
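The stratified 80/20 partition can be illustrated with a small library-free sketch. The function name, the random seed, and the class mix of the 538 cleaned records are assumptions, since the text does not report the class counts after cleaning.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) so that each class contributes
    roughly test_fraction of its records to the test set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Hypothetical class mix for the 538 cleaned records (assumed, not reported).
labels = ["+"] * 200 + ["-"] * 338
train_idx, test_idx = stratified_split(labels)
```

Under this assumed mix, the split yields 430 training records and 108 testing records, and both partitions keep the same share of "+" records as the full set.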
We use the ROC-AUC as our goodness-of-fit metric for the ability of each model to predict the value of the class variable, A16. We also
describe how the chosen model is deployed.
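For readers unfamiliar with the metric, the ROC-AUC can be computed as the probability that a randomly chosen "+" record receives a higher score than a randomly chosen "-" record. The sketch below uses that pairwise definition; the function name and the toy scores are illustrative and do not come from the project.

```python
def roc_auc(y_true, scores, positive="+"):
    """AUC as the share of (positive, negative) pairs in which the positive
    record has the higher score; tied pairs count as half."""
    pos = [s for y, s in zip(y_true, scores) if y == positive]
    neg = [s for y, s in zip(y_true, scores) if y != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy labels and model scores for illustration only.
y_true = ["+", "+", "-", "+", "-", "-"]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

auc = roc_auc(y_true, scores)
auc_reversed = roc_auc(y_true, [-s for s in scores])
```

A value of 1.0 means perfect separation of the two classes and 0.5 corresponds to random ranking. Reversing the scores turns an AUC of a into 1 - a, which is why a value below 0.5 indicates a ranking that is worse than chance.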
Descriptions of the three models are highlighted below.
The Decision Tree model:
The Receiver Operating Characteristic (ROC) curve for the Decision Tree model
With a ROC-AUC value of 0.823 and an accuracy of 0.806, this is the best-fitting of the three models we evaluated. Furthermore, the
model is stable and exhibits the following characteristics.
The confusion matrix for the decision tree model.
Accuracy statistics for the decision tree model.
The Logistic Regression model:
The Receiver Operating Characteristic (ROC) curve for the Logistic Regression model
This is an unstable model with a ROC-AUC value of 0.764 and an accuracy of 0.787. Furthermore, the model exhibits the following
characteristics.
The confusion matrix for the logistic regression model.
Accuracy statistics for the logistic regression model.
The Tree Ensemble model:
The Receiver Operating Characteristic (ROC) curve for the Tree Ensemble model
This is a poor and unstable model: its ROC-AUC value of 0.3969 is below 0.5, the level expected from random guessing. The model has an
accuracy of 0.769 and exhibits the following characteristics.
The confusion matrix for the tree ensemble model.
Accuracy statistics for the tree ensemble model.
Preferred Model
The decision tree model is best at separating the + and – values of the class variable A16. With an AUC value of 0.823, the model
separates these classes quite efficiently.
Furthermore, with an accuracy of 0.806, the decision tree model is preferred to the other models. This is an indication that this
model is more dependable and steadier than the other two.
Observe also that its true positive rate is the highest, standing at 0.775. This indicates that the decision tree model is the
most sensitive in correctly predicting positive responses (recall), i.e. records where the value of variable A16 is +.
Also, the specificity of this model stands at 0.824. Thus, the true negative rate (specificity) is being efficiently predicted. Though it
does not have the highest score on every characteristic, the decision tree model is preferable to the other two models.
Furthermore, the precision (positive predictive value) is fairly high at 0.721. Again, though this is not the highest among the three
models we considered, the decision tree model is the most stable.
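The four statistics quoted above all derive from the confusion matrix, as the sketch below shows. The original confusion matrix is a figure that is not reproduced here, so the counts used are an assumption: they are one set of whole numbers over 108 test records that is consistent with the reported sensitivity, specificity, precision and accuracy.

```python
# Counts inferred from the reported rates for the decision tree model;
# they are an assumption, not values read from the original figure.
TP, FN, FP, TN = 31, 9, 12, 56

sensitivity = TP / (TP + FN)                  # true positive rate (recall)
specificity = TN / (TN + FP)                  # true negative rate
precision = TP / (TP + FP)                    # positive predictive value
accuracy = (TP + TN) / (TP + FN + FP + TN)    # share of correct predictions

summary = {
    "sensitivity": round(sensitivity, 3),
    "specificity": round(specificity, 3),
    "precision": round(precision, 3),
    "accuracy": round(accuracy, 3),
}
```

With these assumed counts, the rounded values reproduce the figures quoted in the text: 0.775, 0.824, 0.721 and 0.806.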
Deployment of the Decision Tree Model:
108 records of the dataset were used in testing the model. This is an extract of the predicted values using the decision tree model.
Pay attention to the two columns that are arrowed.
Deployment of the Logistic Regression Model:
108 records of the dataset were used in testing the model. This is an extract of the predicted values using the logistic regression
model. Pay attention to the two columns that are arrowed.
Deployment of the Tree Ensemble Model:
108 records of the dataset were used in testing the model. This is an extract of the predicted values using the tree ensemble model.
Pay attention to the two columns that are arrowed.
Effect of the preferred model:
We consider the likely impact, in terms of dollars saved, that our choice of model might have on the business of the credit provider. We base
our deductions on the following assumptions:
Assumptions
1: It costs a bank or credit provider about $250.00 to acquire each new customer. This is based on an estimate by
Mercator Advisory Group, an advisor to the payments and banking industries.
2: At 15% APR, the bank or credit provider will generate interest of between $400.00 and $450.00 per year on every
$3,000.00 of credit provided to a customer. This is in line with industry standards.
3: We do not concern ourselves with other fees (transaction, etc.) that are charged to cardholders.
4: The + value of the class variable represents a successful application, while - represents a failed application.
Business Impact
We use the tree ensemble model as the baseline for our estimates. For our purpose, it is reasonable to compare the tree
ensemble and decision tree models on accuracy and sensitivity, i.e. the percentage of actual + cases that the algorithm
classified correctly.
Using the Sensitivity measure
The decision tree model has a sensitivity of 0.775, which this analysis reads as about 22.5% of applicants being expected to fail. As a
result of using this model, it can be expected that about $5,625.00 of every $25,000.00 spent on acquiring 100 customers is wasted
on applicants who do not get approved.
Similarly, with a sensitivity of 0.55, the tree ensemble model implies that about 45.0% of applicants can be expected to fail.
Hence, using this model, it can be expected that about $11,250.00 of every $25,000.00 spent on acquiring 100 customers is wasted
on applicants who do not get approved.
Consequently, using the decision tree model instead of the tree ensemble model can be expected to save the credit provider up to 50.0%
of the funds wasted on applicants who do not get approved. Thus, the management of credit card applications can be made more
efficient by adopting an improved model in order to achieve higher levels of savings.
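The arithmetic behind these estimates can be checked with a short sketch. It follows the reading used in this analysis, in which a share of (1 - sensitivity) of the acquisition budget is treated as wasted; the function and variable names are illustrative.

```python
ACQUISITION_COST = 250.00   # assumption 1: cost in dollars per new customer
CUSTOMERS = 100             # size of the acquisition batch used in the text

def wasted_spend(sensitivity, customers=CUSTOMERS, cost=ACQUISITION_COST):
    """Acquisition spend attributed to failed applicants, taking
    (1 - sensitivity) as the expected share of failures."""
    budget = customers * cost
    return (1 - sensitivity) * budget

tree_waste = wasted_spend(0.775)       # decision tree model
ensemble_waste = wasted_spend(0.55)    # tree ensemble model (baseline)

# Relative reduction in wasted spend when switching to the decision tree.
relative_saving = (ensemble_waste - tree_waste) / ensemble_waste
```

Rounded to the cent, this gives $5,625.00 and $11,250.00 of wasted spend per $25,000.00 budget, and a relative saving of 50%.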
tyJA

More Related Content

What's hot

840 plenary elder_using his laptop
840 plenary elder_using his laptop840 plenary elder_using his laptop
840 plenary elder_using his laptopRising Media, Inc.
 
1225 lunchlearn shekhar_using his mac
1225 lunchlearn shekhar_using his mac1225 lunchlearn shekhar_using his mac
1225 lunchlearn shekhar_using his macRising Media, Inc.
 
Logistic regression with low event rate (rare events)
Logistic regression with low event rate (rare events)Logistic regression with low event rate (rare events)
Logistic regression with low event rate (rare events)Tejamoy Ghosh
 
Hypothesis Testing: Relationships (Compare 2+ Factors)
Hypothesis Testing: Relationships (Compare 2+ Factors)Hypothesis Testing: Relationships (Compare 2+ Factors)
Hypothesis Testing: Relationships (Compare 2+ Factors)Matt Hansen
 
Foundational Methodology for Data Science
Foundational Methodology for Data ScienceFoundational Methodology for Data Science
Foundational Methodology for Data ScienceJohn B. Rollins, Ph.D.
 
PREDICTING BANKRUPTCY USING MACHINE LEARNING ALGORITHMS
PREDICTING BANKRUPTCY USING MACHINE LEARNING ALGORITHMSPREDICTING BANKRUPTCY USING MACHINE LEARNING ALGORITHMS
PREDICTING BANKRUPTCY USING MACHINE LEARNING ALGORITHMSIJCI JOURNAL
 
Step by Step guide to executing an analytics project
Step by Step guide to executing an analytics projectStep by Step guide to executing an analytics project
Step by Step guide to executing an analytics projectRamkumar Ravichandran
 
Business Development Analysis
Business Development Analysis Business Development Analysis
Business Development Analysis Manpreet Chandhok
 
Big Data - To Explain or To Predict? Talk at U Toronto's Rotman School of Ma...
Big Data - To Explain or To Predict?  Talk at U Toronto's Rotman School of Ma...Big Data - To Explain or To Predict?  Talk at U Toronto's Rotman School of Ma...
Big Data - To Explain or To Predict? Talk at U Toronto's Rotman School of Ma...Galit Shmueli
 
To explain or to predict
To explain or to predictTo explain or to predict
To explain or to predictGalit Shmueli
 
CART Classification and Regression Trees Experienced User Guide
CART Classification and Regression Trees Experienced User GuideCART Classification and Regression Trees Experienced User Guide
CART Classification and Regression Trees Experienced User GuideSalford Systems
 
Predicting Employee Attrition
Predicting Employee AttritionPredicting Employee Attrition
Predicting Employee AttritionMohamad Sahil
 
Andrii Belas: A/B testing overview: use-cases, theory and tools
Andrii Belas: A/B testing overview: use-cases, theory and toolsAndrii Belas: A/B testing overview: use-cases, theory and tools
Andrii Belas: A/B testing overview: use-cases, theory and toolsLviv Startup Club
 
Ijaems apr-2016-23 Study of Pruning Techniques to Predict Efficient Business ...
Ijaems apr-2016-23 Study of Pruning Techniques to Predict Efficient Business ...Ijaems apr-2016-23 Study of Pruning Techniques to Predict Efficient Business ...
Ijaems apr-2016-23 Study of Pruning Techniques to Predict Efficient Business ...INFOGAIN PUBLICATION
 

What's hot (19)

1440 track3 galusha
1440 track3 galusha1440 track3 galusha
1440 track3 galusha
 
840 plenary elder_using his laptop
840 plenary elder_using his laptop840 plenary elder_using his laptop
840 plenary elder_using his laptop
 
1225 lunchlearn shekhar_using his mac
1225 lunchlearn shekhar_using his mac1225 lunchlearn shekhar_using his mac
1225 lunchlearn shekhar_using his mac
 
Logistic regression with low event rate (rare events)
Logistic regression with low event rate (rare events)Logistic regression with low event rate (rare events)
Logistic regression with low event rate (rare events)
 
Hypothesis Testing: Relationships (Compare 2+ Factors)
Hypothesis Testing: Relationships (Compare 2+ Factors)Hypothesis Testing: Relationships (Compare 2+ Factors)
Hypothesis Testing: Relationships (Compare 2+ Factors)
 
Foundational Methodology for Data Science
Foundational Methodology for Data ScienceFoundational Methodology for Data Science
Foundational Methodology for Data Science
 
PREDICTING BANKRUPTCY USING MACHINE LEARNING ALGORITHMS
PREDICTING BANKRUPTCY USING MACHINE LEARNING ALGORITHMSPREDICTING BANKRUPTCY USING MACHINE LEARNING ALGORITHMS
PREDICTING BANKRUPTCY USING MACHINE LEARNING ALGORITHMS
 
Step by Step guide to executing an analytics project
Step by Step guide to executing an analytics projectStep by Step guide to executing an analytics project
Step by Step guide to executing an analytics project
 
Business Development Analysis
Business Development Analysis Business Development Analysis
Business Development Analysis
 
Big Data - To Explain or To Predict? Talk at U Toronto's Rotman School of Ma...
Big Data - To Explain or To Predict?  Talk at U Toronto's Rotman School of Ma...Big Data - To Explain or To Predict?  Talk at U Toronto's Rotman School of Ma...
Big Data - To Explain or To Predict? Talk at U Toronto's Rotman School of Ma...
 
To explain or to predict
To explain or to predictTo explain or to predict
To explain or to predict
 
Predictive Modeling with Enterprise Miner
Predictive Modeling with Enterprise MinerPredictive Modeling with Enterprise Miner
Predictive Modeling with Enterprise Miner
 
Predictive Modelling
Predictive ModellingPredictive Modelling
Predictive Modelling
 
CART Classification and Regression Trees Experienced User Guide
CART Classification and Regression Trees Experienced User GuideCART Classification and Regression Trees Experienced User Guide
CART Classification and Regression Trees Experienced User Guide
 
Bank loan purchase modeling
Bank loan purchase modelingBank loan purchase modeling
Bank loan purchase modeling
 
Dmml report final
Dmml report finalDmml report final
Dmml report final
 
Predicting Employee Attrition
Predicting Employee AttritionPredicting Employee Attrition
Predicting Employee Attrition
 
Andrii Belas: A/B testing overview: use-cases, theory and tools
Andrii Belas: A/B testing overview: use-cases, theory and toolsAndrii Belas: A/B testing overview: use-cases, theory and tools
Andrii Belas: A/B testing overview: use-cases, theory and tools
 
Ijaems apr-2016-23 Study of Pruning Techniques to Predict Efficient Business ...
Ijaems apr-2016-23 Study of Pruning Techniques to Predict Efficient Business ...Ijaems apr-2016-23 Study of Pruning Techniques to Predict Efficient Business ...
Ijaems apr-2016-23 Study of Pruning Techniques to Predict Efficient Business ...
 

Similar to Machine learning project

Post Graduate Admission Prediction System
Post Graduate Admission Prediction SystemPost Graduate Admission Prediction System
Post Graduate Admission Prediction SystemIRJET Journal
 
German credit score shivaram prakash
German credit score shivaram prakashGerman credit score shivaram prakash
German credit score shivaram prakashShivaram Prakash
 
Review Parameters Model Building & Interpretation and Model Tunin.docx
Review Parameters Model Building & Interpretation and Model Tunin.docxReview Parameters Model Building & Interpretation and Model Tunin.docx
Review Parameters Model Building & Interpretation and Model Tunin.docxcarlstromcurtis
 
Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests Derek Kane
 
Proposing an Appropriate Pattern for Car Detection by Using Intelligent Algor...
Proposing an Appropriate Pattern for Car Detection by Using Intelligent Algor...Proposing an Appropriate Pattern for Car Detection by Using Intelligent Algor...
Proposing an Appropriate Pattern for Car Detection by Using Intelligent Algor...Editor IJCATR
 
Creating an Explainable Machine Learning Algorithm
Creating an Explainable Machine Learning AlgorithmCreating an Explainable Machine Learning Algorithm
Creating an Explainable Machine Learning AlgorithmBill Fite
 
Explainable Machine Learning
Explainable Machine LearningExplainable Machine Learning
Explainable Machine LearningBill Fite
 
Krishna Chaitanya Yarlagadda Main Poster- Support Vector machines
Krishna Chaitanya Yarlagadda Main Poster- Support Vector machinesKrishna Chaitanya Yarlagadda Main Poster- Support Vector machines
Krishna Chaitanya Yarlagadda Main Poster- Support Vector machinesKrishna Chaitanya Yarlagadda
 
Prediction of customer propensity to churn - Telecom Industry
Prediction of customer propensity to churn - Telecom IndustryPrediction of customer propensity to churn - Telecom Industry
Prediction of customer propensity to churn - Telecom IndustryPranov Mishra
 
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...ThinkInnovation
 
6©iStockphotoThinkstockModels and ForecastingLear.docx
6©iStockphotoThinkstockModels and ForecastingLear.docx6©iStockphotoThinkstockModels and ForecastingLear.docx
6©iStockphotoThinkstockModels and ForecastingLear.docxsodhi3
 
6©iStockphotoThinkstockModels and ForecastingLear.docx
6©iStockphotoThinkstockModels and ForecastingLear.docx6©iStockphotoThinkstockModels and ForecastingLear.docx
6©iStockphotoThinkstockModels and ForecastingLear.docxblondellchancy
 
MIS637_Final_Project_Rahul_Bhatia
MIS637_Final_Project_Rahul_BhatiaMIS637_Final_Project_Rahul_Bhatia
MIS637_Final_Project_Rahul_BhatiaRahul Bhatia
 
Guide for building GLMS
Guide for building GLMSGuide for building GLMS
Guide for building GLMSAli T. Lotia
 
Machine learning in credit risk modeling : a James white paper
Machine learning in credit risk modeling : a James white paperMachine learning in credit risk modeling : a James white paper
Machine learning in credit risk modeling : a James white paperJames by CrowdProcess
 
Neural Network Model
Neural Network ModelNeural Network Model
Neural Network ModelEric Esajian
 
Machine Learning Project - 1994 U.S. Census
Machine Learning Project - 1994 U.S. CensusMachine Learning Project - 1994 U.S. Census
Machine Learning Project - 1994 U.S. CensusTim Enalls
 
HRUG - Linear regression with R
HRUG - Linear regression with RHRUG - Linear regression with R
HRUG - Linear regression with Regoodwintx
 

Similar to Machine learning project (20)

Post Graduate Admission Prediction System
Post Graduate Admission Prediction SystemPost Graduate Admission Prediction System
Post Graduate Admission Prediction System
 
German credit score shivaram prakash
German credit score shivaram prakashGerman credit score shivaram prakash
German credit score shivaram prakash
 
Review Parameters Model Building & Interpretation and Model Tunin.docx
Review Parameters Model Building & Interpretation and Model Tunin.docxReview Parameters Model Building & Interpretation and Model Tunin.docx
Review Parameters Model Building & Interpretation and Model Tunin.docx
 
Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests
 
Proposing an Appropriate Pattern for Car Detection by Using Intelligent Algor...
Proposing an Appropriate Pattern for Car Detection by Using Intelligent Algor...Proposing an Appropriate Pattern for Car Detection by Using Intelligent Algor...
Proposing an Appropriate Pattern for Car Detection by Using Intelligent Algor...
 
Creating an Explainable Machine Learning Algorithm
Creating an Explainable Machine Learning AlgorithmCreating an Explainable Machine Learning Algorithm
Creating an Explainable Machine Learning Algorithm
 
Explainable Machine Learning
Explainable Machine LearningExplainable Machine Learning
Explainable Machine Learning
 
Krishna Chaitanya Yarlagadda Main Poster- Support Vector machines
Krishna Chaitanya Yarlagadda Main Poster- Support Vector machinesKrishna Chaitanya Yarlagadda Main Poster- Support Vector machines
Krishna Chaitanya Yarlagadda Main Poster- Support Vector machines
 
Employee mode of commuting
Employee mode of commutingEmployee mode of commuting
Employee mode of commuting
 
Prediction of customer propensity to churn - Telecom Industry
Prediction of customer propensity to churn - Telecom IndustryPrediction of customer propensity to churn - Telecom Industry
Prediction of customer propensity to churn - Telecom Industry
 
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
 
6©iStockphotoThinkstockModels and ForecastingLear.docx
6©iStockphotoThinkstockModels and ForecastingLear.docx6©iStockphotoThinkstockModels and ForecastingLear.docx
6©iStockphotoThinkstockModels and ForecastingLear.docx
 
6©iStockphotoThinkstockModels and ForecastingLear.docx
6©iStockphotoThinkstockModels and ForecastingLear.docx6©iStockphotoThinkstockModels and ForecastingLear.docx
6©iStockphotoThinkstockModels and ForecastingLear.docx
 
MIS637_Final_Project_Rahul_Bhatia
MIS637_Final_Project_Rahul_BhatiaMIS637_Final_Project_Rahul_Bhatia
MIS637_Final_Project_Rahul_Bhatia
 
Guide for building GLMS
Guide for building GLMSGuide for building GLMS
Guide for building GLMS
 
Machine learning in credit risk modeling : a James white paper
Machine learning in credit risk modeling : a James white paperMachine learning in credit risk modeling : a James white paper
Machine learning in credit risk modeling : a James white paper
 
Neural Network Model
Neural Network ModelNeural Network Model
Neural Network Model
 
Machine Learning Project - 1994 U.S. Census
Machine Learning Project - 1994 U.S. CensusMachine Learning Project - 1994 U.S. Census
Machine Learning Project - 1994 U.S. Census
 
HRUG - Linear regression with R
HRUG - Linear regression with RHRUG - Linear regression with R
HRUG - Linear regression with R
 
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 

Recently uploaded

Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...shivangimorya083
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Digi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxDigi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxTanveerAhmed817946
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...Suhani Kapoor
 

Recently uploaded (20)

Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Machine learning project

  • 1. DATA SCIENCE PORTFOLIO
    Jombaba.s7@gmail.com
    I enjoy working with data, and this portfolio highlights one of the projects I have implemented using Python-based tools and programming platforms. The entire project is presented as five sub-projects, each capturing a specific part of the work.
    Imagine that a financial institution or bank wishes to solve a problem related to ‘Customer Acquisition and Customer Retention’. As a data scientist, this is my attempt at providing a complete solution, and the series of five projects illustrates a plausible approach to resolving the problem.
    Machine Learning Project:
    Purpose:
    The purpose of the machine learning project is to find the best classifier for the class attribute A16 of our dataset. Essentially, we are looking for the classifier with the highest separability measure, i.e. the one that most clearly separates the + and - values of the class variable A16.
    Dataset:
    The dataset is available at: https://archive.ics.uci.edu/ml/datasets/credit+approval
    The dataset comprises continuous and nominal attributes with both small and large values. For reasons of privacy, the dataset was published with column labels A1 to A16 instead of the actual descriptive labels.
    Number of instances (observations) = 690. Number of attributes = 15 (columns A1-A15). There is one class attribute (column A16): 307 instances (44.5%) have the class value “+” and 383 (55.5%) have the class value “-”.
    Attribute Label    Value Type
    A1                 Nominal
    A2                 Continuous
    A3                 Continuous
    A4                 Nominal
    A5                 Nominal
    A6                 Nominal
    A7                 Nominal
    A8                 Continuous
    A9                 Nominal
    A10                Nominal
    A11                Continuous (Integer)
    A12                Nominal
    A13                Nominal
    A14                Continuous (Integer)
    A15                Continuous (Integer)
    A16                Class attribute
  • 2. Process:
    Pre-processing cleaned up the ‘Credit Card Application’ dataset and provided a thoroughly balanced dataset for the machine learning stage. Basically, the 67 missing values were replaced with statistically derived substitutes. The dataset was further evaluated by cross-validation against a given standard deviation or mean of error. Some rows and columns were eliminated for reasons including high correlation between columns, outliers, and variance below 0.005.
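The clean-up steps described above can be sketched roughly as follows. This is only an illustration: the transcript does not say which "statistically derived substitutes" were used, so filling continuous columns with the mean and nominal columns with the mode is an assumption, as is the use of population variance for the 0.005 threshold. The function names and toy columns are hypothetical and are not taken from the dataset.

```python
from statistics import mean, mode, pvariance

MISSING = None  # placeholder for a missing cell in this sketch


def impute_column(values, kind):
    """Fill missing cells with the column mean (continuous) or mode (nominal).

    Assumption: the source only says 'statistically derived substitutes'.
    """
    observed = [v for v in values if v is not MISSING]
    fill = mean(observed) if kind == "continuous" else mode(observed)
    return [fill if v is MISSING else v for v in values]


def has_low_variance(values, threshold=0.005):
    """Flag a numeric column whose (population) variance is below the threshold."""
    return pvariance(values) < threshold


# Toy columns for illustration only
toy_continuous = [30.0, None, 22.0, 26.0]
toy_nominal = ["b", "a", None, "b"]
toy_flat = [0.50, 0.51, 0.50, 0.49]

filled_continuous = impute_column(toy_continuous, "continuous")
filled_nominal = impute_column(toy_nominal, "nominal")

print(filled_continuous)
print(filled_nominal)
print(has_low_variance(toy_flat), has_low_variance(filled_continuous))
```

A column flagged by `has_low_variance` would be a candidate for removal; the correlation and outlier checks mentioned in the text are not sketched here.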
  • 3. Hence, our analysis is based on a dataset comprising 538 records and 12 variables. Using stratified sampling, the dataset was partitioned into an 80% training set and a 20% testing set. We developed three predictors in order to compare the fit of the Decision Tree, Logistic Regression and Tree Ensemble models. We use ROC-AUC as the goodness-of-fit metric for each model's ability to predict the value of the class variable, A16. Deployment of the chosen model is also discussed. The three models are described below.
    The Decision Tree model:
    [Figure: Receiver Operating Characteristic (ROC) curve for the Decision Tree model]
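As a rough illustration of the partitioning and evaluation steps, the sketch below shows a stratified 80/20 split and a pairwise ROC-AUC computation in plain Python. It is not the tooling used in the project (the transcript does not name it); the function names, the seed, and the toy labels and scores are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), keeping class proportions in both parts."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def roc_auc(labels, scores, positive="+"):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form equals the area under
    the ROC curve.
    """
    pos = [s for y, s in zip(labels, scores) if y == positive]
    neg = [s for y, s in zip(labels, scores) if y != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data for illustration only
toy_labels = ["+"] * 40 + ["-"] * 60
train_idx, test_idx = stratified_split(toy_labels)
print(len(train_idx), len(test_idx))

auc = roc_auc(["+", "+", "-", "-"], [0.9, 0.4, 0.6, 0.1])
print(auc)
```

An AUC of 1.0 means every positive is ranked above every negative, 0.5 corresponds to random ranking, and values below 0.5 mean the scores rank negatives above positives more often than not.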
  • 4. With a ROC-AUC value of 0.823 and an accuracy of 0.806, this is the best-fitting of the three models we evaluated. Furthermore, the model is stable and exhibits the following characteristics.
    [Figure: confusion matrix for the Decision Tree model]
  • 5. [Figure: accuracy statistics for the Decision Tree model]
    The Logistic Regression model:
    [Figure: Receiver Operating Characteristic (ROC) curve for the Logistic Regression model]
  • 6. This is an unstable model with a ROC-AUC value of 0.764 and an accuracy of 0.787. Furthermore, the model exhibits the following characteristics.
    [Figure: confusion matrix for the Logistic Regression model]
  • 7. [Figure: accuracy statistics for the Logistic Regression model]
    The Tree Ensemble model:
    [Figure: Receiver Operating Characteristic (ROC) curve for the Tree Ensemble model]
  • 8. This is a poor and unstable model with a ROC-AUC value of 0.3969. The model has an accuracy of 0.769 and exhibits the following characteristics.
  • 9. [Figure: confusion matrix for the Tree Ensemble model]
    [Figure: accuracy statistics for the Tree Ensemble model]
    Preferred Model
    The decision tree model is best at separating the + and - values of the class variable A16. With an AUC value of 0.823, the model separates these classes quite efficiently. Furthermore, with an accuracy of 0.806, the decision tree model is preferred to the other models. This indicates that the model is more dependable and steadier than the other two.
    Observe also that its true positive rate (sensitivity, or recall) is the highest, at 0.775. This indicates that the decision tree model is the most sensitive in correctly predicting a positive response, i.e. where the value of variable A16 is +. The specificity (true negative rate) of this model is 0.824, so negative cases are also predicted efficiently. Although the decision tree does not have the highest score on every characteristic, it is preferable to the other two models. Its precision (positive predictive value) is also fairly high at 0.721. Again, although this is not the highest among the three models we considered, the decision tree model is the most stable.
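The statistics quoted above all derive from a confusion matrix, and the short sketch below shows how. The confusion matrix itself is not reproduced in this transcript, so the counts used here (31 true positives, 9 false negatives, 12 false positives, 56 true negatives) are an assumption: one set of counts over 108 test records that happens to be consistent with the reported rates, used purely for illustration.

```python
def classification_metrics(tp, fn, fp, tn):
    """Standard rates computed from the four cells of a binary confusion matrix."""
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "sensitivity": tp / (tp + fn),   # true positive rate, recall
        "specificity": tn / (tn + fp),   # true negative rate
        "precision": tp / (tp + fp),     # positive predictive value
    }


# Hypothetical counts (not from the source) chosen to match the reported rates
metrics = classification_metrics(tp=31, fn=9, fp=12, tn=56)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts the four rates round to 0.806, 0.775, 0.824 and 0.721, matching the figures reported for the decision tree model.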
  • 10. Deployment of the Decision Tree Model: 108 records of the dataset were used to test the model. This is an extract of the values predicted by the decision tree model. Pay attention to the two arrowed columns.
  • 11. Deployment of the Logistic Regression Model: 108 records of the dataset were used to test the model. This is an extract of the values predicted by the logistic regression model. Pay attention to the two arrowed columns.
  • 12. Deployment of the Tree Ensemble Model: 108 records of the dataset were used to test the model. This is an extract of the values predicted by the tree ensemble model. Pay attention to the two arrowed columns.
  • 13. Effect of the preferred model:
    We consider the likely impact, in dollars saved, that our choice of model might have on the business of the credit provider. We base our deductions on the following assumptions.
    Assumptions
    1: It costs a bank or credit-offering company about $250.00 to acquire each new customer. This is based on the estimate of Mercator Advisory Group, a leading trusted advisor to the payments and banking industries.
    2: At 15% APR, the bank or credit provider will generate interest of between $400.00 and $450.00 per person per year on every $3,000.00 of credit provided to the customer. This is in line with industry standards.
    3: We do not concern ourselves with other fees (transaction, etc.) that are charged to cardholders.
    4: The + value of the class variable represents a successful application, while - represents a failed application.
    Business Impact
    We use the tree ensemble model as the baseline for our estimates. For our purpose, it is reasonable to compare the accuracy and sensitivity of the tree ensemble and decision tree models, where sensitivity is the percentage of all actual + cases that the algorithm classified correctly.
    Using the Sensitivity measure
    The decision tree model has a sensitivity of 0.775, which we take to mean that about 22.5% (1 - 0.775) of applicants can be expected to fail. As a result of using this model, it can be expected that about $5,625.00 of every $25,000.00 spent on acquiring 100 customers (100 × $250.00) is wasted on applicants who do not get approved.
    Similarly, with a sensitivity of 0.55, the tree ensemble model implies that about 45.0% of applicants can be expected to fail. Hence, using this model, it can be expected that about $11,250.00 of every $25,000.00 spent on acquiring 100 customers is wasted on applicants who do not get approved.
    Consequently, the decision tree model can be expected to save the credit provider up to 50.0% of the funds wasted on applicants who do not get approved.
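The arithmetic behind these estimates can be checked with a few lines of Python. The sketch follows the reasoning of the text, which treats 1 - sensitivity as the share of applicants expected to fail; the function and variable names are hypothetical.

```python
ACQUISITION_COST = 250.00  # dollars per new customer (assumption 1)
CUSTOMERS = 100            # size of the acquisition batch used in the text


def wasted_spend(sensitivity, customers=CUSTOMERS, cost=ACQUISITION_COST):
    """Acquisition spend attributed to applicants expected to fail.

    Follows the text in reading (1 - sensitivity) as the expected failure share.
    """
    return (1 - sensitivity) * customers * cost


decision_tree_waste = wasted_spend(0.775)
tree_ensemble_waste = wasted_spend(0.55)
relative_saving = (tree_ensemble_waste - decision_tree_waste) / tree_ensemble_waste

# Upper end of the interest range in assumption 2: 15% APR on $3,000.00
annual_interest = 0.15 * 3000.00

print(f"Decision tree waste: ${decision_tree_waste:,.2f}")
print(f"Tree ensemble waste: ${tree_ensemble_waste:,.2f}")
print(f"Relative saving:     {relative_saving:.1%}")
print(f"Annual interest:     ${annual_interest:,.2f}")
```

The total batch cost is 100 × $250.00 = $25,000.00, so the two waste figures are 22.5% and 45.0% of that amount, and the saving is the difference between them relative to the tree ensemble figure.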
  • 14. Based on these results, using the decision tree model instead of the tree ensemble model is expected to reduce the funds wasted on failed applicants by up to 50%. Thus, the management of credit card applications can be made more efficient by adopting an improved model, achieving higher levels of savings.