Data Analysis on
Bank Marketing Data Set
Anish Bhanushali
Information about dataset
• UCI machine learning repository link:
https://archive.ics.uci.edu/ml/datasets/Bank+Marketing
• The dataset has 20 input attributes.
• Attributes 2-10 and 15 are categorical.
• The 21st attribute, named ‘y’, is the output class attribute that we want to predict.
Using logistic regression for classification
• Assign numerical values to the categorical inputs and normalize the numeric attributes.
• To convert categorical data into numbers we use dummy (treatment) coding, the variant of one-hot encoding that drops one reference level.
• If a categorical attribute has n distinct values, it is represented by an n × (n − 1) table of 0/1 values.
• Each row of that table (one row per value) contains at most one 1; all other entries are 0, and the row of the reference value is all 0s.
Example of categorical-to-numeric conversion
• The 2nd attribute, job, has 12 levels (i.e., 12 distinct values).
• After conversion, the factor carries one more attribute, named ‘contrasts’.
• Each value is coded as an 11-element binary vector.
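R code that converts all categorical inputs to numeric values
colum_list = c(2,3,4,5,6,7,8,9,10,15)
for (i in colum_list)
{
  # R >= 4.0 no longer reads strings as factors by default, so make sure the column is a factor
  bank_data[[i]] = factor(bank_data[[i]])
  n = length(levels(bank_data[[i]]))
  # attach an n x (n - 1) treatment (dummy) coding matrix to the factor
  contrasts(bank_data[[i]]) = contr.treatment(n)
}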
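To see what the loop does to a single column, here is a sketch on a hypothetical toy data frame (the `toy` data and its values are made up). Note that contr.treatment is already R's default coding for unordered factors (see options("contrasts")), so the loop mainly makes that choice explicit; model.matrix() then expands a factor with n levels into n − 1 dummy columns plus the intercept.

```r
# Hypothetical toy data: one categorical column with 3 distinct values
toy <- data.frame(job = factor(c("admin.", "services", "technician", "admin.")))

n <- length(levels(toy$job))
contrasts(toy$job) <- contr.treatment(n)

# The coding table is stored as the "contrasts" attribute of the factor
print(attr(toy$job, "contrasts"))

# The design matrix has an intercept plus n - 1 = 2 dummy columns
X <- model.matrix(~ job, data = toy)
print(X)
```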
Normalizing attributes
The following R code min-max normalizes the numeric attributes to the range [0, 1]. The categorical attributes, which are now coded as 0/1 values, are left alone.
normal = function(x)
{
  return ((x - min(x)) / (max(x) - min(x)))
}
colum_list = c(11,12,13,14,16,17,18,19,20)   # note: column 1 (age) is numeric too but is not in this list
for (i in colum_list)
{
  bank_data[[i]] = normal(bank_data[[i]])
}
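One caveat with normal(): if a column is constant, max(x) − min(x) is 0 and every value becomes NaN. A minimal guarded variant, with the hypothetical name normal_safe and the assumption that a constant column should map to 0, could look like this:

```r
# Min-max scaling to [0, 1], guarded against constant columns
normal_safe <- function(x) {
  rng <- max(x) - min(x)
  if (rng == 0) {
    return(rep(0, length(x)))   # assumption: a constant column becomes all zeros
  }
  (x - min(x)) / rng
}

print(normal_safe(c(2, 4, 6)))   # 0.0 0.5 1.0
print(normal_safe(c(5, 5, 5)))   # 0 0 0
```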
Preparing test and train data
• We hold out 3,400 rows (3,000 “no” and 400 “yes”, roughly 8% of the data) as the test set and use the remaining rows as training data.
• When splitting the data into test and train sets, we must preserve the proportion of “yes” and “no” classes.
• In the whole dataset, about 11% of the rows have “yes” in the 21st column and 89% have “no”.
• The test set should keep the same proportion (400 of 3,400 rows is about 11.8% “yes”).
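The idea of a stratified split can be sketched as a small helper that draws a fixed number of test rows from each class. The helper name stratified_test_index, the toy labels, and the per-class counts are hypothetical; the R code that follows does the same thing for 3,000 “no” and 400 “yes” rows.

```r
# Hypothetical helper: mark n_per_class[cls] randomly chosen rows of each class as test rows
stratified_test_index <- function(y, n_per_class) {
  is_test <- logical(length(y))
  for (cls in names(n_per_class)) {
    rows <- which(y == cls)
    chosen <- rows[sample.int(length(rows), n_per_class[[cls]])]
    is_test[chosen] <- TRUE
  }
  is_test
}

set.seed(1)
y <- factor(c(rep("no", 890), rep("yes", 110)))   # toy labels with 11% "yes"
is_test <- stratified_test_index(y, c(no = 89, yes = 11))

print(table(y[is_test]))              # 89 "no", 11 "yes"
print(prop.table(table(y[is_test])))  # same 89% / 11% split as the full data
```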
R code for making the test/train sets
bank_data_yes = bank_data[bank_data$y == "yes", ]
bank_data_no = bank_data[bank_data$y == "no", ]
set.seed(123)   # arbitrary seed, only to make the split reproducible
# logical index: 3000 TRUE (test rows) and the rest FALSE, in random order
n_no = nrow(bank_data_no)
total_index_no = sample(c(rep(FALSE, n_no - 3000), rep(TRUE, 3000)))
test_no = bank_data_no[total_index_no, ]
This puts the 3,000 negative (“no”) test rows in test_no.
# same procedure for the positive class: 400 random "yes" rows go to the test set
n_yes = nrow(bank_data_yes)
total_index_yes = sample(c(rep(FALSE, n_yes - 400), rep(TRUE, 400)))
test_yes = bank_data_yes[total_index_yes, ]
total_test = rbind(test_yes, test_no)
This puts the 400 positive (“yes”) test rows in test_yes. We then combine both parts into one data frame with rbind() and name it total_test.
train_yes = bank_data_yes[!total_index_yes, ]
train_no = bank_data_no[!total_index_no, ]
total_train = rbind(train_yes, train_no)
• These commands build the training set from all rows of the main dataset that were not selected for the test set.
Using glm for logistic regression
# refer to the response by its column name so that '.' expands to the other columns;
# with a factor response, glm models the probability of the second level, "yes"
model <- glm(y ~ ., family = binomial(link = 'logit'), data = total_train[, -11])
• We exclude the 11th column (duration) from the training data because the UCI repository page clearly states that
“this attribute highly affects the output target (e.g., if duration=0 then y='no'). Yet,
the duration is not known before a call is performed. Also, after the end of the call
y is obviously known. Thus, this input should only be included for benchmark
purposes and should be discarded if the intention is to have a realistic predictive
model.”
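A minimal sketch on simulated data (all column names and values are made up) of the same glm() call. It illustrates two points: with a factor response whose levels are c("no", "yes"), glm() models the probability of “yes”; and dropping a column by name is less fragile than dropping it by position, as with -11 above.

```r
set.seed(2)
n <- 500
# Simulated data: x1 drives the outcome, x2 is noise, 'leak' plays the role of a column to exclude
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), leak = rnorm(n))
p_yes <- plogis(-1 + 2 * toy$x1)
toy$y <- factor(ifelse(runif(n) < p_yes, "yes", "no"), levels = c("no", "yes"))

# Drop the excluded column by name; '.' then expands to x1 and x2
train <- toy[, names(toy) != "leak"]
fit <- glm(y ~ ., family = binomial(link = "logit"), data = train)
print(coef(fit))

# type = "response" returns P(y == "yes"), since "no" is the first level
print(head(predict(fit, newdata = train, type = "response")))
```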
Summary of model
• The summary(model) command prints the coefficient table; *** marks the attributes whose coefficients are most significant (p < 0.001).
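The stars in summary() come from the p-value column of the coefficient table. A sketch, on simulated data with made-up names, of pulling out the terms that would be marked *** (p < 0.001):

```r
set.seed(3)
n <- 400
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$y <- factor(ifelse(runif(n) < plogis(1.5 * toy$x1), "yes", "no"))
fit <- glm(y ~ x1 + x2, family = binomial, data = toy)

# Columns: Estimate, Std. Error, z value, Pr(>|z|)
coefs <- coef(summary(fit))
p_values <- coefs[, "Pr(>|z|)"]

# Terms that summary() marks with ***
print(names(p_values)[p_values < 0.001])
```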
Predict the test data with the logistic regression model
The code below produces predictions for the test set. Note that the 11th column is excluded here as well.
fitted.results <- predict(model, total_test[, -11], type = 'response')
fitted.results_yes_no <- ifelse(fitted.results > 0.5, "yes", "no")
table(total_test$y, fitted.results_yes_no)
Here we use a threshold of 0.5, which gives good overall accuracy but misses most of the actual “yes” rows, i.e. the true positive rate is low.
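One pitfall with table(total_test$y, fitted.results_yes_no): at a high threshold no row may be predicted “yes”, and the table silently loses a column. A hypothetical helper, confusion_at, that fixes the factor levels so the matrix is always 2 × 2 (the toy values are made up):

```r
# Hypothetical helper: confusion matrix (rows = actual, columns = predicted) at a threshold
confusion_at <- function(actual, prob, threshold) {
  lv <- c("no", "yes")
  predicted <- factor(ifelse(prob > threshold, "yes", "no"), levels = lv)
  table(actual = factor(actual, levels = lv), predicted = predicted)
}

actual <- c("no", "no", "yes", "yes", "no")
prob   <- c(0.10, 0.60, 0.40, 0.90, 0.20)

print(confusion_at(actual, prob, 0.5))
print(confusion_at(actual, prob, 0.99))   # still 2 x 2 although nothing is predicted "yes"
```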
Accuracy
• Confusion matrix with a threshold of 0.5
• Overall accuracy is 89.5%, but among the rows whose true class is “yes”, only 20.5% are predicted correctly (true positive rate 0.205).
• To reduce this loss we analyze the ROC curve.
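Both numbers can be read off a confusion matrix laid out as table(actual, predicted). The sketch below uses a hypothetical helper, rates, and made-up counts (not the results above) to show how accuracy can stay high while the true positive rate is low when “yes” is rare:

```r
# Hypothetical helper: accuracy and true positive rate from a 2 x 2 confusion matrix
rates <- function(cm) {
  c(accuracy = sum(diag(cm)) / sum(cm),
    tpr      = cm["yes", "yes"] / sum(cm["yes", ]))
}

# Made-up counts: 92 "no" rows and 10 "yes" rows
cm <- matrix(c(90, 8, 2, 2), nrow = 2,
             dimnames = list(actual = c("no", "yes"), predicted = c("no", "yes")))
print(cm)
print(rates(cm))   # accuracy about 0.90, tpr 0.20
```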
R code to plot the ROC curve
You’ll need the “ROCR” package.
library(ROCR)
pr <- prediction(fitted.results, total_test$y)
prf <- performance(pr, measure = "tpr", x.measure = "fpr")
plot(prf)
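Without ROCR, the same curve can be sketched in base R by sweeping the threshold over every predicted score. The helper name roc_points and the toy scores are hypothetical. Note that every ROC curve runs from (0, 0) to (1, 1).

```r
# Hypothetical helper: (fpr, tpr) pairs for every distinct threshold
roc_points <- function(prob, actual, positive = "yes") {
  is_pos <- actual == positive
  thresholds <- sort(unique(c(Inf, prob, -Inf)), decreasing = TRUE)
  t(sapply(thresholds, function(thr) {
    pred_pos <- prob >= thr
    c(threshold = thr,
      fpr = sum(pred_pos & !is_pos) / sum(!is_pos),
      tpr = sum(pred_pos & is_pos) / sum(is_pos))
  }))
}

prob   <- c(0.10, 0.40, 0.35, 0.80)
actual <- c("no", "no", "yes", "yes")
roc <- roc_points(prob, actual)
print(roc)

plot(roc[, "fpr"], roc[, "tpr"], type = "l",
     xlab = "False positive rate", ylab = "True positive rate")
abline(0, 1, lty = 2)   # diagonal = random guessing
```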
ROC curve
• This is the ROC curve. Reading it, a true positive rate of about 62% is the most we can reach without a large increase in the false positive rate.
The area under this curve (AUC) is computed by the following code:
auc <- performance(pr, measure = "auc")
auc <- auc@y.values[[1]]
The value of the AUC is 0.7618.
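As a cross-check on ROCR's value, the AUC equals the probability that a randomly chosen “yes” row gets a higher score than a randomly chosen “no” row, which can be computed from ranks (the Mann-Whitney form). A sketch with a hypothetical helper and toy scores:

```r
# Hypothetical helper: AUC from ranks; rank() averages ties, so ties count as 1/2
auc_rank <- function(prob, actual, positive = "yes") {
  is_pos <- actual == positive
  r <- rank(prob)
  n_pos <- sum(is_pos)
  n_neg <- sum(!is_pos)
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

prob   <- c(0.10, 0.40, 0.35, 0.80)
actual <- c("no", "no", "yes", "yes")
print(auc_rank(prob, actual))   # 0.75: 3 of the 4 (yes, no) pairs are ranked correctly
```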
Increasing true positive rate
• To increase the true positive rate we have to change the threshold.
• We observed that lowering the threshold below 0.5 increases the true positive rate, at the cost of more false positives.
• From the ROC curve, the best true positive rate we can reasonably achieve is between 0.60 and 0.62.
• So we decrease the threshold gradually and observe the true positive rate at each step.
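The gradual search can be automated: sweep the threshold down from 0.5 in small steps and record accuracy and true positive rate at each step. A sketch with a hypothetical helper, sweep_thresholds, and simulated scores; the step size of 0.01 and the 0.6 target are assumptions taken loosely from the slides.

```r
# Hypothetical helper: accuracy and TPR for thresholds from 0.5 down to 0.05
sweep_thresholds <- function(prob, actual, thresholds = seq(0.5, 0.05, by = -0.01)) {
  is_pos <- actual == "yes"
  data.frame(
    threshold = thresholds,
    accuracy  = sapply(thresholds, function(thr) mean((prob > thr) == is_pos)),
    tpr       = sapply(thresholds, function(thr) mean(prob[is_pos] > thr))
  )
}

set.seed(4)
actual <- sample(c("no", "yes"), 200, replace = TRUE, prob = c(0.89, 0.11))
# Simulated scores: "yes" rows tend to score higher than "no" rows
prob <- ifelse(actual == "yes", rbeta(200, 2, 5), rbeta(200, 1, 8))

res <- sweep_thresholds(prob, actual)
print(head(res))

# Largest threshold whose TPR reaches the target of 0.6 (NA if none does)
first_hit <- which(res$tpr >= 0.6)[1]
print(res[first_hit, ])
```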
Optimal threshold is 0.12
• Running this code:
fitted.results <- predict(model, total_test[, -11], type = 'response')
fitted.results_yes_no <- ifelse(fitted.results > 0.12, "yes", "no")
table(total_test$y, fitted.results_yes_no)
gives the following confusion matrix.
Overall accuracy = 82.5% and true positive rate = 60% (0.6)
Using naïve Bayes
• R code:
library(e1071)
naive_model <- naiveBayes(y ~ ., data = total_train[, -11], laplace = 0)
result <- predict(naive_model, total_test[, -11])
table(total_test$y, result)
• This gives the following confusion matrix.
• Accuracy = 83.76%, true positive rate = 53.5% (0.535)
• NOTE: we observed that using Laplace smoothing (laplace > 0) lowers the true positive rate.
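To see what the laplace argument changes, here is a sketch of class-conditional probabilities for one categorical attribute with additive (Laplace) smoothing. The helper name and toy data are hypothetical, and the exact formula e1071 uses internally is assumed rather than taken from its source: with laplace = 0 an unseen (class, level) pair gets probability 0, while laplace > 0 gives every level some probability.

```r
# Hypothetical helper: P(x = level | class) with additive smoothing
cond_probs <- function(x, y, laplace = 0) {
  tab <- table(y, x)   # rows: classes, columns: levels of x
  (tab + laplace) / (rowSums(tab) + laplace * ncol(tab))
}

y <- factor(c("no", "no", "no", "yes", "yes"))
x <- factor(c("cellular", "telephone", "cellular", "cellular", "cellular"))

print(cond_probs(x, y, laplace = 0))   # P(telephone | yes) is exactly 0
print(cond_probs(x, y, laplace = 1))   # every level now has non-zero probability
```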
Using SVM (e1071 default settings, radial kernel)
• R code
library(e1071)
svmmod <- svm(y ~ ., data = total_train[, -11])
pred <- predict(svmmod, total_test[, c(-11, -21)], decision.values = TRUE)
table(total_test$y, pred)
This dataset has about 40k rows, so svm() takes a long time to train a model on it. Instead, you can load an already saved SVM model and test your data on it.
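The saved-model workflow relies on base R's save() and load(). A sketch of the round trip, where a quick glm() fit stands in for the slow SVM so the example runs fast; the object name svmmod matches the slides, but the file location (a temporary directory) is an assumption. load() returns the names of the objects it restored, which is a lighter check than ls().

```r
set.seed(5)
toy <- data.frame(x = rnorm(100))
toy$y <- factor(ifelse(toy$x + rnorm(100) > 0, "yes", "no"))

# Stand-in model; in the slides this would be the fitted svm() object
svmmod <- glm(y ~ x, family = binomial, data = toy)

path <- file.path(tempdir(), "svm_model.rda")
save(svmmod, file = path)   # write the object to an .rda file
rm(svmmod)                  # simulate a fresh session

loaded <- load(path)        # restores svmmod and returns its name
print(loaded)               # "svmmod"
print(head(predict(svmmod, newdata = toy, type = "response")))
```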
SVM
Steps to load an existing model and predict
Store the ‘svm_model.rda’ file in your working directory and run this code:
load("svm_model.rda")
ls()   # check that svmmod has been loaded
pred <- predict(svmmod, total_test[, c(-11, -21)], decision.values = TRUE)
Make sure to load all necessary packages (e1071) before running predict().
SVM accuracy
• This is the confusion matrix we obtained with SVM.
• Overall accuracy is 89.41%, but the true positive rate is only 17.75% (0.1775), which is very low compared to all the previous methods.
Final verdict
• Logistic regression and naïve Bayes give good results on this dataset.
• SVM gives good overall accuracy but performs poorly on true positive rate.
• Because this dataset has many categorical attributes, it may also be well suited to classification with decision trees.

More Related Content

What's hot

Linear Algebra and Matlab tutorial
Linear Algebra and Matlab tutorialLinear Algebra and Matlab tutorial
Linear Algebra and Matlab tutorialJia-Bin Huang
 
Ml3 logistic regression-and_classification_error_metrics
Ml3 logistic regression-and_classification_error_metricsMl3 logistic regression-and_classification_error_metrics
Ml3 logistic regression-and_classification_error_metricsankit_ppt
 
Numerical analysis using Scilab: Solving nonlinear equations
Numerical analysis using Scilab: Solving nonlinear equationsNumerical analysis using Scilab: Solving nonlinear equations
Numerical analysis using Scilab: Solving nonlinear equationsScilab
 
Scilab for real dummies j.heikell - part 2
Scilab for real dummies j.heikell - part 2Scilab for real dummies j.heikell - part 2
Scilab for real dummies j.heikell - part 2Scilab
 
Chapter 3.3
Chapter 3.3Chapter 3.3
Chapter 3.3sotlsoc
 
A complete introduction on matlab and matlab's projects
A complete introduction on matlab and matlab's projectsA complete introduction on matlab and matlab's projects
A complete introduction on matlab and matlab's projectsMukesh Kumar
 
Numerical analysis using Scilab: Error analysis and propagation
Numerical analysis using Scilab: Error analysis and propagationNumerical analysis using Scilab: Error analysis and propagation
Numerical analysis using Scilab: Error analysis and propagationScilab
 
Matlab-Data types and operators
Matlab-Data types and operatorsMatlab-Data types and operators
Matlab-Data types and operatorsLuckshay Batra
 
Mat lab workshop
Mat lab workshopMat lab workshop
Mat lab workshopVinay Kumar
 
Machine Learning - Simple Linear Regression
Machine Learning - Simple Linear RegressionMachine Learning - Simple Linear Regression
Machine Learning - Simple Linear RegressionSiddharth Shrivastava
 
Lab 2: Classification and Regression Prediction Models, training and testing ...
Lab 2: Classification and Regression Prediction Models, training and testing ...Lab 2: Classification and Regression Prediction Models, training and testing ...
Lab 2: Classification and Regression Prediction Models, training and testing ...Yao Yao
 
B61301007 matlab documentation
B61301007 matlab documentationB61301007 matlab documentation
B61301007 matlab documentationManchireddy Reddy
 
Introduction to matlab lecture 3 of 4
Introduction to matlab lecture 3 of 4Introduction to matlab lecture 3 of 4
Introduction to matlab lecture 3 of 4Randa Elanwar
 
Machine learning Introduction
Machine learning IntroductionMachine learning Introduction
Machine learning IntroductionKuppusamy P
 
Matlab ch1 (3)
Matlab ch1 (3)Matlab ch1 (3)
Matlab ch1 (3)mohsinggg
 
Data preprocessing for Machine Learning with R and Python
Data preprocessing for Machine Learning with R and PythonData preprocessing for Machine Learning with R and Python
Data preprocessing for Machine Learning with R and PythonAkhilesh Joshi
 

What's hot (20)

Linear Algebra and Matlab tutorial
Linear Algebra and Matlab tutorialLinear Algebra and Matlab tutorial
Linear Algebra and Matlab tutorial
 
Ml3 logistic regression-and_classification_error_metrics
Ml3 logistic regression-and_classification_error_metricsMl3 logistic regression-and_classification_error_metrics
Ml3 logistic regression-and_classification_error_metrics
 
Numerical analysis using Scilab: Solving nonlinear equations
Numerical analysis using Scilab: Solving nonlinear equationsNumerical analysis using Scilab: Solving nonlinear equations
Numerical analysis using Scilab: Solving nonlinear equations
 
Scilab for real dummies j.heikell - part 2
Scilab for real dummies j.heikell - part 2Scilab for real dummies j.heikell - part 2
Scilab for real dummies j.heikell - part 2
 
Chapter 3.3
Chapter 3.3Chapter 3.3
Chapter 3.3
 
MATLAB - Arrays and Matrices
MATLAB - Arrays and MatricesMATLAB - Arrays and Matrices
MATLAB - Arrays and Matrices
 
A complete introduction on matlab and matlab's projects
A complete introduction on matlab and matlab's projectsA complete introduction on matlab and matlab's projects
A complete introduction on matlab and matlab's projects
 
Numerical analysis using Scilab: Error analysis and propagation
Numerical analysis using Scilab: Error analysis and propagationNumerical analysis using Scilab: Error analysis and propagation
Numerical analysis using Scilab: Error analysis and propagation
 
Matlab-Data types and operators
Matlab-Data types and operatorsMatlab-Data types and operators
Matlab-Data types and operators
 
Matlab Tutorial
Matlab TutorialMatlab Tutorial
Matlab Tutorial
 
Matlabch01
Matlabch01Matlabch01
Matlabch01
 
Matlab introduction
Matlab introductionMatlab introduction
Matlab introduction
 
Mat lab workshop
Mat lab workshopMat lab workshop
Mat lab workshop
 
Machine Learning - Simple Linear Regression
Machine Learning - Simple Linear RegressionMachine Learning - Simple Linear Regression
Machine Learning - Simple Linear Regression
 
Lab 2: Classification and Regression Prediction Models, training and testing ...
Lab 2: Classification and Regression Prediction Models, training and testing ...Lab 2: Classification and Regression Prediction Models, training and testing ...
Lab 2: Classification and Regression Prediction Models, training and testing ...
 
B61301007 matlab documentation
B61301007 matlab documentationB61301007 matlab documentation
B61301007 matlab documentation
 
Introduction to matlab lecture 3 of 4
Introduction to matlab lecture 3 of 4Introduction to matlab lecture 3 of 4
Introduction to matlab lecture 3 of 4
 
Machine learning Introduction
Machine learning IntroductionMachine learning Introduction
Machine learning Introduction
 
Matlab ch1 (3)
Matlab ch1 (3)Matlab ch1 (3)
Matlab ch1 (3)
 
Data preprocessing for Machine Learning with R and Python
Data preprocessing for Machine Learning with R and PythonData preprocessing for Machine Learning with R and Python
Data preprocessing for Machine Learning with R and Python
 

Viewers also liked

Reducing Time Spent On Requirements
Reducing Time Spent On RequirementsReducing Time Spent On Requirements
Reducing Time Spent On RequirementsByron Workman
 
Consumer Credit Scoring Using Logistic Regression and Random Forest
Consumer Credit Scoring Using Logistic Regression and Random ForestConsumer Credit Scoring Using Logistic Regression and Random Forest
Consumer Credit Scoring Using Logistic Regression and Random ForestHirak Sen Roy
 
How Eastern Bank Uses Big Data to Better Serve and Protect its Customers
How Eastern Bank Uses Big Data to Better Serve and Protect its CustomersHow Eastern Bank Uses Big Data to Better Serve and Protect its Customers
How Eastern Bank Uses Big Data to Better Serve and Protect its CustomersBrian Griffith
 
business requirements functional and non functional
business requirements functional and  non functionalbusiness requirements functional and  non functional
business requirements functional and non functionalCHANDRA KAMAL
 
Table of Contents - Practical Business Analytics using SAS
Table of Contents - Practical Business Analytics using SAS Table of Contents - Practical Business Analytics using SAS
Table of Contents - Practical Business Analytics using SAS Venkata Reddy Konasani
 
Strategic Business Requirements for Master Data Management Systems
Strategic Business Requirements for Master Data Management SystemsStrategic Business Requirements for Master Data Management Systems
Strategic Business Requirements for Master Data Management SystemsBoris Otto
 
2013 credit card fraud detection why theory dosent adjust to practice
2013 credit card fraud detection why theory dosent adjust to practice2013 credit card fraud detection why theory dosent adjust to practice
2013 credit card fraud detection why theory dosent adjust to practiceAlejandro Correa Bahnsen, PhD
 
Telecom Subscription, Churn and ARPU Analysis
Telecom Subscription, Churn and ARPU AnalysisTelecom Subscription, Churn and ARPU Analysis
Telecom Subscription, Churn and ARPU AnalysisAnurag Shandilya
 
Project Business Requirements Document
Project Business Requirements DocumentProject Business Requirements Document
Project Business Requirements DocumentJoshua Flewelling
 
Business requirements documents
Business requirements documentsBusiness requirements documents
Business requirements documentshapy
 
Sample Project Requirements Document – Library Blog
Sample Project Requirements Document – Library BlogSample Project Requirements Document – Library Blog
Sample Project Requirements Document – Library BlogALATechSource
 
Example requirements specification
Example requirements specificationExample requirements specification
Example requirements specificationindrisrozas
 
Sample Business Requirement Document
Sample Business Requirement DocumentSample Business Requirement Document
Sample Business Requirement DocumentIsabel Elaine Leong
 
KPCB Design in Tech Report 2015: Simplified and Redesigned
KPCB Design in Tech Report 2015: Simplified and RedesignedKPCB Design in Tech Report 2015: Simplified and Redesigned
KPCB Design in Tech Report 2015: Simplified and RedesignedStinson
 

Viewers also liked (19)

Regression analysis using sas
Regression analysis using sasRegression analysis using sas
Regression analysis using sas
 
2014_HMDA
2014_HMDA2014_HMDA
2014_HMDA
 
Reducing Time Spent On Requirements
Reducing Time Spent On RequirementsReducing Time Spent On Requirements
Reducing Time Spent On Requirements
 
Automation of reporting process
Automation of reporting processAutomation of reporting process
Automation of reporting process
 
Consumer Credit Scoring Using Logistic Regression and Random Forest
Consumer Credit Scoring Using Logistic Regression and Random ForestConsumer Credit Scoring Using Logistic Regression and Random Forest
Consumer Credit Scoring Using Logistic Regression and Random Forest
 
Internship Report
Internship Report Internship Report
Internship Report
 
How Eastern Bank Uses Big Data to Better Serve and Protect its Customers
How Eastern Bank Uses Big Data to Better Serve and Protect its CustomersHow Eastern Bank Uses Big Data to Better Serve and Protect its Customers
How Eastern Bank Uses Big Data to Better Serve and Protect its Customers
 
business requirements functional and non functional
business requirements functional and  non functionalbusiness requirements functional and  non functional
business requirements functional and non functional
 
Table of Contents - Practical Business Analytics using SAS
Table of Contents - Practical Business Analytics using SAS Table of Contents - Practical Business Analytics using SAS
Table of Contents - Practical Business Analytics using SAS
 
Strategic Business Requirements for Master Data Management Systems
Strategic Business Requirements for Master Data Management SystemsStrategic Business Requirements for Master Data Management Systems
Strategic Business Requirements for Master Data Management Systems
 
2013 credit card fraud detection why theory dosent adjust to practice
2013 credit card fraud detection why theory dosent adjust to practice2013 credit card fraud detection why theory dosent adjust to practice
2013 credit card fraud detection why theory dosent adjust to practice
 
Telecom Subscription, Churn and ARPU Analysis
Telecom Subscription, Churn and ARPU AnalysisTelecom Subscription, Churn and ARPU Analysis
Telecom Subscription, Churn and ARPU Analysis
 
Project Business Requirements Document
Project Business Requirements DocumentProject Business Requirements Document
Project Business Requirements Document
 
Business requirements documents
Business requirements documentsBusiness requirements documents
Business requirements documents
 
Sample Project Requirements Document – Library Blog
Sample Project Requirements Document – Library BlogSample Project Requirements Document – Library Blog
Sample Project Requirements Document – Library Blog
 
Example requirements specification
Example requirements specificationExample requirements specification
Example requirements specification
 
Sample Business Requirement Document
Sample Business Requirement DocumentSample Business Requirement Document
Sample Business Requirement Document
 
KPCB Design in Tech Report 2015: Simplified and Redesigned
KPCB Design in Tech Report 2015: Simplified and RedesignedKPCB Design in Tech Report 2015: Simplified and Redesigned
KPCB Design in Tech Report 2015: Simplified and Redesigned
 
Logistic Regression Analysis
Logistic Regression AnalysisLogistic Regression Analysis
Logistic Regression Analysis
 

Similar to Data Analysis of Bank Marketing Dataset Using Logistic Regression and Naive Bayes

Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...Yao Yao
 
Linear regression by Kodebay
Linear regression by KodebayLinear regression by Kodebay
Linear regression by KodebayKodebay
 
wk5ppt1_Titanic
wk5ppt1_Titanicwk5ppt1_Titanic
wk5ppt1_TitanicAliciaWei1
 
E2 – Fundamentals, Functions & ArraysPlease refer to announcemen.docx
E2 – Fundamentals, Functions & ArraysPlease refer to announcemen.docxE2 – Fundamentals, Functions & ArraysPlease refer to announcemen.docx
E2 – Fundamentals, Functions & ArraysPlease refer to announcemen.docxjacksnathalie
 
Ai_Project_report
Ai_Project_reportAi_Project_report
Ai_Project_reportRavi Gupta
 
Chapter 2&3 (java fundamentals and Control Structures).ppt
Chapter 2&3 (java fundamentals and Control Structures).pptChapter 2&3 (java fundamentals and Control Structures).ppt
Chapter 2&3 (java fundamentals and Control Structures).ppthenokmetaferia1
 
Chp-1 Quick Review of basic concepts.pdf
Chp-1 Quick Review of basic concepts.pdfChp-1 Quick Review of basic concepts.pdf
Chp-1 Quick Review of basic concepts.pdfSolomonMolla4
 
maXbox starter69 Machine Learning VII
maXbox starter69 Machine Learning VIImaXbox starter69 Machine Learning VII
maXbox starter69 Machine Learning VIIMax Kleiner
 
maXbox starter67 machine learning V
maXbox starter67 machine learning VmaXbox starter67 machine learning V
maXbox starter67 machine learning VMax Kleiner
 
Chapter 1: Linear Regression
Chapter 1: Linear RegressionChapter 1: Linear Regression
Chapter 1: Linear RegressionAkmelSyed
 

Similar to Data Analysis of Bank Marketing Dataset Using Logistic Regression and Naive Bayes (20)

lab program 6.pdf
lab program 6.pdflab program 6.pdf
lab program 6.pdf
 
wk5ppt2_Iris
wk5ppt2_Iriswk5ppt2_Iris
wk5ppt2_Iris
 
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
Mini-lab 1: Stochastic Gradient Descent classifier, Optimizing Logistic Regre...
 
Linear regression by Kodebay
Linear regression by KodebayLinear regression by Kodebay
Linear regression by Kodebay
 
wk5ppt1_Titanic
wk5ppt1_Titanicwk5ppt1_Titanic
wk5ppt1_Titanic
 
E2 – Fundamentals, Functions & ArraysPlease refer to announcemen.docx
E2 – Fundamentals, Functions & ArraysPlease refer to announcemen.docxE2 – Fundamentals, Functions & ArraysPlease refer to announcemen.docx
E2 – Fundamentals, Functions & ArraysPlease refer to announcemen.docx
 
Ai_Project_report
Ai_Project_reportAi_Project_report
Ai_Project_report
 
Chapter 2&3 (java fundamentals and Control Structures).ppt
Chapter 2&3 (java fundamentals and Control Structures).pptChapter 2&3 (java fundamentals and Control Structures).ppt
Chapter 2&3 (java fundamentals and Control Structures).ppt
 
Chp-1 Quick Review of basic concepts.pdf
Chp-1 Quick Review of basic concepts.pdfChp-1 Quick Review of basic concepts.pdf
Chp-1 Quick Review of basic concepts.pdf
 
Curvefitting
CurvefittingCurvefitting
Curvefitting
 
maXbox starter69 Machine Learning VII
maXbox starter69 Machine Learning VIImaXbox starter69 Machine Learning VII
maXbox starter69 Machine Learning VII
 
EPE821_Lecture3.pptx
EPE821_Lecture3.pptxEPE821_Lecture3.pptx
EPE821_Lecture3.pptx
 
COM1407: Arrays
COM1407: ArraysCOM1407: Arrays
COM1407: Arrays
 
Linear Regression
Linear RegressionLinear Regression
Linear Regression
 
maXbox starter67 machine learning V
maXbox starter67 machine learning VmaXbox starter67 machine learning V
maXbox starter67 machine learning V
 
Naïve Bayes.pptx
Naïve Bayes.pptxNaïve Bayes.pptx
Naïve Bayes.pptx
 
Chapter 1: Linear Regression
Chapter 1: Linear RegressionChapter 1: Linear Regression
Chapter 1: Linear Regression
 
Array
ArrayArray
Array
 
Advance excel
Advance excelAdvance excel
Advance excel
 
Regression
RegressionRegression
Regression
 

Recently uploaded

Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 

Recently uploaded (20)

Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
Week-01-2.ppt BBB human Computer interaction

Data Analysis of Bank Marketing Dataset Using Logistic Regression and Naive Bayes

  • 1. Data Analysis on Bank Marketing Data Set Anish Bhanushali
  • 2. Information about the dataset • UCI Machine Learning Repository link: https://archive.ics.uci.edu/ml/datasets/Bank+Marketing • The dataset has 20 input attributes. • Attributes 2-10 and 15 are categorical (these are the columns converted on slide 5); the others are numeric. • The 21st attribute, named 'y', is the output class we want to predict.
  • 3. Using logistic regression for classification • Assign numerical values to the categorical inputs and normalize the numeric attributes. • To convert categorical data to numeric, we use one-hot style encoding (strictly, dummy or treatment coding, which drops one level). • If a categorical attribute has n distinct values, R associates an n x (n-1) table of numbers with it: one row per value, one column per indicator. • Each row of that table contains at most one 1 and the remaining entries are 0s; the first (baseline) value is the row of all 0s.
  • 4. Example of converting a categorical attribute to numeric • The 2nd attribute, job, has 12 levels (i.e. 12 distinct values). • After conversion, the column carries an extra attribute named 'contrasts'. • Here you can see that each value is coded as a vector of 11 binary (0/1) indicators.
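The same coding can be reproduced in base R on a small example. This sketch uses a hypothetical three-level factor in place of job: contr.treatment(n) returns the n x (n-1) contrast table, and model.matrix() expands the factor into n-1 indicator columns, with the first (baseline) level coded as all zeros. For job, with 12 levels, the table would be 12 x 11.

```r
# Hypothetical 3-level factor standing in for a column such as job
colour <- factor(c("red", "green", "blue", "green", "red"))

# Levels are sorted alphabetically: "blue", "green", "red"
n <- nlevels(colour)

# Treatment contrasts: an n x (n - 1) matrix with at most one 1 per row;
# the first level ("blue") is the all-zero baseline row
cm <- contr.treatment(n)
print(cm)

# model.matrix() expands the factor into an intercept plus n - 1
# indicator columns, one per non-baseline level
mm <- model.matrix(~ colour)
print(mm)
```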
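  • 5. R code that converts all categorical inputs to numeric values. Since R 4.0, read.csv() no longer turns strings into factors by default, so each column is converted with factor() first. Treatment contrasts are also R's default for unordered factors, so the contrasts() line mainly makes the encoding explicit. The class column y is converted as well, because glm() with the binomial family needs a factor (or 0/1) response:
      colum_list = c(2,3,4,5,6,7,8,9,10,15)
      for (i in colum_list)
      {
        bank_data[[i]] = factor(bank_data[[i]])
        n = length(levels(bank_data[[i]]))
        contrasts(bank_data[[i]]) = contr.treatment(n)
      }
      bank_data$y = factor(bank_data$y, levels = c("no", "yes"))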
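  • 6. Normalizing attributes • The following R code applies min-max scaling, (x - min(x)) / (max(x) - min(x)), to the numeric attributes listed in colum_list, mapping each to the range [0, 1]. The categorical attributes are left alone because their encoding already uses only 0s and 1s. The loop assigns directly into bank_data, so no global assignment (<<-) is needed, and it does not print every scaled column:
      normal = function(x)
      {
        return ((x - min(x)) / (max(x) - min(x)))
      }
      colum_list = c(11,12,13,14,16,17,18,19,20)
      for (i in colum_list)
      {
        bank_data[[i]] = normal(bank_data[[i]])
      }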
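If a column's values are all equal, max(x) - min(x) is 0 and the formula above returns NaN. This is a minimal sketch of a guarded version applied with lapply() to a toy data frame. Returning zeros for a constant column is a choice made only for this sketch, and the column names are hypothetical.

```r
# Min-max scaling to [0, 1]. If all values are equal, max - min is 0 and the
# plain formula would give NaN; this sketch returns zeros instead.
normal <- function(x)
{
  rng <- max(x) - min(x)
  if (rng == 0) return(rep(0, length(x)))
  (x - min(x)) / rng
}

# Toy data frame with hypothetical columns: two numeric, one class label
toy <- data.frame(a = c(10, 20, 30), b = c(5, 5, 5), label = c("no", "yes", "no"))
num_cols <- c(1, 2)

# lapply() over the selected columns replaces the explicit for loop
toy[num_cols] <- lapply(toy[num_cols], normal)
print(toy)
```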
  • 7. Preparing test and train data • We take roughly 8.5% of the data (3,400 of about 40k rows) as the test set and use the rest as training data. • When dividing the data, we must preserve the proportion of "yes" and "no" values in the class column. • In the whole dataset, about 11% of rows have "yes" in the 21st column and 89% have "no". • The test set should keep approximately the same proportion, so it is built from 3,000 "no" rows and 400 "yes" rows.
  • 8. R code for making the test/train set. The data is split by class, then a logical index with exactly 3,000 TRUE values is built and shuffled by ordering a vector of uniform random numbers (call set.seed() beforehand if the split must be reproducible):
      bank_data_yes = bank_data[bank_data$y == "yes", ]
      bank_data_no  = bank_data[bank_data$y == "no", ]
      n_no = nrow(bank_data_no)
      total_index_no = c(rep(FALSE, n_no - 3000), rep(TRUE, 3000))
      total_index_no = total_index_no[order(runif(n_no))]
      test_no = bank_data_no[total_index_no, ]
    This gives the negative ("no") part of the test set in test_no.
  • 9. R code for making the test/train set (continued). The same steps select 400 "yes" rows:
      n_yes = nrow(bank_data_yes)
      total_index_yes = c(rep(FALSE, n_yes - 400), rep(TRUE, 400))
      total_index_yes = total_index_yes[order(runif(n_yes))]
      test_yes = bank_data_yes[total_index_yes, ]
      total_test = rbind(test_yes, test_no)
    This gives the positive test rows in test_yes; rbind() then combines them with test_no into one dataset named total_test.
  • 10. R code for making the test/train set (continued):
      train_yes = bank_data_yes[!total_index_yes, ]
      train_no  = bank_data_no[!total_index_no, ]
      total_train = rbind(train_yes, train_no)
    • These commands build the training set from all rows of the main dataset that were not selected for the test set.
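The same stratified split can be written more compactly with sample.int(), which draws row indices without replacement. This sketch runs on toy data with the same roughly 11% / 89% class balance. The seed, the column names and the per-class test sizes are hypothetical.

```r
# Stratified hold-out split: draw a fixed number of test rows from each class
set.seed(42)  # hypothetical seed, only to make the split reproducible

# Toy data with an imbalanced class column (stand-in for bank_data)
toy <- data.frame(x = 1:1000, y = rep(c("yes", "no"), times = c(110, 890)))

n_test <- c(yes = 10, no = 80)   # hypothetical per-class test sizes
test_idx <- unlist(lapply(names(n_test), function(cls) {
  rows <- which(toy$y == cls)
  rows[sample.int(length(rows), n_test[[cls]])]
}))

total_test  <- toy[test_idx, ]
total_train <- toy[-test_idx, ]

# Share of "yes" in each part (both close to 0.11)
print(mean(total_test$y == "yes"))
print(mean(total_train$y == "yes"))
```

Indexing rows with sample.int(length(rows), k) avoids a known pitfall of sample(rows, k): when rows has length 1, sample() treats it as the range 1:rows.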
  • 11. Using glm for logistic regression
      model <- glm(y ~ ., family = binomial(link = "logit"), data = total_train[, -11])
    • Writing the response as y (rather than total_train$y) lets the dot expand to every remaining column except the response. • The 11th column (duration) is excluded from the training data because the UCI repository page clearly states: "this attribute highly affects the output target (e.g., if duration=0 then y='no'). Yet, the duration is not known before a call is performed. Also, after the end of the call y is obviously known. Thus, this input should only be included for benchmark purposes and should be discarded if the intention is to have a realistic predictive model."
  • 12. Summary of the model • The summary(model) command gives the output shown below; *** marks the most significant attributes (coefficients with p < 0.001).
  • 13. Predicting the test data with the logistic regression model • The code below gives the predicted output for the test set; note that the 11th column is excluded here as well:
      fitted.results <- predict(model, total_test[, -11], type = "response")
      fitted.results_yes_no <- ifelse(fitted.results > 0.5, "yes", "no")
      table(total_test$y, fitted.results_yes_no)
    • Here we use a threshold of 0.5, which gives good overall accuracy but misclassifies many of the actual "yes" rows, i.e. the true positive rate is low.
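What predict(..., type = "response") returns can be checked on simulated data. The probabilities are the logistic function plogis() applied to the linear predictor, and the ifelse() step turns them into labels. The data, coefficients and seed below are made up for illustration and are not the bank dataset.

```r
set.seed(1)  # hypothetical seed for reproducible simulated data

# Simulated data with a binary outcome (not the bank dataset)
n <- 500
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
p_true <- plogis(-1 + 2 * d$x1 - d$x2)
d$y <- factor(ifelse(runif(n) < p_true, "yes", "no"), levels = c("no", "yes"))

# With a factor response, glm() models the probability of the second level ("yes")
model <- glm(y ~ ., family = binomial(link = "logit"), data = d)

# type = "response" is the logistic function applied to the linear predictor
prob <- predict(model, d, type = "response")
eta  <- predict(model, d, type = "link")
print(all.equal(unname(prob), unname(plogis(eta))))

# Turning probabilities into labels with a 0.5 threshold
pred_label <- ifelse(prob > 0.5, "yes", "no")
print(table(d$y, pred_label))
```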
  • 14. Accuracy • Confusion matrix with 0.5 as the threshold • The overall accuracy is 89.5%, but among the rows whose true class is "yes", only 20.5% are predicted correctly (true positive rate 0.205). • To reduce this loss, we analyze the ROC curve.
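Both numbers on this slide can be computed directly from table(actual, predicted). The helper conf_metrics in this minimal sketch is hypothetical. Fixing both factors to the same levels keeps the table square even when a threshold produces no "yes" predictions at all.

```r
# Overall accuracy and true positive rate ("yes" recall) from a confusion
# matrix built as table(actual, predicted), as in the slides
conf_metrics <- function(actual, predicted, positive = "yes")
{
  lv <- union(levels(factor(actual)), levels(factor(predicted)))
  cm <- table(factor(actual, levels = lv), factor(predicted, levels = lv))
  accuracy <- sum(diag(cm)) / sum(cm)
  tpr <- cm[positive, positive] / sum(cm[positive, ])
  c(accuracy = accuracy, tpr = tpr)
}

# Hypothetical labels: 8 "no" and 2 "yes"; the model finds one of the two "yes"
actual    <- c(rep("no", 8), "yes", "yes")
predicted <- c(rep("no", 8), "yes", "no")
print(conf_metrics(actual, predicted))
```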
  • 15. R code to plot the ROC curve (requires the "ROCR" package, which can be installed with install.packages("ROCR")):
      require(ROCR)
      pr <- prediction(fitted.results, total_test$y)
      prf <- performance(pr, measure = "tpr", x.measure = "fpr")
      plot(prf)
  • 16. ROC curve • This ROC curve shows that a good trade-off point reaches a true positive rate of only about 62%; pushing the true positive rate higher costs a disproportionate rise in the false positive rate. • The area under the curve (AUC) is given by the following code:
      auc <- performance(pr, measure = "auc")
      auc <- auc@y.values[[1]]
    • The AUC is 0.7618.
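As a cross-check on the ROCR value, the AUC equals the probability that a randomly chosen "yes" case receives a higher predicted probability than a randomly chosen "no" case, with ties counting one half. Base R can compute this from ranks (the Mann-Whitney statistic). This sketch uses hand-made scores, and the helper name auc_rank is hypothetical. Applied as auc_rank(fitted.results, total_test$y), it should agree with the ROCR result up to floating-point error.

```r
# AUC via the rank-sum (Mann-Whitney) identity:
# (sum of positive ranks - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
auc_rank <- function(scores, labels, positive = "yes")
{
  pos <- labels == positive
  n_pos <- sum(pos)
  n_neg <- sum(!pos)
  r <- rank(scores)   # average ranks, so tied scores count one half
  (sum(r[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Hand-made example: 5 of the 9 (yes, no) pairs are ordered correctly
scores <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.3)
labels <- c("yes", "no", "yes", "no", "no", "yes")
print(auc_rank(scores, labels))
```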
  • 17. Increasing the true positive rate • To increase the true positive rate, we have to change the threshold. • Decreasing the threshold below 0.5 was observed to increase the true positive rate. • From the ROC curve, the optimum true positive rate we can achieve is between 0.60 and 0.62. • To find it, we slowly decrease the threshold while observing the true positive rate.
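The manual search described above can be automated by evaluating a grid of thresholds and recording the true positive rate and overall accuracy at each one. This sketch uses hypothetical probabilities and labels. On the real data the inputs would be fitted.results and total_test$y, and the grid values shown are only examples.

```r
# Sweep thresholds and record TPR and overall accuracy at each one
threshold_sweep <- function(prob, actual, thresholds, positive = "yes")
{
  is_pos <- actual == positive
  t(sapply(thresholds, function(th) {
    pred_pos <- prob > th
    c(threshold = th,
      tpr = sum(pred_pos & is_pos) / sum(is_pos),
      accuracy = mean(pred_pos == is_pos))
  }))
}

# Hypothetical predicted probabilities and true labels
prob   <- c(0.05, 0.10, 0.15, 0.30, 0.55, 0.70, 0.08, 0.20)
actual <- c("no", "no", "yes", "no", "yes", "yes", "no", "no")
res <- threshold_sweep(prob, actual, thresholds = c(0.5, 0.25, 0.12))
print(res)
```

Lowering the threshold can only keep or raise the true positive rate, since every case labelled "yes" at a higher threshold is still labelled "yes" at a lower one; the cost shows up as falling overall accuracy.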
  • 18. The optimal threshold is 0.12 • Running this code:
      fitted.results <- predict(model, total_test[, -11], type = "response")
      fitted.results_yes_no <- ifelse(fitted.results > 0.12, "yes", "no")
      table(total_test$y, fitted.results_yes_no)
    gives this confusion matrix: overall accuracy = 82.5% and true positive rate = 60% (0.6).
  • 19. Using naïve Bayes • R code:
      require(e1071)
      naive_model <- naiveBayes(y ~ ., data = total_train[, -11], laplace = 0)
      result <- predict(naive_model, total_test[, -11])
      table(total_test$y, result)
    • This gives the confusion matrix shown: accuracy = 83.76%, true positive rate = 53.5% (0.535). • NOTE: using Laplace smoothing (laplace > 0) was observed to decrease the true positive rate.
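The effect of the laplace argument can be seen on a single categorical predictor. This sketch applies standard additive (Laplace) smoothing to a class-conditional frequency table. It illustrates the idea and is not a copy of e1071's internals. The helper cond_prob and the toy data are hypothetical.

```r
# Class-conditional probability table P(x = v | class) for one categorical
# predictor, with additive (Laplace) smoothing controlled by 'laplace'
cond_prob <- function(x, y, laplace = 0)
{
  x <- factor(x)
  tab <- table(y, x)
  # Each row (class) is divided by its own count plus laplace * number of levels
  (tab + laplace) / (rowSums(tab) + laplace * nlevels(x))
}

# Hypothetical toy data: level "cellular" never occurs with class "yes"
contact <- c("cellular", "telephone", "cellular", "telephone", "telephone")
y       <- c("no", "yes", "no", "no", "yes")
p0 <- cond_prob(contact, y, laplace = 0)
p1 <- cond_prob(contact, y, laplace = 1)
print(p0)
print(p1)
```

Without smoothing, a level never seen with a class gets probability 0, which forces that class's product of likelihoods to 0 for any row with that level; smoothing moves some probability mass onto such levels.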
  • 20. Using SVM without specifying a kernel (e1071's svm() then defaults to a radial basis kernel) • R code:
      require(e1071)
      svmmod <- svm(y ~ ., data = total_train[, -11])
      pred <- predict(svmmod, total_test[, c(-11, -21)], decision.values = TRUE)
      table(total_test$y, pred)
    • This dataset has about 40k rows, and svm() takes a long time to fit a model on it, but you can load an already saved SVM model and test your data with it.
  • 21. SVM: steps to load an existing model and predict • Store the 'svm_model.rda' file in your working directory and run this code:
      load("svm_model.rda")
      ls()   # check that svmmod is loaded
      pred <- predict(svmmod, total_test[, c(-11, -21)], decision.values = TRUE)
    • Make sure that all necessary libraries (e.g. e1071) are loaded before calling predict().
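A file such as svm_model.rda is typically written with save(). load() then restores each object under its original name, which is why ls() should list svmmod afterwards. This sketch shows that round trip with a stand-in list instead of a fitted SVM. The object contents and the temporary path are assumptions.

```r
# save() writes named objects to an .rda file; load() restores them under
# the same names and invisibly returns those names
svmmod <- list(note = "stand-in for a fitted model object")  # hypothetical object
path <- file.path(tempdir(), "svm_model.rda")
save(svmmod, file = path)

rm(svmmod)
loaded <- load(path)
print(loaded)              # [1] "svmmod"
print(exists("svmmod"))    # [1] TRUE
```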
  • 22. SVM accuracy • This is the confusion matrix we got using SVM. • Overall accuracy is 89.41%, but the true positive rate is only 17.75% (0.1775), which is very low compared to all the previous methods.
  • 23. Final verdict • This dataset gives good results with the logistic regression and naïve Bayes methods. • SVM gives good overall accuracy but fails on true positive rate. • This dataset has many categorical attributes, which suggests that decision trees might classify it well.