Data Mining – analysing the
Bank Marketing Data Set
with WEKA
EXPLORATORY PROJECT BY
MATEUSZ BRZOSKA
MIDDLESEX UNIVERSITY 2015
1
Abstract / Aims / Objectives
Aims
• To study techniques and methodologies in data mining
• To analyse a data set of interest through clustering, classification, dependency learning and prediction
• To process the data and reach a satisfactory final result
Objectives
• To study Knowledge Discovery in Databases (KDD)
• To understand the need to analyse large, complex, information-rich data sets
• To provide essential background and demonstrate how the relevant algorithms apply to each technique
2
Bank Marketing
Data Set
“The data comes from marketing campaigns of a Portuguese banking institution. The marketing campaigns were based on phone calls. Often, more than one contact to the same client was required, in order to assess whether the product (bank term deposit) would be ('yes') or not ('no') subscribed.”
41188 instances / 11 inputs
• Goal: predict whether the client will subscribe to a term deposit (yes/no)
3
Knowledge Discovery
in Databases
The KDD process consists of the following steps (see the picture):
- Selection of the data relevant to the analysis task
- Preprocessing of these data, including tasks such as data cleaning and data integration
- Transformation of the data into forms appropriate for mining
- Application of data mining algorithms to extract patterns
- Interpretation/evaluation of the generated patterns to identify those that represent real knowledge, based on interestingness measures
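The five KDD steps above can be illustrated with a minimal Python sketch. It assumes a handful of made-up records shaped like the bank data: the field names `age`, `job`, `contact` and `y` mirror the data set's attributes, but every value is invented. The helper names `select`, `preprocess`, `transform`, `mine` and `interpret` are hypothetical, chosen only to match the steps, and each stage is deliberately trivial.

```python
from collections import Counter

# Hypothetical toy records: values are invented for illustration and are
# NOT taken from the Bank Marketing Data Set.
RAW = [
    {"id": 1, "age": 35, "job": "admin.", "contact": "cellular", "y": "no"},
    {"id": 2, "age": 67, "job": "retired", "contact": "telephone", "y": "yes"},
    {"id": 3, "age": 41, "job": "unknown", "contact": "cellular", "y": "no"},
    {"id": 4, "age": 29, "job": "technician", "contact": "cellular", "y": "no"},
    {"id": 5, "age": 52, "job": "admin.", "contact": "telephone", "y": "no"},
]


def select(records, fields):
    """Selection: keep only the attributes relevant to the task."""
    return [{f: r[f] for f in fields} for r in records]


def preprocess(records):
    """Preprocessing: a simple cleaning step that drops records with 'unknown' values."""
    return [r for r in records if "unknown" not in r.values()]


def transform(records, cut=60):
    """Transformation: discretise the numeric age into a nominal band."""
    return [{**r, "age": "60+" if r["age"] > cut else "<=60"} for r in records]


def mine(records):
    """Mining: count how often each (contact, y) pair occurs."""
    return Counter((r["contact"], r["y"]) for r in records)


def interpret(patterns, min_count=2):
    """Interpretation/evaluation: keep only patterns frequent enough to be interesting."""
    return {p: c for p, c in patterns.items() if c >= min_count}


data = transform(preprocess(select(RAW, ["age", "job", "contact", "y"])))
print(interpret(mine(data)))  # {('cellular', 'no'): 2}
```

The stages are plain functions composed in order, so any one of them can be swapped for a more realistic version (for example, imputing missing values instead of dropping records) without touching the others.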
4
Data Mining Overview
• we are "sinking" in electronic data
• data mining technology can extract knowledge from it
• the collected data can then be used efficiently and rationally as knowledge
• data mining is "a process of automatic discovery of non-trivial, previously unknown, potentially useful rules, dependencies, patterns, similarities and trends in large data repositories."
5
Data Mining Methods
Discovering association rules: methods for discovering interesting relationships or correlations
Classification and prediction: methods for discovering models (classifiers)
Grouping (cluster analysis, clustering): finding classes of objects with similar characteristics within a finite set of objects
6
WEKA Software
• automatically makes predictions
• helps people make decisions faster and more accurately
• freely available for download
• one of the most popular data mining systems
• its tools can be used for many different data mining tasks
• used here to discover knowledge from the Bank Marketing Data Set through:
- classification
- clustering
- association rules
7
Visualization of Data Set and Examining Data
The attributes can be visualized with respect to the selected class.
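As a rough sketch of what such a class-coloured view summarises: for one attribute, count how many instances of each class fall under each of its values. A chart coloured by the selected class essentially stacks these counts. The rows below are hypothetical (marital status paired with the class `y`), and the function name `class_breakdown` is made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical (marital, y) pairs, for illustration only.
ROWS = [("married", "no"), ("married", "yes"), ("single", "no"),
        ("single", "no"), ("married", "no"), ("divorced", "yes")]


def class_breakdown(rows):
    """For each value of one attribute, count the instances of each class."""
    table = defaultdict(Counter)
    for value, label in rows:
        table[value][label] += 1
    return {value: dict(counts) for value, counts in table.items()}


for value, counts in class_breakdown(ROWS).items():
    print(value, counts)
```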
8
Data Mining – Classification
(OneR, J48, Naive Bayes)
• a method of data analysis
• assigns an object (data) to one of the predefined classes based on a set of attributes that describe the object
• the purpose of classification is prediction
• the most popular classification algorithms: decision trees (J48), Naive Bayes, Bayesian networks, OneR
9
Discovering potentially useful patterns
from a data set
- classification algorithms
OneR
OneR generates a one-level decision tree (a rule on a single attribute). Its rules are simple to understand but usually less accurate than those of more complex classifiers.
Deposit = YES (AGE)
If 64.5 – 66.5
If 75.5 – 80.5
If more than 88.5
Deposit = NO (AGE)
If less than 64.5
If 66.5 – 75.5
If 80.5 – 88.5
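The OneR idea can be sketched in a few lines of Python for nominal attributes. For every attribute, map each value to its majority class, count the errors that rule makes on the training data, and keep the attribute with the fewest errors. The toy rows and the function name `one_r` are hypothetical. The sketch does not discretise numeric attributes; WEKA's OneR does, which is how interval rules on AGE such as those above arise.

```python
from collections import Counter, defaultdict


def one_r(rows, attributes, target):
    """Return (attribute, {value: predicted class}, training errors) for the
    single attribute whose one-level rule misclassifies the fewest rows."""
    best = None
    for attr in attributes:
        counts = defaultdict(Counter)  # value -> class counts
        for row in rows:
            counts[row[attr]][row[target]] += 1
        # predict the majority class for every value of this attribute
        rule = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        errors = sum(sum(c.values()) - c[rule[v]] for v, c in counts.items())
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best


ROWS = [  # hypothetical rows, for illustration only
    {"contact": "cellular", "month": "may", "y": "no"},
    {"contact": "cellular", "month": "aug", "y": "yes"},
    {"contact": "telephone", "month": "may", "y": "no"},
    {"contact": "telephone", "month": "may", "y": "no"},
    {"contact": "cellular", "month": "oct", "y": "yes"},
    {"contact": "telephone", "month": "oct", "y": "yes"},
]

attr, rule, errors = one_r(ROWS, ["contact", "month"], "y")
print(attr, rule, errors)  # month {'may': 'no', 'aug': 'yes', 'oct': 'yes'} 0
```

Here `contact` misclassifies two rows while `month` misclassifies none, so OneR keeps the rule on `month`.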
J48
J48 (WEKA's implementation of C4.5) recursively splits the original data set on the attribute that best separates the classes, producing a tree with many branches.
Deposit = YES
Age > 60
Job = retired
Education = basic.4y
Marital = married
Loan = no
Housing = yes
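J48 chooses each split with the gain ratio criterion of C4.5: the information gain of a split divided by its split information. The following is a minimal sketch of that criterion for a nominal attribute, assuming invented rows and hypothetical function names (`entropy`, `gain_ratio`). Real C4.5/J48 also handles numeric thresholds, missing values and pruning, none of which is shown here.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows, attr, target):
    """Information gain of splitting on `attr`, divided by the split information."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attr]].append(row[target])
    n = len(rows)
    gain = entropy([row[target] for row in rows]) - sum(
        len(g) / n * entropy(g) for g in groups.values())
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0


ROWS = [  # hypothetical rows, for illustration only
    {"job": "retired", "loan": "no", "y": "yes"},
    {"job": "retired", "loan": "yes", "y": "yes"},
    {"job": "admin.", "loan": "no", "y": "no"},
    {"job": "admin.", "loan": "yes", "y": "no"},
]

best = max(["job", "loan"], key=lambda a: gain_ratio(ROWS, a, "y"))
print(best)  # job: it separates the classes perfectly, loan does not help
```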
Naïve Bayes
Assigns a new case to the most probable of the classes.
10
Attribute      NO                  YES
AGE            40                  41
JOB            Admin               Admin
MARITAL        Married             Married
EDUCATION      University degree   University degree
DEFAULT        No                  No
HOUSING        Yes                 Yes
LOAN           No                  No
CONTACT        Cellular            Cellular
MONTH          May                 May
DAY OF WEEK    Monday              Thursday
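Naive Bayes picks the class c that maximises P(c) multiplied by the product of P(x_i | c) over the attributes, treating the attributes as independent given the class. The sketch below assumes nominal attributes only. It uses Laplace (add-one) smoothing so that an unseen value does not zero out a class, and it runs on invented rows; `train_nb` and `predict_nb` are hypothetical names. For numeric attributes such as AGE, WEKA's NaiveBayes by default fits a normal distribution per class instead, which is presumably why the table above reports a single number per class.

```python
from collections import Counter, defaultdict


def train_nb(rows, attributes, target):
    """Count class frequencies and per-class attribute-value frequencies."""
    priors = Counter(row[target] for row in rows)
    cond = defaultdict(Counter)  # (attribute, class) -> Counter of values
    values = defaultdict(set)    # attribute -> set of observed values
    for row in rows:
        for a in attributes:
            cond[(a, row[target])][row[a]] += 1
            values[a].add(row[a])
    return priors, cond, values


def predict_nb(model, instance):
    """Return the class maximising P(c) * prod P(x_i | c), Laplace-smoothed."""
    priors, cond, values = model
    n = sum(priors.values())
    best, best_p = None, -1.0
    for c, nc in priors.items():
        p = nc / n
        for a, v in instance.items():
            p *= (cond[(a, c)][v] + 1) / (nc + len(values[a]))
        if p > best_p:
            best, best_p = c, p
    return best


ROWS = [  # hypothetical rows, for illustration only
    {"contact": "cellular", "housing": "no", "y": "yes"},
    {"contact": "cellular", "housing": "yes", "y": "yes"},
    {"contact": "telephone", "housing": "yes", "y": "no"},
    {"contact": "telephone", "housing": "yes", "y": "no"},
    {"contact": "cellular", "housing": "yes", "y": "no"},
]

MODEL = train_nb(ROWS, ["contact", "housing"], "y")
print(predict_nb(MODEL, {"contact": "cellular", "housing": "no"}))  # yes
```

For the instance shown, the score for `yes` is 0.4 × 0.75 × 0.5 = 0.15 against 0.6 × 0.4 × 0.2 = 0.048 for `no`.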
Data Mining – Clustering
(SimpleKMeans)
• the process of grouping objects into classes called clusters
• definitions of a cluster:
- a set of objects that are "similar"
- a set of objects such that the distance between any two objects in the cluster is less than the distance between any object in the cluster and any object outside it
• the SimpleKMeans algorithm is used as an example in WEKA
11
Discovering potentially useful patterns
from a data set
- clustering algorithm
12
Each group is represented by the centroid of the instances that belong to it. Membership is determined by assigning each instance to the most similar (nearest) group centroid.
SimpleKMeans
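The assign-to-nearest-centroid and recompute-centroid loop described above is the standard k-means procedure. Below is a minimal sketch on hypothetical 2-D numeric points, using Euclidean distance and random initial centroids drawn from the data. `kmeans` is a hypothetical name, not WEKA's API, and the sketch does not handle nominal attributes, which make up most of the bank data.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Assign each point to its nearest centroid, move every centroid to the
    mean of its points, and repeat until the centroids stop changing."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k distinct data points
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # new centroid = per-coordinate mean; keep the old one if a cluster is empty
        new = [tuple(sum(xs) / len(c) for xs in zip(*c)) if c else centroids[i]
               for i, c in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return centroids, clusters


POINTS = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]  # hypothetical data
centroids, clusters = kmeans(POINTS, 2)
print(sorted(centroids))
```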
Data Mining - Association
(Rules Function|Apriori)
• Association rule mining is an unsupervised data mining function
• It finds rules describing frequently co-occurring items
• The rules explain how items or events are associated with each other
• The Apriori algorithm is used to discover frequently co-occurring items
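The level-wise search behind Apriori relies on one property: every subset of a frequent itemset is itself frequent. Candidates of size k therefore only need to be built from frequent itemsets of size k-1. The following is a minimal sketch that assumes invented transactions written as `attribute=value` items in the style of WEKA's output. Support is taken as a fraction of transactions, and confidence as support(X ∪ Y) / support(X). The functions `apriori` and `rules` are hypothetical names, and the sketch omits the optimisations of real implementations.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all itemsets with
    support >= min_support, found level by level."""
    sets = [frozenset(t) for t in transactions]
    n = len(sets)

    def support(itemset):
        # fraction of transactions that contain every item of the itemset
        return sum(itemset <= t for t in sets) / n

    level = {frozenset([i]) for t in sets for i in t}
    level = {s for s in level if support(s) >= min_support}
    frequent, k = {}, 1
    while level:
        frequent.update({s: support(s) for s in level})
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Apriori property: prune candidates that have an infrequent (k-1)-subset
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        level = {c for c in candidates if support(c) >= min_support}
    return frequent


def rules(frequent, min_conf):
    """Build X ==> Y rules with confidence = support(X u Y) / support(X)."""
    found = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                conf = sup / frequent[lhs]
                if conf >= min_conf:
                    found.append((set(lhs), set(itemset - lhs), conf))
    return found


TRANSACTIONS = [  # hypothetical, for illustration only
    {"contact=telephone", "month=may", "y=no"},
    {"contact=telephone", "month=may", "y=no"},
    {"contact=cellular", "month=aug", "y=no"},
    {"contact=cellular", "month=aug", "y=yes"},
    {"contact=telephone", "month=may", "y=yes"},
]

for lhs, rhs, conf in rules(apriori(TRANSACTIONS, 0.4), 0.9):
    print(sorted(lhs), "==>", sorted(rhs), f"conf:({conf:.2f})")
```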
13
Discovering potentially useful patterns
from a data set
- association algorithm
14
Apriori
Apriori finds rules with support greater than a specified minimum support and confidence greater
than a specified minimum confidence.
1. marital=married contact=telephone month=may 5454 ==> y=no 5283 conf:(0.97)
2. marital=married loan=no contact=telephone month=may 4511 ==> y=no 4367 conf:(0.97)
3. contact=telephone month=may 8251 ==> y=no 7979 conf:(0.97)
4. loan=no contact=telephone month=may 6819 ==> y=no 6593 conf:(0.97)
5. default=no contact=telephone month=may 5726 ==> y=no 5533 conf:(0.97)
6. default=no loan=no contact=telephone month=may 4749 ==> y=no 4587 conf:(0.97)
7. month=aug y=no 5523 ==> contact=cellular 5290 conf:(0.96)
8. month=aug 6178 ==> contact=cellular 5909 conf:(0.96)
9. loan=no month=aug y=no 4562 ==> contact=cellular 4362 conf:(0.96)
10. loan=no month=aug 5120 ==> contact=cellular 4890 conf:(0.96)
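In each rule above, the number after the antecedent is the count of instances matching it, and the number after the consequent is the count matching both sides. The confidence is therefore simply their ratio. The short check below recomputes the confidence of the ten rules from those counts (copied from the output) and adds each rule's support relative to the 41188 instances. The dictionary layout and the helper `confidence` are illustrative choices.

```python
# (instances matching the antecedent, instances matching both sides),
# copied from the ten Apriori rules listed above
RULES = {
    1: (5454, 5283), 2: (4511, 4367), 3: (8251, 7979), 4: (6819, 6593),
    5: (5726, 5533), 6: (4749, 4587), 7: (5523, 5290), 8: (6178, 5909),
    9: (4562, 4362), 10: (5120, 4890),
}
N_INSTANCES = 41188  # number of instances in the data set


def confidence(antecedent, both):
    """Confidence of X ==> Y: count(X and Y) / count(X)."""
    return both / antecedent


for rule_id, (antecedent, both) in RULES.items():
    print(f"rule {rule_id:2d}: conf={confidence(antecedent, both):.2f} "
          f"support={both / N_INSTANCES:.3f}")
```

Running it reproduces the 0.97 and 0.96 confidences shown. Rule 3, for example, covers 7979 / 41188 ≈ 0.194 of all instances.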
Conclusion
Analysis
• presents techniques and methodologies in data mining, including Knowledge Discovery in Databases (KDD)
• analyses a large data set
• provides essential background and demonstrates how the relevant algorithms apply to each technique
Results
• knowledge that is potentially useful;
• computer search engines already provide the best results in achieving specific goals;
• WEKA helped to extract a set of rules;
• the data was processed and a satisfactory final result was achieved
15
Results
Will subscribe to a term deposit: YES
AGE: >65
JOB: services, blue-collar, technician, entrepreneur
MARITAL: married
EDUCATION: basic.9y, basic.6y, high.school
DEFAULT (has credit in default?): unknown
HOUSING (has housing loan?): no
LOAN (has personal loan?): no big difference
CONTACT: telephone
MONTH: may, jun, jul, aug, nov
DAY OF WEEK: mon, fri
Will subscribe to a term deposit: NO
AGE: <65
JOB: admin, student, unemployed, retired
MARITAL: single
EDUCATION: university degree, unknown
DEFAULT (has credit in default?): no
HOUSING (has housing loan?): yes
LOAN (has personal loan?): no big difference
CONTACT: cellular
MONTH: oct, sep, dec, mar, apr
DAY OF WEEK: tue, wed, thu
Who wants this data?
Marketing companies / banking institutions
16
Thank you for listening
17
