Marketing Campaign Effectiveness
Classification and Decision Tree Classifier
CIS 435
Francisco E. Figueroa
I. Introduction
Classification is a data mining task that assigns objects to one of several
predefined categories or classes. Classification models have diverse
applications, such as identifying loan applicants as having low, medium, or high credit scores
or detecting spam email messages based on the message header. The classification model sits
between input and output: an attribute set (x) goes into the model, and a class label (y)
comes out. The classification task begins with a data set in which the class assignments are
known. The classes are discrete and do not imply any type of order. If the class label is a
continuous attribute, a regression model is used as the predictive model instead. The simplest
type of classification problem is binary, where the target has only two possible values. When
the target has more than two values, the problem is multiclass. (Tan, 2006)
When building a classification model, after the data has been prepared, the training process
is key: it is where the classification algorithm finds the relationships between the values of
the predictors and the values of the target. A classification model can serve two purposes. In
descriptive modeling, it serves as an explanatory tool to distinguish between objects of
different classes. In predictive modeling, it is used to predict the class label of unknown
records. It is important to point out that classification techniques are best suited for
predicting or describing data sets with binary or nominal categories. (SAS, 2016)
In general, a classification technique requires a learning algorithm to identify the model
that best fits the relationship between the attribute set and the class label of the input
data. The objective of the algorithm is to build models with good generalization capability,
that is, models that accurately predict the class labels of previously unseen records. To solve
a classification problem, a model is first built from a training set, whose class labels are
known, and is then applied to a test set, which consists of records with unknown class labels.
The performance of the classification model is evaluated from the counts of correct and
incorrect predictions, which are tabulated in a confusion matrix.
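As a rough illustration of how a confusion matrix is tallied from test records, the following Python sketch counts the four outcomes for a binary problem. The function name confusion_counts and the six labels are hypothetical and are not taken from the exercise data.

```python
from collections import Counter

def confusion_counts(actual, predicted, positive="yes"):
    """Count TP, FP, FN and TN for a binary classification problem."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1
        else:
            counts["FN" if a == positive else "TN"] += 1
    return counts

# Hypothetical labels for six test records (illustration only).
actual = ["yes", "no", "no", "yes", "no", "yes"]
predicted = ["yes", "no", "yes", "no", "no", "yes"]

cm = confusion_counts(actual, predicted)
accuracy = (cm["TP"] + cm["TN"]) / len(actual)
print(dict(cm), round(accuracy, 3))
```

Every evaluation metric discussed later in the paper can be derived from these four counts.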
The classification model has many applications in customer segmentation, business
modeling, marketing, and credit analysis, among others.
II. Overview of Decision Tree
The decision tree is a classifier and a powerful tool for multiple variable analysis.
Decision trees are produced by algorithms that identify various ways of splitting a data
set into branch-like segments. Multiple variable analyses allow us to predict, explain, describe,
or classify an outcome (or target). An example of a multiple variable analysis is the probability
of a sale, or the likelihood of a response to a marketing campaign, as a result of the combined
effects of multiple input variables, factors, or dimensions. This capability lets decision trees
go beyond simple one-cause, one-effect relationships and discover and describe things in the
context of multiple influences. (SAS, 2016)
A decision tree is built from a series of questions and their possible answers, organized
in a hierarchical structure consisting of nodes and directed edges. The tree has
three types of nodes: a) a root node, which has no incoming edges and zero or more outgoing
edges; b) internal nodes, each of which has exactly one incoming edge and two or more outgoing
edges; and c) leaf or terminal nodes, each of which has exactly one incoming edge and no
outgoing edges. The root and internal nodes hold the attribute test conditions, and each leaf
is assigned a class label.
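The node-and-edge structure described above can be sketched in Python as follows. The Node class, the classify function, and the attribute names (housing, marital) are assumptions made for illustration. They do not describe the tree produced in the Weka exercise.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Node:
    # Root and internal nodes test one attribute; leaves carry a class label.
    attribute: Optional[str] = None
    children: Dict[str, "Node"] = field(default_factory=dict)
    label: Optional[str] = None

    def is_leaf(self) -> bool:
        # A leaf has no outgoing edges.
        return not self.children

def classify(node: Node, record: Dict[str, str]) -> str:
    """Follow the edge matching the record's answer at each node until a leaf."""
    while not node.is_leaf():
        node = node.children[record[node.attribute]]
    return node.label

# Hypothetical two-level tree (illustration only).
tree = Node(attribute="housing", children={
    "yes": Node(label="no"),
    "no": Node(attribute="marital", children={
        "single": Node(label="yes"),
        "married": Node(label="no"),
    }),
})

print(classify(tree, {"housing": "no", "marital": "single"}))
```

Classifying a record amounts to answering one question per level, starting at the root and ending at a leaf.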
Efficient algorithms have been developed to induce reasonably accurate decision
trees. These algorithms usually employ a greedy strategy that grows a decision tree by making a
series of locally optimal decisions about which attribute to use for partitioning the data.
Hunt's algorithm is the basis of many existing decision tree induction algorithms.
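The greedy strategy can be sketched as a short recursive procedure in the spirit of Hunt's algorithm. This is a simplified illustration under several assumptions: attributes are categorical, Gini impurity is used as one possible split criterion, there is no pruning, and the records and attribute names are hypothetical. It is not the exact procedure used by Weka or any other tool.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_impurity(records, labels, attr):
    """Weighted Gini impurity of the child nodes produced by splitting on attr."""
    groups = {}
    for rec, lab in zip(records, labels):
        groups.setdefault(rec[attr], []).append(lab)
    n = len(labels)
    return sum(len(g) / n * gini(g) for g in groups.values())

def grow(records, labels, attrs):
    """Return a nested-dict tree; a leaf is simply a class label."""
    majority = Counter(labels).most_common(1)[0][0]
    # Stop when the node is pure or no attributes remain.
    if len(set(labels)) == 1 or not attrs:
        return majority
    # Greedy step: choose the attribute with the lowest child impurity now.
    best = min(attrs, key=lambda a: split_impurity(records, labels, a))
    remaining = [a for a in attrs if a != best]
    node = {"attr": best, "default": majority, "children": {}}
    for value in sorted({rec[best] for rec in records}):
        idx = [i for i, rec in enumerate(records) if rec[best] == value]
        node["children"][value] = grow(
            [records[i] for i in idx], [labels[i] for i in idx], remaining
        )
    return node

# Hypothetical training records (illustration only).
records = [
    {"housing": "yes", "loan": "no"},
    {"housing": "yes", "loan": "yes"},
    {"housing": "yes", "loan": "no"},
    {"housing": "no", "loan": "no"},
    {"housing": "no", "loan": "no"},
    {"housing": "no", "loan": "yes"},
]
labels = ["no", "no", "no", "yes", "yes", "no"]

tree = grow(records, labels, ["housing", "loan"])
print(tree)
```

Each call only looks at the best split for the current node, which is why the result is locally rather than globally optimal.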
Two of the biggest questions are how to split the training records and when to stop
splitting. The tree induction algorithm must provide a method for expressing an attribute
test condition and its corresponding outcomes for different attribute types. Several measures
can be used to determine the best way to split the records. They are defined in terms of the
class distribution of the records before and after the split, and they are often based on the
degree of impurity of the child nodes. Examples of impurity measures include Gini(t) and
Entropy(t). (Tan, 2006) Entropy is a quantitative measure of disorder in a system. It is used
to measure the homogeneity of a dataset when dividing it into classes. When all the records at a
node belong to a single class, entropy is zero; when the classes are equally represented,
entropy is at its maximum. These values guide the split decision at each stage.
(Gulati, 2016) The information gain ratio reduces the bias of information gain toward
attributes with many distinct values. The Gini index, which is used by CART, is another
impurity measure of a dataset and an alternative to information gain. Entropy and Gini are the
primary measures of data impurity for classification. Entropy is often described as better
suited to categorical attributes and Gini to numeric, continuous attributes.
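The impurity measures above can be written out directly. The following Python sketch uses the standard definitions of entropy, Gini, information gain, and gain ratio; the function names and the small label lists are illustrative assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy in bits: the sum of -p * log2(p) over the class proportions p."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    return entropy(parent) - sum(len(ch) / n * entropy(ch) for ch in children)

def gain_ratio(parent, children):
    """Information gain divided by the split information of the partition."""
    n = len(parent)
    split_info = sum(-(len(ch) / n) * math.log2(len(ch) / n) for ch in children if ch)
    return information_gain(parent, children) / split_info if split_info else 0.0

pure = ["yes", "yes", "yes", "yes"]
mixed = ["yes", "yes", "no", "no"]
print(entropy(pure), gini(pure))    # pure node: both measures are zero
print(entropy(mixed), gini(mixed))  # evenly mixed node: 1.0 and 0.5

two_way = [["yes", "yes"], ["no", "no"]]
four_way = [["yes"], ["yes"], ["no"], ["no"]]
print(information_gain(mixed, two_way), gain_ratio(mixed, two_way))
print(information_gain(mixed, four_way), gain_ratio(mixed, four_way))
```

Both splits reach the same information gain, but the four-way split has a lower gain ratio. This is how the ratio penalizes attributes that fragment the data into many small partitions.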
III. Parameters Used for Model Accuracy
The evaluation metrics available for binary classification models are Accuracy,
Precision, Recall, and AUC. In Azure Machine Learning, the evaluation module also outputs a
confusion matrix showing the number of true positives, false negatives, false positives, and
true negatives, as well as ROC, Precision/Recall, and Lift curves. Accuracy is the proportion
of correctly classified instances, and it is usually the first metric to look at when
evaluating a classifier. However, when the data is unbalanced (most of the instances belong to
one of the classes), or when performance on one particular class matters more, accuracy does
not really capture the effectiveness of a classifier.
Precision is the proportion of records classified as positive that are actually positive:
TP/(TP+FP). Recall is the proportion of actual positive records that the classifier identifies
correctly: TP/(TP+FN). There is usually a trade-off between precision and recall. Another
useful view of model performance is the Receiver Operating Characteristic (ROC) curve, which
plots the true positive rate against the false positive rate, together with the corresponding
Area Under the Curve (AUC) value. The closer this curve is to the upper left corner, the better
the classifier's performance (that is, the higher the true positive rate and the lower the
false positive rate). (Azure, 2016)
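The precision/recall trade-off and the AUC value can be illustrated with a small Python sketch. The scores and labels are hypothetical. The auc function uses the rank interpretation of AUC (the probability that a randomly chosen positive record is scored above a randomly chosen negative one), which is one common way to compute it and not necessarily how any particular tool does.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when records scoring >= threshold are predicted positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def auc(scores, labels):
    """Fraction of positive/negative pairs ranked correctly; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical classifier scores and true labels (1 = positive).
scores = [0.9, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0]

print(precision_recall(scores, labels, 0.65))  # strict threshold
print(precision_recall(scores, labels, 0.30))  # lenient threshold
print(auc(scores, labels))
```

Lowering the threshold from 0.65 to 0.30 raises recall from 2/3 to 1.0 and lowers precision from 1.0 to 0.75, which is the trade-off in miniature.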
IV. Weka Exercises
In this exercise, we try to predict whether a client will subscribe to a term deposit.
Using the training set with all the attributes, we obtained the following results:
Correctly Classified Instances      4023    88.9847 %
Incorrectly Classified Instances     498    11.0153 %

              Predicted No    Predicted Yes
Actual No     3838 (TN)       162 (FP)
Actual Yes    336 (FN)        185 (TP)
Accuracy = (TP + TN) / (P + N) = (185 + 3,838) / 4,521 = 0.889. The decision tree has 104
leaves and the size of the tree is 146.
After eliminating the contact, day, month, and duration attributes, we obtained the following:
Correctly Classified Instances      4025    89.029 %
Incorrectly Classified Instances     496    10.971 %

              Predicted No    Predicted Yes
Actual No     3961 (TN)       39 (FP)
Actual Yes    457 (FN)        64 (TP)
Accuracy = (TP + TN) / (P + N) = (64 + 3,961) / 4,521 = 0.890. The decision tree has 30 leaves
and the size of the tree is 42. In summary, eliminating contact, day, month, and duration
makes the model marginally more accurate on the training data (0.890 versus 0.889) and yields
a much less complex decision tree.
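To look beyond accuracy, precision and recall for the "yes" class can be computed from the two confusion matrices reported above. The counts in this Python sketch are the ones from the exercise. The summarize function and the majority-class baseline (always predicting "no") are added here for comparison and were not part of the original exercise.

```python
def summarize(tn, fp, fn, tp):
    """Accuracy, precision and recall for the positive ("yes") class, plus a baseline."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        # Accuracy of always predicting the larger class (added comparison).
        "majority_baseline": max(tn + fp, fn + tp) / total,
    }

all_attributes = summarize(tn=3838, fp=162, fn=336, tp=185)
reduced_attributes = summarize(tn=3961, fp=39, fn=457, tp=64)

for name, result in [("all attributes", all_attributes),
                     ("reduced attributes", reduced_attributes)]:
    print(name, {k: round(v, 3) for k, v in result.items()})
```

On these counts, recall for the "yes" class falls from about 0.36 to about 0.12 when the four attributes are removed, while precision rises from about 0.53 to about 0.62 and accuracy barely moves. Always predicting "no" would already reach an accuracy of about 0.885, so the accuracy figures alone say little about how well subscribers are identified.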
V. Use Cases
Decision trees are among the successful data mining techniques used in the diagnosis of heart
disease, yet their accuracy is not perfect. Most research applies the J4.8 decision tree, which
is based on gain ratio and binary discretization. (Shouman, 2011) Another application is in
marketing, where a marketing manager at a company needs to predict whether a customer with a
given profile will buy a new item.
References
Gulati, P., Sharma, A., Gupta, M. Theoretical Study of Decision Tree Algorithms to Identify
Pivotal Factors for Performance Improvement: A Review. May 2016. International Journal of
Computer Applications. Vol. 141, No. 14.
Magee, J. Decision Trees for Decision Making.
Microsoft Azure. How to evaluate model performance in Azure Machine Learning. Retrieved from
https://azure.microsoft.com/en-us/documentation/articles/machine-learning-evaluate-model-performance/
SAS. Decision Trees - What are They. Retrieved from
http://support.sas.com/publishing/pubcat/chaps/57587.pdf
Shouman, M., Turner, T., Stocker, R. Using Decision Tree for Diagnosing Heart Disease Patients.
Retrieved from http://crpit.com/confpapers/CRPITV121Shouman.pdf

More Related Content

What's hot

Nbe rtopicsandrecomvlecture1
Nbe rtopicsandrecomvlecture1Nbe rtopicsandrecomvlecture1
Nbe rtopicsandrecomvlecture1
NBER
 
Research Method EMBA chapter 11
Research Method EMBA chapter 11Research Method EMBA chapter 11
Research Method EMBA chapter 11
Mazhar Poohlah
 
A Decision Tree Based Classifier for Classification & Prediction of Diseases
A Decision Tree Based Classifier for Classification & Prediction of DiseasesA Decision Tree Based Classifier for Classification & Prediction of Diseases
A Decision Tree Based Classifier for Classification & Prediction of Diseases
ijsrd.com
 
Exploratory data analysis data visualization
Exploratory data analysis data visualizationExploratory data analysis data visualization
Exploratory data analysis data visualization
Dr. Hamdan Al-Sabri
 
Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests
Derek Kane
 
Building classification model, tree model, confusion matrix and prediction ac...
Building classification model, tree model, confusion matrix and prediction ac...Building classification model, tree model, confusion matrix and prediction ac...
Building classification model, tree model, confusion matrix and prediction ac...
National Cheng Kung University
 
Knowledge discovery claudiad amato
Knowledge discovery claudiad amatoKnowledge discovery claudiad amato
Knowledge discovery claudiad amato
SSSW
 
Evaluation measures for models assessment over imbalanced data sets
Evaluation measures for models assessment over imbalanced data setsEvaluation measures for models assessment over imbalanced data sets
Evaluation measures for models assessment over imbalanced data sets
Alexander Decker
 
Data analysis
Data analysisData analysis
Data analysis
Carthikvinay1
 
Lect8 Classification & prediction
Lect8 Classification & predictionLect8 Classification & prediction
Lect8 Classification & prediction
hktripathy
 
Exam Short Preparation on Data Analytics
Exam Short Preparation on Data AnalyticsExam Short Preparation on Data Analytics
Exam Short Preparation on Data Analytics
Harsh Parekh
 
Research methodology
Research methodologyResearch methodology
Research methodology
avadheshprohit
 
Descriptive Analytics: Data Reduction
 Descriptive Analytics: Data Reduction Descriptive Analytics: Data Reduction
Descriptive Analytics: Data Reduction
Nguyen Ngoc Binh Phuong
 
Performance Analysis of a Gaussian Mixture based Feature Selection Algorithm
Performance Analysis of a Gaussian Mixture based Feature Selection AlgorithmPerformance Analysis of a Gaussian Mixture based Feature Selection Algorithm
Performance Analysis of a Gaussian Mixture based Feature Selection Algorithm
rahulmonikasharma
 
Automation of IT Ticket Automation using NLP and Deep Learning
Automation of IT Ticket Automation using NLP and Deep LearningAutomation of IT Ticket Automation using NLP and Deep Learning
Automation of IT Ticket Automation using NLP and Deep Learning
Pranov Mishra
 

What's hot (17)

Nbe rtopicsandrecomvlecture1
Nbe rtopicsandrecomvlecture1Nbe rtopicsandrecomvlecture1
Nbe rtopicsandrecomvlecture1
 
Research Method EMBA chapter 11
Research Method EMBA chapter 11Research Method EMBA chapter 11
Research Method EMBA chapter 11
 
A Decision Tree Based Classifier for Classification & Prediction of Diseases
A Decision Tree Based Classifier for Classification & Prediction of DiseasesA Decision Tree Based Classifier for Classification & Prediction of Diseases
A Decision Tree Based Classifier for Classification & Prediction of Diseases
 
Exploratory data analysis data visualization
Exploratory data analysis data visualizationExploratory data analysis data visualization
Exploratory data analysis data visualization
 
Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests
 
Building classification model, tree model, confusion matrix and prediction ac...
Building classification model, tree model, confusion matrix and prediction ac...Building classification model, tree model, confusion matrix and prediction ac...
Building classification model, tree model, confusion matrix and prediction ac...
 
Knowledge discovery claudiad amato
Knowledge discovery claudiad amatoKnowledge discovery claudiad amato
Knowledge discovery claudiad amato
 
Evaluation measures for models assessment over imbalanced data sets
Evaluation measures for models assessment over imbalanced data setsEvaluation measures for models assessment over imbalanced data sets
Evaluation measures for models assessment over imbalanced data sets
 
Data analysis
Data analysisData analysis
Data analysis
 
Lect8 Classification & prediction
Lect8 Classification & predictionLect8 Classification & prediction
Lect8 Classification & prediction
 
Exam Short Preparation on Data Analytics
Exam Short Preparation on Data AnalyticsExam Short Preparation on Data Analytics
Exam Short Preparation on Data Analytics
 
Data analysis
Data analysisData analysis
Data analysis
 
Mkt research
Mkt researchMkt research
Mkt research
 
Research methodology
Research methodologyResearch methodology
Research methodology
 
Descriptive Analytics: Data Reduction
 Descriptive Analytics: Data Reduction Descriptive Analytics: Data Reduction
Descriptive Analytics: Data Reduction
 
Performance Analysis of a Gaussian Mixture based Feature Selection Algorithm
Performance Analysis of a Gaussian Mixture based Feature Selection AlgorithmPerformance Analysis of a Gaussian Mixture based Feature Selection Algorithm
Performance Analysis of a Gaussian Mixture based Feature Selection Algorithm
 
Automation of IT Ticket Automation using NLP and Deep Learning
Automation of IT Ticket Automation using NLP and Deep LearningAutomation of IT Ticket Automation using NLP and Deep Learning
Automation of IT Ticket Automation using NLP and Deep Learning
 

Viewers also liked

El Nuevo Dia - Apuesta Tecnologica para la Salud 7 Agosto 2016 DHS Optimized
El Nuevo Dia - Apuesta Tecnologica para la Salud 7 Agosto 2016 DHS OptimizedEl Nuevo Dia - Apuesta Tecnologica para la Salud 7 Agosto 2016 DHS Optimized
El Nuevo Dia - Apuesta Tecnologica para la Salud 7 Agosto 2016 DHS OptimizedFrancisco E. Figueroa-Nigaglioni
 
Collect Pro Datasheet
Collect Pro DatasheetCollect Pro Datasheet
Collect Pro Datasheet
Francisco E. Figueroa-Nigaglioni
 
Association rules and frequent pattern growth algorithms
Association rules and frequent pattern growth algorithmsAssociation rules and frequent pattern growth algorithms
Association rules and frequent pattern growth algorithms
Francisco E. Figueroa-Nigaglioni
 
Neural networks, naïve bayes and decision tree machine learning
Neural networks, naïve bayes and decision tree machine learningNeural networks, naïve bayes and decision tree machine learning
Neural networks, naïve bayes and decision tree machine learning
Francisco E. Figueroa-Nigaglioni
 
Applying data mining for wine industry
Applying data mining for wine industryApplying data mining for wine industry
Applying data mining for wine industry
Francisco E. Figueroa-Nigaglioni
 
The iron triangle of healthcare
The iron triangle of healthcareThe iron triangle of healthcare
The iron triangle of healthcare
Francisco E. Figueroa-Nigaglioni
 
Integration and interoperability LOINC
Integration and interoperability LOINCIntegration and interoperability LOINC
Integration and interoperability LOINC
Francisco E. Figueroa-Nigaglioni
 

Viewers also liked (7)

El Nuevo Dia - Apuesta Tecnologica para la Salud 7 Agosto 2016 DHS Optimized
El Nuevo Dia - Apuesta Tecnologica para la Salud 7 Agosto 2016 DHS OptimizedEl Nuevo Dia - Apuesta Tecnologica para la Salud 7 Agosto 2016 DHS Optimized
El Nuevo Dia - Apuesta Tecnologica para la Salud 7 Agosto 2016 DHS Optimized
 
Collect Pro Datasheet
Collect Pro DatasheetCollect Pro Datasheet
Collect Pro Datasheet
 
Association rules and frequent pattern growth algorithms
Association rules and frequent pattern growth algorithmsAssociation rules and frequent pattern growth algorithms
Association rules and frequent pattern growth algorithms
 
Neural networks, naïve bayes and decision tree machine learning
Neural networks, naïve bayes and decision tree machine learningNeural networks, naïve bayes and decision tree machine learning
Neural networks, naïve bayes and decision tree machine learning
 
Applying data mining for wine industry
Applying data mining for wine industryApplying data mining for wine industry
Applying data mining for wine industry
 
The iron triangle of healthcare
The iron triangle of healthcareThe iron triangle of healthcare
The iron triangle of healthcare
 
Integration and interoperability LOINC
Integration and interoperability LOINCIntegration and interoperability LOINC
Integration and interoperability LOINC
 

Similar to Classification and decision tree classifier machine learning

dataminingclassificationprediction123 .pptx
dataminingclassificationprediction123 .pptxdataminingclassificationprediction123 .pptx
dataminingclassificationprediction123 .pptx
AsrithaKorupolu
 
Binary search query classifier
Binary search query classifierBinary search query classifier
Binary search query classifier
Esteban Ribero
 
Supervised learning techniques and applications
Supervised learning techniques and applicationsSupervised learning techniques and applications
Supervised learning techniques and applications
Benjaminlapid1
 
Popular Text Analytics Algorithms
Popular Text Analytics AlgorithmsPopular Text Analytics Algorithms
Popular Text Analytics Algorithms
PromptCloud
 
Classification By Clustering Based On Adjusted Cluster
Classification By Clustering Based On Adjusted ClusterClassification By Clustering Based On Adjusted Cluster
Classification By Clustering Based On Adjusted Cluster
IOSR Journals
 
Analysis of Common Supervised Learning Algorithms Through Application
Analysis of Common Supervised Learning Algorithms Through ApplicationAnalysis of Common Supervised Learning Algorithms Through Application
Analysis of Common Supervised Learning Algorithms Through Application
aciijournal
 
Analysis of Common Supervised Learning Algorithms Through Application
Analysis of Common Supervised Learning Algorithms Through ApplicationAnalysis of Common Supervised Learning Algorithms Through Application
Analysis of Common Supervised Learning Algorithms Through Application
aciijournal
 
ANALYSIS OF COMMON SUPERVISED LEARNING ALGORITHMS THROUGH APPLICATION
ANALYSIS OF COMMON SUPERVISED LEARNING ALGORITHMS THROUGH APPLICATIONANALYSIS OF COMMON SUPERVISED LEARNING ALGORITHMS THROUGH APPLICATION
ANALYSIS OF COMMON SUPERVISED LEARNING ALGORITHMS THROUGH APPLICATION
aciijournal
 
Machine learning module 2
Machine learning module 2Machine learning module 2
Machine learning module 2
Gokulks007
 
Chapter 1.pdf
Chapter 1.pdfChapter 1.pdf
Chapter 1.pdf
DrGnaneswariG
 
Research Methodology-Data Processing
Research Methodology-Data ProcessingResearch Methodology-Data Processing
Research Methodology-Data Processing
DrMAlagupriyasafiq
 
Research methodology-Research Report
Research methodology-Research ReportResearch methodology-Research Report
Research methodology-Research Report
DrMAlagupriyasafiq
 
Top 20 Data Science Interview Questions and Answers in 2023.pdf
Top 20 Data Science Interview Questions and Answers in 2023.pdfTop 20 Data Science Interview Questions and Answers in 2023.pdf
Top 20 Data Science Interview Questions and Answers in 2023.pdf
AnanthReddy38
 
Machine Learning - Deep Learning
Machine Learning - Deep LearningMachine Learning - Deep Learning
Machine Learning - Deep Learning
Adetimehin Oluwasegun Matthew
 
EXPLORING DATA MINING TECHNIQUES AND ITS APPLICATIONS
EXPLORING DATA MINING TECHNIQUES AND ITS APPLICATIONSEXPLORING DATA MINING TECHNIQUES AND ITS APPLICATIONS
EXPLORING DATA MINING TECHNIQUES AND ITS APPLICATIONS
editorijettcs
 
EXPLORING DATA MINING TECHNIQUES AND ITS APPLICATIONS
EXPLORING DATA MINING TECHNIQUES AND ITS APPLICATIONSEXPLORING DATA MINING TECHNIQUES AND ITS APPLICATIONS
EXPLORING DATA MINING TECHNIQUES AND ITS APPLICATIONS
editorijettcs
 
CREDIT RISK MANAGEMENT USING ARTIFICIAL INTELLIGENCE TECHNIQUES
CREDIT RISK MANAGEMENT USING ARTIFICIAL INTELLIGENCE TECHNIQUESCREDIT RISK MANAGEMENT USING ARTIFICIAL INTELLIGENCE TECHNIQUES
CREDIT RISK MANAGEMENT USING ARTIFICIAL INTELLIGENCE TECHNIQUES
ijaia
 
Introduction to feature subset selection method
Introduction to feature subset selection methodIntroduction to feature subset selection method
Introduction to feature subset selection method
IJSRD
 
Distributed Digital Artifacts on the Semantic Web
Distributed Digital Artifacts on the Semantic WebDistributed Digital Artifacts on the Semantic Web
Distributed Digital Artifacts on the Semantic Web
Editor IJCATR
 

Similar to Classification and decision tree classifier machine learning (20)

dataminingclassificationprediction123 .pptx
dataminingclassificationprediction123 .pptxdataminingclassificationprediction123 .pptx
dataminingclassificationprediction123 .pptx
 
Binary search query classifier
Binary search query classifierBinary search query classifier
Binary search query classifier
 
Supervised learning techniques and applications
Supervised learning techniques and applicationsSupervised learning techniques and applications
Supervised learning techniques and applications
 
Popular Text Analytics Algorithms
Popular Text Analytics AlgorithmsPopular Text Analytics Algorithms
Popular Text Analytics Algorithms
 
Classification By Clustering Based On Adjusted Cluster
Classification By Clustering Based On Adjusted ClusterClassification By Clustering Based On Adjusted Cluster
Classification By Clustering Based On Adjusted Cluster
 
Analysis of Common Supervised Learning Algorithms Through Application
Analysis of Common Supervised Learning Algorithms Through ApplicationAnalysis of Common Supervised Learning Algorithms Through Application
Analysis of Common Supervised Learning Algorithms Through Application
 
Analysis of Common Supervised Learning Algorithms Through Application
Analysis of Common Supervised Learning Algorithms Through ApplicationAnalysis of Common Supervised Learning Algorithms Through Application
Analysis of Common Supervised Learning Algorithms Through Application
 
ANALYSIS OF COMMON SUPERVISED LEARNING ALGORITHMS THROUGH APPLICATION
ANALYSIS OF COMMON SUPERVISED LEARNING ALGORITHMS THROUGH APPLICATIONANALYSIS OF COMMON SUPERVISED LEARNING ALGORITHMS THROUGH APPLICATION
ANALYSIS OF COMMON SUPERVISED LEARNING ALGORITHMS THROUGH APPLICATION
 
Machine learning module 2
Machine learning module 2Machine learning module 2
Machine learning module 2
 
U0 vqmtq2otq=
U0 vqmtq2otq=U0 vqmtq2otq=
U0 vqmtq2otq=
 
Chapter 1.pdf
Chapter 1.pdfChapter 1.pdf
Chapter 1.pdf
 
Research Methodology-Data Processing
Research Methodology-Data ProcessingResearch Methodology-Data Processing
Research Methodology-Data Processing
 
Research methodology-Research Report
Research methodology-Research ReportResearch methodology-Research Report
Research methodology-Research Report
 
Top 20 Data Science Interview Questions and Answers in 2023.pdf
Top 20 Data Science Interview Questions and Answers in 2023.pdfTop 20 Data Science Interview Questions and Answers in 2023.pdf
Top 20 Data Science Interview Questions and Answers in 2023.pdf
 
Machine Learning - Deep Learning
Machine Learning - Deep LearningMachine Learning - Deep Learning
Machine Learning - Deep Learning
 
EXPLORING DATA MINING TECHNIQUES AND ITS APPLICATIONS
EXPLORING DATA MINING TECHNIQUES AND ITS APPLICATIONSEXPLORING DATA MINING TECHNIQUES AND ITS APPLICATIONS
EXPLORING DATA MINING TECHNIQUES AND ITS APPLICATIONS
 
EXPLORING DATA MINING TECHNIQUES AND ITS APPLICATIONS
EXPLORING DATA MINING TECHNIQUES AND ITS APPLICATIONSEXPLORING DATA MINING TECHNIQUES AND ITS APPLICATIONS
EXPLORING DATA MINING TECHNIQUES AND ITS APPLICATIONS
 
CREDIT RISK MANAGEMENT USING ARTIFICIAL INTELLIGENCE TECHNIQUES
CREDIT RISK MANAGEMENT USING ARTIFICIAL INTELLIGENCE TECHNIQUESCREDIT RISK MANAGEMENT USING ARTIFICIAL INTELLIGENCE TECHNIQUES
CREDIT RISK MANAGEMENT USING ARTIFICIAL INTELLIGENCE TECHNIQUES
 
Introduction to feature subset selection method
Introduction to feature subset selection methodIntroduction to feature subset selection method
Introduction to feature subset selection method
 
Distributed Digital Artifacts on the Semantic Web
Distributed Digital Artifacts on the Semantic WebDistributed Digital Artifacts on the Semantic Web
Distributed Digital Artifacts on the Semantic Web
 

More from Francisco E. Figueroa-Nigaglioni

Healthcare terminologies recommendations
Healthcare terminologies recommendationsHealthcare terminologies recommendations
Healthcare terminologies recommendations
Francisco E. Figueroa-Nigaglioni
 
Interoperability critique
Interoperability critiqueInteroperability critique
Interoperability critique
Francisco E. Figueroa-Nigaglioni
 
Data mining applications
Data mining applicationsData mining applications
Data mining applications
Francisco E. Figueroa-Nigaglioni
 
Clustering algorithm Machine Learning
Clustering algorithm Machine LearningClustering algorithm Machine Learning
Clustering algorithm Machine Learning
Francisco E. Figueroa-Nigaglioni
 
Resumen Solucion CollectPro
Resumen Solucion CollectProResumen Solucion CollectPro
Resumen Solucion CollectPro
Francisco E. Figueroa-Nigaglioni
 
Introduction to CollectPro
Introduction to CollectProIntroduction to CollectPro
Introduction to CollectPro
Francisco E. Figueroa-Nigaglioni
 

More from Francisco E. Figueroa-Nigaglioni (7)

Healthcare terminologies recommendations
Healthcare terminologies recommendationsHealthcare terminologies recommendations
Healthcare terminologies recommendations
 
Interoperability critique
Interoperability critiqueInteroperability critique
Interoperability critique
 
Data mining applications
Data mining applicationsData mining applications
Data mining applications
 
Clustering algorithm Machine Learning
Clustering algorithm Machine LearningClustering algorithm Machine Learning
Clustering algorithm Machine Learning
 
Caribbean Business News - eCloud Suite 050512
Caribbean Business News - eCloud Suite 050512Caribbean Business News - eCloud Suite 050512
Caribbean Business News - eCloud Suite 050512
 
Resumen Solucion CollectPro
Resumen Solucion CollectProResumen Solucion CollectPro
Resumen Solucion CollectPro
 
Introduction to CollectPro
Introduction to CollectProIntroduction to CollectPro
Introduction to CollectPro
 

Recently uploaded

Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
Paco van Beckhoven
 
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Globus
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
Juraj Vysvader
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
Globus
 
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing SuiteAI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
Google
 
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.ILBeyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Natan Silnitsky
 
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Mind IT Systems
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
Globus
 
A Comprehensive Look at Generative AI in Retail App Testing.pdf
Classification and decision tree classifier machine learning

  • 1. Marketing Campaign Effectiveness
Classification and Decision Tree Classifier
CIS 435
Francisco E. Figueroa
I. Introduction
Classification is a data mining task that assigns objects to one of several predefined categories or classes. Classification models have diverse applications, such as identifying loan applicants as having low, medium, or high credit scores, or detecting spam email messages based on the message header. The classification model sits in the middle of the process: an input attribute set (x) passes through the model to produce the output class label (y). The classification task begins with a data set in which the class assignments are known. The class labels are discrete and do not imply any order. If the class label is a continuous attribute, a regression model is used as the predictive model instead. The simplest classification problem is binary, where the target has two possible values; with more than two values, the problem is multiclass (Tan, 2006).
Once the data are prepared, training is the key step: the classification algorithm finds the relationships between the values of the predictors and the values of the target. Descriptive modeling supports this process by serving as an explanatory tool to distinguish between objects of different classes, while predictive modeling is used to predict the class label of unknown records. Classification techniques are best suited to predicting or describing data sets with binary or nominal categories (SAS, 2016).
In general, a classification technique requires a learning algorithm to identify the model that best fits the relationship between the attribute set and the class label of the input data. The objective of the algorithm is to build models with good generalization capability.
To solve a classification problem, a model is first built from a training set, in which the class labels are known, and then applied to a test set, which consists of records with unknown class labels. The performance of the classification model is evaluated with a confusion matrix. Classification models have many applications in customer segmentation, business modeling, marketing, and credit analysis, among others.
II. Overview of Decision Tree
A decision tree is a classifier and a powerful way to perform multiple variable analysis. Decision trees are produced by algorithms that identify various ways of splitting a data set into branch-like segments. Multiple variable analyses allow us to predict, explain, describe, or classify an outcome (or target). An example is the probability of a sale, or the likelihood of a response to a marketing campaign, as a result of the combined effects of multiple input variables, factors, or dimensions. This capability lets decision trees go beyond simple one-cause, one-effect relationships and discover and describe things in the context of multiple influences (SAS, 2016).
  • 2. A decision tree is built from a series of questions and their possible answers, organized in a hierarchical structure of nodes and directed edges. The tree has three types of nodes: a) the root node, which has no incoming edges and zero or more outgoing edges; b) internal nodes, each of which has exactly one incoming edge and two or more outgoing edges; and c) leaf or terminal nodes, each of which has exactly one incoming edge and no outgoing edges.
Efficient algorithms have been developed to induce reasonably accurate decision trees. They usually employ a greedy strategy that grows the tree by making a series of locally optimal decisions about which attribute to use for partitioning the data. Hunt's algorithm is the basis of many existing decision tree induction algorithms. Two of the biggest questions are how to split the training records and when to stop splitting. The induction algorithm must provide a method for expressing an attribute test condition and its corresponding outcomes for different attribute types. Several measures can be used to determine the best way to split the records. They are defined in terms of the class distribution of the records before and after the split, and are often based on the degree of impurity of the child nodes. Examples of impurity measures include Gini(t) and Entropy(t) (Tan, 2006).
Entropy is a quantitative measure of disorder in a system. It is used to measure how homogeneous a data set is when dividing it into classes. When all records at a node belong to a single class, the entropy is zero; when the classes are equally divided, the entropy is maximal. This helps in making decisions at each stage of tree construction (Gulati, 2016). The information gain ratio reduces the bias of information gain toward attributes with many distinct values.
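The impurity measures named above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the paper: the function names and the toy "yes"/"no" records are hypothetical, and the formulas are the standard definitions Entropy(t) = -Σ p_i log2 p_i and Gini(t) = 1 - Σ p_i².

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Entropy(t) = -sum(p_i * log2(p_i)) over the classes present at node t."""
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())


def gini(labels):
    """Gini(t) = 1 - sum(p_i ** 2) over the classes present at node t."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def information_gain(parent, children, impurity=entropy):
    """Parent impurity minus the size-weighted impurity of the child nodes."""
    total = len(parent)
    weighted = sum(len(child) / total * impurity(child) for child in children)
    return impurity(parent) - weighted


# Hypothetical toy node: 4 "yes" and 4 "no" records (not data from the paper).
parent = ["yes"] * 4 + ["no"] * 4
pure_split = [["yes"] * 4, ["no"] * 4]          # each child holds a single class
mixed_split = [["yes", "yes", "no", "no"]] * 2  # each child mirrors the parent

print("parent entropy:", entropy(parent), "parent gini:", gini(parent))
print("gain of pure split:", information_gain(parent, pure_split))
print("gain of mixed split:", information_gain(parent, mixed_split))
```

A greedy induction algorithm would evaluate a gain like this for every candidate attribute test and keep the split with the largest value; here the pure split removes all impurity while the mixed split removes none.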
The Gini index is used by CART and is an impurity measure of a data set. It is an alternative to information gain. Entropy and Gini are the primary measures of data impurity for classification. Entropy is considered best for categorical attributes and Gini for numeric and continuous attributes.
III. Parameters Used for Model Accuracy
The evaluation metrics available for binary classification models are accuracy, precision, recall, and AUC. The evaluation module outputs a confusion matrix showing the number of true positives, false negatives, false positives, and true negatives, as well as ROC, Precision/Recall, and Lift curves. Accuracy is the proportion of correctly classified instances and is usually the first metric to look at when evaluating a classifier. When the data is unbalanced (most of the instances belong to one class), or when performance on one particular class matters more, accuracy does not really capture the effectiveness of a classifier. Precision is the proportion of instances classified as positive that are actually positive: TP/(TP+FP). Recall is the proportion of actual positives that the classifier identifies correctly: TP/(TP+FN). There is a trade-off between precision and recall. Another useful view is the true positive rate against the false positive rate in the Receiver Operating Characteristic (ROC) curve and the corresponding Area Under the Curve (AUC) value. The closer this curve is to the upper left corner, the better the classifier's performance (that is, maximizing the true positive rate while minimizing the false positive rate) (Azure, 2016).
  • 3. IV. Weka Exercises
The exercise is to predict whether a client will subscribe to a term deposit. Training with all the attributes gave the following results:
Correctly Classified Instances      4023    88.9847 %
Incorrectly Classified Instances     498    11.0153 %
              Predicted No    Predicted Yes
Actual No     3838 (TN)       162 (FP)
Actual Yes    336 (FN)        185 (TP)
Accuracy = (TP + TN) / (P + N) = (185 + 3,838) / 4,521 = 0.8898. The decision tree has 104 leaves and a size of 146.
After eliminating contact, day, month, and duration, we obtained the following:
Correctly Classified Instances      4025    89.029 %
Incorrectly Classified Instances     496    10.971 %
              Predicted No    Predicted Yes
Actual No     3961 (TN)       39 (FP)
Actual Yes    457 (FN)        64 (TP)
Accuracy = (TP + TN) / (P + N) = (64 + 3,961) / 4,521 = 0.8903. The decision tree has 30 leaves and a size of 42.
In summary, eliminating contact, day, month, and duration gives a slightly higher accuracy on the training data (89.03% versus 88.98%) and a much less complex decision tree.
V. Use Cases
Decision trees are one of the successful data mining techniques used in the diagnosis of heart disease, yet their accuracy is not perfect. Most research applies the J4.8 decision tree, which is based on gain ratio and binary discretization (Shouman, 2011). Another application is in marketing, where a marketing manager needs to analyze whether a customer with a given profile will buy a new item.
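The metrics from Section III can be re-derived from the two confusion matrices reported in the Weka exercise. The sketch below is only an illustration: the `metrics` helper and the variable names are hypothetical, while the four counts per matrix are the ones reported above.

```python
def metrics(tp, fp, fn, tn):
    """Accuracy, precision and recall from the four confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,   # (TP + TN) / (P + N)
        "precision": tp / (tp + fp),     # TP / (TP + FP)
        "recall": tp / (tp + fn),        # TP / (TP + FN)
    }


# Counts taken from the two confusion matrices in the Weka exercise.
all_attributes = metrics(tp=185, fp=162, fn=336, tn=3838)
reduced = metrics(tp=64, fp=39, fn=457, tn=3961)

for name, result in (("all attributes", all_attributes),
                     ("without contact/day/month/duration", reduced)):
    print(name, {key: round(value, 4) for key, value in result.items()})
```

With these counts, accuracy is almost identical for the two models, but recall on the "yes" class falls from roughly 0.36 to roughly 0.12 while precision rises from roughly 0.53 to roughly 0.62. This is the situation Section III describes, where accuracy alone does not capture a classifier's effectiveness on unbalanced data.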
  • 4. References
Gulati, P., Sharma, A., Gupta, M. Theoretical Study of Decision Tree Algorithms to Identify Pivotal Factors for Performance Improvement: A Review. International Journal of Computer Applications, Vol. 141, No. 14, May 2016.
Magee, J. Decision Trees for Decision Making.
Microsoft Azure. How to evaluate model performance in Azure Machine Learning. Retrieved from https://azure.microsoft.com/en-us/documentation/articles/machine-learning-evaluate-model-performance/
SAS. Decision Trees - What are They. Retrieved from http://support.sas.com/publishing/pubcat/chaps/57587.pdf
Shouman, M., Turner, T., Stocker, R. Using Decision Tree for Diagnosing Heart Disease Patients. Retrieved from http://crpit.com/confpapers/CRPITV121Shouman.pdf