Master the Art of Analytics
A Simplistic Explainer Series For Citizen Data Scientists
Journey Towards Augmented Analytics
Decision Tree
Parameter Tuning & Use cases
What Is Covered
• Terminologies
• Introduction & Example
• Standard input/tuning parameters & Sample UI
• Sample output UI
• Interpretation of Output
• Limitations
• Business use cases
Decision Tree Terminologies
Terminologies
• Decision tree: A powerful and popular tool for classification and prediction that takes the form of a tree structure.
• Predictors and target variable:
• Target variable: usually denoted by Y, this is the variable being predicted. It is also called the dependent, output, response or outcome variable (e.g. the one highlighted in the red box in the image below).
• Predictor: sometimes called an independent variable, this is a variable used to predict the target variable (e.g. the variables highlighted in green below).
• Here the predictors in the green box, which consist of wine attributes, are used to predict the target variable in the red box: the quality of a wine (labeled Quality_category).
Leaf Node: A terminal node in a decision tree, where there are no further splits.
Interior Node: A non-leaf node. Also called a decision node.
Splitting: The process of dividing a node into two or more sub-nodes.
Root Node: The topmost node in a tree.
Terminologies
Each internal (non-leaf) node denotes a test on a feature/predictor.
Each branch represents the outcome of a test.
Each leaf represents a value of the target variable/class label, given the values of the input variables represented by the path from the root to the leaf.
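The node, branch and leaf roles described above can be sketched as a small data structure. This is a minimal illustration, not any particular library’s representation: the class names `Leaf` and `Interior`, the `predict` helper and the alcohol threshold of 11.0 are all hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    """Terminal node: holds the predicted class label (or value)."""
    prediction: str


@dataclass
class Interior:
    """Decision node: tests one predictor against a threshold."""
    feature: str
    threshold: float
    below: Union["Interior", Leaf]        # branch taken when value < threshold
    at_or_above: Union["Interior", Leaf]  # branch taken when value >= threshold


def predict(node, record):
    """Follow branches from the root until a leaf is reached."""
    while isinstance(node, Interior):
        if record[node.feature] < node.threshold:
            node = node.below
        else:
            node = node.at_or_above
    return node.prediction


# Hypothetical one-split tree: the root node is also the only interior node.
root = Interior(feature="alcohol", threshold=11.0,
                below=Leaf("Low"), at_or_above=Leaf("High"))

print(predict(root, {"alcohol": 10.2}))
print(predict(root, {"alcohol": 12.5}))
```

The path a record takes from the root to a leaf is exactly the sequence of test outcomes, which is why each leaf can be read as a rule.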
Types of Decision Tree
There are two basic types of decision tree:
• Classification
• Regression
Classification trees are needed when the target variable is categorical. As the name implies, they classify/divide the data into the predefined categories of the target variable.
Examples:
• Based on historical data on credit card payments, loan payments, delinquency rate and outstanding balance, we want to classify/divide customers into defaulters and non-defaulters.
• To assess which customer characteristics (such as purchase frequency, income, age, type of bank account, occupation etc.) lead to purchase/non-purchase of a particular banking product such as an installment loan, personal loan or checking account. Here the classification tree will classify the customers into purchasers and non-purchasers.
Types of Decision Tree
Regression trees are needed when the target variable is numeric.
Examples:
• Based on customers’ past behavioral data on a retail website (such as days from last purchase, brand preference, income, age, gender, website visits, location, total amount of purchase so far etc.), if we want to predict the purchase amount for each customer, then regression trees are useful. Here the target variable would be purchase amount.
Types of Decision Tree
Similarly, a regression tree can also be used to identify the market segments that are more likely to respond to a future mailing.
For instance, the segments with a response rate higher than the overall response rate (green box in image below) can be targeted first, as they will require little effort to obtain.
A different marketing strategy needs to be devised for the lower segments, whose response rate is below the overall rate (red box in image below).
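The comparison of each segment against the overall response rate can be sketched as follows. The segment names and counts are hypothetical, made up purely for illustration; in practice each segment would be a leaf of the fitted tree.

```python
# Hypothetical leaf segments: name -> (responders, customers mailed).
segments = {
    "A": (30, 200),
    "B": (12, 300),
    "C": (45, 250),
    "D": (13, 250),
}

total_responders = sum(r for r, _ in segments.values())
total_mailed = sum(n for _, n in segments.values())
overall_rate = total_responders / total_mailed

# Segments above the overall rate are targeted first.
target_first = sorted(
    name for name, (r, n) in segments.items() if r / n > overall_rate
)
# Segments at or below the overall rate need a different strategy.
needs_new_strategy = sorted(
    name for name, (r, n) in segments.items() if r / n <= overall_rate
)

print(f"overall response rate: {overall_rate:.1%}")
print("target first:", target_first)
print("needs a different strategy:", needs_new_strategy)
```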
Classification Tree
Let’s say we have only two predictors, the level of alcohol and of free sulfur dioxide in a wine, and we want to predict whether the wine quality (target variable) will be High or Low.
• Since the target variable, wine quality, contains categorical values (High and Low), the classification method applies: the predictors are used to classify the data into High and Low.
• The decision tree evaluates candidate splits on all available variables and then selects the split that results in the most homogeneous/pure sub-nodes.
• For example, if the target can be either yes or no (will or will not increase spending), the objective is to produce nodes where most of the cases will increase spending or most of the cases will not increase spending.
How Does A Tree Decide Where To Split
Let’s say we have a sample of 30 students with three variables: Gender (Boy/Girl), Class (IX/X) and Height (5 to 6 feet).
15 out of these 30 play cricket in leisure time.
Now, I want to create a model to predict who will play cricket during the leisure period.
This is where a decision tree helps: it will segregate the students based on all values of the three variables and identify the variable that creates the most homogeneous sets of students.
Homogeneous Nodes
In the snapshot below, you can see that the variable Gender identifies the most homogeneous sets compared to the other two variables.
Here most of the cases (65%) play cricket.
Here most of the cases (80%) don’t play cricket.
Hence homogeneous.
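One common way to put a number on homogeneity is Gini impurity, which this deck lists further on as the impurity measure for classification. The sketch below assumes the two Gender nodes are the 65% and 80% nodes quoted on this slide. With 15 of 30 students playing, those percentages imply one node of 20 students (13 playing) and one of 10 students (2 playing). These counts are inferred from the percentages, not stated on the slide, and the function names are hypothetical.

```python
def gini(counts):
    """Gini impurity of a node given its class counts (0 = perfectly pure)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def weighted_gini(children):
    """Size-weighted average impurity of the sub-nodes produced by a split."""
    total = sum(sum(child) for child in children)
    return sum(sum(child) / total * gini(child) for child in children)


parent = [15, 15]             # [play, don't play] for all 30 students
gender_split = [[13, 7],      # inferred node where 65% play
                [2, 8]]       # inferred node where 80% don't play

print(round(gini(parent), 3))                 # 0.5
print(round(weighted_gini(gender_split), 3))  # 0.41
```

A split is attractive when the weighted impurity of its sub-nodes is lower than the impurity of the parent node. The tree compares this figure across candidate variables and keeps the lowest.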
How To Interpret The Classification Tree Output
In our data, if it turns out that most of the wines with alcohol level <11 are of low quality (hence homogeneous), then the first split happens on alcohol level and it becomes the top node in the tree.
Total number of cases with prediction = low for wine quality
Total number of cases with prediction = high for wine quality
How To Interpret The Classification Tree Output
Further, if alcohol >=12 the tree classifies the wine as high quality, else as low quality (as seen in the red box in the image below).
The cases/records falling in high quality are further tested on free sulfur dioxide level. If free sulfur dioxide is >=28 and alcohol is also >=12, such wines are classified as High quality (as seen in the green box in the image below).
Wines with alcohol >=12 but free sulfur dioxide below 28 are classified as low quality (as seen in the blue box in the image below).
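Read as rules, the two splits on this slide amount to the following sketch. The thresholds 12 and 28 come from the slide; the function name `classify_wine` is hypothetical.

```python
def classify_wine(alcohol, free_sulfur_dioxide):
    """Apply the two splits described on the slide, root first."""
    if alcohol < 12:
        return "Low"    # fails the alcohol test
    if free_sulfur_dioxide >= 28:
        return "High"   # alcohol >= 12 and free sulfur dioxide >= 28
    return "Low"        # alcohol >= 12 but free sulfur dioxide < 28


print(classify_wine(alcohol=12.5, free_sulfur_dioxide=30))
print(classify_wine(alcohol=12.5, free_sulfur_dioxide=20))
print(classify_wine(alcohol=10.0, free_sulfur_dioxide=30))
```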
Method: Regression
Regression-type trees are generally applicable where we attempt to predict the values of a numeric target variable from one or more numeric and/or categorical predictor variables.
For example, we may want to predict the quality of wine (a numeric target variable) from predictors such as the volatile acidity and alcohol level of the wine.
In this case the leaf nodes will contain the predicted wine quality based on wine attributes such as alcohol and volatile acidity, as shown below.
Method: Regression
Let’s take an example of predicting wine quality on a scale of 3 to 10 based on predictors such as alcohol level, free sulfur dioxide level, volatile acidity etc.
Method: Regression
As seen in the red box in the image below, the first split is again based on alcohol level, as we observed in the output of the classification tree example.
A similar pattern shows up here: quality is predicted to be high when free sulfur dioxide is >=24 and alcohol is >=12 (blue box in the image below).
An additional pattern observed here is that wine quality also depends on volatile acidity. Quality is high when volatile acidity is <0.21 (purple box below).
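For a numeric target, a split is typically scored by how much it reduces the variance of the target (variance is the impurity this deck lists for regression), and each leaf predicts the mean of its records. The sketch below uses a small hypothetical set of (alcohol, quality) records. Only the 12.0 threshold is taken from the slide; the data and helper names are made up.

```python
from statistics import mean, pvariance

# Hypothetical (alcohol, quality) records; quality is on the 3-to-10 scale.
wines = [(9.5, 5), (10.1, 5), (10.8, 6), (11.4, 6),
         (12.2, 7), (12.6, 7), (13.0, 8), (13.4, 8)]


def split_quality(records, threshold):
    """Partition quality scores by an 'alcohol >= threshold' test."""
    left = [q for a, q in records if a < threshold]
    right = [q for a, q in records if a >= threshold]
    return left, right


def weighted_variance(groups):
    """Size-weighted variance of the target across sub-nodes."""
    total = sum(len(g) for g in groups)
    return sum(len(g) / total * pvariance(g) for g in groups)


left, right = split_quality(wines, 12.0)
all_quality = [q for _, q in wines]

print("variance before split:", pvariance(all_quality))
print("variance after split: ", weighted_variance([left, right]))
print("leaf predictions:", mean(left), mean(right))
```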
Standard Tuning Parameters
Max Depth
It sets the maximum depth of any node of the final tree, with the root node counted as depth 0.
The smaller this number, the shorter the tree.
For instance, setting max depth = 2 while generating a classification tree to predict wine quality leads to the output shown below:
[Tree diagram with levels labeled Depth 0 to Depth 2]
Max Depth
Similarly, setting max depth = 3 gives the following output:
[Tree diagram with levels labeled Depth 0 to Depth 3]
Hence, the higher the max depth, the longer the final tree.
Longer trees are generally less reliable: they tend to have nodes with very few records, so the tree generalizes poorly.
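A rough way to see why deeper trees end up with thinly populated nodes: assuming binary splits and the root counted as depth 0, a tree of max depth d can have at most 2^d leaves. The sketch below spreads a hypothetical 1,600 training records evenly over that many leaves. Real trees are rarely fully grown or balanced, so treat the figures as an upper-bound illustration.

```python
def max_leaves(max_depth):
    """Upper bound on leaf count for a binary tree with the root at depth 0."""
    return 2 ** max_depth


n_records = 1600  # hypothetical training-set size

for depth in (2, 3, 5, 10):
    leaves = max_leaves(depth)
    print(f"max depth {depth:>2}: up to {leaves:>4} leaves, "
          f"about {n_records / leaves:.1f} records per leaf if fully grown")
```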
Final Tuning Parameters Along With UI Suggestions
Input Wizard Sample For Selecting Target Variables And Predictors
Predictors and target variable should be selected using the input wizard as shown below.
Select the variables you want to use for prediction of the selected target variable:
• Purchase frequency
• Age
• Gender
• Income
• Website visits
Select the variable you would like to predict (target variable):
• Purchase frequency
• Age
• Gender
• Income
• Website visits
• Purchase
Input Wizard Sample For Categorical Predictors’ Class Selection & Tuning Parameters
Categorical predictors (select the classes to include):
• Purchase frequency: High, Medium, Low
• Age: <=18, 18 to 25, >=25
• Gender: Male, Female
Tuning parameters (assuming the target variable contains yes/no values):
• Method: Classification
• Impurity: Gini
• Max Depth
• # Classes in target variable: Two
Note: Tuning parameters are explained in the next section.
Sample Output Formats
Please note: Spark expects these parameters to be supplied as input rather than auto-detected.
1. Method
When the target variable type is numeric, regression should be auto-selected; when the target variable type is categorical, classification should be auto-selected.
Input control type: Static label
2. Impurity
If method = classification, impurity should be set to gini automatically.
If method = regression, impurity should be set to variance automatically.
Input control type: Static label
3. Categorical predictors info
Categorical predictors and their class values should be auto-detected.
Input control type for categorical predictors list: Static label
Input control type for classes selection: Multiple checkbox buttons
4. Max depth
Input control type: Editable slider with numeric value label
(Suggested value: 3 to 5)
5. Number of classes (only for method = classification)
This value is based on the total number of classes present in the target variable.
For example, in the wine quality classification case, there are two classes of wine quality: high and low.
Input control type: Static label
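The auto-selection rules in items 1, 2 and 5 can be sketched as a small helper. This is an illustration of the rules as stated, not the product’s implementation: the function name `infer_tree_settings`, the dictionary keys and the default max depth of 3 (taken from the suggested 3 to 5 range) are assumptions.

```python
from numbers import Number


def infer_tree_settings(target_values, max_depth=3):
    """Derive method, impurity and class count from the target column."""
    # bool is a Number subclass in Python, so exclude it explicitly.
    is_numeric = all(
        isinstance(v, Number) and not isinstance(v, bool) for v in target_values
    )
    if is_numeric:
        return {"method": "regression", "impurity": "variance",
                "max_depth": max_depth}
    return {"method": "classification", "impurity": "gini",
            "max_depth": max_depth,
            "num_classes": len(set(target_values))}


print(infer_tree_settings(["high", "low", "low", "high"]))
print(infer_tree_settings([5, 6, 7, 5, 8]))
```

Note that a categorical target stored as numeric codes (for example 0/1) would be treated as numeric by this sketch, so such a column would need to be flagged as categorical beforehand.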
Classification Tree Sample Outputs
Regression Tree Sample Outputs
Limitations
Limitations Of Decision Tree
• Frequent changes to the data lead to substantial differences in the output, so a decision tree should not be applied to data that fluctuates significantly.
• There have to be predefined classes for the target variable in the dataset (the categories to which each record belongs, for a classification tree).
• Decision trees are prone to errors in classification problems where the target variable contains many classes and the training dataset contains a relatively small number of records.
• Hence the total number of records in a training dataset must be large in proportion to the number of classes of the target variable. (There is no rule of thumb for how much larger the number of records should be than the number of target variable classes.)
Business Use Cases
Business Problem 1 – Classification tree
• Which customer segments should be targeted to increase the subscription rate of a term deposit product?
• In this case, the classification tree can be used to assess the characteristics of customers that lead to subscription/non-subscription of a term deposit product targeted in a direct marketing campaign.
• Here the target variable would be the column recording whether the customer called during a direct marketing campaign subscribed to a term deposit product (“yes” if subscribed, else “no”).
The Dataset
• Let’s say we have the following customer attributes (predictors):
o Age
o Job type
o Marital status
o Education
o Account default status
o Loan status
o Contact type
o Outcome of previous contact
• Target variable: term deposit subscription (“yes”/“no”)
As shown above, we want to use customer attributes such as age (numeric predictor), prior loan status (categorical predictor), marital status (categorical predictor) etc. to classify customers into subscribers and non-subscribers of the term deposit product (target variable classes).
Output Tree 1
The bar plot in each leaf node shows the breakdown of yes and no classes in that node. The 0 to 1 scale on the right side of the bar plot indicates the proportion of yes and no in the node, and n shows the number of records belonging to that leaf node.
Interpretation of tree output
As per the tree output, loan status came out as the best predictor of term deposit product purchase.
Customers with a prior loan and marital status “married” outperform all other segments (highlighted with a blue dashed line).
Customers with no prior loans and age > 60 have the second highest propensity to purchase the term deposit product (highlighted with a green dashed line).
Moreover, within the segment with no prior loans, singles with age <=22 seem to outperform the age >22 segment in term deposit product purchase (highlighted with a black dashed line).
How Splits And Terminal Nodes Are Generated
The decision tree chooses the predictor most predictive of the target class.
In our case, most of the records (84% of records in the dataset) have prior loan status “No” and only 16% have prior loan status “Yes”.

                   Term deposit purchased: Yes    Term deposit purchased: No    Total records (%)
Prior Loan: Yes    10%                            6%                            16%
Prior Loan: No     14%                            70%                           84%

Within loan status “No”, the records that don’t purchase a term deposit make up 70% of the whole dataset, which is 70 of that segment’s 84 percentage points, or about 83% of the segment.
None of the other predictors shows such homogeneity with respect to term deposit purchase. For example, the marital status breakdown is as follows:

Marital Status    No     Yes    Total records (%)
Divorced          13%    26%    39%
Married           21%    19%    40%
Single            6%     15%    21%

Thus, because other variables such as marital status are relatively less homogeneous, loan status was chosen as the attribute for the first split.
Similarly, the sub-nodes are split using the same homogeneity criterion.
Terminal nodes are nodes that can’t be split further because of a stopping criterion such as max depth (when the maximum depth is set to 3, node splitting stops once tree depth = 3 is reached, and the last generated nodes become the terminal nodes).
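The prior-loan table on this slide gives each cell as a share of the whole dataset. To judge how homogeneous a segment is, those shares have to be converted to shares within the segment. A minimal sketch, with the percentages copied from the table and a hypothetical helper name:

```python
# Joint percentages of all records, copied from the prior-loan table:
# segment -> (purchased: yes, purchased: no)
prior_loan = {"Yes": (10, 6), "No": (14, 70)}


def within_segment_shares(joint):
    """Convert shares of the whole dataset into shares within each segment."""
    shares = {}
    for segment, (yes, no) in joint.items():
        total = yes + no
        shares[segment] = {"segment_size": total,
                           "yes": yes / total, "no": no / total}
    return shares


for segment, s in within_segment_shares(prior_loan).items():
    majority = max(s["yes"], s["no"])
    print(f"Prior loan {segment}: {s['segment_size']}% of records, "
          f"majority class share {majority:.1%}")
```

The “No” segment is both large (84% of records) and dominated by one class (about 83%), which is what makes this split attractive.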
Output 2: Accuracy Of Prediction
The actual versus predicted table shows how many records are classified correctly by the decision tree:

       No        Yes
No     38,439    6,772
Yes    0         0
(Axes: Actual classes, Predicted classes)

Accuracy = sum of boxes highlighted in red / all boxes = 38,439 / (38,439 + 6,772 + 0 + 0) = 85.02%
Hence the sample decision tree model we just built is about 85% accurate, with about a 15% chance of error.
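The accuracy figure quoted on this slide can be reproduced from the four cells of the table. In this sketch the diagonal cells are taken to be the correctly classified records, matching the slide’s formula; the function name is hypothetical.

```python
# Confusion matrix from the slide; the diagonal holds the correct predictions.
confusion = [
    [38439, 6772],   # first row of the table (No)
    [0, 0],          # second row of the table (Yes)
]


def accuracy(matrix):
    """Correctly classified records (diagonal) divided by all records."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


print(f"accuracy = {accuracy(confusion):.2%}")  # accuracy = 85.02%
```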
Business Benefits
• The segments highlighted in black, blue and green in tree output 1 are the low-hanging fruit that require less effort to obtain, so there is no need to devise a different target marketing strategy for them.
• The segments with the highest number of “No”s (which are not highlighted in tree output 1) need to be targeted in a different and more efficient way to convert them into purchasers, for example customers with marital status single/divorced.
• Thus segmenting customers by their propensity to buy/not buy a product can aid in devising a better and more efficient target marketing strategy, converting more non-purchasers into purchasers and in turn increasing product penetration.
Use Case 2 – Classification Tree
Business problem:
• Based on historical customer attributes such as credit card payments, loan payments, outstanding balance etc., a bank needs to classify customers into defaulters and non-defaulters.
• In this case, the classification tree can be used to assess the characteristics of customers that are likely to default.
• Here the target variable would be a column recording whether the customer has defaulted previously (“yes” if defaulted, else “no”).
Business benefit:
• The bank can decide which customer segments are eligible for any type of loan and which should be denied any loan because they are likely to default.
• This way riskier customers are identified easily and the bank can avert the risk of delinquencies.
Use Case 3 – Regression Tree
Business problem:
• Based on customers’ attributes and past online shopping behavioral data, an online retail giant such as Amazon/Flipkart wants to predict the future purchase amount of customers.
• Here predictors can be a customer’s ‘days from last purchase’, ‘brand preference’, ‘income’, ‘age’, ‘gender’, ‘website visits’, ‘location’, ‘total amount of purchase so far’ etc.
• As the target variable is numeric (purchase amount), a regression tree can be used to predict the purchase amount for different types of customer segment.
Business benefit:
• Online retailers can identify the customer segments that have a higher capacity to purchase and can design a special marketing strategy for them, as these segments are their main revenue drivers.
• This way premium customers can be given special attention to retain their loyalty, and in turn revenue can be increased.
Use Case 4 – Regression Tree
Business problem:
• Predicting order completion time for a telecom service provider.
• Predictors in this case can be: user location, workforce availability, distance from nearest network junction, average time taken in last 6 months, average historical delay in last 6 months etc.
• The target variable here would be the turnaround time of order completion.
Business benefit:
• As soon as a new order arrives, the service provider can give the customer an estimated completion time based on the general pattern observed through the regression tree model.
• Proper workforce allocation and planning.
• Avoiding revenue leakage by preventing delay fines.
Want to Learn More?
Get in touch with us at support@Smarten.com
And do check out the Learning section on Smarten.com
June 2018

 
What is Multiple Linear Regression and How Can it be Helpful for Business Ana...
What is Multiple Linear Regression and How Can it be Helpful for Business Ana...What is Multiple Linear Regression and How Can it be Helpful for Business Ana...
What is Multiple Linear Regression and How Can it be Helpful for Business Ana...Smarten Augmented Analytics
 
What is the Independent Samples T Test Method of Analysis and How Can it Bene...
What is the Independent Samples T Test Method of Analysis and How Can it Bene...What is the Independent Samples T Test Method of Analysis and How Can it Bene...
What is the Independent Samples T Test Method of Analysis and How Can it Bene...Smarten Augmented Analytics
 
What Are Simple Random Sampling and Stratified Random Sampling Analytical Tec...
What Are Simple Random Sampling and Stratified Random Sampling Analytical Tec...What Are Simple Random Sampling and Stratified Random Sampling Analytical Tec...
What Are Simple Random Sampling and Stratified Random Sampling Analytical Tec...Smarten Augmented Analytics
 
What is Binary Logistic Regression Classification and How is it Used in Analy...
What is Binary Logistic Regression Classification and How is it Used in Analy...What is Binary Logistic Regression Classification and How is it Used in Analy...
What is Binary Logistic Regression Classification and How is it Used in Analy...Smarten Augmented Analytics
 

More from Smarten Augmented Analytics (20)

Crime Type Prediction - Augmented Analytics Use Case – Smarten
Crime Type Prediction - Augmented Analytics Use Case – SmartenCrime Type Prediction - Augmented Analytics Use Case – Smarten
Crime Type Prediction - Augmented Analytics Use Case – Smarten
 
What Is Generalized Linear Regression with Gaussian Distribution And How Can ...
What Is Generalized Linear Regression with Gaussian Distribution And How Can ...What Is Generalized Linear Regression with Gaussian Distribution And How Can ...
What Is Generalized Linear Regression with Gaussian Distribution And How Can ...
 
What is Isotonic Regression and How Can a Business Utilize it to Analyze Data?
What is Isotonic Regression and How Can a Business Utilize it to Analyze Data?What is Isotonic Regression and How Can a Business Utilize it to Analyze Data?
What is Isotonic Regression and How Can a Business Utilize it to Analyze Data?
 
Students' Academic Performance Predictive Analytics Use Case – Smarten
Students' Academic Performance Predictive Analytics Use Case – SmartenStudents' Academic Performance Predictive Analytics Use Case – Smarten
Students' Academic Performance Predictive Analytics Use Case – Smarten
 
Random Forest Regression Analysis Reveals Impact of Variables on Target Values
Random Forest Regression Analysis Reveals Impact of Variables on Target Values  Random Forest Regression Analysis Reveals Impact of Variables on Target Values
Random Forest Regression Analysis Reveals Impact of Variables on Target Values
 
Gradient Boosting Regression Analysis Reveals Dependent Variables and Interre...
Gradient Boosting Regression Analysis Reveals Dependent Variables and Interre...Gradient Boosting Regression Analysis Reveals Dependent Variables and Interre...
Gradient Boosting Regression Analysis Reveals Dependent Variables and Interre...
 
What is Simple Linear Regression and How Can an Enterprise Use this Technique...
What is Simple Linear Regression and How Can an Enterprise Use this Technique...What is Simple Linear Regression and How Can an Enterprise Use this Technique...
What is Simple Linear Regression and How Can an Enterprise Use this Technique...
 
What is Multiple Linear Regression and How Can it be Helpful for Business Ana...
What is Multiple Linear Regression and How Can it be Helpful for Business Ana...What is Multiple Linear Regression and How Can it be Helpful for Business Ana...
What is Multiple Linear Regression and How Can it be Helpful for Business Ana...
 
Fraud Mitigation Predictive Analytics Use Case – Smarten
Fraud Mitigation Predictive Analytics Use Case – SmartenFraud Mitigation Predictive Analytics Use Case – Smarten
Fraud Mitigation Predictive Analytics Use Case – Smarten
 
Quality Control Predictive Analytics Use Case - Smarten
Quality Control Predictive Analytics Use Case - SmartenQuality Control Predictive Analytics Use Case - Smarten
Quality Control Predictive Analytics Use Case - Smarten
 
Machine Maintenance Management Predictive Analytics Use Case - Smarten
Machine Maintenance Management Predictive Analytics Use Case - SmartenMachine Maintenance Management Predictive Analytics Use Case - Smarten
Machine Maintenance Management Predictive Analytics Use Case - Smarten
 
Predictive Analytics Using External Data Augmented Analytics Use Case - Smarten
Predictive Analytics Using External Data Augmented Analytics Use Case - SmartenPredictive Analytics Using External Data Augmented Analytics Use Case - Smarten
Predictive Analytics Using External Data Augmented Analytics Use Case - Smarten
 
Marketing Optimization Augmented Analytics Use Cases - Smarten
Marketing Optimization Augmented Analytics Use Cases - SmartenMarketing Optimization Augmented Analytics Use Cases - Smarten
Marketing Optimization Augmented Analytics Use Cases - Smarten
 
Human Resource Attrition Augmented Analytics Use Case - Smarten
Human Resource Attrition Augmented Analytics Use Case - SmartenHuman Resource Attrition Augmented Analytics Use Case - Smarten
Human Resource Attrition Augmented Analytics Use Case - Smarten
 
Customer Targeting Augmented Analytics Use Case - Smarten
Customer Targeting Augmented Analytics Use Case - SmartenCustomer Targeting Augmented Analytics Use Case - Smarten
Customer Targeting Augmented Analytics Use Case - Smarten
 
What is KNN Classification and How Can This Analysis Help an Enterprise?
What is KNN Classification and How Can This Analysis Help an Enterprise?What is KNN Classification and How Can This Analysis Help an Enterprise?
What is KNN Classification and How Can This Analysis Help an Enterprise?
 
What is Multiple Linear Regression and How Can it be Helpful for Business Ana...
What is Multiple Linear Regression and How Can it be Helpful for Business Ana...What is Multiple Linear Regression and How Can it be Helpful for Business Ana...
What is Multiple Linear Regression and How Can it be Helpful for Business Ana...
 
What is the Independent Samples T Test Method of Analysis and How Can it Bene...
What is the Independent Samples T Test Method of Analysis and How Can it Bene...What is the Independent Samples T Test Method of Analysis and How Can it Bene...
What is the Independent Samples T Test Method of Analysis and How Can it Bene...
 
What Are Simple Random Sampling and Stratified Random Sampling Analytical Tec...
What Are Simple Random Sampling and Stratified Random Sampling Analytical Tec...What Are Simple Random Sampling and Stratified Random Sampling Analytical Tec...
What Are Simple Random Sampling and Stratified Random Sampling Analytical Tec...
 
What is Binary Logistic Regression Classification and How is it Used in Analy...
What is Binary Logistic Regression Classification and How is it Used in Analy...What is Binary Logistic Regression Classification and How is it Used in Analy...
What is Binary Logistic Regression Classification and How is it Used in Analy...
 

Recently uploaded

Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odishasmiwainfosol
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationBradBedford3
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtimeandrehoraa
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Matt Ray
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Hr365.us smith
 
PREDICTING RIVER WATER QUALITY ppt presentation
PREDICTING  RIVER  WATER QUALITY  ppt presentationPREDICTING  RIVER  WATER QUALITY  ppt presentation
PREDICTING RIVER WATER QUALITY ppt presentationvaddepallysandeep122
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEOrtus Solutions, Corp
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanyChristoph Pohl
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesŁukasz Chruściel
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Cizo Technology Services
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Velvetech LLC
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfStefano Stabellini
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceBrainSell Technologies
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...Technogeeks
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Natan Silnitsky
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样umasea
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 

Recently uploaded (20)

2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva2.pdf Ejercicios de programación competitiva
2.pdf Ejercicios de programación competitiva
 
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort ServiceHot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion Application
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtime
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)
 
PREDICTING RIVER WATER QUALITY ppt presentation
PREDICTING  RIVER  WATER QUALITY  ppt presentationPREDICTING  RIVER  WATER QUALITY  ppt presentation
PREDICTING RIVER WATER QUALITY ppt presentation
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New Features
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdf
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. Salesforce
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 

What is the Decision Tree Analysis and How Does it Help a Business to Analyze Data?

  • 1. Master the Art of Analytics: A Simplistic Explainer Series for Citizen Data Scientists. Journey Towards Augmented Analytics.
  • 3. What Is Covered: Terminologies; Introduction & example; Standard input/tuning parameters & sample UI; Sample output UI; Interpretation of output; Limitations; Business use cases.
  • 5. Terminologies. Decision tree: a powerful and popular tool for classification and prediction, in the form of a tree structure. Predictors and target variable: the target variable, usually denoted by Y, is the variable being predicted; it is also called the dependent, output, response, or outcome variable (e.g., the one highlighted in the red box in the slide image). A predictor, sometimes called an independent variable, is a variable used to predict the target variable (e.g., the variables highlighted in green in the slide image). Here the predictors in the green box, which consist of wine attributes, are used to predict the target variable in the red box, the quality of a wine (labeled Quality_category).
  • 6. Terminologies. Root node: the topmost node in a tree. Interior node: a non-leaf node, also called a decision node. Leaf node: a terminal node in a decision tree, where there are no further splits. Splitting: the process of dividing a node into two or more sub-nodes.
  • 7. Terminologies. Each internal (non-leaf) node denotes a test on a feature/predictor. Each branch represents the outcome of a test. Each leaf represents a value of the target variable (class label), given the values of the input variables represented by the path from the root to the leaf.
  • 9. Types of Decision Tree. There are two basic types of decision tree: classification and regression. Classification trees are needed when the target variable is categorical and, as the name implies, are used to classify/divide the data into the predefined categories of the target variable. Examples: (1) Based on historical data on credit card payments, loan payments, delinquency rate, and outstanding balance, we want to classify customers into defaulters and non-defaulters. (2) To assess which characteristics of a customer, such as purchase frequency, income, age, type of bank account, and occupation, lead to purchase or non-purchase of a particular banking product such as an installment loan, personal loan, or checking account; here a classification tree will classify the customers into purchasers and non-purchasers.
  • 10. Types of Decision Tree. Regression trees are needed when the target variable is numeric. Example: given customers' past behavioral data on a retail website, such as days since last purchase, brand preference, income, age, gender, website visits, location, and total purchase amount so far, a regression tree is useful if we want to predict the purchase amount of each customer (here the target variable is the purchase amount).
  • 11. Types of Decision Tree. Similarly, a regression tree can be used to identify the market segments that are more likely to respond to a future mailing. For instance, segments with a response rate higher than the overall response rate (green box in the slide image) can be targeted first, as they require little effort to obtain, whereas a different marketing strategy needs to be devised for the lower segments (response rate below the overall rate; red box in the slide image).
  • 12. Classification Tree. Let's say we have only two predictors, the level of alcohol and the free sulfur dioxide in a wine, and we want to predict whether wine quality (the target variable) will be High or Low. Since the target variable contains categorical values (High and Low), the classification method applies: the predictors classify the data into High and Low. A decision tree evaluates candidate splits on all available variables and then selects the split that results in the most homogeneous (pure) sub-nodes. For example, if the target can be either yes or no (will or will not increase spending), the objective is to produce nodes where most of the cases will increase spending or most of the cases will not.
  • 13. How Does a Tree Decide Where to Split? Let's say we have a sample of 30 students with three variables: Gender (Boy/Girl), Class (IX/X), and Height (5 to 6 feet). 15 of these 30 play cricket in their leisure time. Now we want to create a model to predict who will play cricket during leisure time. This is where a decision tree helps: it segregates the students based on all values of the three variables and identifies the variable that creates the most homogeneous sets of students.
  • 14. Homogeneous Nodes. In the slide's snapshot, the variable Gender identifies the most homogeneous sets compared to the other two variables: in one node most of the cases (65%) play cricket, and in the other most of the cases (80%) don't play cricket. Hence these nodes are homogeneous.
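One common way to put a number on "homogeneous" is Gini impurity, which the deck later names as the impurity measure for classification. The sketch below is illustrative only: the function name `gini` is an assumption, and the proportions are the 65% and 80% figures quoted on the slide, compared against the 15-of-30 (50/50) starting sample.

```python
def gini(proportions):
    """Gini impurity of a node, given its class proportions (which sum to 1).

    0.0 means the node is perfectly pure; 0.5 is the worst case for two classes.
    """
    return 1.0 - sum(p * p for p in proportions)


# Whole sample: 15 of 30 students play cricket, so the classes are split 50/50.
root_impurity = gini([0.50, 0.50])

# The two Gender nodes described on the slide.
node_mostly_play = gini([0.65, 0.35])       # 65% play cricket
node_mostly_dont = gini([0.20, 0.80])       # 80% don't play cricket

print(f"root: {root_impurity:.3f}")
print(f"65/35 node: {node_mostly_play:.3f}")
print(f"20/80 node: {node_mostly_dont:.3f}")
```

Both child nodes score lower than the 50/50 root, which is what "more homogeneous" means in this measure.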
  • 15. How to Interpret the Classification Tree Output. If it turns out that most of the wines in our data with alcohol level < 11 are of low quality (hence homogeneous), then the first split happens on alcohol level, which becomes the top node in the tree. Figure labels: total number of cases with prediction = low for wine quality; total number of cases with prediction = high for wine quality.
  • 16. How to Interpret the Classification Tree Output. Further, if alcohol >= 12 the tree classifies the wine as high quality, otherwise as low quality (red box in the slide image). The records falling in high quality are further tested on free sulfur dioxide level. If free sulfur dioxide >= 28 and alcohol >= 12, the wine is classified as High quality (green box in the slide image). Wines with alcohol >= 12 but free sulfur dioxide < 28 are classified as low quality (blue box in the slide image).
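A fitted tree can be read as nested if/else rules, one test per decision node. The following minimal sketch encodes only the thresholds stated on slide 16 (alcohol >= 12, free sulfur dioxide >= 28); the function and argument names are hypothetical.

```python
def classify_wine(alcohol, free_sulfur_dioxide):
    """Follow the path from the root to a leaf and return the predicted class."""
    if alcohol >= 12:                      # root test on alcohol level
        if free_sulfur_dioxide >= 28:      # second test, only on the high-alcohol branch
            return "High"
        return "Low"
    return "Low"


print(classify_wine(12.5, 30))   # high alcohol, high free sulfur dioxide
print(classify_wine(12.5, 20))   # high alcohol, low free sulfur dioxide
print(classify_wine(10.0, 30))   # low alcohol
```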
  • 17. Method: Regression. Regression-type trees are generally applicable where we attempt to predict the values of a numeric target variable from one or more numeric and/or categorical predictor variables. For example, we may want to predict the quality of wine (a numeric target variable) from predictors such as the volatile acidity and the alcohol level of the wine. In this case the leaf nodes contain the predicted wine quality based on wine attributes such as alcohol and volatility, as shown in the slide image.
  • 18. Method: Regression. Let's take an example of predicting wine quality on a scale of 3 to 10 based on predictors such as alcohol level, free sulfur dioxide level, and volatility.
  • 19. Method: Regression. As seen in the red box in the slide image, the first split is again based on alcohol level, as in the output of the classification tree example. A similar pattern shows up here: quality is predicted to be high when free sulfur dioxide >= 24 and alcohol >= 12 (blue box in the slide image). An additional pattern is that wine quality also depends on volatility level: quality is high when volatility < 0.21 (purple box in the slide image).
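For a regression tree, a split can be scored by how much it reduces the variance of the target within the resulting nodes, and each leaf then predicts the mean of its records. The sketch below assumes a single numeric predictor and uses a small made-up dataset; the records, the function names, and the midpoint-threshold search are all illustrative assumptions, not the deck's data or Spark's exact procedure.

```python
from statistics import mean, pvariance

# Hypothetical (alcohol, quality) records, for illustration only.
wines = [(9.5, 5), (10.0, 5), (10.5, 6), (11.0, 5), (12.0, 7), (12.5, 7), (13.0, 8)]


def weighted_variance(left, right):
    """Variance of the two child nodes, weighted by their share of the records."""
    n = len(left) + len(right)
    return (len(left) * pvariance(left) + len(right) * pvariance(right)) / n


def best_split(records):
    """Try a threshold midway between each pair of adjacent predictor values.

    Returns (score, threshold, left_prediction, right_prediction) for the
    threshold with the lowest weighted variance.
    """
    best = None
    values = sorted({x for x, _ in records})
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in records if x < threshold]
        right = [y for x, y in records if x >= threshold]
        score = weighted_variance(left, right)
        if best is None or score < best[0]:
            # Each leaf predicts the mean target value of its records.
            best = (score, threshold, mean(left), mean(right))
    return best


score, threshold, low_pred, high_pred = best_split(wines)
print(f"split at alcohol < {threshold}: predict {low_pred:.2f}, else {high_pred:.2f}")
```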
  • 21. Max Depth. It sets the maximum depth of any node of the final tree, with the root node counted as depth 0. The smaller this number, the shorter the tree. For instance, setting max depth = 2 while generating a classification tree to predict wine quality leads to the output shown in the slide image, with levels labeled Depth 0, Depth 1, and Depth 2.
  • 22. Max Depth. Similarly, setting max depth = 3 gives the output shown in the slide image, with levels labeled Depth 0 through Depth 3. Hence, the higher the max depth, the deeper the final tree. Deeper trees are generally less reliable, as they tend to have nodes with very few records, so the tree generalizes poorly.
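A quick way to see why deeper trees end up with very few records per node: assuming every split is binary and the tree is fully grown, the number of leaves doubles with each extra level. The helper names and the 1,000-record dataset size below are hypothetical.

```python
def max_leaves(max_depth):
    """Upper bound on leaf count for a binary tree whose root is at depth 0."""
    return 2 ** max_depth


def avg_records_per_leaf(n_records, max_depth):
    """Average records per leaf if the tree is fully grown to max_depth."""
    return n_records / max_leaves(max_depth)


# Assumed dataset size of 1,000 records, purely for illustration.
for depth in (2, 3, 5, 10):
    print(depth, max_leaves(depth), round(avg_records_per_leaf(1000, depth), 2))
```

Under these assumptions the suggested range of 3 to 5 keeps dozens of records in each leaf, while depth 10 leaves fewer than one record per leaf on average.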
  • 23. Final Tuning Parameters Along With UI Suggestions
  • 24. Input Wizard Sample for Selecting Target Variable and Predictors. Predictors and the target variable should be selected using the input wizard, as shown in the slide image. "Select the variables you want to use for prediction of selected target variable": Purchase frequency, Age, Gender, Income, Website visits. "Select the variable you would like to predict (Target variable)": Purchase frequency, Age, Gender, Income, Website visits, Purchase.
  • 25. Input Wizard Sample for Categorical Predictors' Class Selection & Tuning Parameters. Categorical predictors (Purchase frequency, Age, Gender), "Select the classes to include": Purchase frequency: High, Medium, Low; Age: <= 18, 18 to 25, >= 25; Gender: Male, Female. Tuning parameters: Method = Classification; Impurity = Gini; Max Depth; # Classes in target variable = Two (assuming the target variable contains yes/no values). Note: tuning parameters are explained in the next section.
  • 27. Sample Output Formats. Please note: Spark expects these parameters to be supplied as input rather than auto-detecting them. 1. Method: when the target variable is numeric, regression should be auto-selected; when the target variable is categorical, classification should be auto-selected. Input control type: static label. 2. Impurity: if method = classification, impurity should be set to gini automatically; if method = regression, impurity should be set to variance automatically. Input control type: static label. 3. Categorical predictors info: categorical predictors and their class values should be auto-detected. Input control type for the categorical predictors list: static label. Input control type for class selection: multiple checkbox buttons. 4. Max depth: input control type: editable slider with numeric value label (suggested value: 3 to 5). 5. Number of classes (only for method = classification): this value is the total number of classes present in the target variable. For example, in the wine quality classification case there are two classes of wine quality: high and low. Input control type: static label.
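The auto-selection rules on slide 27 can be sketched as a small function. This is only an illustration of the stated rules: the function name, the returned dictionary keys, and the way a "numeric" target is detected are assumptions, not part of Spark or the Smarten UI.

```python
def default_tree_settings(target_values):
    """Pick method, impurity, and class count from the target column's values."""
    # Assumption: a target is numeric if every value is an int or float (not bool).
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool)
        for v in target_values
    )
    if is_numeric:
        # Number of classes applies only to classification.
        return {"method": "regression", "impurity": "variance", "num_classes": None}
    classes = set(target_values)
    return {"method": "classification", "impurity": "gini", "num_classes": len(classes)}


print(default_tree_settings(["high", "low", "low", "high"]))
print(default_tree_settings([5, 6, 7, 5, 8]))
```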
  • 33. Limitations of Decision Tree. Changes in the data can lead to substantially different output, so a decision tree should not be applied to data that fluctuates significantly. The dataset must contain predefined classes for the target variable (for a classification tree, the category to which each record belongs). Decision trees are prone to errors in classification problems where the target variable contains many classes and the training dataset contains relatively few records. Hence the number of records in a training dataset must be large in proportion to the number of classes of the target variable (there is no rule of thumb for how much larger it should be).
  • 34. Business Problem 1 – Classification Tree. Which customer segments should be targeted to increase the subscription rate of a term deposit product? In this case, a classification tree can be used to assess the characteristics of customers that lead to subscription or non-subscription of a term deposit product targeted in a direct marketing campaign. Here the target variable is the column recording whether a customer who was called during the direct marketing campaign subscribed to the term deposit product ("yes" if subscribed, else "no").
  • 35. The Dataset. Let's say we have the following customer attributes as predictors: age, job type, marital status, education, account default status, loan status, contact type, and outcome of previous contact. As shown in the slide image, we want to classify customers, using attributes such as age (numeric predictor), prior loan status (categorical predictor), and marital status (categorical predictor), into subscribers and non-subscribers of the term deposit product (the target variable classes).
  • 36. Output Tree 1. The bar plot in each leaf node shows the breakdown of yes and no classes in that node. The 0-to-1 scale on the right side of the bar plot indicates the proportion of yes and no in the node, and n shows the number of records belonging to that leaf node.
  • 37. Interpretation of Tree Output. As per the tree output, loan status came out as the best predictor of term deposit purchase. Customers with a prior loan and marital status "married" outperform all other segments (highlighted with a blue dashed line). Customers with no prior loans and age > 60 have the second-highest propensity to purchase the term deposit product (highlighted with a green dashed line). Moreover, within the segment with no prior loans, singles with age <= 22 seem to outperform the age > 22 segment in terms of term deposit purchase (highlighted with a black dashed line).
  • 38. How Splits and Terminal Nodes Are Generated. A decision tree chooses the predictor most predictive of the target class. Prior loan status versus term deposit purchase (% of all records): Prior loan = Yes: purchased 10%, not purchased 6%, total 16%; Prior loan = No: purchased 14%, not purchased 70%, total 84%. In our case, most of the records (84% of the dataset) have prior loan status "no" and only 16% have "yes". Within prior loan status "no", 70% of the population don't purchase a term deposit. None of the other predictors shows such homogeneity with respect to term deposit purchase. For example, the breakdown by marital status is as follows (% of all records): Divorced: No 13%, Yes 26%, total 39%; Married: No 21%, Yes 19%, total 40%; Single: No 6%, Yes 15%, total 21%. Thus, because other variables such as marital status are relatively less homogeneous, loan status was chosen as the attribute for the first split. The sub-nodes are split using the same homogeneity criterion. Terminal nodes are nodes that can't be split further because of a stopping criterion such as max depth (when the maximum depth is 3, node splitting stops once tree depth 3 is reached, and the last generated nodes become the terminal nodes).
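The comparison on slide 38 can be made concrete by scoring each candidate split with the record-weighted Gini impurity of its child nodes (lower is more homogeneous). The sketch below takes the two tables' percentages at face value; note that the two tables do not imply the same overall yes/no share, so the numbers should be read as illustrative. The function and variable names are assumptions.

```python
def gini_from_counts(counts):
    """Gini impurity of one node from its per-class counts (or percentages)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def weighted_gini(nodes):
    """Impurity of a split: each child's Gini weighted by its share of records."""
    grand_total = sum(sum(node) for node in nodes)
    return sum(sum(node) / grand_total * gini_from_counts(node) for node in nodes)


# Percent of all records per (class A, class B) cell, as printed on the slide.
prior_loan = [(10, 6), (14, 70)]                   # Prior loan: Yes, No
marital_status = [(13, 26), (21, 19), (6, 15)]     # Divorced, Married, Single

loan_score = weighted_gini(prior_loan)
marital_score = weighted_gini(marital_status)
print(f"prior loan: {loan_score:.3f}, marital status: {marital_score:.3f}")
```

With these figures the prior-loan split scores lower than the marital-status split, which matches the slide's choice of loan status for the first split.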
  • 39. Output 2 : Accuracy Of Prediction The actual versus predicted table shows how many records the decision tree classified correctly. Its two axes are the actual classes and the predicted classes, and the cells highlighted in red on the slide hold the correct predictions:

    | | No | Yes |
    |---|---|---|
    | No | 38,439 | 6,772 |
    | Yes | 0 | 0 |

    Accuracy = sum of the cells highlighted in red / sum of all cells = 38,439 / (38,439 + 6,772 + 0 + 0) = 85.02%. Hence the sample decision tree model we just built is about 85% accurate, that is, it misclassifies about 15% of these records.
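The accuracy arithmetic above can be reproduced in a few lines. This is a minimal sketch using the standard definition of accuracy (diagonal of the actual versus predicted table divided by the sum of all cells); the variable names are hypothetical and the counts are copied from the slide, with rows and columns both ordered No, Yes.

```python
# Counts copied from the slide's actual-versus-predicted table.
# Rows and columns are both ordered (No, Yes).
confusion = [
    [38_439, 6_772],
    [0, 0],
]

# Correct predictions sit on the diagonal of the table.
correct = sum(confusion[i][i] for i in range(len(confusion)))
total = sum(sum(row) for row in confusion)
accuracy = correct / total

print(f"accuracy = {accuracy:.2%}")
```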
  • 40. Business Benefits The segments highlighted in black, blue and green in tree output 1 are the low-hanging fruit: they require less effort to convert, so there is no need to devise a different target marketing strategy for them. The segments with the highest number of “No's” (not highlighted in tree output 1), for example customers with marital status single or divorced, need to be targeted in a different and more efficient way to convert them into purchasers. Thus, segmenting customers by their propensity to buy or not buy a product helps in devising a better, more efficient target marketing strategy that converts more non-purchasers into purchasers and in turn increases product penetration.
  • 41. Use Case 2 – Classification Tree Business benefit: • The bank can decide which customer segments are eligible for a loan and which segments should be denied a loan because they are likely to default. • This way riskier customers are identified easily and the bank can avert the risk of delinquencies. Business problem: Based on historical customer attributes such as credit card payments, loan payments, outstanding balance etc., a bank needs to classify its customers into defaulters and non-defaulters. • In this case, a classification tree can be used to assess the characteristics of customers who are likely to default. • Here the target variable would be a column indicating whether the customer has defaulted previously (“yes” if defaulted, else “no”).
  • 42. Use Case 3 – Regression Tree Business benefit: • Online retailers can identify the customer segments with a higher capacity to purchase and design special marketing strategies for them, as these segments are their main revenue drivers. • This way premium customers can be given special attention to retain their loyalty, which in turn can increase revenue. Business problem: Based on customers' attributes and past online shopping behavior, an online retail giant such as Amazon/Flipkart wants to predict the future purchase amount of its customers. • Here the predictors can be the customer's ‘days from last purchase’, ‘brand preference’, ‘income’, ‘age’, ‘gender’, ‘website visits’, ‘location’, ‘total amount of purchase so far’ etc. • As the target variable is numeric (purchase amount), a regression tree can be used to predict the purchase amount for different types of customer segments.
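To make the regression tree idea concrete, here is a minimal sketch of a single split (a tree of depth 1) on one predictor. Everything in it is an assumption for illustration: the customer records are invented toy values, the predictor is ‘website visits’, the split criterion is the sum of squared errors (a common choice that the slides do not confirm), and the names `sse`, `best_split` and `predict` are hypothetical. Each leaf predicts the mean purchase amount of the customers that fall into it.

```python
from statistics import mean

# Invented toy records: (website visits per month, purchase amount).
customers = [(2, 40.0), (3, 55.0), (4, 50.0), (10, 210.0), (12, 190.0), (15, 230.0)]


def sse(values):
    """Sum of squared errors around the mean (0 for an empty list)."""
    if not values:
        return 0.0
    centre = mean(values)
    return sum((v - centre) ** 2 for v in values)


def best_split(records):
    """Return the threshold that minimises the total SSE of the two child nodes."""
    # Splitting at the largest value would leave the right child empty, so skip it.
    thresholds = sorted({x for x, _ in records})[:-1]

    def cost(t):
        left = [y for x, y in records if x <= t]
        right = [y for x, y in records if x > t]
        return sse(left) + sse(right)

    return min(thresholds, key=cost)


threshold = best_split(customers)
low_leaf = mean(y for x, y in customers if x <= threshold)
high_leaf = mean(y for x, y in customers if x > threshold)


def predict(visits):
    """Predict the purchase amount as the mean of the leaf the customer falls into."""
    return low_leaf if visits <= threshold else high_leaf


print(f"split at visits <= {threshold}")
print(f"predicted amount for 3 visits:  {predict(3):.2f}")
print(f"predicted amount for 11 visits: {predict(11):.2f}")
```

A full regression tree repeats this search over every predictor and then recursively inside each child node until a stopping criterion such as maximum depth is reached.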
  • 43. Use Case 4 – Regression Tree Business benefit: • As soon as a new order arrives, the service provider can give the customer an estimated completion time based on the general pattern observed through the regression tree model. • Proper workforce allocation and planning. • Avoiding revenue leakage by preventing delay fines. Business problem: • Predicting order completion time for a telecom service provider. • Predictors in this case can be: user location, workforce availability, distance from the nearest network junction, average time taken in the last 6 months, average historical delay in the last 6 months etc. • The target variable here would be the turnaround time of order completion.
  • 44. Want to Learn More? Get in touch with us at support@Smarten.com and do check out the Learning section on Smarten.com. June 2018
