The document discusses decision trees and their use for classification problems. It covers:
1) The agenda includes an overview of decision trees, how they work, and a demonstration on a banking customer classification problem.
2) Decision trees can be used to classify banking customers as "Risky" or "Good" based on attributes like income, marital status, etc.
3) The document demonstrates how a decision tree is built on sample customer data and how a new customer can then be classified.
2. Slide 2 (www.edureka.co/r-for-analytics)
Agenda
Today we will take you through the following:
- The Classic Banking Challenge!! ... Have you already guessed it?
- The Available Options for a Solution
- Why Decision Tree?
- How the Decision Tree Methodology Works
4. Slide 4
The Problem?
A bank wants to classify its future customers into two categories, "Risky" and "Good", based on the customer's available attributes.
Let's say a customer xyz has the following attributes. How will the bank know to which category this customer belongs?

Undergrad | Marital Status | Taxable Income | City Population | Work Experience (Yrs) | Urban | Category
No        | Married        | 98,727         | 1,01,894        | 14                    | No    | ????
5. Slide 5
Let's See a Few More Cases...
- A manager has to decide whether or not to hire more staff in order to balance the workload.
- An individual has to decide whether or not to undertake a capital project, or must choose between two competing ventures.
6. Slide 6
The Available Solution Options...
7. Slide 7
Algorithms that can help...
Such problems come under "classification": the separation or ordering of objects into classes.
There are a few techniques in the classification family, such as:
- Decision Tree
- Naïve Bayes
- k-Nearest Neighbor
- Support Vector Machine, etc.
8. Slide 8
Why is the Decision Tree Favorable?
Advantages of Decision Tree Methodology (DT = Decision Tree, NB = Naïve Bayes, KNN = k-Nearest Neighbor, SVM = Support Vector Machine)

Advantage                                                                    | DT  | NB | KNN | SVM
Simple visual representation of a decision situation                         | Yes | No | No  | No
Easy to interpret and explain to executives (non-programmers)!               | Yes | No | No  | No
Illustrates a variety of decisions, and the impact of each decision had a different decision been taken | Yes | No | No  | No
Allows us to predict, explain, describe, or classify an outcome              | Yes | No | No  | No
Helps determine worst, best, and expected values for different scenarios     | Yes | No | No  | No
Able to handle both numerical and categorical data                           | Yes | No | No  | No
9. Slide 9
Decision Tree Advantages...
Easy to interpret and explain to executives (non-programmers)!
Decision trees are "white boxes": the acquired knowledge can be expressed in a readable form. KNN, SVM, and NB, by contrast, are "black boxes": you cannot read the acquired knowledge in a comprehensible way.
e.g., a suitable weather condition for playing, expressed in decision tree format:
If the weather is nice (Cond. 1) and the wind is normal (Cond. 2) and the day is sunny (Cond. 3), then play. (*Readable format)
10. Slide 10
Decision Tree Advantages... contd.
Illustrates a variety of decisions, and the impact of each decision had a different decision been taken.
A manager must choose between a Permanent hire and an Outsourced hire:
- Permanent: if the employee stays 6 months with the company (50% success), payoff $100; if not (50% fail), payoff -$40.
  Expected value: 50%($100) - 50%($40) = $30
- Outsourced: if the employee stays 6 months with the company (50% success), payoff $90; if not (50% fail), payoff -$20.
  Expected value: 50%($90) - 50%($20) = $35
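The expected values above can be verified with a short calculation. This is only an illustrative sketch (the `expected_value` helper is ours, not from the slides), using the payoffs and probabilities shown for the two branches:

```python
def expected_value(p_success, payoff_success, payoff_fail):
    """Probability-weighted average outcome of a decision branch."""
    return p_success * payoff_success + (1 - p_success) * payoff_fail

# Permanent hire: 50% chance of $100, 50% chance of -$40.
ev_permanent = expected_value(0.5, 100, -40)   # 0.5*100 - 0.5*40 = 30.0
# Outsourced hire: 50% chance of $90, 50% chance of -$20.
ev_outsourced = expected_value(0.5, 90, -20)   # 0.5*90 - 0.5*20 = 35.0

print(ev_permanent, ev_outsourced)  # 30.0 35.0
```

On expected value alone, the Outsourced option ($35) edges out the Permanent one ($30), which is the comparison the slide's tree makes visible.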
11. Slide 11
Let's Understand it More... What is a Decision Tree?
12. Slide 12
Decision Tree
A decision tree is a supervised, rule-based classification method.
- It has a flowchart-like tree structure.
- The topmost node in the tree is the root node.
- Each internal node denotes a test on an attribute, e.g. whether a coin flip comes up heads or tails.
- Each branch represents an outcome of the test.
- Each leaf node holds a class label (the decision taken after computing all attributes).
- Each path from the root to a leaf represents a classification rule.
* During tree construction, attribute selection measures are used to select the attribute that best partitions the tuples into distinct classes.
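The note on attribute selection measures can be made concrete. Gini impurity is one common such measure (used by CART; other learners use information gain instead); the sketch below is illustrative and not taken from the slides:

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum(p_i^2) over class proportions p_i.
    0 means a pure subset (all one class); higher means more mixed."""
    n = len(labels)
    counts = Counter(labels)
    return 1 - sum((c / n) ** 2 for c in counts.values())

# The root of the banking example holds 4 "Good" and 3 "Risky" records.
root = ["Good"] * 4 + ["Risky"] * 3
print(round(gini(root), 3))        # 0.49 -> quite impure, keep splitting
print(gini(["Risky", "Risky"]))    # 0.0  -> a pure subset, becomes a leaf
```

A splitter scores each candidate attribute by the weighted impurity of the subsets it produces and picks the attribute with the lowest score.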
13. Slide 13
DT Can Be Used With Machine Learning
When coupled with machine learning, a decision tree can be used for prediction.
Training data:

Undergrad | Marital Status | Taxable Income | City Population | Work Experience | Urban | Category
Yes       | Married        | 98,727         | 1,01,894        | 14              | No    | Risky
No        | Single         | 44,000         | 10,18,945       | 12              | Yes   | Good
No        | Divorced       | 50,000         | 10,15,845       | 14              | Yes   | Good
No        | Single         | 32,100         | 12,58,945       | 12              | No    | Risky
Yes       | Married        | 28,000         | 1,22,945        | 8               | Yes   | Risky
No        | Single         | 35,100         | 12,56,845       | 10              | No    | Good
No        | Divorced       | 38,100         | 18,95,945       | 7               | No    | Good

Record to classify:

Undergrad | Marital Status | Taxable Income | City Population | Work Experience | Urban | Category
Yes       | Divorced       | 98,727         | 1,01,894        | 14              | No    | ????
14. Slide 14
How it Works... Let's Build a Decision Tree Model!
15-16. Slides 15-16
Train Model (Build Tree)
Root (all 7 records): 4 Good / 3 Risky. Split on Undergrad.

Undergrad = No (4 Good / 1 Risky -> split further):
Marital Status | Taxable Income | City Population | Work Experience | Urban | Category
Single         | 44,000         | 10,18,945       | 12              | Yes   | Good
Divorced       | 50,000         | 10,15,845       | 14              | Yes   | Good
Single         | 32,100         | 12,58,945       | 12              | No    | Risky
Single         | 35,100         | 12,56,845       | 10              | No    | Good
Divorced       | 38,100         | 18,95,945       | 7               | No    | Good

Undergrad = Yes (2 Risky -> pure subset):
Marital Status | Taxable Income | City Population | Work Experience | Urban | Category
Married        | 98,727         | 1,01,894        | 14              | No    | Risky
Married        | 28,000         | 1,22,945        | 8               | Yes   | Risky
17-18. Slides 17-18
Train Model (Build Tree)
The Undergrad = Yes branch (2 Risky) is already a pure subset. Split the Undergrad = No branch (4 Good / 1 Risky) on Marital Status.

Marital Status = Single (2 Good / 1 Risky -> split further):
Taxable Income | City Population | Work Experience | Urban | Category
44,000         | 10,18,945       | 12              | Yes   | Good
32,100         | 12,58,945       | 12              | No    | Risky
35,100         | 12,56,845       | 10              | No    | Good

Marital Status = Divorced (2 Good -> pure subset):
Taxable Income | City Population | Work Experience | Urban | Category
50,000         | 10,15,845       | 14              | Yes   | Good
38,100         | 18,95,945       | 7               | No    | Good
19-20. Slides 19-20
Train Model (Build Tree)
The Marital Status = Divorced branch (2 Good) is a pure subset. Split the Single branch (2 Good / 1 Risky) on Taxable Income at 33,000.

Taxable Income < 33,000 (1 Risky -> pure subset):
Taxable Income | City Population | Work Experience | Urban | Category
32,100         | 12,58,945       | 12              | No    | Risky

Taxable Income > 33,000 (2 Good -> pure subset):
Taxable Income | City Population | Work Experience | Urban | Category
44,000         | 10,18,945       | 12              | Yes   | Good
35,100         | 12,56,845       | 10              | No    | Good

Every branch now ends in a pure subset, so the tree is complete.
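The splitting procedure walked through above can be sketched as a small recursive routine. This is a simplified illustration (the helper names are ours, and numeric attributes such as Taxable Income are omitted for brevity, so the Single subset is resolved by majority vote instead of the income threshold the slides use):

```python
from collections import Counter

def is_pure(rows, target="Category"):
    """A subset is pure when every record carries the same class label."""
    return len({r[target] for r in rows}) == 1

def build_tree(rows, attributes, target="Category"):
    """Stop on pure subsets; otherwise split on the next attribute and
    recurse. (A real learner scores every candidate split, e.g. by Gini.)"""
    if is_pure(rows, target) or not attributes:
        # Leaf: label with the majority class of the subset.
        return Counter(r[target] for r in rows).most_common(1)[0][0]
    attr, rest = attributes[0], attributes[1:]
    return (attr, {value: build_tree([r for r in rows if r[attr] == value],
                                     rest, target)
                   for value in {r[attr] for r in rows}})

# The seven training records from slide 13 (numeric fields dropped).
data = [
    {"Undergrad": "Yes", "Marital": "Married",  "Category": "Risky"},
    {"Undergrad": "No",  "Marital": "Single",   "Category": "Good"},
    {"Undergrad": "No",  "Marital": "Divorced", "Category": "Good"},
    {"Undergrad": "No",  "Marital": "Single",   "Category": "Risky"},
    {"Undergrad": "Yes", "Marital": "Married",  "Category": "Risky"},
    {"Undergrad": "No",  "Marital": "Single",   "Category": "Good"},
    {"Undergrad": "No",  "Marital": "Divorced", "Category": "Good"},
]
tree = build_tree(data, ["Undergrad", "Marital"])
```

As on the slides, the Undergrad = Yes branch becomes a pure "Risky" leaf immediately, while the No branch splits again on Marital Status.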
21. Slide 21
The Final Built Model
Here is a trained model that will help us with future "classification":

Undergrad?
  Yes -> Risky
  No  -> Marital Status?
           Single   -> Taxable Income?
                         < 33K -> Risky
                         > 33K -> Good
           Divorced -> Good
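The trained model on this slide is small enough to write out as plain rules. The sketch below (a hand-written encoding, not generated by any library) expresses the final tree as nested conditions and classifies the test customer used on the following slides:

```python
def classify(undergrad, marital_status, taxable_income):
    """The final tree from slide 21, written as nested if/else rules."""
    if undergrad == "Yes":
        return "Risky"                 # Undergrad = Yes branch: pure Risky leaf
    if marital_status == "Single":
        # Single branch splits on the income threshold of 33,000.
        return "Risky" if taxable_income < 33000 else "Good"
    return "Good"                      # Divorced branch: pure Good leaf

# The test customer from slides 23-28: No, Divorced, 98,727.
print(classify("No", "Divorced", 98727))  # Good
```

Reading the model as code makes the "white box" claim from slide 9 concrete: every prediction is a path of explicit tests.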
23. Slide 23
Test Data
Start from the root of the tree:

Undergrad | Marital Status | Taxable Income | City Population | Work Experience | Urban | Category
No        | Divorced       | 98,727         | 1,01,894        | 14              | No    | ????
24-26. Slides 24-26
How It Works
The root tests Undergrad. The record has Undergrad = No, so follow the No branch down to the Marital Status node; the Undergrad attribute is now consumed:

Marital Status | Taxable Income | City Population | Work Experience | Urban | Category
Divorced       | 98,727         | 1,01,894        | 14              | No    | ????
27-28. Slides 27-28
How It Works
The Marital Status node tests the record's value, Divorced, which leads directly to a leaf labeled "Good":

Marital Status | Taxable Income | City Population | Work Experience | Urban | Category
Divorced       | 98,727         | 1,01,894        | 14              | No    | Good

The customer lies in the "Good" group, so he can be considered for all the policies offered to a good customer: more loans can be provided to him, his credit card limit can be increased, etc.
35. Slide 35
Conclusion!
When to Apply a Decision Tree?
- Whenever you are making a complex future decision.
- When you are experimenting with decisions and want to evaluate and visualize each decision and its impact.
- When you want to present your decision and compare it with other decisions on the same problem.
36. Slide 36
Survey
Your feedback is vital for us, be it a compliment, a suggestion, or a complaint. It helps us make your experience better!
Please spare a few minutes to take the survey after the webinar.