Murphy Choy, University College Dublin
Building a decision tree from decision stumps
  • A decision stump is a single-layer decision tree
    Often used to segment large samples
    Also used for simple prediction on small samples
    Easy to manage in terms of coding
  • Pre-summarization
    Calculate the Gini impurity
    Select the split
  • Call the decision stump iteratively to build a tree
  • Building a decision tree from decision stumps

    1. Murphy Choy, University College Dublin: Building a decision tree from decision stumps
    2. Contents
       • Introduction to decision trees
       • What is a decision tree stump?
       • CART vs. CHAID
       • Criterion for splitting
       • Building a decision tree stump macro
       • Linking the tree up
       • Conclusion
    3. Introduction to decision trees
    4. Decision tree stump
    5. CART vs. CHAID
       • Easier to understand splits
         o Binary splits are easier to understand
         o Can be phrased as an either-or statement
       • Able to handle different data types
         o CART is able to handle nominal, categorical and missing values simultaneously, unlike CHAID.
    6. CART vs. CHAID
       • More robust statistics
         o CHAID uses the chi-square test, which depends on sample size and suffers from the multiple comparisons problem.
         o The Bonferroni adjustment does not fully compensate for this deficiency.
       • Less dispersion effects
         o Multiple splits in a single node result in smaller subsequent nodes, which may cause severe skewness in validation.
    7. Splitting criterion
       • Gini impurity measures how often a randomly chosen element of a set would be incorrectly labeled if it were labeled randomly according to the distribution of labels in that set.
    8. Building the decision tree stump SAS macro (diagram labels: Gini, Gini, Gini, Selection)
    9. Building the linkage for a tree
    10. Conclusion
       • Useful for a variety of purposes
       • Build a full decision tree
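
The steps named in the slides (calculate the Gini impurity, select the split, call the stump iteratively) can be sketched in a few lines. This is an illustrative Python sketch, not the author's SAS macro, which does not appear in the transcript. It assumes numeric features, binary splits of the form "value <= threshold", and the standard Gini formula 1 - sum(p_k^2) over class proportions p_k. The names gini, best_stump, build_tree, predict and the max_depth parameter are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum(p_k^2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_stump(rows, labels):
    """Decision stump: find the single binary split with the lowest
    size-weighted Gini impurity.

    Returns (score, feature_index, threshold), or None if no split exists.
    """
    best = None
    n = len(rows)
    for j in range(len(rows[0])):
        # Candidate thresholds: every distinct value except the largest,
        # so that the right-hand side is never empty.
        for t in sorted({r[j] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, j, t)
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Link stumps into a tree by calling best_stump on each child node."""
    majority = Counter(labels).most_common(1)[0][0]
    # Stop when the node is pure or the (assumed) depth limit is reached.
    if depth == max_depth or gini(labels) == 0.0:
        return majority
    split = best_stump(rows, labels)
    if split is None:  # all rows identical: nothing left to split on
        return majority
    _, j, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[j] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [y for _, y in left],
                           depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [y for _, y in right],
                            depth + 1, max_depth),
    }


def predict(tree, row):
    """Walk from the root to a leaf; a leaf is a plain class label."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree
```

Each internal node stores only the output of one stump call, so the tree is simply a set of stumps linked by their left and right children. Trying every distinct value as a threshold is the simplest choice and costs quadratic time per feature; a pre-summarized table of class counts per value, as the slide notes suggest, would avoid rescanning the rows.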
