Murphy Choy, University College Dublin
Building a decision tree
from decision stumps
Contents
• Introduction to decision trees
• What is a decision tree stump?
• CART vs. CHAID
• Criterion for splitting
• Building a decision tree stump macro
• Linking the tree up
• Conclusion
Introduction to Decision Trees
Decision Tree Stump
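Per the speaker notes, a stump is a single-layer decision tree: one binary split on one variable, with a class label in each of the two children. As a minimal illustrative sketch in Python (the names `Stump`, `variable`, `threshold` are mine, not from the paper, whose implementation is a SAS macro):

```python
from dataclasses import dataclass

@dataclass
class Stump:
    """A single-layer decision tree: one binary split on one variable."""
    variable: str     # name of the splitting variable
    threshold: float  # split point: go left if value <= threshold
    left_label: int   # majority class of the left child
    right_label: int  # majority class of the right child

    def predict(self, row: dict) -> int:
        # One comparison decides everything: this is why stumps are
        # easy to manage in code and cheap on large samples.
        if row[self.variable] <= self.threshold:
            return self.left_label
        return self.right_label

# Hypothetical example: a stump splitting on age at 30
stump = Stump(variable="age", threshold=30, left_label=0, right_label=1)
print(stump.predict({"age": 25}))  # 0
print(stump.predict({"age": 40}))  # 1
```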
CART vs. CHAID
• Easier-to-understand splits
  o Binary splits are easier to understand
  o Can be phrased as an either/or statement
• Able to handle different data types
  o CART can handle nominal, categorical, and missing values simultaneously, unlike CHAID.
CART vs. CHAID
• More robust statistics
  o CHAID uses the chi-square test, which is sample-size dependent and suffers from the multiple-comparison problem.
  o The Bonferroni adjustment does not fully compensate for this deficiency.
• Less prone to dispersion effects
  o Multiple splits in a single node result in smaller child nodes, which may cause severe skewness in validation.
Splitting criterion
• Gini impurity measures how frequently a randomly chosen element from a set would be incorrectly labeled if it were labeled at random according to the distribution of labels in the subset.
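For class proportions p_k in a node, this definition works out to G = 1 - sum(p_k^2). A short Python sketch of the calculation (illustrative only; the deck's implementation is a SAS macro):

```python
from collections import Counter

def gini_impurity(labels):
    """Probability that a randomly drawn element would be mislabeled
    if labeled at random according to the node's label distribution:
    G = 1 - sum(p_k ** 2) over the classes k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

print(gini_impurity(["a", "a", "a", "a"]))  # 0.0 (pure node)
print(gini_impurity(["a", "a", "b", "b"]))  # 0.5 (evenly mixed, 2 classes)
```

A pure node scores 0; the score grows as the label mix evens out, so the split search below minimizes it.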
Building the Decision Tree Stump SAS Macro
[Diagram: Gini impurity computed for each candidate split, feeding the split selection]
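The speaker notes summarize the macro's steps as pre-summarization, calculating the Gini impurity, and selecting the split. A Python sketch of that logic (my own function names; the actual implementation is a SAS macro, not shown in the deck):

```python
from collections import Counter

def gini(labels):
    """Gini impurity G = 1 - sum(p_k ** 2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, variables):
    """For each candidate split point, compute the size-weighted Gini of
    the two children and keep the split with the lowest weighted impurity."""
    n = len(rows)
    best = None  # (weighted_gini, variable, threshold)
    for var in variables:
        # pre-summarization: candidate thresholds are the distinct observed values
        for t in sorted({row[var] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[var] <= t]
            right = [y for row, y in zip(rows, labels) if row[var] > t]
            if not left or not right:
                continue  # degenerate split: one empty child
            w = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or w < best[0]:
                best = (w, var, t)
    return best

# Hypothetical toy data: x <= 2 separates the classes perfectly
rows = [{"x": 1}, {"x": 2}, {"x": 8}, {"x": 9}]
labels = [0, 0, 1, 1]
print(best_split(rows, labels, ["x"]))  # (0.0, 'x', 2)
```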
Building the linkage for a tree
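The speaker notes describe this step as iteratively calling the decision-stump macro to build a tree. An illustrative Python sketch of that recursion, under my own naming (`grow_tree` etc.) rather than the deck's SAS macros:

```python
from collections import Counter

def gini(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, variables):
    n, best = len(rows), None
    for var in variables:
        for t in sorted({row[var] for row in rows}):
            left = [y for r, y in zip(rows, labels) if r[var] <= t]
            right = [y for r, y in zip(rows, labels) if r[var] > t]
            if left and right:
                w = (len(left) * gini(left) + len(right) * gini(right)) / n
                if best is None or w < best[0]:
                    best = (w, var, t)
    return best

def grow_tree(rows, labels, variables, max_depth=3):
    """Call the stump finder on each node and recurse into the two
    children, stopping at a pure node or the depth limit; the result
    is a nested dict representing the full binary tree."""
    split = best_split(rows, labels, variables)
    if max_depth == 0 or gini(labels) == 0.0 or split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority label
    _, var, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[var] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[var] > t]
    return {
        "variable": var,
        "threshold": t,
        "left": grow_tree([r for r, _ in left], [y for _, y in left],
                          variables, max_depth - 1),
        "right": grow_tree([r for r, _ in right], [y for _, y in right],
                           variables, max_depth - 1),
    }

# Hypothetical toy data: z, not x, carries the signal
rows = [{"x": 1, "z": 0}, {"x": 2, "z": 9}, {"x": 8, "z": 0}, {"x": 9, "z": 9}]
labels = [0, 1, 0, 1]
tree = grow_tree(rows, labels, ["x", "z"])
print(tree)  # {'variable': 'z', 'threshold': 0, 'left': 0, 'right': 1}
```

Each recursive call is one stump fit, which is exactly the "iterative calling" the notes describe; the linkage is just the parent node keeping references to its two children.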
Conclusion
• Decision stumps are useful for a variety of purposes
• Linked together, they build a full decision tree


Editor's Notes

  • #5: A single-layer decision tree. Often used in large-sample segmentation; also used for simple prediction on small samples. Easy to manage in terms of coding.
  • #9: Pre-summarization; calculate the Gini impurity; select the split.
  • #10: Iterative calling of the decision stumps to build a tree.