Building a decision tree from decision stumps
  • Single layer decision tree: often used in large-sample segmentation, also used for simple prediction on small samples, and easy to manage in terms of coding
  • Pre-summarization, calculating the Gini impurity, and selecting the split (a sketch of the pre-summarization step follows this list)
  • Iterative calling of the decision stumps to build a tree
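The notes above describe the stump pipeline as pre-summarization, Gini calculation, and split selection. The presentation itself implements this as a SAS macro; as a language-neutral illustration only, here is a minimal Python sketch of the pre-summarization step: collapsing the raw rows into target-class counts per level of a candidate split variable, so that Gini can later be computed from counts instead of rescanning the data. The column names and sample data are hypothetical.

```python
from collections import Counter, defaultdict

def presummarize(rows, split_col, target_col):
    """Collapse raw rows into target-class counts per level of split_col.

    Gini for any candidate split can then be computed from these
    counts without another pass over the raw data.
    """
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[split_col]][row[target_col]] += 1
    return dict(counts)

# Hypothetical sample: segmenting customers by region on a binary target.
rows = [
    {"region": "N", "bad": 0}, {"region": "N", "bad": 1},
    {"region": "S", "bad": 0}, {"region": "S", "bad": 0},
]
print(presummarize(rows, "region", "bad"))
# {'N': Counter({0: 1, 1: 1}), 'S': Counter({0: 2})}
```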

Building a decision tree from decision stumps: Presentation Transcript

  • Building a decision tree from decision stumps (Murphy Choy, University College Dublin)
  • Contents • Introduction to decision trees • What is a decision tree stump? • CART vs CHAID • Criterion for splitting • Building a decision tree stump macro • Linking the tree up • Conclusion
  • Introduction to decision trees
  • Decision tree stump
  • CART vs CHAID • Easier to understand splits: binary splits are easier to understand and can be phrased as an either/or statement. • Able to handle different data types: CART is able to handle nominal, categorical and missing values simultaneously, unlike CHAID.
  • CART vs CHAID • More robust statistics: CHAID uses the chi-square test, which is sample-size dependent and suffers from the multiple-comparison deficiency; the Bonferroni adjustment does not fully compensate for it. • Less dispersion effects: multiple splits in a single node result in smaller subsequent nodes, which may cause severe skewness during validation.
  • Splitting criterion • Gini impurity measures how often a randomly chosen element from a set would be incorrectly labeled if it were labeled randomly according to the distribution of labels in the subset.
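A minimal sketch of this definition: for class proportions p_k in a node, the impurity is 1 - sum(p_k^2). Python is used here purely for illustration; the presentation's implementation is a SAS macro.

```python
def gini_impurity(counts):
    """Gini impurity of a node given its class counts: 1 - sum(p_k**2)."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)

print(gini_impurity([50, 50]))   # 0.5 -> maximally mixed binary node
print(gini_impurity([100, 0]))   # 0.0 -> pure node
```

A pure node scores 0; an even two-class mix scores 0.5, the maximum for a binary target.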
  • Building the decision tree stump SAS macro [flow diagram: Gini computed for each candidate split, feeding into a selection step]
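The flow on this slide (Gini evaluated per candidate split, then a selection step) can be sketched as follows. This is a hypothetical CART-style stump on a single numeric variable, not the author's SAS macro: each threshold is scored by the size-weighted Gini of its two children, and the lowest score wins.

```python
def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k**2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_stump(xs, ys):
    """Score every threshold on xs by weighted child Gini; keep the best."""
    n, best = len(ys), (None, float("inf"))
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:   # degenerate split, skip
            continue
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical data: the classes separate cleanly at x <= 2.
print(best_stump([1, 2, 3, 4], [0, 0, 1, 1]))  # (2, 0.0)
```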
  • Building the linkage for a tree
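The linkage step, described in the notes as iteratively calling the decision stump, amounts to recursion: fit a stump, partition the data on its split, and fit further stumps on each child until a node is pure or a depth limit is reached. A sketch reusing best_stump from the previous example, with simplified stopping rules; again an illustration, not the author's macro:

```python
def build_tree(xs, ys, max_depth=3):
    """Link stumps into a tree: split with the best stump, then
    recurse on each child until it is pure or max_depth is hit."""
    if max_depth == 0 or len(set(ys)) <= 1:
        return {"leaf": max(set(ys), key=ys.count)}  # majority class
    threshold, _ = best_stump(xs, ys)    # stump from the sketch above
    if threshold is None:                # no usable split remains
        return {"leaf": max(set(ys), key=ys.count)}
    left = [(x, y) for x, y in zip(xs, ys) if x <= threshold]
    right = [(x, y) for x, y in zip(xs, ys) if x > threshold]
    (lx, ly), (rx, ry) = zip(*left), zip(*right)
    return {"split": threshold,
            "left": build_tree(lx, ly, max_depth - 1),
            "right": build_tree(rx, ry, max_depth - 1)}

# Hypothetical data that needs two linked stumps (splits at 2 and 4).
print(build_tree([1, 2, 3, 4, 5, 6], [0, 0, 1, 1, 0, 0]))
```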
  • Conclusion • Decision stumps are useful for a variety of purposes • Linking them iteratively builds a full decision tree