Tree Pruning
O When a decision tree is built, many of the branches will reflect anomalies in the training data due to noise or outliers.
O Tree pruning methods address this problem of
overfitting the data.
O Such methods typically use statistical measures to remove the least reliable branches.
O In the prepruning approach, a tree is "pruned" by halting its construction early.
O The second and more common approach is postpruning, which removes subtrees from a "fully grown" tree.
O A subtree at a given node is pruned by removing its
branches and replacing it with a leaf.
O The cost-complexity pruning algorithm used in CART is an example of the postpruning approach (both approaches are sketched below).
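
The sketch below illustrates both approaches with scikit-learn's CART implementation; the library choice, dataset, and parameter values are illustrative assumptions, not part of the original text. Prepruning is expressed through stopping criteria (max_depth, min_samples_leaf), postpruning through the cost-complexity penalty ccp_alpha.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Prepruning: halt construction early via stopping criteria.
prepruned = DecisionTreeClassifier(max_depth=4, min_samples_leaf=10,
                                   random_state=0).fit(X_train, y_train)

# Postpruning: grow the full tree, then collapse subtrees whose
# cost-complexity penalty outweighs their error reduction.
postpruned = DecisionTreeClassifier(ccp_alpha=0.01,
                                    random_state=0).fit(X_train, y_train)

print("prepruned leaves: ", prepruned.get_n_leaves())
print("postpruned leaves:", postpruned.get_n_leaves())
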
O A pruning set of class-labeled tuples is used to estimate cost complexity.
O This set is independent of the training set used to build the unpruned tree and of any test set used for accuracy estimation (see the sketch after this list).
O A method called pessimistic pruning, which is similar to cost-complexity pruning but estimates error rates from the training set itself rather than a separate pruning set, is often preferred.
O Alternatively, pruning can be based on the number of bits required to encode the tree: the best pruned tree is the one that minimizes the number of encoding bits.
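
A minimal sketch of this use of an independent pruning set, assuming scikit-learn's cost-complexity machinery as a stand-in for CART; the dataset and split proportions are illustrative.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
# Three independent sets: training, pruning, and test.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_prune, X_test, y_prune, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# Candidate subtrees correspond to the alphas on the cost-complexity
# pruning path of the unpruned tree grown on the training set alone.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

best_alpha, best_acc = 0.0, -1.0
for alpha in path.ccp_alphas:
    tree = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0).fit(X_train, y_train)
    acc = tree.score(X_prune, y_prune)  # error estimated on the pruning set
    if acc > best_acc:
        best_alpha, best_acc = alpha, acc

final = DecisionTreeClassifier(ccp_alpha=best_alpha, random_state=0).fit(X_train, y_train)
print("chosen alpha:", best_alpha, "test accuracy:", final.score(X_test, y_test))
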
Scalability and Decision Tree Induction
O In data mining applications, very large training sets of millions of tuples are common.
O Decision tree construction becomes inefficient when training tuples must be swapped in and out of main and cache memories.
O More scalable approaches, capable of handling training data that are too large to fit in memory, are required.
O BOAT (Bootstrapped Optimistic Algorithm for Tree Construction) is a decision tree algorithm that takes a completely different approach to scalability.
O It is not based on the use of any special data structures.
O Instead, it uses a statistical technique known as bootstrapping: several smaller samples of the training data are created, each small enough to fit in memory, and a tree is built from each sample (a simplified sketch follows).
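
The following is a simplified sketch of the bootstrapping idea only, not the full BOAT algorithm (which additionally refines and verifies the sampled trees against the complete data). The dataset, sample count, and sample size are illustrative assumptions.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(0)

# Bootstrapping: draw several small samples, with replacement, each
# of which would fit in memory even if the full data set did not.
trees = []
for _ in range(5):
    idx = rng.choice(len(X), size=200, replace=True)
    trees.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))

# BOAT compares the per-sample trees to find splits that are stable
# across samples; here we only inspect the trees that were grown.
print([tree.get_depth() for tree in trees])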