Data Mining
Decision Tree Induction
By,
Dharshini N
MCA-I
20PMC03
Tree Pruning
O When a decision tree is built, many of its branches will reflect anomalies in the training data due to noise or outliers.
O Tree pruning methods address this problem of overfitting the data (illustrated in the sketch below).
O Such methods typically use statistical measures to remove the least reliable branches.
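A minimal sketch of the overfitting problem, assuming Python with scikit-learn (the slides do not name a library): a fully grown tree memorizes label noise, so its training accuracy far exceeds its test accuracy.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # Synthetic data with injected label noise (flip_y flips 15% of labels).
    X, y = make_classification(n_samples=1000, n_features=20,
                               flip_y=0.15, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # A fully grown (unpruned) tree: some branches fit the noise itself.
    full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
    print("train accuracy:", full_tree.score(X_train, y_train))  # close to 1.0
    print("test accuracy: ", full_tree.score(X_test, y_test))    # noticeably lower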
O In the prepruning approach, a tree is “pruned” by halting its construction early.
O The second and more common approach is postpruning, which removes subtrees from a “fully grown” tree.
O A subtree at a given node is pruned by removing its branches and replacing it with a leaf.
O The cost complexity pruning algorithm used in CART is an example of the postpruning approach (both approaches are sketched below).
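A minimal sketch of both approaches, again assuming scikit-learn; the parameter values are illustrative, not prescribed by the slides.

    from sklearn.tree import DecisionTreeClassifier

    # Prepruning: halt construction early with stopping criteria.
    pre_pruned = DecisionTreeClassifier(
        max_depth=5,          # stop growing beyond depth 5
        min_samples_leaf=10,  # never create leaves smaller than 10 tuples
    )

    # Postpruning: grow the tree fully, then cut subtrees back to leaves.
    # scikit-learn exposes CART-style cost complexity pruning via ccp_alpha;
    # a larger alpha removes more subtrees.
    post_pruned = DecisionTreeClassifier(ccp_alpha=0.01)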
O A pruning set of class-labeled tuples is used to estimate cost complexity.
O This set is independent of the training set used to build the unpruned tree and of any test set used for accuracy estimation (a sketch of this selection procedure follows below).
O A method called pessimistic pruning, which is similar to cost complexity pruning, is often preferred because it reuses the training set rather than requiring a separate pruning set.
O Under the Minimum Description Length (MDL) principle, the best pruned tree is the one that minimizes the number of encoding bits.
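A minimal sketch of choosing the cost complexity parameter with an independent pruning set, as described above; the three-way split and variable names are illustrative assumptions.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=1500, flip_y=0.1, random_state=0)
    # Three independent sets: training, pruning, and test.
    X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
    X_prune, X_test, y_prune, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

    # Candidate alphas come from the pruning path of the fully grown tree.
    path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

    # Pick the alpha whose pruned tree scores best on the pruning set
    # (never on the test set, which is reserved for accuracy estimation).
    best_alpha = max(
        path.ccp_alphas,
        key=lambda a: DecisionTreeClassifier(ccp_alpha=a, random_state=0)
                      .fit(X_train, y_train).score(X_prune, y_prune),
    )

    final_tree = DecisionTreeClassifier(ccp_alpha=best_alpha, random_state=0).fit(X_train, y_train)
    print("test accuracy of pruned tree:", final_tree.score(X_test, y_test))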
Scalability and Decision Tree Induction
O In data mining applications, very large training sets of millions of tuples are common.
O Decision tree construction becomes inefficient due to the swapping of training tuples in and out of main and cache memories.
O More scalable approaches, capable of handling training data that are too large to fit in memory, are required.
O BOAT (Bootstrapped Optimistic Algorithm for Tree Construction) is a decision tree algorithm that takes a completely different approach to scalability.
O It is not based on the use of any special data structures.
O Instead, it uses a statistical technique known as bootstrapping (sketched below).
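A minimal sketch of bootstrapping itself, i.e., sampling the training data with replacement; this illustrates only the statistical technique BOAT builds on, not the BOAT algorithm, whose details the slides do not cover.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)

    def bootstrap_trees(X, y, n_trees=5):
        """Fit one decision tree per bootstrap sample of the training data."""
        trees = []
        for _ in range(n_trees):
            idx = rng.integers(0, len(X), size=len(X))  # draw indices with replacement
            trees.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))
        return trees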
☺THANK YOU☺
