Machine learning
Topic: Decision Tree
PRESENTED BY:
Reshma
(23285A0512)
Contents
• What is a Decision Tree?
• Types of Decision Trees
• Key Terminology
• Splitting Criteria
• Pruning Techniques
• Decision Tree Algorithms
• Advantages
• Disadvantages
• Applications
What is a Decision Tree?
• A tree-like model used in machine learning to make decisions.
• Composed of:
• Internal nodes = Features
• Branches = Decision rules
• Leaves = Final output (labels or values)
Decision trees are intuitive supervised learning models. They work for
classification and regression by splitting data based on features. The goal
is to make accurate predictions through a series of simple decisions.
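The structure described above can be made concrete with plain conditionals. A minimal sketch with made-up weather features (outlook, humidity): each `if` tests a feature (an internal node), each branch is a decision rule, and each return value is a leaf.

```python
# A decision tree expressed as plain conditionals. The features
# (outlook, humidity) and labels are illustrative only.

def play_tennis(outlook: str, humidity: int) -> str:
    if outlook == "sunny":       # internal node: tests feature "outlook"
        if humidity > 70:        # internal node: tests feature "humidity"
            return "no"          # leaf: final label
        return "yes"             # leaf
    elif outlook == "rainy":
        return "no"              # leaf
    return "yes"                 # leaf (e.g. overcast)

print(play_tennis("sunny", 80))  # -> no
```

Learning a tree from data amounts to discovering these tests and thresholds automatically rather than writing them by hand.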
Types of Decision Trees
• Classification Tree: Predicts categorical outputs (e.g., Yes/No).
• Regression Tree: Predicts continuous outputs (e.g., Price).
• Binary Splits: Two branches per node.
• Multi-way Splits: More than two branches (used in some algorithms).
Key Terminology
• Root Node
The starting point of the tree representing the entire dataset. It’s where the first split happens.
• Leaf Node
The end point of a branch with no further splits. It gives the final prediction or class.
• Splitting
Dividing a node into smaller sub-nodes based on a feature to increase purity.
• Branch / Sub-Tree
A path or section of the tree connecting nodes, representing decision outcomes.
• Pruning
Removing unnecessary branches to reduce overfitting and simplify the tree.
• Parent Node / Child Node
A parent node splits into child nodes; child nodes come directly from a parent node.
Splitting Criteria
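Splitting criteria measure how pure a node becomes after a split. A minimal plain-Python sketch of the two criteria used by the algorithms covered later, Gini impurity (CART) and entropy with information gain (ID3); the example labels are illustrative:

```python
from collections import Counter
from math import log2

def gini(labels):
    """Gini impurity: 1 - sum(p_i^2). 0 means a perfectly pure node."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Shannon entropy: -sum(p_i * log2(p_i)). 0 means a pure node."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Entropy of the parent minus the weighted entropy of the children."""
    n = len(parent)
    return entropy(parent) - sum(len(ch) / n * entropy(ch) for ch in children)

labels = ["yes", "yes", "no", "no"]
print(gini(labels))     # 0.5
print(entropy(labels))  # 1.0
# A split that perfectly separates the classes gives the maximum gain:
print(information_gain(labels, [["yes", "yes"], ["no", "no"]]))  # 1.0
```

At each node, the learner evaluates candidate splits and keeps the one that most reduces impurity (equivalently, has the highest information gain).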
Pruning Techniques
Pruning helps reduce overfitting by removing unnecessary branches from the decision
tree.
Pre-pruning
• Stops the tree from growing too deep based on conditions like maximum depth or
minimum number of samples.
• It's faster but can lead to underfitting.
Post-pruning
• First builds a full tree, then removes branches that do not improve accuracy on a
validation set.
• More accurate but takes more time.
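Pre-pruning amounts to stopping conditions checked before each split. A toy recursive builder on a single numeric feature, assuming a simple midpoint threshold rather than the impurity-based split search real implementations use:

```python
def build_tree(X, y, depth=0, max_depth=3, min_samples=2):
    """Grow a tree on one numeric feature, stopping early (pre-pruning)
    when the node is pure, too deep, or has too few samples."""
    if len(set(y)) == 1 or depth >= max_depth or len(y) < min_samples:
        # Leaf: predict the majority class at this node.
        return max(set(y), key=y.count)
    # Toy split: threshold at the midpoint of the feature's range.
    t = (min(X) + max(X)) / 2
    left = [(x, lbl) for x, lbl in zip(X, y) if x <= t]
    right = [(x, lbl) for x, lbl in zip(X, y) if x > t]
    if not left or not right:          # split failed to separate anything
        return max(set(y), key=y.count)
    return {
        "threshold": t,
        "left": build_tree([x for x, _ in left], [l for _, l in left],
                           depth + 1, max_depth, min_samples),
        "right": build_tree([x for x, _ in right], [l for _, l in right],
                            depth + 1, max_depth, min_samples),
    }

print(build_tree([1, 2, 8, 9], ["a", "a", "b", "b"]))
# -> {'threshold': 5.0, 'left': 'a', 'right': 'b'}
```

Raising `max_depth` or lowering `min_samples` lets the tree grow deeper (risking overfitting); tightening them stops growth sooner (risking underfitting), which is exactly the pre-pruning trade-off above.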
Decision Tree Algorithms
• ID3
Uses Information Gain for splits. Works with categorical data. Prone to
overfitting.
• C4.5
Improved version of ID3. Supports continuous and categorical data. Uses
Gain Ratio and pruning.
• CART
Uses Gini Index. Builds binary trees. Supports both classification and
regression. Prunes trees to avoid overfitting.
• CHAID
Uses Chi-square tests. Handles multi-way splits. Best for categorical data.
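C4.5's Gain Ratio normalises information gain by "split information", which counters the bias toward many-valued features noted under Disadvantages. A minimal plain-Python sketch:

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def gain_ratio(parent, children):
    """C4.5's Gain Ratio: information gain divided by split information.
    Splits into many small branches get a larger denominator, and so a
    lower score, than an equally informative two-way split."""
    n = len(parent)
    gain = entropy(parent) - sum(len(c) / n * entropy(c) for c in children)
    split_info = -sum(len(c) / n * log2(len(c) / n) for c in children)
    return gain / split_info if split_info else 0.0

parent = ["y", "y", "n", "n"]
print(gain_ratio(parent, [["y", "y"], ["n", "n"]]))          # 1.0
print(gain_ratio(parent, [["y"], ["y"], ["n"], ["n"]]))      # 0.5
```

Both splits separate the classes perfectly (information gain 1.0), but the four-way split is penalised, which is why C4.5 prefers it less.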
Advantages
• Easy to understand and interpret.
• Minimal data preparation (no scaling/normalization).
• Works with both numerical and categorical data.
• Relatively robust to outliers; some algorithms (e.g., C4.5, CHAID) can also handle missing values.
Disadvantages
• High risk of overfitting (especially with deep trees).
• Unstable: Small changes in data can drastically change the tree.
• Greedy algorithm: Doesn't guarantee the globally optimal tree.
• Biased toward features with many levels.
Applications
• Customer Segmentation: Identify customer types for targeting.
• Credit Scoring: Predict likelihood of default.
• Medical Diagnosis: Predict diseases from symptoms.
• Spam Detection: Classify emails as spam or not.
• Fraud Detection: Identify suspicious transactions.
Thank you
