Decision Tree Algorithm
NAME – BOBBY KUMAR SINGH
STREAM – BSC DATA SCIENCE (3rd SEM)
SUBJECT – MACHINE LEARNING & ARTIFICIAL INTELLIGENCE (BDS 302)
ROLL NO – 31140523010
What is a Decision Tree?
A Decision Tree is a popular supervised machine learning algorithm used for both
classification and regression tasks. It is a flowchart-like tree structure where:
• Nodes represent features (attributes) or decisions.
• Branches represent the outcome of a decision or test.
• Leaves represent the final outcome or decision.
In essence, it models decisions and their possible consequences as a tree of decisions. Each branch of the tree represents a possible decision, outcome, or path to the final classification or prediction.
How Decision Trees Work
1. Structure of a Decision Tree
• Root Node: The topmost node that represents the entire dataset or the initial feature.
• Decision Nodes: Nodes that represent tests or decisions based on features.
• Leaf Nodes: Nodes at the end of branches that provide the final decision or prediction.
• Branches: Connect nodes and represent the outcome of tests or decisions.
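One rough way to picture these pieces in code is the sketch below. It is only an illustration in Python; the Node class and its field names are invented here and are not a standard API.

# Rough sketch of the tree pieces as a small data structure (names are invented).
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    feature: Optional[str] = None      # decision node: which feature is tested
    threshold: Optional[float] = None  # decision node: value the test compares against
    left: Optional["Node"] = None      # branch taken when the test is satisfied
    right: Optional["Node"] = None     # branch taken otherwise
    prediction: Optional[str] = None   # leaf node: the final decision

# A tiny tree: the root tests Humidity (assumed numerically encoded); both children are leaves.
root = Node(feature="Humidity", threshold=0.5,
            left=Node(prediction="Play"),
            right=Node(prediction="Don't Play"))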
2. Building a Decision Tree
1. Select the Best Feature:
1. Classification Trees: The goal is to choose the feature that best separates the data into
different classes.
2. Regression Trees: The goal is to choose the feature that best reduces the variance of
the output.
2. Common Metrics for Selection (a short illustrative sketch of these metrics appears after this section):
1. Gini Impurity: Measures the impurity of a node. Lower Gini impurity indicates a better
split.
2. Information Gain: Measures how much information is gained by splitting the data
based on a feature. Higher information gain indicates a better split.
3. Variance Reduction: Measures how much the variance in the target variable is
reduced by splitting the data.
• Split the Data:
  • Classification Trees: The data is split based on the chosen feature, aiming to maximize the purity of the resulting subsets.
  • Regression Trees: The data is split to minimize the variance within each subset.
• Repeat:
  • Apply the same process recursively to each resulting subset (i.e., create new nodes and splits) until one of the stopping criteria is met.
• Stopping Criteria:
  • Maximum Depth: Limit the depth of the tree.
  • Minimum Samples per Leaf: Require a minimum number of samples in each leaf node.
  • Minimum Information Gain: Stop splitting if the gain in information is below a threshold.
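The splitting metrics above can be written out in a few lines. The following is a minimal sketch in Python using NumPy; the function names and the toy labels are mine and not part of any particular library.

import numpy as np

def gini_impurity(labels):
    """Gini impurity of a node: 1 minus the sum of squared class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    """Shannon entropy of a node, used when computing information gain."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, left, right):
    """Entropy of the parent minus the weighted entropy of the two children."""
    n = len(parent)
    children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - children

def variance_reduction(parent, left, right):
    """Variance of the parent minus the weighted variance of the children (regression)."""
    n = len(parent)
    children = (len(left) / n) * np.var(left) + (len(right) / n) * np.var(right)
    return np.var(parent) - children

# Toy check: a split that separates the two classes perfectly has maximal gain.
parent = np.array([0, 0, 1, 1])
print(gini_impurity(parent))                              # 0.5
print(information_gain(parent, parent[:2], parent[2:]))   # 1.0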
3. Making Predictions
• Classification Trees: The class label of a new data point is determined by traversing
the tree from the root to a leaf node, following the decisions based on feature
values. The class label of the leaf node is assigned to the data point.
• Regression Trees: The prediction for a new data point is the average value of the
target variable in the leaf node where the data point falls.
4. Example
• Imagine a decision tree for classifying whether to play tennis based on weather
conditions:
• Root Node: Weather (Sunny, Overcast, Rainy)
• Decision Nodes:
• If weather is Sunny, check Humidity (High, Normal)
• If weather is Rainy, check Wind (Strong, Weak)
• Leaf Nodes:
• "Play" or "Don't Play" based on the outcomes of the decisions.
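As a minimal sketch of the play-tennis example above, the snippet below uses scikit-learn; the tiny dataset, its numeric encoding, and the expected output are invented for illustration only.

# Play-tennis toy example with a decision tree classifier (scikit-learn assumed).
from sklearn.tree import DecisionTreeClassifier, export_text

# Encoding (invented): Weather 0=Sunny, 1=Overcast, 2=Rainy; Humidity 0=Normal, 1=High;
# Wind 0=Weak, 1=Strong. Label: 1 = "Play", 0 = "Don't Play".
X = [
    [0, 1, 0],  # Sunny, High humidity, Weak wind   -> Don't Play
    [0, 0, 0],  # Sunny, Normal humidity, Weak wind -> Play
    [1, 1, 1],  # Overcast                          -> Play
    [2, 0, 1],  # Rainy, Strong wind                -> Don't Play
    [2, 0, 0],  # Rainy, Weak wind                  -> Play
]
y = [0, 1, 1, 0, 1]

clf = DecisionTreeClassifier(criterion="gini", random_state=0).fit(X, y)
print(export_text(clf, feature_names=["Weather", "Humidity", "Wind"]))

# Prediction = walk from the root to a leaf, following the feature tests.
print(clf.predict([[0, 0, 0]]))  # Sunny + Normal humidity -> expected 1 ("Play")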
Visual Representation of a Simple Decision Tree:

[Weather]
 ├─ Sunny    → [Humidity]
 │             ├─ High   → Don't Play
 │             └─ Normal → Play
 ├─ Overcast → Play
 └─ Rainy    → [Wind]
               ├─ Strong → Don't Play
               └─ Weak   → Play
TYPES OF DECISION TREES
1. Classification Trees:
1. Purpose: Used for categorical target variables where the goal is to assign labels or classes to data points.
2. Example Use Cases: Email spam detection, medical diagnosis, customer segmentation.
2. Metrics for Splitting (Classification):
1. Gini Impurity: Measures the likelihood of incorrectly classifying a randomly chosen element from the node.
2. Information Gain: Measures how much knowing the value of a feature improves classification.
3. Regression Trees:
1. Purpose: Used for continuous target variables where the goal is to predict a numerical value.
2. Example Use Cases: Predicting house prices, forecasting sales, estimating risk.
4. Metrics for Splitting (Regression; a minimal regression sketch follows this list):
1. Variance Reduction: Measures how much the variance in the target variable is reduced by the split.
2. Mean Squared Error (MSE): Measures the average of the squared differences between actual and predicted values.
5. Other Variants and Techniques:
1. CART (Classification and Regression Trees): A framework that supports both classification and regression tasks. Uses Gini impurity for classification and variance
reduction for regression.
2. ID3 (Iterative Dichotomiser 3): An earlier algorithm for classification that uses information gain to select the best feature.
3. C4.5: An improvement over ID3, handling both continuous and categorical data and using gain ratio for feature selection.
4. CHAID (Chi-square Automatic Interaction Detector): Uses chi-square tests for selecting features and is suitable for categorical target variables.
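To make the regression side concrete, here is a minimal sketch with scikit-learn's DecisionTreeRegressor; the toy house-price data and the parameter values are invented, and the criterion name assumes a recent scikit-learn version.

# Regression tree sketch: splits are chosen to reduce squared error (variance) in each subset.
from sklearn.tree import DecisionTreeRegressor

# Invented toy data. Features: [size_sqft, bedrooms]; target: price (arbitrary units).
X = [[600, 1], [850, 2], [1100, 2], [1500, 3], [2000, 4]]
y = [120, 160, 200, 260, 340]

reg = DecisionTreeRegressor(criterion="squared_error", max_depth=2, random_state=0)
reg.fit(X, y)

# The prediction is the mean target value of the leaf the new point falls into.
print(reg.predict([[1200, 2]]))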
• Example of a Decision Tree
• For a classification task like predicting whether a customer will buy a product based on age and income:
• Root Node: Age (Young, Middle-aged, Old)
• Decision Nodes:
• If Age is Young, check Income (Low, High)
• If Age is Middle-aged or Old, check other features like Employment Status
• Leaf Nodes:
• "Buy" or "Don't Buy" based on the decisions at the end of each branch.
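A rough sketch of this customer example follows, assuming pandas and scikit-learn; the tiny dataset, its values, and the encoding are invented purely for illustration.

# Customer-purchase toy example: categorical features are one-hot encoded before fitting.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

data = pd.DataFrame({
    "Age":    ["Young", "Young", "Middle-aged", "Old", "Old"],
    "Income": ["Low",   "High",  "High",        "Low", "High"],
    "Buy":    [0,        1,       1,             0,     1],
})

X = pd.get_dummies(data[["Age", "Income"]])  # turn categories into indicator columns
y = data["Buy"]

clf = DecisionTreeClassifier(random_state=0).fit(X, y)
print(clf.predict(X.iloc[[0]]))  # traverse the tree for the first customer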
ADVANTAGES OF DECISION TREES
1. Easy to Understand and Interpret:
1. Visual Representation: Decision trees are intuitive and easy to visualize, making them straightforward to interpret and explain to
non-technical stakeholders.
2. Feature Importance: The structure of the tree can highlight the most important features for decision-making (see the sketch after this list).
2. No Need for Feature Scaling:
1. Decision trees do not require normalization or standardization of features, which simplifies preprocessing.
3. Handles Non-Linear Relationships:
1. Decision trees can capture non-linear relationships between features and the target variable without needing transformation or
complex models.
4. Versatile:
1. Classification and Regression: Decision trees can be used for both classification (categorical target) and regression (continuous
target) tasks.
5. Robust to Outliers:
1. Decision trees are relatively robust to outliers compared to some other algorithms because they make decisions based on splitting
criteria rather than absolute values.
6. Feature Selection:
1. Implicitly performs feature selection, as only the most important features are used to split the nodes.
7. Non-parametric:
1. No assumptions about the underlying data distribution, making them suitable for a wide range of problems.
8. Handles Mixed Data Types:
1. Can handle both numerical and categorical data, making them flexible in various contexts.
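As a small illustration of the feature-importance and feature-selection points above, a fitted tree exposes per-feature importances; the sketch assumes scikit-learn and reuses the invented play-tennis encoding from earlier.

# Reading feature importances off a fitted tree (scikit-learn assumed; toy data invented).
from sklearn.tree import DecisionTreeClassifier

X = [[0, 1, 0], [0, 0, 0], [1, 1, 1], [2, 0, 1], [2, 0, 0]]
y = [0, 1, 1, 0, 1]

clf = DecisionTreeClassifier(random_state=0).fit(X, y)  # note: no feature scaling required
for name, importance in zip(["Weather", "Humidity", "Wind"], clf.feature_importances_):
    print(name, round(float(importance), 3))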
LIMITATIONS OF DECISION TREES
• Overfitting:
  • Complex Trees: Decision trees can create overly complex models that fit the training data very well but perform poorly on unseen data (overfitting); a brief sketch after this list shows how the stopping criteria from earlier can rein this in.
  • Large Trees: Deep trees with many branches can be complex and difficult to interpret.
• Instability:
  • Small Changes in Data: Decision trees can be sensitive to small changes in the data, leading to different tree structures. This can affect model stability and generalization.
• Bias Towards Features with More Levels:
  • Features with more possible values can dominate the decision-making process, leading to biased splits.
• Poor Performance on Imbalanced Datasets:
  • Decision trees may perform poorly if the dataset has imbalanced classes, as they might favor the majority class.
• Greedy Algorithms:
  • Local Optima: Decision trees use a greedy approach to make splits that optimize for the current node but do not guarantee the globally optimal tree structure.
• Difficulty in Capturing Complex Patterns:
  • Simple Models: While effective for many problems, decision trees might struggle with capturing very complex patterns or interactions in the data.
• Lack of Smooth Decision Boundaries:
  • Decision trees create axis-aligned decision boundaries, which can be limiting for some types of data. This can result in less smooth or less accurate predictions.
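As a rough sketch of how the stopping criteria from earlier (maximum depth, minimum samples per leaf) can rein in overfitting, the snippet below assumes scikit-learn; the synthetic data and the chosen parameter values are illustrative only.

# Restricting tree growth to reduce overfitting (scikit-learn assumed; data is synthetic).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
restricted = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5,
                                    random_state=0).fit(X_train, y_train)

# The unrestricted tree usually fits the training data perfectly but can generalize
# worse than the shallower, restricted tree.
print("full tree      :", full.score(X_train, y_train), full.score(X_test, y_test))
print("restricted tree:", restricted.score(X_train, y_train), restricted.score(X_test, y_test))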
Real-World Applications of Decision Trees
• Decision trees are versatile and can be applied across various industries and
domains. Here are some prominent real-world applications:
• 1. Healthcare
• Medical Diagnosis: Decision trees can help in diagnosing diseases by evaluating
symptoms and medical history. For example, they can be used to classify whether a
patient has a specific condition based on test results and symptoms.
• Treatment Planning: Helps in determining the best treatment options by analyzing
patient data and outcomes from similar cases.
• 2. Finance
• Credit Scoring: Financial institutions use decision trees to evaluate the
creditworthiness of loan applicants. They assess factors like credit history, income,
and loan amount to predict the likelihood of default.
• Fraud Detection: Decision trees can identify fraudulent transactions by analyzing
patterns and anomalies in transaction data.
