Module 2: Hyperparameters in Decision Tree Learning
CSET 301: Artificial Intelligence and Machine Learning
Dr Sounak Sadhukhan
Issues in Decision Tree Learning
Decision trees are powerful tools for classification and regression, but they are not without their
limitations. Here are some of the key issues that arise in decision tree learning:
Overfitting
● Definition: When a decision tree is too complex, it can learn the training data too well, leading to
poor generalization to new data.
● Consequences: Overfitting can result in a model that is overly sensitive to noise in the data and
performs poorly on unseen examples.
Mitigation:
● Pruning: Remove branches or nodes from the tree to simplify its structure (see the sketch after this list).
● Ensemble methods: Combine multiple decision trees to reduce overfitting and improve
generalization.
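As a minimal sketch of the pruning idea (not from the slides; it assumes scikit-learn's DecisionTreeClassifier, the built-in breast-cancer dataset, and cost-complexity pruning via ccp_alpha as one possible pruning mechanism), the example below compares a fully grown tree with a pruned one:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fully grown tree: fits the training data almost perfectly and may overfit.
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Pruned tree: cost-complexity pruning removes branches that add little value.
pruned = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0).fit(X_train, y_train)

for name, tree in [("full", full), ("pruned", pruned)]:
    print(name, "train:", round(tree.score(X_train, y_train), 3),
          "test:", round(tree.score(X_test, y_test), 3))

Typically the pruned tree gives up a little training accuracy in exchange for better, or at least more stable, test accuracy.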
Hyperparameters of Decision Trees
Hyperparameter Tuning
Criterion:
The function used to measure the quality of a split. Pass “gini” for Gini impurity or “entropy” for information gain.
Default = “gini”
Input options: {“gini”, “entropy”}
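To make the two options concrete, here is a small sketch (not from the slides; the helper functions gini and entropy are illustrative) that computes both impurity measures for the class labels in a node:

import numpy as np

def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    """Entropy: -sum(p_k * log2(p_k)) over the class proportions p_k."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

node = np.array([0, 0, 0, 1, 1])   # 3 samples of class 0, 2 of class 1
print(gini(node))     # 1 - (0.6^2 + 0.4^2) = 0.48
print(entropy(node))  # -(0.6*log2(0.6) + 0.4*log2(0.4)) ≈ 0.971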
Max_Depth:
The maximum depth of the tree. If it is not specified, nodes are expanded until all leaf nodes are pure or until every leaf node contains fewer than min_samples_split samples.
Default = None
Input options: integer
Min_Samples_Split:
The minimum number of samples required to split an internal node. If an internal node contains fewer than min_samples_split samples, it becomes a leaf node.
Default = 2
Input options: integer or float (if float, min_samples_split is interpreted as a fraction of the total number of samples)
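As a minimal sketch of how these three hyperparameters are passed in practice (assuming scikit-learn's DecisionTreeClassifier and its built-in iris dataset, neither of which the slides specify):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = DecisionTreeClassifier(
    criterion="entropy",    # split quality measured by information gain
    max_depth=3,            # stop expanding nodes below depth 3
    min_samples_split=10,   # nodes with fewer than 10 samples become leaves
    random_state=0,
)
clf.fit(X_train, y_train)
print("depth:", clf.get_depth(), "leaves:", clf.get_n_leaves())
print("test accuracy:", clf.score(X_test, y_test))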
Hyperparameter Tuning (continued)
Min_Samples_Leaf:
The minimum number of samples required to be at a leaf node. A split is only considered if it leaves at least min_samples_leaf samples in each of the resulting child nodes.
Default = 1
Input options: integer or float (if float, min_samples_leaf is interpreted as a fraction of the total number of samples)
Max_Features:
The number of features to consider when looking for the best split. For example, if a dataframe has 35 features and max_features is 9, only 9 of the 35 features are considered when searching for the best split at each node.
Default = None
Input options: integer, float (if float, max_features is interpreted as a fraction of the number of features) or {“auto”, “sqrt”, “log2”}
“auto”: max_features = sqrt(n_features)
“sqrt”: max_features = sqrt(n_features)
“log2”: max_features = log2(n_features)
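A similar sketch for these two hyperparameters (again assuming scikit-learn's DecisionTreeClassifier; the fractional and string forms match the input options listed above, and newer scikit-learn releases have deprecated the “auto” string, which is why the sketch uses “sqrt”):

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# min_samples_leaf as a fraction: each leaf must hold at least 5% of the samples.
clf = DecisionTreeClassifier(
    min_samples_leaf=0.05,
    max_features="sqrt",   # consider sqrt(n_features) features at each split
    random_state=0,
)
clf.fit(X, y)
print("leaves:", clf.get_n_leaves())

# max_features can also be an integer (a fixed number of features per split).
clf_int = DecisionTreeClassifier(max_features=2, random_state=0).fit(X, y)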
Other Issues in Decision Tree Learning
Computational Complexity
● Definition: Decision tree learning can be computationally expensive, especially for large datasets
or complex problems.
● Factors: The number of features, the number of instances, and the depth of the tree can all
contribute to the computational complexity.
● Mitigation:
○ Heuristic algorithms: Use algorithms like ID3, C4.5, and CART that employ heuristics to
reduce the search space.
○ Feature selection: Select a subset of features to reduce the dimensionality of the data and
improve efficiency.
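As one concrete way to apply the feature-selection point above (a sketch assuming scikit-learn's SelectKBest; the dataset and the choice of k are illustrative):

from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)   # 30 features

# Keep only the 10 most informative features before growing the tree,
# reducing the number of candidate splits the learner has to evaluate.
model = make_pipeline(
    SelectKBest(score_func=f_classif, k=10),
    DecisionTreeClassifier(random_state=0),
)
model.fit(X, y)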
Other Issues in Decision Tree Learning (continued)
Instability
● Definition: Small changes in the training data can lead to significant changes in the structure of a
decision tree.
● Consequences: This instability can make it difficult to interpret and trust the model.
● Mitigation:
○ Ensemble methods: Combining multiple decision trees can help to reduce instability and
improve robustness.
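A minimal sketch of this ensemble idea (assuming scikit-learn's BaggingClassifier; the slides name ensembles generally rather than this specific class, and recent scikit-learn versions call the constructor argument estimator):

from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Train 50 trees on bootstrap resamples of the data and average their votes,
# so no single perturbation of the training set dictates the final prediction.
bag = BaggingClassifier(
    estimator=DecisionTreeClassifier(random_state=0),
    n_estimators=50,
    random_state=0,
)
bag.fit(X, y)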
Handling Missing Data
● Definition: Decision trees can struggle to handle missing data, as many algorithms rely on having
complete information for each instance.
● Strategies:
○ Imputation: Fill in missing values with estimated values (see the sketch after this list).
○ Ignore instances: Remove instances with missing values.
○ Create a separate category: Treat missing values as a separate category.
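The imputation strategy can be sketched as follows (assuming scikit-learn's SimpleImputer; the mean strategy and the tiny array are illustrative choices):

import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

X = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, np.nan], [4.0, 5.0]])
y = np.array([0, 0, 1, 1])

# Fill each missing value with its column mean before fitting the tree.
model = make_pipeline(
    SimpleImputer(strategy="mean"),
    DecisionTreeClassifier(random_state=0),
)
model.fit(X, y)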
Bias and Variance
Bias and variance are two fundamental concepts in machine learning that measure the error in a model's
predictions. Understanding these concepts is crucial for building effective models.
Bias measures how far the model's average predictions fall from the true underlying pattern.
Variance measures the model's sensitivity to small changes in the data.
The optimal model is one that strikes a balance between bias and variance.
Bias
Definition: Bias refers to the error introduced by a model's inability to capture the underlying relationship
between the features and the target variable. In other words, it's the difference between the average
prediction of the model and the true function.
● High Bias: A model with high bias is underfitting the data, meaning it's too simple to capture the
complexities of the problem.
● Low Bias: A model with low bias is able to capture the underlying relationship well, but it might be
overfitting the training data.
Variance
Definition: Variance measures the model's sensitivity to small changes in the training data. It's the
variability of the model's predictions across different training sets.
● High Variance: A model with high variance is overfitting the data, meaning it's too complex and is
learning the noise in the data rather than the underlying pattern.
● Low Variance: A model with low variance is more consistent in its predictions across different
training sets.
Bias-Variance Trade-off
● Increasing model complexity: Typically reduces bias but increases variance.
● Decreasing model complexity: Typically increases bias but reduces variance.
The goal is to find the optimal balance between bias and variance to minimize the overall error.
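One way to see the trade-off empirically is to sweep a complexity hyperparameter such as max_depth and compare training scores with cross-validated scores (a sketch assuming scikit-learn's validation_curve and its built-in breast-cancer dataset; very shallow trees score low on both sets, indicating high bias, while very deep trees score high on training but lower on validation, indicating high variance):

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
depths = np.arange(1, 11)

# Training vs. cross-validation accuracy for each max_depth setting.
train_scores, val_scores = validation_curve(
    DecisionTreeClassifier(random_state=0), X, y,
    param_name="max_depth", param_range=depths, cv=5,
)

for d, tr, va in zip(depths, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"max_depth={d}: train={tr:.3f}, cv={va:.3f}")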
