Decision Trees: Intuition to Implementation
Classification & Regression Trees (CART/ID3) — Teaching Deck
Why Decision Trees?
• Interpretable, white-box models with human-readable rules
• Handle mixed data types (numeric + categorical) and non-linear boundaries
• Minimal preprocessing (no scaling required); robust to outliers
• Foundation for powerful ensembles (Random Forests, Gradient Boosting)
Core Idea
• Recursively split the feature space to create regions with high class purity (classification)
• Choose the split that maximizes impurity reduction / information gain at each node
• Continue until a stopping criterion is met, then assign leaf predictions (see the sketch below)
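A minimal sketch of this recursion, assuming small NumPy arrays and binary splits on numeric features only; gini, best_split, and build_tree are illustrative names, not any library's implementation.

import numpy as np

def gini(y):
    # Gini impurity of a label vector: G = 1 - sum_k p_k^2
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    # Try every feature and candidate threshold; keep the split with the
    # lowest weighted child impurity (i.e., the largest impurity reduction).
    best, best_score = None, np.inf
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        for t in (values[:-1] + values[1:]) / 2.0:   # midpoints of sorted unique values
            left = X[:, j] <= t
            score = (left.sum() * gini(y[left]) + (~left).sum() * gini(y[~left])) / len(y)
            if score < best_score:
                best, best_score = (j, t), score
    return best

def build_tree(X, y, depth=0, max_depth=3, min_samples=2):
    stop = depth >= max_depth or len(y) < min_samples or gini(y) == 0.0
    split = None if stop else best_split(X, y)
    if split is None:
        # Leaf: predict the majority class of the samples that reached this node
        values, counts = np.unique(y, return_counts=True)
        return {"leaf": values[np.argmax(counts)]}
    j, t = split
    mask = X[:, j] <= t
    return {"feature": j, "threshold": t,
            "left": build_tree(X[mask], y[mask], depth + 1, max_depth, min_samples),
            "right": build_tree(X[~mask], y[~mask], depth + 1, max_depth, min_samples)}

# Tiny illustrative dataset: one feature, two well-separated classes
X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
y = np.array([0, 0, 0, 1, 1, 1])
print(build_tree(X, y))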
Impurity Measures (Classification)
• Gini: G = 1 - sum_k p_k^2
• Entropy: H = -sum_k p_k log_2 p_k
• Information Gain = Parent Impurity − Weighted Child Impurities (worked example below)
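The three formulas written out on a small, hand-checkable node; a sketch using NumPy only, with parent/child counts chosen purely for illustration.

import numpy as np

def gini(p):
    # G = 1 - sum_k p_k^2, with p the vector of class proportions at a node
    p = np.asarray(p, dtype=float)
    return 1.0 - np.sum(p ** 2)

def entropy(p):
    # H = -sum_k p_k log2 p_k (terms with p_k = 0 contribute nothing)
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Parent node: 10 samples, 5 per class (maximally impure for two classes)
parent = [0.5, 0.5]
print("parent gini:", gini(parent), "parent entropy:", entropy(parent))  # 0.5 and 1.0

# Candidate split: left child gets 6 samples (5 vs 1), right child 4 samples (0 vs 4)
n_left, n_right = 6, 4
weighted = (n_left * entropy([5/6, 1/6]) + n_right * entropy([0.0, 1.0])) / (n_left + n_right)
print("information gain:", entropy(parent) - weighted)  # roughly 0.61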
Split Selection
• Numerical features: try candidate thresholds (e.g., midpoints of sorted unique values); see the snippet below
• Categorical features: group categories (may be exhaustive or heuristic)
• Pick the split with the best impurity reduction (ties broken by secondary criteria)
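Candidate-threshold generation for a single numeric feature, scored by weighted Gini impurity; x and y below are made-up values purely to illustrate the midpoint rule.

import numpy as np

def gini(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

# One numeric feature and binary labels (illustrative data)
x = np.array([2.0, 3.5, 3.5, 6.0, 7.5, 9.0])
y = np.array([0,   0,   1,   1,   1,   1])

# Candidate thresholds: midpoints between consecutive sorted unique values
values = np.unique(x)
candidates = (values[:-1] + values[1:]) / 2.0

def weighted_gini(threshold):
    left = x <= threshold
    return (left.sum() * gini(y[left]) + (~left).sum() * gini(y[~left])) / len(y)

best = min(candidates, key=weighted_gini)
print("best threshold:", best, "weighted Gini:", weighted_gini(best))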
CART vs. ID3/C4.5
• ID3/C4.5: uses Entropy/Information Gain (or Gain Ratio), often for categorical features
• CART: uses Gini (classification) and MSE (regression), builds binary trees
• Modern libraries (e.g., scikit-learn) implement CART-style trees (criterion choice shown below)
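scikit-learn's trees are CART-style (binary splits), but the impurity criterion is a constructor parameter, so the entropy criterion familiar from ID3/C4.5 is available too; a quick comparison on the bundled iris data, with training accuracy printed only for illustration.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Same binary-tree builder; only the impurity criterion changes
for criterion in ("gini", "entropy"):
    clf = DecisionTreeClassifier(criterion=criterion, max_depth=3, random_state=42)
    clf.fit(X, y)
    print(criterion, clf.score(X, y))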
Stopping Criteria (Pre-pruning)
• max_depth: limit tree depth
• min_samples_split / min_samples_leaf: minimum samples required to split a node / to form a leaf
• max_leaf_nodes: cap the number of leaves
• min_impurity_decrease: require sufficient gain to split (usage example below)
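These pre-pruning controls map directly onto DecisionTreeClassifier constructor arguments; the specific values below are arbitrary examples, not recommendations.

from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

clf = DecisionTreeClassifier(
    max_depth=5,                 # limit tree depth
    min_samples_split=10,        # a node needs >= 10 samples to be split
    min_samples_leaf=5,          # every leaf keeps >= 5 samples
    max_leaf_nodes=20,           # cap the number of leaves
    min_impurity_decrease=1e-3,  # require at least this much gain to split
    random_state=42,
)
clf.fit(X, y)
print("depth:", clf.get_depth(), "leaves:", clf.get_n_leaves())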
Post-pruning (Cost-Complexity)
• Grow a large tree, then prune back by penalizing complexity
• Minimize: R_alpha(T) = sum_{leaves t} R(t) + alpha * |leaves|
• In scikit-learn: tune ccp_alpha via cross-validation (sketch below)
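A sketch of tuning ccp_alpha: cost_complexity_pruning_path enumerates the effective alphas for the fully grown tree, and cross-validation picks among them (toy dataset, no held-out test set, purely illustrative).

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Effective alphas at which subtrees would be pruned away
path = DecisionTreeClassifier(random_state=42).cost_complexity_pruning_path(X, y)

scores = []
for alpha in path.ccp_alphas:
    clf = DecisionTreeClassifier(random_state=42, ccp_alpha=alpha)
    scores.append(cross_val_score(clf, X, y, cv=5).mean())

best_alpha = path.ccp_alphas[int(np.argmax(scores))]
print("best ccp_alpha:", best_alpha)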
Regression Trees
• Impurity measure: Mean Squared Error (MSE) or Mean Absolute Error (MAE)
• Leaf prediction: mean of the targets in the leaf (the median minimizes MAE)
• Same control parameters; beware of overfitting on noisy targets (example below)
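The same idea for regression with scikit-learn's DecisionTreeRegressor; the noisy sine data is synthetic, and the criterion names follow recent scikit-learn versions (squared_error / absolute_error).

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 5, size=(200, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=200)  # noisy sine wave

# MSE criterion -> leaves predict the mean; MAE -> leaves predict the median
for criterion in ("squared_error", "absolute_error"):
    reg = DecisionTreeRegressor(criterion=criterion, max_depth=4, random_state=0)
    reg.fit(X, y)
    print(criterion, "R^2 on training data:", round(reg.score(X, y), 3))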
Bias–Variance Trade-off
• Shallow tree: high bias, low variance (underfits)
• Deep tree: low bias, high variance (overfits)
• Cross-validate depth and leaf sizes to balance performance (sketch below)
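A sketch of balancing the trade-off by cross-validating max_depth; the dataset and the depth grid are arbitrary choices.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

for depth in (1, 2, 3, 5, 10, None):
    clf = DecisionTreeClassifier(max_depth=depth, random_state=42)
    mean_acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"max_depth={depth}: mean CV accuracy {mean_acc:.3f}")
# Very shallow trees underfit (high bias); unrestricted depth tends to overfit (high variance).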
Interpretability & Explanations
• Path explanations: ‘IF (conditions) THEN prediction’
• Global feature importance (impurity decrease)
• Caveat: impurity-based importance can be biased toward high-cardinality features (example below)
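Both kinds of explanation are available from a fitted scikit-learn tree: export_text prints the IF/THEN paths and feature_importances_ gives the impurity-based global importances (iris data used only as a stand-in).

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
clf = DecisionTreeClassifier(max_depth=3, random_state=42).fit(data.data, data.target)

# Human-readable IF (condition) THEN ... rules, one line per node
print(export_text(clf, feature_names=list(data.feature_names)))

# Global impurity-based importances (can favour high-cardinality features)
for name, imp in zip(data.feature_names, clf.feature_importances_):
    print(f"{name}: {imp:.3f}")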
Handling Practicalities
• Missing values: impute before training, or use surrogate splits if the library supports them
• Class imbalance: class_weight, balanced subsampling, or threshold tuning
• No need for feature scaling; still wise to encode categoricals consistently (pipeline sketch below)
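One way to wire these practicalities together in scikit-learn: impute missing numerics, integer-encode categoricals, and reweight classes on the tree itself; the column names and the tiny frame are invented for illustration.

import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier

# Tiny illustrative frame with a missing numeric value and a categorical column
df = pd.DataFrame({
    "age": [25, 32, np.nan, 51, 46, 38],
    "plan": ["basic", "pro", "basic", "pro", "basic", "pro"],
    "churned": [0, 0, 0, 1, 1, 0],
})
X, y = df[["age", "plan"]], df["churned"]

preprocess = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), ["age"]),  # impute missing numerics
    ("cat", OrdinalEncoder(), ["plan"]),                 # integer codes; trees need no scaling
])

model = Pipeline([
    ("prep", preprocess),
    ("tree", DecisionTreeClassifier(class_weight="balanced", random_state=42)),
])
model.fit(X, y)
print(model.predict(X))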
scikit-learn Example (Classification)
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV

clf = DecisionTreeClassifier(random_state=42)
param_grid = {
    'max_depth': [None, 3, 5, 10],
    'min_samples_leaf': [1, 2, 5, 10],
    'ccp_alpha': [0.0, 0.001, 0.01],
}
grid = GridSearchCV(clf, param_grid, cv=5, n_jobs=-1)
grid.fit(X_train, y_train)
print(grid.best_params_, grid.best_score_)
Exporting Rules / Visualization
from sklearn import tree
import matplotlib.pyplot as plt

plt.figure(figsize=(12, 6))
tree.plot_tree(grid.best_estimator_, feature_names=feature_names,
               class_names=class_names, filled=True)
plt.show()
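To export the rules as text rather than a figure, export_text can be applied to the same tuned estimator; this sketch reuses the grid and feature_names objects from the previous slides and assumes feature_names can be turned into a list of strings.

from sklearn.tree import export_text

# Nested 'feature <= threshold' lines, one block per decision path
rules = export_text(grid.best_estimator_, feature_names=list(feature_names))
print(rules)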
Pros & Cons (Summary)
• Pros: simple, interpretable, minimal preprocessing, handles interactions
• Cons: unstable to small data changes, prone to overfitting, axis-aligned splits only
• Often used as base learners for ensembles (RF, GBM, XGBoost)
