Ensemble Learning: Bagging, Boosting & Stacking
A practical, teaching-ready deck (with code snippets)
Why Ensemble Learning?
• Combine multiple models to improve accuracy and robustness
• Reduce variance (averaging), sometimes bias (boosting)
• Stronger generalization on tabular data; a competitive baseline in practice
• Natural parallelization (bagging) & strong off-the-shelf performance (RF)
Bias–Variance Perspective (High-Level)
• Expected prediction error = Bias² + Variance + Irreducible noise
• Bagging ↓ variance by averaging unstable learners (e.g., trees); see the sketch below
• Boosting can ↓ bias by sequentially correcting residuals
• Diversity among base learners is key for gains
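A minimal sketch of the variance-reduction effect on a synthetic regression task; BaggingRegressor and DecisionTreeRegressor are standard scikit-learn classes, and the sizes, noise level, and seeds are illustrative:

from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import cross_val_score

# Synthetic regression task (illustrative sizes and noise).
X, y = make_regression(n_samples=500, n_features=20, noise=10.0, random_state=0)

# A single fully grown tree: low bias, high variance.
tree = DecisionTreeRegressor(random_state=0)
# Averaging 100 bootstrapped trees: similar bias, lower variance.
bagged = BaggingRegressor(DecisionTreeRegressor(), n_estimators=100, random_state=0)

for name, model in [("single tree", tree), ("bagged trees", bagged)]:
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"{name}: R^2 = {scores.mean():.3f} +/- {scores.std():.3f}")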
Ensemble Taxonomy
• Homogeneous vs. Heterogeneous (same vs. different base models)
• Parallel (Bagging/Random Forest) vs. Sequential (Boosting)
• Voting/Averaging (hard vs. soft)
• Stacking/Blending with a meta-learner
Bagging: Bootstrap Aggregating
• Train B models on bootstrap samples (sampling with replacement)
• Each model is high-variance (e.g., a deep tree) → averaging stabilizes
• Out-of-Bag (OOB) estimation: ~37% of the data is not seen by a given tree (see the check below)
• Key knobs: n_estimators (B), base-learner complexity, max_samples
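Where the ~37% figure comes from: a given row is missed by one bootstrap sample with probability (1 − 1/n)^n, which tends to 1/e ≈ 0.368. A quick NumPy check (the sample size is illustrative):

import numpy as np

rng = np.random.default_rng(42)
n = 10_000  # number of training rows (illustrative)

# One bootstrap sample: draw n row indices with replacement.
boot = rng.integers(0, n, size=n)
oob_fraction = 1 - len(np.unique(boot)) / n

print(f"empirical OOB fraction: {oob_fraction:.3f}")        # ~0.367
print(f"theoretical limit 1/e:  {np.exp(-1):.3f}")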
Random Forests (RF)
• Bagging + Random Subspace: each split considers a random subset of features
• De-correlates trees; often the best default for tabular data
• Classification: majority vote; Regression: average
• Tune: n_estimators, max_features, max_depth, min_samples_leaf, class_weight
Extremely Randomized Trees (ExtraTrees)
• Randomized thresholds + random feature subsets at each split
• Even more de-correlation; often faster to train
• May increase bias slightly but reduces variance further (see the sketch below)
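A minimal sketch comparing ExtraTrees with a Random Forest; both classes share the scikit-learn estimator interface, and the synthetic dataset and settings are illustrative:

from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=30, random_state=0)

# Same interface as RandomForestClassifier; splits use random thresholds.
et = ExtraTreesClassifier(n_estimators=300, max_features="sqrt", n_jobs=-1, random_state=0)
rf = RandomForestClassifier(n_estimators=300, max_features="sqrt", n_jobs=-1, random_state=0)

for name, model in [("ExtraTrees", et), ("RandomForest", rf)]:
    scores = cross_val_score(model, X, y, cv=5, n_jobs=-1)
    print(name, round(scores.mean(), 3))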
Boosting: Core Idea
• Train weak learners sequentially; each focuses on the previous errors
• Stage-wise additive modeling: f_t(x) = f_{t-1}(x) + η · h_t(x) (see the from-scratch sketch below)
• Requires careful regularization (learning rate, depth, subsampling)
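A from-scratch sketch of stage-wise additive modeling for squared loss, where the negative gradient is simply the residual; the dataset, tree depth, and learning rate are illustrative:

import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=5.0, random_state=0)

eta, n_rounds = 0.1, 100                      # learning rate and number of stages
f = np.full_like(y, y.mean(), dtype=float)    # f_0: constant prediction
trees = []

for _ in range(n_rounds):
    residuals = y - f                          # negative gradient of squared loss
    h = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, residuals)
    f += eta * h.predict(X)                    # f_t = f_{t-1} + eta * h_t
    trees.append(h)                            # keep the weak learners for prediction

print("training MSE:", round(np.mean((y - f) ** 2), 3))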
AdaBoost (Binary Classification)
• Re-weights samples; harder examples get higher weight
• Weak learner: typically shallow decision trees (stumps)
• Final prediction: weighted vote of the weak learners
• Sensitive to noise/outliers; strong on clean, small-to-medium data
Gradient Boosting (GBDT/GBM)
• Fit each new tree to the negative gradient of the loss (the residuals, for squared loss)
• Key hyperparameters: n_estimators, learning_rate, max_depth (or max_leaves), subsample
• Use early stopping on a validation set to prevent overfitting (see the sketch below)
• Variants: XGBoost, LightGBM, CatBoost
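One way to wire up early stopping with plain scikit-learn: GradientBoostingClassifier accepts validation_fraction and n_iter_no_change; the specific values below are illustrative:

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

# Early stopping: hold out 10% internally and stop once the validation
# score has not improved for 20 consecutive iterations.
gb = GradientBoostingClassifier(
    n_estimators=2000,        # upper bound; early stopping picks the actual number
    learning_rate=0.05,
    max_depth=3,
    subsample=0.8,
    validation_fraction=0.1,
    n_iter_no_change=20,
    random_state=0,
).fit(X, y)

print("trees actually fitted:", gb.n_estimators_)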
XGBoost vs. LightGBM vs. CatBoost (At a Glance)
• XGBoost: robust regularization, shrinkage, column subsampling, wide ecosystem
• LightGBM: leaf-wise growth with depth limits; fast on large, sparse datasets
• CatBoost: native categorical handling, ordered boosting (reduces target leakage)
• Pick based on data size, sparsity, categorical richness, and latency needs (see the sketch below)
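A rough side-by-side sketch on synthetic data, assuming the xgboost, lightgbm, and catboost packages are installed; the parameter values are illustrative, not tuned:

# Assumes `pip install xgboost lightgbm catboost`; settings are illustrative.
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from catboost import CatBoostClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = [
    XGBClassifier(n_estimators=500, learning_rate=0.05, max_depth=6,
                  subsample=0.8, colsample_bytree=0.8, random_state=42),
    LGBMClassifier(n_estimators=500, learning_rate=0.05, num_leaves=63,
                   subsample=0.8, colsample_bytree=0.8, random_state=42),
    CatBoostClassifier(iterations=500, learning_rate=0.05, depth=6,
                       verbose=0, random_seed=42),
]

# All three expose the scikit-learn fit/score interface.
for model in models:
    model.fit(X_tr, y_tr)
    print(type(model).__name__, round(model.score(X_te, y_te), 3))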
Stacking (Meta-Learning)
• Level-0: diverse base models; Level-1: meta-learner uses out-of-fold predictions
• Cross-validation is crucial to avoid leakage
• Use simple meta-learners first (logistic/linear) to avoid overfitting
• Blending: holdout set for meta-features (simpler, less data-efficient)
Voting & Averaging
• Hard voting: majority class label
• Soft voting: average predicted probabilities (requires calibrated models)
• Weighted voting: weight by validation performance or domain knowledge (see the sketch below)
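A small sketch of weighted soft voting with scikit-learn's VotingClassifier; the 2:1 weights and the base models are illustrative, and a train split like the one on the code slides below is assumed:

from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Soft voting averages predicted probabilities; `weights` lets you trust one
# model more than another (the 2:1 ratio here is purely illustrative).
vote = VotingClassifier(
    estimators=[('rf', RandomForestClassifier(n_estimators=300, random_state=42)),
                ('lr', LogisticRegression(max_iter=1000))],
    voting='soft',
    weights=[2, 1],
)
# Fit/predict like any scikit-learn estimator, e.g. vote.fit(X_train, y_train).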
Imbalanced Data Strategies
• Use class_weight='balanced' (RF/AdaBoost/etc.) or sampling strategies
• Optimize thresholds using PR curves; use AUPRC for evaluation (see the sketch below)
• Consider Balanced Random Forest, EasyEnsemble, or focal loss (boosting variants)
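A minimal sketch of the class-weighting + PR-curve workflow on a synthetic ~5%-positive problem; all sizes and settings are illustrative:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score, precision_recall_curve
from sklearn.model_selection import train_test_split

# Imbalanced toy problem (~5% positives, mirroring the credit-risk sketch later).
X, y = make_classification(n_samples=10_000, n_features=20,
                           weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

rf = RandomForestClassifier(n_estimators=300, class_weight='balanced',
                            n_jobs=-1, random_state=0).fit(X_tr, y_tr)

proba = rf.predict_proba(X_te)[:, 1]
print('AUPRC:', round(average_precision_score(y_te, proba), 3))

# Pick an operating threshold from the PR curve instead of the default 0.5.
precision, recall, thresholds = precision_recall_curve(y_te, proba)
f1 = 2 * precision * recall / (precision + recall + 1e-12)
print('best-F1 threshold:', thresholds[f1[:-1].argmax()])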
Interpretability & Diagnostics
• Global: permutation importance, minimal depth, gain statistics
• Local: SHAP values, tree path analysis, counterfactuals
• Check calibration (reliability curves) for probability outputs (see the sketch below)
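A short diagnostics sketch, reusing the fitted rf and held-out split (X_te, y_te) from the previous sketch; permutation_importance and calibration_curve are standard scikit-learn utilities:

from sklearn.inspection import permutation_importance
from sklearn.calibration import calibration_curve

# Global importance: permutation importance on held-out data.
result = permutation_importance(rf, X_te, y_te, n_repeats=10,
                                random_state=0, n_jobs=-1)
for i in result.importances_mean.argsort()[::-1][:5]:
    print(f"feature {i}: {result.importances_mean[i]:.4f}")

# Reliability curve: how well predicted probabilities match observed frequencies.
prob_true, prob_pred = calibration_curve(y_te, rf.predict_proba(X_te)[:, 1], n_bins=10)

# For local explanations, the third-party `shap` package provides TreeExplainer.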
Practical Tips
• Start with RF as a baseline for tabular problems
• For boosting: tune learning_rate and the number of trees with early stopping
• Use OOB estimates (bagging/RF) for quick model iteration
• Cross-validate across time splits for temporal data to avoid leakage (see the sketch below)
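A sketch of time-aware cross-validation with TimeSeriesSplit, assuming a feature matrix X and labels y whose rows are ordered by time:

from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from sklearn.ensemble import RandomForestClassifier

# Each fold trains on the past and validates on the future, so no future
# information leaks into training (assumes rows of X are in time order).
tscv = TimeSeriesSplit(n_splits=5)
scores = cross_val_score(RandomForestClassifier(n_estimators=300, random_state=0),
                         X, y, cv=tscv, n_jobs=-1)
print(scores)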
Key Hyperparameters (Cheat Sheet)
• RF: n_estimators ↑, max_features (sqrt/log2), min_samples_leaf (1–10)
• GBM: learning_rate (0.01–0.1), n_estimators (100–1000+), max_depth (3–10), subsample (0.6–0.9); see the random-search sketch below
• AdaBoost: n_estimators (50–500), learning_rate (0.01–1.0)
• Stacking: base diversity ↑, meta-learner regularization (ridge/logistic)
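One way to search over these ranges: a RandomizedSearchCV sketch for GBM; the distributions are illustrative, and the train split from the next slide's code is assumed:

from scipy.stats import loguniform, randint
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

# Random search over the cheat-sheet ranges (distributions are illustrative).
param_distributions = {
    'learning_rate': loguniform(0.01, 0.1),
    'n_estimators': randint(100, 1000),
    'max_depth': randint(3, 10),
    'subsample': [0.6, 0.7, 0.8, 0.9],
}
search = RandomizedSearchCV(GradientBoostingClassifier(random_state=0),
                            param_distributions, n_iter=30, cv=5, n_jobs=-1,
                            random_state=0)
search.fit(X_train, y_train)   # assumes the train split from the code slides
print(search.best_params_)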
Code: Bagging & Random Forest (scikit-learn)
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split, cross_val_score

# Assumes a feature matrix X and label vector y are already loaded.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

bag = BaggingClassifier(n_estimators=200, max_samples=0.8, n_jobs=-1,
                        random_state=42)
rf = RandomForestClassifier(n_estimators=400, max_features='sqrt', oob_score=True,
                            n_jobs=-1, random_state=42)

# 5-fold CV accuracy for both ensembles.
for model in [bag, rf]:
    scores = cross_val_score(model, X_train, y_train, cv=5, n_jobs=-1)
    print(model.__class__.__name__, scores.mean(), scores.std())

# OOB score: a "free" validation estimate from the bootstrap samples.
rf.fit(X_train, y_train)
print('OOB:', rf.oob_score_)
Code: AdaBoost & GradientBoosting (scikit-learn)
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier

# Reuses the train/test split from the previous slide.
ada = AdaBoostClassifier(n_estimators=300, learning_rate=0.1, random_state=42)
gb = GradientBoostingClassifier(n_estimators=500, learning_rate=0.05, max_depth=3,
                                subsample=0.8, random_state=42)

ada.fit(X_train, y_train)
gb.fit(X_train, y_train)
print('AdaBoost test acc:', ada.score(X_test, y_test))
print('GB test acc:', gb.score(X_test, y_test))
Code: Stacking & Voting (scikit-learn)
from sklearn.ensemble import StackingClassifier, VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

base_learners = [
    ('rf', RandomForestClassifier(n_estimators=300, random_state=42)),
    ('svc', SVC(probability=True, kernel='rbf', C=2.0, gamma='scale',
                random_state=42)),
]
meta = LogisticRegression(max_iter=1000)

# StackingClassifier trains the meta-learner on out-of-fold predictions (cv=5).
stack = StackingClassifier(estimators=base_learners, final_estimator=meta,
                           cv=5, n_jobs=-1)
vote = VotingClassifier(estimators=base_learners, voting='soft', n_jobs=-1)

stack.fit(X_train, y_train)
vote.fit(X_train, y_train)
print('Stack acc:', stack.score(X_test, y_test))
print('Vote acc:', vote.score(X_test, y_test))
When to Use What
• RF: strong baseline for mixed/tabular data; low tuning cost
• GBM (XGB/LGBM/CatBoost): when you need top accuracy and can tune carefully
• Bagging (generic): unstable base learner & small data → variance reduction
• Stacking: when diverse models each capture different structure; ensure robust CV
Common Pitfalls & How to Avoid Them
• Data leakage in stacking/blending → use out-of-fold predictions
• Overfitting with too-deep trees in boosting → use small max_depth + regularization
• Poor probability calibration in RF/boosting → calibrate on a validation set (see the sketch below)
• Distribution shift → evaluate with time-aware or group-aware splits
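A calibration sketch using scikit-learn's CalibratedClassifierCV; the wrapped model, method, and cv value are illustrative, and the train split from the code slides is assumed:

from sklearn.calibration import CalibratedClassifierCV
from sklearn.ensemble import RandomForestClassifier

# Recalibrate predicted probabilities via cross-validation; 'isotonic' needs a
# reasonable amount of data, 'sigmoid' is the safer choice on small datasets.
calibrated_rf = CalibratedClassifierCV(
    RandomForestClassifier(n_estimators=300, random_state=42),
    method='isotonic',
    cv=5,
)
calibrated_rf.fit(X_train, y_train)   # assumes the train split from the code slides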
Mini Case Sketch (Credit Risk)
• Goal: predict default; imbalanced (5% positive)
• Baseline: RF with class_weight='balanced' → tune max_features
• Compare with LightGBM + early stopping; evaluate AUPRC
• Stack RF + SVC + LR-meta for the final model; check calibration
References & Further Reading
• Breiman, L. (1996) Bagging Predictors; (2001) Random Forests
• Freund, Y. & Schapire, R. (1997) AdaBoost
• Friedman, J. (2001) Greedy Function Approximation (GBM)
• Chen & Guestrin (2016) XGBoost; Ke et al. (2017) LightGBM; Dorogush et al. (2018) CatBoost
