ENSEMBLE METHODS
Mehnaz Maharin
Ensemble Methods
Work Flow
Bagging – Random Forest
What?
• Model averaging approach
• A machine learning ensemble meta-algorithm designed to improve the stability and accuracy of machine learning algorithms used in statistical classification and regression.
Why?
• To reduce overfitting
• To reduce variance
Bagging
■ Decision Tree
Features: Type_Name, Score, Status, Risk_factor, Avg_Score, Classification, HRU, Indicator, Alert_Category, Alert_Type, Grouping, Indicator_Heat_Score
Output: Malicious (1 or 0)
[Figure: decision tree with nodes Score, Risk Factor, and Classification, ending in leaves 1 and 0]
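Bagging – Random Forest
[Figure: from the dataset D with m records, a sample d with n records (n < m, d < D) is drawn for each decision tree DT1, DT2, DT3, ..., DTn. The trees output 1, 0, 1, 1, and aggregation by majority voting gives the final output 1.]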
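The sampling and majority-voting flow in the figure can be sketched in a few lines of Python. This is a minimal illustration, not the deck's implementation: the one-feature toy dataset, the helper names (`bootstrap_sample`, `fit_stump`, `bagging_predict`), and the use of a threshold stump in place of a full decision tree are all assumptions, and sampling with replacement is assumed because the slide only states n < m.

```python
import random
from collections import Counter

def bootstrap_sample(rows, n, rng):
    """Draw n rows with replacement from the full dataset D (assumed sampling scheme)."""
    return [rng.choice(rows) for _ in range(n)]

def fit_stump(sample):
    """Hypothetical base learner: choose the threshold t with the fewest errors
    for the rule 'predict 1 if x >= t, else 0'."""
    best_t, best_err = None, None
    for t in sorted({x for x, _ in sample}):
        err = sum((1 if x >= t else 0) != y for x, y in sample)
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t

def majority_vote(votes):
    """Aggregation step: return the most common prediction."""
    return Counter(votes).most_common(1)[0][0]

def bagging_predict(thresholds, x):
    """Ask every stump for a vote, then aggregate by majority voting."""
    return majority_vote([1 if x >= t else 0 for t in thresholds])

rng = random.Random(0)
# Toy dataset D with m = 8 records of (feature value, label); values are made up.
D = [(0.1, 0), (0.2, 0), (0.35, 0), (0.4, 0), (0.6, 1), (0.7, 1), (0.8, 1), (0.9, 1)]
# 25 base learners, each trained on a sample d of n = 6 records (n < m).
thresholds = [fit_stump(bootstrap_sample(D, 6, rng)) for _ in range(25)]

print(majority_vote([1, 0, 1, 1]))       # the four votes shown in the figure
print(bagging_predict(thresholds, 0.95))
print(bagging_predict(thresholds, 0.05))
```

Each learner sees a different sample, so their individual errors differ; voting averages those differences out, which is the variance reduction the earlier slide names as the reason for bagging.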
Boosting
What?
• A family of machine learning algorithms that convert weak learners into strong ones
• A machine learning ensemble meta-algorithm designed to reduce bias and also variance.
Why?
• To reduce bias
• To reduce variance
Boosting
[Figure: weak learners (Base Learner 1, Base Learner 2, Base Learner 3) arranged in sequence; the incorrectly classified records from each learner are passed on to the next]
[Table: columns f1, f2, f3, ..., O/P, Sample Weight; n records, each with sample weight 1/n]
[Figure: STUMPS built on f1 (Y=4, N=1), f3, ..., fn]
Step 1: [Table: columns f1, f2, f3, ..., O/P, Sample Weight; n = 7 records, each with sample weight 1/7]
[Figure: STUMPS, with the stump on f1 giving Y=4, N=1]
Step 2: Total Error (TE) = 1/7
Step 3: Performance of Stump = 0.5 * log_e((1 - TE) / TE)
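The arithmetic for the total error and the performance of the stump can be checked with a short Python sketch. It assumes, as the slide's numbers suggest, that exactly one of the 7 equally weighted records is misclassified; which record that is (the fourth one here) is a made-up choice, and the function names are illustrative.

```python
import math

def total_error(weights, misclassified):
    """Total error: sum of the sample weights of the misclassified records."""
    return sum(w for w, wrong in zip(weights, misclassified) if wrong)

def stump_performance(te):
    """Performance of the stump: 0.5 * ln((1 - TE) / TE), valid for 0 < TE < 1."""
    return 0.5 * math.log((1.0 - te) / te)

n = 7
weights = [1.0 / n] * n  # every record starts with weight 1/7
# Hypothetical: only the fourth record is classified incorrectly.
misclassified = [False, False, False, True, False, False, False]

te = total_error(weights, misclassified)
alpha = stump_performance(te)
print(te, alpha)
```

With TE = 1/7 the ratio (1 - TE) / TE is 6, so the performance is 0.5 * ln(6), roughly 0.896. A stump with a smaller total error gets a larger performance value.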
Step 4: Updated Weight (value shown: 0.8)
Step 5: Normal Weight (value shown: 0.6)
Steps 6 and 7: Buckets, formed from the Normal Weight column
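The slide names the updated weight, the normal (normalized) weight, and the buckets but does not give their formulas. The sketch below assumes the usual AdaBoost rules: multiply a misclassified record's weight by e^(performance) and a correctly classified record's weight by e^(-performance), divide by the sum so the weights add up to 1, then use the running totals as bucket edges for drawing the next training set. The function names and the misclassified record are hypothetical, and the sketch does not try to reproduce the 0.8 and 0.6 values shown on the slide.

```python
import math
import random
from bisect import bisect_right

def update_weights(weights, misclassified, alpha):
    """Raise the weight of misclassified records and lower the rest (assumed AdaBoost rule)."""
    return [w * math.exp(alpha if wrong else -alpha)
            for w, wrong in zip(weights, misclassified)]

def normalize(weights):
    """Rescale the weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

def make_buckets(weights):
    """Upper edge of each record's bucket: the running total of the normalized weights."""
    edges, running = [], 0.0
    for w in weights:
        running += w
        edges.append(running)
    return edges

def resample(rows, edges, k, rng):
    """Draw k rows; a record with a wider bucket is picked more often."""
    picked = []
    for _ in range(k):
        r = rng.random() * edges[-1]
        idx = min(bisect_right(edges, r), len(rows) - 1)
        picked.append(rows[idx])
    return picked

n = 7
weights = [1.0 / n] * n
misclassified = [False, False, False, True, False, False, False]  # hypothetical
alpha = 0.5 * math.log(6)  # performance of a stump whose total error is 1/7

updated = update_weights(weights, misclassified, alpha)
normal = normalize(updated)
edges = make_buckets(normal)
next_rows = resample(list(range(n)), edges, n, random.Random(0))
print(normal)
print(edges)
print(next_rows)
```

Under these assumptions the misclassified record ends up with a normalized weight of 0.5 and each of the other six with 1/12, so the misclassified record is far more likely to appear in the dataset used to train the next stump.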
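[Figure: a test dataset (columns f1, f2, f3, ..., O/P; n = 7) is passed through DT1, DT2, ..., DTn. The individual outputs shown are 1 and 0, and the final output is 1.]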
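The figure shows only the individual outputs and the final 1, not how they are combined. One common choice in AdaBoost, assumed here, is a vote in which each stump's prediction counts in proportion to its performance value. The predictions and performance values below are made up for illustration.

```python
def weighted_vote(predictions, performances):
    """Sum the performance of the stumps behind each label and return the heaviest label."""
    score = {}
    for label, alpha in zip(predictions, performances):
        score[label] = score.get(label, 0.0) + alpha
    return max(score, key=score.get)

# Hypothetical outputs of three stumps for one test record, with made-up performance values.
predictions = [1, 0, 1]
performances = [0.9, 0.4, 0.3]
print(weighted_vote(predictions, performances))

# One strong stump can outweigh two weaker ones that disagree with it.
print(weighted_vote([1, 0, 0], [0.9, 0.3, 0.3]))
```

This differs from the plain majority vote used in bagging, where every tree counts equally.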
