Auto-CES
an Automatic pruning method through
Clustering Ensemble Selection
Authors:
• Mojtaba Amiri Maskouni
• Saeid Hosseini
• Hadi Mohammadzadeh Abachi
• Mohammadreza Kangavari
• Xiaofang Zhou
Outline
• Background
– Ensemble Classification
– Ensemble Diversity
– Random Forests
• Clustering and Ensemble Diversity
– CLUB-DRF
– Experimental Study
• Summary and Future Work
Ensemble Learning:
Learning algorithms that construct a set of trained classifiers whose individual
decisions are combined to classify new examples.
Bagging, boosting, random subspace, and random forests are among the major
approaches to building ensembles of classifiers.
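One common way to combine individual decisions is plurality (majority) voting. The slide does not say which combination rule is meant, so the sketch below is only an illustration under that assumption; the function name `majority_vote` is hypothetical.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by the largest number of classifiers."""
    counts = Counter(predictions)
    # most_common(1) returns [(label, count)]; on a tie, the label
    # that was encountered first wins.
    return counts.most_common(1)[0][0]

# Three hypothetical classifiers vote on one example.
votes = ["cat", "dog", "cat"]
print(majority_vote(votes))
```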
Diversity in Ensemble
• Definition:
– Has no generally accepted definition
– The capability to maximize prediction correctness for a set of classifiers that are grouped
into a single ensemble
– Cannot always guarantee an accurate estimation outcome
– Maximizes stability
• Increasing the diversity of an ensemble:
– Improves efficiency: higher diversity → elimination of similar classifiers
– Promotes generalization performance
• Diversification methods:
– Bootstrap (bagging)
– Random feature selection (random subspace)
Random Forests
• An ensemble classification and regression technique introduced by Leo Breiman.
• It generates a diversified ensemble of decision trees using two methods:
– A bootstrap sample is used to construct each tree (bagging), so each sample contains
approximately 63.2% unique instances, and the rest are repeats
– At each node split, only a randomly drawn subset of features is evaluated
(√F or log2 F is used, where F is the total number of features)
• Adding excessive classifiers to the forest does not improve accuracy.
– Main challenge: finding the optimal number of classifiers
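The 63.2% figure for bagging follows from sampling N instances with replacement: a given instance is missed with probability (1 − 1/N)^N, which approaches 1/e as N grows, so the expected unique fraction approaches 1 − 1/e ≈ 0.632. A minimal sketch that checks this numerically (the function names are illustrative, not from the slides):

```python
import random

def expected_unique_fraction(n):
    """Expected fraction of distinct instances in a bootstrap sample of size n."""
    return 1.0 - (1.0 - 1.0 / n) ** n

def simulated_unique_fraction(n, seed=0):
    """Draw one bootstrap sample of size n and measure its distinct fraction."""
    rng = random.Random(seed)
    sample = [rng.randrange(n) for _ in range(n)]
    return len(set(sample)) / n

print(expected_unique_fraction(1000))    # analytic value for N = 1000
print(simulated_unique_fraction(10000))  # one simulated bootstrap sample
```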
Cluster Ensemble Selection (CES)
• Definition:
– A joint process that produces a small ensemble (pruning the rest) that can classify
as effectively as, or even better than, the original ensemble.
– A smaller set can also run more efficiently than the complete ensemble.
• These methods have two parts:
– Group homogeneous classifiers into clusters.
– Select a subset of clusters to maximize diversity among the chosen classifiers.
Random Forest pruning algorithms based on CES
Method (Reference): Description
ERF (Bharathidason, 2014):
1. Sort all trees in descending order of AUC
2. Select the top P trees with the highest AUC values
3. Cluster these P selected trees into Q clusters
4. Select the tree with the highest AUC from each cluster
CLUB-DRF (Fawagreh, 2015):
1. Trees are clustered (K-Modes) according to their classification pattern
2. One or more representatives are chosen from each cluster, either at random or by highest AUC
Main challenge: both methods require parameters to be set manually
Auto-CES: an Automatic pruning method through
Clustering Ensemble Selection
Auto-CES has the following two stages:
• Clustering: cluster the homogeneous trees based on predefined similarities
• Selection: select the best tree from each cluster based on the cohesiveness measure
• Novelties:
– Grouping trees automatically
– Defining the cohesiveness measure used to select the trees
Clustering step
• Find epsilon:
$$DF_{t_i,t_k} = \frac{N^{00}}{N}$$
$$DIS_{t_i,t_k} = 1 - DF_{t_i,t_k}$$
$$\varepsilon = \frac{2}{n(n-1)} \sum_{i=1}^{n-1} \sum_{k=i+1}^{n} DIS_{t_i,t_k}$$
Notation: Description
$N^{00}$: the number of jointly misclassified samples of the tree pair $(t_i, t_k)$
$N$: the total number of validation samples
$DF$: the double-fault criterion
$DIS$: the dissimilarity value
$\varepsilon$: the epsilon threshold
$n$: the total number of classifiers
$t_i$: the $i$-th tree
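The double-fault, dissimilarity, and epsilon definitions on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the boolean layout `errors[t][s]` (True when tree t misclassifies validation sample s) and all function names are assumptions, and how ε is then used by the clustering algorithm is not shown on the slide.

```python
from itertools import combinations

def double_fault(errors_i, errors_k):
    """DF: fraction of validation samples misclassified by both trees."""
    n00 = sum(1 for a, b in zip(errors_i, errors_k) if a and b)
    return n00 / len(errors_i)

def dissimilarity_matrix(errors):
    """DIS[i][k] = 1 - DF(t_i, t_k) for every tree pair; the diagonal stays 0."""
    n = len(errors)
    dis = [[0.0] * n for _ in range(n)]
    for i, k in combinations(range(n), 2):
        d = 1.0 - double_fault(errors[i], errors[k])
        dis[i][k] = dis[k][i] = d
    return dis

def epsilon(dis):
    """Mean dissimilarity over all n(n-1)/2 unordered tree pairs."""
    n = len(dis)
    total = sum(dis[i][k] for i, k in combinations(range(n), 2))
    return 2.0 * total / (n * (n - 1))

# Hypothetical error indicators for 3 trees on 4 validation samples.
errors = [
    [True, False, False, True],
    [True, True, False, False],
    [False, False, False, True],
]
dis = dissimilarity_matrix(errors)
eps = epsilon(dis)
print(dis[0][1], eps)
```

In this toy input, trees 0 and 1 share one error out of four samples, so their double fault is 0.25 and their dissimilarity is 0.75.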
Selection step
Individual effect ($\theta_{t_i}$): the average accuracy of the $i$-th tree over training and validation
General effect ($DIV_{t_i}$): the average of all dissimilarity values for the $i$-th tree
Cohesiveness measure ($\zeta_{t_i}$): the selection measure that considers both accuracy and
diversity
$$\zeta_{t_i} = \theta_{t_i} \times DIV_{t_i}$$
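A rough sketch of the selection step follows. It assumes that the general effect averages a tree's dissimilarity to every other tree, and that the "best" tree in a cluster is the one with the highest cohesiveness; neither detail is spelled out on the slide. The accuracies, dissimilarity matrix, cluster assignment, and function names are all made up for illustration.

```python
def individual_effect(train_acc, val_acc):
    """theta: mean of a tree's training and validation accuracy."""
    return (train_acc + val_acc) / 2.0

def general_effect(dis, i):
    """DIV: mean dissimilarity between tree i and every other tree (assumed)."""
    others = [dis[i][k] for k in range(len(dis)) if k != i]
    return sum(others) / len(others)

def cohesiveness(theta, div):
    """zeta = theta * DIV, combining accuracy and diversity."""
    return theta * div

def select_representatives(clusters, zeta):
    """Pick the tree with the highest zeta from each cluster (assumed rule)."""
    return [max(cluster, key=lambda t: zeta[t]) for cluster in clusters]

# Hypothetical inputs for 3 trees.
dis = [[0.0, 0.75, 0.75], [0.75, 0.0, 1.0], [0.75, 1.0, 0.0]]
train_acc = [0.9, 0.8, 0.95]
val_acc = [0.7, 0.8, 0.85]

zeta = [
    cohesiveness(individual_effect(train_acc[i], val_acc[i]), general_effect(dis, i))
    for i in range(len(dis))
]
clusters = [[0, 1], [2]]  # hypothetical cluster assignment
selected = select_representatives(clusters, zeta)
print(zeta, selected)
```

Here trees 0 and 1 have the same average accuracy, but tree 1 is more dissimilar to the rest of the forest, so it represents the first cluster.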
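Experimental Setup
• Pipeline stages: Generate RF, Prune RF, Test the pruned RF
• Data splits: Train, Validation, Test
• Programming language: [image not recovered]
• Evaluation measures:
$$F_1\ \text{score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
$$\text{Accuracy} = \frac{\text{Correct predictions}}{\text{All predictions}}$$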
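The two evaluation measures can be computed directly from label lists. The sketch below assumes binary labels with 1 as the positive class; the slides do not say how F1 is averaged on multi-class data sets, and the function names and data are illustrative.

```python
def accuracy(y_true, y_pred):
    """Correct predictions divided by all predictions."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical labels and predictions.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
print(accuracy(y_true, y_pred), f1_score(y_true, y_pred))
```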
Competing methods
1. Breiman's CART-based RF (BC-RF)
2. CLUB-DRF, which applies the K-Modes clustering model
3. Auto-CES:
– Based on the Accuracy and Diversity (B-A-D)
– Based on the Accuracy alone (B-A)
– Based on the Diversity alone (B-D)
Description of the Data Sets
Performance comparison by accuracy and test time
[Figure panels: impact of the number of trees on accuracy; impact of the number of trees on test time]
• Our approaches always match or exceed the accuracy of the competing methods.
• Except for the results on Herman's dataset, Auto-CES achieves the best efficiency.
Performance comparison by F-measure
[Figure panels: without noise; 10 percent noise; 20 percent noise]
• At least one of our pruned models gives the same or even better effectiveness
in terms of F-measure.
• As a result, the selected trees form ensembles that achieve higher stability.
Impact of noise on the Wilt dataset
[Figure panels: without noise; 10 percent noise; 20 percent noise]
Results based on accuracy over the Wilt dataset:
• The trees chosen by our model achieve greater stability and robustness.
Discussion:
Two reasons support the good results of our method:
1. An essential component in calculating the cohesiveness of each tree is the
average accuracy 𝜃, which is computed over both training and validation.
Hence, the selected trees are the most stable among all trees.
2. The effects of 𝜃 and 𝐷𝐼𝑉 are employed simultaneously to compute the
cohesiveness metric (𝜁). As a result, the selected trees form ensembles
that achieve higher robustness.
Future work:
• Extend our algorithm to large-scale environments, including multi-cluster
Spark platforms.
