Background
Clustering and Ensemble Diversity
Outlier Scoring and Ensemble Diversity
Summary and Future Work
Unsupervised Learning Techniques for Diversifying
and Pruning Random Forests
Dr Mohamed Medhat Gaber
School of Computing Science and Digital Media
Robert Gordon University
27 January 2015
Dr Mohamed Medhat Gaber Diversifying and Pruning Random Forest
Acknowledgement
Work done in collaboration with PhD student Khaled Fawagreh
and co-supervisor Dr Eyad Elyan
1 Background
Data Classification
Ensemble Classification
Ensemble Diversity
Random Forests
2 Clustering and Ensemble Diversity
CLUB-DRF
Experimental Study
3 Outlier Scoring and Ensemble Diversity
LOFB-DRF
Experimental Study
4 Summary and Future Work
Summary
Future Work
What is Data Classification?
Data classification is the process of assigning a class
(labelling) to a data instance, based on the values of a set of
predictive attributes (features).
The process has two stages:
1 Model construction: potentially a large number of “labelled”
instances are fed to a classification technique to build a model
(classifier).
2 Model usage: once the model is constructed, it can be
deployed and used to classify “unlabelled” instances.
A large number of techniques have been proposed to address
the data classification process (e.g., decision trees, artificial
neural networks, and support vector machines).
Predictive accuracy has been the major concern when
designing a new classification technique, followed by the time
needed for model construction and usage.
Decision Tree Classification Techniques
Almost all decision trees are constructed using a similar
procedure
Attributes (features) are represented in internal nodes, with
their values given on the links used for tree traversal (a
variation of this exists for binary decision trees)
Leaf nodes hold class labels
Decision trees mainly vary in the goodness measure used to
find the best attribute to split on (e.g., information gain, gain
ratio, Gini index, and Chi-square)
The first attribute, called the root, is the best attribute
(according to some goodness measure) to split on
An iterative process to build subtrees then follows, finding the
best attribute (attribute = value) to split on at each iteration
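The goodness measures above can be made concrete with a small sketch. This is an illustrative example, not the slides' code: the `entropy` and `information_gain` helpers are hypothetical names, and the toy data is invented.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a collection of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr_index):
    """Entropy reduction achieved by splitting on the given attribute."""
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr_index], []).append(label)
    weighted = sum(len(p) / len(labels) * entropy(p) for p in partitions.values())
    return entropy(labels) - weighted

# Toy data: attribute 0 perfectly predicts the class; attribute 1 is noise.
rows = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
labels = ["yes", "yes", "no", "no"]
best = max(range(2), key=lambda i: information_gain(rows, labels, i))  # → 0
```

Picking the attribute with the highest gain at each node, then recursing on the resulting partitions, is exactly the iterative procedure described above.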
Ensemble Classification
Combining a number of classifiers to vote towards the winning
class has been thoroughly investigated by the machine learning
and data mining communities.
Bagging, boosting and stacking are among the major
approaches to building ensembles of classifiers.
Bagging uses bootstrap sampling to generate diverse samples
of the dataset.
Boosting builds classifiers in a sequence, encouraging later
classifiers to become expert at classifying instances
misclassified by previous classifiers in the sequence.
Stacking uses a hierarchy of classifiers that generates a new
dataset on which a single classifier is built.
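As a rough illustration of bagging only (a sketch, not any particular implementation), the following hand-rolled ensemble trains each tree on a bootstrap sample and combines them by majority vote; scikit-learn is assumed to be available, and the two-blob dataset is invented.

```python
import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, n_estimators=25, seed=0):
    """Train each tree on a bootstrap sample (sampling with replacement)."""
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_estimators):
        idx = rng.integers(0, len(X), size=len(X))  # bootstrap indices
        trees.append(DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx]))
    return trees

def bagging_predict(trees, X):
    """Majority vote across the ensemble's members."""
    votes = np.array([t.predict(X) for t in trees])  # (n_trees, n_samples)
    return np.array([Counter(votes[:, j]).most_common(1)[0][0]
                     for j in range(X.shape[0])])

# Toy problem: two well-separated Gaussian blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
preds = bagging_predict(bagging_fit(X, y), X)
```

Each tree sees a different bootstrap sample, which is precisely the input-manipulation route to diversity discussed on the next slide.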
Diversity and Predictive Accuracy
Diversity among members of the ensemble is key to predictive
accuracy
There are many ways of measuring such diversity; it is not a
straightforward process
Regardless of the measure used, diversity has been the target
of a number of ‘diversity creation’ methods
Bagging and boosting enforce diversity by input manipulation
Stacking typically imposes diversity by using a number of
different classifiers
Error-correcting output codes manipulate the output to create
diversity
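One of the many possible diversity measures alluded to above is average pairwise disagreement; the sketch below is illustrative only, and the `pairwise_disagreement` helper is a hypothetical name.

```python
import numpy as np

def pairwise_disagreement(predictions):
    """Mean fraction of samples on which each pair of classifiers disagree.
    predictions: (n_classifiers, n_samples) array of predicted labels."""
    n = predictions.shape[0]
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            total += float(np.mean(predictions[i] != predictions[j]))
            pairs += 1
    return total / pairs

identical = np.array([[0, 1, 0, 1]] * 3)                      # zero diversity
mixed = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 0, 0, 0]])  # highly diverse
```

`pairwise_disagreement(identical)` is 0.0, while `mixed` scores 2/3, showing why such a number is only one lens on diversity rather than a definitive measure.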
Random Forests: An Overview
An ensemble classification and regression technique
introduced by Leo Breiman
It generates a diversified ensemble of decision trees by
adopting two methods:
A bootstrap sample is used for the construction of each tree
(bagging), resulting in approximately 63.2% unique instances,
with the rest repeated
At each node split, only a subset of features is drawn randomly
to assess the goodness of each feature/attribute (√F or
log₂ F is used, where F is the total number of features)
Trees are allowed to grow without pruning
Typically 100 to 500 trees are used to form the ensemble
It is now considered among the best-performing classifiers
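The 63.2% figure quoted above can be checked empirically: drawing n indices with replacement leaves about 1 − 1/e ≈ 63.2% of the instances unique. A minimal sketch:

```python
import numpy as np

n = 100_000
rng = np.random.default_rng(42)
sample = rng.integers(0, n, size=n)           # bootstrap: n draws with replacement
unique_fraction = len(np.unique(sample)) / n  # ≈ 1 - 1/e ≈ 0.632
```

The probability an instance is never drawn is (1 − 1/n)ⁿ → 1/e, so the unique fraction converges to 1 − 1/e.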
Random Forest Tops State-of-the-art Classifiers
179 classifiers
121 datasets (the whole UCI repository at the time of the
experiment)
Random Forest ranked first, followed by SVM with a Gaussian
kernel
Reference
Fernández-Delgado, M., Cernadas, E., Barro, S., & Amorim, D.
(2014). Do we need hundreds of classifiers to solve real world
classification problems? Journal of Machine Learning Research,
15(1), 3133–3181.
Improving Random Forests
Source: Fawagreh, K., Gaber, M. M., & Elyan, E. (2014). Random forests: from early
developments to recent advancements. Systems Science & Control Engineering: An
Open Access Journal, 2(1), pp. 602-609.
How is Diversity Related to Clustering?
The aim of any clustering algorithm is to produce cohesive
clusters that are well separated
A good clustering model diversifies among members of
different clusters
Inspired by this observation, we hypothesised that if trees in
the Random Forest are clustered, we can use a small subset
(typically one tree) from each cluster to produce a diversified
Random Forest
The benefits are twofold:
Increased diversification
A smaller ensemble, leading to faster classification of
unlabelled instances
CLUB-DRF
We termed the method CLUster
Based Diversified Random Forests
(CLUB-DRF)
Three stages are followed:
A Random Forest is induced
using the traditional method
Trees are clustered according to
their classification pattern
One or more representatives are chosen from each cluster to
form the pruned Random Forest
[Figure: the CLUB-DRF pipeline — the training set feeds the Random Forest algorithm to induce the parent RF (trees t1 … tn); a clustering algorithm groups the trees by their classification patterns C(t1, T) … C(tn, T) into clusters 1 … k; representative selection then yields the pruned CLUB-DRF ensemble (t1 … tk), evaluated on the testing set.]
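The three stages above can be sketched under simplifying assumptions: here scikit-learn's k-means stands in for the k-modes algorithm the method actually uses, representatives are chosen by training accuracy rather than OOB performance, and the two-blob dataset is invented.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

# Toy data: two overlapping blobs, so bootstrap-trained trees genuinely differ.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (60, 2)), rng.normal(2, 1, (60, 2))])
y = np.array([0] * 60 + [1] * 60)

# Stage 1: induce a Random Forest the traditional way.
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Stage 2: cluster trees by their classification patterns on the training set.
patterns = np.array([tree.predict(X) for tree in rf.estimators_])
clusters = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(patterns)

# Stage 3: keep one representative per cluster (here: best training accuracy;
# the actual method selects by out-of-bag performance).
pruned = []
for c in np.unique(clusters):
    members = np.where(clusters == c)[0]
    best = members[np.argmax([(patterns[m] == y).mean() for m in members])]
    pruned.append(rf.estimators_[best])

# Majority vote of the pruned (diversified) forest.
votes = np.array([t.predict(X) for t in pruned])
pruned_preds = (votes.mean(axis=0) > 0.5).astype(int)
```

The pruned ensemble has at most one tree per cluster, so classification of unlabelled instances is correspondingly faster.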
CLUB-DRF Settings
A number of settings are needed as follows:
The clustering algorithm used
The number of clusters of trees
The number of trees representing each cluster
The criteria for choosing the representatives
Random
Best performing
Experimental Setup
We tested the technique over 15 datasets from the UCI
repository
We generated 500 trees for the main Random Forest
We used k-modes to cluster the trees
We used the following values for k: 5, 10, 15, 20, 25, 30, 35,
and 40
We used one representative tree per cluster based on the Out
Of Bag (OOB) performance
The repeated hold-out method was used to estimate performance
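The repeated hold-out estimate can be sketched as follows; this is illustrative only, and the dataset, split ratio, and repeat count are assumptions, not the experiment's settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def repeated_holdout(X, y, repeats=10, test_size=0.3):
    """Average test accuracy over several random train/test splits."""
    scores = []
    for r in range(repeats):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, random_state=r, stratify=y)
        model = RandomForestClassifier(n_estimators=50, random_state=r)
        scores.append(model.fit(X_tr, y_tr).score(X_te, y_te))
    return float(np.mean(scores))

# Toy data: two well-separated blobs, so the estimate should be near 1.0.
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 1, (60, 2)), rng.normal(5, 1, (60, 2))])
y = np.array([0] * 60 + [1] * 60)
acc = repeated_holdout(X, y)
```

Averaging over repeated random splits reduces the variance of a single hold-out estimate.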
Summarised Results
[Figure: grouped bar chart comparing CLUB-DRF and RF — Number of Datasets (0–9) against Size (Number of Trees: 10, 20, 30, 40).]
Pruning Results
Sample of Detailed Results
How is Diversity Related to Outlier Detection?
Outliers are out-of-the-norm instances that are thought to be
generated by a different mechanism
By analogy, trees that are significantly different (diverse) from
the other trees in the Random Forest can be seen as outliers
Local Outlier Factor (LOF) assigns a real number to each
instance to represent its peculiarity
Inspired by this analogy, we hypothesised that a diverse
ensemble of trees can be formed using an outlier scoring
method
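The analogy can be sketched by scoring each tree's prediction vector with scikit-learn's `LocalOutlierFactor` (an assumption for illustration; the slides do not specify an implementation). Note that scikit-learn stores the negative LOF.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import LocalOutlierFactor

# Toy data: two overlapping Gaussian blobs, so bootstrap-trained trees disagree.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (60, 2)), rng.normal(2, 1, (60, 2))])
y = np.array([0] * 60 + [1] * 60)

rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Each tree's "position" is its vector of predictions on the training set.
patterns = np.array([tree.predict(X) for tree in rf.estimators_])

lof = LocalOutlierFactor(n_neighbors=10).fit(patterns)
lof_scores = -lof.negative_outlier_factor_  # larger = more outlying (diverse) tree
```

Trees with larger scores behave least like their neighbours in prediction space, which is exactly the "diverse tree as outlier" reading above.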
LOFB-DRF
We termed the method
Local Outlier Factor Based
Diversified Random Forests
(LOFB-DRF)
It follows similar steps to
CLUB-DRF
Each tree is assigned an LOF value
Trees are then chosen
according to two criteria
Predictive accuracy
LOF value
LOFB-DRF Settings
A number of settings are needed, as follows:
The LOF setting of the number of nearest neighbours
Options for combining LOF with predictive accuracy:
Using LOF only, ruling out predictive accuracy
Using a combination strategy
Experimental Setup
We tested the technique on 10 datasets from the UCI repository
We generated 500 trees for the main Random Forest
We used LOF with 40 nearest neighbours
We used rank = normal(LOF) × accuracy for each tree, where normal(LOF), accuracy ∈ [0, 1]
Trees with the highest ranks are chosen as representatives
We used the following numbers of representative trees: 5, 10, 15, 20, 25, 30, 35, and 40
The repeated hold-out method was used to estimate performance
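The rank formula in the setup above can be sketched as follows. Min–max scaling is used here for normal(LOF) — an assumption, since the setup only requires normal(LOF) ∈ [0, 1]; the scores below are hypothetical:

```python
import numpy as np

def rank_trees(lof_scores, accuracies, n_keep):
    """Rank trees by rank = normal(LOF) x accuracy and keep the top n_keep."""
    lof = np.asarray(lof_scores, dtype=float)
    acc = np.asarray(accuracies, dtype=float)
    # Min-max normalisation maps the LOF scores into [0, 1].
    normal_lof = (lof - lof.min()) / (lof.max() - lof.min())
    rank = normal_lof * acc
    # Indices of the n_keep highest-ranked trees (the representatives).
    return np.argsort(rank)[::-1][:n_keep]

# Hypothetical LOF scores and accuracies for 6 trees:
keep = rank_trees([1.0, 1.2, 3.0, 1.1, 2.5, 1.0],
                  [0.90, 0.85, 0.80, 0.92, 0.88, 0.70], n_keep=2)
# Trees 2 and 4 combine high LOF (diversity) with good accuracy.
```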
Summarised Results
[Figure: number of datasets won (y-axis, 0–6) against ensemble size in trees (x-axis: 10, 20, 30, 40), comparing LOFB-DRF with RF]
Pruning Results
Sample of Detailed Results
Summary
Random Forest has proved its superiority over the last few years
Two methods were presented in this talk, aiming at diversifying and pruning Random Forests
Results showed the potential of these two methods to further enhance the predictive accuracy of the method
The high level of pruning makes these techniques candidates for real-time applications, as the number of trees to be traversed is significantly reduced
Future Work
In CLUB-DRF:
Exploring other methods for choosing tree representatives from each cluster (e.g., varying the number of representatives per cluster)
Using other clustering techniques
In LOFB-DRF:
Exploring other options for combining the LOF value and predictive accuracy
Using LOF and predictive accuracy for the choice of tree representatives in each cluster
Applying both methods to other ensemble classification techniques
Q & A
Thanks for listening!
Contact Details
Dr Mohamed Medhat Gaber
E-mail: m.gaber1@rgu.ac.uk
Webpage: http://mohamedmgaber.weebly.com/
LinkedIn: https://www.linkedin.com/profile/view?id=21808352
Twitter: https://twitter.com/mmmgaber
ResearchGate: https://www.researchgate.net/profile/Mohamed_Gaber16?ev=prf_highl
Dr Mohamed Medhat Gaber Diversifying and Pruning Random Forest

More Related Content

What's hot

Random forest sgv_ai_talk_oct_2_2018
Random forest sgv_ai_talk_oct_2_2018Random forest sgv_ai_talk_oct_2_2018
Random forest sgv_ai_talk_oct_2_2018
digitalzombie
 
Decision tree
Decision treeDecision tree
Decision tree
ShraddhaPandey45
 
Understanding random forests
Understanding random forestsUnderstanding random forests
Understanding random forests
Marc Garcia
 
From decision trees to random forests
From decision trees to random forestsFrom decision trees to random forests
From decision trees to random forests
Viet-Trung TRAN
 
Introduction to Some Tree based Learning Method
Introduction to Some Tree based Learning MethodIntroduction to Some Tree based Learning Method
Introduction to Some Tree based Learning Method
Honglin Yu
 
Decision Tree - C4.5&CART
Decision Tree - C4.5&CARTDecision Tree - C4.5&CART
Decision Tree - C4.5&CART
Xueping Peng
 
Decision tree
Decision treeDecision tree
Decision tree
Ami_Surati
 
Machine Learning Feature Selection - Random Forest
Machine Learning Feature Selection - Random Forest Machine Learning Feature Selection - Random Forest
Machine Learning Feature Selection - Random Forest
Rupak Roy
 
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain RatioLecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Marina Santini
 
Random forest algorithm
Random forest algorithmRandom forest algorithm
Random forest algorithm
Rashid Ansari
 
Decision tree
Decision treeDecision tree
Decision tree
Karan Deopura
 
Understanding the Machine Learning Algorithms
Understanding the Machine Learning AlgorithmsUnderstanding the Machine Learning Algorithms
Understanding the Machine Learning Algorithms
Rupak Roy
 
2.2 decision tree
2.2 decision tree2.2 decision tree
2.2 decision tree
Krish_ver2
 
Decision Tree Learning
Decision Tree LearningDecision Tree Learning
Decision Tree Learning
Md. Ariful Hoque
 
Machine Learning - Decision Trees
Machine Learning - Decision TreesMachine Learning - Decision Trees
Machine Learning - Decision Trees
Rupak Roy
 
Introduction to Random Forest
Introduction to Random Forest Introduction to Random Forest
Introduction to Random Forest
Rupak Roy
 
Decision Tree Algorithm | Decision Tree in Python | Machine Learning Algorith...
Decision Tree Algorithm | Decision Tree in Python | Machine Learning Algorith...Decision Tree Algorithm | Decision Tree in Python | Machine Learning Algorith...
Decision Tree Algorithm | Decision Tree in Python | Machine Learning Algorith...
Edureka!
 
Decision tree
Decision treeDecision tree
Machine Learning with Decision trees
Machine Learning with Decision treesMachine Learning with Decision trees
Machine Learning with Decision trees
Knoldus Inc.
 
Data mining technique (decision tree)
Data mining technique (decision tree)Data mining technique (decision tree)
Data mining technique (decision tree)
Shweta Ghate
 

What's hot (20)

Random forest sgv_ai_talk_oct_2_2018
Random forest sgv_ai_talk_oct_2_2018Random forest sgv_ai_talk_oct_2_2018
Random forest sgv_ai_talk_oct_2_2018
 
Decision tree
Decision treeDecision tree
Decision tree
 
Understanding random forests
Understanding random forestsUnderstanding random forests
Understanding random forests
 
From decision trees to random forests
From decision trees to random forestsFrom decision trees to random forests
From decision trees to random forests
 
Introduction to Some Tree based Learning Method
Introduction to Some Tree based Learning MethodIntroduction to Some Tree based Learning Method
Introduction to Some Tree based Learning Method
 
Decision Tree - C4.5&CART
Decision Tree - C4.5&CARTDecision Tree - C4.5&CART
Decision Tree - C4.5&CART
 
Decision tree
Decision treeDecision tree
Decision tree
 
Machine Learning Feature Selection - Random Forest
Machine Learning Feature Selection - Random Forest Machine Learning Feature Selection - Random Forest
Machine Learning Feature Selection - Random Forest
 
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain RatioLecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
 
Random forest algorithm
Random forest algorithmRandom forest algorithm
Random forest algorithm
 
Decision tree
Decision treeDecision tree
Decision tree
 
Understanding the Machine Learning Algorithms
Understanding the Machine Learning AlgorithmsUnderstanding the Machine Learning Algorithms
Understanding the Machine Learning Algorithms
 
2.2 decision tree
2.2 decision tree2.2 decision tree
2.2 decision tree
 
Decision Tree Learning
Decision Tree LearningDecision Tree Learning
Decision Tree Learning
 
Machine Learning - Decision Trees
Machine Learning - Decision TreesMachine Learning - Decision Trees
Machine Learning - Decision Trees
 
Introduction to Random Forest
Introduction to Random Forest Introduction to Random Forest
Introduction to Random Forest
 
Decision Tree Algorithm | Decision Tree in Python | Machine Learning Algorith...
Decision Tree Algorithm | Decision Tree in Python | Machine Learning Algorith...Decision Tree Algorithm | Decision Tree in Python | Machine Learning Algorith...
Decision Tree Algorithm | Decision Tree in Python | Machine Learning Algorith...
 
Decision tree
Decision treeDecision tree
Decision tree
 
Machine Learning with Decision trees
Machine Learning with Decision treesMachine Learning with Decision trees
Machine Learning with Decision trees
 
Data mining technique (decision tree)
Data mining technique (decision tree)Data mining technique (decision tree)
Data mining technique (decision tree)
 

Viewers also liked

Random forest using apache mahout
Random forest using apache mahoutRandom forest using apache mahout
Random forest using apache mahout
Gaurav Kasliwal
 
Sdforum 11-04-2010
Sdforum 11-04-2010Sdforum 11-04-2010
Sdforum 11-04-2010
Ted Dunning
 
Orthogonal porjection in statistics
Orthogonal porjection in statisticsOrthogonal porjection in statistics
Orthogonal porjection in statistics
Sahidul Islam
 
Build Your Strategy and Projections with Azure Machine Learning (Sergey Popla...
Build Your Strategy and Projections with Azure Machine Learning (Sergey Popla...Build Your Strategy and Projections with Azure Machine Learning (Sergey Popla...
Build Your Strategy and Projections with Azure Machine Learning (Sergey Popla...
IT Arena
 
Pruning convolutional neural networks for resource efficient inference
Pruning convolutional neural networks for resource efficient inferencePruning convolutional neural networks for resource efficient inference
Pruning convolutional neural networks for resource efficient inference
Kaushalya Madhawa
 
AI&BigData Lab. Маргарита Остапчук "Алгоритмы в Azure Machine Learning и где ...
AI&BigData Lab. Маргарита Остапчук "Алгоритмы в Azure Machine Learning и где ...AI&BigData Lab. Маргарита Остапчук "Алгоритмы в Azure Machine Learning и где ...
AI&BigData Lab. Маргарита Остапчук "Алгоритмы в Azure Machine Learning и где ...
GeeksLab Odessa
 
Random Forest for Big Data
Random Forest for Big DataRandom Forest for Big Data
Random Forest for Big Data
tuxette
 
Accelerating Random Forests in Scikit-Learn
Accelerating Random Forests in Scikit-LearnAccelerating Random Forests in Scikit-Learn
Accelerating Random Forests in Scikit-Learn
Gilles Louppe
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
Girish Khanzode
 
Projection In Computer Graphics
Projection In Computer GraphicsProjection In Computer Graphics
Projection In Computer Graphics
Sanu Philip
 
Latent factor models for Collaborative Filtering
Latent factor models for Collaborative FilteringLatent factor models for Collaborative Filtering
Latent factor models for Collaborative Filtering
sscdotopen
 
Introduction to Matrix Factorization Methods Collaborative Filtering
Introduction to Matrix Factorization Methods Collaborative FilteringIntroduction to Matrix Factorization Methods Collaborative Filtering
Introduction to Matrix Factorization Methods Collaborative Filtering
DKALab
 
Simple Matrix Factorization for Recommendation in Mahout
Simple Matrix Factorization for Recommendation in MahoutSimple Matrix Factorization for Recommendation in Mahout
Simple Matrix Factorization for Recommendation in Mahout
Data Science London
 
Matrix Factorization In Recommender Systems
Matrix Factorization In Recommender SystemsMatrix Factorization In Recommender Systems
Matrix Factorization In Recommender Systems
YONG ZHENG
 
Matrix Factorization Techniques For Recommender Systems
Matrix Factorization Techniques For Recommender SystemsMatrix Factorization Techniques For Recommender Systems
Matrix Factorization Techniques For Recommender Systems
Lei Guo
 

Viewers also liked (15)

Random forest using apache mahout
Random forest using apache mahoutRandom forest using apache mahout
Random forest using apache mahout
 
Sdforum 11-04-2010
Sdforum 11-04-2010Sdforum 11-04-2010
Sdforum 11-04-2010
 
Orthogonal porjection in statistics
Orthogonal porjection in statisticsOrthogonal porjection in statistics
Orthogonal porjection in statistics
 
Build Your Strategy and Projections with Azure Machine Learning (Sergey Popla...
Build Your Strategy and Projections with Azure Machine Learning (Sergey Popla...Build Your Strategy and Projections with Azure Machine Learning (Sergey Popla...
Build Your Strategy and Projections with Azure Machine Learning (Sergey Popla...
 
Pruning convolutional neural networks for resource efficient inference
Pruning convolutional neural networks for resource efficient inferencePruning convolutional neural networks for resource efficient inference
Pruning convolutional neural networks for resource efficient inference
 
AI&BigData Lab. Маргарита Остапчук "Алгоритмы в Azure Machine Learning и где ...
AI&BigData Lab. Маргарита Остапчук "Алгоритмы в Azure Machine Learning и где ...AI&BigData Lab. Маргарита Остапчук "Алгоритмы в Azure Machine Learning и где ...
AI&BigData Lab. Маргарита Остапчук "Алгоритмы в Azure Machine Learning и где ...
 
Random Forest for Big Data
Random Forest for Big DataRandom Forest for Big Data
Random Forest for Big Data
 
Accelerating Random Forests in Scikit-Learn
Accelerating Random Forests in Scikit-LearnAccelerating Random Forests in Scikit-Learn
Accelerating Random Forests in Scikit-Learn
 
Machine Learning
Machine LearningMachine Learning
Machine Learning
 
Projection In Computer Graphics
Projection In Computer GraphicsProjection In Computer Graphics
Projection In Computer Graphics
 
Latent factor models for Collaborative Filtering
Latent factor models for Collaborative FilteringLatent factor models for Collaborative Filtering
Latent factor models for Collaborative Filtering
 
Introduction to Matrix Factorization Methods Collaborative Filtering
Introduction to Matrix Factorization Methods Collaborative FilteringIntroduction to Matrix Factorization Methods Collaborative Filtering
Introduction to Matrix Factorization Methods Collaborative Filtering
 
Simple Matrix Factorization for Recommendation in Mahout
Simple Matrix Factorization for Recommendation in MahoutSimple Matrix Factorization for Recommendation in Mahout
Simple Matrix Factorization for Recommendation in Mahout
 
Matrix Factorization In Recommender Systems
Matrix Factorization In Recommender SystemsMatrix Factorization In Recommender Systems
Matrix Factorization In Recommender Systems
 
Matrix Factorization Techniques For Recommender Systems
Matrix Factorization Techniques For Recommender SystemsMatrix Factorization Techniques For Recommender Systems
Matrix Factorization Techniques For Recommender Systems
 

Similar to Unsupervised Learning Techniques to Diversifying and Pruning Random Forest

Comprehensive Survey of Data Classification & Prediction Techniques
Comprehensive Survey of Data Classification & Prediction TechniquesComprehensive Survey of Data Classification & Prediction Techniques
Comprehensive Survey of Data Classification & Prediction Techniques
ijsrd.com
 
A Survey of Modern Data Classification Techniques
A Survey of Modern Data Classification TechniquesA Survey of Modern Data Classification Techniques
A Survey of Modern Data Classification Techniques
ijsrd.com
 
data mining.pptx
data mining.pptxdata mining.pptx
data mining.pptx
Kaviya452563
 
Random Forest
Random ForestRandom Forest
Random Forest
Abdullah al Mamun
 
Decision treeinductionmethodsandtheirapplicationtobigdatafinal 5
Decision treeinductionmethodsandtheirapplicationtobigdatafinal 5Decision treeinductionmethodsandtheirapplicationtobigdatafinal 5
Decision treeinductionmethodsandtheirapplicationtobigdatafinal 5
ssuser33da69
 
clustering_classification.ppt
clustering_classification.pptclustering_classification.ppt
clustering_classification.ppt
HODECE21
 
ensemble learning
ensemble learningensemble learning
ensemble learning
butest
 
Multiple Classifier Systems
Multiple Classifier SystemsMultiple Classifier Systems
Multiple Classifier Systems
Farzad Vasheghani Farahani
 
Mis End Term Exam Theory Concepts
Mis End Term Exam Theory ConceptsMis End Term Exam Theory Concepts
Mis End Term Exam Theory Concepts
Vidya sagar Sharma
 
Binary search query classifier
Binary search query classifierBinary search query classifier
Binary search query classifier
Esteban Ribero
 
Introduction to Data Mining
Introduction to Data MiningIntroduction to Data Mining
Introduction to Data Mining
Kai Koenig
 
Data Warehousing
Data WarehousingData Warehousing
Data Warehousing
adil raja
 
Distributed Digital Artifacts on the Semantic Web
Distributed Digital Artifacts on the Semantic WebDistributed Digital Artifacts on the Semantic Web
Distributed Digital Artifacts on the Semantic Web
Editor IJCATR
 
Regularized Weighted Ensemble of Deep Classifiers
Regularized Weighted Ensemble of Deep Classifiers Regularized Weighted Ensemble of Deep Classifiers
Regularized Weighted Ensemble of Deep Classifiers
ijcsa
 
Using Decision Trees to Analyze Online Learning Data
Using Decision Trees to Analyze Online Learning Data Using Decision Trees to Analyze Online Learning Data
Using Decision Trees to Analyze Online Learning Data
Shalin Hai-Jew
 
Talk
TalkTalk
Talk
sumit621
 
Data Science in Industry - Applying Machine Learning to Real-world Challenges
Data Science in Industry - Applying Machine Learning to Real-world ChallengesData Science in Industry - Applying Machine Learning to Real-world Challenges
Data Science in Industry - Applying Machine Learning to Real-world Challenges
Yuchen Zhao
 
G44083642
G44083642G44083642
G44083642
IJERA Editor
 
Wearable Computing - Part IV: Ensemble classifiers & Insight into ongoing res...
Wearable Computing - Part IV: Ensemble classifiers & Insight into ongoing res...Wearable Computing - Part IV: Ensemble classifiers & Insight into ongoing res...
Wearable Computing - Part IV: Ensemble classifiers & Insight into ongoing res...
Daniel Roggen
 
voice and speech recognition using machine learning
voice and speech recognition using machine learningvoice and speech recognition using machine learning
voice and speech recognition using machine learning
MohammedWahhab4
 

Similar to Unsupervised Learning Techniques to Diversifying and Pruning Random Forest (20)

Comprehensive Survey of Data Classification & Prediction Techniques
Comprehensive Survey of Data Classification & Prediction TechniquesComprehensive Survey of Data Classification & Prediction Techniques
Comprehensive Survey of Data Classification & Prediction Techniques
 
A Survey of Modern Data Classification Techniques
A Survey of Modern Data Classification TechniquesA Survey of Modern Data Classification Techniques
A Survey of Modern Data Classification Techniques
 
data mining.pptx
data mining.pptxdata mining.pptx
data mining.pptx
 
Random Forest
Random ForestRandom Forest
Random Forest
 
Decision treeinductionmethodsandtheirapplicationtobigdatafinal 5
Decision treeinductionmethodsandtheirapplicationtobigdatafinal 5Decision treeinductionmethodsandtheirapplicationtobigdatafinal 5
Decision treeinductionmethodsandtheirapplicationtobigdatafinal 5
 
clustering_classification.ppt
clustering_classification.pptclustering_classification.ppt
clustering_classification.ppt
 
ensemble learning
ensemble learningensemble learning
ensemble learning
 
Multiple Classifier Systems
Multiple Classifier SystemsMultiple Classifier Systems
Multiple Classifier Systems
 
Mis End Term Exam Theory Concepts
Mis End Term Exam Theory ConceptsMis End Term Exam Theory Concepts
Mis End Term Exam Theory Concepts
 
Binary search query classifier
Binary search query classifierBinary search query classifier
Binary search query classifier
 
Introduction to Data Mining
Introduction to Data MiningIntroduction to Data Mining
Introduction to Data Mining
 
Data Warehousing
Data WarehousingData Warehousing
Data Warehousing
 
Distributed Digital Artifacts on the Semantic Web
Distributed Digital Artifacts on the Semantic WebDistributed Digital Artifacts on the Semantic Web
Distributed Digital Artifacts on the Semantic Web
 
Regularized Weighted Ensemble of Deep Classifiers
Regularized Weighted Ensemble of Deep Classifiers Regularized Weighted Ensemble of Deep Classifiers
Regularized Weighted Ensemble of Deep Classifiers
 
Using Decision Trees to Analyze Online Learning Data
Using Decision Trees to Analyze Online Learning Data Using Decision Trees to Analyze Online Learning Data
Using Decision Trees to Analyze Online Learning Data
 
Talk
TalkTalk
Talk
 
Data Science in Industry - Applying Machine Learning to Real-world Challenges
Data Science in Industry - Applying Machine Learning to Real-world ChallengesData Science in Industry - Applying Machine Learning to Real-world Challenges
Data Science in Industry - Applying Machine Learning to Real-world Challenges
 
G44083642
G44083642G44083642
G44083642
 
Wearable Computing - Part IV: Ensemble classifiers & Insight into ongoing res...
Wearable Computing - Part IV: Ensemble classifiers & Insight into ongoing res...Wearable Computing - Part IV: Ensemble classifiers & Insight into ongoing res...
Wearable Computing - Part IV: Ensemble classifiers & Insight into ongoing res...
 
voice and speech recognition using machine learning
voice and speech recognition using machine learningvoice and speech recognition using machine learning
voice and speech recognition using machine learning
 

Recently uploaded

一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
bopyb
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
dwreak4tg
 
State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023
kuntobimo2016
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
manishkhaire30
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
soxrziqu
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
nuttdpt
 
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
一比一原版(Glasgow毕业证书)格拉斯哥大学毕业证如何办理
g4dpvqap0
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
sameer shah
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
apvysm8
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
javier ramirez
 
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdfUnleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Enterprise Wired
 
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
GetInData
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
jerlynmaetalle
 

Unsupervised Learning Techniques to Diversifying and Pruning Random Forest

  • 1. Background · Clustering and Ensemble Diversity · Outlier Scoring and Ensemble Diversity · Summary and Future Work
       Unsupervised Learning Techniques to Diversifying and Pruning Random Forest
       Dr Mohamed Medhat Gaber
       School of Computing Science and Digital Media, Robert Gordon University
       27 January 2015
  • 2. Acknowledgement
       Work done in collaboration with PhD student Khaled Fawagreh and co-supervisor Dr Eyad Elyan.
  • 3. Outline
       1 Background: Data Classification; Ensemble Classification; Ensemble Diversity; Random Forests
       2 Clustering and Ensemble Diversity: CLUB-DRF; Experimental Study
       3 Outlier Scoring and Ensemble Diversity: LOFB-DRF; Experimental Study
       4 Summary and Future Work
  • 4–7. What is Data Classification?
       Data classification is the process of assigning a class (labelling) to a data instance, based on the values of a set of predictive attributes (features).
       The process has two stages:
       1 Model construction: a potentially large number of “labelled” instances are fed to a classification technique to build a model (classifier).
       2 Model usage: once the model is constructed, it can be deployed and used to classify “unlabelled” instances.
       A large number of techniques have been proposed for data classification (e.g., decision trees, artificial neural networks, and support vector machines).
       Predictive accuracy has been the major concern when designing a new classification technique, followed by the time needed for model construction and usage.
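The two stages can be sketched in a few lines. The `MajorityClassifier` below is a deliberately trivial toy (not from the talk): `fit` is model construction from labelled instances, `predict` is model usage on unlabelled ones.

```python
from collections import Counter


class MajorityClassifier:
    """Toy classifier illustrating the two stages of data classification."""

    def fit(self, instances, labels):
        # Stage 1, model construction: learn from "labelled" instances.
        # (Here the entire "model" is just the most frequent class.)
        self.majority = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, instances):
        # Stage 2, model usage: classify "unlabelled" instances.
        return [self.majority for _ in instances]


model = MajorityClassifier().fit([[1], [2], [3]], ["yes", "yes", "no"])
predictions = model.predict([[4], [5]])   # ['yes', 'yes']
```

Real techniques differ only in what `fit` learns; the construction/usage split is the same.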
  • 8–13. Decision Tree Classification Techniques
       Almost all decision trees are constructed using a similar procedure.
       Attributes (features) are represented in internal nodes, with their values given on the links for tree traversal (a variation of this exists for binary decision trees).
       Leaf nodes are class labels.
       Decision trees mainly vary in the goodness measure used to find the best attribute to split on (e.g., information gain, gain ratio, Gini index, and Chi-square).
       The first attribute, called the root, is the best attribute (according to the goodness measure) to split on.
       An iterative process then builds subtrees, finding the best attribute (attribute = value) to split on at each iteration.
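Information gain, one of the goodness measures named above, is the reduction in label entropy achieved by a split. A minimal sketch on a hypothetical toy dataset (the data and function names are illustrative, not from the talk):

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(rows, labels, attr_index):
    """Entropy reduction from splitting on the attribute at attr_index."""
    total = len(labels)
    # Partition the labels by the attribute's value
    parts = {}
    for row, label in zip(rows, labels):
        parts.setdefault(row[attr_index], []).append(label)
    remainder = sum(len(p) / total * entropy(p) for p in parts.values())
    return entropy(labels) - remainder


# Toy data: [outlook, windy] -> play?
rows = [["sunny", "yes"], ["sunny", "no"], ["rain", "yes"], ["rain", "no"]]
labels = ["no", "no", "yes", "yes"]
information_gain(rows, labels, 0)   # outlook separates the classes perfectly: gain 1.0
information_gain(rows, labels, 1)   # windy tells us nothing: gain 0.0
```

The root is simply the attribute with the highest gain; the same computation recurses on each partition.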
  • 14–18. Ensemble Classification
       Combining a number of classifiers that vote towards the winning class has been thoroughly investigated by the machine learning and data mining communities.
       Bagging, boosting, and stacking are among the major approaches to building an ensemble of classifiers.
       Bagging uses bootstrap sampling to generate a diverse set of samples from the dataset.
       Boosting builds classifiers in a sequence, encouraging later classifiers to become expert at classifying the instances that earlier classifiers in the sequence misclassified.
       Stacking uses a hierarchy of classifiers that generates a new dataset on which a single classifier is built.
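The two mechanical ingredients of bagging, bootstrap sampling and majority voting, can be sketched as follows (a minimal illustration, not code from the talk):

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) instances with replacement: the 'bag' one member trains on."""
    return [rng.choice(data) for _ in data]


def majority_vote(predictions):
    """Combine one predicted class per ensemble member into the winning class."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(42)
bag = bootstrap_sample(list(range(10)), rng)       # some instances repeat, some are absent
winner = majority_vote(["a", "b", "a", "a", "c"])  # 'a' wins with 3 of 5 votes
```

Each ensemble member is trained on its own bag, so the members disagree on borderline cases; the vote averages those disagreements out.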
  • 19–24. Diversity and Predictive Accuracy
       Diversity among members of the ensemble is key to predictive accuracy.
       There are many ways to measure such diversity; it is not a straightforward process.
       Regardless of the measure used, diversity has been the target of a number of ‘diversity creation’ methods.
       Bagging and boosting enforce diversity by input manipulation.
       Stacking typically imposes diversity by using a number of different classifiers.
       Error-correcting output codes manipulate the output to create diversity.
  • 25–30. Random Forests: An Overview
       An ensemble classification and regression technique introduced by Leo Breiman.
       It generates a diversified ensemble of decision trees by adopting two methods:
       1 A bootstrap sample is used for the construction of each tree (bagging), resulting in approximately 63.2% unique instances, with the rest repeated.
       2 At each node split, only a subset of features is drawn randomly to assess the goodness of each feature/attribute (√F or log2 F is used, where F is the total number of features).
       Trees are allowed to grow without pruning.
       Typically 100 to 500 trees are used to form the ensemble.
       It is now considered among the best performing classifiers.
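Both randomisation methods are easy to verify empirically. A bootstrap sample of size n keeps on average 1 − (1 − 1/n)ⁿ ≈ 1 − 1/e ≈ 63.2% distinct instances, and a √F feature draw picks 4 candidates out of F = 16. A small simulation (function names are illustrative):

```python
import math
import random


def bootstrap_unique_fraction(n, trials, rng):
    """Average fraction of distinct instances in a bootstrap sample of size n."""
    total = 0.0
    for _ in range(trials):
        total += len({rng.randrange(n) for _ in range(n)}) / n
    return total / trials


def random_feature_subset(n_features, rng):
    """Draw roughly sqrt(F) candidate feature indices for one node split."""
    k = max(1, round(math.sqrt(n_features)))
    return rng.sample(range(n_features), k)


frac = bootstrap_unique_fraction(1000, 200, random.Random(1))
# frac comes out close to 1 - 1/e, i.e. roughly the 63.2% quoted above
subset = random_feature_subset(16, random.Random(2))   # 4 of the 16 feature indices
```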
  • 31–33. Random Forest Tops State-of-the-art Classifiers
       179 classifiers
       121 datasets (the whole UCI repository at the time of the experiment)
       Random Forest ranked first, followed by SVM with a Gaussian kernel.
       Reference: Fernandez-Delgado, M., Cernadas, E., Barro, S., & Amorim, D. (2014). Do we need hundreds of classifiers to solve real world classification problems? The Journal of Machine Learning Research, 15(1), 3133-3181.
  • 34. Improving Random Forests
       Source: Fawagreh, K., Gaber, M. M., & Elyan, E. (2014). Random forests: from early developments to recent advancements. Systems Science & Control Engineering: An Open Access Journal, 2(1), pp. 602-609.
  • 35–38. How is Diversity Related to Clustering?
       The aim of any clustering algorithm is to produce cohesive clusters that are well separated.
       A good clustering model diversifies among members of different clusters.
       Inspired by this observation, we hypothesised that if the trees in the Random Forest are clustered, we can use a small subset (typically one tree) from each cluster to produce a diversified Random Forest.
       The benefits are twofold:
       1 increased diversification;
       2 a smaller ensemble, leading to faster classification of unlabelled instances.
  • 39–40. CLUB-DRF
       We termed the method CLUster Based Diversified Random Forests (CLUB-DRF).
       Three stages are followed:
       1 A Random Forest is induced using the traditional method.
       2 Trees are clustered according to their classification pattern.
       3 One or more representatives are chosen from each cluster to form the pruned Random Forest.
       [Figure: the CLUB-DRF pipeline. The training set feeds the Random Forest algorithm, producing the parent RF (trees t1 ... tn); a clustering algorithm groups the trees into clusters 1 ... k; representative selection yields the pruned CLUB-DRF (trees t1 ... tk), which is evaluated on the testing set.]
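The three stages above can be sketched end to end. Here each tree is represented by its classification pattern, i.e. its vector of class predictions on a validation set, and a minimal k-modes with Hamming distance groups the patterns; the code and data are an illustrative sketch under these assumptions, not the authors' implementation (which selects representatives by performance rather than taking the first member, as done here for brevity):

```python
import random
from collections import Counter


def hamming(a, b):
    """Number of instances on which two trees' predictions disagree."""
    return sum(x != y for x, y in zip(a, b))


def kmodes(vectors, k, rng, iters=10):
    """Minimal k-modes: cluster categorical vectors around column-wise modes."""
    modes = rng.sample(vectors, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda i: hamming(v, modes[i]))
            clusters[nearest].append(v)
        # New mode of each non-empty cluster: most common value per column
        modes = [tuple(Counter(col).most_common(1)[0][0] for col in zip(*c))
                 if c else m for c, m in zip(clusters, modes)]
    return clusters


# Six "trees" as prediction vectors over 4 validation instances; two clear groups.
trees = [(0, 0, 0, 1), (0, 0, 1, 1), (0, 0, 0, 1),
         (1, 1, 0, 0), (1, 1, 1, 0), (1, 1, 0, 0)]
clusters = kmodes(trees, 2, random.Random(3))
reps = [c[0] for c in clusters if c]   # one representative tree per cluster
```

Trees within a cluster classify similarly, so keeping one per cluster discards redundancy while preserving the diversity between clusters.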
  • 41–44. CLUB-DRF Settings
       A number of settings are needed, as follows:
       the clustering algorithm used;
       the number of clusters of trees;
       the number of trees representing each cluster;
       the criteria for choosing the representatives (random, or best performing).
  • 45–50. Experimental Setup
       We tested the technique on 15 datasets from the UCI repository.
       We generated 500 trees for the main Random Forest.
       We used k-modes to cluster the trees.
       We used the following values for k: 5, 10, 15, 20, 25, 30, 35, and 40.
       We used one representative tree per cluster, selected by Out-Of-Bag (OOB) performance.
       The repeated hold-out method was used to estimate performance.
  • 51. Summarised Results
       [Figure: bar chart of ensemble size (number of trees) against number of datasets, comparing CLUB-DRF with RF.]
  • 52. Pruning Results
  • 53. Sample of Detailed Results
  • 54–57. LOFB-DRF: How is Diversity Related to Outlier Detection?
    Outliers are out-of-the-norm instances, thought to be generated by a different mechanism
    By analogy, trees that are significantly different (diverse) from the other trees in the Random Forest can be seen as outliers
    The Local Outlier Factor (LOF) assigns a real number to each instance to represent its peculiarity
    Inspired by this analogy, we hypothesised that a diverse ensemble of trees can be formed using an outlier detection method
  • 58–61. LOFB-DRF
    We termed the method Local Outlier Factor Based Diversified Random Forests (LOFB-DRF)
    It follows steps similar to CLUB-DRF
    Each tree is assigned an LOF value
    Trees are then chosen according to two criteria: predictive accuracy and LOF value
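The LOF step can be sketched as follows. This is a hedged illustration, not the authors' code: scikit-learn's LocalOutlierFactor scores each tree in "prediction space" (its label vector on a held-out sample), with the hamming metric counting prediction disagreements between trees; n_neighbors is set to 10 purely because this toy forest is small (the talk used 40 neighbours over 500 trees).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import LocalOutlierFactor

X, y = make_classification(n_samples=600, n_features=10, random_state=1)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=1)

rf = RandomForestClassifier(n_estimators=100, random_state=1).fit(X_train, y_train)

# Each tree becomes a point in "prediction space": its label vector on X_val.
P = np.array([t.predict(X_val) for t in rf.estimators_])

# Score every tree; an LOF well above 1 means the tree predicts unlike its
# neighbours, i.e. it is a diverse member of the forest.
lof = LocalOutlierFactor(n_neighbors=10, metric="hamming").fit(P)
scores = -lof.negative_outlier_factor_  # plain LOF values, roughly 1 for inliers

# Keep the most "outlying" (most diverse) trees as the pruned ensemble.
n_keep = 20
diverse_idx = np.argsort(scores)[-n_keep:]
pruned = [rf.estimators_[i] for i in diverse_idx]
```

In the full method, this LOF score is combined with each tree's predictive accuracy rather than used alone, so that diverse but weak trees are not over-selected.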
  • 62–63. LOFB-DRF Settings
    A number of settings are needed:
    The number of nearest neighbours used by LOF
    The option for combining LOF with predictive accuracy: either using LOF only (ruling out predictive accuracy) or using a combination strategy
  • 64–70. LOFB-DRF: Experimental Setup
    We tested the technique on 10 datasets from the UCI repository
    We generated 500 trees for the main Random Forest
    We used LOF with 40 nearest neighbours
    We computed rank = normal(LOF) × accuracy for each tree, where normal(LOF), accuracy ∈ [0, 1]
    Trees with the highest rank were chosen as representatives
    We used the following numbers of representative trees: 5, 10, 15, 20, 25, 30, 35, and 40
    The repeated hold-out method was used to estimate performance
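The ranking rule can be made concrete with a toy computation; the LOF scores and accuracies below are invented for illustration, and min-max scaling is assumed as the normalisation that maps LOF into [0, 1].

```python
import numpy as np

# Hypothetical per-tree statistics: raw LOF scores and held-out accuracies.
lof = np.array([1.02, 1.10, 2.40, 1.00, 1.75])
acc = np.array([0.81, 0.78, 0.74, 0.83, 0.80])

# Min-max normalise LOF into [0, 1] so both factors share a scale.
normal_lof = (lof - lof.min()) / (lof.max() - lof.min())

# rank = normal(LOF) * accuracy rewards trees that are both diverse and accurate.
rank = normal_lof * acc

# Keep the top-r trees as representatives (r = 2 in this toy example).
r = 2
keep = np.argsort(rank)[-r:]
print(sorted(keep.tolist()))  # → [2, 4]
```

Note how tree 3, despite the best accuracy (0.83), gets rank 0 because it is the least diverse tree; the product deliberately trades a little accuracy for diversity.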
  • 71. LOFB-DRF: Summarised Results
    [Bar chart: number of datasets won (0–6) versus pruned ensemble size (10, 20, 30, and 40 trees), comparing LOFB-DRF against the full RF]
  • 72. LOFB-DRF: Pruning Results [table omitted]
  • 73. LOFB-DRF: Sample of Detailed Results [table omitted]
  • 74–77. Summary
    Random Forest has demonstrated superior performance over the last few years
    Two methods were presented in this talk, aimed at diversifying and pruning Random Forests
    Results showed the potential of both methods to further enhance the predictive accuracy of Random Forest
    The high level of pruning makes these techniques candidates for real-time applications, as the number of trees to be traversed is significantly reduced
  • 78–82. Future Work
    In CLUB-DRF:
      Exploring other methods for choosing tree representatives from each cluster (e.g., varying the number of representatives per cluster)
      Using other clustering techniques
    In LOFB-DRF:
      Exploring other options for combining the LOF value with predictive accuracy
      Using LOF together with predictive accuracy to choose tree representatives in each cluster
    Applying both methods to other ensemble classification techniques
  • 83. Q & A: Thanks for listening!
    Contact Details: Dr Mohamed Medhat Gaber
    E-mail: m.gaber1@rgu.ac.uk
    Webpage: http://mohamedmgaber.weebly.com/
    LinkedIn: https://www.linkedin.com/profile/view?id=21808352
    Twitter: https://twitter.com/mmmgaber
    ResearchGate: https://www.researchgate.net/profile/Mohamed Gaber16?ev=prf highl