Robust Ensemble Classifier Combination Based on Noise Removal with One-Class SVM

Introduction
Preliminaries
Proposed Approach
Experiments
Conclusions
Robust Ensemble Classifier Combination Based on Noise
Removal with One-Class SVM
Ferhat Özgür Çatak
ozgur.catak@tubitak.gov.tr
TÜB˙ITAK - B˙ILGEM - Cyber Security Institute
Kocaeli, Turkey
Nov. 10 2015
22nd International Conference on Neural Information Processing
(ICONIP2015)
Ferhat Özgür Çatak ozgur.catak@tubitak.gov.tr TÜB˙ITAK - B˙ILGEM - Cyber Security Institute Kocaeli, TurkeyRobust Ensemble Classifier Combination Based on Noise Removal with One-Class S

Introduction
Preliminaries
Proposed Approach
Experiments
Conclusions
Table of Contents
1 Introduction
Noise
Contributions
2 Preliminaries
One-Class SVM
AdaBoost
Data Partitioning Strategies
3 Proposed Approach
Basic Idea
Analysis of the proposed algorithm
4 Experiments
Experimental Setup
Eﬀect of Noise Removing on Input Matrix
Simulation Results
5 Conclusions
Conclusions

Introduction
Preliminaries
Proposed Approach
Experiments
Conclusions
Noise
Contributions
Table of Contents
1 Introduction
Noise
Contributions
2 Preliminaries
One-Class SVM
AdaBoost
3 Proposed Approach
Basic Idea
4 Experiments
Experimental Setup
Simulation Results
5 Conclusions
Conclusions

Introduction
Preliminaries
Proposed Approach
Experiments
Conclusions
Noise
Contributions
Noise in Machine Learning I
Definition
Noise: random error or variance in a measured variable
Noise is ”irrelevant or meaningless data”
Real-world data are affected by several components; the presence of noise
is a key factor.
Data in the real world is noisy, mainly due to:
With Big Data technologies, we store every type of data
Messages, images, tweets etc.
80% of the information generated today is of an unstructured variety 1
.
Noisy data unnecessarily increases the amount of storage space required
and can also adversely affect the results of any machine learning
algorithm.

Introduction
Preliminaries
Proposed Approach
Experiments
Conclusions
Noise
Contributions
Noise in Machine Learning II
The performance of the
classifiers will heavily
depend on
Quality of the
training data
Robustness against
noise of the
classifier itself
Classification
Maximum Accuracy
Data Quality
Corruptions
Learning Algorithm
Figure: The performance of a classifier
1Forbes/Tech, http://www.forbes.com/sites/ciocentral/2013/02/04/big-data-big-hype/

Introduction
Preliminaries
Proposed Approach
Experiments
Conclusions
Noise
Contributions
Contributions
Our Aim :
Partition the entire data set into sub-datasets to reduce the training time
The detection and removal of noise that is the result of an imperfect data
collection process
In literature, One-Class SVM is a solution to create Representational
Model of a Data Set.
Improve the classiﬁcation accuracy using ensemble method (AdaBoost)
For many classiﬁcation algorithms, AdaBoost results in dramatic
improvements in performance.

Introduction
Preliminaries
Proposed Approach
Experiments
Conclusions
One-Class SVM
AdaBoost
Table of Contents
1 Introduction
Noise
Contributions
2 Preliminaries
One-Class SVM
AdaBoost
3 Proposed Approach
Basic Idea
4 Experiments
Experimental Setup
Simulation Results
5 Conclusions
Conclusions

Introduction
Preliminaries
Proposed Approach
Experiments
Conclusions
One-Class SVM
AdaBoost
One-Class SVM I
SVM method is used to find
classification models using
the maximum margin
separating hyper plane.
Schölkopf et al. proposed a
training methodology that
handles only one class
classification called as
”one-class” classification.
Basically,
The method finds soft
boundaries of the data
set
Then, model determines
whether new instance
belongs to this dataset
or not
−1 0 1 2 3 4 5
X1
−1
0
1
2
3
4
5
X2
Origin
-1
+1
Decision
Boundary
One-Class SVM
Decision Boundary
Training Instances
Figure: One-class SVM. The origin, (0, 0), is the single
instance with label −1.

Introduction
Preliminaries
Proposed Approach
Experiments
Conclusions
One-Class SVM
AdaBoost
One-Class SVM II
Input instances : S = {(xi , yi )|xi ∈ Rn
, y ∈ {1, . . . , K}}m
i=1
Kernel function : φ : X → H
minw,ξ,ρ
1
2
||w||2
+
1
mC i
ξi − ρ
subject to
(w.φ(xi )) ≥ ρ − ξi
ξi ≥ 0, ∀i = 1, . . . , m
(1)

Introduction
Preliminaries
Proposed Approach
Experiments
Conclusions
One-Class SVM
AdaBoost
AdaBoost
AdaBoost
AdaBoost (adaptive boosting) is an ensemble learning algorithm that can
be used for classification or regression
Adaboost improves the accuracy performance of learning algorithms
Given a space of feature vectors X and two possible class labels,
y ∈ {−1, +1}, AdaBoost goal is to learn a strong classifier H(x) as a
weighted ensemble of weak classifiers ht(x) predicting the label of any
instance x ∈ X.
AdaBoost is an algorithm for constructing a strong classifier as linear
combination
H(x) =
T
t=1
αtht(x)
of simple weak classifiers ht(x)
ht(x) : weak classifier/hypothesis
H(x) = sign(f (x)) : strong or final classifier/hypothesis

Introduction
Preliminaries
Proposed Approach
Experiments
Conclusions
One-Class SVM
AdaBoost
Data Partitioning
In explatory data analysis, partition the large-scale data sets is a general
need.
Computationally infeasible data sets.
Data partitioning reasons:
Diversity : Uncorrelated base classifiers 2 3
Complexity : reducing the input complexity of large-scale data sets 4
Specific : build classifier models for the specific part of the input instances 5
.
2Kuncheva, Ludmila I. ”Using diversity measures for generating error-correcting output codes in classifier ensembles.” Pattern Recognition Letters
26.1 (2005): 83-90.
3Dara, Rozita, Masoud Makrehchi, and Mohamed S. Kamel. ”Filter-based data partitioning for training multiple classifier systems.” Knowledge and
Data Engineering, IEEE Transactions on 22.4 (2010): 508-522.
4Chawla, Nitesh V., et al. ”Distributed learning with bagging-like performance.” Pattern recognition letters 24.1 (2003): 455-471.
5Woods, Kevin, Kevin Bowyer, and W. Philip Kegelmeyer Jr. ”Combination of multiple classifiers using local accuracy estimates.” Computer Vision
and Pattern Recognition, 1996. Proceedings CVPR’96, 1996 IEEE Computer Society Conference on. IEEE, 1996.

Introduction
Preliminaries
Proposed Approach
Experiments
Conclusions
Basic Idea
Table of Contents
1 Introduction
Noise
Contributions
2 Preliminaries
One-Class SVM
AdaBoost
3 Proposed Approach
Basic Idea
4 Experiments
Experimental Setup
Simulation Results
5 Conclusions
Conclusions

Introduction
Preliminaries
Proposed Approach
Experiments
Conclusions
Basic Idea
Basic Idea
Main Objective
Partition the input data set into sub-data sets, (Xm, Ym)
Pre-processing: Noise removing is applied to each individual sub-data set.
Create local classifier ensembles for each sub data chunk
Classifier combination: Weighted voting method is used to combine the
each ensemble classifier
Sub data set: X1
Sub data set: X2
Sub data set: Xm
...
X1 → X1,clean
X2 → X2,clean
Xm → Xm,clean
H(X1,clean) → Y
H(X2,clean) → Y
H(Xm,clean) → Y
Ensemble Model Building:
... ...
InputDataSet:
S={(xi,yi)|xi∈Rn
,y∈{1,...,K}}m
i=1
EnsembleModelsCombination
Noise Removing
Figure: Overall of proposed approach.

Introduction
Preliminaries
Proposed Approach
Experiments
Conclusions
Basic Idea
Ensemble methods of neural networks gets better accuracy performance
over unseen examples 6
Proposed method :
For each sub-data set, there is a set of classifier functions (ensemble
classifier), H(m)
H(m)
(x) = arg max
k
M
t=1
αtht(x) (2)
combined into one single classification model, ˆH(x)
ˆH(x) = arg max
k
m
i=1
βmH(m)
(x) (3)
where βm is the accuracy of the ensemble classifier m.
6Krogh, Anders, and Jesper Vedelsby. ”Neural network ensembles, cross validation, and
active learning.” Advances in neural information processing systems 7 (1995): 231-238.

Introduction
Preliminaries
Proposed Approach
Experiments
Conclusions
Experimental Setup
Simulation Results
Table of Contents
1 Introduction
Noise
Contributions
2 Preliminaries
One-Class SVM
AdaBoost
3 Proposed Approach
Basic Idea
4 Experiments
Experimental Setup
Simulation Results
5 Conclusions
Conclusions

Introduction
Preliminaries
Proposed Approach
Experiments
Conclusions
Experimental Setup
Simulation Results
Experimental Setup
Our approach is applied to five different data sets to verify its model
effectivity and efficiency.
Table: Description of the testing data sets used in the experiments.
Data set #Train #Test #Classes #Attributes
cod-rna 59,535 157,413 2 8
ijcnn1 49,990 91,701 2 22
letter 15,000 5,000 26 16
shuttle 43,500 14,500 7 9
SensIT Vehicle 78,823 19,705 3 100

Introduction
Preliminaries
Proposed Approach
Experiments
Conclusions
Experimental Setup
Simulation Results
Eﬀect of Noise Removing on Input Matrix I
Gini
Gini impurity is a measure of how often a randomly chosen element from the set
would be incorrectly labeled if it were randomly labeled according to the
distribution of labels in the subset.
Gini Impurity is used to measure the quality of procedure.
Gini approaches deal appropriately with data diversity of a data.
The Gini measures the class distribution of variable y = {y1, · · · , ym}
g = 1 −
k
p2
j (4)
where pj is the probability of class k, in data set D.

Introduction
Preliminaries
Proposed Approach
Experiments
Conclusions
Experimental Setup
Simulation Results
Eﬀect of Noise Removing on Input Matrix II
0 0.1 0.2 0.3 0.4 0.5 0.6
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Noise Removing Percantage
GiniImpurity
cod−rna
SensIT Vehicle
ijcnn1
letter
shuttle
(a) Cleaned part.
0 0.1 0.2 0.3 0.4 0.5 0.6
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Noise Removing Percantage
GiniImpurity
cod−rna
SensIT Vehicle
ijcnn1
letter
shuttle
(b) Removed part.
Figure: The impact of one-class SVM on the performance in terms of Gini impurity.
The Gini impurity value decreases with the cleaning of noisy instances from
the data, and increases on separated noisy data.

Introduction
Preliminaries
Proposed Approach
Experiments
Conclusions
Experimental Setup
Simulation Results
Eﬀect of Noise Removing on Input Matrix III
Aim is to minimizing the Gini impurity on clean data set Xclean, maximizing
the value on noisy data set Xnoisy .
Split Percantage = arg min
p
Gini(Xclean, p)
Gini(Xnoisy , p)
(5)
Table: The best noise removal percentages of each data sets.
Data Sets Percentage Gini(Xclean) Gini(Xnoisy ) Gini(Xclean)
Gini(Xnoisy )
cod-rna 0,55 0,331344656 0,56831775 0,583027112
SensIT Vehicle 0,60 0,461155751 0,568847982 0,810683638
ijcnn1 0,60 0,167652825 0,179397223 0,934534117
letter 0,60 0,9039108 0,926430562 0,975691905
shuttle 0,30 0,214142833 0,506938229 0,422423918

Introduction
Preliminaries
Proposed Approach
Experiments
Conclusions
Experimental Setup
Simulation Results
Simulation Results
Table: Classiﬁcation performance on example datasets using One-Class SVM noise
removing and without removing for the proposed learning algorithm.
Extra trees K-nn SVM
Data set All Clean All Clean All Clean
cod-rna 0,75929 0,78652 0,91955 0,93513 0,88553 0,89806
ijcnn1 0,69758 0,72175 0,75533 0,77833 0,82989 0,84326
letter 0,91255 0,90853 0,90594 0,90318 0,92121 0,9199
shuttle 0,47801 0,44792 0,20262 0,26974 0,62301 0,63939
SensIT Vehicle 0,9602 0,90558 0,87904 0,88266 0,99232 0,99174

Introduction
Preliminaries
Proposed Approach
Experiments
Conclusions
Conclusions
Table of Contents
1 Introduction
Noise
Contributions
2 Preliminaries
One-Class SVM
AdaBoost
3 Proposed Approach
Basic Idea
4 Experiments
Experimental Setup
Simulation Results
5 Conclusions
Conclusions

Introduction
Preliminaries
Proposed Approach
Experiments
Conclusions
Conclusions
Conclusions
Classification Performance
Improving classification performance
Removing the noisy instances using one-class SVM
Find best noise removing ratio with Gini impurity value
Experiments
Experimental results show that
Input complexity of the classification algorithms reduced remarkably
The accuracy increased by using the noise removal process

Introduction
Preliminaries
Proposed Approach
Experiments
Conclusions
Conclusions
Thank You
Ferhat Özgür Çatak
ozgur.catak@tubitak.gov.tr

Robust Ensemble Classifier Combination Based on Noise Removal with One-Class SVM

Recommended

Recommended

More Related Content

What's hot

What's hot (18)

Viewers also liked

Viewers also liked (20)

Similar to Robust Ensemble Classifier Combination Based on Noise Removal with One-Class SVM

Similar to Robust Ensemble Classifier Combination Based on Noise Removal with One-Class SVM (20)

Recently uploaded

Recently uploaded (20)

Robust Ensemble Classifier Combination Based on Noise Removal with One-Class SVM