Presentation at the International Conference on Hybrid Artificial Intelligent Systems (HAIS) 2018 of a preliminary study of diversity in ensembles, applied to Extreme Learning Machines (ELM)
Scaling Multinomial Logistic Regression via Hybrid Parallelism, by Parameswaran Raman
Distributed algorithms in machine learning follow two main paradigms: data parallel, where the data is distributed across multiple workers and model parallel, where the model parameters are partitioned across multiple workers. The main limitation of the first approach is that the model parameters need to be replicated on every machine. This is problematic when the number of parameters is very large, and hence cannot fit in a single machine. The drawback of the latter approach is that the data needs to be replicated on each machine. Such replications limit the scalability of machine learning algorithms, since in several real-world tasks it is observed that the data and model sizes typically grow hand in hand. In this talk, I will present Hybrid-Parallelism, a new paradigm that partitions both, the data as well as the model parameters simultaneously in a completely de-centralized manner. As a result, each worker only needs access to a subset of the data and a subset of the parameters while performing parameter updates. Next, I will present a case-study showing how to apply these ideas to reformulate Multinomial Logistic Regression to achieve Hybrid Parallelism (DSMLR: Doubly-Separable Multinomial Logistic Regression). Finally, I will demonstrate the versatility of DS-MLR under various scenarios in data and model parallelism, through an empirical study consisting of real-world datasets.
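A minimal sketch of the double-partitioning idea (illustrative only; this is not the DSMLR update rule, just the data/parameter layout the talk describes): with P workers, the n examples and the K class-weight vectors are each split into P blocks, so no machine holds the full data or the full model.

```python
import numpy as np

# Hybrid partitioning layout: worker p holds only a shard of the rows of X
# and a shard of the columns of W (the per-class parameters).
n, d, K, P = 1000, 50, 200, 4
X = np.random.randn(n, d)
W = np.zeros((d, K))

row_blocks = np.array_split(np.arange(n), P)
class_blocks = np.array_split(np.arange(K), P)

for p in range(P):
    X_local = X[row_blocks[p]]           # worker p's data shard
    W_local = W[:, class_blocks[p]]      # worker p's parameter shard
    print(f"worker {p}: data {X_local.shape}, params {W_local.shape}")
```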
Medical pathology images are visually evaluated by experts for disease diagnosis, but the connection between image features and the state of the cells in an image is typically unknown. To understand this relationship, we describe a multimodal modeling and inference framework that estimates shared latent structure of joint gene expression levels and medical image features. The method is built around probabilistic canonical correlation analysis (PCCA), which is jointly fit to image embeddings that are learned using convolutional neural networks and linear embeddings of paired gene expression data. We finally discuss a set of theoretical and empirical challenges in domain adaptation settings arising from genomics data. (Based on work in collaboration with Gregory Gundersen and Barbara E. Engelhardt.)
Since the advent of the horseshoe priors for regularization, global-local shrinkage methods have proved to be a fertile ground for the development of Bayesian theory and methodology in machine learning. They have achieved remarkable success in computation, and enjoy strong theoretical support. Much of the existing literature has focused on the linear Gaussian case. The purpose of the current talk is to demonstrate that the horseshoe priors are useful more broadly, by reviewing both methodological and computational developments in complex models that are more relevant to machine learning applications. Specifically, we focus on methodological challenges in horseshoe regularization in nonlinear and non-Gaussian models; multivariate models; and deep neural networks. We also outline the recent computational developments in horseshoe shrinkage for complex models along with a list of available software implementations that allows one to venture out beyond the comfort zone of the canonical linear regression problems.
Inria Tech Talk - Clustering complex data with MASSICCC, by Stéphanie Roger
MASSICCC - A SaaS platform for the clustering of complex, heterogeneous, and incomplete data.
In this Tech Talk, come discover, test, and learn to master MASSICCC (Massive Clustering in Cloud Computing), a user-oriented SaaS platform, along with its three families of #classification algorithms, the fruit of the latest advances of Inria's Modal & Celeste research teams, for analyzing and learning from your "Big Data" (e.g., real estate, predictive maintenance, healthcare, open data, etc.).
MASSICCC also offers:
- Free access for testing and research at https://massiccc.lille.inria.fr
- A "one for all" approach to clustering
- Highly interpretable results (with its plots)
- A SaaS mode that lets you keep track of your experiments (running or completed)
- And open-source algorithms that can be reused independently.
The document outlines a presentation on evolutionary algorithms and their performance on optimization problems. It introduces evolutionary algorithms and describes classical evolutionary programming (CEP), fast evolutionary programming (FEP), and improved fast evolutionary programming (IFEP). It then discusses benchmark functions used to evaluate the algorithms' performance, including sphere, Schwefel, Rastrigin, and Rosenbrock functions. Tables show the results of applying CEP, FEP, and IFEP to optimize 10 standard benchmark functions, finding that IFEP generally performs best.
Predicting Short Term Movements of Stock Prices: A Two-Stage L1-Penalized Model, by weekendsunny
This document summarizes the author's approach to predicting short-term stock price movements in the 2010 INFORMS Data Mining Contest. The author began with support vector machines and logistic regression, then tried LASSO (logistic regression with variable selection) and other methods, eventually settling on a two-stage variable selection method with LASSO on lagged data to select variables for a generalized linear model, which achieved 3rd place. The document outlines the basic analysis, the variable selection methods explored, including traditional approaches and the L1-penalized LASSO, and the results of using future information against the evaluation criteria.
Traffic flow modeling on road networks using Hamilton-Jacobi equations, by Guillaume Costeseque
This document discusses traffic flow modeling using Hamilton-Jacobi equations on road networks. It motivates the use of macroscopic traffic models based on conservation laws and Hamilton-Jacobi equations to describe traffic flow. These models can capture traffic behavior at an aggregate level based on density, flow, and speed. The document outlines different orders of macroscopic traffic models, from first-order Lighthill-Whitham-Richards models to higher-order models that account for additional traffic attributes. It also discusses the relationship between microscopic car-following models and the emergence of macroscopic behavior through homogenization.
This document discusses challenges in comparing structured models using cross-validation. It presents a decision-theoretic framework for model assessment and selection based on expected predictive loss. Different methods for estimating predictive loss are discussed, including k-fold cross-validation which is used to estimate the predictive loss of various multilevel models for a dataset from the Cooperative Congressional Election Survey with deeply nested demographic variables.
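For concreteness, here is a minimal sketch (my own illustration, not the talk's code) of estimating expected predictive loss with k-fold cross-validation, using squared error and ordinary least squares as stand-ins for the loss and the model being assessed:

```python
import numpy as np

# Generic k-fold estimate of expected predictive loss (here: squared error),
# the quantity a decision-theoretic framework compares across candidate models.
def kfold_predictive_loss(X, y, fit, predict, k=10, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    losses = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        model = fit(X[train], y[train])
        losses.append(np.mean((predict(model, X[fold]) - y[fold]) ** 2))
    return np.mean(losses)

# Example with ordinary least squares as the model being assessed.
X = np.random.randn(200, 3)
y = X @ np.array([1.0, -2.0, 0.5]) + np.random.randn(200)
fit = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]
predict = lambda beta, X: X @ beta
print(kfold_predictive_loss(X, y, fit, predict))
```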
Molodtsov's Soft Set Theory and its Applications in Decision Making, by inventionjournals
Molodtsov's soft set theory was originally proposed as a general mathematical tool for dealing with uncertainty. In this paper, we apply the theory of soft set to solve a decision making problem in terms of rough mathematics.
This document compares the performance of genetic algorithms and niching methods for clustering undirected weighted graphs. It discusses how genetic algorithms can converge prematurely on local optima for complex problems like clustering that have many potential solutions. Niching methods like deterministic crowding are introduced to maintain population diversity and allow the search of multiple peaks in parallel. The paper applies genetic algorithms and deterministic crowding to the graph clustering problem and compares their results on test graphs, finding that deterministic crowding is more computationally demanding but provides better optimization.
In this lecture, you will learn two of the most popular methods for classifying data points into a finite set of categories. Both methods are based on representing a classifier via its decision boundary which is a hyperplane. The parameters of the hyperplane are learned from training data by minimizing a particular loss function.
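As a hedged sketch of the idea (the lecture does not name its two methods here; logistic regression is one standard instance), the following learns the hyperplane parameters by gradient descent on the logistic loss; swapping in the hinge loss would give a linear SVM instead:

```python
import numpy as np

# Learn a separating hyperplane w.x + b = 0 by minimizing the logistic loss
# log(1 + exp(-y * (w.x + b))) with plain gradient descent on synthetic data.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(1, 1, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)

w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(500):
    margins = y * (X @ w + b)
    grad_common = -y / (1 + np.exp(margins))      # d(log-loss)/d(margin)
    w -= lr * (X * grad_common[:, None]).mean(axis=0)
    b -= lr * grad_common.mean()

print("training accuracy:", np.mean(np.sign(X @ w + b) == y))
```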
Naive computations involving a function of many variables suffer from the curse of dimensionality: the computational cost grows exponentially with the number of variables. One approach to bypassing the curse is to approximate the function as a sum of products of functions of one variable and compute in this format. When the variables are indices, a function of many variables is called a tensor, and this approach is to approximate and use the tensor in the (so-called) canonical tensor format. In this talk I will describe how such approximations can be used in numerical analysis and in machine learning.
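A small illustration (with invented dimensions) of why the canonical format helps: evaluating a tensor stored as a sum of R rank-one products needs only d factor matrices of size n x R instead of the full n**d array.

```python
import numpy as np

# Canonical (CP) format: f(i1,...,id) ~ sum_r A1[i1,r] * A2[i2,r] * ... * Ad[id,r].
d, n, R = 6, 10, 3
factors = [np.random.rand(n, R) for _ in range(d)]   # storage: d*n*R, not n**d

def cp_eval(indices):
    prod = np.ones(R)
    for A, i in zip(factors, indices):
        prod *= A[i]        # pick row i of each factor, multiply rank-wise
    return prod.sum()

print(cp_eval([1, 2, 3, 4, 5, 6]))
```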
This document proposes applying boosting techniques to attraction-based demand models that are popular in pricing optimization. It formulates a multinomial likelihood for a semiparametric demand choice model (DCM) where product utility is specified without a fixed functional form. Gradient boosting is used to maximize the likelihood and estimate the nonparametric utility functions. The boosted tree-based approach flexibly models utility as a sum of trees, addressing limitations of existing DCMs like non-stationary demand and nonlinear attribute effects.
Boosted Tree-based Multinomial Logit Model for Aggregated Market Data, by Jay (Jianqiang) Wang
This document presents a boosted tree-based multinomial logit model for estimating aggregated market demand from mobile computer sales data. It discusses challenges in modeling high-dimensional choice data with interactions among attributes and price. The proposed model uses gradient boosted trees to flexibly estimate utility functions without specifying a functional form, allowing for varying coefficient and nonparametric specifications. The model is shown to outperform elastic net regularized estimation on Australian mobile computer sales data, with the nonparametric model achieving the best test set performance while capturing complex attribute interactions.
This document discusses important issues in machine learning for data mining, including the bias-variance dilemma. It explains that the difference between the optimal regression and a learned model can be measured by looking at bias and variance. Bias measures the error between the expected output of the learned model and the optimal regression, while variance measures the error between the learned model's output and its expected output. There is a tradeoff between bias and variance: decreasing one typically increases the other. This is known as the bias-variance dilemma. Cross-validation and confusion matrices are also introduced as evaluation techniques.
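The decomposition is easy to verify numerically. The sketch below (an illustration with an invented sine target and polynomial fits, not taken from the document) estimates bias squared and variance of a learned model at a single test point by refitting on many training samples:

```python
import numpy as np

# Empirical bias-variance decomposition at a test point x0:
# E[(f_hat(x0) - f(x0))^2] = bias^2 + variance (noise-free target here).
rng = np.random.default_rng(0)
f = lambda x: np.sin(x)
x0, degree, trials = 1.0, 3, 2000

preds = []
for _ in range(trials):
    x = rng.uniform(0, np.pi, 20)
    y = f(x) + rng.normal(0, 0.3, 20)          # fresh noisy training sample
    coef = np.polyfit(x, y, degree)
    preds.append(np.polyval(coef, x0))

preds = np.array(preds)
bias2 = (preds.mean() - f(x0)) ** 2
var = preds.var()
print(f"bias^2={bias2:.4f}  variance={var:.4f}  sum={bias2 + var:.4f}")
```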
The document discusses using linear programming and the simplex method in R to solve transportation and assignment problems. It presents a transportation problem with 5 origins and 5 destinations and shows the code to solve it in R, obtaining an optimal solution. It also presents an assignment problem with 8 workers and 8 work hours categories and the R code to solve it, minimizing costs.
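The document's code is in R; as an assumed-equivalent illustration, the same kind of balanced transportation LP can be solved in Python with scipy.optimize.linprog (the cost matrix, supplies, and demands below are invented, not the document's 5x5 instance):

```python
import numpy as np
from scipy.optimize import linprog

cost = np.array([[4, 6, 8], [5, 3, 7]])     # 2 origins x 3 destinations
supply = [60, 40]
demand = [30, 40, 30]

# Variables x_ij flattened row-wise; equality rows: supplies, then demands.
A_eq, b_eq = [], []
for i in range(2):                           # each origin ships all its supply
    row = np.zeros(6); row[i * 3:(i + 1) * 3] = 1
    A_eq.append(row); b_eq.append(supply[i])
for j in range(3):                           # each destination receives its demand
    row = np.zeros(6); row[j::3] = 1
    A_eq.append(row); b_eq.append(demand[j])

res = linprog(cost.ravel(), A_eq=np.array(A_eq), b_eq=b_eq, bounds=(0, None))
print(res.fun, res.x.reshape(2, 3))
```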
In this lecture, I will present a general tour of some of the most commonly used kernel methods in statistical machine learning and data mining. I will touch on elements of artificial neural networks and then highlight their intricate connections to some general purpose kernel methods like Gaussian process learning machines. I will also resurrect the famous universal approximation theorem and will most likely ignite a [controversial] debate around the theme: could it be that [shallow] networks like radial basis function networks or Gaussian processes are all we need for well-behaved functions? Do we really need many hidden layers as the hype around Deep Neural Network architectures seem to suggest or should we heed Ockham’s principle of parsimony, namely “Entities should not be multiplied beyond necessity.” (“Entia non sunt multiplicanda praeter necessitatem.”) I intend to spend the last 15 minutes of this lecture sharing my personal tips and suggestions with our precious postdoctoral fellows on how to make the most of their experience.
Improving Performance of Back propagation Learning Algorithm, by ijsrd.com
The standard back-propagation algorithm is one of the most widely used algorithms for training feed-forward neural networks. One major drawback of this algorithm is that it may fall into local minima and suffer from a slow convergence rate. Natural gradient descent, a principled method for nonlinear optimization, is presented and combined with the modified back-propagation algorithm, yielding a new fast multilayer training algorithm. This paper describes a new approach to natural gradient learning in which the number of parameters necessary is much smaller than in the natural gradient algorithm. This new method exploits the algebraic structure of the parameter space to reduce the space and time complexity of the algorithm and improve its performance.
Paper Summary of Disentangling by Factorising (Factor-VAE), by 준식 최
The paper proposes Factor-VAE, which aims to learn disentangled representations in an unsupervised manner. Factor-VAE enhances disentanglement over the β-VAE by encouraging the latent distribution to be factorial (independent across dimensions) using a total correlation penalty. This penalty is optimized using a discriminator network. Experiments on various datasets show that Factor-VAE achieves better disentanglement than β-VAE, as measured by a proposed disentanglement metric, while maintaining good reconstruction quality. Latent traversals qualitatively demonstrate disentangled factors of variation.
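A rough sketch of the total correlation mechanism (shapes, network sizes, and the gamma weight are illustrative; this is not the paper's code): the discriminator is trained to separate true latent samples from dimension-wise permuted ones, and its logit on true samples serves as the TC estimate added to the VAE objective.

```python
import torch
import torch.nn.functional as F

# Permuting each latent dimension independently across the batch destroys
# dependence between dimensions, giving samples from the product of marginals.
def permute_dims(z):
    out = torch.empty_like(z)
    for j in range(z.size(1)):
        out[:, j] = z[torch.randperm(z.size(0)), j]
    return out

D = torch.nn.Sequential(torch.nn.Linear(10, 64), torch.nn.ReLU(),
                        torch.nn.Linear(64, 1))
opt_D = torch.optim.Adam(D.parameters(), lr=1e-4)

z = torch.randn(128, 10)                     # stand-in for encoder samples
# Discriminator step: classify true z (label 1) vs permuted z (label 0).
logits = torch.cat([D(z.detach()), D(permute_dims(z).detach())])
labels = torch.cat([torch.ones(128, 1), torch.zeros(128, 1)])
loss_D = F.binary_cross_entropy_with_logits(logits, labels)
opt_D.zero_grad(); loss_D.backward(); opt_D.step()

# VAE step: the mean logit estimates log q(z)/prod_j q(z_j), i.e. the TC,
# and is added (gamma-weighted) to the usual ELBO loss.
gamma = 6.0
tc_penalty = gamma * D(z).mean()
print(float(loss_D), float(tc_penalty))
```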
We provide a review of the recent literature on statistical risk bounds for deep neural networks. We also discuss some theoretical results that compare the performance of deep ReLU networks to other methods such as wavelets and spline-type methods. The talk will moreover highlight some open problems and sketch possible new directions.
Dynamic Feature Induction: The Last Gist to the State-of-the-Art, by Jinho Choi
We introduce a novel technique called dynamic feature induction that keeps inducing high dimensional features automatically until the feature space becomes `more' linearly separable. Dynamic feature induction searches for the feature combinations that give strong clues for distinguishing certain label pairs, and generates joint features from these combinations. These induced features are trained along with the primitive low dimensional features. Our approach was evaluated on two core NLP tasks, part-of-speech tagging and named entity recognition, and showed the state-of-the-art results for both tasks, achieving an accuracy of 97.64% and an F1-score of 91.00 respectively, with about a 25% increase in the feature space.
The document discusses machine learning techniques for clustering and segmentation. It introduces Dirichlet process mixtures and the Chinese restaurant process as nonparametric Bayesian models that allow for an infinite number of clusters. It describes how these models can be used for problems like image segmentation, object recognition, population clustering from genetic data, and evolutionary document clustering over time. Approximate inference methods like Markov chain Monte Carlo sampling are used to analyze these models.
Classification with decision trees from a nonparametric predictive inference p..., by NTNU
An application of nonparametric predictive inference for multinomial data (NPI) to classification tasks is presented. This model is applied to an established procedure for building classification trees using imprecise probabilities and uncertainty measures, thus far used only with the imprecise Dirichlet model (IDM), which is defined through the use of a parameter expressing previous knowledge. The accuracy of that classification procedure depends significantly on the value of the parameter used when the IDM is applied. A detailed study involving 40 data sets shows that the procedure using the NPI model (which has no parameter dependence) obtains a better trade-off between accuracy and tree size than the procedure with the IDM, whatever the choice of parameter. In a bias-variance study of the errors, it is shown that the procedure with the NPI model has a lower variance than the one with the IDM, implying a lower level of over-fitting.
This document discusses dimensionality reduction techniques for machine learning. It introduces Fisher Linear Discriminant analysis, which seeks projection directions that maximize separation between classes while minimizing within-class variance. It describes using the means and scatter measures of each class to define a cost function that is maximized to find the optimal projection direction. Principal Component Analysis is also briefly mentioned as another technique for dimensionality reduction.
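For the two-class case the Fisher direction has a closed form, w proportional to Sw^{-1}(m1 - m0); a small numpy sketch with synthetic Gaussian classes (my own illustration, not the document's example) is:

```python
import numpy as np

# Fisher's linear discriminant: the projection maximizing between-class
# separation over within-class scatter is w = Sw^{-1} (m1 - m0).
rng = np.random.default_rng(0)
X0 = rng.normal([0, 0], 1.0, (100, 2))
X1 = rng.normal([3, 1], 1.0, (100, 2))

m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
# Within-class scatter = sum of (unnormalized) class covariances.
Sw = np.cov(X0.T) * (len(X0) - 1) + np.cov(X1.T) * (len(X1) - 1)
w = np.linalg.solve(Sw, m1 - m0)
w /= np.linalg.norm(w)
print("projection direction:", w)
```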
An optimal design of current conveyors using a hybrid-based metaheuristic alg..., by IJECEIAES
This paper focuses on the optimal sizing of a positive second-generation current conveyor (CCII+), employing a hybrid algorithm named DE-ACO, which is derived from the combination of differential evolution (DE) and ant colony optimization (ACO) algorithms. The basic idea of this hybridization is to apply the DE algorithm for the ACO algorithm’s initialization stage. Benchmark test functions were used to evaluate the proposed algorithm’s performance regarding the quality of the optimal solution, robustness, and computation time. Furthermore, the DE-ACO has been applied to optimize the CCII+ performances. SPICE simulation is utilized to validate the achieved results, and a comparison with the standard DE and ACO algorithms is reported. The results highlight that DE-ACO outperforms both ACO and DE.
The document discusses genetic algorithms and genetic programming. It explains that genetic algorithms perform a parallel search of the hypothesis space to optimize a fitness function, mimicking biological evolution. New hypotheses are generated through mutation and crossover of existing hypotheses. Genetic programming similarly evolves computer programs represented as trees through genetic operators. An example shows a genetic programming approach for stacking blocks to spell a word.
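A bare-bones sketch of that loop (fitness function, population size, and operators invented for illustration), maximizing a toy bit-counting fitness:

```python
import numpy as np

# Minimal genetic algorithm: selection, one-point crossover, bit-flip mutation.
rng = np.random.default_rng(0)
pop = rng.integers(0, 2, (30, 20))            # 30 individuals, 20 bits each

def fitness(pop):
    return pop.sum(axis=1)                    # toy fitness: count of ones

for gen in range(50):
    f = fitness(pop)
    # Tournament selection: keep the fitter of two random individuals.
    a, b = rng.integers(0, len(pop), (2, len(pop)))
    parents = np.where((f[a] > f[b])[:, None], pop[a], pop[b])
    # One-point crossover between consecutive parents.
    cut = rng.integers(1, pop.shape[1], len(pop))
    mask = np.arange(pop.shape[1]) < cut[:, None]
    children = np.where(mask, parents, np.roll(parents, 1, axis=0))
    # Bit-flip mutation with small probability.
    flip = rng.random(children.shape) < 0.01
    pop = np.where(flip, 1 - children, children)

print("best fitness:", fitness(pop).max())
```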
This document discusses using evolutionary techniques like genetic algorithms and particle swarm optimization to reduce the order of large-scale linear systems. Specifically, it examines using GA and PSO to find stable reduced order models by minimizing the integral squared error between the original and reduced system responses. Both techniques guarantee stability if the original system is stable. The paper provides background on model order reduction techniques and describes how GA and PSO are applied, including representing systems as chromosomes and using selection, crossover, and mutation operators. Results from a numerical example are compared to a conventional method.
LNCS 5050 - Bilevel Optimization and Machine Learning, by butest
This document discusses using bilevel optimization and machine learning techniques to improve model selection in machine learning problems. It proposes framing machine learning model selection as a bilevel optimization problem, where the inner level problems involve optimizing models on training data and the outer level problem selects hyperparameters to minimize error on test data. This bilevel framing allows for systematic optimization of hyperparameters and enables novel machine learning approaches. The document illustrates the approach for support vector regression, formulating model selection as a Stackelberg game and solving the resulting mathematical program with equilibrium constraints.
A simple framework for contrastive learning of visual representations, by Devansh16
Link: https://machine-learning-made-simple.medium.com/learnings-from-simclr-a-framework-contrastive-learning-for-visual-representations-6c145a5d8e99
This paper presents SimCLR: a simple framework for contrastive learning of visual representations. We simplify recently proposed contrastive self-supervised learning algorithms without requiring specialized architectures or a memory bank. In order to understand what enables the contrastive prediction tasks to learn useful representations, we systematically study the major components of our framework. We show that (1) composition of data augmentations plays a critical role in defining effective predictive tasks, (2) introducing a learnable nonlinear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations, and (3) contrastive learning benefits from larger batch sizes and more training steps compared to supervised learning. By combining these findings, we are able to considerably outperform previous methods for self-supervised and semi-supervised learning on ImageNet. A linear classifier trained on self-supervised representations learned by SimCLR achieves 76.5% top-1 accuracy, which is a 7% relative improvement over previous state-of-the-art, matching the performance of a supervised ResNet-50. When fine-tuned on only 1% of the labels, we achieve 85.8% top-5 accuracy, outperforming AlexNet with 100X fewer labels.
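A compact sketch of the NT-Xent contrastive loss at the heart of SimCLR (my own minimal implementation of the published formula, with random vectors standing in for the encoder-plus-projection-head outputs):

```python
import torch
import torch.nn.functional as F

# NT-Xent for a batch of N images: rows 0..N-1 and N..2N-1 of z hold the two
# augmented views; view i's positive is row i+N, everything else is a negative.
def nt_xent(z, temperature=0.5):
    z = F.normalize(z, dim=1)
    sim = z @ z.T / temperature                  # cosine similarities
    sim.fill_diagonal_(float("-inf"))            # exclude self-pairs
    n = z.size(0) // 2
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

z = torch.randn(64, 128)    # 32 images x 2 views, 128-d projections (stand-in)
print(float(nt_xent(z)))
```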
Comments: ICML'2020. Code and pretrained models at this https URL
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Cite as: arXiv:2002.05709 [cs.LG]
(or arXiv:2002.05709v3 [cs.LG] for this version)
Submission history
From: Ting Chen [view email]
[v1] Thu, 13 Feb 2020 18:50:45 UTC (5,093 KB)
[v2] Mon, 30 Mar 2020 15:32:51 UTC (5,047 KB)
[v3] Wed, 1 Jul 2020 00:09:08 UTC (5,829 KB)
ADABOOST ENSEMBLE WITH SIMPLE GENETIC ALGORITHM FOR STUDENT PREDICTION MODEL, by ijcsit
Predicting student performance is a great concern for higher education management. This prediction helps to identify and to improve students' performance. Several factors may improve this performance. In the present study, we employ data mining processes, particularly classification, to enhance the quality of the higher educational system. Recently, a new direction has emerged for improving classification accuracy by combining classifiers. In this paper, we design and evaluate a fast learning algorithm using an AdaBoost ensemble with a simple genetic algorithm, called "Ada-GA", where the genetic algorithm is demonstrated to successfully improve the accuracy of the combined classifier. The Ada-GA algorithm proved to be of considerable usefulness in identifying at-risk students early, especially in very large classes. This early prediction allows the instructor to provide appropriate advising to those students. The Ada-GA algorithm was implemented and tested on the ASSISTments dataset; the results showed that this algorithm successfully improved detection accuracy while reducing the computational complexity.
This document summarizes the application of computational intelligence techniques like genetic algorithms and particle swarm optimization for solving economic load dispatch problems. It first applies a real-coded genetic algorithm to minimize generation costs for a 6-generator test system with continuous fuel cost equations, showing superiority over quadratic programming. It then uses particle swarm optimization to minimize costs for a 10-generator system with each generator having discontinuous fuel options, showing better results than other published methods. The document provides background on economic load dispatch problems and optimization techniques like quadratic programming, genetic algorithms, and particle swarm optimization.
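As a hedged illustration of the PSO side only (a toy quadratic fuel-cost curve and box limits; the actual dispatch problem adds a power-balance constraint and the discontinuous fuel options described above):

```python
import numpy as np

# Minimal particle swarm optimizer of the kind applied to economic load dispatch.
rng = np.random.default_rng(0)
cost = lambda P: (0.01 * P**2 + 2.0 * P + 10.0).sum(axis=-1)  # per-generator quadratic

n_particles, dim = 30, 6                      # 6 generators
x = rng.uniform(10, 100, (n_particles, dim))  # candidate outputs (MW)
v = np.zeros_like(x)
pbest, pbest_f = x.copy(), cost(x)
gbest = pbest[pbest_f.argmin()]

for _ in range(200):
    r1, r2 = rng.random((2, n_particles, dim))
    v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (gbest - x)
    x = np.clip(x + v, 10, 100)               # respect generator limits
    f = cost(x)
    better = f < pbest_f
    pbest[better], pbest_f[better] = x[better], f[better]
    gbest = pbest[pbest_f.argmin()]

print("best cost found:", pbest_f.min())
```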
Flavours of Physics Challenge: Transfer Learning approach, by Alexander Rakhlin
Presentation for the "Heavy Flavour Data Mining workshop", February 18-19, University of Zurich. I discuss the solution that won the Physics Prize of the Flavours of Physics challenge organized by CERN, Yandex, and Intel on Kaggle.
Optimization of Mechanical Design Problems Using Improved Differential Evolut..., by IDES Editor
Differential Evolution (DE) is a novel evolutionary approach capable of handling non-differentiable, non-linear, and multi-modal objective functions. DE has been consistently ranked as one of the best search algorithms for solving global optimization problems in several case studies. This paper presents an Improved Constraint Differential Evolution (ICDE) algorithm for solving constrained optimization problems. The proposed ICDE algorithm differs from the unconstrained DE algorithm only in the initialization, the selection of particles for the next generation, and the sorting of the final results. We also applied the new idea to five versions of the DE algorithm. The performance of the ICDE algorithm is validated on four mechanical engineering problems. The experimental results demonstrate the performance of the ICDE algorithm in terms of final objective function value, number of function evaluations, and convergence time.
MIMO system order reduction using real-coded genetic algorithm, by Cemal Ardil
This document describes a method for reducing the order of multi-input multi-output (MIMO) systems using real-coded genetic algorithms. The method aims to minimize the integral square error between the transient responses of the original and reduced order models. It treats both the numerator and denominator parameters of the reduced order model as free parameters to be optimized. A real-coded genetic algorithm is used to search for the parameter values that minimize the error. The method is illustrated with an example and shown to produce results comparable to other established order reduction techniques while guaranteeing stability of the reduced model.
Multimodal Biometrics Recognition by Dimensionality Diminution Method, by IJERA Editor
A multimodal biometric system uses two or more biometric modalities, e.g., face, ear, fingerprint, signature, or palmprint, to improve the recognition accuracy of conventional unimodal methods. We propose a new dimensionality reduction method called Dimension Diminish Projection (DDP) in this paper. DDP can not only preserve local information by capturing the intra-modal geometry, but also effectively extract between-class relevant structures for classification. Experimental results show that our proposed method performs better than other algorithms, including PCA, LDA, and MFA.
Chap 8. Optimization for training deep models, by Young-Geun Choi
Internal lab seminar material: a summary of, and excerpts from, Chapter 8 of Goodfellow et al. (2016), Deep Learning, MIT Press, introducing the methods commonly used to optimize objective functions when training deep neural networks.
Machine learning in science and industry — day 1, by arogozhnikov
A course of machine learning in science and industry.
- notions and applications
- nearest neighbours: search and machine learning algorithms
- roc curve
- optimal classification and regression
- density estimation
- Gaussian mixtures and EM algorithm
- clustering, an example of clustering in the OPERA experiment
MULTIPROCESSOR SCHEDULING AND PERFORMANCE EVALUATION USING ELITIST NON DOMINA..., by ijcsa
Task scheduling plays an important part in the improvement of parallel and distributed systems. The problem of task scheduling has been shown to be NP-hard, and deterministic techniques take considerable time to solve it. Algorithms have been developed to schedule tasks in distributed environments, but they focus on a single objective. The problem becomes more complex when two objectives are considered. This paper presents a bi-objective independent task scheduling algorithm using the elitist non-dominated sorting genetic algorithm (NSGA-II) to minimize makespan and flowtime. The algorithm generates Pareto global optimal solutions for this bi-objective task scheduling problem. NSGA-II is implemented using a set of benchmark instances, and the experimental results show that it generates efficient optimal schedules.
Self-sampling Strategies for Multimemetic Algorithms in Unstable Computationa..., by Rafael Nogueras
This document discusses self-sampling strategies for multimemetic algorithms (MMAs) in unstable computational environments subject to churn. It proposes using probabilistic models to sample new individuals when populations need to be enlarged due to node failures. Experimental results show the bivariate model is superior for high churn, maintaining diversity and convergence better than random strategies. Future work aims to extend these self-sampling strategies to dynamic network topologies and more complex probabilistic models.
CONSTRUCTING A FUZZY NETWORK INTRUSION CLASSIFIER BASED ON DIFFERENTIAL EVOLU..., by IJCNCJournal
This paper presents a method for constructing intrusion detection systems based on efficient fuzzy rule-based classifiers. The design process of a fuzzy rule-based classifier from a given input-output data set can be presented as a feature selection and parameter optimization problem. For parameter optimization of fuzzy classifiers, differential evolution is used, while the binary harmonic search algorithm is used for the selection of relevant features. The performance of the designed classifiers is evaluated using the KDD Cup 1999 intrusion detection dataset. The optimal classifier is selected based on the Akaike information criterion. The optimal intrusion detection system has a 1.21% type I error and a 0.39% type II error. A comparative study with other methods was accomplished. The results obtained showed the adequacy of the proposed method.
Uncertainty-quantification tasks are often "many query" in nature, as they require repeated evaluations of a model that often corresponds to a parameterized system of nonlinear equations (e.g., arising from the spatial discretization of a PDE). To make this task tractable for large-scale models, low-fidelity models (e.g., reduced-order models, coarse-mesh solutions) must be employed. However, such approximations introduce additional error, which may be treated as a source of epistemic uncertainty that must be quantified to ensure rigor in the ultimate UQ result. We present a new approach to quantify the error (i.e., epistemic uncertainty) introduced by these low-fidelity model approximations. The approach (1) engineers features that are informative of the error using concepts related to dual-weighted residuals and rigorous error bounds, and (2) applies machine learning regression techniques (e.g., artificial neural networks, random forests, support vector machines) to construct a statistical model of the error from these features. We consider both (signed) errors in quantities of interest, as well as global state-space error norms. We present several examples to demonstrate the effectiveness of the proposed approach compared to more conventional feature and regression choices. In each of the examples, the predicted errors have a coefficient of determination value of at least 0.998.
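A minimal sketch of the regression step alone (synthetic features and errors standing in for the dual-weighted-residual features and true quantity-of-interest errors; sklearn's random forest as one of the regressors the abstract lists):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Fit a statistical model of the low-fidelity error from informative features.
rng = np.random.default_rng(0)
n = 500
features = rng.normal(size=(n, 4))            # e.g., residual-based indicators
error = features @ np.array([0.5, -1.0, 0.2, 0.0]) + 0.05 * rng.normal(size=n)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(features[:400], error[:400])
pred = model.predict(features[400:])
ss_res = np.sum((error[400:] - pred) ** 2)
ss_tot = np.sum((error[400:] - error[400:].mean()) ** 2)
print("R^2 on held-out points:", 1 - ss_res / ss_tot)
```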
The document discusses applications of machine learning for robot navigation and control. It describes how surrogate models can be used for predictive modeling in engineering applications like aircraft design. Dimension reduction techniques are used to reduce high-dimensional design parameters to a lower-dimensional space for faster surrogate model evaluation. For robot navigation, regression models on image manifolds are used for visual localization by mapping images to robot positions. Manifold learning is also applied to find low-dimensional representations of valid human hand poses from images to enable easier robot control.
Similar to A preliminary study of diversity in ELM ensembles (HAIS 2018) (20)
Talk for PyConES 2018 in Málaga on how to build song recommendation systems using Fourier analysis, unsupervised machine learning tools, and Python.
Sentiment analysis as a reputational indicator - Master's thesis (TFM), by Carlos Perales
Master's thesis (Mathematical Engineering, Universidad Complutense de Madrid) written by Carlos Perales. It deals with classification algorithms (machine learning) applied to tweets and financial reputation.
Can we predict whether Twitter will sink a bank?, by Carlos Perales
Talk from PyCon ES 2016 in Almería on text classifiers and reputational risk applied to the financial sector.
Sentiment analysis is a tool for exploring opinions about a product through the automatic scoring of social media messages. It was applied to Twitter to extract a metric of a financial institution's reputation and to estimate losses due to reputational risk.
Study and numerical simulation of the shallow water equations, by Carlos Perales
Bachelor's thesis by Carlos Perales González on the shallow water equations, a simplification of the Navier-Stokes equations.
Some of the videos mentioned in the results section can be viewed here: https://www.youtube.com/playlist?list=PLkjZXk8AWCPW18dZUjr093jvvx3NUvY5l
Registered with Safe Creative: https://www.safecreative.org/work/1507284738401-tfg-simulacion-de-aguas-poco-profundas
A numerical study of the Mach number, by Carlos Perales
This document presents the computation of the critical Mach number using the bisection and secant methods. The Mach number and the different flow regimes are defined. The equation to be solved is described and the functions are plotted. Matlab programs implement both numerical methods to approximate the solution, yielding values of 0.738 (bisection) and 0.7396 (secant) for a given pressure coefficient, with the secant method requiring fewer iterations.
Photovoltaic energy in Spain and the world (2004-2008), by Carlos Perales
A short study of photovoltaic production in Spain, the installed capacity, and the energy obtained, comparing the growth of this sector with that of other parts of the world.
Spread of a disease in dynamic populations, by Carlos Perales
A simplified study of the mathematical behavior of growing population models undergoing an epidemic. Written by Carlos Perales for a course in the Physics degree at the UCO (Universidad de Córdoba).
On Cherenkov radiation (presentation), by Carlos Perales
Presentation on the Cherenkov radiation work, extended here: http://www.slideshare.net/CarlosPerales/radiacin-cherenkov-carlos-perales-1 . Written by Carlos Perales for a course in the Physics degree at the UCO (Universidad de Córdoba).
On Cherenkov radiation and cosmic rays, by Carlos Perales
A theoretical foundation and a summary of the applications of Cherenkov radiation, especially in astrophysics. Written by Carlos Perales, a student at the Universidad de Córdoba (UCO), for the Physics degree.
Resumes, Cover Letters, and Applying Online, by Bruce Bennett
This webinar showcases resume styles and the elements that go into building your resume. Every job application requires unique skills, and this session will show you how to improve your resume to match the jobs to which you are applying, giving you the best chance of success when applying for a new position. Additionally, we will discuss cover letters and ideas to include in them. Learn how to take advantage of all the features when uploading a job application to a company's applicant tracking system.
5 Common Mistakes to Avoid During the Job Application Process, by Alliance Jobs
The journey toward landing your dream job can be both exhilarating and nerve-wracking. As you navigate through the intricate web of job applications, interviews, and follow-ups, it’s crucial to steer clear of common pitfalls that could hinder your chances. Let’s delve into some of the most frequent mistakes applicants make during the job application process and explore how you can sidestep them. Plus, we’ll highlight how Alliance Job Search can enhance your local job hunt.
Job Finding Apps Everything You Need to Know in 2024SnapJob
SnapJob is revolutionizing the way people connect with work opportunities and find talented professionals for their projects. Find your dream job with ease using the best job finding apps. Discover top-rated apps that connect you with employers, provide personalized job recommendations, and streamline the application process. Explore features, ratings, and reviews to find the app that suits your needs and helps you land your next opportunity.
A Guide to a Winning Interview June 2024Bruce Bennett
This webinar is an in-depth review of the interview process. Preparation is a key element to acing an interview. Learn the best approaches from the initial phone screen to the face-to-face meeting with the hiring manager. You will hear great answers to several standard questions, including the dreaded “Tell Me About Yourself”.
A preliminary study of diversity in ELM ensembles (HAIS 2018)
1. A preliminary study of diversity in Extreme Learning Machines ensembles
Carlos Perales-González¹, Mariano Carbonero-Ruz¹, David Becerra-Alonso¹, Francisco Fernández-Navarro¹
¹Universidad Loyola Andalucía
HAIS 2018
2. Overview
1 Introduction
Abstract
Extreme Learning Machine
2 Diverse ELM
Other ensembles
Diversity as metric
DELM loss function
3 Experiments, results and conclusions
Description
Results
Conclusions and future work
3. Abstract
In this paper, the neural network version of the Extreme Learning Machine (ELM) is used as a base learner for an ensemble meta-algorithm that promotes diversity explicitly in the ELM loss function. The proposed cost function encourages orthogonality (measured via the scalar product) in the parameter space. Other ensemble-based meta-algorithms from the AdaBoost family are used for comparison purposes. Both the accuracy and the diversity of our proposal are competitive, reinforcing the idea of introducing diversity explicitly.
4. ELM I
Extreme Learning Machine, a.k.a. ridge classification [1], [2], [3]. The first-layer connections are random; mathematically, the classification is a multi-output regression:
$$f(\mathbf{x}) = \mathbf{h}(\mathbf{x})\,\boldsymbol{\beta} \qquad (1)$$
[Figure: network diagram; input features $\mathbf{x}$, hidden mapping $\mathbf{h}(\mathbf{x})$, output weights $\boldsymbol{\beta}$, 1-of-J output.]
5. ELM II
where:
$\mathbf{x} \in \mathbb{R}^m$ is the vector of attributes, and $m$ is the dimension of the input space;
$\mathbf{h}: \mathbb{R}^m \to \mathbb{R}^d$ is the mapping function, and $d$ is the number of hidden nodes (the dimension of the transformed space);
$\boldsymbol{\beta} = (\boldsymbol{\beta}_j,\ j = 1, \dots, J) \in \mathbb{R}^{d \times J}$ is the ELM weight matrix.
The predicted label $y$ is obtained from the function $f(\mathbf{x})$ as
$$y(\mathbf{x}) = \arg\max_{j=1,\dots,J} f(\mathbf{x})_j. \qquad (2)$$
6. ELM III
We worked with the neural version of ELM, so $\mathbf{h}(\mathbf{x}_i)$ is defined as
$$\mathbf{h}(\mathbf{x}_i) = (\phi(\mathbf{x}_i; \mathbf{w}_j, b_j),\ j = 1, \dots, d), \qquad (3)$$
where
$$\phi(\cdot\,; \mathbf{w}_j, b_j): \mathbb{R}^m \to \mathbb{R} \qquad (4)$$
is the activation function of the $j$-th hidden node. In this case, a sigmoid
was used.
7. ELM IV
Learning problem: let us also denote $\mathbf{H} = (\mathbf{h}(\mathbf{x}_i),\ i = 1, \dots, n) \in \mathbb{R}^{n \times d}$ as the transformation of the training set, and $\mathbf{Y} \in \mathbb{R}^{n \times J}$ as the matrix of "1-of-J"-encoded labels:
$$\min_{\boldsymbol{\beta} \in \mathbb{R}^{d \times J}} \|\boldsymbol{\beta}\|^2 + C\,\|\mathbf{H}\boldsymbol{\beta} - \mathbf{Y}\|^2, \qquad (5)$$
where $C \in \mathbb{R}^+$ is a cross-validated regularization hyper-parameter. The matrix solution is
$$\boldsymbol{\beta} = \left(\frac{\mathbf{I}}{C} + \mathbf{H}^{T}\mathbf{H}\right)^{-1} \mathbf{H}^{T}\mathbf{Y}. \qquad (6)$$
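To make Eqs. (3)-(6) concrete, the following NumPy sketch trains and evaluates a neural ELM under the slides' assumptions (sigmoid activations, one-hot labels). All variable names (n_hidden, C, etc.) are ours, for illustration only.

```python
import numpy as np

def elm_fit(X, Y, n_hidden=50, C=1.0, rng=None):
    """Train a neural ELM: random first layer (Eq. 3), ridge solution (Eq. 6)."""
    rng = np.random.default_rng(rng)
    n, m = X.shape
    W = rng.normal(size=(m, n_hidden))      # random input weights w_j
    b = rng.normal(size=n_hidden)           # random biases b_j
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))  # sigmoid hidden layer, H in R^{n x d}
    # beta = (I/C + H^T H)^{-1} H^T Y   (Eq. 6)
    beta = np.linalg.solve(np.eye(n_hidden) / C + H.T @ H, H.T @ Y)
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Predicted label: arg max over the J outputs (Eq. 2)."""
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return np.argmax(H @ beta, axis=1)

# Tiny usage example with random data (3 classes, one-hot encoded)
X = np.random.default_rng(0).normal(size=(100, 5))
y = np.random.default_rng(1).integers(0, 3, size=100)
Y = np.eye(3)[y]
W, b, beta = elm_fit(X, Y, n_hidden=20, C=10.0, rng=0)
print((elm_predict(X, W, b, beta) == y).mean())  # training accuracy
```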
8. Other ensembles
The two main approaches to combining several classifiers into one predictive model:
Bagging (bootstrap aggregating): several versions of a base learner are trained on subsets sampled from the training set [4]. Random data sampling.
Boosting: combines base learners over several iterations to generate a weighted majority hypothesis [5]. Data sampling depends on performance.
9. Diversity as metric
These ensembles seek diversity through data sampling.
Our proposal: promote diversity explicitly using orthogonality. For two vectors $\mathbf{u}, \mathbf{v} \in \mathbb{R}^n$,
$$d(\mathbf{u}, \mathbf{v}) = 1 - \frac{\langle \mathbf{u}, \mathbf{v} \rangle^2}{\|\mathbf{u}\|^2 \,\|\mathbf{v}\|^2}. \qquad (7)$$
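Eq. (7) is one minus the squared cosine similarity: it is 1 for orthogonal vectors and 0 for collinear ones. A small NumPy check (our code, not from the paper):

```python
import numpy as np

def pairwise_diversity(u, v):
    """Eq. (7): 1 minus squared cosine similarity; 1 = orthogonal, 0 = collinear."""
    return 1.0 - np.dot(u, v) ** 2 / (np.dot(u, u) * np.dot(v, v))

print(pairwise_diversity(np.array([1.0, 0.0]), np.array([0.0, 2.0])))  # 1.0
print(pairwise_diversity(np.array([1.0, 1.0]), np.array([2.0, 2.0])))  # 0.0
```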
10. Our proposal I
Loss function for explicitly diverse ELM:
$$\min_{\boldsymbol{\beta}^{(s)} \in \mathbb{R}^{d \times J}} \frac{1}{2}\|\boldsymbol{\beta}^{(s)}\|^2 + C\,\|\mathbf{H}\boldsymbol{\beta}^{(s)} - \mathbf{Y}\|^2 + \left(D + \frac{n}{s}\right) \sum_{j=1}^{J} \sum_{k=1}^{s-1} \langle \boldsymbol{\beta}^{(s)}_j, \mathbf{u}^{(k)}_j \rangle^2 \qquad (8)$$
where:
$\mathbf{u}^{(k)} \in \mathbb{R}^{d \times J}$ is the column-by-column normalized $\boldsymbol{\beta}^{(k)}$ from iteration $k$ of the ensemble;
$D > 0$ is a hyperparameter, like $C$.
11. Our proposal II
Hence, $\boldsymbol{\beta}^{(s)}_j$ can be obtained analytically as
$$\boldsymbol{\beta}^{(s)}_j = \left(\frac{\mathbf{I}}{C} + \mathbf{H}^{T}\mathbf{H} + \frac{1}{C}\left(D + \frac{n}{s}\right)\mathbf{M}^{(s)}_j\right)^{-1} \mathbf{H}^{T}\mathbf{Y}_j, \quad j = 1, \dots, J, \qquad (9)$$
where $\mathbf{M}^{(s)}_j$ is defined as
$$\mathbf{M}^{(s)}_j \equiv \sum_{k=1}^{s-1} \mathbf{u}^{(k)}_j \mathbf{u}^{(k)T}_j. \qquad (10)$$
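The following sketch assembles Eqs. (9)-(10) into a training loop for the ensemble. Note that the $(D + n/s)$ weighting follows our reconstruction of the slide layout and should be checked against the paper; all names are illustrative.

```python
import numpy as np

def delm_fit(H, Y, S=5, C=1.0, D=0.1):
    """Sketch of the DELM ensemble update (Eqs. 9-10), as we read the slides.

    H: hidden-layer outputs, shape (n, d); Y: one-hot labels, shape (n, J).
    Returns the list of beta^(s) matrices, one per ensemble member.
    """
    n, d = H.shape
    J = Y.shape[1]
    HtH, HtY = H.T @ H, H.T @ Y
    betas, U = [], []                       # U[k][:, j] = normalized beta_j^(k)
    for s in range(1, S + 1):
        beta = np.empty((d, J))
        for j in range(J):
            # M_j^(s) = sum_k u_j^(k) u_j^(k)^T  (Eq. 10); zero for the first learner
            M = sum(np.outer(u[:, j], u[:, j]) for u in U) if U else np.zeros((d, d))
            A = np.eye(d) / C + HtH + (D + n / s) / C * M   # Eq. (9), reconstructed
            beta[:, j] = np.linalg.solve(A, HtY[:, j])
        betas.append(beta)
        U.append(beta / np.linalg.norm(beta, axis=0))       # column-wise normalization
    return betas
```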
12. Datasets I
To sum up:
The datasets were extracted from the UCI Machine Learning repository [6] and from mldata.org.
10 datasets, 5 of them with ≥ 1000 instances.
Experimental design using 10-fold cross-validation, with a nested 5-fold cross-validation for the hyperparameters (see the sketch after this list).
Nominal variables were transformed into binary variables.
Features were standardized.
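The nested cross-validation scheme might look as follows, using scikit-learn's KFold. Here fit, score and grid are placeholder callables and a placeholder hyperparameter grid, not code from the study:

```python
import numpy as np
from sklearn.model_selection import KFold

def nested_cv(X, y, fit, score, grid, outer_k=10, inner_k=5, seed=0):
    """10-fold outer CV; an inner 5-fold CV picks the hyperparameter per fold.

    fit(X, y, c) returns a trained model; score(model, X, y) returns accuracy.
    """
    outer = KFold(n_splits=outer_k, shuffle=True, random_state=seed)
    scores = []
    for tr, te in outer.split(X):
        inner = KFold(n_splits=inner_k, shuffle=True, random_state=seed)

        def inner_score(c):
            # Mean inner-CV score of hyperparameter value c on the outer-train split
            return np.mean([score(fit(X[tr][i], y[tr][i], c), X[tr][v], y[tr][v])
                            for i, v in inner.split(X[tr])])

        best = max(grid, key=inner_score)
        scores.append(score(fit(X[tr], y[tr], best), X[te], y[te]))
    return np.mean(scores)
```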
13. Datasets II
Table: Characteristics of the data sets, ordered by size and number of classes

Dataset          Size  #Attr.  #Classes  Class distribution
car              1728  21      4         (1210, 384, 69, 65)
winequality-red  1599  11      6         (10, 53, 681, 638, 199, 18)
ERA              1000  4       9         (92, 142, 181, 172, 158, 118, 88, 31, 18)
LEV              1000  4       5         (93, 280, 403, 197, 27)
SWD              1000  10      4         (32, 352, 399, 217)
newthyroid       215   5       3         (30, 150, 35)
automobile       205   71      6         (3, 22, 67, 54, 32, 27)
squash-stored    52    51      3         (23, 21, 8)
squash-unstored  52    52      3         (24, 24, 4)
pasture          36    25      3         (12, 12, 12)
14. Metrics I
Acc, accuracy rate: the number of successful hits relative to the total number of classifications,
$$\mathrm{Acc} = \frac{1}{n} \sum_{i=1}^{n} I(\tilde{y}(\mathbf{x}_i) = y_i). \qquad (11)$$
Div, diversity: for an ELM ensemble with $s$ individuals, this metric is obtained by applying Eq. (7) to the $\boldsymbol{\beta}^{(1)}, \dots, \boldsymbol{\beta}^{(s)}$ matrices:
$$d(\boldsymbol{\beta}^{(1)}, \dots, \boldsymbol{\beta}^{(s)}) = \frac{1}{J \binom{s}{2}} \sum_{k<l} \sum_{j=1}^{J} d(\boldsymbol{\beta}^{(k)}_j, \boldsymbol{\beta}^{(l)}_j). \qquad (12)$$
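A compact sketch of Eq. (12), averaging the pairwise metric of Eq. (7) over all member pairs and all J columns (our code, for illustration):

```python
import numpy as np
from itertools import combinations

def ensemble_diversity(betas):
    """Eq. (12): average pairwise column diversity (Eq. 7) over all member pairs."""
    def div(u, v):
        return 1.0 - np.dot(u, v) ** 2 / (np.dot(u, u) * np.dot(v, v))
    J = betas[0].shape[1]
    pairs = list(combinations(range(len(betas)), 2))   # the (s choose 2) pairs k < l
    total = sum(div(betas[k][:, j], betas[l][:, j])
                for k, l in pairs for j in range(J))
    return total / (J * len(pairs))
```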
15. Algorithms
Three boosting ensembles were compared against our proposal:
AELM: AdaBoost Extreme Learning Machine, defined in [7].
BRELM: Boosting Ridge Extreme Learning Machine, explained in [8].
NCELM: Negative Correlation Extreme Learning Machine, defined in [9].
DELM: Diverse Extreme Learning Machine, our proposal.
16. Results I
Accuracy (Acc)

Dataset          DELM      AELM      BRELM     NCELM
car              0.929711  0.834618  0.901805  0.905111
winequality-red  0.853687  0.840085  0.839670  0.837363
ERA              0.829479  0.822201  0.828019  0.828428
LEV              0.836345  0.786404  0.792371  0.798220
SWD              0.787940  0.764487  0.759893  0.760442
newthyroid       0.932035  0.817172  0.812035  0.819509
automobile       0.867376  0.834618  0.841636  0.846499
squash-stored    0.694286  0.751429  0.694063  0.711937
squash-unstored  0.814286  0.830952  0.813810  0.812381
pasture          0.833333  0.766667  0.811111  0.826667
20. Conclusions
Diversity is introduced explicitly while improving accuracy.
Improved diversity of solutions / classifiers.
Sensitivity analysis (this could help to establish a ranking of good solutions).
21. Future work
Reduce computational time:
Remove the hyperparameter D.
Reduce the number of matrix inversions.
Extend the experiments (more datasets and algorithms).
Study the performance on unbalanced data.
22. References I
[1] A. E. Hoerl and R. W. Kennard, "Ridge regression: Biased estimation for nonorthogonal problems," Technometrics, vol. 12, no. 1, pp. 55-67, 1970.
[2] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, "Extreme learning machine: theory and applications," Neurocomputing, vol. 70, no. 1-3, pp. 489-501, 2006.
[3] G.-B. Huang, H. Zhou, X. Ding, and R. Zhang, "Extreme learning machine for regression and multiclass classification," IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), vol. 42, no. 2, pp. 513-529, 2012.
[4] L. Breiman, "Bagging predictors," Machine Learning, vol. 24, pp. 123-140, 1996.
23. References II
[5] Y. Freund and R. E. Schapire, "A short introduction to boosting," Journal of Japanese Society for Artificial Intelligence, vol. 14, no. 5, pp. 771-780, 1999.
[6] D. Dheeru and E. Karra Taniskidou, "UCI machine learning repository," 2017.
[7] A. Riccardi, F. Fernández-Navarro, and S. Carloni, "Cost-sensitive AdaBoost algorithm for ordinal regression based on extreme learning machine," IEEE Transactions on Cybernetics, vol. 44, no. 10, pp. 1898-1909, 2014.
[8] Y. Ran, X. Sun, H. Sun, L. Sun, and X. Wang, "Boosting ridge extreme learning machine," Proceedings - 2012 IEEE Symposium on Robotics and Applications (ISRA 2012), pp. 881-884, 2012.
24. References III
[9] S. Wang, H. Chen, and X. Yao, "Negative correlation learning for classification ensembles," Proceedings of the International Joint Conference on Neural Networks, 2010.