ELM: Extreme Learning Machine: Learning Without Iterative Tuning

Guang-Bin Huang
Associate Professor
School of Electrical and Electronic Engineering
Nanyang Technological University, Singapore

Talks given at ELM2010 (Adelaide, Australia, 7 December 2010), HP Labs (Palo Alto, USA), Microsoft Research (Redmond, USA), UBC (Canada), Institute for InfoComm Research (Singapore), and EEE/NTU (Singapore).

ELM Web Portal: www.extreme-learning-machines.org
Outline

1 Feedforward Neural Networks
    Single-Hidden Layer Feedforward Networks (SLFNs)
    Function Approximation of SLFNs
    Classification Capability of SLFNs
    Conventional Learning Algorithms of SLFNs
    Support Vector Machines
2 Extreme Learning Machine
    Generalized SLFNs
    New Learning Theory: Learning Without Iterative Tuning
    ELM Algorithm
    Differences between ELM and LS-SVM
3 ELM and Conventional SVM
4 Online Sequential ELM
Feedforward Neural Networks with Additive Nodes

Output of hidden nodes:

    G(a_i, b_i, x) = g(a_i · x + b_i)    (1)

    a_i: the weight vector connecting the ith hidden node and the input nodes.
    b_i: the threshold of the ith hidden node.

Output of SLFNs:

    f_L(x) = Σ_{i=1}^{L} β_i G(a_i, b_i, x)    (2)

    β_i: the weight vector connecting the ith hidden node and the output nodes.

Figure 1: SLFN with additive hidden nodes.
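Equations (1) and (2) map directly onto a few lines of linear algebra. A minimal NumPy sketch (the sigmoid choice for g and all dimensions are illustrative, not taken from the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def slfn_output(X, A, b, beta):
    """Output of an SLFN with L additive hidden nodes.

    X: (N, n) inputs; A: (L, n) input weights a_i;
    b: (L,) thresholds b_i; beta: (L, m) output weights beta_i.
    """
    G = sigmoid(X @ A.T + b)   # (N, L): G(a_i, b_i, x_j) = g(a_i . x_j + b_i)
    return G @ beta            # (N, m): f_L(x) = sum_i beta_i G(a_i, b_i, x)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))                          # N=4 samples, n=3 inputs
A, b = rng.normal(size=(5, 3)), rng.normal(size=5)   # L=5 hidden nodes
beta = rng.normal(size=(5, 2))                       # m=2 outputs
print(slfn_output(X, A, b, beta).shape)              # (4, 2)
```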
Feedforward Neural Networks with RBF Nodes

Output of hidden nodes:

    G(a_i, b_i, x) = g(b_i ||x − a_i||)    (3)

    a_i: the center of the ith hidden node.
    b_i: the impact factor of the ith hidden node.

Output of SLFNs:

    f_L(x) = Σ_{i=1}^{L} β_i G(a_i, b_i, x)    (4)

    β_i: the weight vector connecting the ith hidden node and the output nodes.

Figure 2: Feedforward network architecture with RBF hidden nodes.
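Equation (3) only changes the hidden layer mapping; the output layer in (4) is unchanged. A hedged sketch of the RBF hidden node output (the Gaussian choice of g and all values are illustrative):

```python
import numpy as np

def rbf_hidden(X, centers, impact):
    """G(a_i, b_i, x) = g(b_i * ||x - a_i||) with Gaussian g(u) = exp(-u^2).

    X: (N, n) inputs; centers: (L, n) centers a_i; impact: (L,) factors b_i.
    """
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)  # (N, L)
    return np.exp(-(impact * dists) ** 2)

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 3))
G = rbf_hidden(X, centers=rng.normal(size=(6, 3)),
               impact=rng.uniform(0.5, 2.0, size=6))
print(G.shape)  # (4, 6); f_L(x) is then G @ beta, exactly as in the additive case
```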
Function Approximation of Neural Networks

Mathematical Model

Any continuous target function f(x) can be approximated by SLFNs with adjustable hidden nodes. In other words, given any small positive value ε, for SLFNs with a sufficient number of hidden nodes L we have

    ||f_L(x) − f(x)|| < ε    (5)

Learning Issue

In real applications, the target function f is usually unknown. One wishes that the unknown f could be approximated appropriately by an SLFN f_L.

Figure 3: SLFN.
Function Approximation of Neural Networks

Learning Model

For N arbitrary distinct samples (x_i, t_i) ∈ R^n × R^m, SLFNs with L hidden nodes and activation function g(x) are mathematically modeled as

    f_L(x_j) = o_j,  j = 1, ..., N    (6)

Cost function:

    E = Σ_{j=1}^{N} ||o_j − t_j||_2

The target is to minimize the cost function E by adjusting the network parameters β_i, a_i, b_i.

Figure 4: SLFN.
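The cost E above is just the summed norm of the residuals between the network outputs o_j and the targets t_j. As a sketch (the toy outputs and targets are made up for illustration):

```python
import numpy as np

def slfn_cost(O, T):
    """E = sum_j ||o_j - t_j||_2 for network outputs O (N, m) and targets T (N, m)."""
    return np.linalg.norm(O - T, axis=1).sum()

O = np.array([[1.0, 0.0], [0.0, 1.0]])   # network outputs o_j
T = np.array([[1.0, 0.0], [0.0, 0.0]])   # targets t_j
print(slfn_cost(O, T))  # 1.0: only the second sample contributes ||(0,1)-(0,0)||
```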
Classification Capability of SLFNs

As long as SLFNs can approximate any continuous target function f(x), such SLFNs can differentiate any disjoint regions.

Figure 5: SLFN.

G.-B. Huang, et al., "Classification Ability of Single Hidden Layer Feedforward Neural Networks," IEEE Transactions on Neural Networks, vol. 11, no. 3, pp. 799-801, 2000.
Learning Algorithms of Neural Networks

Learning Methods

Many learning methods, mainly based on gradient-descent/iterative approaches, have been developed over the past two decades:

    Back-Propagation (BP) and its variants are the most popular.
    Least-squares (LS) solutions for RBF networks, with a single impact factor for all hidden nodes.

Figure 6: Feedforward network architecture.
Advantages and Disadvantages

Popularity

Widely used in various applications: regression, classification, etc.

Limitations

    Different learning algorithms are usually needed for different SLFN architectures.
    Some parameters have to be tuned manually.
    Overfitting.
    Local minima.
    Time-consuming training.
Support Vector Machine

SVM optimization formula:

    Minimize:   L_P = (1/2)||w||² + C Σ_{i=1}^{N} ξ_i
    Subject to: t_i (w · φ(x_i) + b) ≥ 1 − ξ_i,  i = 1, ..., N    (7)
                ξ_i ≥ 0,  i = 1, ..., N

The decision function of SVM is:

    f(x) = sign( Σ_{s=1}^{N_s} α_s t_s K(x, x_s) + b )    (8)

Figure 7: SVM architecture.

B. Frenay and M. Verleysen, "Using SVMs with Randomised Feature Spaces: an Extreme Learning Approach," ESANN, Bruges, Belgium, pp. 315-320, 28-30 April, 2010.

G.-B. Huang, et al., "Optimization Method Based Extreme Learning Machine for Classification," Neurocomputing, vol. 74, pp. 155-163, 2010.
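Once the support vectors, multipliers α_s, labels t_s, and bias b of a trained SVM are known, the decision function (8) can be evaluated directly. A sketch with an RBF kernel (the kernel choice and the hypothetical two-support-vector model below are illustrative, not from a trained model):

```python
import numpy as np

def svm_decision(x, support_vecs, alphas, labels, b, gamma=1.0):
    """f(x) = sign(sum_s alpha_s t_s K(x, x_s) + b) with an RBF kernel."""
    K = np.exp(-gamma * np.sum((support_vecs - x) ** 2, axis=1))  # K(x, x_s)
    return np.sign(alphas @ (labels * K) + b)

# Hypothetical two-support-vector model
sv = np.array([[0.0, 0.0], [2.0, 2.0]])
alphas = np.array([0.7, 0.7])
labels = np.array([1.0, -1.0])
print(svm_decision(np.array([0.1, 0.0]), sv, alphas, labels, b=0.0))  # 1.0
```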
Generalized SLFNs

General Hidden Layer Mapping

Output function: f(x) = Σ_{i=1}^{L} β_i G(a_i, b_i, x)

The hidden layer output function (hidden layer mapping):

    h(x) = [G(a_1, b_1, x), ..., G(a_L, b_L, x)]

The output function needn't be:

    Sigmoid: G(a_i, b_i, x) = g(a_i · x + b_i)
    RBF: G(a_i, b_i, x) = g(b_i ||x − a_i||)

Figure 8: SLFN with any type of piecewise continuous G(a_i, b_i, x).

G.-B. Huang, et al., "Universal Approximation Using Incremental Constructive Feedforward Networks with Random Hidden Nodes," IEEE Transactions on Neural Networks, vol. 17, no. 4, pp. 879-892, 2006.

G.-B. Huang, et al., "Convex Incremental Extreme Learning Machine," Neurocomputing, vol. 70, pp. 3056-3062, 2007.
New Learning Theory: Learning Without Iterative Tuning

New Learning View

Learning Without Iterative Tuning: given any nonconstant piecewise continuous function g, if a continuous target function f(x) can be approximated by SLFNs with adjustable hidden nodes g, then the hidden node parameters of such SLFNs needn't be tuned.

All these hidden node parameters can be randomly generated without knowledge of the training data. That is, for any continuous target function f and any randomly generated sequence {(a_i, b_i)}_{i=1}^{L},

    lim_{L→∞} ||f(x) − f_L(x)|| = lim_{L→∞} ||f(x) − Σ_{i=1}^{L} β_i G(a_i, b_i, x)|| = 0

holds with probability one if β_i is chosen to minimize ||f(x) − f_L(x)||, i = 1, ..., L.

G.-B. Huang, et al., "Universal Approximation Using Incremental Constructive Feedforward Networks with Random Hidden Nodes," IEEE Transactions on Neural Networks, vol. 17, no. 4, pp. 879-892, 2006.

G.-B. Huang, et al., "Convex Incremental Extreme Learning Machine," Neurocomputing, vol. 70, pp. 3056-3062, 2007.

G.-B. Huang, et al., "Enhanced Random Search Based Incremental Extreme Learning Machine," Neurocomputing, vol. 71, pp. 3460-3468, 2008.
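The claim can be checked empirically: generate hidden node parameters randomly, choose β by least squares, and watch the approximation error of a known target fall as L grows. A sketch (the sigmoid nodes and sine target are illustrative choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(42)
x = np.linspace(0, 2 * np.pi, 200)[:, None]   # inputs
f = np.sin(x[:, 0])                           # known continuous target

def random_slfn_error(L):
    # Randomly generate (a_i, b_i) without looking at the data,
    # then choose beta to minimize ||f - f_L|| (least squares).
    a, b = rng.normal(size=(1, L)), rng.normal(size=L)
    G = 1.0 / (1.0 + np.exp(-(x @ a + b)))    # (200, L) hidden outputs
    beta, *_ = np.linalg.lstsq(G, f, rcond=None)
    return np.linalg.norm(f - G @ beta)

errs = [random_slfn_error(L) for L in (2, 10, 50)]
print(errs)  # error typically shrinks as L grows
```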
Unified Learning Platform

Mathematical Model

For N arbitrary distinct samples (x_i, t_i) ∈ R^n × R^m, SLFNs with L hidden nodes, each with output function G(a_i, b_i, x), are mathematically modeled as

    Σ_{i=1}^{L} β_i G(a_i, b_i, x_j) = t_j,  j = 1, ..., N    (9)

    (a_i, b_i): hidden node parameters.
    β_i: the weight vector connecting the ith hidden node and the output node.

Figure 9: Generalized SLFN with any type of piecewise continuous G(a_i, b_i, x).
Extreme Learning Machine (ELM)

Mathematical Model

Σ_{i=1}^{L} β_i G(a_i, b_i, x_j) = t_j, j = 1, ..., N, is equivalent to Hβ = T, where

    H = [ h(x_1) ]   [ G(a_1, b_1, x_1)  ...  G(a_L, b_L, x_1) ]
        [   ...  ] = [        ...        ...         ...       ]    (10)
        [ h(x_N) ]   [ G(a_1, b_1, x_N)  ...  G(a_L, b_L, x_N) ]  (N × L)

    β = [ β_1^T ; ... ; β_L^T ]  (L × m)   and   T = [ t_1^T ; ... ; t_N^T ]  (N × m)    (11)

H is called the hidden layer output matrix of the neural network; the ith column of H is the output of the ith hidden node with respect to inputs x_1, x_2, ..., x_N.
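Concretely, H in (10) is computed by evaluating the (random) hidden node function on every training input. A sketch with sigmoid additive nodes (the dimensions are illustrative):

```python
import numpy as np

def hidden_matrix(X, A, b):
    """Hidden layer output matrix H (eq. 10): H[j, i] = G(a_i, b_i, x_j)."""
    return 1.0 / (1.0 + np.exp(-(X @ A.T + b)))   # (N, L)

rng = np.random.default_rng(0)
N, n, L = 8, 3, 5
H = hidden_matrix(rng.normal(size=(N, n)),
                  rng.normal(size=(L, n)), rng.normal(size=L))
print(H.shape)  # (8, 5): row j is h(x_j), column i is node i's output on all inputs
```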
Extreme Learning Machine (ELM)

Three-Step Learning Model

Given a training set ℵ = {(x_i, t_i) | x_i ∈ R^n, t_i ∈ R^m, i = 1, ..., N}, a hidden node output function G(a, b, x), and the number of hidden nodes L:

    1 Randomly assign hidden node parameters (a_i, b_i), i = 1, ..., L.
    2 Calculate the hidden layer output matrix H.
    3 Calculate the output weights β: β = H† T,

where H† is the Moore-Penrose generalized inverse of the hidden layer output matrix H.

Source Codes of ELM: http://www.extreme-learning-machines.org
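The three steps translate almost line for line into code. A minimal regression sketch (the sigmoid node type and toy target are illustrative; the official implementations live at the portal above):

```python
import numpy as np

def elm_train(X, T, L, rng):
    """Three-step ELM: random (a_i, b_i), build H, solve beta = pinv(H) @ T."""
    A = rng.normal(size=(L, X.shape[1]))          # step 1: random input weights a_i
    b = rng.normal(size=L)                        # step 1: random thresholds b_i
    H = 1.0 / (1.0 + np.exp(-(X @ A.T + b)))      # step 2: hidden layer output matrix
    beta = np.linalg.pinv(H) @ T                  # step 3: Moore-Penrose solution
    return A, b, beta

def elm_predict(X, A, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ A.T + b)))
    return H @ beta

rng = np.random.default_rng(0)
X = np.linspace(-1, 1, 100)[:, None]
T = X[:, 0] ** 2                                  # toy regression target
A, b, beta = elm_train(X, T, L=20, rng=rng)
print(np.abs(elm_predict(X, A, b, beta) - T).max())  # small training error
```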
47. Feedforward Neural Networks
Generalized SLFNs
ELM
ELM Learning Theory
ELM and SVM
ELM Algorithm
OS-ELM
ELM and LS-SVM
Summary
ELM Learning Algorithm
Salient Features
“Simple Math is Enough.” ELM is a simple, tuning-free, three-step algorithm.
The learning speed of ELM is extremely fast.
The hidden node parameters ai and bi are not only independent of the training data but also of each other.
Unlike conventional learning methods, which must see the training data before generating the hidden node parameters, ELM can generate the hidden node parameters before seeing the training data.
Unlike traditional gradient-based learning algorithms, which work only for differentiable activation functions, ELM works for all bounded nonconstant piecewise continuous activation functions.
Unlike traditional gradient-based learning algorithms, which face issues such as local minima, improper learning rates, and overfitting, ELM tends to reach its solutions directly without such issues.
The ELM learning algorithm is much simpler than other popular learning algorithms for neural networks and support vector machines.
G.-B. Huang, et al., “Can Threshold Networks Be Trained Directly?”, IEEE Transactions on Circuits and Systems II, vol. 53, no. 3, pp. 187-191, 2006.
M.-B. Li, et al., “Fully Complex Extreme Learning Machine”, Neurocomputing, vol. 68, pp. 306-314, 2005.
48. Output Functions of Generalized SLFNs
Ridge regression theory based ELM
f(x) = h(x)β = h(x)H^T (HH^T)^{-1} T  ⇒  f(x) = h(x)H^T (I/C + HH^T)^{-1} T

and

f(x) = h(x)β = h(x) (H^T H)^{-1} H^T T  ⇒  f(x) = h(x) (I/C + H^T H)^{-1} H^T T
Ridge Regression Theory
A positive value I/C can be added to the diagonal of H^T H or HH^T when computing the Moore-Penrose generalized inverse of H; the resulting solution is more stable and tends to give better generalization performance.
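A short NumPy sketch of the ridge-regularized solution: adding I/C to the diagonal before inverting, in both the HH^T (N×N) and H^T H (L×L) forms. The variable names and the tanh hidden layer are assumptions for illustration; by the push-through identity the two forms yield the same output weights.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 3))          # N = 100 samples, n = 3 inputs
T = rng.standard_normal((100, 1))          # targets
A, b = rng.standard_normal((3, 40)), rng.standard_normal(40)
H = np.tanh(X @ A + b)                     # hidden layer output matrix, L = 40
C = 100.0

# beta from f(x) = h(x) H^T (I/C + H H^T)^{-1} T   (N x N system)
beta1 = H.T @ np.linalg.solve(np.eye(H.shape[0]) / C + H @ H.T, T)
# beta from f(x) = h(x) (I/C + H^T H)^{-1} H^T T   (L x L system)
beta2 = np.linalg.solve(np.eye(H.shape[1]) / C + H.T @ H, H.T @ T)

print(np.allclose(beta1, beta2))  # True: the two ridge solutions coincide
```

In practice one solves whichever system is smaller: N×N when N < L, L×L otherwise.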
A. E. Hoerl and R. W. Kennard, “Ridge regression: Biased estimation for nonorthogonal problems”, Technometrics,
vol. 12, no. 1, pp. 55-67, 1970.
Citation: G.-B. Huang, H. Zhou, X. Ding and R. Zhang, “Extreme Learning Machine for Regression and Multi-Class Classification”, submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence, October 2010.
50. Output Functions of Generalized SLFNs
Valid for both kernel and non-kernel learning

1 Non-kernel based:

f(x) = h(x)H^T (I/C + HH^T)^{-1} T

and

f(x) = h(x) (I/C + H^T H)^{-1} H^T T

2 Kernel based (if h(x) is unknown):

f(x) = [K(x, x1), · · · , K(x, xN)] (I/C + Ω_ELM)^{-1} T

where (Ω_ELM)i,j = h(xi) · h(xj) = K(xi, xj)
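The kernel-based form can be sketched as follows. The Gaussian (RBF) kernel standing in for K, the toy XOR-style labels, and all names here are assumptions for the example, not part of the original slides.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian kernel matrix K[i, j] = exp(-gamma * ||A_i - B_j||^2)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
Xtr = rng.standard_normal((60, 2))
T = (Xtr[:, :1] * Xtr[:, 1:] > 0).astype(float) * 2 - 1   # toy +/-1 labels, XOR-like
C = 1000.0

Omega = rbf_kernel(Xtr, Xtr)                              # (Omega_ELM)_ij = K(x_i, x_j)
alpha = np.linalg.solve(np.eye(len(Xtr)) / C + Omega, T)  # (I/C + Omega_ELM)^{-1} T

def f(Xnew):
    # f(x) = [K(x, x_1), ..., K(x, x_N)] (I/C + Omega_ELM)^{-1} T
    return rbf_kernel(Xnew, Xtr) @ alpha

pred = np.sign(f(Xtr))
print((pred == T).mean())  # training accuracy on the toy data
```

Note that h(x) never appears explicitly: only the kernel values K(x, xi) are needed, exactly as in the slide's formula.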
Citation: G.-B. Huang, H. Zhou, X. Ding and R. Zhang, “Extreme Learning Machine for Regression and Multi-Class
Classification”, submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence, October 2010.
54. Differences between ELM and LS-SVM
ELM (unified for regression and binary/multi-class cases):

1 Non-kernel based:

f(x) = h(x)H^T (I/C + HH^T)^{-1} T

and

f(x) = h(x) (I/C + H^T H)^{-1} H^T T

2 Kernel based (if h(x) is unknown):

f(x) = [K(x, x1), · · · , K(x, xN)] (I/C + Ω_ELM)^{-1} T

where (Ω_ELM)i,j = h(xi) · h(xj) = K(xi, xj)

LS-SVM (for the binary-class case) solves the linear system

[ 0   T^T              ] [ b ]   [ 0   T^T          ] [ b ]   [ 0 ]
[ T   I/C + Ω_LS-SVM   ] [ α ] = [ T   I/C + ZZ^T   ] [ α ] = [ 1 ]

where

Z = [t1 φ(x1); · · · ; tN φ(xN)]  and  Ω_LS-SVM = ZZ^T

For details, refer to: G.-B. Huang, H. Zhou, X. Ding and R. Zhang, “Extreme Learning Machine for Regression and Multi-Class Classification”, submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence, October 2010.
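For comparison, here is a minimal sketch of the LS-SVM dual system above for the binary linear-kernel case, solving for (b, α) in a single linear solve. The variable names and the toy data are illustrative assumptions, not code from any LS-SVM toolbox.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((50, 2))
t = np.sign(X[:, 0] + X[:, 1])     # toy linearly separable labels in {-1, +1}
C = 10.0
N = len(X)

# Omega_LS-SVM = Z Z^T, where row i of Z is t_i * phi(x_i) (linear kernel: phi = identity)
Z = t[:, None] * X
Omega = Z @ Z.T

# Assemble the (N+1) x (N+1) KKT system [0, t^T; t, I/C + Omega] [b; alpha] = [0; 1]
M = np.zeros((N + 1, N + 1))
M[0, 1:] = t
M[1:, 0] = t
M[1:, 1:] = np.eye(N) / C + Omega
rhs = np.concatenate(([0.0], np.ones(N)))

sol = np.linalg.solve(M, rhs)
b, alpha = sol[0], sol[1:]

# Decision function: f(x) = sum_i alpha_i t_i K(x_i, x) + b
pred = np.sign((alpha * t) @ (X @ X.T) + b)
print((pred == t).mean())  # training accuracy
```

The structural difference is visible: LS-SVM couples b and α through one constrained N+1 system, while unified ELM needs only a single regularized inverse and no bias term b.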
55. ELM Classification Boundaries
[Figure 10: ELM classification boundary on the XOR problem. Figure 11: ELM classification boundary on the banana case. Both plots span [−2, 2] × [−2, 2].]