                    Extreme Learning Machine:
                  Learning Without Iterative Tuning

                                    Guang-Bin HUANG

                                      Associate Professor
                         School of Electrical and Electronic Engineering
                          Nanyang Technological University, Singapore

     Talks in ELM2010 (Adelaide, Australia, Dec 7 2010), HP Labs (Palo Alto,
     USA), Microsoft Research (Redmond, USA), UBC (Canada), Institute for
              InfoComm Research (Singapore), EEE/NTU (Singapore)




ELM Web Portal: www.extreme-learning-machines.org   Learning Without Iterative Tuning
Outline
    1   Feedforward Neural Networks
          Single-Hidden Layer Feedforward Networks (SLFNs)
          Function Approximation of SLFNs
          Classification Capability of SLFNs
          Conventional Learning Algorithms of SLFNs
          Support Vector Machines
    2   Extreme Learning Machine
          Generalized SLFNs
          New Learning Theory: Learning Without Iterative Tuning
          ELM Algorithm
          Differences between ELM and LS-SVM
    3   ELM and Conventional SVM
    4   Online Sequential ELM
Feedforward Neural Networks   SLFN Models
                                              ELM    Function Approximation
                                     ELM and SVM     Classification Capability
                                           OS-ELM    Learning Methods
                                          Summary    SVM


Feedforward Neural Networks with Additive Nodes


Output of hidden nodes

    G(a_i, b_i, x) = g(a_i · x + b_i)                                    (1)

a_i: the weight vector connecting the ith hidden node and the input nodes.
b_i: the threshold of the ith hidden node.

Output of SLFNs

    f_L(x) = Σ_{i=1}^{L} β_i G(a_i, b_i, x)                              (2)

β_i: the weight vector connecting the ith hidden node and the output nodes.

Figure 1: SLFN: additive hidden nodes
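A minimal sketch of Eqs. (1)-(2) in code, assuming a sigmoid for the activation g (the slides leave g generic); all function and variable names here are illustrative, not from the authors:

```python
import numpy as np

def g(z):
    """Sigmoid activation applied to the additive node input a_i . x + b_i."""
    return 1.0 / (1.0 + np.exp(-z))

def slfn_additive(x, a, b, beta):
    """f_L(x) = sum_{i=1}^{L} beta_i * g(a_i . x + b_i)   -- Eq. (2).

    x:    (n,)   input vector
    a:    (L, n) input weights, one row per hidden node
    b:    (L,)   hidden-node thresholds
    beta: (L, m) output weights, one row per hidden node
    """
    h = g(a @ x + b)      # (L,) hidden-node outputs G(a_i, b_i, x) -- Eq. (1)
    return h @ beta       # (m,) network output

rng = np.random.default_rng(0)
L, n, m = 5, 3, 2
a = rng.standard_normal((L, n))
b = rng.standard_normal(L)
beta = rng.standard_normal((L, m))
x = rng.standard_normal(n)
y = slfn_additive(x, a, b, beta)
print(y.shape)  # (2,)
```

Note that the hidden layer maps the n-dimensional input to an L-dimensional feature vector, and the output layer is purely linear in β.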
Feedforward Neural Networks with RBF Nodes


Output of hidden nodes

    G(a_i, b_i, x) = g(b_i ‖x − a_i‖)                                    (3)

a_i: the center of the ith hidden node.
b_i: the impact factor of the ith hidden node.

Output of SLFNs

    f_L(x) = Σ_{i=1}^{L} β_i G(a_i, b_i, x)                              (4)

β_i: the weight vector connecting the ith hidden node and the output nodes.

Figure 2: Feedforward Network Architecture: RBF hidden nodes
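The RBF variant of Eqs. (3)-(4) can be sketched the same way; the Gaussian g(z) = exp(−z²) below is a common illustrative choice, since the slides leave g unspecified, and the names are hypothetical:

```python
import numpy as np

def slfn_rbf(x, centers, impacts, beta):
    """f_L(x) = sum_{i=1}^{L} beta_i * g(b_i * ||x - a_i||)   -- Eqs. (3)-(4).

    centers: (L, n) node centers a_i
    impacts: (L,)   impact factors b_i
    beta:    (L, m) output weights
    """
    d = np.linalg.norm(x - centers, axis=1)   # ||x - a_i|| for each node
    h = np.exp(-(impacts * d) ** 2)           # Gaussian g(b_i ||x - a_i||)
    return h @ beta                           # (m,) network output

rng = np.random.default_rng(0)
L, n, m = 4, 2, 1
centers = rng.standard_normal((L, n))
impacts = np.abs(rng.standard_normal(L))      # impact factors kept positive
beta = rng.standard_normal((L, m))
out = slfn_rbf(np.zeros(n), centers, impacts, beta)
print(out.shape)  # (1,)
```

Compared with the additive node, the only change is the hidden-layer map: a distance to a center scaled by an impact factor, rather than an inner product plus a threshold.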
Function Approximation of Neural Networks


Mathematical Model

Any continuous target function f(x) can be approximated by SLFNs with
adjustable hidden nodes. In other words, given any small positive value ε,
for SLFNs with a sufficiently large number of hidden nodes L we have

    ‖f_L(x) − f(x)‖ < ε                                                  (5)

Learning Issue

In real applications, the target function f is usually unknown. One wishes
that the unknown f could be approximated appropriately by an SLFN f_L.

Figure 3: SLFN.
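A toy numerical illustration of Eq. (5): the achievable approximation error shrinks as the number of hidden nodes L grows. In this sketch the hidden parameters (a_i, b_i) are drawn at random and only β is fitted by least squares — a simple illustrative scheme (foreshadowing the ELM idea later in these slides), not the conventional tuning-based algorithms discussed next:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)   # sample inputs
t = np.sin(x)                                        # target f(x) = sin(x)

def fit_error(L):
    """Max |f_L(x) - f(x)| over the samples for an SLFN with L hidden nodes."""
    a = rng.standard_normal((L, 1))                  # random input weights a_i
    b = rng.standard_normal(L)                       # random thresholds b_i
    H = 1.0 / (1.0 + np.exp(-(x @ a.T + b)))         # hidden outputs G(a_i, b_i, x)
    beta, *_ = np.linalg.lstsq(H, t, rcond=None)     # least-squares output weights
    return float(np.max(np.abs(H @ beta - t)))

print(fit_error(2), fit_error(50))   # the error drops sharply as L grows
```

So for any target ε there is some L beyond which the fitted network stays within ε of f on the sampled inputs, matching the statement of Eq. (5).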
Function Approximation of Neural Networks


Learning Model

For N arbitrary distinct samples (x_j, t_j) ∈ R^n × R^m, SLFNs with L
hidden nodes and activation function g(x) are mathematically modeled as

    f_L(x_j) = o_j,  j = 1, ··· , N                                      (6)

Cost function:

    E = Σ_{j=1}^{N} ‖o_j − t_j‖₂

The target is to minimize the cost function E by adjusting the network
parameters: β_i, a_i, b_i.

Figure 4: SLFN.
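The cost E above can be sketched directly; the function and variable names are illustrative only:

```python
import numpy as np

def cost(outputs, targets):
    """E = sum_{j=1}^{N} ||o_j - t_j||_2 over N training samples.

    outputs, targets: (N, m) arrays of network outputs o_j and targets t_j.
    """
    return float(np.sum(np.linalg.norm(outputs - targets, axis=1)))

o = np.array([[1.0, 0.0], [0.0, 1.0]])   # network outputs o_1, o_2
t = np.array([[1.0, 0.0], [0.0, 0.0]])   # targets t_1, t_2
print(cost(o, t))  # 1.0: only the second sample is off, by unit distance
```

Conventional learning drives E toward zero by iteratively adjusting all of β_i, a_i, b_i; the ELM sections that follow argue the hidden parameters need not be tuned at all.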
Feedforward Neural Networks   SLFN Models
                                              ELM    Function Approximation
                                     ELM and SVM     Classification Capability
                                           OS-ELM    Learning Methods
                                          Summary    SVM


Function Approximation of Neural Networks


                                                              Learning Model

                                                                      For N arbitrary distinct samples
                                                                      (xi , ti ) ∈ Rn × Rm , SLFNs with L
                                                                      hidden nodes and activation function
                                                                      g(x) are mathematically modeled as

                                                                           fL (xj ) = oj , j = 1, · · · , N    (6)


                                                                                           PN      ‚        ‚
                                                                      Cost function: E =           ‚oj − tj ‚ .
                                                                                                   ‚        ‚
                                                                                             j=1              2

                                                                      The target is to minimize the cost          tu-logo
                                                                      function E by adjusting the network
                                                                      parameters: β i , ai , bi .
                    Figure 4:    SLFN.
                                                                                                              ur-logo




 ELM Web Portal: www.extreme-learning-machines.org   Learning Without Iterative Tuning
Feedforward Neural Networks   SLFN Models
                                              ELM    Function Approximation
                                     ELM and SVM     Classification Capability
                                           OS-ELM    Learning Methods
                                          Summary    SVM


Function Approximation of Neural Networks


                                                              Learning Model

                                                                      For N arbitrary distinct samples
                                                                      (xi , ti ) ∈ Rn × Rm , SLFNs with L
                                                                      hidden nodes and activation function
                                                                      g(x) are mathematically modeled as

                                                                           fL (xj ) = oj , j = 1, · · · , N    (6)


                                                                                           PN      ‚        ‚
                                                                      Cost function: E =           ‚oj − tj ‚ .
                                                                                                   ‚        ‚
                                                                                             j=1              2

                                                                      The target is to minimize the cost          tu-logo
                                                                      function E by adjusting the network
                                                                      parameters: β i , ai , bi .
                    Figure 4:    SLFN.
                                                                                                              ur-logo




 ELM Web Portal: www.extreme-learning-machines.org   Learning Without Iterative Tuning
Feedforward Neural Networks   SLFN Models
                                              ELM    Function Approximation
                                     ELM and SVM     Classification Capability
                                           OS-ELM    Learning Methods
                                          Summary    SVM


Function Approximation of Neural Networks


                                                              Learning Model

                                                                      For N arbitrary distinct samples
                                                                      (xi , ti ) ∈ Rn × Rm , SLFNs with L
                                                                      hidden nodes and activation function
                                                                      g(x) are mathematically modeled as

                                                                           fL (xj ) = oj , j = 1, · · · , N    (6)


                                                                                           PN      ‚        ‚
                                                                      Cost function: E =           ‚oj − tj ‚ .
                                                                                                   ‚        ‚
                                                                                             j=1              2

                                                                      The target is to minimize the cost          tu-logo
                                                                      function E by adjusting the network
                                                                      parameters: β i , ai , bi .
                    Figure 4:    SLFN.
                                                                                                              ur-logo




 ELM Web Portal: www.extreme-learning-machines.org   Learning Without Iterative Tuning
Feedforward Neural Networks   SLFN Models
                                              ELM    Function Approximation
                                     ELM and SVM     Classification Capability
                                           OS-ELM    Learning Methods
                                          Summary    SVM


Outline
    1   Feedforward Neural Networks
          Single-Hidden Layer Feedforward Networks (SLFNs)
          Function Approximation of SLFNs
          Classification Capability of SLFNs
          Conventional Learning Algorithms of SLFNs
          Support Vector Machines
    2   Extreme Learning Machine
          Generalized SLFNs
          New Learning Theory: Learning Without Iterative Tuning
          ELM Algorithm                                                                   tu-logo
          Differences between ELM and LS-SVM
    3   ELM and Conventional SVM                                                         ur-logo
    4   Online Sequential ELM


Classification Capability of SLFNs



                                                                         As long as SLFNs can
                                                                         approximate any
                                                                         continuous target function
                                                                         f(x), such SLFNs can also
                                                                         separate any disjoint
                                                                         decision regions.

                      Figure 5:       SLFN.

   G.-B. Huang, et al., “Classification Ability of Single Hidden Layer Feedforward Neural Networks,” IEEE Transactions
   on Neural Networks, vol. 11, no. 3, pp. 799-801, 2000.




Learning Algorithms of Neural Networks



                                                                  Learning Methods

                                                                        Many learning methods, mainly based
                                                                        on gradient-descent/iterative
                                                                        approaches, have been developed over
                                                                        the past two decades.
                                                                        Back-Propagation (BP) and its variants
                                                                        are the most popular.
                                                                        Least-squares (LS) solutions for RBF
                                                                        networks, with a single impact factor
                                                                        for all hidden nodes.


        Figure 6:    Feedforward Network Architecture.






Advantages and Disadvantages


  Popularity

          Widely used in various applications: regression, classification, etc.



  Limitations

          Usually, different learning algorithms are needed for different SLFN architectures.
          Some parameters have to be tuned manually.
          Overfitting.
          Local minima.
          Time-consuming.










Support Vector Machine

                                                     SVM optimization formulation:

                                                        Minimize:    L_P = (1/2)‖w‖² + C Σ_{i=1}^{N} ξ_i                        (7)
                                                        Subject to:  t_i (w · φ(x_i) + b) ≥ 1 − ξ_i ,   i = 1, · · · , N
                                                                     ξ_i ≥ 0 ,   i = 1, · · · , N


                                                     The decision function of SVM is:

                                                        f(x) = sign( Σ_{s=1}^{N_s} α_s t_s K(x, x_s) + b )                      (8)

      Figure 7:       SVM Architecture.
   B. Frénay and M. Verleysen, “Using SVMs with Randomised Feature Spaces: an Extreme Learning Approach,”
   ESANN, Bruges, Belgium, pp. 315-320, 28-30 April, 2010.

   G.-B. Huang, et al., “Optimization Method Based Extreme Learning Machine for Classification,” Neurocomputing,
   vol. 74, pp. 155-163, 2010.

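The decision function (8) can be sketched directly; the support vectors, multipliers α_s, targets t_s, bias b, and the Gaussian kernel choice below are all hypothetical, not values from a trained SVM:

```python
import numpy as np

def rbf_kernel(x, xs, gamma=1.0):
    # K(x, x_s) = exp(-gamma * ||x - x_s||^2)
    return np.exp(-gamma * np.sum((x - xs) ** 2))

def svm_decision(x, support_vectors, alphas, targets, b, gamma=1.0):
    # Eq. (8): f(x) = sign( sum_s alpha_s * t_s * K(x, x_s) + b )
    s = sum(a * t * rbf_kernel(x, xs, gamma)
            for a, t, xs in zip(alphas, targets, support_vectors))
    return 1 if s + b >= 0 else -1

# Two hypothetical support vectors, one per class
svs = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
alphas, targets, b = [1.0, 1.0], [1, -1], 0.0

print(svm_decision(np.array([0.2, 0.1]), svs, alphas, targets, b))  # → 1 (near the +1 support vector)
```

In a trained SVM the α_s and b come from solving (7); only the support vectors (ξ-active or margin samples) carry nonzero α_s.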
Feedforward Neural Networks
                                                     Generalized SLFNs
                                              ELM
                                                     ELM Learning Theory
                                     ELM and SVM
                                                     ELM Algorithm
                                           OS-ELM
                                                     ELM and LS-SVM
                                          Summary




Generalized SLFNs


                                                                     General Hidden Layer Mapping

                                                                            Output function: f(x) = Σ_{i=1}^{L} β_i G(a_i, b_i, x)

                                                                            The hidden layer output function (hidden layer
                                                                            mapping):
                                                                            h(x) = [G(a_1, b_1, x), · · · , G(a_L, b_L, x)]
                                                                            The hidden node output function G needn’t be limited to:
                                                                            Sigmoid: G(a_i, b_i, x) = g(a_i · x + b_i)
                                                                            RBF: G(a_i, b_i, x) = g(b_i ‖x − a_i‖)


Figure 8:   SLFN: any type of piecewise continuous G(ai , bi , x).

     G.-B. Huang, et al., “Universal Approximation Using Incremental Constructive Feedforward Networks with Random
     Hidden Nodes,” IEEE Transactions on Neural Networks, vol. 17, no. 4, pp. 879-892, 2006.

     G.-B. Huang, et al., “Convex Incremental Extreme Learning Machine,” Neurocomputing, vol. 70, pp. 3056-3062,
     2007.


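The two classical node types and the mapping h(x) can be sketched as follows; the specific choices of g (logistic sigmoid for the additive node, Gaussian for the RBF node) and all parameter values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid_node(a, b, x):
    # Additive node: G(a_i, b_i, x) = g(a_i · x + b_i), with g the logistic sigmoid
    return 1.0 / (1.0 + np.exp(-(np.dot(a, x) + b)))

def rbf_node(a, b, x):
    # RBF node: G(a_i, b_i, x) = g(b_i * ||x - a_i||), here with g(z) = exp(-z^2)
    return np.exp(-(b * np.linalg.norm(x - a)) ** 2)

def hidden_layer_mapping(x, A, bvec, node):
    # h(x) = [G(a_1, b_1, x), ..., G(a_L, b_L, x)]
    return np.array([node(a, b, x) for a, b in zip(A, bvec)])

L, n = 4, 3
A = rng.normal(size=(L, n))       # hidden node parameters a_i
bvec = rng.normal(size=L)         # hidden node parameters b_i
x = rng.normal(size=n)
h = hidden_layer_mapping(x, A, bvec, sigmoid_node)   # length-L feature vector
```

Any other nonconstant piecewise continuous G(a_i, b_i, x) could be dropped in for `node` without changing the rest of the pipeline.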


New Learning Theory: Learning Without Iterative
Tuning
   New Learning View

           Learning Without Iterative Tuning: Given any nonconstant piecewise continuous function g, if a continuous
           target function f(x) can be approximated by SLFNs with adjustable hidden nodes g, then the hidden node
           parameters of such SLFNs needn’t be tuned.
           All these hidden node parameters can be randomly generated without the knowledge of the training data.
           That is, for any continuous target function f and any randomly generated sequence {(a_i, b_i)}_{i=1}^{L},
           lim_{L→∞} ‖f(x) − f_L(x)‖ = lim_{L→∞} ‖f(x) − Σ_{i=1}^{L} β_i G(a_i, b_i, x)‖ = 0 holds with
           probability one if β_i is chosen to minimize ‖f(x) − f_L(x)‖, i = 1, · · · , L.


   G.-B. Huang, et al., “Universal Approximation Using Incremental Constructive Feedforward Networks with Random
   Hidden Nodes,” IEEE Transactions on Neural Networks, vol. 17, no. 4, pp. 879-892, 2006.
                                                                                                                     tu-logo
   G.-B. Huang, et al., “Convex Incremental Extreme Learning Machine,” Neurocomputing, vol. 70, pp. 3056-3062,
   2007.


   G.-B. Huang, et al., “Enhanced Random Search Based Incremental Extreme Learning Machine,” Neurocomputing, ur-logo

   vol. 71, pp. 3460-3468, 2008.

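This convergence can be illustrated empirically; in the sketch below the target f(x) = sin(2πx), the sigmoid hidden nodes, the uniform parameter ranges, and the sequence of L values are all hypothetical choices, with β found by least squares at each L:

```python
import numpy as np

rng = np.random.default_rng(42)
x = np.linspace(0.0, 1.0, 200).reshape(-1, 1)
f = np.sin(2 * np.pi * x)                       # a continuous target function

def ls_error(L):
    # Randomly generated hidden node parameters (a_i, b_i), independent of the data
    A = rng.uniform(-10, 10, size=(L, 1))
    b = rng.uniform(-10, 10, size=L)
    H = 1.0 / (1.0 + np.exp(-(x @ A.T + b)))    # sigmoid G(a_i, b_i, x)
    beta = np.linalg.pinv(H) @ f                # beta chosen to minimize ||f - f_L||
    return np.linalg.norm(f - H @ beta)

errors = [ls_error(L) for L in (5, 20, 80)]     # error shrinks as L grows
```

The exact numbers depend on the random draw, but the approximation error reliably decreases as more random hidden nodes are added, without any tuning of (a_i, b_i).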


 Unified Learning Platform


                                                            Mathematical Model

                                                                    For N arbitrary distinct samples
                                                                    (xi , ti ) ∈ Rn × Rm , SLFNs with L hidden nodes
                                                                    each with output function G(ai , bi , x) are
                                                                    mathematically modeled as

                                                                             Σ_{i=1}^{L} β_i G(a_i, b_i, x_j) = t_j ,   j = 1, · · · , N    (9)



                                                                    (ai , bi ): hidden node parameters.
                                                                    β i : the weight vector connecting the ith hidden
Figure 9:       Generalized SLFN: any type of piecewise             node and the output node.
continuous G(ai , bi , x).







Extreme Learning Machine (ELM)

  Mathematical Model

         Σ_{i=1}^{L} β_i G(a_i, b_i, x_j) = t_j , j = 1, · · · , N, is equivalent to Hβ = T, where

                                      ⎡ h(x_1) ⎤   ⎡ G(a_1, b_1, x_1)   · · ·   G(a_L, b_L, x_1) ⎤
                                  H = ⎢    ⋮    ⎥ = ⎢        ⋮            · · ·          ⋮          ⎥                      (10)
                                      ⎣ h(x_N) ⎦   ⎣ G(a_1, b_1, x_N)   · · ·   G(a_L, b_L, x_N) ⎦ N×L

                                          ⎡ β_1^T ⎤                ⎡ t_1^T ⎤
                                      β = ⎢    ⋮   ⎥      and  T = ⎢    ⋮   ⎥                                              (11)
                                          ⎣ β_L^T ⎦ L×m            ⎣ t_N^T ⎦ N×m

         H is called the hidden layer output matrix of the neural network; the ith column of H is the output of the ith
         hidden node with respect to inputs x1 , x2 , · · · , xN .



Extreme Learning Machine (ELM)

  Three-Step Learning Model
  Given a training set ℵ = {(xi , ti )|xi ∈ Rn , ti ∈ Rm , i = 1, · · · , N}, hidden node
  output function G(a, b, x), and the number of hidden nodes L,
    1   Randomly assign the hidden node parameters (ai, bi), i = 1, · · · , L.
      2   Calculate the hidden layer output matrix H.
      3   Calculate the output weight β: β = H† T.
  where H† is the Moore-Penrose generalized inverse of hidden layer output
  matrix H.
  Source Codes of ELM
  http://www.extreme-learning-machines.org
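The three steps above translate almost line-for-line into NumPy. The sketch below is a minimal illustration only; the function names, the sigmoid hidden nodes, and the random seed are choices of this example, not part of the official ELM source code:

```python
import numpy as np

def elm_train(X, T, L, seed=0):
    """Basic ELM training: random hidden nodes + Moore-Penrose solution."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    A = rng.standard_normal((n, L))          # Step 1: random input weights a_i
    b = rng.standard_normal(L)               # Step 1: random biases b_i
    H = 1.0 / (1.0 + np.exp(-(X @ A + b)))   # Step 2: hidden layer output matrix H (N x L)
    beta = np.linalg.pinv(H) @ T             # Step 3: beta = H^dagger T
    return A, b, beta

def elm_predict(X, A, b, beta):
    """Network output f(x) = h(x) beta for each row of X."""
    H = 1.0 / (1.0 + np.exp(-(X @ A + b)))
    return H @ beta
```

Note that no step is iterative: the only "training" is one pseudoinverse. With more hidden nodes than training samples, H generically has full row rank and the minimum-norm least-squares solution interpolates the training targets exactly.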






ELM Learning Algorithm

   Salient Features

           “Simple Math is Enough.” ELM is a simple tuning-free three-step algorithm.
           The learning speed of ELM is extremely fast.
           The hidden node parameters ai and bi are independent both of the training data and of each other.
           Unlike conventional learning methods, which must see the training data before generating the hidden node
           parameters, ELM can generate the hidden node parameters before seeing the training data.
           Unlike traditional gradient-based learning algorithms, which only work for differentiable activation functions,
           ELM works for all bounded nonconstant piecewise continuous activation functions.
           Unlike traditional gradient-based learning algorithms, which face issues such as local minima, improper
           learning rates, and overfitting, ELM reaches its solution directly, without such complications.
           The ELM learning algorithm is much simpler than those of other popular learning techniques such as
           gradient-trained neural networks and support vector machines.

   G.-B. Huang, et al., “Can Threshold Networks Be Trained Directly?” IEEE Transactions on Circuits and Systems II,
   vol. 53, no. 3, pp. 187-191, 2006.
   M.-B. Li, et al., “Fully Complex Extreme Learning Machine”, Neurocomputing, vol. 68, pp. 306-314, 2005.






Output Functions of Generalized SLFNs

   Ridge regression theory based ELM
$$f(\mathbf{x}) = \mathbf{h}(\mathbf{x})\boldsymbol{\beta} = \mathbf{h}(\mathbf{x})\mathbf{H}^T\left(\mathbf{H}\mathbf{H}^T\right)^{-1}\mathbf{T} \;\Rightarrow\; f(\mathbf{x}) = \mathbf{h}(\mathbf{x})\mathbf{H}^T\left(\frac{\mathbf{I}}{C} + \mathbf{H}\mathbf{H}^T\right)^{-1}\mathbf{T}$$

   and

$$f(\mathbf{x}) = \mathbf{h}(\mathbf{x})\boldsymbol{\beta} = \mathbf{h}(\mathbf{x})\left(\mathbf{H}^T\mathbf{H}\right)^{-1}\mathbf{H}^T\mathbf{T} \;\Rightarrow\; f(\mathbf{x}) = \mathbf{h}(\mathbf{x})\left(\frac{\mathbf{I}}{C} + \mathbf{H}^T\mathbf{H}\right)^{-1}\mathbf{H}^T\mathbf{T}$$



   Ridge Regression Theory

   A positive value I/C can be added to the diagonal of HT H or HHT when computing the Moore-Penrose generalized
   inverse; the resulting solution is more stable and tends to give better generalization performance.
   A. E. Hoerl and R. W. Kennard, “Ridge regression: Biased estimation for nonorthogonal problems”, Technometrics,
   vol. 12, no. 1, pp. 55-67, 1970.

   Citation: G.-B. Huang, H. Zhou, X. Ding and R. Zhang, “Extreme Learning Machine for Regression and Multi-Class
   Classification”, submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence, October 2010.
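The second (L × L) form of the regularized solution amounts to a single linear solve. A minimal sketch, assuming the function name `ridge_elm_beta` for illustration:

```python
import numpy as np

def ridge_elm_beta(H, T, C):
    """Ridge-regularized ELM output weights: beta = (I/C + H^T H)^{-1} H^T T."""
    L = H.shape[1]
    # Adding I/C to the diagonal of H^T H keeps the system well conditioned.
    return np.linalg.solve(np.eye(L) / C + H.T @ H, H.T @ T)
```

As C grows the I/C term vanishes and the solution approaches the unregularized Moore-Penrose solution pinv(H) @ T.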





Output Functions of Generalized SLFNs
   Valid for both kernel and non-kernel learning

       1   Non-kernel based:

$$f(\mathbf{x}) = \mathbf{h}(\mathbf{x})\mathbf{H}^T\left(\frac{\mathbf{I}}{C} + \mathbf{H}\mathbf{H}^T\right)^{-1}\mathbf{T}$$

           and

$$f(\mathbf{x}) = \mathbf{h}(\mathbf{x})\left(\frac{\mathbf{I}}{C} + \mathbf{H}^T\mathbf{H}\right)^{-1}\mathbf{H}^T\mathbf{T}$$

       2   Kernel based (if h(x) is unknown):

$$f(\mathbf{x}) = \begin{bmatrix} K(\mathbf{x}, \mathbf{x}_1) \\ \vdots \\ K(\mathbf{x}, \mathbf{x}_N) \end{bmatrix}^T \left(\frac{\mathbf{I}}{C} + \boldsymbol{\Omega}_{ELM}\right)^{-1} \mathbf{T}$$

           where $\boldsymbol{\Omega}_{ELM\,i,j} = \mathbf{h}(\mathbf{x}_i) \cdot \mathbf{h}(\mathbf{x}_j) = K(\mathbf{x}_i, \mathbf{x}_j)$.


   Citation: G.-B. Huang, H. Zhou, X. Ding and R. Zhang, “Extreme Learning Machine for Regression and Multi-Class
   Classification”, submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence, October 2010.
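When h(x) is unknown, the kernel form can be implemented directly. A minimal sketch with an RBF kernel; the kernel choice, the `gamma` parameter, and the function name are assumptions of this illustration:

```python
import numpy as np

def kernel_elm_predict(X_train, T, X_test, C, gamma=1.0):
    """Kernel ELM with an RBF kernel K(u, v) = exp(-gamma ||u - v||^2).

    f(x) = [K(x, x_1), ..., K(x, x_N)] (I/C + Omega)^{-1} T,
    where Omega_{i,j} = K(x_i, x_j).
    """
    def rbf(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    N = X_train.shape[0]
    Omega = rbf(X_train, X_train)                 # N x N kernel matrix Omega_ELM
    alpha = np.linalg.solve(np.eye(N) / C + Omega, T)
    return rbf(X_test, X_train) @ alpha           # kernel expansion at test points
```

As in the non-kernel case, training is a single regularized linear solve; only the feature map has moved into the kernel.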



Differences between ELM and LS-SVM
ELM (unified for regression, binary, and multi-class cases):

    1   Non-kernel based:

$$f(\mathbf{x}) = \mathbf{h}(\mathbf{x})\mathbf{H}^T\left(\frac{\mathbf{I}}{C} + \mathbf{H}\mathbf{H}^T\right)^{-1}\mathbf{T} \quad \text{and} \quad f(\mathbf{x}) = \mathbf{h}(\mathbf{x})\left(\frac{\mathbf{I}}{C} + \mathbf{H}^T\mathbf{H}\right)^{-1}\mathbf{H}^T\mathbf{T}$$

    2   Kernel based (if h(x) is unknown):

$$f(\mathbf{x}) = \begin{bmatrix} K(\mathbf{x}, \mathbf{x}_1) \\ \vdots \\ K(\mathbf{x}, \mathbf{x}_N) \end{bmatrix}^T \left(\frac{\mathbf{I}}{C} + \boldsymbol{\Omega}_{ELM}\right)^{-1} \mathbf{T}, \quad \text{where } \boldsymbol{\Omega}_{ELM\,i,j} = \mathbf{h}(\mathbf{x}_i) \cdot \mathbf{h}(\mathbf{x}_j) = K(\mathbf{x}_i, \mathbf{x}_j)$$

LS-SVM (for the binary class case):

$$\begin{bmatrix} 0 & \mathbf{t}^T \\ \mathbf{t} & \frac{\mathbf{I}}{C} + \boldsymbol{\Omega}_{LS\text{-}SVM} \end{bmatrix} \begin{bmatrix} b \\ \boldsymbol{\alpha} \end{bmatrix} = \begin{bmatrix} 0 & \mathbf{t}^T \\ \mathbf{t} & \frac{\mathbf{I}}{C} + \mathbf{Z}\mathbf{Z}^T \end{bmatrix} \begin{bmatrix} b \\ \boldsymbol{\alpha} \end{bmatrix} = \begin{bmatrix} 0 \\ \mathbf{1} \end{bmatrix}$$

where

$$\mathbf{Z} = \begin{bmatrix} t_1 \phi(\mathbf{x}_1) \\ \vdots \\ t_N \phi(\mathbf{x}_N) \end{bmatrix}, \quad \boldsymbol{\Omega}_{LS\text{-}SVM} = \mathbf{Z}\mathbf{Z}^T$$

For details, refer to: G.-B. Huang, H. Zhou, X. Ding and R. Zhang, “Extreme Learning Machine for Regression and Multi-Class Classification”, submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence, October 2010.
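For comparison, the LS-SVM system is also a single linear solve. The sketch below builds the (N+1) × (N+1) saddle-point system and reads off b and α; the RBF kernel and the function names are assumptions of this illustration:

```python
import numpy as np

def lssvm_train(X, t, C, gamma=1.0):
    """Solve the LS-SVM dual system [[0, t^T], [t, I/C + Omega]] [b; alpha] = [0; 1],
    with Omega_{i,j} = t_i t_j K(x_i, x_j) and an RBF kernel."""
    N = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * d2)
    Omega = np.outer(t, t) * K            # Omega_LS-SVM = Z Z^T
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = t
    A[1:, 0] = t
    A[1:, 1:] = np.eye(N) / C + Omega
    rhs = np.concatenate(([0.0], np.ones(N)))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]                # b, alpha

def lssvm_predict(X_train, t, b, alpha, X_test, gamma=1.0):
    """Decision values f(x) = sum_i alpha_i t_i K(x, x_i) + b."""
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * d2)
    return K @ (alpha * t) + b
```

Note the structural difference from ELM: the LS-SVM solution is tied to the labels t through both the kernel matrix and the equality constraint on α, whereas the ELM solutions above use an unlabeled kernel matrix.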


ELM Classification Boundaries


   [Two plots of ELM decision boundaries on the square [-2, 2] × [-2, 2].]

            Figure 10: XOR Problem                 Figure 11: Banana Case






Performance Evaluation of ELM



     Datasets          # train       # test          # features        # classes           Random
                                                                                            Perm
        DNA            2000          1186               180                  3               No
       Letter          13333         6667               16                   26              Yes
       Shuttle         43500         14500               9                   7               No
       USPS            7291          2007               256                  10              No
           Table 1: Specification of multi-class classification problems



Performance Evaluation of ELM
       Datasets     SVM (Gaussian Kernel)        LS-SVM (Gaussian Kernel)     ELM (Sigmoid hidden nodes)
                    Testing        Training      Testing        Training      Testing        Training
                    Rate    Dev    Time (s)      Rate    Dev    Time (s)      Rate    Dev    Time (s)
                    (%)     (%)                  (%)     (%)                  (%)     (%)
        DNA         92.86   0       7732         93.68   0      6.359         93.81   0.24   1.586
        Letter      92.87   0.26    302.9        93.12   0.27   335.838       93.51   0.15   0.7881
        Shuttle     99.74   0       2864.0       99.82   0      24767.0       99.64   0.01   3.3379
        USPS        96.14   0       12460        96.76   0      59.1357       96.28   0.28   0.6877
                                          ELM: non-kernel based

       Datasets     SVM (Gaussian Kernel)        LS-SVM (Gaussian Kernel)     ELM (Gaussian Kernel)
                    Testing        Training      Testing        Training      Testing        Training
                    Rate    Dev    Time (s)      Rate    Dev    Time (s)      Rate    Dev    Time (s)
                    (%)     (%)                  (%)     (%)                  (%)     (%)
        DNA         92.86   0       7732         93.68   0      6.359         96.29   0      2.156
        Letter      92.87   0.26    302.9        93.12   0.27   335.838       97.41   0.13   41.89
        Shuttle     99.74   0       2864.0       99.82   0      24767.0       99.91   0      4029.0
        USPS        96.14   0       12460        96.76   0      59.1357       98.9    0      9.2784
                                            ELM: kernel based

   Table 2: Performance comparison of SVM, LS-SVM and ELM: multi-class datasets.



Artificial Case: Approximation of ‘SinC’ Function


                    Algorithms     Training Time        Training               Testing            # SVs/
                                     (seconds)       RMS        Dev         RMS        Dev        nodes
                      ELM              0.125        0.1148      0.0037     0.0097      0.0028       20
                       BP              21.26        0.1196      0.0042     0.0159      0.0041       20
                      SVR             1273.4        0.1149      0.0007     0.0130      0.0012     2499.9

   Table 3:         Performance comparison for learning function: SinC (5000 noisy training data and 5000 noise-free
   testing data).




   G.-B. Huang, et al., "Extreme Learning Machine: Theory and Applications," Neurocomputing, vol. 70, pp. 489-501, 2006.




Real-World Regression Problems
                     Datasets                       BP                      ELM
                                         training        testing    training   testing
                     Abalone             0.0785          0.0874     0.0803     0.0824
                  Delta Ailerons         0.0409          0.0481     0.0423     0.0431
                 Delta Elevators         0.0544          0.0592     0.0550     0.0568
                Computer Activity        0.0273          0.0409     0.0316     0.0382
                Census (House8L)         0.0596          0.0685     0.0624     0.0660
                    Auto Price           0.0443          0.1157     0.0754     0.0994
                     Triazines           0.1438          0.2197     0.1897     0.2002
                  Machine CPU            0.0352          0.0826     0.0332     0.0539
                       Servo             0.0794          0.1276     0.0707     0.1196
                 Breast Cancer           0.2788          0.3155     0.2470     0.2679
                 Bank domains            0.0342          0.0379     0.0406     0.0366
                California Housing       0.1046          0.1285     0.1217     0.1267
                 Stocks domain           0.0179          0.0358     0.0251     0.0348

                     Table 4:    Comparison of training and testing RMSE of BP and ELM.






Real-World Regression Problems
                     Datasets                    SVR                       ELM
                                         training   testing        training   testing
                    Abalone              0.0759     0.0784         0.0803     0.0824
                 Delta Ailerons          0.0418     0.0429         0.0423     0.0431
                Delta Elevators          0.0534     0.0540         0.0545     0.0568
               Computer Activity         0.0464     0.0470         0.0316     0.0382
               Census (House8L)          0.0718     0.0746         0.0624     0.0660
                   Auto Price            0.0652     0.0937         0.0754     0.0994
                    Triazines            0.1432     0.1829         0.1897     0.2002
                 Machine CPU             0.0574     0.0811         0.0332     0.0539
                      Servo              0.0840     0.1177         0.0707     0.1196
                Breast Cancer            0.2278     0.2643         0.2470     0.2679
                 Bank domains            0.0454     0.0467         0.0406     0.0366
               California Housing        0.1089     0.1180         0.1217     0.1267
                Stocks domain            0.0503     0.0518         0.0251     0.0348

                    Table 5:    Comparison of training and testing RMSE of SVR and ELM.






Real-World Regression Problems
                   Datasets               BP            SVR                     ELM
                                        # nodes      (C, γ)         # SVs     # nodes
                   Abalone                10        (2^4, 2^-6)     309.84       25
                Delta Ailerons            10        (2^3, 2^-3)      82.44       45
               Delta Elevators             5        (2^0, 2^-2)     260.38      125
              Computer Activity           45        (2^5, 2^-5)      64.2       125
              Census (House8L)            10        (2^1, 2^-1)     810.24      160
                  Auto Price               5        (2^8, 2^-5)      21.25       15
                   Triazines               5        (2^-1, 2^-9)     48.42       10
                Machine CPU               10        (2^6, 2^-4)       7.8        10
                     Servo                10        (2^2, 2^-2)     22.375       30
               Breast Cancer               5        (2^-1, 2^-4)     74.3        10
                Bank domains              20        (2^10, 2^-2)    129.22      190
              California Housing          10        (2^3, 2^1)      2189.2       80
               Stocks domain              20        (2^3, 2^-9)      19.94      110

                      Table 6:    Comparison of network complexity of BP, SVR and ELM.





Real-World Regression Problems

                               Datasets             BP (a)     SVR (b)    ELM (a)
                               Abalone              1.7562     1.6123     0.0125
                             Delta Ailerons         2.7525     0.6726     0.0591
                            Delta Elevators         1.1938     1.121      0.2812
                           Computer Activity        67.44      1.0149     0.2951
                          Census (House8L)          8.0647     11.251     1.0795
                              Auto Price            0.2456     0.0042     0.0016
                               Triazines            0.5484     0.0086     < 10^-4
                             Machine CPU            0.2354     0.0018     0.0015
                                 Servo              0.2447     0.0045     < 10^-4
                            Breast Cancer           0.3856     0.0064     < 10^-4
                            Bank domains            7.506      1.6084     0.6434
                          California Housing        6.532      74.184     1.1177
                            Stocks domain           1.0487     0.0690     0.0172

                          (a) run in MATLAB environment.
                          (b) run in C executable environment.

                  Table 7:     Comparison of training time (seconds) of BP, SVR and ELM.


Real-World Very Large Complex Applications



              Algorithms           Time (minutes)                 Success Rate (%)               # SVs/
                               Training      Testing        Training            Testing          nodes
                                                         Rate      Dev      Rate       Dev
                ELM            1.6148       0.7195       92.35       0.026    90.21     0.024     200
                SLFN             12           N/A        82.44        N/A     81.85      N/A      100
                SVM           693.6000     347.7833      91.70        N/A     89.90      N/A     31,806

   Table 8:       Performance comparison of the ELM, BP and SVM learning algorithms on the Forest Type Prediction
   application (100,000 training samples and 480,000+ testing samples, each with 53 attributes; the SLFN row is trained with BP).




Essence of ELM




  Key expectations
      1   Hidden layer need not be tuned.
      2   Hidden layer mapping h(x) satisfies universal approximation condition.
      3   Minimize: ‖Hβ − T‖ and ‖β‖
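These three expectations translate almost directly into code. A minimal sketch with sigmoid hidden nodes (illustrative; the names are my own — the Moore-Penrose pseudoinverse `pinv` yields the minimum-norm least-squares β, i.e. it minimizes both ‖Hβ − T‖ and ‖β‖):

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_train(X, T, L=200):
    """Train an SLFN the ELM way: random hidden layer, analytic output weights."""
    d = X.shape[1]
    W = rng.normal(size=(d, L))               # 1. input weights are random, never tuned
    b = rng.normal(size=L)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))    # 2. random sigmoid feature map h(x)
    beta = np.linalg.pinv(H) @ T              # 3. min ||H beta - T|| and ||beta||
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta

# Toy regression: the 'SinC' function sin(x)/x used elsewhere in these slides
X = np.linspace(-10, 10, 400).reshape(-1, 1)
T = np.sinc(X / np.pi)                        # np.sinc(x) = sin(pi x)/(pi x)
W, b, beta = elm_train(X, T, L=200)
err = np.abs(elm_predict(X, W, b, beta) - T).mean()
```

The entire "training" is one pseudoinverse — no iterations, no learning rate, no backpropagation.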




Optimization Constraints of Different Methods

 ELM variant: Based on Inequality Constraint Conditions

 ELM optimization formula:

           Minimize:   L_P = (1/2) ‖β‖^2 + C Σ_{i=1}^{N} ξ_i
           Subject to: t_i β · h(x_i) ≥ 1 − ξ_i,   i = 1, ..., N            (12)
                       ξ_i ≥ 0,   i = 1, ..., N

 The corresponding dual optimization problem:

           Minimize:   L_D = (1/2) Σ_{i=1}^{N} Σ_{j=1}^{N} t_i t_j α_i α_j h(x_i) · h(x_j) − Σ_{i=1}^{N} α_i
           Subject to: 0 ≤ α_i ≤ C,   i = 1, ..., N                         (13)

                                                             Figure 12: ELM



    G.-B. Huang, et al., "Optimization method based extreme learning machine for classification," Neurocomputing, vol. 74, pp. 155-163, 2010.
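Because the dual (13) carries only box constraints — there is no Σ t_i α_i = 0 coupling, since no bias b appears — it can be solved by plain projected gradient descent. A toy sketch (my own illustration, not code from the paper):

```python
import numpy as np

def elm_dual_pg(K, t, C=1.0, lr=0.01, steps=2000):
    """Minimize L_D = 0.5 a^T Q a - sum(a) with Q_ij = t_i t_j K_ij,
    subject only to the box constraints 0 <= a_i <= C."""
    Q = np.outer(t, t) * K
    a = np.zeros(len(t))
    for _ in range(steps):
        a = a - lr * (Q @ a - 1.0)       # gradient step on L_D
        a = np.clip(a, 0.0, C)           # project back onto the cube [0, C]^N
    return a

def rbf(A, B, gamma):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# XOR with labels in {-1, +1}; decision f(x) = sum_i a_i t_i K(x, x_i), no bias b
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
t = np.array([-1., 1., 1., -1.])
K = rbf(X, X, gamma=2.0)
a = elm_dual_pg(K, t, C=10.0)
labels = np.sign(rbf(X, X, 2.0) @ (a * t))
```

With an equality constraint, as in the SVM dual below, the projection step would instead require projecting onto the intersection of the cube and a hyperplane — which is exactly the extra restriction the following slides discuss.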



Optimization Constraints of Different Methods

 SVM Constraint Conditions

 SVM optimization formula:

           Minimize:   L_P = (1/2) ‖w‖^2 + C Σ_{i=1}^{N} ξ_i
           Subject to: t_i (w · φ(x_i) + b) ≥ 1 − ξ_i,   i = 1, ..., N      (14)
                       ξ_i ≥ 0,   i = 1, ..., N

 The corresponding dual optimization problem:

           Minimize:   L_D = (1/2) Σ_{i=1}^{N} Σ_{j=1}^{N} t_i t_j α_i α_j φ(x_i) · φ(x_j) − Σ_{i=1}^{N} α_i
           Subject to: Σ_{i=1}^{N} t_i α_i = 0                              (15)
                       0 ≤ α_i ≤ C,   i = 1, ..., N

                                                             Figure 13: SVM






Optimization Constraints of Different Methods




                 Figure 14: ELM                                    Figure 15: SVM

   ELM and SVM have the same dual optimization objective function, but ELM searches for the optimal α_i over the
   entire cube [0, C]^N, while SVM searches only within the hyperplane Σ_{i=1}^{N} t_i α_i = 0 inside that cube.
   SVM therefore always provides a suboptimal solution, and so does LS-SVM.



Flaws in SVM Theory?
Flaws?

    1    SVM is great! Without SVM, computational intelligence may not have been so
         successful, and many applications and products may not have been so
         successful either! However ...
    2    SVM always searches for the optimal solution within the hyperplane
         Σ_{i=1}^{N} α_i t_i = 0 inside the cube [0, C]^N of the SVM feature space.
    3    Irrelevant applications may be handled similarly by SVM. Suppose two training
         datasets {(x_i^(1), t_i^(1))}_{i=1}^{N} and {(x_i^(2), t_i^(2))}_{i=1}^{N}
         are totally irrelevant/independent. If [t_1^(1), ..., t_N^(1)]^T is similar
         or close to [t_1^(2), ..., t_N^(2)]^T, SVM may have similar search areas of
         the cube [0, C]^N for the two different cases.

G.-B. Huang, et al., "Optimization method based extreme learning machine for
classification," Neurocomputing, vol. 74, pp. 155-163, 2010.
                                                                                       Figure 16: SVM

    Reasons

    SVM is too "generous" with the feature mappings and kernels: they are almost condition-free, except for Mercer's conditions.

          1    Because the feature mappings and kernels need not satisfy the universal approximation condition, the bias b must be present.
          2    Because b exists, the equality constraint Σ_{i=1}^{N} t_i α_i = 0 appears in the dual problem, which causes the contradictions above.



Optimization Constraints of Different Methods

                         Datasets        # Attributes   # Training data   # Testing data
                       Breast-cancer         10             300               383
                       liver-disorders       6              200               145
                            heart            13              70               200
                         ionosphere          34             100               251
                          Pimadata           8              400               368
                           Pwlinear          10             100               100
                            Sonar            60             100               158
                      Monks Problem 1        6              124               432
                      Monks Problem 2        6              169               432
                            Splice           60             1000              2175
                             A1a            123             1605             30956
                          leukemia          7129             38                34
                            Colon           2000             30                32
         Table 9: Specification of tested binary classification problems



Optimization Constraints of Different Methods

      Dataset              SVM (Gaussian Kernel)                            ELM (Sigmoid hidden nodes)
                       (C, γ)          Training   Testing    Testing      C        Training   Testing    Testing
                                       Time (s)   Rate (%)   Dev (%)               Time (s)   Rate (%)   Dev (%)
      Breast cancer    (5, 50)         0.1118     94.20      0.87         10^-3    0.1423     96.32      0.75
      Liver disorders  (10, 2)         0.0972     68.24      4.58         10       0.1734     72.34      2.55
      Heart            (10^4, 5)       0.0382     76.00      3.85         50       0.0344     76.25      2.70
      Ionosphere       (5, 5)          0.0218     90.58      1.22         10^-3    0.0359     89.48      1.12
      Pima             (10^3, 50)      0.2049     76.43      1.57         0.01     0.2867     77.27      1.33
      Pwlinear         (10^4, 10^3)    0.0357     84.35      3.28         10^-3    0.0486     86.00      1.92
      Sonar            (20, 1)         0.0412     83.33      3.55         10^4     0.0467     81.53      3.78
      Monks Problem 1  (10, 1)         0.0424     95.37      0            10^4     0.0821     95.19      0.41
      Monks Problem 2  (10^4, 5)       0.0860     83.80      0            10^4     0.0920     85.14      0.57
      Splice           (2, 20)         2.0683     84.05      0            0.01     3.5912     85.50      0.54
      A1a              (2, 5)          5.6275     84.25      0            0.01     5.4542     84.36      0.79

   Table 10: Comparison between the conventional SVM and ELM for
   classification.
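The ELM column above tunes only the regularization parameter C and the hidden-node count L; the sigmoid hidden nodes themselves are generated randomly and never adjusted. The following is a minimal numpy sketch of this training rule (an illustrative implementation, not the code used for the benchmarks; function names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_train(X, T, L=100, C=1.0):
    """Train a regularized ELM with L sigmoid hidden nodes.

    X: (N, d) inputs; T: (N, m) targets (one-hot for classification).
    The input weights W and biases b are random and never tuned; only
    the output weights beta are solved for, in closed form.
    """
    d = X.shape[1]
    W = rng.standard_normal((d, L))           # random input weights
    b = rng.standard_normal(L)                # random hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))    # sigmoid hidden-layer output, (N, L)
    # Regularized least squares: beta = (I/C + H^T H)^{-1} H^T T
    beta = np.linalg.solve(np.eye(L) / C + H.T @ H, H.T @ T)
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```

For classification, T is one-hot encoded and the predicted class is the argmax of the network output; C trades training error against the norm of beta, matching the single hyperparameter listed in the ELM column.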


Performance Comparison of Conventional SVM and ELM



     Dataset        SVM (Gaussian Kernel)                             ELM (Sigmoid hidden nodes)
                    (C, γ)          Training   Testing   Testing      (C, L)         Training   Testing   Testing
                                    Time (s)   Rate (%)  Dev (%)                     Time (s)   Rate (%)  Dev (%)
                                         Before Gene Selection
     Leukemia       (10^3, 10^3)    3.2674     82.35     0            (0.01, 3000)   0.2878     81.18     2.37
     Colon          (10^3, 10^3)    0.2391     81.25     0            (0.01, 3000)   0.1023     82.50     2.86
                         After Gene Selection (60 Genes Obtained for Each Case)
     Leukemia       (2, 20)         0.0640     100       0            (10, 3000)     0.0199     100       0
     Colon          (2, 50)         0.0166     87.50     0            (0.001, 500)   0.0161     89.06     2.10

   Table 11: Comparison between the conventional SVM and ELM in a
   gene classification application.






Online Sequential ELM (OS-ELM) Algorithm

   Learning Features
       1   The training observations are sequentially (one-by-one or
           chunk-by-chunk with varying or fixed chunk length) presented to the
           learning algorithm.
       2   At any time, only the newly arrived observation or chunk of
           observations (rather than the entire past data) is seen and learned.
       3   A single or a chunk of training observations is discarded as soon as the
           learning procedure for that particular (single or chunk of) observation(s)
           is completed.
       4   The learning algorithm has no prior knowledge as to how many training
           observations will be presented.

   N.-Y. Liang, et al., “A Fast and Accurate On-line Sequential Learning Algorithm for Feedforward Networks”, IEEE
   Transactions on Neural Networks, vol. 17, no. 6, pp. 1411-1423, 2006.



OS-ELM Algorithm



  Two-Step Learning Model
      1   Initialization phase: where batch ELM is used to initialize the learning
          system.
       2   Sequential learning phase: where the recursive least-squares (RLS)
           method is adopted to update the learning system sequentially.

   N.-Y. Liang, et al., “A Fast and Accurate On-line Sequential Learning Algorithm for Feedforward Networks”, IEEE
   Transactions on Neural Networks, vol. 17, no. 6, pp. 1411-1423, 2006.
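The two phases above can be sketched in numpy as follows. This is an illustrative reconstruction of OS-ELM from the standard RLS update equations, not the reference implementation; the tiny ridge term in the initialization is our addition for numerical stability:

```python
import numpy as np

rng = np.random.default_rng(0)

class OSELM:
    """Online Sequential ELM sketch: batch-ELM initialization followed by
    recursive least-squares (RLS) updates, one chunk at a time."""

    def __init__(self, n_features, n_hidden, n_outputs):
        # Random sigmoid hidden nodes, fixed once and never tuned.
        self.W = rng.standard_normal((n_features, n_hidden))
        self.b = rng.standard_normal(n_hidden)
        self.P = None                                # RLS state: (H^T H)^{-1}
        self.beta = np.zeros((n_hidden, n_outputs))  # output weights

    def _hidden(self, X):
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def init_batch(self, X0, T0):
        # Phase 1: batch ELM on an initial block (>= n_hidden samples).
        # The small ridge term is ours, for numerical stability only.
        H0 = self._hidden(X0)
        L = H0.shape[1]
        self.P = np.linalg.inv(H0.T @ H0 + 1e-6 * np.eye(L))
        self.beta = self.P @ H0.T @ T0

    def partial_fit(self, X, T):
        # Phase 2: RLS update for one chunk; the chunk can then be discarded.
        H = self._hidden(X)
        S = np.linalg.inv(np.eye(len(X)) + H @ self.P @ H.T)
        self.P = self.P - self.P @ H.T @ S @ H @ self.P
        self.beta = self.beta + self.P @ H.T @ (T - H @ self.beta)

    def predict(self, X):
        return self._hidden(X) @ self.beta
```

Because the RLS recursion maintains the exact least-squares solution over all data seen so far, the final weights match batch ELM while each chunk is touched only once, consistent with the learning features listed above.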










Summary
         For generalized SLFNs, learning can be done without
         iterative tuning.
         ELM is efficient for batch-mode, sequential, and
         incremental learning.
         ELM provides a unified learning model for regression and
         binary/multi-class classification.
         ELM works with different types of hidden nodes, including
         kernels.
         ELM offers real-time learning capability.
         Open questions:
         Does ELM always provide better generalization performance
         than SVM and LS-SVM when the same kernel is used?
         Does ELM always have faster learning speed than LS-SVM
         when the same kernel is used?
         Is ELM the simplest learning technique for generalized
         SLFNs?
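The point that ELM also works with kernels refers to the kernel variant, in which the random hidden layer is replaced by a kernel matrix and the output weights have a closed form. A minimal sketch, assuming the standard regularized solution alpha = (I/C + Omega)^{-1} T with Omega_ij = K(x_i, x_j) and a Gaussian kernel (function names are illustrative):

```python
import numpy as np

def _gaussian_kernel(A, B, gamma):
    # Pairwise squared distances, then K(a, b) = exp(-gamma * ||a - b||^2)
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def kernel_elm_fit(X, T, C=1.0, gamma=1.0):
    """Solve alpha = (I/C + Omega)^{-1} T, Omega_ij = K(x_i, x_j).
    No hidden-node count L needs to be chosen; the kernel replaces
    the explicit random feature map."""
    Omega = _gaussian_kernel(X, X, gamma)
    return np.linalg.solve(np.eye(len(X)) / C + Omega, T)

def kernel_elm_predict(Xnew, X, alpha, gamma=1.0):
    # f(x) = [K(x, x_1), ..., K(x, x_N)] alpha
    return _gaussian_kernel(Xnew, X, gamma) @ alpha
```

In this form only (C, gamma) are tuned, mirroring the hyperparameter pairs reported for the kernel methods in the comparison tables.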
ELM: Extreme Learning Machine: Learning without iterative tuning

  • 1.
    Outline Extreme Learning Machine: Learning Without Iterative Tuning Guang-Bin HUANG Associate Professor School of Electrical and Electronic Engineering Nanyang Technological University, Singapore tu-logo Talks in ELM2010 (Adelaide, Australia, Dec 7 2010), HP Labs (Palo Alto, USA), Microsoft Research (Redmond, USA), UBC (Canada), Institute for InfoComm Research (Singapore), EEE/NTU (Singapore) ur-logo ELM Web Portal: www.extreme-learning-machines.org Learning Without Iterative Tuning
  • 2.
    Outline Outline 1 Feedforward Neural Networks Single-Hidden Layer Feedforward Networks (SLFNs) Function Approximation of SLFNs Classification Capability of SLFNs Conventional Learning Algorithms of SLFNs Support Vector Machines 2 Extreme Learning Machine Generalized SLFNs New Learning Theory: Learning Without Iterative Tuning ELM Algorithm tu-logo Differences between ELM and LS-SVM 3 ELM and Conventional SVM ur-logo 4 Online Sequential ELM ELM Web Portal: www.extreme-learning-machines.org Learning Without Iterative Tuning
  • 3.
    Outline Outline 1 Feedforward Neural Networks Single-Hidden Layer Feedforward Networks (SLFNs) Function Approximation of SLFNs Classification Capability of SLFNs Conventional Learning Algorithms of SLFNs Support Vector Machines 2 Extreme Learning Machine Generalized SLFNs New Learning Theory: Learning Without Iterative Tuning ELM Algorithm tu-logo Differences between ELM and LS-SVM 3 ELM and Conventional SVM ur-logo 4 Online Sequential ELM ELM Web Portal: www.extreme-learning-machines.org Learning Without Iterative Tuning
  • 4.
    Outline Outline 1 Feedforward Neural Networks Single-Hidden Layer Feedforward Networks (SLFNs) Function Approximation of SLFNs Classification Capability of SLFNs Conventional Learning Algorithms of SLFNs Support Vector Machines 2 Extreme Learning Machine Generalized SLFNs New Learning Theory: Learning Without Iterative Tuning ELM Algorithm tu-logo Differences between ELM and LS-SVM 3 ELM and Conventional SVM ur-logo 4 Online Sequential ELM ELM Web Portal: www.extreme-learning-machines.org Learning Without Iterative Tuning
  • 5.
    Outline Outline 1 Feedforward Neural Networks Single-Hidden Layer Feedforward Networks (SLFNs) Function Approximation of SLFNs Classification Capability of SLFNs Conventional Learning Algorithms of SLFNs Support Vector Machines 2 Extreme Learning Machine Generalized SLFNs New Learning Theory: Learning Without Iterative Tuning ELM Algorithm tu-logo Differences between ELM and LS-SVM 3 ELM and Conventional SVM ur-logo 4 Online Sequential ELM ELM Web Portal: www.extreme-learning-machines.org Learning Without Iterative Tuning
  • 6.
    Feedforward Neural Networks SLFN Models ELM Function Approximation ELM and SVM Classification Capability OS-ELM Learning Methods Summary SVM Outline 1 Feedforward Neural Networks Single-Hidden Layer Feedforward Networks (SLFNs) Function Approximation of SLFNs Classification Capability of SLFNs Conventional Learning Algorithms of SLFNs Support Vector Machines 2 Extreme Learning Machine Generalized SLFNs New Learning Theory: Learning Without Iterative Tuning ELM Algorithm tu-logo Differences between ELM and LS-SVM 3 ELM and Conventional SVM ur-logo 4 Online Sequential ELM ELM Web Portal: www.extreme-learning-machines.org Learning Without Iterative Tuning
  • 7.
    Feedforward Neural Networks SLFN Models ELM Function Approximation ELM and SVM Classification Capability OS-ELM Learning Methods Summary SVM Feedforward Neural Networks with Additive Nodes Output of hidden nodes G(ai , bi , x) = g(ai · x + bi ) (1) ai : the weight vector connecting the ith hidden node and the input nodes. bi : the threshold of the ith hidden node. Output of SLFNs L X fL (x) = β i G(ai , bi , x) (2) i=1 tu-logo β i : the weight vector connecting the ith hidden node and the output nodes. Figure 1: SLFN: additive hidden nodes ur-logo ELM Web Portal: www.extreme-learning-machines.org Learning Without Iterative Tuning
  • 8.
    Feedforward Neural Networks SLFN Models ELM Function Approximation ELM and SVM Classification Capability OS-ELM Learning Methods Summary SVM Feedforward Neural Networks with Additive Nodes Output of hidden nodes G(ai , bi , x) = g(ai · x + bi ) (1) ai : the weight vector connecting the ith hidden node and the input nodes. bi : the threshold of the ith hidden node. Output of SLFNs L X fL (x) = β i G(ai , bi , x) (2) i=1 tu-logo β i : the weight vector connecting the ith hidden node and the output nodes. Figure 1: SLFN: additive hidden nodes ur-logo ELM Web Portal: www.extreme-learning-machines.org Learning Without Iterative Tuning
  • 9.
    Feedforward Neural Networks SLFN Models ELM Function Approximation ELM and SVM Classification Capability OS-ELM Learning Methods Summary SVM Feedforward Neural Networks with Additive Nodes Output of hidden nodes G(ai , bi , x) = g(ai · x + bi ) (1) ai : the weight vector connecting the ith hidden node and the input nodes. bi : the threshold of the ith hidden node. Output of SLFNs L X fL (x) = β i G(ai , bi , x) (2) i=1 tu-logo β i : the weight vector connecting the ith hidden node and the output nodes. Figure 1: SLFN: additive hidden nodes ur-logo ELM Web Portal: www.extreme-learning-machines.org Learning Without Iterative Tuning
  • 10.
    Feedforward Neural Networks SLFN Models ELM Function Approximation ELM and SVM Classification Capability OS-ELM Learning Methods Summary SVM Feedforward Neural Networks with RBF Nodes Output of hidden nodes G(ai , bi , x) = g (bi x − ai ) (3) ai : the center of the ith hidden node. bi : the impact factor of the ith hidden node. Output of SLFNs L X fL (x) = β i G(ai , bi , x) (4) i=1 tu-logo β i : the weight vector connecting the ith hidden Figure 2: Feedforward Network Architecture: RBF hidden node and the output nodes. nodes ur-logo ELM Web Portal: www.extreme-learning-machines.org Learning Without Iterative Tuning
  • 11.
    Feedforward Neural Networks SLFN Models ELM Function Approximation ELM and SVM Classification Capability OS-ELM Learning Methods Summary SVM Feedforward Neural Networks with RBF Nodes Output of hidden nodes G(ai , bi , x) = g (bi x − ai ) (3) ai : the center of the ith hidden node. bi : the impact factor of the ith hidden node. Output of SLFNs L X fL (x) = β i G(ai , bi , x) (4) i=1 tu-logo β i : the weight vector connecting the ith hidden Figure 2: Feedforward Network Architecture: RBF hidden node and the output nodes. nodes ur-logo ELM Web Portal: www.extreme-learning-machines.org Learning Without Iterative Tuning
  • 12.
    Feedforward Neural Networks SLFN Models ELM Function Approximation ELM and SVM Classification Capability OS-ELM Learning Methods Summary SVM Feedforward Neural Networks with RBF Nodes Output of hidden nodes G(ai , bi , x) = g (bi x − ai ) (3) ai : the center of the ith hidden node. bi : the impact factor of the ith hidden node. Output of SLFNs L X fL (x) = β i G(ai , bi , x) (4) i=1 tu-logo β i : the weight vector connecting the ith hidden Figure 2: Feedforward Network Architecture: RBF hidden node and the output nodes. nodes ur-logo ELM Web Portal: www.extreme-learning-machines.org Learning Without Iterative Tuning
  • 13.
    Feedforward Neural Networks SLFN Models ELM Function Approximation ELM and SVM Classification Capability OS-ELM Learning Methods Summary SVM Outline 1 Feedforward Neural Networks Single-Hidden Layer Feedforward Networks (SLFNs) Function Approximation of SLFNs Classification Capability of SLFNs Conventional Learning Algorithms of SLFNs Support Vector Machines 2 Extreme Learning Machine Generalized SLFNs New Learning Theory: Learning Without Iterative Tuning ELM Algorithm tu-logo Differences between ELM and LS-SVM 3 ELM and Conventional SVM ur-logo 4 Online Sequential ELM ELM Web Portal: www.extreme-learning-machines.org Learning Without Iterative Tuning
  • 14.
    Feedforward Neural Networks SLFN Models ELM Function Approximation ELM and SVM Classification Capability OS-ELM Learning Methods Summary SVM Function Approximation of Neural Networks Mathematical Model Any continuous target function f (x) can be approximated by SLFNs with adjustable hidden nodes. In other words, given any small positive value , for SLFNs with enough number of hidden nodes (L) we have fL (x) − f (x) < (5) Learning Issue tu-logo In real applications, target function f is usually unknown. One wishes that unknown f could be Figure 3: SLFN. approximated by SLFNs fL appropriately. ur-logo ELM Web Portal: www.extreme-learning-machines.org Learning Without Iterative Tuning
  • 15.
    Feedforward Neural Networks SLFN Models ELM Function Approximation ELM and SVM Classification Capability OS-ELM Learning Methods Summary SVM Function Approximation of Neural Networks Mathematical Model Any continuous target function f (x) can be approximated by SLFNs with adjustable hidden nodes. In other words, given any small positive value , for SLFNs with enough number of hidden nodes (L) we have fL (x) − f (x) < (5) Learning Issue tu-logo In real applications, target function f is usually unknown. One wishes that unknown f could be Figure 3: SLFN. approximated by SLFNs fL appropriately. ur-logo ELM Web Portal: www.extreme-learning-machines.org Learning Without Iterative Tuning
  • 16.
    Feedforward Neural Networks SLFN Models ELM Function Approximation ELM and SVM Classification Capability OS-ELM Learning Methods Summary SVM Function Approximation of Neural Networks Mathematical Model Any continuous target function f (x) can be approximated by SLFNs with adjustable hidden nodes. In other words, given any small positive value , for SLFNs with enough number of hidden nodes (L) we have fL (x) − f (x) < (5) Learning Issue tu-logo In real applications, target function f is usually unknown. One wishes that unknown f could be Figure 3: SLFN. approximated by SLFNs fL appropriately. ur-logo ELM Web Portal: www.extreme-learning-machines.org Learning Without Iterative Tuning
  • 17.
    Feedforward Neural Networks SLFN Models ELM Function Approximation ELM and SVM Classification Capability OS-ELM Learning Methods Summary SVM Function Approximation of Neural Networks Learning Model For N arbitrary distinct samples (xi , ti ) ∈ Rn × Rm , SLFNs with L hidden nodes and activation function g(x) are mathematically modeled as fL (xj ) = oj , j = 1, · · · , N (6) PN ‚ ‚ Cost function: E = ‚oj − tj ‚ . ‚ ‚ j=1 2 The target is to minimize the cost tu-logo function E by adjusting the network parameters: β i , ai , bi . Figure 4: SLFN. ur-logo ELM Web Portal: www.extreme-learning-machines.org Learning Without Iterative Tuning
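The learning model above can be sketched numerically. A minimal NumPy sketch (all names and sizes are my own illustrative choices) that computes the network outputs o_j and the cost E for a sigmoid SLFN:

```python
import numpy as np

rng = np.random.default_rng(0)
N, n, m, L = 8, 3, 1, 5               # samples, input dim, output dim, hidden nodes

X = rng.standard_normal((N, n))       # inputs x_j
T = rng.standard_normal((N, m))       # targets t_j
A = rng.standard_normal((L, n))       # input weights a_i
b = rng.standard_normal(L)            # hidden biases b_i
beta = rng.standard_normal((L, m))    # output weights beta_i

# o_j = sum_i beta_i * g(a_i . x_j + b_i), with sigmoid g
G = 1.0 / (1.0 + np.exp(-(X @ A.T + b)))   # N x L hidden-node outputs
O = G @ beta                               # N x m network outputs o_j

# cost E = sum_j ||o_j - t_j||^2, to be minimized over (a_i, b_i, beta_i)
E = float(np.sum((O - T) ** 2))
```

Conventional algorithms lower E by iteratively adjusting all three parameter sets; ELM, introduced later, adjusts only beta.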
Classification Capability of SLFNs

As long as SLFNs can approximate any continuous target function f(x), such SLFNs can differentiate any disjoint regions.

Figure 5: SLFN.

G.-B. Huang, et al., "Classification Ability of Single Hidden Layer Feedforward Neural Networks," IEEE Transactions on Neural Networks, vol. 11, no. 3, pp. 799-801, 2000.
Learning Algorithms of Neural Networks

Learning Methods
Many learning methods, mainly based on gradient-descent/iterative approaches, have been developed over the past two decades:
- Back-Propagation (BP) and its variants are the most popular.
- Least-squares (LS) solutions for RBF networks, with a single impact factor for all hidden nodes.

Figure 6: Feedforward Network Architecture.
Advantages and Disadvantages

Popularity
Widely used in various applications: regression, classification, etc.

Limitations
- Usually different learning algorithms are used for different SLFN architectures.
- Some parameters have to be tuned manually.
- Overfitting.
- Local minima.
- Time-consuming.
Support Vector Machine

SVM optimization formulation:

    Minimize:   L_P = (1/2)‖w‖² + C Σ_{i=1}^{N} ξ_i
    Subject to: t_i (w · φ(x_i) + b) ≥ 1 − ξ_i,  i = 1, ..., N    (7)
                ξ_i ≥ 0,  i = 1, ..., N

The decision function of SVM is:

    f(x) = sign( Σ_{s=1}^{N_s} α_s t_s K(x, x_s) + b )    (8)

Figure 7: SVM Architecture.

B. Frénay and M. Verleysen, "Using SVMs with Randomised Feature Spaces: an Extreme Learning Approach," ESANN, Bruges, Belgium, pp. 315-320, 28-30 April, 2010.
G.-B. Huang, et al., "Optimization Method Based Extreme Learning Machine for Classification," Neurocomputing, vol. 74, pp. 155-163, 2010.
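The decision function (8) depends only on the support vectors. A minimal sketch of evaluating it with a Gaussian kernel; the support vectors, multipliers α_s, labels t_s and bias b below are made-up illustrative values, not a trained machine:

```python
import numpy as np

def gaussian_kernel(x, xs, gamma=1.0):
    # K(x, x') = exp(-gamma * ||x - x'||^2)
    return np.exp(-gamma * np.sum((x - xs) ** 2))

def svm_decision(x, support_vectors, alphas, ts, b):
    # f(x) = sign( sum_s alpha_s t_s K(x, x_s) + b )
    s = sum(a * t * gaussian_kernel(x, xs)
            for a, t, xs in zip(alphas, ts, support_vectors))
    return np.sign(s + b)

# hypothetical values, for illustration only
svs = [np.array([0.0, 0.0]), np.array([1.0, 1.0])]
alphas, ts, b = [0.5, 0.5], [+1, -1], 0.0
label = svm_decision(np.array([0.1, 0.0]), svs, alphas, ts, b)
```

Finding the α_s and b requires solving the quadratic program (7); only the evaluation of the resulting decision function is shown here.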
Generalized SLFNs

General Hidden Layer Mapping
Output function: f(x) = Σ_{i=1}^{L} β_i G(a_i, b_i, x)
The hidden layer output function (hidden layer mapping): h(x) = [G(a_1, b_1, x), ..., G(a_L, b_L, x)]
The output function need not be:
- Sigmoid: G(a_i, b_i, x) = g(a_i · x + b_i)
- RBF: G(a_i, b_i, x) = g(b_i ‖x − a_i‖)

Figure 8: SLFN with any type of piecewise continuous G(a_i, b_i, x).

G.-B. Huang, et al., "Universal Approximation Using Incremental Constructive Feedforward Networks with Random Hidden Nodes," IEEE Transactions on Neural Networks, vol. 17, no. 4, pp. 879-892, 2006.
G.-B. Huang, et al., "Convex Incremental Extreme Learning Machine," Neurocomputing, vol. 70, pp. 3056-3062, 2007.
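A minimal sketch of the two hidden-node types named above, showing that h(x) is just L scalar feature values. Names are hypothetical, and the RBF node is written in the common Gaussian form exp(−b‖x − a‖²) rather than the generic g(b‖x − a‖) of the slide:

```python
import numpy as np

def sigmoid_node(a, b, x):
    # additive node: G(a, b, x) = g(a . x + b) with sigmoid g
    return 1.0 / (1.0 + np.exp(-(np.dot(a, x) + b)))

def rbf_node(a, b, x):
    # RBF node with center a and impact factor b (Gaussian variant)
    return np.exp(-b * np.sum((x - a) ** 2))

def h(x, params, node):
    # hidden layer mapping h(x) = [G(a_1, b_1, x), ..., G(a_L, b_L, x)]
    return np.array([node(a, b, x) for a, b in params])

rng = np.random.default_rng(0)
params = [(rng.standard_normal(2), rng.uniform(0.1, 1.0)) for _ in range(4)]
x = np.array([0.5, -0.5])
hx_sig = h(x, params, sigmoid_node)   # 4 sigmoid features
hx_rbf = h(x, params, rbf_node)       # 4 RBF features
```

Any other bounded nonconstant piecewise continuous G can be dropped into `node` without changing the rest of the pipeline, which is the point of the generalized formulation.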
New Learning Theory: Learning Without Iterative Tuning

New Learning View
Learning Without Iterative Tuning: Given any nonconstant piecewise continuous function g, if a continuous target function f(x) can be approximated by SLFNs with adjustable hidden nodes g, then the hidden node parameters of such SLFNs need not be tuned: all these hidden node parameters can be randomly generated without any knowledge of the training data. That is, for any continuous target function f and any randomly generated sequence {(a_i, b_i)}_{i=1}^{L},

    lim_{L→∞} ‖f(x) − f_L(x)‖ = lim_{L→∞} ‖f(x) − Σ_{i=1}^{L} β_i G(a_i, b_i, x)‖ = 0

holds with probability one if β_i is chosen to minimize ‖f(x) − f_L(x)‖, i = 1, ..., L.

G.-B. Huang, et al., "Universal Approximation Using Incremental Constructive Feedforward Networks with Random Hidden Nodes," IEEE Transactions on Neural Networks, vol. 17, no. 4, pp. 879-892, 2006.
G.-B. Huang, et al., "Convex Incremental Extreme Learning Machine," Neurocomputing, vol. 70, pp. 3056-3062, 2007.
G.-B. Huang, et al., "Enhanced Random Search Based Incremental Extreme Learning Machine," Neurocomputing, vol. 71, pp. 3460-3468, 2008.
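The claim can be checked empirically: with randomly generated hidden nodes and β chosen by least squares, the approximation error of f_L shrinks as L grows. A small sketch on a 1-D target (the target, node type and grid are my own illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(42)
x = np.linspace(-3, 3, 400).reshape(-1, 1)
f = np.sinc(x).ravel()                        # continuous target f(x)

def random_feature_error(L):
    # (a_i, b_i) generated randomly, without looking at f
    A = rng.standard_normal((L, 1))
    b = rng.standard_normal(L)
    H = 1.0 / (1.0 + np.exp(-(x @ A.T + b)))  # N x L sigmoid features
    # beta chosen to minimize ||f - f_L|| (least squares)
    beta, *_ = np.linalg.lstsq(H, f, rcond=None)
    return float(np.sqrt(np.mean((H @ beta - f) ** 2)))

errs = [random_feature_error(L) for L in (2, 20, 200)]
```

On this run the RMS error falls sharply from L = 2 to L = 200, consistent with the probability-one convergence statement (a single random draw, not a proof).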
Unified Learning Platform

Mathematical Model
For N arbitrary distinct samples (x_i, t_i) ∈ R^n × R^m, SLFNs with L hidden nodes, each with output function G(a_i, b_i, x), are mathematically modeled as

    Σ_{i=1}^{L} β_i G(a_i, b_i, x_j) = t_j,  j = 1, ..., N    (9)

(a_i, b_i): hidden node parameters.
β_i: the weight vector connecting the ith hidden node and the output node.

Figure 9: Generalized SLFN with any type of piecewise continuous G(a_i, b_i, x).
Extreme Learning Machine (ELM)

Mathematical Model
Σ_{i=1}^{L} β_i G(a_i, b_i, x_j) = t_j, j = 1, ..., N, is equivalent to Hβ = T, where

    H = [ h(x_1) ]   [ G(a_1, b_1, x_1)  ...  G(a_L, b_L, x_1) ]
        [  ...   ] = [       ...         ...        ...        ]           (10)
        [ h(x_N) ]   [ G(a_1, b_1, x_N)  ...  G(a_L, b_L, x_N) ]  (N × L)

    β = [ β_1^T; ...; β_L^T ]  (L × m),   T = [ t_1^T; ...; t_N^T ]  (N × m)    (11)

H is called the hidden layer output matrix of the neural network; the ith column of H is the output of the ith hidden node with respect to inputs x_1, x_2, ..., x_N.
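The hidden layer output matrix in (10) is built one row h(x_j) per training sample. A minimal sketch with sigmoid hidden nodes (names and sizes hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
N, n, L = 6, 4, 3

X = rng.standard_normal((N, n))     # training inputs x_1, ..., x_N
A = rng.standard_normal((L, n))     # hidden node parameters a_i
b = rng.standard_normal(L)          # hidden node parameters b_i

# H[j, i] = G(a_i, b_i, x_j); column i is hidden node i over all inputs
H = 1.0 / (1.0 + np.exp(-(X @ A.T + b)))
```

With H in hand, Hβ = T is an ordinary linear system in β, which is what makes the closed-form training in the next slide possible.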
Extreme Learning Machine (ELM)

Three-Step Learning Model
Given a training set ℵ = {(x_i, t_i) | x_i ∈ R^n, t_i ∈ R^m, i = 1, ..., N}, a hidden node output function G(a, b, x), and the number of hidden nodes L:
1. Randomly assign the hidden node parameters (a_i, b_i), i = 1, ..., L.
2. Calculate the hidden layer output matrix H.
3. Calculate the output weights β: β = H† T, where H† is the Moore-Penrose generalized inverse of the hidden layer output matrix H.

Source Codes of ELM: http://www.extreme-learning-machines.org
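The three steps map directly onto a few lines of linear algebra. A minimal NumPy sketch of the procedure with sigmoid hidden nodes; function and variable names are my own, not from the official ELM source code:

```python
import numpy as np

def elm_train(X, T, L, rng):
    # Step 1: randomly assign hidden node parameters (a_i, b_i)
    A = rng.standard_normal((L, X.shape[1]))
    b = rng.standard_normal(L)
    # Step 2: calculate the hidden layer output matrix H
    H = 1.0 / (1.0 + np.exp(-(X @ A.T + b)))
    # Step 3: output weights beta = H_dagger T (Moore-Penrose pseudoinverse)
    beta = np.linalg.pinv(H) @ T
    return A, b, beta

def elm_predict(X, A, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ A.T + b)))
    return H @ beta

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (200, 2))
T = (np.sin(np.pi * X[:, 0]) * X[:, 1]).reshape(-1, 1)   # toy regression target
A, b, beta = elm_train(X, T, L=60, rng=rng)
rmse = float(np.sqrt(np.mean((elm_predict(X, A, b, beta) - T) ** 2)))
```

No iteration, no learning rate, no gradient: the only "training" is one pseudoinverse, which is where the speed claims on the following slides come from.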
ELM Learning Algorithm

Salient Features
"Simple Math is Enough." ELM is a simple, tuning-free, three-step algorithm.
- The learning speed of ELM is extremely fast.
- The hidden node parameters a_i and b_i are independent not only of the training data but also of each other. Unlike conventional learning methods, which MUST see the training data before generating the hidden node parameters, ELM can generate the hidden node parameters before seeing the training data.
- Unlike traditional gradient-based learning algorithms, which only work for differentiable activation functions, ELM works for all bounded nonconstant piecewise continuous activation functions.
- Unlike traditional gradient-based learning algorithms, which face issues such as local minima, improper learning rates and overfitting, ELM reaches its solutions directly, without such complications.
- The ELM learning algorithm is much simpler than other popular learning algorithms for neural networks and support vector machines.

G.-B. Huang, et al., "Can Threshold Networks Be Trained Directly?" IEEE Transactions on Circuits and Systems II, vol. 53, no. 3, pp. 187-191, 2006.
M.-B. Li, et al., "Fully Complex Extreme Learning Machine," Neurocomputing, vol. 68, pp. 306-314, 2005.
Output Functions of Generalized SLFNs

Ridge regression theory based ELM

    f(x) = h(x)β = h(x) H^T (H H^T)^{-1} T   ⇒   f(x) = h(x) H^T (I/C + H H^T)^{-1} T

and

    f(x) = h(x)β = h(x) (H^T H)^{-1} H^T T   ⇒   f(x) = h(x) (I/C + H^T H)^{-1} H^T T

Ridge Regression Theory
A positive value I/C can be added to the diagonal of H^T H or H H^T in the Moore-Penrose generalized inverse of H; the resulting solution is more stable and tends to have better generalization performance.

A. E. Hoerl and R. W. Kennard, "Ridge Regression: Biased Estimation for Nonorthogonal Problems," Technometrics, vol. 12, no. 1, pp. 55-67, 1970.
Citation: G.-B. Huang, H. Zhou, X. Ding and R. Zhang, "Extreme Learning Machine for Regression and Multi-Class Classification," submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence, October 2010.
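The two regularized solutions differ only in which Gram matrix is inverted (L × L versus N × N) and, by the standard push-through identity, give the same β. A minimal check of this equivalence (sizes and the value of C are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
N, L, m, C = 50, 10, 1, 100.0

H = rng.standard_normal((N, L))   # hidden layer output matrix
T = rng.standard_normal((N, m))   # targets

# beta = (I/C + H^T H)^{-1} H^T T : invert an L x L matrix (cheap when L << N)
beta_primal = np.linalg.solve(np.eye(L) / C + H.T @ H, H.T @ T)

# beta = H^T (I/C + H H^T)^{-1} T : invert an N x N matrix (cheap when N << L)
beta_dual = H.T @ np.linalg.solve(np.eye(N) / C + H @ H.T, T)
```

In practice one picks whichever form inverts the smaller matrix; the N × N form is also the one that generalizes to kernels on the next slide.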
Output Functions of Generalized SLFNs

Valid for both kernel and non-kernel learning
1. Non-kernel based:

    f(x) = h(x) H^T (I/C + H H^T)^{-1} T
    and
    f(x) = h(x) (I/C + H^T H)^{-1} H^T T

2. Kernel based (if h(x) is unknown):

    f(x) = [K(x, x_1), ..., K(x, x_N)] (I/C + Ω_ELM)^{-1} T

where Ω_ELM(i, j) = h(x_i) · h(x_j) = K(x_i, x_j).

Citation: G.-B. Huang, H. Zhou, X. Ding and R. Zhang, "Extreme Learning Machine for Regression and Multi-Class Classification," submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence, October 2010.
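When h(x) is unknown and only a kernel K is available, training reduces to one N × N solve. A minimal kernel-based sketch with a Gaussian kernel (the dataset, gamma and C are hypothetical):

```python
import numpy as np

def gram(Xa, Xb, gamma=1.0):
    # K(x, x') = exp(-gamma * ||x - x'||^2), evaluated pairwise
    d2 = (np.sum(Xa**2, 1)[:, None] + np.sum(Xb**2, 1)[None, :]
          - 2.0 * Xa @ Xb.T)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (100, 2))
T = (X[:, 0] * X[:, 1]).reshape(-1, 1)   # toy regression target
C = 1000.0

# Omega_ELM[i, j] = K(x_i, x_j); solve (I/C + Omega) alpha = T once
Omega = gram(X, X)
alpha = np.linalg.solve(np.eye(len(X)) / C + Omega, T)

def f(x_new):
    # f(x) = [K(x, x_1), ..., K(x, x_N)] (I/C + Omega)^{-1} T
    return gram(np.atleast_2d(x_new), X) @ alpha

train_rmse = float(np.sqrt(np.mean((Omega @ alpha - T) ** 2)))
```

Nothing random remains in this variant: the hidden layer is implicit in K, exactly as in kernel ridge regression.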
Differences between ELM and LS-SVM

ELM (unified for regression, binary and multi-class cases)
1. Non-kernel based:

    f(x) = h(x) H^T (I/C + H H^T)^{-1} T   and   f(x) = h(x) (I/C + H^T H)^{-1} H^T T

2. Kernel based (if h(x) is unknown):

    f(x) = [K(x, x_1), ..., K(x, x_N)] (I/C + Ω_ELM)^{-1} T,   where Ω_ELM(i, j) = h(x_i) · h(x_j) = K(x_i, x_j)

LS-SVM (for the binary-class case)

    [ 0    T^T            ] [ b ]     [ 0 ]
    [ T    I/C + Ω_LS-SVM ] [ α ]  =  [ 1 ]

where Ω_LS-SVM = Z Z^T and

    Z = [ t_1 φ(x_1); ...; t_N φ(x_N) ]

For details, refer to: G.-B. Huang, H. Zhou, X. Ding and R. Zhang, "Extreme Learning Machine for Regression and Multi-Class Classification," submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence, October 2010.
ELM Classification Boundaries

[Decision-boundary plots; axis values not recoverable from the extraction.]

Figure 10: XOR Problem.  Figure 11: Banana Case.
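The XOR case in Figure 10 is the classic test that a single linear unit fails but a single hidden layer passes. A minimal check that a basic (non-kernel) ELM separates the four XOR points (my own toy setup, not the deck's experiment):

```python
import numpy as np

rng = np.random.default_rng(7)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([-1.0, 1.0, 1.0, -1.0])        # XOR labels in {-1, +1}

L = 10
A = rng.standard_normal((L, 2))             # random hidden node parameters
b = rng.standard_normal(L)
H = 1.0 / (1.0 + np.exp(-(X @ A.T + b)))    # 4 x L hidden layer output matrix
beta = np.linalg.pinv(H) @ T                # least-squares output weights

pred = np.sign(H @ beta)                    # class decisions
```

With L > 4 random nodes, H has full row rank with probability one, so the least-squares fit reproduces the labels exactly.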
Performance Evaluation of ELM

    Datasets   # train   # test   # features   # classes   Random Perm
    DNA        2000      1186     180          3           No
    Letter     13333     6667     16           26          Yes
    Shuttle    43500     14500    9            7           No
    USPS       7291      2007     256          10          No

Table 1: Specification of multi-class classification problems.
Performance Evaluation of ELM

Each entry: testing rate (%) / dev (%) / training time (s).

Non-kernel based ELM:

    Datasets   SVM (Gaussian kernel)   LS-SVM (Gaussian kernel)   ELM (Sigmoid hidden node)
    DNA        92.86 / 0 / 7732        93.68 / 0 / 6.359          93.81 / 0.24 / 1.586
    Letter     92.87 / 0.26 / 302.9    93.12 / 0.27 / 335.838     93.51 / 0.15 / 0.7881
    Shuttle    99.74 / 0 / 2864.0      99.82 / 0 / 24767.0        99.64 / 0.01 / 3.3379
    USPS       96.14 / 0 / 12460       96.76 / 0 / 59.1357        96.28 / 0.28 / 0.6877

Kernel based ELM:

    Datasets   SVM (Gaussian kernel)   LS-SVM (Gaussian kernel)   ELM (Gaussian kernel)
    DNA        92.86 / 0 / 7732        93.68 / 0 / 6.359          96.29 / 0 / 2.156
    Letter     92.87 / 0.26 / 302.9    93.12 / 0.27 / 335.838     97.41 / 0.13 / 41.89
    Shuttle    99.74 / 0 / 2864.0      99.82 / 0 / 24767.0        99.91 / 0 / 4029.0
    USPS       96.14 / 0 / 12460       96.76 / 0 / 59.1357        98.9 / 0 / 9.2784

Table 2: Performance comparison of SVM, LS-SVM and ELM: multi-class datasets.
Artificial Case: Approximation of the 'SinC' Function

    Algorithms   Training Time (s)   Training RMS / Dev   Testing RMS / Dev   # SVs/nodes
    ELM          0.125               0.1148 / 0.0037      0.0097 / 0.0028     20
    BP           21.26               0.1196 / 0.0042      0.0159 / 0.0041     20
    SVR          1273.4              0.1149 / 0.0007      0.0130 / 0.0012     2499.9

Table 3: Performance comparison for learning function SinC (5000 noisy training data and 5000 noise-free testing data).

G.-B. Huang, et al., "Extreme Learning Machine: Theory and Applications," Neurocomputing, vol. 70, pp. 489-501, 2006.
Real-World Regression Problems

    Datasets             BP training   BP testing   ELM training   ELM testing
    Abalone              0.0785        0.0874       0.0803         0.0824
    Delta Ailerons       0.0409        0.0481       0.0423         0.0431
    Delta Elevators      0.0544        0.0592       0.0550         0.0568
    Computer Activity    0.0273        0.0409       0.0316         0.0382
    Census (House8L)     0.0596        0.0685       0.0624         0.0660
    Auto Price           0.0443        0.1157       0.0754         0.0994
    Triazines            0.1438        0.2197       0.1897         0.2002
    Machine CPU          0.0352        0.0826       0.0332         0.0539
    Servo                0.0794        0.1276       0.0707         0.1196
    Breast Cancer        0.2788        0.3155       0.2470         0.2679
    Bank domains         0.0342        0.0379       0.0406         0.0366
    California Housing   0.1046        0.1285       0.1217         0.1267
    Stocks domain        0.0179        0.0358       0.0251         0.0348

Table 4: Comparison of training and testing RMSE of BP and ELM.
Real-World Regression Problems

Dataset               SVR                    ELM
                      training   testing    training   testing
Abalone               0.0759     0.0784     0.0803     0.0824
Delta Ailerons        0.0418     0.0429     0.0423     0.0431
Delta Elevators       0.0534     0.0540     0.0545     0.0568
Computer Activity     0.0464     0.0470     0.0316     0.0382
Census (House8L)      0.0718     0.0746     0.0624     0.0660
Auto Price            0.0652     0.0937     0.0754     0.0994
Triazines             0.1432     0.1829     0.1897     0.2002
Machine CPU           0.0574     0.0811     0.0332     0.0539
Servo                 0.0840     0.1177     0.0707     0.1196
Breast Cancer         0.2278     0.2643     0.2470     0.2679
Bank domains          0.0454     0.0467     0.0406     0.0366
California Housing    0.1089     0.1180     0.1217     0.1267
Stocks domain         0.0503     0.0518     0.0251     0.0348

Table 5: Comparison of training and testing RMSE of SVR and ELM.
Real-World Regression Problems

Dataset               BP         SVR                        ELM
                      # nodes    (C, γ)          # SVs      # nodes
Abalone               10         (2^4, 2^-6)     309.84     25
Delta Ailerons        10         (2^3, 2^-3)     82.44      45
Delta Elevators       5          (2^0, 2^-2)     260.38     125
Computer Activity     45         (2^5, 2^-5)     64.2       125
Census (House8L)      10         (2^1, 2^-1)     810.24     160
Auto Price            5          (2^8, 2^-5)     21.25      15
Triazines             5          (2^-1, 2^-9)    48.42      10
Machine CPU           10         (2^6, 2^-4)     7.8        10
Servo                 10         (2^2, 2^-2)     22.375     30
Breast Cancer         5          (2^-1, 2^-4)    74.3       10
Bank domains          20         (2^10, 2^-2)    129.22     190
California Housing    10         (2^3, 2^1)      2189.2     80
Stocks domain         20         (2^3, 2^-9)     19.94      110

Table 6: Comparison of network complexity of BP, SVR and ELM.
Real-World Regression Problems

Dataset               BP (a)     SVR (b)    ELM (a)
Abalone               1.7562     1.6123     0.0125
Delta Ailerons        2.7525     0.6726     0.0591
Delta Elevators       1.1938     1.121      0.2812
Computer Activity     67.44      1.0149     0.2951
Census (House8L)      8.0647     11.251     1.0795
Auto Price            0.2456     0.0042     0.0016
Triazines             0.5484     0.0086     < 10^-4
Machine CPU           0.2354     0.0018     0.0015
Servo                 0.2447     0.0045     < 10^-4
Breast Cancer         0.3856     0.0064     < 10^-4
Bank domains          7.506      1.6084     0.6434
California Housing    6.532      74.184     1.1177
Stocks domain         1.0487     0.0690     0.0172

(a) Run in the MATLAB environment. (b) Run as a C executable.

Table 7: Comparison of training time (seconds) of BP, SVR and ELM.
Real-World Very Large Complex Applications

Algorithm    Time (minutes)          Success Rate (%)                       # SVs/
             Training    Testing     Training          Testing             nodes
                                     Rate     Dev      Rate     Dev
ELM          1.6148      0.7195      92.35    0.026    90.21    0.024      200
SLFN (BP)    12          N/A         82.44    N/A      81.85    N/A        100
SVM          693.6000    347.7833    91.70    N/A      89.90    N/A        31,806

Table 8: Performance comparison of the ELM, BP and SVM learning algorithms in the Forest Type Prediction application (100,000 training data and 480,000+ testing data; each datum has 53 attributes).
Essence of ELM

Key expectations:
1  The hidden layer need not be tuned.
2  The hidden-layer mapping h(x) satisfies the universal approximation condition.
3  Both the training error ‖Hβ − T‖ and the norm of the output weights ‖β‖ are minimized.
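The three expectations above can be realized in a few lines: generate the hidden layer at random (never tuned), compute H once, and take the minimum-norm least-squares solution β = H†T, which minimizes both ‖Hβ − T‖ and ‖β‖. A minimal NumPy sketch with sigmoid hidden nodes; the function names are illustrative, not the authors' code:

```python
import numpy as np

def elm_train(X, T, L=50, seed=0):
    # Hidden layer is generated randomly and never tuned.
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], L))
    b = rng.standard_normal(L)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))   # sigmoid hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T             # minimum-norm least-squares solution
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
```

The only free parameter is the number of hidden nodes L; there is no iterative weight tuning anywhere in the procedure.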
Optimization Constraints of Different Methods

ELM variant: based on inequality constraint conditions.

ELM optimization formula:

    Minimize:    L_P = (1/2) ‖β‖² + C Σ_{i=1}^{N} ξ_i
    Subject to:  t_i β · h(x_i) ≥ 1 − ξ_i,  i = 1, ..., N        (12)
                 ξ_i ≥ 0,  i = 1, ..., N

The corresponding dual optimization problem:

    Minimize:    L_D = (1/2) Σ_{i=1}^{N} Σ_{j=1}^{N} t_i t_j α_i α_j h(x_i) · h(x_j) − Σ_{i=1}^{N} α_i
    Subject to:  0 ≤ α_i ≤ C,  i = 1, ..., N                     (13)

Figure 12: ELM

G.-B. Huang, et al., "Optimization method based extreme learning machine for classification," Neurocomputing, vol. 74, pp. 155-163, 2010.
Optimization Constraints of Different Methods

SVM constraint conditions.

SVM optimization formula:

    Minimize:    L_P = (1/2) ‖w‖² + C Σ_{i=1}^{N} ξ_i
    Subject to:  t_i (w · φ(x_i) + b) ≥ 1 − ξ_i,  i = 1, ..., N        (14)
                 ξ_i ≥ 0,  i = 1, ..., N

The corresponding dual optimization problem:

    Minimize:    L_D = (1/2) Σ_{i=1}^{N} Σ_{j=1}^{N} t_i t_j α_i α_j φ(x_i) · φ(x_j) − Σ_{i=1}^{N} α_i
    Subject to:  Σ_{i=1}^{N} t_i α_i = 0                               (15)
                 0 ≤ α_i ≤ C,  i = 1, ..., N

Figure 13: SVM
Optimization Constraints of Different Methods

Figure 14: ELM. Figure 15: SVM.

ELM and SVM have the same dual optimization objective function, but in ELM the optimal α_i are searched for over the entire cube [0, C]^N, whereas in SVM the optimal α_i are confined to the hyperplane Σ_{i=1}^{N} t_i α_i = 0 within the cube [0, C]^N. Since SVM optimizes over a strict subset of ELM's feasible region, SVM always provides a suboptimal solution, and so does LS-SVM.
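Because ELM's dual feasible set is just the cube [0, C]^N, with no equality constraint to maintain, even a plain projected-gradient loop can solve (13): take a gradient step on L_D and clip α back into the box. A toy sketch of that idea (an illustrative solver, not the one used in the cited paper; h is a precomputed hidden-layer feature matrix):

```python
import numpy as np

def elm_dual_pg(h, t, C=1.0, eta=0.01, steps=500):
    # Minimize L_D = 0.5 a'Qa - 1'a over the cube [0, C]^N,
    # with Q_ij = t_i t_j h(x_i) . h(x_j); there is no sum_i t_i a_i = 0 constraint.
    Q = (t[:, None] * t[None, :]) * (h @ h.T)
    a = np.zeros(len(t))
    for _ in range(steps):
        grad = Q @ a - 1.0
        a = np.clip(a - eta * grad, 0.0, C)   # projection onto the cube is elementwise clipping
    return a

def dual_obj(a, h, t):
    Q = (t[:, None] * t[None, :]) * (h @ h.T)
    return 0.5 * a @ Q @ a - a.sum()
```

For SVM the projection step would instead have to land on the intersection of the cube with the hyperplane Σ t_i α_i = 0, which is why SVM solvers such as SMO must update pairs of α_i at a time.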
Flaws in SVM Theory?

Flaws?
1  SVM is great! Without SVM, computational intelligence might not have been so successful, and many applications and products might not have been either. However ...
2  SVM always searches for the optimal solution in the hyperplane Σ_{i=1}^{N} α_i t_i = 0 within the cube [0, C]^N of the SVM feature space.
3  Irrelevant applications may be handled similarly by SVM. Given two totally irrelevant/independent training datasets {(x_i^(1), t_i^(1))}_{i=1}^N and {(x_i^(2), t_i^(2))}_{i=1}^N, if [t_1^(1), ..., t_N^(1)]^T is similar or close to [t_1^(2), ..., t_N^(2)]^T, then SVM may have similar search areas of the cube [0, C]^N in the two different cases.

Figure 16: SVM

G.-B. Huang, et al., "Optimization method based extreme learning machine for classification," Neurocomputing, vol. 74, pp. 155-163, 2010.

Reasons
SVM is too "generous" with its feature mappings and kernels: they are almost condition-free, apart from Mercer's conditions.
1  Since the feature mappings and kernels need not satisfy the universal approximation condition, the bias b must be present.
2  Since b exists, contradictions arise.
Optimization Constraints of Different Methods

Dataset             # Attributes    # Training data    # Testing data
Breast-cancer       10              300                383
Liver-disorders     6               200                145
Heart               13              70                 200
Ionosphere          34              100                251
Pimadata            8               400                368
Pwlinear            10              100                100
Sonar               60              100                158
Monks Problem 1     6               124                432
Monks Problem 2     6               169                432
Splice              60              1000               2175
A1a                 123             1605               30956
Leukemia            7129            38                 34
Colon               2000            30                 32

Table 9: Specification of the tested binary classification problems.
Optimization Constraints of Different Methods

Dataset           SVM (Gaussian kernel)                        ELM (sigmoid hidden nodes)
                  (C, γ)         Train.     Test     Test      C        Train.     Test     Test
                                 Time(s)    Rate(%)  Dev(%)             Time(s)    Rate(%)  Dev(%)
Breast cancer     (5, 50)        0.1118     94.20    0.87      10^-3    0.1423     96.32    0.75
Liver disorders   (10, 2)        0.0972     68.24    4.58      10       0.1734     72.34    2.55
Heart             (10^4, 5)      0.0382     76.00    3.85      50       0.0344     76.25    2.70
Ionosphere        (5, 5)         0.0218     90.58    1.22      10^-3    0.0359     89.48    1.12
Pimadata          (10^3, 50)     0.2049     76.43    1.57      0.01     0.2867     77.27    1.33
Pwlinear          (10^4, 10^3)   0.0357     84.35    3.28      10^-3    0.0486     86.00    1.92
Sonar             (20, 1)        0.0412     83.33    3.55      10^4     0.0467     81.53    3.78
Monks Problem 1   (10, 1)        0.0424     95.37    0         10^4     0.0821     95.19    0.41
Monks Problem 2   (10^4, 5)      0.0860     83.80    0         10^4     0.0920     85.14    0.57
Splice            (2, 20)        2.0683     84.05    0         0.01     3.5912     85.50    0.54
A1a               (2, 5)         5.6275     84.25    0         0.01     5.4542     84.36    0.79

Table 10: Comparison between the conventional SVM and ELM for classification.
Optimization Constraints of Different Methods

Dataset     SVM (Gaussian kernel)                       ELM (sigmoid hidden nodes)
            (C, γ)          Train.    Test     Test     (C, L)          Train.    Test     Test
                            Time(s)   Rate(%)  Dev(%)                   Time(s)   Rate(%)  Dev(%)

Before gene selection:
Leukemia    (10^3, 10^3)    3.2674    82.35    0        (0.01, 3000)    0.2878    81.18    2.37
Colon       (10^3, 10^3)    0.2391    81.25    0        (0.01, 3000)    0.1023    82.50    2.86

After gene selection (60 genes obtained for each case):
Leukemia    (2, 20)         0.0640    100      0        (10, 3000)      0.0199    100      0
Colon       (2, 50)         0.0166    87.50    0        (0.001, 500)    0.0161    89.06    2.10

Table 11: Comparison between the conventional SVM and ELM in a gene classification application.
Online Sequential ELM (OS-ELM) Algorithm

Learning features:
1  The training observations are presented to the learning algorithm sequentially (one-by-one or chunk-by-chunk, with varying or fixed chunk length).
2  At any time, only the newly arrived single observation or chunk of observations (instead of the entire past data) is seen and learned.
3  A single observation or a chunk of training observations is discarded as soon as the learning procedure for that particular observation (or chunk) is completed.
4  The learning algorithm has no prior knowledge of how many training observations will be presented.

N.-Y. Liang, et al., "A Fast and Accurate On-line Sequential Learning Algorithm for Feedforward Networks," IEEE Transactions on Neural Networks, vol. 17, no. 6, pp. 1411-1423, 2006.
OS-ELM Algorithm

Two-step learning model:
1  Initialization phase: batch ELM is used to initialize the learning system.
2  Sequential learning phase: the recursive least-squares (RLS) method is adopted to update the learning system sequentially.

N.-Y. Liang, et al., "A Fast and Accurate On-line Sequential Learning Algorithm for Feedforward Networks," IEEE Transactions on Neural Networks, vol. 17, no. 6, pp. 1411-1423, 2006.
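The two phases can be sketched directly from the RLS recursion: P_{k+1} = P_k − P_k H_{k+1}^T (I + H_{k+1} P_k H_{k+1}^T)^{-1} H_{k+1} P_k and β_{k+1} = β_k + P_{k+1} H_{k+1}^T (T_{k+1} − H_{k+1} β_k). A minimal NumPy sketch operating on precomputed hidden-layer output matrices H (illustrative, not the authors' code; the initial chunk is assumed to give H_0 full column rank):

```python
import numpy as np

def oselm_init(H0, T0):
    # Initialization phase: batch ELM on the first chunk.
    P = np.linalg.inv(H0.T @ H0)
    beta = P @ H0.T @ T0
    return P, beta

def oselm_update(P, beta, Hk, Tk):
    # Sequential phase: RLS update for one new chunk; past data is discarded.
    K = np.linalg.inv(np.eye(len(Hk)) + Hk @ P @ Hk.T)
    P = P - P @ Hk.T @ K @ Hk @ P
    beta = beta + P @ Hk.T @ (Tk - Hk @ beta)
    return P, beta
```

By construction the recursion reproduces the batch least-squares solution, so once all chunks have been seen OS-ELM yields the same output weights as batch ELM while never storing past observations.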
Summary

- For generalized SLFNs, learning can be done without iterative tuning.
- ELM is efficient for batch-mode learning, sequential learning, and incremental learning.
- ELM provides a unified learning model for regression and binary/multi-class classification.
- ELM works with different types of hidden nodes, including kernels.
- ELM offers real-time learning capabilities.

Open questions:
- Does ELM always provide better generalization performance than SVM and LS-SVM when the same kernel is used?
- Does ELM always learn faster than LS-SVM when the same kernel is used?
- Is ELM the simplest learning technique for generalized SLFNs?