Extreme Learning Machine: Theory and Applications
G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew. Neurocomputing, 2006.
Presenter: James Chou, 2012/03/15
Outline
- Introduction
- Single-hidden layer feed-forward neural networks (SLFNs)
- Neural network mathematical model
- Back-propagation algorithm
- ELM mathematical model
- Performance evaluation
- Conclusion
Introduction
For the past decades, gradient-descent-based methods have dominated the learning algorithms of feed-forward neural networks. Traditionally, all the parameters of a feed-forward neural network must be tuned iteratively, which takes a very long time. When the input weights and the hidden-layer biases are assigned randomly, an SLFN (single-hidden-layer feed-forward neural network) reduces to a simple linear system, and the output weights (linking the hidden layer to the output layer) can be computed through a simple generalized-inverse operation.
Introduction (Cont.)
Based on this idea, the paper proposes a simple learning algorithm for SLFNs called the extreme learning machine (ELM). Unlike traditional learning algorithms, ELM not only achieves a smaller training error but also better generalization performance.
Single-hidden layer feed-forward neural networks
A single neuron computes

  Output = F(Σ_{i=1..N} w_i·x_i − θ)

where θ is the threshold and F(·) is the activation function, for example:

  Hard-limiter function: f(x) = 1 when x ≥ 0, f(x) = 0 when x < 0
  Sigmoid function: f(x) = 1 / (1 + e^(−x))
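The single-neuron model above can be sketched directly in NumPy; the input, weight, and threshold values here are illustrative only:

```python
import numpy as np

def hard_limiter(x):
    # f(x) = 1 when x >= 0, else 0.
    return np.where(x >= 0.0, 1.0, 0.0)

def sigmoid(x):
    # f(x) = 1 / (1 + e^(-x))
    return 1.0 / (1.0 + np.exp(-x))

def neuron_output(x, w, theta, f):
    # Output = f(sum_i w_i * x_i - theta)
    return f(np.dot(w, x) - theta)

# Example values (not from the paper):
x = np.array([0.5, -0.2, 0.8])
w = np.array([1.0, 2.0, -0.5])
theta = 0.1

print(neuron_output(x, w, theta, sigmoid))       # ~0.401 (weighted sum is -0.4)
print(neuron_output(x, w, theta, hard_limiter))  # 0.0
```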
Single-hidden layer feed-forward neural networks (Cont.)
An SLFN with L hidden-layer nodes and activation function G(·) models the N training samples (x_j, t_j) as

  f(x_j) = Σ_{i=1..L} β_i·G(w_i·x_j + b_i) = t_j,  j = 1, …, N

where w_i and b_i are the input weights and bias of the i-th hidden node, and β_i are the output weights. In matrix form this is Hβ = T, and ELM computes β = H†T using the Moore-Penrose generalized inverse H†.
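The ELM training procedure described by this model (random input weights and biases, then a pseudoinverse solve for the output weights) can be sketched as follows; the function names and sigmoid choice of G(·) are this sketch's own, not the paper's code:

```python
import numpy as np

def elm_train(X, T, L, seed=None):
    """Train an ELM. X: (N, d) inputs; T: (N, m) targets; L: hidden nodes."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(L, X.shape[1]))  # random input weights (never tuned)
    b = rng.uniform(-1.0, 1.0, size=L)                # random hidden biases (never tuned)
    H = 1.0 / (1.0 + np.exp(-(X @ W.T + b)))          # hidden-layer output matrix, sigmoid G
    beta = np.linalg.pinv(H) @ T                      # output weights: beta = pinv(H) @ T
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W.T + b)))
    return H @ beta

# Usage sketch: fit sin(x) on [-3, 3] with 50 hidden nodes.
X = np.linspace(-3, 3, 200).reshape(-1, 1)
T = np.sin(X)
W, b, beta = elm_train(X, T, L=50, seed=0)
pred = elm_predict(X, W, b, beta)
mse = np.mean((pred - T) ** 2)
```

The only free parameter is L; everything else is a single non-iterative linear solve, which is where ELM's speed advantage over back-propagation comes from.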
Regression of SinC Function (Cont.)
100,000 training data with 5-20% noise; 100,000 testing data are noise-free. Demo: results of 50 training runs of ELM are shown in the following table.

Noise | Avg. training time (s) | Avg. training RMS | Avg. testing RMS
5%    | 0.6462                 | 0.0113            | 2.201e-04
10%   | 0.6306                 | 0.0224            | 2.753e-04
15%   | 0.6427                 | 0.0334            | 8.336e-04
20%   | 0.6452                 | 0.0449            | 11.541e-04
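A small-scale stand-in for this experiment can be run as below; the sample count, noise form (uniform), node count, and weight ranges are assumptions of this sketch, so the numbers will not match the table:

```python
import numpy as np

rng = np.random.default_rng(0)

def sinc(x):
    # Target function sin(x)/x, with sinc(0) = 1.
    safe = np.where(x == 0.0, 1.0, x)
    return np.where(x == 0.0, 1.0, np.sin(safe) / safe)

# Noisy training set, noise-free test set (mirrors the slide's setup, smaller scale).
N = 2000
x_train = rng.uniform(-10, 10, size=(N, 1))
y_train = sinc(x_train) + rng.uniform(-0.1, 0.1, size=(N, 1))  # ~10% noise, assumed uniform
x_test = rng.uniform(-10, 10, size=(N, 1))
y_test = sinc(x_test)

# ELM with L sigmoid hidden nodes, trained by one pseudoinverse solve.
L = 20
W = rng.uniform(-1, 1, size=(L, 1))
b = rng.uniform(-1, 1, size=L)
H = 1.0 / (1.0 + np.exp(-(x_train @ W.T + b)))
beta = np.linalg.pinv(H) @ y_train
H_test = 1.0 / (1.0 + np.exp(-(x_test @ W.T + b)))
rmse = float(np.sqrt(np.mean((H_test @ beta - y_test) ** 2)))
print(rmse)
```

Because the test targets are noise-free, the testing RMS can come out far below the training RMS, which is the pattern the table shows.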
Conclusion
Advantages:
- ELM needs less training time than the popular BP and SVM/SVR methods.
- The prediction performance of ELM is usually slightly better than BP and close to SVM/SVR in many applications.
- Only one parameter needs to be tuned: L, the number of hidden-layer nodes.
- Nonlinear activation functions still work in ELM.
Disadvantages:
- How to find the optimal solution? Local minima issue.
- Prone to overfitting.