Comparison of Extreme Learning Machine with SVM
and Performance in Classification
Xiaoyu Sun
Department of Mathematics and Statistics
xysun@bu.edu
May 8, 2015
Multilayer Feed-forward Perceptron Neural Networks
Figure: A simple feed-forward perceptron with 8 input units, 2 layers of hidden
units, and 1 output unit. The gray-shading of the vector entries reflects their
numeric value. Cortes and Vapnik [1]
Support Vector Networks
Figure: SVM can be considered as a specific type of single-hidden-layer
feed-forward network (SLFN). Cortes and Vapnik [1]
BRIEF of SVMs
Suppose we have a training set \{y_i, x_i\}, i = 1, 2, \ldots, N, where
y_i \in \{-1, +1\} represents the class of the ith sample.
Decision Function
f(x) = \mathrm{sign}\Big( \sum_{i=1}^{N} \alpha_i y_i K(x, x_i) + b \Big)
where K(x, x_i) plays the role of the ith hidden node in the last hidden
layer of a perceptron, and \alpha_i y_i is the corresponding output weight.
Optimization Problem
\hat{f} = \arg\min_f \Big( \frac{1}{N} \sum_{i=1}^{N} V(f(x_i), y_i) + \lambda \|f\|^2 \Big)
Subject to: y_i f(x_i) \ge 1, i = 1, 2, \ldots, N
where \lambda is a user-specified parameter that trades off training error
against margin width.
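To make the decision function concrete, here is a minimal numpy sketch (an addition, not from the slides): rbf_kernel and svm_decision are hypothetical names, and a trained set of multipliers alpha, labels y, support vectors X, bias b, and kernel width gamma is assumed.

import numpy as np

def rbf_kernel(x, xi, gamma=1.0):
    # K(x, x_i) = exp(-gamma * ||x - x_i||^2)
    return np.exp(-gamma * np.sum((x - xi) ** 2))

def svm_decision(x, X, y, alpha, b, gamma=1.0):
    # f(x) = sign( sum_i alpha_i * y_i * K(x, x_i) + b )
    s = sum(a * yi * rbf_kernel(x, xi, gamma) for a, yi, xi in zip(alpha, y, X))
    return np.sign(s + b)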
Soft Margins
To guarantee that a solution always exists, even in a high-dimensional
feature space, we must consider the case where the data cannot be separated
without errors. Introducing slack variables \xi_i relaxes the constraints:
y_i f(x_i) \ge 1 - \xi_i, \quad i = 1, 2, \ldots, N
\xi_i \ge 0, \quad i = 1, 2, \ldots, N
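As a quick illustration of this tradeoff (a sketch of mine, assuming scikit-learn is available; not part of the original slides), the cost parameter C of sklearn's SVC controls how heavily slack is penalized:

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)

# A small C tolerates more slack (wider margin, more violations);
# a large C penalizes training errors harder.
for C in (0.1, 1.0, 100.0):
    clf = SVC(kernel="rbf", C=C).fit(X, y)
    print(C, clf.n_support_.sum(), clf.score(X, y))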
More SVMs
We can choose different loss functions V and different norms for the
training error and the smoothness penalty, which gives rise to several
variants of SVM.
Here are two widely used examples.
1 Least Squares Support Vector Machine (LS-SVM)
2 Proximal Support Vector Machine (PSVM)
LS-SVM
In LS-SVM,
Minimize: L_{LS\text{-}SVM} = \frac{1}{2}\|w\|^2 + \frac{\lambda}{2}\sum_{i=1}^{N} \xi_i^2
Subject to: y_i (w \cdot \phi(x_i) + b) = 1 - \xi_i, \quad i = 1, 2, \ldots, N
Here equality constraints are used, so training reduces to solving a set of
linear equations instead of a quadratic program.
LS-SVM has been shown to generalize well in most settings and to have lower
computational cost in many applications.
The decision function is the same as in the conventional SVM:
f(x) = \mathrm{sign}\Big( \sum_{i=1}^{N} \alpha_i y_i K(x, x_i) + b \Big)
where the Lagrange multipliers \alpha_i are proportional to the training
errors \xi_i, whereas in the conventional SVM many \alpha_i are exactly zero.
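As a sketch (my addition; the slides do not give the solver), the standard LS-SVM dual reduces to a single linear system in (b, \alpha), with \Omega_{ij} = y_i y_j K(x_i, x_j):

import numpy as np

def lssvm_fit(X, y, gamma=1.0, lam=1.0):
    """Solve the LS-SVM KKT linear system for (b, alpha)."""
    N = len(y)
    # RBF Gram matrix and Omega_ij = y_i y_j K(x_i, x_j)
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    Omega = np.outer(y, y) * np.exp(-gamma * sq)
    # KKT system: [[0, y^T], [y, Omega + I/lam]] @ [b; alpha] = [0; 1]
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(N) / lam
    rhs = np.concatenate(([0.0], np.ones(N)))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]   # bias b, multipliers alpha

Solving one dense linear system is what gives LS-SVM its lower cost relative to the quadratic program of the conventional SVM.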
ELM
Suppose the training set is \{x_i, y_i\}.
A standard SLFN with L hidden neurons and activation function g(x) is
mathematically modeled as
\sum_{i=1}^{L} \beta_i \, g(w_i \cdot x_j + b_i) = o_j, \quad j = 1, 2, \ldots, N \qquad (*)
where w_i is the weight vector connecting the ith hidden neuron to the input
neurons and w_i \cdot x_j denotes their inner product;
\beta_i is the weight connecting the ith hidden neuron to the output neurons.
Support Vector Networks
ELM
(*) can be written in matrix form:
H\beta = Y
where the hidden-layer output matrix is
H =
\begin{pmatrix}
g(w_1 \cdot x_1 + b_1) & \cdots & g(w_L \cdot x_1 + b_L) \\
\vdots & \ddots & \vdots \\
g(w_1 \cdot x_N + b_1) & \cdots & g(w_L \cdot x_N + b_L)
\end{pmatrix}
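Constructing H is one vectorized operation; a minimal sketch, assuming a sigmoid activation (the slides do not fix a particular g):

import numpy as np

def hidden_output_matrix(X, W, b):
    # H[j, i] = g(w_i . x_j + b_i), here with sigmoid activation g.
    Z = X @ W.T + b    # shape (N, L): inner products plus biases
    return 1.0 / (1.0 + np.exp(-Z))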
ELM
Optimization Problem
We need to find \hat{w}, \hat{b}, \hat{\beta} that
Minimize: \|H\beta - Y\| and \|\beta\|
where \|H\beta - Y\| is the training error, and the distance between the
margins of the two classes in the feature space is 2 / \|\beta\|.
Theorem
Moore–Penrose generalized inverse of a matrix
A matrix G is the Moore–Penrose generalized inverse of a matrix A, denoted
A^{\dagger}, if
AGA = A, \quad GAG = G, \quad (AG)^T = AG, \quad (GA)^T = GA
Theorem
Let there exist a matrix G such that Gy is a minimum-norm least-squares
solution of the linear system Ax = y, that is,
Gy = \arg\min_x \|Ax - y\|
Then it is necessary and sufficient that G = A^{\dagger}, the Moore–Penrose
generalized inverse of A.
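numpy's pinv computes exactly this generalized inverse; a small sketch (my addition) checking the four Penrose conditions numerically:

import numpy as np

A = np.random.default_rng(1).normal(size=(5, 3))
G = np.linalg.pinv(A)   # Moore-Penrose generalized inverse of A

# The four Penrose conditions, verified numerically.
assert np.allclose(A @ G @ A, A)
assert np.allclose(G @ A @ G, G)
assert np.allclose((A @ G).T, A @ G)
assert np.allclose((G @ A).T, G @ A)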
ELM
Therefore, the minimum-norm least-squares solution of H\beta = Y is unique:
\hat{\beta} = H^{\dagger} Y \qquad (**)
Algorithm ELM: Given a training set, an activation function g(x), and a
number of hidden neurons L:
Step 1: Assign arbitrary input weights w_i and biases b_i, i = 1, 2, \ldots, L.
Step 2: Calculate the hidden-layer output matrix H.
Step 3: Calculate the output weight \beta using (**).
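Putting the three steps together, a minimal end-to-end sketch (my own code, not the author's; it assumes the sigmoid activation above and labels in {-1, +1}):

import numpy as np

class ELM:
    def __init__(self, n_hidden=100, seed=0):
        self.L = n_hidden
        self.rng = np.random.default_rng(seed)

    def _H(self, X):
        # Hidden-layer output matrix with sigmoid activation.
        return 1.0 / (1.0 + np.exp(-(X @ self.W.T + self.b)))

    def fit(self, X, y):
        d = X.shape[1]
        # Step 1: random input weights and biases, never tuned again.
        self.W = self.rng.normal(size=(self.L, d))
        self.b = self.rng.normal(size=self.L)
        # Step 2: hidden-layer output matrix H.
        H = self._H(X)
        # Step 3: output weights via the Moore-Penrose inverse, beta = H^+ y.
        self.beta = np.linalg.pinv(H) @ y
        return self

    def predict(self, X):
        return np.sign(self._H(X) @ self.beta)

Usage would look like ELM(n_hidden=1000).fit(X_train, y_train).predict(X_test); note that nothing is iteratively tuned after the random draw in Step 1.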
Comparison Between ELM and SVM
Figure: Effect of sample size on the performance of SVM and ELM; the red
line is SVM and the black line is ELM.
Comparison Between ELM and SVM
Figure: Effect of feature-space dimension on the performance of SVM and
ELM; the red line is SVM and the black line is ELM.
Comparison Between ELM and SVM
Number of Hidden Nodes    Time      Accuracy
10                         9.560    0.8235294
100                        8.164    0.8470588
1000                       8.756    0.8529412
7129                      18.904    0.8235294
Table: ELM with different numbers of hidden nodes.
Comparison Between ELM and SVM
Model      Time      Accuracy
SVM       55.400     0.8235294
LS-SVM    19.406     0.8529412
ELM        8.756     0.8529412
Table: Results of different models on the cancer classification task.
Comparison Between ELM and SVM
Less Human Intervention than SVMs
In ELM, the hidden-node parameters (w, b) are generated randomly, and
performance is not very sensitive to the number of hidden nodes L (although
this has not yet been proved in theory), so users only need to specify the
cost parameter C.
How does ELM behave over a wide range of hidden-node counts, and what is
the oscillation bound?
Comparison Between ELM and SVM
Smaller Computational Complexity than SVMs on some datasets
It can be shown that when the number of hidden nodes L is much smaller than
the number of training samples N (L ≪ N), ELM has lower computational cost
than SVMs. But what happens when N ≪ L, as in high-dimensional datasets such
as the cancer classification problem?
When ELM and SVMs are compared with the same number of hidden nodes L and
the same kernel, will ELM always be faster, with similar or even better
generalization performance than SVMs?
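One rough way to probe the L ≪ N regime empirically (a sketch under assumptions: scikit-learn is available and the hypothetical ELM class sketched earlier is in scope; timings vary by machine and by library internals):

import time
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
y = 2 * y - 1   # map {0, 1} labels to {-1, +1}

t0 = time.perf_counter()
svm = SVC(kernel="rbf").fit(X, y)
t_svm = time.perf_counter() - t0

t0 = time.perf_counter()
elm = ELM(n_hidden=200).fit(X, y)   # L = 200 << N = 5000
t_elm = time.perf_counter() - t0

print(f"SVM: {t_svm:.2f}s  acc={svm.score(X, y):.3f}")
print(f"ELM: {t_elm:.2f}s  acc={(elm.predict(X) == y).mean():.3f}")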
Summary
In recent years, ELM has been shown to apply to both regression and
multiclass classification problems.
ELM learns without iterative tuning, which means less human intervention
and faster training; it may also be possible to implement online sequential
variants of kernel-based ELM.
Much of the discussion of ELM performance is empirical; theoretical proofs
are still needed.
Thank You