EXTREME LEARNING MACHINES
Nimai Chand Das Adhikari
INDIAN INSTITUTE OF SPACE SCIENCE AND TECHNOLOGY
Advisor:
Dr. Raju K George
Dean (R & D and Student Welfare)
6th June, 2016
Overview
1 Introduction
2 Brief Outline of the Thesis
3 Concepts Used
4 Theorems Used
5 Extreme Learning Machine
6 Extreme Learning Machines - Two Hidden Layer Feed Forward Network
7 Extreme Learning Machines - Auto Encoders
8 Hierarchical Extreme Learning Machines
9 Recommender Systems for building a music app using ELM
Introduction
Neural networks are mainly trained with a training set $(x_i, t_i)_{i=1}^{N}$,
where N is the number of training examples fed to the neural network.
1 For a function represented by such a training set, a SLFN with at most N
hidden nodes and any non-linear activation function can learn the N
distinct observations with zero error.
2 Gradient-based learning has long dominated and was the main theory
behind the learning algorithms of feed-forward neural networks.
3 The drawbacks of this method, i.e. slow learning due to improper learning
steps and convergence to local minima, call for a change in methodology.
Apart from this, many iterative steps are required to reach an optimized
result.
Brief Outline of the thesis
1 A brief description of the concepts behind the ELM.
2 A new approach to the ELM using two hidden layers instead of the single
layer that has been the basis for the ELM.
3 ELM Auto Encoders and Hierarchical ELM, which combine two basic
learning stages:
1 unsupervised learning for feature extraction.
2 basic ELM for classification in the last layer.
4 Concepts and ideas behind building a new music app using
Recommendation Systems and Neural Networks.
5 Conclusions and Future Works.
Single Layer Feed Forward Neural Networks
SLFNs with arbitrarily chosen input weights can learn N distinct
observations with a very small error.
Tuning the input weights and biases is therefore not an issue.
The system becomes linear, and the output weights can be easily calculated
using the generalized inverse.
The learning speed becomes extremely fast.
The algorithm built on this idea is the Extreme Learning Machine (ELM).
Single Layer Feed Forward Neural Networks
Figure: Single Layer Feed Forward Neural Networks
Concepts Used
Moore Penrose generalized Inverse
Least Squares Solution
Random Features Mappings and Kernels
Moore Penrose Generalized Inverse
Consider an n × n linear system:
$$Ax = b, \quad A \in M_{n \times n}, \; b \in \mathbb{R}^n$$
This system has a unique solution iff the matrix A is of full rank. Then
$$x = A^{-1}b$$
Again, let the system be
$$Ax = b, \quad A \in M_{m \times n}, \; b \in \mathbb{R}^m$$
There are two cases:
m > n: the system is overdetermined
m < n: the system is underdetermined
Moore Penrose Generalized Inverse: Over Determined
Systems
In this case, there is no $x \in \mathbb{R}^n$ such that
$$Ax = b, \quad \text{i.e.} \quad b - Ax = 0$$
Hence, there is no exact solution, and the residual for this system is
$$r(x) = b - Ax, \quad x \in \mathbb{R}^n$$
Therefore we seek a vector x for which
$$\|r(x)\| = \|b - Ax\|$$
is minimum.
Moore Penrose Generalized Inverse: Definitions
A vector which minimizes $\|r(x)\|_2$ is a least squares solution for the
system defined on the previous slide. The least squares solution x which
has the minimum 2-norm is called the minimum norm least squares
solution.
Moore Penrose Inverse
Let A be any m × n matrix then the pseudo inverse of A is an n × m
matrix A† which satisfies the following Moore-Penrose conditions:
(MP1) AA†A = A
(MP2) A†AA† = A†
(MP3) (AA†)T = AA†
(MP4) (A†A)T = A†A
Theorem:
Let A be any real m × n matrix. Then
the pseudoinverse of A is unique
$(A^{\dagger})^{\dagger} = A$
$(A^T)^{\dagger} = (A^{\dagger})^T$
Moore Penrose Inverse - How to compute?
If A is an n × n non-singular matrix, then
$$A^{\dagger} = A^{-1}$$
If A is an m × n real matrix with m ≥ n and full rank, i.e. rank(A) = n, then
$$A^{\dagger} = (A^T A)^{-1} A^T$$
If A is an m × n real matrix with m ≤ n and full rank, i.e. rank(A) = m, then
$$A^{\dagger} = A^T (A A^T)^{-1}$$
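As a quick numerical check (not part of the original slides), the two full-rank formulas above can be verified against NumPy's SVD-based pseudoinverse; the matrices below are made-up examples:

```python
import numpy as np

# Tall full-rank matrix (m > n): A† = (AᵀA)⁻¹ Aᵀ
A = np.array([[1., 0.],
              [0., 1.],
              [1., 1.]])
pinv_formula = np.linalg.inv(A.T @ A) @ A.T
pinv_svd = np.linalg.pinv(A)          # SVD-based pseudoinverse
assert np.allclose(pinv_formula, pinv_svd)

# Wide full-rank matrix (m < n): A† = Aᵀ (AAᵀ)⁻¹
B = np.array([[1., 2., 0.],
              [0., 1., 1.]])
assert np.allclose(B.T @ np.linalg.inv(B @ B.T), np.linalg.pinv(B))
```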
Moore Penrose Inverse - Examples
Consider
$$A = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$$
which has the pseudoinverse
$$A^{\dagger} = \begin{bmatrix} 1/5 & 2/5 \end{bmatrix}$$
While another left inverse, $A^{-L} = [\,3 \;\; -1\,]$, fails to satisfy all four
Moore-Penrose conditions, the pseudoinverse above satisfies all four.
Note:
1. A matrix that satisfies the first two conditions is called a generalized
inverse.
2. The uniqueness is established by the last two conditions.
Least Squares Solution
Two different cases arise when solving the linear system
$$Ax = b, \quad A \in M_{m \times n}, \; x \in \mathbb{R}^n, \; b \in \mathbb{R}^m$$
When m > n, the linear system is called overdetermined.
When m < n, the linear system is called underdetermined.
The system is solvable iff
$$\operatorname{rank}(A) = \operatorname{rank}(A \,|\, b)$$
If there is no exact solution for the linear system, we form the residual
$$r(x) = b - Ax, \quad x \in \mathbb{R}^n$$
and seek a vector $x \in \mathbb{R}^n$ for which $\|r(x)\|_2 = \|b - Ax\|_2$ is
minimum.
Minimum Norm Least Squares Solution
The least-squares solution x which has the minimum 2-norm is called the
minimum norm least squares solution, i.e. if z is any other least squares
solution to the system Ax = b, then we must have
$$\|x\|_2 < \|z\|_2$$
Basic Least Squares Theorem
Let there be a linear system
$$Ax = b$$
with A a real m × n matrix, m ≥ n, and $b \in \mathbb{R}^m$. Then
1 The linear system has a unique least-squares solution x iff A has full rank.
2 The linear system has infinitely many least-squares solutions iff A is
rank-deficient.
3 The minimum norm least-squares solution to the system is given by
$$x = A^{\dagger} b$$
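A small illustration of part 3 of the theorem (the data below is my own toy example): the pseudoinverse solution x = A†b coincides with the minimum-norm least-squares solution returned by `numpy.linalg.lstsq`:

```python
import numpy as np

# Overdetermined system: 4 equations, 2 unknowns
A = np.array([[1., 1.],
              [1., 2.],
              [1., 3.],
              [1., 4.]])
b = np.array([6., 5., 7., 10.])

# Minimum norm least-squares solution via the pseudoinverse: x = A† b
x_pinv = np.linalg.pinv(A) @ b

# lstsq returns the same minimum-norm least-squares solution
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
assert np.allclose(x_pinv, x_lstsq)
print("least-squares solution:", x_pinv)
```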
Random Features Mappings and Kernels
Random Feature Mappings: The hidden layer output vector
$h(x) = [h_1(x), \ldots, h_L(x)]$ can be used even when all its hidden node
parameters are randomly generated according to any continuous probability
distribution, provided h(x) has the universal approximation capability, i.e.
$$\lim_{L \to \infty} \Big\| \sum_{i=1}^{L} \beta_i h_i(x) - f(x) \Big\| = 0$$
holds with probability 1 for appropriate output weights. Thus,
$$h(x) = [G(a_1, b_1, x), \ldots, G(a_L, b_L, x)]$$
where G(a, b, x) is a non-linear piecewise continuous function that satisfies
the ELM universal approximation capability theorem.
Random Features Mappings and Kernels
Kernels: Instead of h(x), a kernel matrix can be used in ELM:
$$\hat{H} = H H^T : \quad \hat{H}_{i,j} = h(x_i) \cdot h(x_j) = K(x_i, x_j)$$
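As an illustrative sketch (the slide does not fix a particular kernel; the RBF kernel and the parameter `gamma` below are my assumptions), the kernel matrix playing the role of Ĥ can be built directly from the training inputs:

```python
import numpy as np

def rbf_kernel_matrix(X, gamma=1.0):
    """K[i, j] = exp(-gamma * ||x_i - x_j||^2) for the rows x_i of X."""
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2 * X @ X.T
    return np.exp(-gamma * sq_dists)

X = np.random.randn(5, 3)        # 5 samples, 3 features
K = rbf_kernel_matrix(X)         # plays the role of Ĥ = HHᵀ in kernel ELM
print(K.shape)                   # (5, 5)
```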
Random Features Mappings and Kernels
Feature Mapping Matrix: $h_i(x)$ denotes the output of the i-th hidden node
with regard to the input x. The feature mapping matrix is formed without
reference to the targets $t_i$; it is reasonable for the feature mapping
matrix to be independent of the target values.
Figure: ELM Feature Mapping
Theorems
Universal Approximation Capability
Given any bounded non-constant piecewise continuous function as the
activation function of the hidden neurons, if by tuning the parameters of the
hidden neuron activation function SLFNs can approximate any target
continuous function, then for any continuous target function f(x) and any
randomly generated function sequence $\{h_i(x)\}_{i=1}^{L}$,
$$\lim_{L \to \infty} \Big\| \sum_{i=1}^{L} \beta_i h_i(x) - f(x) \Big\| = 0$$
holds with probability 1 for appropriate output weights β.
Classification Capability Theorem
Given a feature mapping h(x), if h(x)β is dense in $C(\mathbb{R}^d)$ or in C(M),
where M is a compact set of $\mathbb{R}^d$, then a generalized SLFN with such a
hidden layer mapping h(x) can separate arbitrary disjoint regions of any
shape in $\mathbb{R}^d$ or M.
Extreme Learning Machines-Introduction
Gradient descent based methods have mainly been the backbone of
training for feed forward neural networks.
In that case, the parameters of the feed forward neural network need to
be tuned, and the time taken for learning is very large.
For SLFNs, if the input-to-hidden layer weights and biases are randomly
assigned, the system becomes linear and its output weights can be
computed through a simple generalized inverse operation.
Extreme Learning Machines
Extreme Learning Machines - SLFNs
Output of the hidden nodes
$$G(a_i, b_i, x) = g(a_i \cdot x + b_i)$$
where $a_i$ is the weight vector connecting the i-th hidden node to the input
nodes, and $b_i$ is the threshold of the i-th hidden node.
Output of the Network
$$f_L(x) = \sum_{i=1}^{L} \beta_i \, G(a_i, b_i, x)$$
where G(·) is the activation function and L is the number of hidden layer
nodes.
Extreme Learning Machine - Mathematical Model
Mathematical Model
$$\sum_{i=1}^{L} \beta_i \, G(a_i, b_i, x_j) = t_j, \quad j = 1, \ldots, N$$
is equivalent to $H\beta = T$, where
$$H(a_1, \ldots, a_L, b_1, \ldots, b_L, x_1, \ldots, x_N) =
\begin{bmatrix}
G(a_1, b_1, x_1) & \cdots & G(a_L, b_L, x_1) \\
\vdots & \ddots & \vdots \\
G(a_1, b_1, x_N) & \cdots & G(a_L, b_L, x_N)
\end{bmatrix}_{N \times L}$$
$$\beta = \begin{bmatrix} \beta_1^T \\ \vdots \\ \beta_L^T \end{bmatrix}_{L \times m}
\quad \text{and} \quad
T = \begin{bmatrix} t_1^T \\ \vdots \\ t_N^T \end{bmatrix}_{N \times m}$$
Here H is the hidden-layer output matrix.
Extreme Learning Machines - Mathematical and Learning
Models
Mathematical Model
Any continuous target function f(x) can be approximated by SLFNs.
Thus, given a small positive value ε, for SLFNs with a sufficient number of
hidden layer nodes (L) we have
$$\|f_L(x) - f(x)\| < \epsilon$$
Learning Model
For N arbitrary distinct samples $(x_i, t_i) \in \mathbb{R}^n \times \mathbb{R}^m$, SLFNs with L
hidden nodes and activation function g(x) are mathematically modeled as
$$f_L(x_j) = o_j, \quad j = 1, \ldots, N$$
Cost Function: $E = \sum_{j=1}^{N} \|o_j - t_j\|^2$
Extreme Learning Machines - Learning Model
Learning Model
The target is to minimize the cost function E by adjusting the network
parameters $\beta_i$, $a_i$, $b_i$.
If ε = 0, then
$$f_L(x) = f(x) = T,$$
where T is the known target, and the cost function E = 0.
Extreme Learning Machines - Algorithm
Three Step Learning Model
Given a training set $S = \{(x_i, t_i) \mid x_i \in \mathbb{R}^n, t_i \in \mathbb{R}^m, i = 1, \ldots, N\}$, an
activation function G and the number of hidden nodes L:
1 Randomly assign the input weight vectors $a_i$ and the hidden node biases $b_i$,
i = 1, ..., L.
2 Calculate the hidden layer output matrix H.
3 Calculate the output weights β:
$$\beta = H^{\dagger} T$$
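A minimal NumPy sketch of this three-step algorithm (illustrative only; the sigmoid activation, the function names, and the toy sine data are my assumptions, not the thesis implementation):

```python
import numpy as np

def elm_train(X, T, L, rng=np.random.default_rng(0)):
    """Single-hidden-layer ELM: random hidden layer, analytic output weights."""
    n_features = X.shape[1]
    a = rng.standard_normal((n_features, L))      # step 1: random input weights
    b = rng.standard_normal(L)                    #         and random biases
    H = 1.0 / (1.0 + np.exp(-(X @ a + b)))        # step 2: hidden layer output matrix (sigmoid)
    beta = np.linalg.pinv(H) @ T                  # step 3: output weights via Moore-Penrose inverse
    return a, b, beta

def elm_predict(X, a, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ a + b)))
    return H @ beta

# Toy usage: learn y = sin(x) from 100 samples with 20 hidden nodes
X = np.linspace(-3, 3, 100).reshape(-1, 1)
T = np.sin(X)
a, b, beta = elm_train(X, T, L=20)
print("training MSE:", np.mean((elm_predict(X, a, b, beta) - T) ** 2))
```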
Extreme Learning Machines - Performance Comparison
Algorithm Training Rate Testing Rate Training Time (secs)
ELM 78.06 76.78 0.405290
SVM-linear 78.76 77.6518 0.6563
BP 77.88 73.04348 15.2361
Table: Performance Comparison for the Diabetes dataset
Figure: performance for the diabetes dataset
Extreme Learning Machines - Performance Comparison
DATASET Testing Rate Training Time (secs)
Glass 74.2115 0.044677
Hepatitis 70.380 0.037706
Breast Cancer 83.3333 0.024485
Table: Performance Comparison For various DATASETS by ELM
Extreme Learning Machines - Performance Comparison
DATASET Testing Rate Training Time (secs)
Glass 37.4129 10.548456
Hepatitis 48.3585 7.4125
Breast Cancer 77.609 2.3564
Table: Performance Comparison For various DATASETS by BP
Extreme Learning Machines - Performance Comparison
DATASET Testing Rate Training Time (secs)
Glass 34.5750 0.2354677
Hepatitis 63.0435 0.0632
Breast Cancer 66.67 0.2261
Table: Performance Comparison For various DATASETS by SVM
Extreme Learning Machines - Performance Comparison
Figure: performance comparison
Extreme Learning Machines - Performance Comparison:
SinC function
Figure: performance for the SinC function
Figure: Comparison for the Training Time
Extreme Learning Machines - Two Hidden Layer Feed
Forward Neural Networks
Extreme Learning Machines - Two Hidden Layer Feed
Forward Neural Networks
1 The Extreme Learning Machine (ELM) for single-hidden-layer feed-forward
neural networks (SLFNs) randomly chooses the hidden node weights and
biases and then analytically determines the output weights of the SLFN.
2 This algorithm tends to provide good generalization performance with
an extremely fast learning speed.
3 If, instead of one hidden layer, two hidden layers are used and their
weights and biases are randomly generated according to the theories and
theorems discussed above, we obtain the ELM-TLFN.
4 As with the SLFN ELM, after the input weights and the hidden layer
biases are chosen arbitrarily, the network can be treated as a linear
system, and the output weights of the TLFN-ELM can be analytically
determined through a simple generalized inverse operation on the hidden
layer output matrix.
Extreme Learning Machines - Two Hidden Layer Feed
Forward Neural Networks : Proposal
The weights and the biases of the two hidden layers are generated
randomly, as done in the case of the SLFNs. Then,
$$\sum_{i=1}^{\hat{N}_2} \beta_i \, g_2\Big(W_i \cdot \sum_{j=1}^{\hat{N}_1} g_1(w_j \cdot x_k + b_j)\Big) = O_k, \quad \forall k = 1, \ldots, N$$
Here N is the total number of training examples of the system,
$w_j = [w_{j1}, \ldots, w_{jn}]^T$ and $W_i = [W_{i1}, \ldots, W_{i\hat{N}_1}]^T$ are the randomly
generated weight vectors of the first and second hidden layers, and
$\beta_i = [\beta_{i1}, \ldots, \beta_{im}]^T$ is the output weight matrix. Then, as for the
SLFN network, the above system can be written as
$$H\beta = T$$
Extreme Learning Machines - Two Hidden Layer Feed
Forward Neural Networks: H matrix
Hidden Layer Output Matrix
Here H is the hidden layer output matrix of the system and can be
generated as
$$H = \begin{bmatrix}
g_2(w_{j1} \cdot g_1 + b_1) & \cdots & g_2(w_{j\hat{N}_1} \cdot g_1 + b_{\hat{N}_2}) \\
\vdots & & \vdots \\
g_2(w_{j1} \cdot g_{\hat{N}_1} + b_1) & \cdots & g_2(w_{j\hat{N}_1} \cdot g_{\hat{N}_1} + b_{\hat{N}_2})
\end{bmatrix}_{\hat{N}_1 \times \hat{N}_2}$$
Extreme Learning Machines - Two Hidden layer Feed
Forward Neural Networks: Algorithm
Given a training set, activation functions g1(x) and g2(x), and the numbers
of hidden neurons $\hat{N}_1$ and $\hat{N}_2$ for the first and second hidden layers
respectively:
1 Randomly assign the input weights w1 and w2 and the biases B1 and B2.
2 Calculate the hidden-layer output matrix of the system.
3 Calculate the output weights β:
$$\beta = H^{\dagger} T$$
Here H, β and T are as described in the previous sections.
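A minimal sketch of the TLFN-ELM idea under the same assumptions as the SLFN sketch earlier (sigmoid activations and illustrative names are my own choices): both hidden layers are random, and only β is solved for analytically.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tlfn_elm_train(X, T, N1, N2, rng=np.random.default_rng(0)):
    """Two random hidden layers, analytic output weights (illustrative sketch)."""
    W1 = rng.standard_normal((X.shape[1], N1)); b1 = rng.standard_normal(N1)
    W2 = rng.standard_normal((N1, N2));         b2 = rng.standard_normal(N2)
    H1 = sigmoid(X @ W1 + b1)          # first hidden layer (random)
    H  = sigmoid(H1 @ W2 + b2)         # second hidden layer (random)
    beta = np.linalg.pinv(H) @ T       # output weights via pseudoinverse
    return W1, b1, W2, b2, beta

def tlfn_elm_predict(X, W1, b1, W2, b2, beta):
    return sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2) @ beta
```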
Extreme Learning Machines - Two Hidden layer Feed
Forward Neural Networks: Performance Comparison
Algorithm Testing Rate Training Time (secs) Hidden Nodes
ELM-SLFN 74.2115 0.044677 20
ELM-TLFN 75.46929 0.16709 10-20
Table: PERFORMANCE COMPARISON for GLASS Dataset with ELM SLFN and
ELM TLFN
Extreme Learning Machines - Two Hidden layer Feed
Forward Neural Networks: Performance Comparison
Figure: performance comparison for various datasets
Sparse Autoencoders- Introduction
The sparse autoencoder learning algorithm is one approach to automatically
learn features from unlabeled data. In some domains, such as computer
vision, this approach is not by itself competitive with the best
hand-engineered features, but the features it can learn do turn out to be
useful for a range of problems (including ones in audio, text, etc.).
Further, there are more sophisticated versions of the sparse autoencoder
that do surprisingly well, and in many cases are competitive with or
superior to even the best hand-engineered representations.
Sparse Autoencoders- Structural View
Figure: Sparse Autoencoders network
Sparse Autoencoders-A brief Idea
So far we have described the application of neural networks to supervised
learning, in which we have labeled training examples. Now suppose we have
only an unlabeled training set {x1, x2, x3, ...} with $x_i \in \mathbb{R}^n$.
An autoencoder neural network is an unsupervised learning algorithm
that applies backpropagation, setting the target values equal to the
inputs, i.e. $y_i = x_i$.
The autoencoder tries to learn a function $h_{W,b}(x) \approx x$. In other
words, it is trying to learn an approximation to the identity function,
so as to output $\hat{x}$ that is similar to x.
The identity function seems a particularly trivial function to try to
learn; but by placing constraints on the network, such as limiting the
number of hidden units, we can discover interesting structure in the data.
Sparse Autoencoder- Example
Features Representation
Suppose the inputs x are the pixel intensity values of a 10 × 10 image
(100 pixels), i.e. $x \in \mathbb{R}^{100}$, so n = 100, and there are $s_2 = 50$ hidden
units in layer $L_2$. The output is also $y \in \mathbb{R}^{100}$.
Reconstruction
Since there are only 50 hidden units, the network is forced to learn a
compressed representation of the input, i.e. given only the vector of
hidden unit activations $a^{(2)} \in \mathbb{R}^{50}$, it must try to reconstruct the
100-pixel input x.
Different layers
When the number of hidden units is large, we can still discover interesting
structure, by imposing other constraints on the network. In particular, if
we impose a sparsity constraint on the hidden units, then the autoencoder
will still discover interesting structure in the data.
Sparse Autoencoders- Sparsity Constraints
Informally, we will think of a neuron as being active (or firing) if its
output value is close to 1, and inactive if its output value is close to 0.
We would like to constrain the neurons to be inactive most of the time.
Let $a_j^{(2)}(x)$ denote the activation of hidden unit j for the input x, and
define
$$\hat{\rho}_j = \frac{1}{m} \sum_{i=1}^{m} a_j^{(2)}(x_i)$$
as the average activation of hidden unit j (averaged over the training set).
We would then like to enforce the constraint
$$\hat{\rho}_j = \rho$$
where ρ is a sparsity parameter (typically a small value, e.g. ρ ≈ 0.05).
Sparse Autoencoders- Sparsity Constraints
We would like the average activation of each hidden neuron j to be close
to 0.05 (say). To satisfy this constraint, the hidden unit activations must
mostly be near 0. To achieve this, an additional penalty term is introduced
into the optimization objective, which penalizes $\hat{\rho}_j$ deviating from ρ:
$$\sum_{j=1}^{s_2} \rho \log\frac{\rho}{\hat{\rho}_j} + (1 - \rho) \log\frac{1 - \rho}{1 - \hat{\rho}_j}$$
Here $s_2$ is the number of neurons in the hidden layer, and the index j
sums over the hidden units of the network.
Sparse Autoencoders- Kullback-Leibler (KL) divergence
The penalty term above,
$$\mathrm{KL}(\rho \,\|\, \hat{\rho}_j) = \rho \log\frac{\rho}{\hat{\rho}_j} + (1 - \rho) \log\frac{1 - \rho}{1 - \hat{\rho}_j},$$
is the Kullback-Leibler (KL) divergence between a Bernoulli random
variable with mean ρ and a Bernoulli random variable with mean $\hat{\rho}_j$.
KL-divergence is a standard function for measuring how different two
distributions are.
Property:
$$\mathrm{KL}(\rho \,\|\, \hat{\rho}_j) = 0 \;\text{ if }\; \hat{\rho}_j = \rho,
\quad \text{and it increases monotonically as } \hat{\rho}_j \text{ moves away from } \rho.$$
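A short sketch of this sparsity penalty (my own illustration; the clipping guard against log(0) is an implementation detail, not from the slides):

```python
import numpy as np

def kl_sparsity_penalty(rho, rho_hat):
    """Sum of KL(rho || rho_hat_j) over hidden units, treating each as a Bernoulli mean."""
    rho_hat = np.clip(rho_hat, 1e-8, 1 - 1e-8)   # guard against log(0)
    return np.sum(rho * np.log(rho / rho_hat)
                  + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))

rho_hat = np.array([0.05, 0.10, 0.50])   # average activations of 3 hidden units
print(kl_sparsity_penalty(0.05, rho_hat))  # the unit already at 0.05 contributes 0
```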
Sparse Autoencoders- KL Divergence Graph
Figure: KL Divergence plot
Sparse Autoencoders- Cost Function
Now the overall cost function can be written as
$$J_{\text{sparse}}(W, b) = J(W, b) + \beta \sum_{j=1}^{s_2} \mathrm{KL}(\rho \,\|\, \hat{\rho}_j)$$
Let us define the two terms:
J(W, b) is the cost function used for the BP algorithm, i.e.
$$J(W, b) = \Big[\frac{1}{m} \sum_{i=1}^{m} J(W, b; x_i, y_i)\Big]
+ \frac{\lambda}{2} \sum_{l=1}^{n_l - 1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} \big(W_{ji}^{(l)}\big)^2$$
The first term in the definition of J(W, b) is an average sum-of-squares
error term. The second term is a regularization term (also called a weight
decay term) that tends to decrease the magnitude of the weights and helps
prevent overfitting; λ controls the strength of the weight decay.
In $J_{\text{sparse}}(W, b)$, β controls the weight of the sparsity penalty term.
Representational Learning with ELM for Big Data
(ELM-AE) - Introduction
A machine learning algorithm's generalization capability depends on the
dataset, which is why engineering a dataset's features to represent the
data's salient structure is important. However, feature engineering requires
domain knowledge and human ingenuity to generate appropriate features.
Similar to deep networks, multilayer ELM (ML-ELM) performs layer-by-layer
unsupervised learning. This work also introduces the ELM auto-encoder
(ELM-AE), which represents features based on singular values. Resembling
deep networks, ML-ELM stacks on top of ELM-AE to create a multilayer
neural network. It learns significantly faster than existing deep networks,
outperforming DBNs, SAEs, and SDAEs and performing on par with DBMs
on the MNIST dataset.
(ELM-AE)- A brief of ELM
The ELM theory for SLFNs shows that the hidden nodes can be randomly
generated. The input data is mapped to an L-dimensional ELM random
feature space, and the network output is
$$f_L(x) = \sum_{i=1}^{L} \beta_i h_i(x) = h(x)\beta$$
where $\beta = [\beta_1, \beta_2, \beta_3, \ldots, \beta_L]^T$ is the output weight matrix between
the hidden nodes and the output nodes, $h(x) = [g_1(x), g_2(x), \ldots, g_L(x)]$
are the hidden node outputs (random hidden features) for the input x, and
$g_i(x)$ is the output of the i-th hidden node.
(ELM-AE)- ELM
Given N training samples $(x_i, t_i)_{i=1}^{N}$, ELM resolves the following
learning problem:
$$H\beta = T$$
where $T = [t_1, \ldots, t_N]^T$ are the target labels and
$H = [h^T(x_1), \ldots, h^T(x_N)]^T$. Hence the output weights β can be
calculated as
$$\beta = H^{\dagger} T$$
where $H^{\dagger}$ is the Moore-Penrose generalized inverse of the matrix H.
To obtain better generalization performance and a more robust solution,
one can add a regularization term:
$$\beta = \Big(\frac{I}{\lambda} + H^T H\Big)^{-1} H^T T$$
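A one-function sketch of the regularized solution above (the value of λ and the toy matrix shapes below are assumptions for illustration):

```python
import numpy as np

def elm_output_weights(H, T, lam=1e-3):
    """Regularized ELM output weights: beta = (I/lam + H^T H)^{-1} H^T T."""
    L = H.shape[1]
    return np.linalg.solve(np.eye(L) / lam + H.T @ H, H.T @ T)

H = np.random.rand(200, 50)   # hidden layer outputs: 200 samples, 50 hidden nodes
T = np.random.rand(200, 3)    # targets for 3 outputs
beta = elm_output_weights(H, T)
print(beta.shape)             # (50, 3)
```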
Extreme Learning Machines as (ELM-AE)
The main objective of ELM-AE is to represent the input features
meaningfully in three different representations:
Compressed representation: represent features from a higher-dimensional
input data space in a lower-dimensional feature space.
Sparse representation: represent features from a lower-dimensional input
data space in a higher-dimensional feature space.
Equal dimension representation: represent features in a feature space whose
dimension equals that of the input data space.
(ELM-AE)-Network
Figure: ELM-AE Orthogonal Weights
(ELM-AE)- Orthogonalisation
Orthogonalisation of the randomly generated hidden parameters (weights
and biases) tends to improve the generalization performance of ELM-AE.
ELM-AE's orthogonal random weights and biases of the hidden nodes
project the input data to a different or equal dimension space, as shown
by the Johnson-Lindenstrauss lemma.
The hidden layer outputs are calculated as
$$h = g(a x + b), \qquad a^T a = I, \; b^T b = 1$$
where $a = [a_1, a_2, \ldots, a_L]$ are the orthogonal random weights and
$b = [b_1, b_2, \ldots, b_L]$ are the orthogonal random biases between the
input nodes and the hidden nodes.
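A small sketch (my own illustration) of generating such parameters: a QR decomposition gives orthonormal random weight columns, and the bias vector is normalized to unit length.

```python
import numpy as np

def elm_ae_random_params(n_inputs, L, rng=np.random.default_rng(0)):
    """Orthogonal random weights (a^T a = I) and unit-norm bias (b^T b = 1)."""
    a, _ = np.linalg.qr(rng.standard_normal((n_inputs, L)))  # orthonormal columns (needs n_inputs >= L)
    b = rng.standard_normal(L)
    b /= np.linalg.norm(b)
    return a, b

a, b = elm_ae_random_params(n_inputs=784, L=100)
print(np.allclose(a.T @ a, np.eye(100)), np.isclose(b @ b, 1.0))  # True True
```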
(ELM-AE)-Johnson-Lindenstrauss Lemma
” The lemma states that a small set of points in a high-dimensional space
can be embedded into a space of much lower dimension in such a way that
distances between the points are nearly preserved.”
Given $0 < \epsilon < 1$, a set X of m points in $\mathbb{R}^N$, and a number
$n > \ln(m)/\epsilon^2$, there is a linear map $f : \mathbb{R}^N \to \mathbb{R}^n$ such that
$$(1 - \epsilon)\|u - v\|^2 \le \|f(u) - f(v)\|^2 \le (1 + \epsilon)\|u - v\|^2$$
for all u, v ∈ X.
One proof of the lemma takes f to be a suitable multiple of the orthogonal
projection onto a random subspace of dimension n in $\mathbb{R}^N$ and exploits the
phenomenon of concentration of measure. An orthogonal projection will, in
general, reduce the average distance between points, but the lemma can be
viewed as dealing with relative distances, which do not change under
scaling.
(ELM-AE)-Singular Value Decomposition
The SVD form of the solution
$$\beta = \Big(\frac{I}{\lambda} + H^T H\Big)^{-1} H^T T$$
is
$$H\beta = \sum_{i=1}^{N} u_i \frac{d_i^2}{d_i^2 + \lambda} u_i^T X$$
where the $u_i$ are the eigenvectors of $HH^T$ and the $d_i$ are the singular
values of H, related to the SVD of the input data X (as H is the projected
feature space of X squashed via a sigmoid function). The hypothesis is that
the output weights β of the ELM-AE learn to represent the features of the
input data via singular values.
(ELM-AE)-Multilayer Extreme Learning Machine
Multilayer neural networks perform poorly when trained with back
propagation (BP) alone, so hidden layer weights in a deep network are
initialized by layer-wise unsupervised training, and the whole neural
network is then fine-tuned with BP.
Similar to deep networks, ML-ELM hidden layer weights are initialized
with ELM-AE, which performs layer-wise unsupervised training.
However, in contrast to deep networks, ML-ELM doesn't require fine
tuning.
ML-ELM hidden layer activation functions can be either linear or
nonlinear piecewise functions.
If the number of nodes $L_k$ in the k-th hidden layer equals the number of
nodes $L_{k-1}$ in the (k-1)-th hidden layer, g is chosen as linear; otherwise,
g is chosen as a nonlinear piecewise function, such as a sigmoidal function:
$$H^k = g\big((\beta^k)^T H^{k-1}\big)$$
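A compact sketch of stacking ELM-AE layers as in the equation above (illustrative assumptions of mine: sigmoid activations throughout, a fixed λ, and data stored as rows):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_ae_layer(X, L, lam=1e-3, rng=np.random.default_rng(0)):
    """One ELM-AE layer: random orthogonal projection, output weights that reconstruct X."""
    a, _ = np.linalg.qr(rng.standard_normal((X.shape[1], L)))   # orthogonal random weights
    b = rng.standard_normal(L); b /= np.linalg.norm(b)
    H = sigmoid(X @ a + b)                                      # random feature mapping
    beta = np.linalg.solve(np.eye(L) / lam + H.T @ H, H.T @ X)  # L x d reconstruction weights
    return beta

def ml_elm_features(X, layer_sizes):
    """Stack ELM-AE layers: H^k = g(H^{k-1} (beta^k)^T), with samples as rows."""
    H = X
    for L in layer_sizes:
        beta = elm_ae_layer(H, L)
        H = sigmoid(H @ beta.T)
    return H

X = np.random.rand(100, 784)                  # e.g. 100 MNIST-like input vectors
features = ml_elm_features(X, [400, 100])     # two stacked ELM-AE layers
print(features.shape)                         # (100, 100)
```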
(ELM-AE)- Adding Layers in ML-ELM
Figure: Addition of layers in ML-ELM using ELM-AE
(ELM-AE)-Performance Comparison
Algorithm Training (rate) Testing (rate) Training Time
ELM 95.45 98.03 1120.365
ML-ELM 99.51 98.75 785.235
SAE 98.5645 98.7812 -
DBN 94.568 98.56 20548
DL-CNN 97 96 61872
Table: Performance Comparison for MNIST dataset
(ELM-AE)- Performance Comparison
Figure: ELM Autoencoder performance graph
Hierarchical ELM (H-ELM) - ELM for Multilayer Perceptron
It consists of two basic components: 1. unsupervised feature learning and
2. supervised feature classification. ELM has also been extended to
semi-supervised and unsupervised tasks based on manifold regularization,
where unlabeled or partially labeled samples are clustered using ELM.
The major difference between H-ELM and the original ELM is that, before
the ELM-based feature classification is performed, H-ELM uses unsupervised
training to obtain a multilayer sparse representation of the raw input data,
whereas in ELM the raw data is used directly for regression or
classification. The compact features help remove the redundancy of the
original inputs and thus improve efficiency.
(H-ELM)-Theorems
Theorem
Given any bounded non-constant piecewise continuous function
$g: \mathbb{R} \to \mathbb{R}$, if $\operatorname{span}\{G(a, b, x) : (a, b) \in \mathbb{R}^d \times \mathbb{R}\}$ is dense in $L^2$, then
for any continuous target function f and any function sequence
$g_L(x) = G(a_L, b_L, x)$ randomly generated based on any continuous
sampling distribution, $\lim_{n \to \infty} \|f - f_n\| = 0$ holds with probability one
if the output weights $\beta_i$ are determined by ordinary least squares to
minimize $\big\| f(x) - \sum_{i=1}^{L} \beta_i g_i(x) \big\|$.
(H-ELM)-Network
Figure: Hierarchical ELM Network
(H-ELM)-Framework
The H-ELM training architecture is structurally divided into two separate
phases:
Unsupervised hierarchical feature representation
Supervised feature classification
For the former phase, a new ELM-based autoencoder is developed to extract
multilayer sparse features from the input data (discussed in the next
section); for the latter, the original ELM-based regression is performed for
the final decision making.
H-ELM - Performance Comparison
Algorithm Testing Accuracy Training Time(secs)
ELM-AE(ML-ELM) 98.75 785.235
H-ELM 99.0121 456.258
Table: Performance Comparison for MNIST dataset
Recommender System for a Music App using ELM
Recommender algorithms are best known for their use in e-commerce
websites, where they use input about a customer's interests to generate a
list of recommended items.
There are three kinds of recommender algorithms:
Collaborative Filtering: this approach builds a model from the user's past
behaviour as well as similar decisions made by other users. The model is
then used to predict items that the user may be interested in.
Content-Based Filtering: this approach utilizes a series of discrete
characteristics of an item in order to recommend additional items with
similar properties.
Hybrid System: when the above two approaches are combined, a new
recommender system is formed.
Recommender System for a Music App using ELM-
Proximity / Similarity Measure
The proximity is the distance between two customers, computed using
either the correlation or the cosine measure.
Correlation: the similarity between two users a and b is measured by
computing the Pearson correlation:
$$\mathrm{corr}_{a,b} = \frac{\sum_i (r_{ai} - \bar{r}_a)(r_{bi} - \bar{r}_b)}
{\sqrt{\sum_i (r_{ai} - \bar{r}_a)^2} \, \sqrt{\sum_i (r_{bi} - \bar{r}_b)^2}}$$
Cosine: two customers a and b are thought of as two vectors in the
m-dimensional product space (or the k-dimensional space in the case of a
reduced representation). The proximity between them is measured by
computing the cosine of the angle between the two vectors:
$$\cos(a, b) = \frac{a \cdot b}{\|a\|_2 \, \|b\|_2}$$
Using these values, the similarity matrix between users is built.
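A small sketch (my own illustration, with made-up play counts) of the two measures for a pair of user rating vectors:

```python
import numpy as np

def pearson_similarity(ra, rb):
    da, db = ra - ra.mean(), rb - rb.mean()
    return (da @ db) / (np.linalg.norm(da) * np.linalg.norm(db))

def cosine_similarity(ra, rb):
    return (ra @ rb) / (np.linalg.norm(ra) * np.linalg.norm(rb))

ra = np.array([5., 3., 0., 1.])   # user a's play counts for 4 songs
rb = np.array([4., 0., 0., 1.])   # user b's play counts
print(pearson_similarity(ra, rb), cosine_similarity(ra, rb))
```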
Recommender System for a Music App using ELM- Idea
In this section we discuss how the music app works. Suppose there are 10
songs in the system; each song has the following attributes:
Attribute 1: Genre: a music genre is a conventional category that
identifies some pieces of music as belonging to a shared tradition or set
of conventions. It is to be distinguished from musical form and musical
style, although in practice these terms are sometimes used interchangeably.
Attribute 2: Number of times played: if a song is played for more than 75%
of its total duration, it is counted as played; otherwise it is treated as
not played. This provides a boundary for distinguishing songs that are
skipped or forwarded from songs that are played up to a particular point.
Attribute 3: Output: this is simply the song number assigned when the song
was downloaded to the device.
Here the most important attribute is the genre.
Recommender System for a Music App using ELM-
Dataset Making
Three datasets are used in building the app:
DatasetSongNames: the dataset of songs kept on the device. If a new song
is added, the dataset gains one more row containing the song and its
details.
Playlist: this matrix is built within the application. It is a t × m
matrix, where t is the number of times the app is opened (the background
process stores the song details) and m is the number of songs played from
the app in one session.
RecommPlaylist: this is an n × n matrix, where n is the number of songs on
the device. Each cell indicates whether that particular song was played in
that playlist.
Recommender System for a Music App using ELM-
Procedure
1. Measure Similarity using
Jaccard Distance
Cosine Distance
2. Suggestions By ELM
Recommender System for a Music App using ELM- Results
Table: Songs in the device :
Name of the Song Genre Count Song Number
Saturday Saturday 1 15 1
Main koi Aisa Geet Gaaon 2 13 2
Nothing Else Matters 3 9 3
Kashmir 4 11 4
Paradise 5 7 5
Tu kisi Rail Si 5 11 6
Super Machi 7 11 7
Pani Da 5 9 8
Slim Shady 6 6 9
Sunn Raha Hai Na Tu 1 4 10
Recommender System for a Music App using ELM- Results
Table: Playlist for the music app created in the back end:
1 1 1 2 3 8 9 10 0 0
2 2 3 3 4 5 6 7 8 9
4 3 1 7 8 5 4 2 1 0
8 5 4 1 2 3 7 9 0 0
1 2 4 8 7 6 3 10 10 0
7 7 7 1 2 4 9 5 6 7
2 2 1 4 7 8 6 3 2 1
4 5 6 1 3 2 4 7 8 9
8 5 2 1 4 7 9 6 3 10
8 5 4 2 1 1 1 0 0 0
Recommender System for a Music App using ELM- Results
Table: Dataset for the actual Comparison :
Song1 Song2 Song3 Song4 Song5 Song6 Song7 Song8
3 1 1 0 0 0 0 1
0 2 2 1 1 1 1 1
2 1 1 2 1 0 1 1
1 1 1 1 1 0 1 1
1 1 1 1 0 1 1 1
1 2 0 1 1 1 4 0
2 3 1 1 0 1 1 1
1 1 1 2 1 1 1 1
1 1 1 1 1 1 1 1
3 1 0 1 1 0 0 1
Recommender System for a Music App using ELM- Results
The songs recommended by the two algorithms when the song being played is
song number 8 are:
ELM: Song No. 1, 2, 4, 5
The genre of song number 8 matches that of song number 5, so the user
chooses song number 5.
BP: Song No. 1, 5, 6, 10
Song number 10 has not been played very often.
Conclusions and Future Works
Conclusions:
The test results support that ELM has better generalization capability
along with a faster learning rate.
The TLFN-ELM takes somewhat more time than the SLFN, but its testing
results are considerably better than those of the SLFN.
The ELM-AE can be a counterpart to deep networks, since it takes a very
small amount of time in comparison to them.
The ELM produced noticeably better predictions for the music app than BP.
Future Works:
The TLFN can be applied to various other datasets and compared against
other algorithms; the architecture can be refined to predict better on the
load shedding dataset.
For the music app, facial expression detection can be applied to learn the
mood of the user.
References
Cortes C, Vapnik V. Support vector networks. Mach Learn. 1995;20(3):273-97.
Suykens JAK, Vandewalle J. Least squares support vector machine classifiers.
Neural Process Lett. 1999;9(3):293-300.
Huang G-B, Zhu Q-Y, Siew C-K. Extreme learning machine: a new learning scheme
of feedforward neural networks. In: Proceedings of the international joint
conference on neural networks (IJCNN 2004), vol. 2, Budapest, Hungary; 2004.
p. 985-990, 25-29 July.
Huang G-B, Zhu Q-Y, Siew C-K. Extreme learning machine: theory and
applications. Neurocomputing. 2006;70:489-501.
Huang G-B, Chen L, Siew C-K. Universal approximation using incremental
constructive feedforward networks with random hidden nodes. IEEE Trans Neural
Netw. 2006;17(4):879-92.
References
Huang G-B, Chen L. Enhanced random search based incremental extreme learning
machine. Neurocomputing. 2008;71:3460-8.
Bartlett PL. The sample complexity of pattern classification with neural networks:
the size of the weights is more important than the size of the network. IEEE Trans
Inform Theory. 1998;44(2):525-36.
Rosenblatt F. The perceptron: a probabilistic model for information storage and
organization in the brain. Psychol Rev. 1958;65(6):386-408.
Serre D. Matrices: theory and applications. New York: Springer; 2002.
Rao CR, Mitra SK. Generalized inverse of matrices and its applications. New
York: Wiley; 1971.
Huang G-B, Ding X, Zhou H. Optimization method based extreme learning
machine for classification. Neurocomputing. 2010;74:155-63.
Bai Z, Huang G-B, Wang D, Wang H, Westover MB. Sparse extreme learning
machine for classification. IEEE Trans Cybern. 2014.
Huang G-B, Li M-B, Chen L, Siew C-K. Incremental extreme learning machine with
fully complex hidden nodes. Neurocomputing. 2008;71:576-83.
References
Huang G-B, Chen L. Convex incremental extreme learning machine.
Neurocomputing. 2007;70:3056-62.
Werbos PJ. Beyond regression: new tools for prediction and analysis in the
behavioral sciences. Ph.D. thesis, Harvard University; 1974.
Rumelhart DE, Hinton GE, Williams RJ. Learning internal representations by error
propagation. In: Rumelhart DE, McClelland JL, editors. Parallel distributed
processing: explorations in the microstructures of cognition, vol. 1: Foundations.
Cambridge, MA: MIT Press; 1986. p. 318-62.
Rumelhart DE, Hinton GE, Williams RJ. Learning representations by
back-propagating errors. Nature. 1986;323:533-6.
Werbos PJ. The roots of backpropagation: from ordered derivatives to neural
networks and political forecasting. New York: Wiley; 1994.
THANK YOU!
Any Queries?
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service AmravatiVIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
VIP Call Girls in Amravati Aarohi 8250192130 Independent Escort Service Amravati
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 

Thesis Presentation_Extreme Learning Machine_Nimai_SC14M045

  • 1. EXTREME LEARNING MACHINES Nimai Chand Das Adhikari INDIAN INSTITUTE OF SPACE SCIENCE AND TECHNOLOGY Advisor: Dr. Raju K George Dean (R & D and Student Welfare) 6th June, 2016 Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 1 / 83
  • 2. Overview 1 Introduction 2 Brief Outline of the Thesis 3 Concepts Used 4 Theorems Used 5 Extreme learning Machine 6 Extreme Learning Machines - Two Hidden Layer Feed Forward Network 7 Extreme Learning Machines - Auto Encoders 8 Heirarchical Extreme Learning Machines 9 Recommender Sytems for building a music app using ELM Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 2 / 83
  • 3. Introduction Mainly the neural networks are trained with the training sets (xi , ti )N i=1. Where N is the number of the training examples fed to the neural network. 1 For a function in a training set, A SLFN with maximum of N hidden nodes with any non-linear activation function can easily learn N distinct observations with zero error. 2 The gradient based learning persisted to a huge extent and was the sole theory behind all the learning algorithms of the feed forward neural networks. 3 The drawbacks of this method i.e the slow learning due to improper learning steps and converging to a local minima, has asked for a change in the methodology. Apart from this, many iterative steps are required for the optimized results to occur. Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 3 / 83
  • 4. Brief Outline of the thesis 1 A brief Description on the concepts behind the ELM. 2 A new approach to the ELM using the Two Hidden layers instead of the Single layer which has been the basis for the ELM. 3 ELM Auto Encoders and Heirarchical ELM which have two basic learnings in that: 1 unsupervised learning for the features extraction. 2 basic ELM for the classification in the last layer. 4 Concepts and ideas behind the building of a new musical app using the Recommendation Systems and Neural Networks. 5 Conclusions and Future Works. Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 4 / 83
  • 5. Single Layer Feed Forward Neural Networks The SLFN’s with an arbitrary chosen inputs weights can easily learn N distinct observations with a very small error. Input weights and biases is not an issue. The system becomes linear and output can be easily calculated using the generalized inverse. The learning speed becomes extremely fast. The algorithm that we are going to learn is Extreme Learning Machine (ELM). Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 5 / 83
  • 6. Single Layer Feed Forward Neural Networks Figure: Single Layer Feed Forward Neural Networks Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 6 / 83
  • 7. Concepts Used Moore Penrose generalized Inverse Least Squares Solution Random Features Mappings and Kernels Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 7 / 83
  • 8. Concepts Used Moore Penrose generalized Inverse Least Squares Solution Random Features Mappings and Kernels Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 8 / 83
  • 9. Moore Penrose Generalized Inverse Consider a system of n × n linear system: Ax = b, A ∈ Mnn, b ∈ n This system will have a unique solution iff matrix A is full rank. Then x = A−1 b Again, let the system be Ax = b, A ∈ Mmn, b ∈ m There are two cases: m > n, System is overdetermined m < n, System is under determines Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 9 / 83
  • 10. Moore Penrose Generalized Inverse: Over Determined Systems In this case, there is no such x ∈ n such that: Ax = b b − Ax = 0 Hence, there will be no exact solution and the residual for this system can be: r(x) = b − Ax, x ∈ n Therefore we will find a vector for which ||r(x)|| = ||b − Ax|| is minimum. Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 10 / 83
  • 11. Moore Penrose Generalized Inverse: Definitions A vector which minimizes ||r(x)||2 is the least squares solution for the system defined in the above slide. The least squares solution x which has the minimum 2-norm is called the minimum norm least squares solution. Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 11 / 83
  • 12. Moore Penrose Inverse Let A be any m × n matrix. Then the pseudoinverse of A is the n × m matrix A† which satisfies the following Moore-Penrose conditions: (MP1) AA†A = A (MP2) A†AA† = A† (MP3) (AA†)T = AA† (MP4) (A†A)T = A†A Theorem: Let A be any real m × n matrix. Then the pseudoinverse of A is unique, A†† = A, and (AT)† = (A†)T. Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 12 / 83
  • 13. Moore Penrose Inverse - How to compute? If A is an n × n matrix and nonsingular, then A† = A−1. If A is an m × n real matrix with m ≥ n and full rank, i.e. rank(A) = n, then A† = (AT A)−1 AT. If A is an m × n real matrix with m ≤ n and full rank, i.e. rank(A) = m, then A† = AT (AAT)−1. Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 13 / 83
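The two full-rank formulas above can be verified numerically. The following is a small illustrative sketch (not part of the thesis code); it assumes NumPy and an arbitrary example matrix, and compares the closed-form expressions against numpy.linalg.pinv.

```python
# Sketch: checking the closed-form pseudoinverse formulas with NumPy.
# The matrix A below is an arbitrary illustrative example.
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])          # m = 3 > n = 2, full column rank

# Full-column-rank case: A† = (AᵀA)⁻¹ Aᵀ
A_dag = np.linalg.inv(A.T @ A) @ A.T
assert np.allclose(A_dag, np.linalg.pinv(A))

B = A.T                              # m = 2 < n = 3, full row rank
# Full-row-rank case: B† = Bᵀ (B Bᵀ)⁻¹
B_dag = B.T @ np.linalg.inv(B @ B.T)
assert np.allclose(B_dag, np.linalg.pinv(B))
```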
  • 14. Moore Penrose Inverse - Examples Consider A = [1, 2]T. Its pseudoinverse is A† = [1/5, 2/5]. Another one-sided inverse, A−L = [3, −1], does not satisfy the third Moore-Penrose condition, while A† satisfies all four conditions. Note: 1. A matrix that satisfies only the first two conditions is called a generalized inverse. 2. The uniqueness is established by the last two conditions. Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 14 / 83
  • 15. Concepts Used Moore Penrose generalized Inverse Least Squares Solution Random Features Mappings and Kernels Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 15 / 83
  • 16. Least Squares Solution Two different cases arise for solving the linear system Ax = b, A ∈ Mmn, x ∈ Rn, b ∈ Rm: When m > n, the linear system is called over determined. When m < n, the linear system is called under determined. A system is solvable iff rank(A) = rank(A|b). If there is no exact solution for a linear system, we get the residual r(x) = b − Ax, x ∈ Rn. Then we seek a vector x ∈ Rn for which ||r(x)||2 = ||b − Ax||2 is minimum. Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 16 / 83
  • 17. Minimum Norm Least Squares Solution The least-squares solution x which has the minimum 2-norm is called the minimum norm least squares solution, i.e. if we say that z is any other least squares solution to the above system Ax = b, then we must have ||x||2 < ||z||2 Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 17 / 83
  • 18. Basic Least Squares Theorem Let there be a linear system Ax = b with A a real m × n matrix with m ≥ n and b ∈ Rm. Then 1 The above linear system has a unique least-squares solution x iff A has full rank. 2 The above linear system has infinitely many least-squares solutions iff A is rank-deficient. 3 The minimum norm least-squares solution to the system is given by x = A† b. Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 18 / 83
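As an illustration of item 3, the sketch below (an assumed example, not thesis code) computes the minimum-norm least-squares solution of a rank-deficient system with the pseudoinverse and checks it against numpy.linalg.lstsq, which also returns the minimum-norm solution.

```python
# Sketch: minimum-norm least-squares solution x = A†b for a rank-deficient system.
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])           # rank(A) = 1, so infinitely many LS solutions
b = np.array([1.0, 2.0, 2.0])

x_min = np.linalg.pinv(A) @ b        # minimum 2-norm least-squares solution
x_lsq = np.linalg.lstsq(A, b, rcond=None)[0]
print(x_min, np.allclose(x_min, x_lsq))
```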
  • 19. Concepts Used Moore Penrose generalized Inverse Least Squares Solution Random Features Mappings and Kernels Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 19 / 83
  • 20. Random Features Mappings and Kernels Random Feature Mappings: The hidden layer output vector h(x) = [h1(x), ..., hL(x)] can serve as a random feature mapping if all its hidden node parameters are randomly generated according to any continuous sampling probability distribution and h(x) has the universal approximation capability, i.e. ||h(x)β − f(x)|| = lim_{L→∞} ||Σ_{i=1}^{L} βi hi(x) − f(x)|| = 0 holds with probability 1 for appropriate output weights. Thus, h(x) = [G(a1, b1, x), ..., G(aL, bL, x)], where G(a, b, x) is a nonlinear piecewise continuous function that satisfies the ELM universal approximation capability theorem. Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 20 / 83
  • 21. Random Features Mappings and Kernels Kernels: Instead of an explicit h(x), a kernel matrix can be applied in ELM: Ĥ = HHT with Ĥi,j = h(xi)·h(xj) = K(xi, xj). Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 21 / 83
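A common way to exploit this is a kernelized ELM, sketched below under assumptions: the RBF kernel, the ridge parameter lam, and the function names are illustrative choices and are not taken from the slides. The kernel matrix Ω plays the role of HHT, and predictions are obtained directly from it.

```python
# Sketch of a kernel ELM (assumed RBF kernel and illustrative lam/gamma values).
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # K[i, j] = exp(-gamma * ||x_i - y_j||^2)
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_elm_fit_predict(X, T, X_new, lam=1e-2, gamma=1.0):
    Omega = rbf_kernel(X, X, gamma)                        # Omega plays the role of H Hᵀ
    N = X.shape[0]
    alpha = np.linalg.solve(np.eye(N) / lam + Omega, T)    # (I/λ + Ω)⁻¹ T
    return rbf_kernel(X_new, X, gamma) @ alpha             # f(x) = [K(x, x_i)] α
```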
  • 22. Random Features Mappings and Kernels Feature Mapping Matrix: hi(x) denotes the output of the i-th hidden node with regard to the input x, and the feature mapping matrix is generated independently of the targets ti. It is therefore reasonable to have a feature mapping matrix that is independent of its target values. Figure: ELM Feature Mapping Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 22 / 83
  • 23. Theorems Universal Approximation Capability Given any bounded non-constant piecewise continuous function as the activation function used in the hidden neurons, if SLFNs can approximate any continuous target function by tuning the parameters of the hidden neuron activation function, then for any continuous target function f(x) and any randomly generated function sequence {hi(x)}_{i=1}^{L}, lim_{L→∞} ||Σ_{i=1}^{L} βi hi(x) − f(x)|| = 0 holds with probability 1 for appropriate output weights β. Classification Capability Theorem Given a feature mapping h(x), if h(x)β is dense in C(Rd) or in C(M), where M is a compact set in Rd, then a generalized SLFN with such a hidden layer mapping h(x) can separate arbitrary disjoint regions of any shape in Rd or M. Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 23 / 83
  • 24. Extreme Learning Machines - Introduction Gradient descent based methods have mainly been the backbone of training for feed forward neural networks. In that case, the parameters of the feed forward neural network need to be tuned, and the time taken for learning is very large. If, in the case of SLFNs, the input-to-hidden layer weights and biases are randomly assigned, then the system becomes linear and the output weights can be computed through a simple generalized inverse operation. Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 24 / 83
  • 25. Extreme Learning Machines Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 25 / 83
  • 26. Extreme Learning Machines - SLFNs Output of the hidden nodes: G(ai, bi, x) = g(ai·x + bi), where ai is the weight vector connecting the ith hidden node and the input nodes, and bi is the threshold of the ith hidden node. Output of the network: fL(x) = Σ_{i=1}^{L} βi G(ai, bi, x), where G(·) is the activation function and L is the number of hidden layer nodes. Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 26 / 83
  • 27. Extreme Learning Machine - Mathematical Model Mathematical Model: Σ_{i=1}^{L} βi G(ai, bi, xj) = tj, j = 1, ..., N, which is equivalent to Hβ = T, where
H(a1, ..., aL, b1, ..., bL, x1, ..., xN) =
[ G(a1, b1, x1)  ...  G(aL, bL, x1) ]
[      ...       ...       ...      ]
[ G(a1, b1, xN)  ...  G(aL, bL, xN) ]   (N × L),
β = [β1^T; ...; βL^T] (L × m) and T = [t1^T; ...; tN^T] (N × m). Here H is the hidden-layer output matrix. Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 27 / 83
  • 28. Extreme Learning Machines - Mathematical and Learning Models Mathematical Model Any continuous target function f(x) can be approximated by SLFNs: given a small positive value ε, for SLFNs with a large enough number of hidden layer nodes (L) we will have ||fL(x) − f(x)|| < ε. Learning Model For N arbitrary distinct samples (xi, ti) ∈ Rn × Rm, SLFNs with L hidden nodes and activation function g(x) are mathematically modeled as fL(xj) = oj, j = 1, ..., N. Cost Function: E = Σ_{j=1}^{N} ||oj − tj||2. Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 28 / 83
  • 29. Extreme Learning Machines - Learning Model Learning Model The target is to minimize the cost function E by adjusting the network parameters βi, ai, bi. If ε = 0, then fL(x) = f(x) = T, where T is the known target, and the cost function equals 0. Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 29 / 83
  • 30. Extreme Learning Machines - Algorithm Three Step Learning Model Given a training set S = {(xi, ti) | xi ∈ Rn, ti ∈ Rm, i = 1, ..., N}, an activation function G and the number of hidden nodes L: 1 Randomly assign the input weight vectors ai and the hidden node biases bi, i = 1, ..., L. 2 Calculate the hidden layer output matrix H. 3 Calculate the output weight β: β = H† T. Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 30 / 83
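A minimal sketch of these three steps is given below, assuming a sigmoid activation, NumPy, and targets T stored as an N × m matrix (for classification, typically one-hot rows); the function names are illustrative and are not the thesis code.

```python
# Minimal ELM sketch following the three-step learning model above.
import numpy as np

def elm_train(X, T, L, rng=np.random.default_rng(0)):
    n = X.shape[1]
    a = rng.standard_normal((n, L))          # step 1: random input weights
    b = rng.standard_normal(L)               # step 1: random hidden node biases
    H = 1.0 / (1.0 + np.exp(-(X @ a + b)))   # step 2: hidden layer output matrix
    beta = np.linalg.pinv(H) @ T             # step 3: β = H† T
    return a, b, beta

def elm_predict(X, a, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ a + b)))
    return H @ beta
```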
  • 31. Extreme Learning Machines - Performance Comparison
Table: Performance Comparison for the Diabetes dataset
Algorithm  | Training Rate | Testing Rate | Training Time (secs)
ELM        | 78.06         | 76.78        | 0.405290
SVM-linear | 78.76         | 77.6518      | 0.6563
BP         | 77.88         | 73.04348     | 15.2361
Figure: performance for diabetes dataset
Figure: performance for diabetes dataset
Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 31 / 83
  • 32. Extreme Learning Machines - Performance Comparison
Table: Performance Comparison for various datasets by ELM
DATASET       | Testing Rate | Training Time (secs)
Glass         | 74.2115      | 0.044677
Hepatitis     | 70.380       | 0.037706
Breast Cancer | 83.3333      | 0.024485
Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 32 / 83
  • 33. Extreme Learning Machines - Performance Comparison
Table: Performance Comparison for various datasets by BP
DATASET       | Testing Rate | Training Time (secs)
Glass         | 37.4129      | 10.548456
Hepatitis     | 48.3585      | 7.4125
Breast Cancer | 77.609       | 2.3564
Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 33 / 83
  • 34. Extreme Learning Machines - Performance Comparison
Table: Performance Comparison for various datasets by SVM
DATASET       | Testing Rate | Training Time (secs)
Glass         | 34.5750      | 0.2354677
Hepatitis     | 63.0435      | 0.0632
Breast Cancer | 66.67        | 0.2261
Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 34 / 83
  • 35. Extreme Learning Machines - Performance Comparison Figure: performance comparison Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 35 / 83
  • 36. Extreme Learning Machines - Performance Comparison: SinC function Figure: performance for SinC Function Figure: performance for SinC function Figure: Comparison for the Training Time Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 36 / 83
  • 37. Extreme Learning Machines - Two Hidden Layer Feed Forward Neural Networks Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 37 / 83
  • 38. Extreme Learning Machines - Two Hidden Layer Feed Forward Neural Networks 1 Extreme Learning Machine (ELM) is an algorithm for single-hidden layer feed-forward neural networks (SLFNs) which randomly chooses hidden nodes and biases and then analytically determines the output weights of the SLFN. 2 This algorithm tends to provide good generalization performance with an extremely fast learning speed. 3 If, instead of one hidden layer, two hidden layers are used and the weights and biases are randomly generated according to the theories and theorems discussed above, then we get the ELM-TLFN. 4 Similar to the SLFN ELM, after the input weights and the hidden layer biases are chosen arbitrarily, the network can simply be considered as a linear system, and the output weights of the TLFN-ELM can be analytically determined through the simple generalized inverse operation of the hidden layer output matrices. Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 38 / 83
  • 39. Extreme Learning Machines - Two Hidden Layer Feed Forward Neural Networks : Proposal The weights and the biases of the two hidden layers are generated randomly, as done in the case of the SLFNs. Then, Σ_{i=1}^{N̂2} βi g2(Wi · (Σ_{j=1}^{N̂1} g1(wj · xk + bj))) = Ok, ∀k = 1, ..., N. Here, N is the total number of training examples of the system. Also, wi = [wi1, ..., win]T and wj = [wj1, ..., wjN̂1]T are the weight matrices that are generated randomly for the system, and βi = [βi1, ..., βim]T is the output weight matrix. Then, like the SLFN network, the above system can be written as Hβ = T. Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 39 / 83
  • 40. Extreme Learning Machines - Two Hidden Layer Feed Forward Neural Networks: H matrix Hidden Layer Output Matrix Here H is the hidden layer output matrix for the system and can be generated as
H =
[ g2(wj1·g1 + b1)    ...  g2(wjN̂1·g1 + bN̂2)  ]
[       ...          ...          ...         ]
[ g2(wj1·gN̂1 + b1)   ...  g2(wjN̂1·gN̂1 + bN̂2) ]   (N̂1 × N̂2)
Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 40 / 83
  • 41. Extreme Learning Machines - Two Hidden layer Feed Forward Neural Networks: Algorithm Given a training set, activation functions g1(x) and g2(x), and the numbers of hidden neurons N̂1 and N̂2 for the first and second hidden layers respectively: 1 Assign arbitrary input weights w1 and w2, and also the biases B1 and B2. 2 Calculate the hidden-layer output matrix of the system. 3 Calculate the output weight β: β = H† T. Here, H, β and T are as described in the previous sections. Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 41 / 83
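A minimal sketch of this two-hidden-layer variant follows, assuming both activations g1 and g2 are sigmoids; the function name and layer-size arguments are illustrative and not taken from the thesis.

```python
# Sketch of the two-hidden-layer ELM (N1, N2 correspond to N̂1, N̂2 above).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tlfn_elm_train(X, T, N1, N2, rng=np.random.default_rng(0)):
    w1 = rng.standard_normal((X.shape[1], N1)); b1 = rng.standard_normal(N1)
    w2 = rng.standard_normal((N1, N2));         b2 = rng.standard_normal(N2)
    H1 = sigmoid(X @ w1 + b1)                 # first random hidden layer
    H2 = sigmoid(H1 @ w2 + b2)                # second random hidden layer
    beta = np.linalg.pinv(H2) @ T             # β = H† T on the last hidden layer
    return w1, b1, w2, b2, beta
```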
  • 42. Extreme Learning Machines - Two Hidden layer Feed Forward Neural Networks: Performance Comparison
Table: Performance comparison for the Glass dataset with ELM-SLFN and ELM-TLFN
Algorithm | Testing Rate | Training Time (secs) | Hidden Nodes
ELM-SLFN  | 74.2115      | 0.044677             | 20
ELM-TLFN  | 75.46929     | 0.16709              | 10-20
Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 42 / 83
  • 43. Extreme Learning Machines - Two Hidden layer Feed Forward Neural Networks: Performance Comparison Figure: performance comparison for various dataset Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 43 / 83
  • 44. Sparse Autoencoders - Introduction The sparse autoencoder learning algorithm is one approach to automatically learning features from unlabeled data. In some domains, such as computer vision, this approach is not by itself competitive with the best hand-engineered features, but the features it can learn do turn out to be useful for a range of problems (including ones in audio, text, etc). Further, there are more sophisticated versions of the sparse autoencoder that do surprisingly well, and in many cases are competitive with or superior to even the best hand-engineered representations. Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 44 / 83
  • 45. Sparse Autoencoders- Structural View Figure: Sparse Autoencoders network Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 45 / 83
  • 46. Sparse Autoencoders - A brief Idea So far, we have described the application of neural networks to supervised learning, in which we have labeled training examples. Now suppose we have only an unlabeled training set {x1, x2, x3, ...} where xi ∈ Rn. An autoencoder neural network is an unsupervised learning algorithm that applies backpropagation, setting the target values to be equal to the inputs, i.e. yi ≈ xi. The autoencoder tries to learn a function hW,b(x) ≈ x. In other words, it is trying to learn an approximation to the identity function, so as to output x̂ that is similar to x. The identity function seems a particularly trivial function to be trying to learn; but by placing constraints on the network, such as by limiting the number of hidden units, we can discover interesting structure in the data. Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 46 / 83
  • 47. Sparse Autoencoder - Example Features Representation Suppose the inputs x are the pixel intensity values from a 10 × 10 image (100 pixels), i.e. x ∈ R100 and n = 100, there are s2 = 50 hidden units in layer L2, and also y ∈ R100. Reconstruction Since there are only 50 hidden units, the network is forced to learn a compressed representation of the input, i.e. given only the vector of hidden unit activations a(2) ∈ R50, it must try to reconstruct the 100-pixel input x. Different layers When the number of hidden units is large, we can still discover interesting structure by imposing other constraints on the network. In particular, if we impose a sparsity constraint on the hidden units, then the autoencoder will still discover interesting structure in the data. Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 47 / 83
  • 48. Sparse Autoencoders - Sparsity Constraints Informally, we will think of a neuron as being active (or as firing) if its output value is close to 1, or as being inactive if its output value is close to 0. We would like to constrain the neurons to be inactive most of the time. Now a(2)_j(x) denotes the activation of hidden unit j for the input x. Let us define ρ̂j = (1/m) Σ_{i=1}^{m} a(2)_j(xi) to be the average activation of hidden unit j (averaged over the training set). Then we would like to enforce the constraint ρ̂j = ρ, where ρ is a sparsity parameter (typically a small value ≈ 0.05). Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 48 / 83
  • 49. Sparse Autoencoders - Sparsity Constraints We would like the average activation of each hidden neuron j to be close to 0.05 (say). To satisfy this constraint, the hidden unit activations must mostly be near 0. To achieve this, another penalty term is introduced into our optimization objective which penalizes ρ̂j deviating from ρ: Σ_{j=1}^{s2} [ρ log(ρ/ρ̂j) + (1 − ρ) log((1 − ρ)/(1 − ρ̂j))]. Here s2 is the number of neurons in the hidden layer, and the index j sums over the hidden units in our network. Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 49 / 83
  • 50. Sparse Autoencoders - Kullback-Leibler (KL) divergence The above penalty term ρ log(ρ/ρ̂j) + (1 − ρ) log((1 − ρ)/(1 − ρ̂j)) is the Kullback-Leibler (KL) divergence between a Bernoulli random variable with mean ρ and a Bernoulli random variable with mean ρ̂j. KL-divergence is a standard function for measuring how different two distributions are. Property: KL(ρ || ρ̂j) = 0 if ρ̂j = ρ, and it increases monotonically as ρ̂j diverges from ρ. Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 50 / 83
  • 51. Sparse Autoencoders- KL Divergence Graph Figure: KL Divergence plot Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 51 / 83
  • 52. Sparse Autoencoders - Cost Function Now the overall cost function can be written as: Jsparse(W, b) = J(W, b) + β Σ_{j=1}^{s2} KL(ρ || ρ̂j). Let us define the two terms. J(W, b) is the cost function used for the BP algorithm, i.e. J(W, b) = (1/m) Σ_{i=1}^{m} J(W, b; x(i), y(i)) + (λ/2) Σ_{l=1}^{nl−1} Σ_{i=1}^{sl} Σ_{j=1}^{sl+1} (W(l)_ji)2. The first term in the definition of J(W, b) is an average sum-of-squares error term. The second term is a regularization term (also called a weight decay term) that tends to decrease the magnitude of the weights and helps prevent overfitting; λ controls its weight, while β controls the weight of the sparsity penalty term. Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 52 / 83
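The sparsity part of this cost can be computed directly from the hidden activations. The sketch below is an assumed illustration (names and values such as rho = 0.05 are examples, not thesis code); it expects a matrix A2 of sigmoid activations in (0, 1), one row per training example.

```python
# Sketch: KL sparsity penalty Σ_j KL(ρ || ρ̂_j) from a matrix of hidden activations.
import numpy as np

def kl_sparsity_penalty(A2, rho=0.05):
    rho_hat = A2.mean(axis=0)                 # ρ̂_j: average activation per hidden unit
    return np.sum(rho * np.log(rho / rho_hat) +
                  (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))

# J_sparse = J + beta_s * kl_sparsity_penalty(A2), where J is the usual
# squared-error plus weight-decay cost minimized by backpropagation.
```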
  • 53. Representational Learning with ELM for Big Data (ELM-AE) - Introduction A machine learning algorithm's generalization capability depends on the dataset, which is why engineering a dataset's features to represent the data's salient structure is important. However, feature engineering requires domain knowledge and human ingenuity to generate appropriate features. Similar to deep networks, multilayer ELM (ML-ELM) performs layer-by-layer unsupervised learning. This article also introduces the ELM auto-encoder (ELM-AE), which represents features based on singular values. Resembling deep networks, ML-ELM is built by stacking ELM-AEs to create a multilayer neural network. It learns significantly faster than existing deep networks, outperforming DBNs, SAEs, and SDAEs and performing on par with DBMs on the MNIST dataset. Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 53 / 83
  • 54. (ELM-AE) - A brief of ELM The ELM theory for SLFNs shows that the hidden nodes can be randomly generated. The input data is mapped to an L-dimensional ELM random feature space, and the network output is: fL(x) = Σ_{i=1}^{L} βi hi(x) = h(x)β, where β = [β1, β2, β3, ..., βL]T is the output weight matrix between the hidden nodes and the output nodes. Also, h(x) = [g1(x), g2(x), ..., gL(x)] are the hidden node outputs (random hidden features) for the input x, and gi(x) is the output of the i-th hidden node. Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 54 / 83
  • 55. (ELM-AE) - ELM Given N training samples (xi, ti), i = 1, ..., N, ELM resolves the following learning problem: Hβ = T, where T = [t1, ..., tN]T are the target labels and H = [hT(x1), ..., hT(xN)]T. Hence the output weights β can be calculated as β = H† T, where H† is the Moore-Penrose generalized inverse of the matrix H. To obtain a better generalization performance and to make the solution more robust, one can add a regularization term as shown below: β = (I/λ + HT H)−1 HT T. Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 55 / 83
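The regularized solution has a direct one-line implementation. The sketch below is illustrative (λ is a user-chosen constant; the value shown is only an example) and solves the L × L linear system rather than forming an explicit inverse.

```python
# Sketch: regularized ELM output weights β = (I/λ + HᵀH)⁻¹ Hᵀ T.
import numpy as np

def elm_output_weights(H, T, lam=1e3):
    L = H.shape[1]
    return np.linalg.solve(np.eye(L) / lam + H.T @ H, H.T @ T)
```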
  • 56. Extreme Learning Machines as (ELM-AE) The main objective of ELM-AE is to represent the input features meaningfully in three different representations: Compressed representations Represent features from a higher dimensional input data space to a lower dimensional feature space. Sparse representation Represent features from a lower dimensional input data space to a higher dimensional feature space. Equal dimension representation Represent features from an input data space dimension equal to feature space dimension. Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 56 / 83
  • 57. (ELM-AE)-Network Figure: ELM-AE Orthogonal Weights Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 57 / 83
  • 58. (ELM-AE) - Orthogonalisation Orthogonalisation of the randomly generated hidden parameters (weights and biases) tends to improve the generalization performance of the ELM-AE. The orthogonal random weights and biases of the hidden nodes project the input data to a different or equal dimension space, as shown by the Johnson-Lindenstrauss Lemma. The weights and the biases are calculated as below: h = g(ax + b), with aT a = I and bT b = 1, where a = [a1, a2, ..., aL] are the orthogonal random weights and b = [b1, b2, ..., bL] are the orthogonal random biases between the input nodes and hidden nodes. Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 58 / 83
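One simple way to obtain such orthogonal random parameters is a QR decomposition of a random matrix, as sketched below; this is an assumed construction (it requires L ≤ n so that orthonormal columns exist) and not necessarily the exact procedure used in the thesis.

```python
# Sketch: orthogonal random hidden parameters for ELM-AE and the hidden mapping h = g(ax + b).
import numpy as np

def elm_ae_random_params(n, L, rng=np.random.default_rng(0)):
    a, _ = np.linalg.qr(rng.standard_normal((n, L)))  # aᵀa = I (orthonormal columns, needs L ≤ n)
    b = rng.standard_normal(L)
    b = b / np.linalg.norm(b)                         # bᵀb = 1
    return a, b

def elm_ae_hidden(X, a, b):
    return 1.0 / (1.0 + np.exp(-(X @ a + b)))         # sigmoid g applied to a·x + b
```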
  • 59. (ELM-AE) - Johnson-Lindenstrauss Lemma "The lemma states that a small set of points in a high-dimensional space can be embedded into a space of much lower dimension in such a way that distances between the points are nearly preserved." Given 0 < ε < 1, a set X of m points in RN, and a number n > ln(m)/ε2, there is a linear map f : RN −→ Rn such that (1 − ε)||u − v||2 ≤ ||f(u) − f(v)||2 ≤ (1 + ε)||u − v||2 for all u, v ∈ X. One proof of the lemma takes f to be a suitable multiple of the orthogonal projection onto a random subspace of dimension n in RN, and exploits the phenomenon of concentration of measure. Obviously an orthogonal projection will, in general, reduce the average distance between points, but the lemma can be viewed as dealing with relative distances, which do not change under scaling. Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 59 / 83
  • 60. (ELM-AE) - Singular Value Decomposition The SVD view of the equation β = (I/λ + HT H)−1 HT T is: Hβ = Σ_{i=1}^{N} ui (di^2/(di^2 + λ)) ui^T X, where the ui are the eigenvectors of HHT and the di are the singular values of H, related to the SVD of the input data X (as H is the projected feature space of X squashed via a sigmoid function). The hypothesis is that the output weight β of the ELM-AE learns to represent the features of the input data via singular values. Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 60 / 83
  • 61. (ELM-AE) - Multilayer Extreme Learning Machine Multilayer neural networks perform poorly when trained with back propagation (BP) only, so hidden layer weights in a deep network are initialized by layer-wise unsupervised training and the whole neural network is then fine-tuned with BP. Similar to deep networks, ML-ELM hidden layer weights are initialized with ELM-AE, which performs layer-wise unsupervised training. However, in contrast to deep networks, ML-ELM doesn't require fine tuning. ML-ELM hidden layer activation functions can be either linear or nonlinear piecewise. If the number of nodes Lk in the kth hidden layer is equal to the number of nodes Lk−1 in the (k−1)th hidden layer, g is chosen as linear; otherwise, g is chosen as nonlinear piecewise, such as a sigmoidal function: Hk = g((βk)T Hk−1). Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 61 / 83
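A compact sketch of this stacking is given below, under assumptions: sigmoid activations, compressed layers (each layer no wider than the previous one, so orthonormal columns exist), and an illustrative λ; it is a sketch of the idea, not the reference ML-ELM implementation.

```python
# Sketch: stacking ELM-AE layers to build ML-ELM features, Hᵏ = g((βᵏ)ᵀ Hᵏ⁻¹).
import numpy as np

def elm_ae_layer(H_prev, L, lam=1e3, rng=np.random.default_rng(0)):
    d = H_prev.shape[1]
    a, _ = np.linalg.qr(rng.standard_normal((d, L)))   # orthogonal random weights (needs L ≤ d)
    b = rng.standard_normal(L); b /= np.linalg.norm(b)
    H = 1.0 / (1.0 + np.exp(-(H_prev @ a + b)))
    # learn β so that the layer reconstructs its own input: targets are H_prev itself
    return np.linalg.solve(np.eye(L) / lam + H.T @ H, H.T @ H_prev)

def ml_elm_features(X, layer_sizes):
    H = X
    for L in layer_sizes:
        beta = elm_ae_layer(H, L)
        H = 1.0 / (1.0 + np.exp(-(H @ beta.T)))        # Hᵏ = g((βᵏ)ᵀ Hᵏ⁻¹) in row-major form
    return H                                           # final features go to an ordinary ELM classifier
```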
  • 62. (ELM-AE)- Adding Layers in ML-ELM Figure: ELM-AE addition of the layers and the working Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 62 / 83
  • 63. (ELM-AE) - Performance Comparison
Table: Performance Comparison for MNIST dataset
Algorithm | Training (rate) | Testing (rate) | Training Time
ELM       | 95.45           | 98.03          | 1120.365
ML-ELM    | 99.51           | 98.75          | 785.235
SAE       | 98.5645         | 98.7812        | -
DBN       | 94.568          | 98.56          | 20548
DL-CNN    | 97              | 96             | 61872
Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 63 / 83
  • 64. (ELM-AE)- Performance Comparison Figure: ELM Autoencoder performance graph Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 64 / 83
  • 65. Hierarchical ELM (H-ELM) - ELM for Multilayer Perceptron It consists of two basic components: 1. unsupervised feature learning, and 2. supervised feature classification. The ELM is used for both semisupervised and unsupervised tasks based on manifold regularization, and the unlabeled or partially labeled samples are clustered using ELM. The major difference between H-ELM and the original ELM is that, before the ELM-based feature classification is done, H-ELM uses unsupervised training to obtain a multilayer sparse representation of the raw input data, whereas in ELM the raw data is used directly for regression or classification. Hence the compact features can help to remove the redundancy of the original inputs and thus improve the efficiency. Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 65 / 83
  • 66. (H-ELM) - Theorems Theorem Given any bounded nonconstant piecewise continuous function g: R −→ R, if the span of {G(a, b, x) : (a, b) ∈ Rd × R} is dense in L2, then for any target function f and any function sequence gL(x) = G(aL, bL, x) randomly generated based on any continuous sampling distribution, lim_{n→∞} ||f − fn|| = 0 holds with probability one if the output weights βi are determined by ordinary least squares to minimize ||f(x) − Σ_{i=1}^{L} βi gi(x)||. Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 66 / 83
  • 67. (H-ELM)-Network Figure: Hierarchical ELM Network Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 67 / 83
  • 68. (H-ELM) - Framework The H-ELM training architecture is structurally divided into two separate phases: (1) unsupervised hierarchical feature representation and (2) supervised feature classification. For the former phase, a new ELM-based autoencoder is developed to extract multilayer sparse features of the input data, which is discussed in the next section; for the latter, the original ELM-based regression is performed for final decision making. Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 68 / 83
  • 69. H-ELM - Performance Comparison
Table: Performance Comparison for MNIST dataset
Algorithm       | Testing Accuracy | Training Time (secs)
ELM-AE (ML-ELM) | 98.75            | 785.235
H-ELM           | 99.0121          | 456.258
Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 69 / 83
  • 70. Recommender System for a Music App using ELM Recommender algorithms are best known for their use in e-commerce websites, where they use input about the customer's interests to generate a list of recommended items. There are three kinds of recommender algorithms: Collaborative Filtering: this approach builds a model from the user's past behaviour as well as similar decisions made by other users; the model is then used to predict the items that the user may have an interest in. Content Based Filtering: this approach utilizes a series of discrete characteristics of an item in order to recommend additional items with similar properties. Hybrid System: when the above two approaches are combined, a new recommender system is formed. Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 70 / 83
  • 71. Recommender System for a Music App using ELM - Proximity / Similarity Measure This is the distance between two customers, computed using either the correlation or the cosine measure. Correlation: the similarity between two users is measured by computing the Pearson correlation, given by corr(a, b) = Σ_i (rai − r̄a)(rbi − r̄b) / sqrt(Σ_i (rai − r̄a)2 · Σ_i (rbi − r̄b)2). Cosine: two customers a and b are thought of as two vectors in the m-dimensional product space (or the k-dimensional space in case of a reduced representation). The proximity between them is measured by computing the cosine of the angle between the two vectors: cos(a, b) = a·b / (||a||2 · ||b||2). Using this value the similarity matrix between users is built. Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 71 / 83
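Both measures are short computations over the users' play-count vectors. The sketch below uses illustrative data and assumes the two vectors are indexed over the same songs; it is an example, not the app's code.

```python
# Sketch: Pearson correlation and cosine similarity between two users.
import numpy as np

def pearson_similarity(r_a, r_b):
    da, db = r_a - r_a.mean(), r_b - r_b.mean()
    return (da @ db) / np.sqrt((da @ da) * (db @ db))

def cosine_similarity(r_a, r_b):
    return (r_a @ r_b) / (np.linalg.norm(r_a) * np.linalg.norm(r_b))

r_a = np.array([3.0, 1.0, 0.0, 2.0])   # illustrative play counts for user a
r_b = np.array([2.0, 0.0, 1.0, 2.0])   # illustrative play counts for user b
print(pearson_similarity(r_a, r_b), cosine_similarity(r_a, r_b))
```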
  • 72. Recommender System for a Music App using ELM - Idea In this section we discuss how this music app will work. Suppose there are 10 songs in the system; then each song will have the following attributes: Attribute 1: Genre: a music genre is a conventional category that identifies some pieces of music as belonging to a shared tradition or set of conventions. It is to be distinguished from musical form and musical style, although in practice these terms are sometimes used interchangeably. Attribute 2: Number of times played: if a song is played for more than 75% of its total duration, then it is counted as played; otherwise it is assumed to be not played. This boundary distinguishes a song that is skipped or forwarded from a song that is played up to a particular time. Attribute 3: Output: this is simply the song number given when it was downloaded to the device. Here the most important attribute is the genre. Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 72 / 83
  • 73. Recommender System for a Music App using ELM - Dataset Making There are three datasets used in building the app. Datasetsongnames: the dataset of the songs kept on the device. If a new song is added, the dataset gets one more row holding the song and its details. Playlist: this matrix is built within the application. It is a t × m matrix, where t is the number of times the app is used (the background process stores the song details) and m is the number of songs played from the app in one go. RecommPlaylist: this is an n × n matrix, where n is the number of songs on the device. Each cell indicates whether that particular song was played in that playlist. Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 73 / 83
  • 74. Recommender System for a Music App using ELM - Procedure 1. Measure similarity using the Jaccard distance and the cosine distance. 2. Generate suggestions by ELM. Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 74 / 83
  • 75. Recommender System for a Music App using ELM - Results
Table: Songs in the device
Name of the Song         | Genre | Count | Song Number
Saturday Saturday        | 1     | 15    | 1
Main koi Aisa Geet Gaaon | 2     | 13    | 2
Nothing Else Matters     | 3     | 9     | 3
Kashmir                  | 4     | 11    | 4
Paradise                 | 5     | 7     | 5
Tu kisi Rail Si          | 5     | 11    | 6
Super Machi              | 7     | 11    | 7
Pani Da                  | 5     | 9     | 8
Slim Shady               | 6     | 6     | 9
Sunn Raha Hai Na Tu      | 1     | 4     | 10
Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 75 / 83
  • 76. Recommender System for a Music App using ELM - Results
Table: Playlist matrix for the music app, created in the back end
1 1 1 2 3 8 9 10 0 0
2 2 3 3 4 5 6 7 8 9
4 3 1 7 8 5 4 2 1 0
8 5 4 1 2 3 7 9 0 0
1 2 4 8 7 6 3 10 10 0
7 7 7 1 2 4 9 5 6 7
2 2 1 4 7 8 6 3 2 1
4 5 6 1 3 2 4 7 8 9
8 5 2 1 4 7 9 6 3 10
8 5 4 2 1 1 1 0 0 0
Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 76 / 83
  • 77. Recommender System for a Music App using ELM - Results
Table: Dataset for the actual comparison (columns Song1-Song8 shown)
Song1 Song2 Song3 Song4 Song5 Song6 Song7 Song8
3 1 1 0 0 0 0 1
0 2 2 1 1 1 1 1
2 1 1 2 1 0 1 1
1 1 1 1 1 0 1 1
1 1 1 1 0 1 1 1
1 2 0 1 1 1 4 0
2 3 1 1 0 1 1 1
1 1 1 2 1 1 1 1
1 1 1 1 1 1 1 1
3 1 0 1 1 0 0 1
Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 77 / 83
  • 78. Recommender System for a Music App using ELM - Results The songs recommended by the algorithms are: ELM: Song Nos. 1, 2, 4, 5 when the song played is song number 8. The genre of this song matches that of song number 5, so the user chooses song number 5. BP: Song Nos. 1, 5, 6, 10. Song number 10 has not been played very much. Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 78 / 83
  • 79. Conclusions and Future Works Conclusions: The test results support that ELM has a better generalization capability along with a faster learning speed. The TLFN-ELM takes somewhat more time than the SLFN, but its testing results are considerably higher than those of the SLFN. The ELM-AE can be a counterpart to deep networks, since it takes a very small amount of time in comparison to them. The ELM predicted considerably better for the music app than BP. Future Works: The TLFN can be made to work on various other datasets and compared with other algorithms. The architecture can be refined to make it predict better for the load shedding dataset. For the music app, facial expression detection can be applied to learn the mood of the person. Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 79 / 83
  • 80. References Cortes C, Vapnik V. Support vector networks. Mach Learn. 1995;20(3):273–97. Suykens JAK, Vandewalle J. Least squares support vector machine classifiers. Neural Process Lett. 1999;9(3):293–300. Huang G-B, Zhu Q-Y, Siew C-K. Extreme learning machine: a new learning scheme of feedforward neural networks. In: Proceedings of international joint conference on neural networks (IJCNN2004), vol. 2, (Budapest, Hungary); 2004. p. 985–990, 25–29 July. Huang G-B, Zhu Q-Y, Siew C-K. Extreme learning machine: theory and applications. Neurocomputing. 2006;70:489–501. Huang G-B, Chen L, Siew C-K. Universal approximation using incremental constructive feedforward networks with random hidden nodes. IEEE Trans Neural Netw. 2006;17(4):879–92. Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 80 / 83
  • 81. References Huang G-B, Chen L. Enhanced random search based incremental extreme learning machine. Neurocomputing. 2008;71:3460–8. Bartlett PL. The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network. IEEE Trans Inform Theory. 1998;44(2):525–36. Rosenblatt F. The perceptron: a probabilistic model for information storage and organization in the brain. Psychol Rev. 1958;65(6):386–408. Serre D. Matrices: theory and applications. New York: Springer; 2002. Rao CR, Mitra SK. Generalized inverse of matrices and its applications. New York: Wiley; 1971. Huang G-B, Ding X, Zhou H. Optimization method based extreme learning machine for classification. Neurocomputing. 2010;74:155–63. Bai Z, Huang G-B, Wang D, Wang H, Westover MB. Sparse extreme learning machine for classification. IEEE Trans Cybern. 2014. Huang G-B, Li M-B, Chen L, Siew C-K. Incremental extreme learning machine with fully complex hidden nodes. Neurocomputing. 2008;71:576–83. Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 81 / 83
  • 82. References Huang G-B, Chen L. Convex incremental extreme learning machine. Neurocomputing. 2007;70:3056–62. Werbos PJ. Beyond regression: new tools for prediction and analysis in the behavioral sciences. Ph.D. thesis, Harvard University; 1974. Rumelhart DE, Hinton GE, Williams RJ. Learning internal representations by error propagation. In: Rumelhart DE, McClelland JL, editors. Parallel distributed processing: explorations in the microstructures of cognition, vol. 1: foundations. Cambridge, MA: MIT Press; 1986. p. 318–62. Rumelhart DE, Hinton GE, Williams RJ. Learning representations by back-propagating errors. Nature. 1986;323:533–6. Werbos PJ. The roots of backpropagation: from ordered derivatives to neural networks and political forecasting. New York: Wiley; 1994. Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 82 / 83
  • 83. THANK YOU! Any Queries? Nimai Chand Das Adhikari (IIST) ELM-AE 6th June, 2016 83 / 83