TFFN: Two Hidden Layer Feed Forward Network
using the randomness of Extreme Learning Machine
Nimai Chand Das Adhikari∗, Arpana Alka∗, Dr. Raju K George†
∗ Masters in Machine Learning and Computing, Indian Institute of Space Science and Technology, Trivandrum
†Dean, Indian Institute of Space Science and Technology, Trivandrum
Abstract—The slow learning speed of feed forward neural networks has been a major drawback in their applications for decades. The key reasons are the slow gradient-based learning algorithms extensively used to train these networks and the fact that all the network parameters are tuned iteratively by such algorithms. In order to overcome these pitfalls, a new learning algorithm known as the Extreme Learning Machine (ELM) was proposed. This algorithm computes a hidden-layer output matrix from randomly assigned input-to-hidden weights and randomly assigned biases. Unlike other feedforward networks, ELM has access to the whole training dataset before the output weights are computed. Here, we devise a new two-hidden-layer feedforward network (TFFN) for ELM, in which the weights and biases of both hidden layers are randomly assigned and the hidden-to-output weights are then calculated using the Moore-Penrose generalized inverse. TFFN does not restrict the algorithm to a fixed number of hidden neurons; rather, it searches the space of neuron combinations in the two hidden layers for an optimized result. This algorithm provides better generalization capability than its parent Extreme Learning Machine at an extremely fast learning speed. We have experimented with the algorithm on various types of datasets and against various popular algorithms, and we report a performance comparison.
Index Terms—Artificial Neural Networks, Extreme Learning
Machines, Generalized Inverse, Pseudo Inverse, Least Squares
Solution, Back propagation, Hidden Neurons, Randomness
I. INTRODUCTION
Back Propagation (BP) and its variants have played a dominant role in the training of feed forward neural networks. However, this algorithm faces several issues, such as convergence to local optima, non-trivial manual intervention, and time-consuming tuning of the parameters. Many researchers are therefore working to find a more efficient learning algorithm for the feed-forward neural network that consumes less training time. SVM, as an alternative to the feed-forward neural network, became popular when researchers believed there was no other scheme that could compensate for BP in the training of feed forward neural networks.
ELM was originally inspired by biological learning and was proposed to overcome the challenges and issues faced by the BP algorithms [1] [2]. From these biological learning features, it has been inferred that some parts of the brain should contain random neurons whose parameters are independent of the environment, and the resulting technique is known as ELM [3]. Its computer-based learning efficiency was verified as early as 2004, its universal approximation capability was rigorously proved in theory in 2006-2008, and evidence of the corresponding biological behaviour has appeared subsequently in the early twenty-first century [4]. Unlike other so-called randomness (or semi-randomness) based learning methods and networks, the hidden nodes in ELM are not only independent of the training data but also independent of each other. Although the hidden nodes are important and critical, they are not tuned as in other algorithms: they are randomly generated beforehand. Unlike conventional learning methods, which must see the training examples before the hidden nodes are generated, ELM can generate its hidden-node weights and biases before seeing the training examples.
In the subsequent sections we discuss the concepts behind ELM and propose a new network called TFFN (Two Hidden Layer Feed Forward Network) using the concepts and theorems behind its parent network, the Extreme Learning Machine. We show how the proposed architecture performs in comparison to some of the best known algorithms in this field.
II. LEARNING PRINCIPLES AND CONCEPTS
A. Concepts for ELM
Fig. 1. ELM-SLFN
This algorithm was first proposed for single-hidden-layer feed forward neural networks (SLFNs) and was then extended to generalized single-hidden-layer feed-forward networks in which the hidden nodes need not be neuron-like.
From the architecture point of view, the output function of ELM for the generalized SLFN can be written as:
f_L(x) = \sum_{i=1}^{L} \beta_i h_i(x) = h(x)\beta

Here, \beta = [\beta_1, \beta_2, ..., \beta_L]^T is the vector of output weights between the hidden layer of L nodes and the m \geq 1 output nodes, and h(x) = [h_1(x), h_2(x), ..., h_L(x)] is the output (row) vector of the hidden layer with respect to the input x [5]. h_i(x) is the output of the i-th hidden node, and the output functions of the hidden nodes need not be unique; different output functions may be used in different hidden neurons. In general, h_i(x) can be written as

h_i(x) = G(a_i, b_i, x), \quad a_i \in \mathbb{R}^d, \; b_i \in \mathbb{R}
Here G(a, b, x) is a nonlinear piece-wise continuous function which satisfies the ELM universal approximation capability theorem, discussed in the following sections [4]. Several nonlinear piece-wise continuous functions already defined in the literature are listed below:
1. Sigmoid function:
G(a, b, x) = \frac{1}{1 + \exp(-(a \cdot x + b))}
2. Fourier function:
G(a, b, x) = \sin(a \cdot x + b)
3. Hard-limit function:
G(a, b, x) = \begin{cases} 1, & a \cdot x - b \geq 0 \\ 0, & \text{otherwise} \end{cases}
4. Gaussian function:
G(a, b, x) = \exp(-b\,||x - a||^2)
5. Multiquadric function:
G(a, b, x) = (||x - a||^2 + b^2)^{1/2}
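For concreteness, the hidden-node output functions G(a, b, x) listed above can be written as short NumPy routines. This is a minimal illustrative sketch with our own function names, not code from the original ELM implementations:

import numpy as np

# Illustrative NumPy versions of the hidden-node output functions G(a, b, x):
# a is a weight vector, b is a scalar bias, x is an input vector.

def sigmoid(a, b, x):
    return 1.0 / (1.0 + np.exp(-(np.dot(a, x) + b)))

def fourier(a, b, x):
    return np.sin(np.dot(a, x) + b)

def hardlimit(a, b, x):
    return 1.0 if np.dot(a, x) - b >= 0 else 0.0

def gaussian(a, b, x):
    return np.exp(-b * np.linalg.norm(x - a) ** 2)

def multiquadric(a, b, x):
    return np.sqrt(np.linalg.norm(x - a) ** 2 + b ** 2)
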
The definitions and learning principles behind our proposed architecture are given below [5]:

Definition 1: A neuron (or node) [3] is called a random neuron (node) if all the parameters (e.g. a, b) in its output function G(a, b, x) are randomly generated based on a continuous sampling probability distribution.

Definition 2: A hidden layer output mapping h(x) [6] is said to be an ELM random feature mapping if all its hidden node parameters are randomly generated according to any continuous sampling probability distribution and such h(x) has universal approximation capability, that is,

||h(x)\beta - f(x)|| = \lim_{L \to \infty} \Big|\Big| \sum_{i=1}^{L} \beta_i h_i(x) - f(x) \Big|\Big| = 0

holds with probability 1 with appropriate output weights \beta.
According to Bartlett's neural network generalization theory [7], for feed-forward networks that reach a smaller training error, the smaller the norm of the weights, the better the generalization performance the network tends to have. We infer that the same may hold for generalized SLFNs in which the hidden neurons need not be neuron-like. Hence, from the learning point of view, ELM theory aims to reach both the smallest training error and the smallest norm of the output weights between the hidden nodes and the output nodes [4][8]:

\text{Minimize:} \quad ||\beta||_p^{\sigma_1} + C\,||H\beta - T||_q^{\sigma_2}

Here \sigma_1 > 0, \sigma_2 > 0, p, q = 0, 1/2, 1, 2, ..., +\infty, H is the hidden layer output matrix (i.e. the randomized matrix) and C is the regularization parameter [3]:
H = \begin{bmatrix} h(x_1) \\ \vdots \\ h(x_N) \end{bmatrix} = \begin{bmatrix} h_1(x_1) & \cdots & h_L(x_1) \\ \vdots & \ddots & \vdots \\ h_1(x_N) & \cdots & h_L(x_N) \end{bmatrix}

and T is the training data target matrix:

T = \begin{bmatrix} t_1^T \\ \vdots \\ t_N^T \end{bmatrix} = \begin{bmatrix} t_{11} & \cdots & t_{1m} \\ \vdots & \ddots & \vdots \\ t_{N1} & \cdots & t_{Nm} \end{bmatrix}
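As a concrete illustration, a minimal single-hidden-layer ELM fit can be written in a few lines of NumPy: the hidden layer output matrix H is built from random weights and biases, and the output weights are obtained with the Moore-Penrose pseudo-inverse. This is a sketch under our own naming conventions, not the authors' implementation:

import numpy as np

def elm_fit(X, T, L, rng=np.random.default_rng(0)):
    """Minimal single-hidden-layer ELM: X is (N, d) inputs, T is (N, m) targets."""
    d = X.shape[1]
    W = rng.standard_normal((d, L))          # random input-to-hidden weights a_i
    b = rng.standard_normal(L)               # random hidden biases b_i
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))   # sigmoid hidden layer output matrix H (N x L)
    beta = np.linalg.pinv(H) @ T             # beta = H^dagger T (minimum-norm least squares)
    # Regularized alternative in the spirit of [4]: beta = inv(I/C + H.T @ H) @ H.T @ T
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta
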
Let us now present some of the learning principles of ELM.
Learning Principle 1: Hidden neurons of SLFNs with almost any nonlinear piece-wise continuous activation function, or their linear combinations, can be generated randomly according to any continuous sampling probability distribution, and such hidden neurons can be independent of the training samples and of the learning environment.
According to this theory, feature mappings h(x) that can approximate any continuous target function can be used in ELM. Activation functions such as the sigmoid used in artificial neural networks are an oversimplified model of brain neurons and may be very different from the real ones. The actual activation function of a real brain neuron is unknown and may be impossible to determine exactly, but it can be assumed to be nonlinear piece-wise continuous [3].
Hence, Learning Principle 1 of ELM may be widely adopted in some brain learning mechanisms without the need of knowing the actual activation function of living brain neurons.
B. Pseudo-inverse: Moore-Penrose generalized inverse
Let us consider an n \times n system of linear equations as given by [9][10]:

Ax = b, \quad A \in M_{n,n}, \; b \in \mathbb{R}^n

The above system has a unique solution if and only if the matrix A has full rank [10]. In that case the (unique) value of x is given by

x = A^{-1} b

Now let us consider an m \times n system of linear equations

Ax = b, \quad A \in M_{m,n}, \; b \in \mathbb{R}^m
Then two cases arise:
Case 1: If m > n, the system is over-determined. In general such a system has no exact solution, i.e. there is no x \in \mathbb{R}^n such that

Ax = b, \quad \text{or} \quad b - Ax = 0.

When there is no exact solution, the residual is written as

r(x) = b - Ax, \quad x \in \mathbb{R}^n,

and we look for a vector x \in \mathbb{R}^n that minimizes

||r(x)|| = ||b - Ax||.

Let us state some definitions and theorems related to this topic.
Definition 1. A vector x that minimizes ||r(x)||_2 is called a least-squares solution to the system defined above [11]. The least-squares solution x which has the minimum 2-norm is called the minimum norm least-squares solution [12]. In other words, if z is any other least-squares solution to the system Ax = b, then

||x||_2 < ||z||_2.

We will see later that the minimum norm least-squares solution to the over-determined system is given by

x = A^{\dagger} b,

where A^{\dagger} is the pseudo-inverse of A.
C. Generalized Inverse
If A is any matrix, there exists a generalized inverse A^{-} such that [10]

A A^{-} A = A.

This is extrapolated from the notion that a matrix has at least a one-sided inverse. Let A^{-} be equal to either L or R (the left-sided and right-sided inverse, respectively). Then

A L A = A(LA) = AI = A, \quad A R A = (AR)A = IA = A.

If A is an n \times m matrix, A^{-} is an m \times n matrix, and the resulting identity matrix has rank equal to the number of columns or rows. It is known that when m = n and rank(A) = n, then A^{-} = A^{-1}. Generalized inverses have many properties, but the most important one here is that the generalized inverse A^{-} is not unique.
1) Moore-Penrose Inverse: In this section we define the pseudo-inverse A^{\dagger} of an m \times n matrix A and illustrate how it can be computed.
Definition 2. Let A be any real m \times n matrix. Then the pseudo-inverse of A is the n \times m matrix X (written A^{\dagger}) satisfying the following Moore-Penrose conditions:
(MP1) AXA = A
(MP2) XAX = X
(MP3) (AX)^T = AX
(MP4) (XA)^T = XA
A useful property is (A^T)^{\dagger} = (A^{\dagger})^T.
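The four Moore-Penrose conditions are easy to check numerically. The sketch below is our own illustration, using NumPy's np.linalg.pinv on a random matrix:

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))          # an m x n matrix with m > n
X = np.linalg.pinv(A)                    # candidate pseudo-inverse A^dagger (n x m)

# Verify the four Moore-Penrose conditions.
print(np.allclose(A @ X @ A, A))         # (MP1) AXA = A
print(np.allclose(X @ A @ X, X))         # (MP2) XAX = X
print(np.allclose((A @ X).T, A @ X))     # (MP3) AX is symmetric
print(np.allclose((X @ A).T, X @ A))     # (MP4) XA is symmetric
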
D. Minimum Norm Least-Squares Solution
A vector x which minimizes ||r(x)||_2 is called a least-squares solution to the system described above. The least-squares solution x which has the minimum 2-norm is called the minimum norm least-squares solution, i.e. if z is any other least-squares solution to the system Ax = b, then

||x||_2 < ||z||_2.

This way of finding least-squares solutions for a linear system Ax = b is called the linear least-squares problem [10][1][2][4]. If the system Ax = b is over-determined and A has full rank, it has a unique least-squares solution x obtained by solving the normal system

A^T A x = A^T b.

However, the matrix A^T A is frequently ill-conditioned and is influenced by rounding errors: when A is over-determined and has full rank,

\kappa_2(A^T A) = \frac{\sigma_1^2}{\sigma_n^2} = [\kappa_2(A)]^2,

where \sigma_1 and \sigma_n are the largest and smallest singular values of A.
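A quick numerical illustration of this point (our own example, not from the paper): forming A^T A squares the condition number, whereas an SVD-based least-squares routine or the pseudo-inverse avoids this.

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((100, 5))
A[:, 4] = A[:, 0] + 1e-4 * rng.standard_normal(100)   # make A nearly rank-deficient
b = rng.standard_normal(100)

print(np.linalg.cond(A))             # kappa_2(A)
print(np.linalg.cond(A.T @ A))       # roughly kappa_2(A)^2: much worse conditioning

x_normal = np.linalg.solve(A.T @ A, A.T @ b)      # normal equations (ill-conditioned)
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)   # SVD-based least-squares solve
print(np.linalg.norm(x_normal - x_lstsq))         # discrepancy due to conditioning
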
E. Basic Least-Squares Theorem
Let us now state the theorem most important for our proposed work [10].
Theorem 1: Consider the linear system

Ax = b,

where A is a real m \times n matrix with m \geq n and b \in \mathbb{R}^m. Then:
(a) The linear system has a unique least-squares solution x if and only if A has full rank.
(b) The linear system has infinitely many least-squares solutions if and only if A is rank-deficient.
(c) The minimum norm least-squares solution to the system is given by

x = A^{\dagger} b.
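Part (c) of the theorem is what ELM relies on when computing the output weights. The following sketch (our own example, not from the paper) shows that, for a rank-deficient system with infinitely many least-squares solutions, the pseudo-inverse selects the one with minimum 2-norm:

import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4))
A[:, 3] = A[:, 0] + A[:, 1]             # rank-deficient: column 3 depends on columns 0 and 1
b = rng.standard_normal(6)

x_min = np.linalg.pinv(A) @ b           # minimum-norm least-squares solution x = A^dagger b

# Any null-space direction of A can be added without changing the residual
# (up to rounding), but it strictly increases the 2-norm of the solution.
_, _, Vt = np.linalg.svd(A)
null_dir = Vt[-1]                        # direction spanning the null space of A
x_other = x_min + 0.5 * null_dir

print(np.linalg.norm(A @ x_min - b), np.linalg.norm(A @ x_other - b))  # same residual
print(np.linalg.norm(x_min), np.linalg.norm(x_other))                  # x_min has smaller norm
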
III. CONCEPT OF TFFN: TWO HIDDEN LAYERS
FEEDFORWARD NETWORK
We apply the Extreme Learning Machine (ELM) scheme to two-hidden-layer feed-forward neural networks (TFFNs): the hidden node weights and biases are chosen randomly, and the output weights \beta are then determined analytically. In theory, this algorithm also tends to provide good generalization performance at an extremely fast learning speed, like SLFNs, in comparison to multilayer Back Propagation. Experimental results on a few artificial and real benchmark function-approximation and classification problems, including large and complex applications, show that the new algorithm produces good generalization performance in most cases and learns faster than the conventional popular learning algorithms for feed-forward neural networks.
Fig. 2. Two Hidden Layer Feedforward ELM

Similarly, if two hidden layers are used instead of one, with the weights and biases generated randomly according to the theories and theorems discussed in the previous sections, the results are promising and sometimes even better than those of SLFNs. Here we present a new concept that takes ELM beyond its single hidden layer. Similar to the SLFN ELM, after the input weights and the hidden layer biases are chosen arbitrarily, a TFFN can simply be considered as a linear system, and the output weights of the TFFN can be determined analytically through a generalized inverse operation on the hidden layer output matrices [4].
A. Proposal
As described above, the concepts behind a two-hidden-layer feed forward network trained with ELM, instead of an SLFN, are presented here. The theory is the same as for the SLFN: the weight matrices and bias vectors of the hidden layers are randomly generated. Consider a structure having two hidden layers with \hat{N}_1 and \hat{N}_2 nodes respectively, and let the training data available to the algorithm be N arbitrary distinct samples (x_i, t_i), where x_i = [x_{i1}, x_{i2}, ..., x_{in}]^T \in \mathbb{R}^n and t_i = [t_{i1}, t_{i2}, ..., t_{im}]^T \in \mathbb{R}^m. Then we can model the system mathematically as:

\sum_{i=1}^{\hat{N}_2} \beta_i\, g_2\Big( W_i \cdot \Big( \sum_{j=1}^{\hat{N}_1} g_1(w_j \cdot x_k + b_j) \Big) + b_i \Big) = o_k, \quad \forall k = 1, ..., N

Here, N is the number of training examples, and w_j = [w_{j1}, w_{j2}, ..., w_{jn}]^T and W_i = [W_{i1}, W_{i2}, ..., W_{i\hat{N}_1}]^T are the randomly generated weight vectors of the first and second hidden layers. Also, \beta_i = [\beta_{i1}, \beta_{i2}, ..., \beta_{im}] are the output weights of the system. The above system can then be written compactly as

H\beta = T

where H is the hidden layer output matrix of the system, generated as

H = \begin{bmatrix} g_2(w_{j1} \cdot x_1 + b_1) & \cdots & g_2(w_{j\hat{N}_1} \cdot x_1 + b_{\hat{N}_2}) \\ \vdots & \ddots & \vdots \\ g_2(w_{j1} \cdot x_{\hat{N}_1} + b_1) & \cdots & g_2(w_{j\hat{N}_1} \cdot x_{\hat{N}_1} + b_{\hat{N}_2}) \end{bmatrix}_{\hat{N}_1 \times \hat{N}_2}

Here the hidden layers are obtained from the equivalent hidden layers of the SLFN, except that there are two of them instead of one. The computation time is therefore somewhat larger than for the SLFN, since the number of hidden layers is greater.
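A minimal NumPy sketch of this idea follows. It is our own illustration (sigmoid activations for g1 and g2, and our own variable names), not the authors' code: both hidden layers use random weights and biases, and only the output weights beta are computed, via the pseudo-inverse of the second hidden layer's output matrix.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tffn_fit(X, T, n1, n2, rng=np.random.default_rng(0)):
    """Two-hidden-layer feed forward network trained ELM-style.
    X: (N, n) inputs, T: (N, m) targets, n1/n2: hidden neurons per layer."""
    n = X.shape[1]
    W1 = rng.standard_normal((n, n1));  b1 = rng.standard_normal(n1)   # random first hidden layer
    W2 = rng.standard_normal((n1, n2)); b2 = rng.standard_normal(n2)   # random second hidden layer
    H1 = sigmoid(X @ W1 + b1)          # first hidden layer output (N x n1)
    H2 = sigmoid(H1 @ W2 + b2)         # second hidden layer output matrix H (N x n2)
    beta = np.linalg.pinv(H2) @ T      # beta = H^dagger T
    return (W1, b1, W2, b2, beta)

def tffn_predict(X, params):
    W1, b1, W2, b2, beta = params
    return sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2) @ beta
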
IV. SINGLE-HIDDEN-LAYER FEED FORWARD NETWORKS
VERSUS MULTI-HIDDEN-LAYER FEED FORWARD
NETWORKS
It is difficult to deal with multiple hidden layers in ELM directly without first understanding the single-hidden-layer case. Thus, over the past ten years, most ELM work has focused on generalized single-hidden-layer feed forward networks (SLFNs). The concepts behind TFFNs are the same as those behind SLFNs. In a TFFN, two ELM mappings can be regarded as running together: one from the input layer through the first hidden layer to the second hidden layer, and another from the first hidden layer through the second hidden layer to the output layer. Hence the concepts remain the same as for the SLFN ELM.
The theorems and theories behind the TFFN are as given below [1][2][3][4][12][13]:
Theorem 2 (Universal approximation capability): Given any bounded non-constant piece-wise continuous function as the activation function of the hidden neurons, if by tuning the parameters of the hidden neuron activation function SLFNs can approximate any continuous target function, then for any continuous target function f(x) and any randomly generated function sequence \{h_i(x)\}_{i=1}^{L},

\lim_{L \to \infty} \Big|\Big| \sum_{i=1}^{L} \beta_i h_i(x) - f(x) \Big|\Big| = 0

holds with probability one with appropriate output weights \beta.
Classification capability: Similar to the approximation capability theorem for single-hidden-layer feed forward networks, it can be proved that SLFNs with a hidden-layer mapping h(x) satisfying the universal approximation condition also have universal classification capability.
Definition 1: A closed set is called a region, regardless of whether it is bounded or not.
Lemma 1: Given disjoint regions K_1, K_2, ..., K_m in \mathbb{R}^d, the corresponding m arbitrary real values c_1, c_2, ..., c_m, and an arbitrary region X disjoint from every K_i, there exists a continuous function f(x) such that f(x) = c_i if x \in K_i and f(x) = c_0 if x \in X, where c_0 is any arbitrary real value different from c_1, c_2, ..., c_m.
Now we can state the Classification Capability theorem.
Theorem 3 (Classification Capability Theorem): Given a feature mapping h(x), if h(x)\beta is dense in C(\mathbb{R}^d), or in C(M) where M is a compact set of \mathbb{R}^d, then a generalized SLFN with such a hidden-layer mapping h(x) can separate arbitrary disjoint regions of any shape in \mathbb{R}^d or M.
Thus, according to the above theorems, it is a necessary and sufficient condition that the feature mapping h(x) be chosen so that h(x)\beta has the capability of approximating any continuous target function. If h(x)\beta cannot approximate every continuous target function, then there may exist some regions which cannot be separated by any classifier with such a feature mapping h(x). Also, when the dimensionality of the feature mapping is large, the output of the classifier h(x)\beta will be as close to the class labels of the corresponding regions as possible.
In the binary classification case, ELM uses only a single output node, and the class label closer to the output value of the ELM is the predicted class label of the input data.
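For instance, with class labels encoded as -1 and +1 and a single output node, the decision rule described above amounts to choosing the label nearer to the real-valued network output, i.e. thresholding at zero. A small sketch with our own function name, assuming the outputs have already been computed:

import numpy as np

def predict_label(y):
    """y: real-valued outputs of the single output node, one per sample (labels -1/+1)."""
    return np.where(np.asarray(y) >= 0.0, 1, -1)   # +1 if the output is closer to +1, else -1
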
A. Algorithm
Given a training set, activation functions g_1(x) and g_2(x), and hidden neuron numbers \hat{N}_1 and \hat{N}_2:
1) Input the data into the model.
2) Divide the data into training and validation samples. The algorithm uses randomized search to obtain the optimized hyperparameters (learning rate, regularization parameter of the cost function, numbers of hidden neurons and hidden biases), as sketched after this list.
• For the training set, assign arbitrary input weights w_1 and w_2 and arbitrary biases B_1 and B_2.
• Calculate the output weight \beta as:
\beta = H^{\dagger} T
3) Pass the network with the optimized hyperparameter setting over the validation samples.
4) Calculate the output:
H\beta = T
Here H, \beta and T are as described in the previous sections.
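A compact sketch of this training-plus-search loop is given below. It is our own illustration: the search ranges, the one-hot target encoding and the accuracy metric are assumptions rather than the authors' exact settings.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_and_score(Xtr, Ttr, Xva, yva, n1, n2, rng):
    """Train a TFFN with (n1, n2) hidden neurons and return validation accuracy.
    Ttr holds one-hot targets; yva holds integer class labels."""
    W1 = rng.standard_normal((Xtr.shape[1], n1)); b1 = rng.standard_normal(n1)
    W2 = rng.standard_normal((n1, n2));           b2 = rng.standard_normal(n2)
    H2 = sigmoid(sigmoid(Xtr @ W1 + b1) @ W2 + b2)
    beta = np.linalg.pinv(H2) @ Ttr                        # beta = H^dagger T
    Hva = sigmoid(sigmoid(Xva @ W1 + b1) @ W2 + b2)
    pred = np.argmax(Hva @ beta, axis=1)                   # one-hot outputs -> class index
    return np.mean(pred == yva)

def random_search(Xtr, Ttr, Xva, yva, n_trials=50, seed=0):
    """Randomized search over the numbers of hidden neurons in both layers."""
    rng = np.random.default_rng(seed)
    best = (-1.0, None)
    for _ in range(n_trials):
        n1, n2 = rng.integers(10, 200), rng.integers(10, 200)
        acc = fit_and_score(Xtr, Ttr, Xva, yva, n1, n2, rng)
        best = max(best, (acc, (n1, n2)))
    return best     # (best validation accuracy, (n1, n2))
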
V. RESULTS AND DISCUSSION
TABLE I
PERFORMANCE COMPARISON FOR GLASS DATASET
Algorithm Testing Accuracy (%) Training Time (secs)
SVC-Sigmoid 66.154 0.058
SVC-rbf 63.077 0.0623
Logistic Regression L1 67.6923 0.0235
Logistic Regression L2 66.154 0.0354
Decision Trees 61.54 0.25
Random Forest 78.462 3.98
ELM-SLFN 74.2115 0.044677
TFFN 75.46929 0.16709
Bagging 80 1.35
MLP 71.24 3.87
A comparison is made between the TFFN, an ELM with two hidden layers in which the weight matrices from the input layer to the first hidden layer and from the first to the second hidden layer, as well as the biases of both hidden layers, are generated randomly, and the ELM with an SLFN. The dataset taken for this comparison is the Glass dataset. For all algorithms, 70% of the data is used for training and 30% for testing (validation). The testing accuracy of the ELM-SLFN is found to be 74.21%, whereas that of the TFFN is 75.47%. From the above results, Bagging with decision trees and optimized search gives a better accuracy of 80%, but its training time is much larger than that of the ELM-based methods, whereas our proposed algorithm reaches about 75% validation accuracy in roughly 0.17 seconds of training. These comparisons were made after extensive testing: each reading was taken 20 times and the average is listed above. If training time alone is considered, the ELM-SLFN has the shortest training time among the neural approaches.
In Table II, for the Pima Indians Diabetes dataset, we can see that our proposed algorithm performs well both in terms of validation accuracy and training time. Its parent algorithm, the ELM-SLFN, achieves an accuracy of 76.78% in about 0.4 seconds, compared with 77.66% for the TFFN. For all the algorithms, random search and grid search were used to arrive at the optimized hyperparameter settings before running them on the validation part.
TABLE II
PERFORMANCE COMPARISON FOR PIMA-INDIANS-DIABETES DATASET
Algorithm Testing Accuracy (%) Training Time (secs)
SVC-Sigmoid 65.4 0.92
SVC-rbf 73.6 0.786
SVC-Linear 77.6518 0.6563
Logistic Regression L1 73.6 0.0678
Logistic Regression L2 73.16 0.0987
Decision Trees 64.07 0.78
Random Forest 72.73 9.98
ELM-SLFN 76.78 0.4052
TFFN 77.66 0.86709
Bagging 75.32 10.5
MLP 73.16 15.23
Fig. 3. ELM vs TFFN results for different datasets

In Fig. 3 we compare the testing (validation) performance of ELM-SLFN and ELM-TFFN on the Hepatitis, Diabetes, Haberman, Dermatology and Fertility datasets. For some of the datasets the performance of the TFFN is markedly better, while for the others it is comparable. The main demerit is the training time, which is about 5 times that of the ELM-SLFN. Thus, along with its merits, the approach also has some demerits.
VI. CONCLUSION
The randomness in the TFFN avoids the iterations needed to optimize the parameters in a multilayer perceptron. Apart from avoiding these iterations, the TFFN achieves better accuracy on most of the standard datasets in the literature. Thus, besides being very fast to learn, the network optimizes its output parameters to obtain better accuracy. The demerit of this architecture is that it cannot give priority to particular training data; in addition, the randomness can constrain the algorithm to move in only one direction of learning.
VII. REFERENCES
[1] Huang, Gao, et al. ”Trends in extreme learning ma-
chines: A review.” Neural Networks 61 (2015): 32-48.
[2] Cambria, Erik, et al. ”Extreme learning machines [trends
& controversies].” IEEE Intelligent Systems 28.6 (2013): 30-
59.
[3] Huang, Guang-Bin. ”An insight into extreme learning
machines: random neurons, random features and kernels.”
Cognitive Computation 6.3 (2014): 376-390.
[4] Huang, Guang-Bin, et al. ”Extreme learning machine for
regression and multiclass classification.” IEEE Transactions
on Systems, Man, and Cybernetics, Part B (Cybernetics) 42.2
(2012): 513-529.
[5] Huang, Guang-Bin, Qin-Yu Zhu, and Chee-Kheong Siew.
”Extreme learning machine: theory and applications.” Neuro-
computing 70.1 (2006): 489-501.
[6] Funahashi, Ken-Ichi. ”On the approximate realization of
continuous mappings by neural networks.” Neural networks
2.3 (1989): 183-192.
[7] Anthony, Martin, and Peter L. Bartlett. Neural network
learning: Theoretical foundations. cambridge university press,
2009.
[8] Huang, Guang-Bin, and Lei Chen. ”Convex incremental
extreme learning machine.” Neurocomputing 70.16 (2007):
3056-3062.
[9] Albert, Arthur. Regression and the Moore-Penrose pseu-
doinverse. Elsevier, 1972.
[10] Penrose, Roger. ”A generalized inverse for matrices.”
Mathematical proceedings of the Cambridge philosophical
society. Vol. 51. No. 3. Cambridge University Press, 1955.
[11] Golub, Gene H., and Charles F. Van Loan. ”An analysis of
the total least squares problem.” SIAM Journal on Numerical
Analysis 17.6 (1980): 883-893.
[12] Lawson, Charles L., and Richard J. Hanson. Solving
least squares problems. Society for Industrial and Applied
Mathematics, 1995.
[13] Huang, Guang-Bin, et al. ”Extreme learning machine for
regression and multiclass classification.” IEEE Transactions
on Systems, Man, and Cybernetics, Part B (Cybernetics) 42.2
(2012): 513-529.

More Related Content

What's hot

ARTIFICIAL NEURAL NETWORKS
ARTIFICIAL NEURAL NETWORKSARTIFICIAL NEURAL NETWORKS
ARTIFICIAL NEURAL NETWORKSAIMS Education
 
Artificial neural networks (2)
Artificial neural networks (2)Artificial neural networks (2)
Artificial neural networks (2)sai anjaneya
 
A neuro fuzzy decision support system
A neuro fuzzy decision support systemA neuro fuzzy decision support system
A neuro fuzzy decision support systemR A Akerkar
 
Machine Learning: Introduction to Neural Networks
Machine Learning: Introduction to Neural NetworksMachine Learning: Introduction to Neural Networks
Machine Learning: Introduction to Neural NetworksFrancesco Collova'
 
Multilayer perceptron
Multilayer perceptronMultilayer perceptron
Multilayer perceptronomaraldabash
 
Ppt on artifishail intelligence
Ppt on artifishail intelligencePpt on artifishail intelligence
Ppt on artifishail intelligencesnehal_gongle
 
Lecture artificial neural networks and pattern recognition
Lecture   artificial neural networks and pattern recognitionLecture   artificial neural networks and pattern recognition
Lecture artificial neural networks and pattern recognitionHưng Đặng
 
Approximate bounded-knowledge-extractionusing-type-i-fuzzy-logic
Approximate bounded-knowledge-extractionusing-type-i-fuzzy-logicApproximate bounded-knowledge-extractionusing-type-i-fuzzy-logic
Approximate bounded-knowledge-extractionusing-type-i-fuzzy-logicCemal Ardil
 
ML_Unit_2_Part_A
ML_Unit_2_Part_AML_Unit_2_Part_A
ML_Unit_2_Part_ASrimatre K
 

What's hot (17)

Perceptron
PerceptronPerceptron
Perceptron
 
ARTIFICIAL NEURAL NETWORKS
ARTIFICIAL NEURAL NETWORKSARTIFICIAL NEURAL NETWORKS
ARTIFICIAL NEURAL NETWORKS
 
Artificial neural networks (2)
Artificial neural networks (2)Artificial neural networks (2)
Artificial neural networks (2)
 
A neuro fuzzy decision support system
A neuro fuzzy decision support systemA neuro fuzzy decision support system
A neuro fuzzy decision support system
 
02 Fundamental Concepts of ANN
02 Fundamental Concepts of ANN02 Fundamental Concepts of ANN
02 Fundamental Concepts of ANN
 
Neural network
Neural networkNeural network
Neural network
 
Machine Learning: Introduction to Neural Networks
Machine Learning: Introduction to Neural NetworksMachine Learning: Introduction to Neural Networks
Machine Learning: Introduction to Neural Networks
 
Multilayer perceptron
Multilayer perceptronMultilayer perceptron
Multilayer perceptron
 
03 Single layer Perception Classifier
03 Single layer Perception Classifier03 Single layer Perception Classifier
03 Single layer Perception Classifier
 
Ppt on artifishail intelligence
Ppt on artifishail intelligencePpt on artifishail intelligence
Ppt on artifishail intelligence
 
Lecture artificial neural networks and pattern recognition
Lecture   artificial neural networks and pattern recognitionLecture   artificial neural networks and pattern recognition
Lecture artificial neural networks and pattern recognition
 
B021106013
B021106013B021106013
B021106013
 
Ffnn
FfnnFfnn
Ffnn
 
Ann
Ann Ann
Ann
 
Approximate bounded-knowledge-extractionusing-type-i-fuzzy-logic
Approximate bounded-knowledge-extractionusing-type-i-fuzzy-logicApproximate bounded-knowledge-extractionusing-type-i-fuzzy-logic
Approximate bounded-knowledge-extractionusing-type-i-fuzzy-logic
 
06 neurolab python
06 neurolab python06 neurolab python
06 neurolab python
 
ML_Unit_2_Part_A
ML_Unit_2_Part_AML_Unit_2_Part_A
ML_Unit_2_Part_A
 

Similar to TFFN: Two Hidden Layer Feed Forward Network using the randomness of Extreme Learning Machine

High performance extreme learning machines a complete toolbox for big data a...
High performance extreme learning machines  a complete toolbox for big data a...High performance extreme learning machines  a complete toolbox for big data a...
High performance extreme learning machines a complete toolbox for big data a...redpel dot com
 
Multi-Layer Perceptrons
Multi-Layer PerceptronsMulti-Layer Perceptrons
Multi-Layer PerceptronsESCOM
 
Extreme learning machine:Theory and applications
Extreme learning machine:Theory and applicationsExtreme learning machine:Theory and applications
Extreme learning machine:Theory and applicationsJames Chou
 
Understanding Deep Learning & Parameter Tuning with MXnet, H2o Package in R
Understanding Deep Learning & Parameter Tuning with MXnet, H2o Package in RUnderstanding Deep Learning & Parameter Tuning with MXnet, H2o Package in R
Understanding Deep Learning & Parameter Tuning with MXnet, H2o Package in RManish Saraswat
 
Cognitive Science Unit 4
Cognitive Science Unit 4Cognitive Science Unit 4
Cognitive Science Unit 4CSITSansar
 
14 Machine Learning Single Layer Perceptron
14 Machine Learning Single Layer Perceptron14 Machine Learning Single Layer Perceptron
14 Machine Learning Single Layer PerceptronAndres Mendez-Vazquez
 
Intro to Deep learning - Autoencoders
Intro to Deep learning - Autoencoders Intro to Deep learning - Autoencoders
Intro to Deep learning - Autoencoders Akash Goel
 
Soft Computing-173101
Soft Computing-173101Soft Computing-173101
Soft Computing-173101AMIT KUMAR
 
DESIGN AND IMPLEMENTATION OF BINARY NEURAL NETWORK LEARNING WITH FUZZY CLUSTE...
DESIGN AND IMPLEMENTATION OF BINARY NEURAL NETWORK LEARNING WITH FUZZY CLUSTE...DESIGN AND IMPLEMENTATION OF BINARY NEURAL NETWORK LEARNING WITH FUZZY CLUSTE...
DESIGN AND IMPLEMENTATION OF BINARY NEURAL NETWORK LEARNING WITH FUZZY CLUSTE...cscpconf
 
Neural Networks-introduction_with_prodecure.pptx
Neural Networks-introduction_with_prodecure.pptxNeural Networks-introduction_with_prodecure.pptx
Neural Networks-introduction_with_prodecure.pptxRatuRumana3
 
M7 - Neural Networks in machine learning.pdf
M7 - Neural Networks in machine learning.pdfM7 - Neural Networks in machine learning.pdf
M7 - Neural Networks in machine learning.pdfArushiKansal3
 
lecture07.ppt
lecture07.pptlecture07.ppt
lecture07.pptbutest
 
PPT - AutoML-Zero: Evolving Machine Learning Algorithms From Scratch
PPT - AutoML-Zero: Evolving Machine Learning Algorithms From ScratchPPT - AutoML-Zero: Evolving Machine Learning Algorithms From Scratch
PPT - AutoML-Zero: Evolving Machine Learning Algorithms From ScratchJisang Yoon
 
Artificial Neural Networks: Applications In Management
Artificial Neural Networks: Applications In ManagementArtificial Neural Networks: Applications In Management
Artificial Neural Networks: Applications In ManagementIOSR Journals
 
Adaptive Training of Radial Basis Function Networks Based on Cooperative
Adaptive Training of Radial Basis Function Networks Based on CooperativeAdaptive Training of Radial Basis Function Networks Based on Cooperative
Adaptive Training of Radial Basis Function Networks Based on CooperativeESCOM
 

Similar to TFFN: Two Hidden Layer Feed Forward Network using the randomness of Extreme Learning Machine (20)

High performance extreme learning machines a complete toolbox for big data a...
High performance extreme learning machines  a complete toolbox for big data a...High performance extreme learning machines  a complete toolbox for big data a...
High performance extreme learning machines a complete toolbox for big data a...
 
Multi-Layer Perceptrons
Multi-Layer PerceptronsMulti-Layer Perceptrons
Multi-Layer Perceptrons
 
Extreme learning machine:Theory and applications
Extreme learning machine:Theory and applicationsExtreme learning machine:Theory and applications
Extreme learning machine:Theory and applications
 
Understanding Deep Learning & Parameter Tuning with MXnet, H2o Package in R
Understanding Deep Learning & Parameter Tuning with MXnet, H2o Package in RUnderstanding Deep Learning & Parameter Tuning with MXnet, H2o Package in R
Understanding Deep Learning & Parameter Tuning with MXnet, H2o Package in R
 
Cognitive Science Unit 4
Cognitive Science Unit 4Cognitive Science Unit 4
Cognitive Science Unit 4
 
14 Machine Learning Single Layer Perceptron
14 Machine Learning Single Layer Perceptron14 Machine Learning Single Layer Perceptron
14 Machine Learning Single Layer Perceptron
 
Nueral fuzzy system.pptx
Nueral fuzzy system.pptxNueral fuzzy system.pptx
Nueral fuzzy system.pptx
 
Intro to Deep learning - Autoencoders
Intro to Deep learning - Autoencoders Intro to Deep learning - Autoencoders
Intro to Deep learning - Autoencoders
 
N ns 1
N ns 1N ns 1
N ns 1
 
Soft Computing-173101
Soft Computing-173101Soft Computing-173101
Soft Computing-173101
 
DESIGN AND IMPLEMENTATION OF BINARY NEURAL NETWORK LEARNING WITH FUZZY CLUSTE...
DESIGN AND IMPLEMENTATION OF BINARY NEURAL NETWORK LEARNING WITH FUZZY CLUSTE...DESIGN AND IMPLEMENTATION OF BINARY NEURAL NETWORK LEARNING WITH FUZZY CLUSTE...
DESIGN AND IMPLEMENTATION OF BINARY NEURAL NETWORK LEARNING WITH FUZZY CLUSTE...
 
19_Learning.ppt
19_Learning.ppt19_Learning.ppt
19_Learning.ppt
 
Neural Networks-introduction_with_prodecure.pptx
Neural Networks-introduction_with_prodecure.pptxNeural Networks-introduction_with_prodecure.pptx
Neural Networks-introduction_with_prodecure.pptx
 
M7 - Neural Networks in machine learning.pdf
M7 - Neural Networks in machine learning.pdfM7 - Neural Networks in machine learning.pdf
M7 - Neural Networks in machine learning.pdf
 
lecture07.ppt
lecture07.pptlecture07.ppt
lecture07.ppt
 
PPT - AutoML-Zero: Evolving Machine Learning Algorithms From Scratch
PPT - AutoML-Zero: Evolving Machine Learning Algorithms From ScratchPPT - AutoML-Zero: Evolving Machine Learning Algorithms From Scratch
PPT - AutoML-Zero: Evolving Machine Learning Algorithms From Scratch
 
Artificial Neural Networks: Applications In Management
Artificial Neural Networks: Applications In ManagementArtificial Neural Networks: Applications In Management
Artificial Neural Networks: Applications In Management
 
Adaptive Training of Radial Basis Function Networks Based on Cooperative
Adaptive Training of Radial Basis Function Networks Based on CooperativeAdaptive Training of Radial Basis Function Networks Based on Cooperative
Adaptive Training of Radial Basis Function Networks Based on Cooperative
 
Nn devs
Nn devsNn devs
Nn devs
 
B42010712
B42010712B42010712
B42010712
 

More from Nimai Chand Das Adhikari

HPPS: Heart Problem Prediction System using Machine Learning
HPPS: Heart Problem Prediction System using Machine LearningHPPS: Heart Problem Prediction System using Machine Learning
HPPS: Heart Problem Prediction System using Machine LearningNimai Chand Das Adhikari
 
TFFN: Two Hidden Layer Feed Forward Network using the randomness of Extreme L...
TFFN: Two Hidden Layer Feed Forward Network using the randomness of Extreme L...TFFN: Two Hidden Layer Feed Forward Network using the randomness of Extreme L...
TFFN: Two Hidden Layer Feed Forward Network using the randomness of Extreme L...Nimai Chand Das Adhikari
 
HPPS : Heart Problem Prediction System using Machine Learning
HPPS : Heart Problem Prediction System using Machine LearningHPPS : Heart Problem Prediction System using Machine Learning
HPPS : Heart Problem Prediction System using Machine LearningNimai Chand Das Adhikari
 
An Intelligent Approach to Demand Forecasting
An Intelligent Approach to Demand ForecastingAn Intelligent Approach to Demand Forecasting
An Intelligent Approach to Demand ForecastingNimai Chand Das Adhikari
 
Thesis Presentation_Extreme Learning Machine_Nimai_SC14M045
Thesis Presentation_Extreme Learning Machine_Nimai_SC14M045Thesis Presentation_Extreme Learning Machine_Nimai_SC14M045
Thesis Presentation_Extreme Learning Machine_Nimai_SC14M045Nimai Chand Das Adhikari
 
An intelligent approach to demand forecasting
An intelligent approach to demand forecastingAn intelligent approach to demand forecasting
An intelligent approach to demand forecastingNimai Chand Das Adhikari
 

More from Nimai Chand Das Adhikari (10)

HPPS: Heart Problem Prediction System using Machine Learning
HPPS: Heart Problem Prediction System using Machine LearningHPPS: Heart Problem Prediction System using Machine Learning
HPPS: Heart Problem Prediction System using Machine Learning
 
TFFN: Two Hidden Layer Feed Forward Network using the randomness of Extreme L...
TFFN: Two Hidden Layer Feed Forward Network using the randomness of Extreme L...TFFN: Two Hidden Layer Feed Forward Network using the randomness of Extreme L...
TFFN: Two Hidden Layer Feed Forward Network using the randomness of Extreme L...
 
Block Chain understanding
Block Chain understandingBlock Chain understanding
Block Chain understanding
 
Face detection Using Computer Vision
Face detection Using Computer VisionFace detection Using Computer Vision
Face detection Using Computer Vision
 
HPPS : Heart Problem Prediction System using Machine Learning
HPPS : Heart Problem Prediction System using Machine LearningHPPS : Heart Problem Prediction System using Machine Learning
HPPS : Heart Problem Prediction System using Machine Learning
 
An Intelligent Approach to Demand Forecasting
An Intelligent Approach to Demand ForecastingAn Intelligent Approach to Demand Forecasting
An Intelligent Approach to Demand Forecasting
 
Credit defaulter analysis
Credit defaulter analysisCredit defaulter analysis
Credit defaulter analysis
 
Image Stitching for Panorama View
Image Stitching for Panorama ViewImage Stitching for Panorama View
Image Stitching for Panorama View
 
Thesis Presentation_Extreme Learning Machine_Nimai_SC14M045
Thesis Presentation_Extreme Learning Machine_Nimai_SC14M045Thesis Presentation_Extreme Learning Machine_Nimai_SC14M045
Thesis Presentation_Extreme Learning Machine_Nimai_SC14M045
 
An intelligent approach to demand forecasting
An intelligent approach to demand forecastingAn intelligent approach to demand forecasting
An intelligent approach to demand forecasting
 

Recently uploaded

Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 

Recently uploaded (20)

Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 

TFFN: Two Hidden Layer Feed Forward Network using the randomness of Extreme Learning Machine

  • 1. 1 TFFN: Two Hidden Layer Feed Forward Network using the randomness of Extreme Learning Machine Nimai Chand Das Adhikari∗, Arpana Alka∗, Dr. Raju K George† ∗ Masters in Machine Learning and Computing, Indian Institute of Space Science and Technology, Trivandrum †Dean, Indian Institute of Space Science and Technology, Trivandrum Abstract—The learning speed of the feed forward neural network takes a lot of time to be trained which is a major drawback in their applications since the past decades. The key reasons behind may be due to the slow gradient-based learning algorithms which are extensively used to train the neural networks or due to the parameters in the networks which are tuned iteratively using some learning algorithms. Thus, in order to eradicate the above pitfalls, a new learning algorithm was proposed known as Extreme Learning Machines (ELM). This algorithm tries to compute Hidden-layer-output matrix that is made of randomly assigned input layer and hidden layer weights and randomly assigned biases. Unlike the other feedforward networks, ELM has the access of the whole training dataset before going into the computation part. Here, we have devised a new two-layer-feedforward network (TFFN) for ELM in a new manner with randomly assigning the weights and biases in both the hidden layers, which then calculates the output-hidden layer weights using the Moore-Penrose generalized inverse. TFFN doesn’t restricts the algorithm to fix the number of hidden neurons that the algorithm should have. Rather it searches the space which gives an optimized result in the neurons combination in both the hidden layers. This algorithm provides a good generalization capability than the parent Extreme Learning Machines at an extremely fast learning speed. Here, we have experimented the algorithm on various types of datasets and various popular algorithm to find the performances and report a comparison. Index Terms—Artificial Neural Networks, Extreme Learning Machines, Generalized Inverse, Pseudo Inverse, Least Squares Solution, Back propagation, Hidden Neurons, Randomness I. INTRODUCTION Back Propagation and its variants have played a dominant role in the training of the Feed Forward Neural Networks. But there are several issues such as local optimum, trivial manual intervention and time consuming in training the parameters which this algorithm faces. Although many researchers are working to find out a more efficient learning algorithm for the feed-forward neural network which even consumes less time in training. SVM as an alternative solution to the FF- NN, became somewhat popular when researchers thought that there wasn’t any other neural network to compensate for the BP in the training of the Feed Forward Neural Networks. ELM was originally inspired from the biological learning and was proposed to overcome the challenges and the issues that is faced by the BP algorithms [1] [2]. By taking the background of the biological learning features, it has been inferred that some part of the brain systems should have the random neurons with all the parameters independent of the en- vironments and the resultant technique known to be ELMs [3]. Its computer-based learning efficiency was verified as early as in 2004, its universal approximation capability was rigorously proved in theory in 2006−2008, and its concrete biological behavior is seemed to be subsequently appear in early twenty first century [4]. 
Unlike the other so-called randomness (semi randomness) based learning methods/ networks, the hidden nodes in ELM are not only independent of the training data but also are independent of each other. Although the hidden nodes are important and critical in this case these are not tuned as in the case of other algorithms. These hidden nodes are randomly generated beforehand. Unlike all the others conventional learning methods, this learning method must see the training examples even before the hidden nodes are generated. ELM also generates the weight matrix before seeing the training examples. In the subsequent sections we will discuss upon the concepts behind the ELM and propose a new network called TFFN (Two Hidden Layer Feed Forward Network) using the concepts and theorems behind its parent network Extreme Learning Machines. We will show how this proposed archi- tecture performs in comparison to some of the best known algorithms in this field. II. LEARNING PRINCIPLES AND CONCEPTS A. Concepts for ELM Fig. 1. ELM-SLFN This algorithm was first proposed for the single-layer feed forward neural networks (SLFNs) and was then extended to the generalized single- hidden layer feed-forward networks in which the hidden layer need not be a neuron like. Considering the architecture point of view, the output function
  • 2. 2 of the ELM for the generalized SLFN can be as deduced below: fL(x) = L i=1 βihi(x) = h(x)β Here, the β = [β1, β2, ..., βL]T is a vector for the output weights between the hidden layer of L nodes to the m ≥ 1 output nodes, and also the h(x) = [h1(x), h2(x), ..., hL(x)] is the output vector of the hidden layers with respect to the input x [5]. Also we remember that the above hidden matrix hL(x) is the row vectors. hi(x) is the output of the i−th hidden node output, and the output functions of hidden nodes may not be unique. We might be using different output functions in many different hidden neurons. In general, hi(x) can mostly be: hi(x) = G(ai, bi, x), ai ∈ d , bi ∈ This G(a, b, x) is a non-piece-wise continuous function which fulfills the ELM universal approximation capability theorem which will be discussed thoroughly in the upcoming sections [4]. We will give a brief note about the different non linear piece-wise continuous functions that are already defined in the literature: 1. Sigmoid Function: G(a, b, x) = 1 1 + exp(−(a.x + b)) 2. Fourier function: G(a, b, x) = sin(a.x + b) 3. Hardlimit function G(a, b, x) = 1 a.x − b ≥ 0 0 otherwise 4. Gaussian function G(a, b, x) = exp(−b||x − a||2 ) 5. Multiquadrics function G(a, b, x) = (||x − a||2 b2 )1/2 Below are the definitions and the learning principles behind out proposing architecture [5]: Definition 1:A neuron(or node)[3] is called a random neuron(node) if all its parameters(e.g, a,b) in its output function G(a,b,x) are randomly generated based on a continuous sampling Distribution probability. Definition 2:A hidden layer output mapping h(x) [6] is said to be an ELM random feature mapping if all its hidden node parameters are randomly generated according to any continuous sampling distribution probability and such h(x) has universal approximation capability, that is, ||h(x)β − f(x)|| = limL⇒∞|| L i=1 βihi(x) − f(x)|| = 0 holds with the probability 1 with appropriate output weights β. If we take into the account the Barlett’s neural network generalization theory [7], for the feed-forward neural networks for reaching the smaller training error, then the smaller the norms of the weights are, the better generalization performance the network tend s to have. Thus we infer that it might be true with the generalized SLFNs where the hidden neurons may not be neuron alike. Hence if we consider the learning point of view of the ELM, then ELM’s theory aims to reach the smallest error in the training part as well as the smallest norm of the output weights between the hidden node and the output node [4][8]: Minimize : ||β||σ1 p + C||Hβ − T||σ2 q Here σ1 > 0, σ2 > 0, p, q = 0, 1/2, 1, 2, ..., +∞, H is the hidden layer output matrix(i.e randomized matrix) and C is the regularized parameter [3]: H =       h(x1) . . . h(xN )       =       h1(x1) . . . hL(x1) . . . . . . . . . . . . . . . h1(xN ) . . . hL(xN )       and T is the training data target matrix: T =       tT 1 . . . tT N       =       t11 . . . t(1m) . . . . . . . . . . . . . . . t1(N1) . . . t(Nm)       Now let us present the some of the learning rules of ELM: Learning Principle 1 Hidden neurons of SLFNs with almost any nonlinear piece-wise continuous activation functions or their linear combinations can be generated randomly in accordance to any continuous sampling probability distribution, and such hidden neurons can be independent of training samples and also its learning environment. 
According to the theory, the use of feature mappings h(x) can be used in ELM for which it can approximate any of the continuous target functions. Activation functions like sigmoid function which are used in the artificial neural networks are an oversimplified modeling of brain neurons and may be very much different from what they might be. But it is true that
  • 3. 3 the actual activation function of a real brain is unknown. The exact activation function of a live brain neuron may be impossible to know. But it can be assumed that the original function (activation) might be nonlinear piece-wise continuous [3]. Hence, Learning Principle I of ELM may be widely adopted in some brain learning mechanism without the need of knowing the actual activation function of living brain neurons. B. Pseudo-inverse: Moore-Penrose generalized inverse Let us consider a system nXn of linear system as given by [9][10]: Ax = b, A ∈ Mnn, b ∈ n The above system will have a unique solution iff the matrix A has a full rank [10]. In that case the value of x (in case it is unique) will be given by x = A−1 b Now let us consider a system of m X n of linear equations Ax = b, (A ∈ Mmn, b ∈ m ) Then there are two cases arising: Case 1: If m > n, the system is over-determined. In these kinds of systems, there is no solution. In other words, there is no such x ∈ n such that Ax = b, or b − Ax = 0 When there is no exact solution, the residual can be written as r(x) = b − Ax, x ∈ n and try to find a vector x ∈ n for which ||r(x)|| = ||b − Ax|| Let us define some definitions and theorems related to describe this topic. Definition 1. A vector x that minimizes ||r(x)||2 is called a least-squares solution to the system defined above [11]. The least squares solution x which has the minimum 2-norm is called the minimum norm least squares solution [12]. Let us now give an example on the understanding of this concept. Let ’y’ is any other least square solution to the system Ax = b, then to satisfy the above definition ||x||2 < ||z||2 Now we will be proving later that the minimum norm least- squares solution to the over-determined system is given by x = A† b Here A† is the pseudo-inverse of A. C. Generalized Inverse If A is any matrix, there is a generalized inverse, A− such that [10], AA− A = A Now, this equation is an extrapolated from the conjuncture that any matrix has at-least one-sided inverse. Let A− is equal to either L or R (i.e Left and Right sided inverse respectively). Then, ALA = A(LA) = AI = A, ARA = (AR)A = IA = A If A is a n X m matrix , A− is then a m X n matrix and the resultant identity matrix either has the rank equal to columns or rows. It is known that when m = n and when rank(A) = n then A− = A−1 . There are many properties of this but the most important of all those is that the generalized inverses A− are not unique. 1) Moore-Penrose Inverse: In this section we will be defining the pseudo-inverse A† of an m x n matrix A, and illustrate how we can compute it using the various methods. Definition 2.Let A be any real m x n matrix. Then the pseudo-inverse of A is an n x m matrix X (instead of calling it A† satisfying the following Moore-Penrose conditions: (MP1) AXA = A (MP2) XAX = X (MP3) (AX)’ = AX (MP4) (XA)’ = XA (AT )† = (A† )T D. Minimum Norm Least-Squares Solution A vector x which minimizes ||r(x)||2 is called a least- squares solution to the system described above. Also the least- squares solution x which has the minimum 2-norm is called the minimum norm least squares solution, i.e. if we say that z us any other least squares solution to the above system Ax = b, then we must have ||x||2 < ||z||2 Hence this way of finding the least-squares solutions for the linear system like Ax = b is called the linear least-squares problem [10][1][2][4]. 
We now establish a result for the case in which the system $Ax = b$ is over-determined and of full rank: it then has a unique least-squares solution $x$, obtained by solving the normal system
$$A^{T}Ax = A^{T}b.$$
However, the matrix $A^{T}A$ is very frequently ill-conditioned and sensitive to rounding errors. Indeed, when $A$ is over-determined and of full rank,
$$\kappa_2(A^{T}A) = \frac{\alpha_1^{2}}{\alpha_n^{2}} = [\kappa_2(A)]^{2},$$
where $\alpha_1$ and $\alpha_n$ denote the largest and smallest singular values of $A$.
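The squaring of the condition number is easy to verify numerically. The short sketch below (an illustration we add here; the construction of the test matrix via its SVD is our own choice) shows that $\kappa_2(A^{T}A) \approx [\kappa_2(A)]^{2}$, which is why the pseudo-inverse route is preferred over forming the normal equations explicitly.

```python
import numpy as np

rng = np.random.default_rng(1)

# Build a moderately ill-conditioned over-determined matrix from a chosen SVD.
m, n = 50, 5
U, _ = np.linalg.qr(rng.standard_normal((m, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigma = np.logspace(0, -4, n)            # singular values from 1 down to 1e-4
A = U @ np.diag(sigma) @ V.T

kappa_A = np.linalg.cond(A)              # kappa_2(A)     = alpha_1 / alpha_n
kappa_AtA = np.linalg.cond(A.T @ A)      # kappa_2(A^T A) = alpha_1^2 / alpha_n^2

print(f"kappa(A)      = {kappa_A:.3e}")
print(f"kappa(A^T A)  = {kappa_AtA:.3e}")
print(f"kappa(A)**2   = {kappa_A**2:.3e}")   # matches kappa(A^T A) up to rounding
```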
E. Basic Least-Squares Theorem

We now state the theorem most important for the proposed work [10].

Theorem 1: Consider a linear system
$$Ax = b,$$
where $A$ is a real $m \times n$ matrix with $m \ge n$ and $b \in \mathbb{R}^m$. Then:
(a) the system has a unique least-squares solution $x$ iff $A$ has full rank;
(b) the system has infinitely many least-squares solutions iff $A$ is rank-deficient;
(c) the minimum norm least-squares solution is given by $x = A^{\dagger}b$.

III. CONCEPT OF TFFN: TWO HIDDEN LAYERS FEEDFORWARD NETWORK

We propose Extreme Learning Machines (ELM) for two-hidden-layer feed-forward neural networks (TFFNs), which randomly choose the hidden-node weights and biases and then analytically determine the output weights $\beta$. In theory, this algorithm also tends to provide good generalization performance at an extremely fast learning speed, like SLFNs, in comparison with multilayer Back Propagation. Experimental results on a few artificial and real benchmark function-approximation and classification problems, including very large and complex applications, show that the new algorithm can produce good generalization performance in most cases and can learn faster than the conventional popular learning algorithms for feed-forward neural networks.

Fig. 2. Two Hidden Layer Feedforward ELM

Similarly, if two hidden layers are used instead of one, with the weights and biases randomly generated according to the theories and theorems discussed in the previous sections, the results are promising and sometimes even better than those of SLFNs. Here we present a new concept that takes ELM beyond its single hidden layer. As in SLFN ELMs, after the input weights and hidden-layer biases are chosen arbitrarily, a TFFN can be treated as a linear system, and its output weights can be analytically determined through a simple generalized-inverse operation on the hidden-layer output matrices [4].

A. Proposal

As described above, the idea of a two-hidden-layer feed-forward network using ELM, instead of an SLFN, is developed here. The theory is the same as for the SLFN: the weight and bias matrices of both hidden layers are randomly generated. Consider a structure with two hidden layers containing $\hat{N}_1$ and $\hat{N}_2$ nodes, and let the training set consist of $N$ arbitrary distinct samples $(x_k, t_k)$, where $x_k = [x_{k1}, x_{k2}, \ldots, x_{kn}]^T \in \mathbb{R}^n$ and $t_k = [t_{k1}, t_{k2}, \ldots, t_{km}]^T \in \mathbb{R}^m$. The system can then be modelled mathematically as
$$\sum_{i=1}^{\hat{N}_2} \beta_i\, g_2\!\big(W_i \cdot h^{(1)}(x_k) + b_i\big) = o_k, \qquad k = 1, \ldots, N,$$
where $h^{(1)}(x_k) = [g_1(w_1 \cdot x_k + b_1), \ldots, g_1(w_{\hat{N}_1} \cdot x_k + b_{\hat{N}_1})]^T$ is the output of the first hidden layer and $N$ is the number of training examples. Here $w_j = [w_{j1}, w_{j2}, \ldots, w_{jn}]^T \in \mathbb{R}^n$ and $W_i = [W_{i1}, W_{i2}, \ldots, W_{i\hat{N}_1}]^T \in \mathbb{R}^{\hat{N}_1}$ are the randomly generated weight vectors of the two hidden layers, and $\beta_i = [\beta_{i1}, \beta_{i2}, \ldots, \beta_{im}]$ are the output weights. The system can then be written as
$$H\beta = T,$$
where $H$ is the (second) hidden-layer output matrix,
$$H = \begin{bmatrix} g_2\big(W_1 \cdot h^{(1)}(x_1) + b_1\big) & \cdots & g_2\big(W_{\hat{N}_2} \cdot h^{(1)}(x_1) + b_{\hat{N}_2}\big) \\ \vdots & \ddots & \vdots \\ g_2\big(W_1 \cdot h^{(1)}(x_N) + b_1\big) & \cdots & g_2\big(W_{\hat{N}_2} \cdot h^{(1)}(x_N) + b_{\hat{N}_2}\big) \end{bmatrix}_{N \times \hat{N}_2}.$$
The hidden layers are built exactly like the equivalent hidden layer of an SLFN, except that there are two of them instead of one; consequently, the computation time is somewhat larger than that of the SLFN.
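The following NumPy sketch illustrates the forward model of the Proposal: random first- and second-layer weights and biases produce the second-hidden-layer output matrix $H$, and the output weights follow as $\beta = H^{\dagger}T$. The layer sizes, the sigmoid activations for $g_1$ and $g_2$, the Gaussian initialization, and the placeholder data are our own illustrative choices, not prescribed by the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)

N, n, m = 200, 8, 3                      # samples, input dim, output dim
N1_hat, N2_hat = 40, 30                  # neurons in hidden layers 1 and 2

X = rng.standard_normal((N, n))          # training inputs  x_k
T = rng.standard_normal((N, m))          # training targets t_k (placeholder data)

# Randomly assigned parameters -- never tuned iteratively, as in ELM.
W1 = rng.standard_normal((n, N1_hat));      b1 = rng.standard_normal(N1_hat)
W2 = rng.standard_normal((N1_hat, N2_hat)); b2 = rng.standard_normal(N2_hat)

H1 = sigmoid(X @ W1 + b1)                # first hidden-layer output, N x N1_hat
H  = sigmoid(H1 @ W2 + b2)               # second hidden-layer output matrix H, N x N2_hat

beta = np.linalg.pinv(H) @ T             # output weights: beta = H† T
print("training error:", np.linalg.norm(H @ beta - T))
```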
IV. SINGLE-HIDDEN-LAYER FEED FORWARD NETWORKS VERSUS MULTI-HIDDEN-LAYER FEED FORWARD NETWORKS

It is difficult to deal with multi-hidden-layer ELM directly without first understanding single-hidden-layer ELM. Thus, over the past ten years most ELM work has focused on generalized single-hidden-layer feed-forward networks (SLFNs). The concepts behind TFFNs are the same as those behind SLFNs. In a TFFN, two ELM mappings operate in sequence: one from the input layer through the first hidden layer to the second hidden layer, and another from the first hidden layer through the second hidden layer to the output layer. Hence the concepts are the same as for SLFN ELM. The theorems and theory behind the TFFN are given below [1][2][3][4][12][13]:
Theorem 2 (Universal approximation capability): Given any bounded non-constant piece-wise continuous function as the activation function of the hidden neurons, if SLFNs can approximate any continuous target function by tuning the parameters of the hidden-neuron activation function, then for any continuous target function $f(x)$ and any randomly generated function sequence $\{h_i(x)\}_{i=1}^{L}$,
$$\lim_{L \to \infty} \Big\| \sum_{i=1}^{L} \beta_i h_i(x) - f(x) \Big\| = 0$$
holds with probability one for appropriate output weights $\beta$.

Classification capability: Analogously to the approximation-capability theorem for single-hidden-layer feed-forward neural networks, classification capability can be proved for SLFNs whose hidden-layer mapping $h(x)$ satisfies the universal approximation condition.

Definition 1: A closed set is called a region, regardless of whether it is bounded or not.

Lemma 1: Given disjoint regions $K_1, K_2, \ldots, K_m$ in $\mathbb{R}^d$, corresponding arbitrary real values $c_1, c_2, \ldots, c_m$, and an arbitrary region $X$ disjoint from every $K_i$, there exists a continuous function $f(x)$ such that $f(x) = c_i$ if $x \in K_i$ and $f(x) = c_0$ if $x \in X$, where $c_0$ is any real value different from $c_1, c_2, \ldots, c_m$.

We can now state the Classification Capability theorem.

Theorem 3 (Classification Capability Theorem): Given a feature mapping $h(x)$, if $h(x)\beta$ is dense in $C(\mathbb{R}^d)$ or in $C(M)$, where $M$ is a compact set in $\mathbb{R}^d$, then a generalized SLFN with such a hidden-layer mapping $h(x)$ can separate arbitrary disjoint regions of any shape in $\mathbb{R}^d$ or $M$.

Thus, according to the above theorems, it is a necessary and sufficient condition that the feature mapping $h(x)$ be chosen so that $h(x)\beta$ can approximate any continuous target function. Conversely, if $h(x)\beta$ cannot approximate every continuous target function, there may exist regions of certain shapes that cannot be separated by any classifier with such a feature mapping $h(x)$. Moreover, when the dimensionality of the feature mapping is large, the classifier output $h(x)\beta$ will be as close to the class labels of the corresponding regions as possible. In the binary classification case, ELM uses only a single output node, and the class label closer to the output value of the ELM is the predicted class label of that input.

A. Algorithm

Given a training set, activation functions $g_1(x)$ and $g_2(x)$, and hidden-neuron numbers $\hat{N}_1$ and $\hat{N}_2$:
1) Input the data into the model.
2) Divide the data into training and validation samples. The algorithm uses randomized search to obtain the optimized hyperparameters (learning rate, regularization parameter of the cost function, hidden neurons and hidden biases).
   • For the training set, assign arbitrary input weights $w_1$ and $w_2$ and biases $B_1$ and $B_2$.
   • Calculate the output weights as $\beta = H^{\dagger}T$.
3) Pass the network with the optimized hyperparameter setting to the validation samples.
4) Calculate the output from $H\beta = T$, where $H$, $\beta$ and $T$ are as described in the previous sections. A sketch of these steps is given below.
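The sketch below is a hedged, simplified reading of the algorithm above: split the data, draw random candidate hidden-neuron counts (a stand-in for the full randomized hyperparameter search described in step 2), train each candidate analytically by $\beta = H^{\dagger}T$, and keep the setting with the best validation accuracy. The helper names (`tffn_fit`, `tffn_predict`), the toy data, and the search ranges are ours, not from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tffn_fit(X, T, n1, n2, rng):
    """Randomly assign both hidden layers, then solve beta = H† T analytically."""
    W1 = rng.standard_normal((X.shape[1], n1)); b1 = rng.standard_normal(n1)
    W2 = rng.standard_normal((n1, n2));         b2 = rng.standard_normal(n2)
    H = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
    beta = np.linalg.pinv(H) @ T
    return (W1, b1, W2, b2, beta)

def tffn_predict(X, model):
    W1, b1, W2, b2, beta = model
    return sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2) @ beta

rng = np.random.default_rng(3)

# Toy two-class data with one-hot targets (stand-in for a real dataset).
X = rng.standard_normal((300, 6))
y = (X[:, 0] + X[:, 1] ** 2 > 0.5).astype(int)
T = np.eye(2)[y]

# 70% / 30% train-validation split, as used in the experiments.
idx = rng.permutation(len(X)); cut = int(0.7 * len(X))
tr, va = idx[:cut], idx[cut:]

best = None
for _ in range(20):                                  # randomized search over (N1_hat, N2_hat)
    n1, n2 = rng.integers(10, 200), rng.integers(10, 200)
    model = tffn_fit(X[tr], T[tr], n1, n2, rng)
    acc = np.mean(tffn_predict(X[va], model).argmax(axis=1) == y[va])
    if best is None or acc > best[0]:
        best = (acc, n1, n2, model)

print(f"best validation accuracy {best[0]:.3f} with N1={best[1]}, N2={best[2]}")
```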
V. RESULTS AND DISCUSSION

TABLE I
PERFORMANCE COMPARISON FOR THE GLASS DATASET

Algorithm                 Testing Accuracy (%)   Training Time (s)
SVC-Sigmoid               66.154                 0.058
SVC-rbf                   63.077                 0.0623
Logistic Regression L1    67.6923                0.0235
Logistic Regression L2    66.154                 0.0354
Decision Trees            61.54                  0.25
Random Forest             78.462                 3.98
ELM-SLFN                  74.2115                0.044677
TFFN                      75.46929               0.16709
Bagging                   80                     1.35
MLP                       71.24                  3.87

We compare an ELM with two hidden layers, in which the weight matrices from the input layer to the first hidden layer and from the first to the second hidden layer, as well as the biases of both hidden layers, are generated randomly, against the SLFN ELM. The dataset used for the comparison is the Glass dataset. For all algorithms, 70% of the data is used for training and 30% for testing (validation). The testing accuracy of the SLFN is 74.21%, whereas that of the TFFN is 75.47%. From the results above, Bagging with decision trees and optimized search gives a better accuracy of 80%, but its training time is much larger, whereas our proposed algorithm reaches about 75% validation accuracy in 0.167 seconds. Both sets of comparisons were made after extensive testing: each reading was taken 20 times and the averages are listed above. If training time is the criterion, ELM-SLFN still has the better training time of the two ELM variants.

In Table II, for the Pima Indians Diabetes dataset, our proposed algorithm performs well both in validation accuracy and in training time. Its parent algorithm, ELM-SLFN, reaches an accuracy of 76.78% in 0.4 seconds, compared with 77.66% for TFFN. For all algorithms we used randomized search and grid search to arrive at the optimized hyperparameter settings used in the validation part.
TABLE II
PERFORMANCE COMPARISON FOR THE PIMA INDIANS DIABETES DATASET

Algorithm                 Testing Accuracy (%)   Training Time (s)
SVC-Sigmoid               65.4                   0.92
SVC-rbf                   73.6                   0.786
SVC-Linear                77.6518                0.6563
Logistic Regression L1    73.6                   0.0678
Logistic Regression L2    73.16                  0.0987
Decision Trees            64.07                  0.78
Random Forest             72.73                  9.98
ELM-SLFN                  76.78                  0.4052
TFFN                      77.66                  0.86709
Bagging                   75.32                  10.5
MLP                       73.16                  15.23

Fig. 3. ELM vs TFFN results for different datasets

Figure 3 compares the testing (validation) performance of ELM-SLFN and ELM-TFFN on the Hepatitis, Diabetes, Haberman, Dermatology and Fertility datasets. For some of the datasets the performance of TFFN is markedly better, while for the others it is comparable. Its only demerit is the training time, which is about five times that of ELM-SLFN. Thus, along with its merits, the method also has some demerits.

VI. CONCLUSION

The randomness in the TFFN avoids the iterative parameter optimization required by the multilayer perceptron. Besides avoiding these iterations, the TFFN achieves better accuracy on most of the standard datasets in the literature. Thus, apart from being very fast in learning, this network determines its parameters so as to obtain better accuracy. Its demerits are that it cannot give priority to particular parts of the training data, and that the randomness can restrict the algorithm to a single direction of learning.

VII. REFERENCES

[1] Huang, Gao, et al. "Trends in extreme learning machines: A review." Neural Networks 61 (2015): 32-48.
[2] Cambria, Erik, et al. "Extreme learning machines [trends & controversies]." IEEE Intelligent Systems 28.6 (2013): 30-59.
[3] Huang, Guang-Bin. "An insight into extreme learning machines: random neurons, random features and kernels." Cognitive Computation 6.3 (2014): 376-390.
[4] Huang, Guang-Bin, et al. "Extreme learning machine for regression and multiclass classification." IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 42.2 (2012): 513-529.
[5] Huang, Guang-Bin, Qin-Yu Zhu, and Chee-Kheong Siew. "Extreme learning machine: theory and applications." Neurocomputing 70.1 (2006): 489-501.
[6] Funahashi, Ken-Ichi. "On the approximate realization of continuous mappings by neural networks." Neural Networks 2.3 (1989): 183-192.
[7] Anthony, Martin, and Peter L. Bartlett. Neural Network Learning: Theoretical Foundations. Cambridge University Press, 2009.
[8] Huang, Guang-Bin, and Lei Chen. "Convex incremental extreme learning machine." Neurocomputing 70.16 (2007): 3056-3062.
[9] Albert, Arthur. Regression and the Moore-Penrose Pseudoinverse. Elsevier, 1972.
[10] Penrose, Roger. "A generalized inverse for matrices." Mathematical Proceedings of the Cambridge Philosophical Society, Vol. 51, No. 3. Cambridge University Press, 1955.
[11] Golub, Gene H., and Charles F. Van Loan. "An analysis of the total least squares problem." SIAM Journal on Numerical Analysis 17.6 (1980): 883-893.
[12] Lawson, Charles L., and Richard J. Hanson. Solving Least Squares Problems. Society for Industrial and Applied Mathematics, 1995.
[13] Huang, Guang-Bin, et al. "Extreme learning machine for regression and multiclass classification." IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 42.2 (2012): 513-529.