Vector-Based Back Propagation Algorithm of.pdf

Vector-Based Back Propagation Algorithm of
Supervised Convolution Neural Network
1st
Nesrine Wagaa
Department of Physics and Instrumentation
LARATSI Laboratory
National Institute of Applied Sciences and Technology (INSAT)
Centre Urbain Nord, Tunisia
nesrinewagah@gmail.com
2nd
Hichem Kallel
Department of Physics and Instrumentation
LARATSI Laboratory
INSAT & South Mediterranean University (MedTech)
Les Berges du Lac II, Tunisia
hichem.kallel@yahoo.com
Abstract—The primary goal of this paper is to analyze the
impact of the convolution operation on the model performance.
In this context, to avoid the mathematical complexities behind
the Convolution Neural Network (CNN) model, the classical
convolution operation is substituted by a new proposed matrix
operation. The model considered is composed of one convolution
layer in series with a set of fully connected hidden layers. The
network parameters (filters, weights, and biases) are updated
using the back propagation gradient descent algorithm. The
model performance is improved through the variation of the
width and height CNN hyper-parameters. MNIST data are
considered here for the classification of handwritten numbers.
With a simple modification of the CNN hyper-parameters using
the new proposed matrix operation, a CNN performance of
98.83% was achieved.
Index Terms—Classical Convolution Operation, Matrix Operation,
CNN Parameters, CNN Hyper-Parameters, Gradient Descent.
I. INTRODUCTION
A convolution neural network (CNN), or ConvNet, is a
particular case of deep neural networks [1]. It is inspired by
the hierarchical structure of the biological nervous system.
The ConvNet is an old technique, introduced as early as 1980
by Fukushima [2]. However, the model did not achieve good
results until 1998, when LeCun et al. designed LeNet-5 [3], [4],
the first modern CNN architecture capable of classifying the
handwritten digits 0-9. In 2005 [5], with the establishment of
general-purpose computing on graphics processing units
(GPGPU), the performance of CNNs improved significantly,
especially in image classification. In 2012, the AlexNet model
was proposed [1]. It is a deep network with eight learned
layers (five convolution layers followed by three fully
connected layers), trained on two GPUs to speed up the
learning process. It reduced the top-5 error rate on ImageNet
to 15.3%. Since then, further models such as VGGNet-16 [6]
and ResNet [7] have been proposed to improve CNN
performance [8], [9]. These networks are used to solve
computer vision problems such as object classification [10],
detection [11], and object tracking. CNNs are trained on large
datasets and require powerful hardware to carry out the
necessary computations [5].
The architecture of a ConvNet consists of a set of hidden
layers, typically categorized as convolution layers and fully
connected hidden layers [12]. The CNN input is the image to
be processed. For supervised learning, the CNN output is the
predicted value or the class of the input data [12], [13]. A
ConvNet is trained using the back propagation algorithm [1],
[2] to update the CNN parameters: the filter coefficients
applied in the convolution layers to extract the image
features, the weights that represent the interconnections
between the artificial neurons of the fully connected hidden
layers, and the bias values that define the activation level of
the filter coefficients and artificial neurons.
In this paper, the CNN performance is enhanced by the
development of a new technique that substitutes a matrix
operation for the convolution operation. We propose a
compact back propagation method to update the parameters of
a CNN model consisting of one convolution layer. For the fully
connected hidden layers, a simple recursive update rule was
previously developed [14]. The developed tools make it easy to
update the CNN parameters (filters, weights, and biases) [12],
[17] and to modify the CNN hyper-parameters:
1) CNN Width.
• Number of fully connected hidden layers [15].
2) CNN Height.
• Size and number of convolution filters [12].
• Number of neurons in each fully connected hidden
layer [15], [16], [19].
3) Stride and padding of the convolution operation [17].
4) Pooling operation [18], [20].
5) Activation function [21], [22].
6) Cost function [15], [23].
7) Learning rate [24].
In this context, we tested the CNN model performance based
on the variation of its hyper-parameters. The MNIST dataset of
handwritten digits [25] is used by the CNN model to achieve
the learning process.
978-1-7281-6999-6/20/$31.00 ©2020 IEEE

II. CLASSICAL CONVOLUTION NEURAL NETWORK (CNN)
MODEL
A. Convolution Operation
The convolution operation between an image $X \in \Re^{(u \times u)}$
and a filter $F \in \Re^{(v \times v)}$ is defined as follows
$$X \circledast F = C_{\left(\left(\frac{u - v + 2\,Pad}{s} + 1\right) \times \left(\frac{u - v + 2\,Pad}{s} + 1\right)\right)} \tag{1}$$
where,
$$C[a, b] = \sum_{k=0}^{u} \sum_{l=0}^{u} X[k, l]\, F[a - k, b - l] \tag{2}$$
Here, $\circledast$ denotes the convolution operation. The stride $s$
is the number of pixels by which $F$ slides over $X$ at each step.
The padding $Pad$ is the number of zeros added around $X$.
$a$, $b$ and $k$, $l$ are the row and column indices of $C$ and $X$,
respectively.
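The output size in (1) and the sliding-window evaluation of the convolution can be illustrated with a short NumPy sketch. The function names `conv_output_size` and `conv2d` are hypothetical, and the sketch assumes a square image, a square filter, zero padding, and a filter that is flipped before sliding (the usual way of turning a sliding dot product into a convolution); it is an illustration, not the authors' implementation.

```python
import numpy as np

def conv_output_size(u, v, pad=0, s=1):
    """Side length of C according to (1): (u - v + 2*pad) / s + 1."""
    return (u - v + 2 * pad) // s + 1

def conv2d(X, F, pad=0, s=1):
    """Slide the flipped filter F over the zero-padded image X with stride s."""
    u, v = X.shape[0], F.shape[0]
    Xp = np.pad(X, pad)          # 'pad' zeros on every side of X
    Ff = F[::-1, ::-1]           # flip rows and columns of the filter
    n = conv_output_size(u, v, pad, s)
    C = np.empty((n, n))
    for a in range(n):
        for b in range(n):
            patch = Xp[a * s:a * s + v, b * s:b * s + v]
            C[a, b] = np.sum(patch * Ff)
    return C

X = np.arange(16, dtype=float).reshape(4, 4)
F = np.array([[1.0, 0.0],
              [0.0, -1.0]])
C = conv2d(X, F)                 # Pad = 0, s = 1, so C is (3 x 3)
```

With a (28 × 28) image and a (9 × 9) filter, `conv_output_size(28, 9)` gives a (20 × 20) map when Pad = 0 and s = 1.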
B. Convolution Layer
The convolution layer shown in Fig.1 is a model consisting
of a convolution map and a pooling map. Here, in the
convolution operation the padding Pad = 0 and the stride
s = 1.
Fig. 1. Convolution Layer (ConvL).
Convolution Map
The convolution map is designed as follows:
• An input image $X \in \Re^{(u \times u)}$.
• $r$ filters $F_j \in \Re^{(v \times v)}$; $j = 1, 2, \cdots, r$.
• A bias matrix $Bf_j \in \Re^{((u-v+1) \times (u-v+1))}$.
• A non-linear activation function $f$.
• An output matrix $Cp_j \in \Re^{((u-v+1) \times (u-v+1))}$.
The output convolution map is defined as follows
$$C_j = X \circledast F_j + Bf_j \tag{3}$$
$$Cp_j = f(C_j) \tag{4}$$
Pooling Map
A pooling map is an essential unit of the CNN architecture.
It reduces the computational complexity of the network by
reducing the dimension of $Cp_j$. Average pooling and max
pooling are examples of the pooling operation [18], [19].
Typically, a kernel $K_j$ of size $(2 \times 2)$ with a stride of 2
is applied to calculate the average or the maximum value of
each patch of $Cp_j$. The output of the pooling operation, $P_j$,
has size $\left(\frac{u-v+1}{2} \times \frac{u-v+1}{2}\right)$.
The average and max-pooling operations are presented in
Appendix 1.
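The (2 × 2), stride-2 pooling described above can be sketched in NumPy as follows. The helper name `pool2x2` is hypothetical, and the sketch assumes that the side length of $Cp_j$ is even.

```python
import numpy as np

def pool2x2(Cp, mode="average"):
    """Pool non-overlapping (2 x 2) patches of a square map with an even side."""
    w = Cp.shape[0]
    assert w % 2 == 0, "side length must be even"
    # blocks[i, di, j, dj] == Cp[2*i + di, 2*j + dj]
    blocks = Cp.reshape(w // 2, 2, w // 2, 2)
    if mode == "average":
        return blocks.mean(axis=(1, 3))
    return blocks.max(axis=(1, 3))

Cp = np.array([[1., 2., 5., 6.],
               [3., 4., 7., 8.],
               [0., 0., 1., 1.],
               [0., 4., 1., 1.]])
P_avg = pool2x2(Cp, "average")   # [[2.5, 6.5], [1.0, 1.0]]
P_max = pool2x2(Cp, "max")       # [[4.0, 8.0], [4.0, 1.0]]
```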
C. Fully Connected Layer
The convolution filters detect features of the input images,
called local features. A fully connected layer is added in series
with the convolution layer to recognize the input images [12],
[15], [26]. As shown in Fig. 2, a fully connected layer has an
input vector $Y^0$ corresponding to the concatenation of the $r$
CNN pooling maps $P_j$. The output vector $Y^t$ defines the image
classes. Typically, a series of $t$ fully connected hidden layers
is added between the input and output vectors to enhance the
CNN performance.
Fig. 2. Fully Connected layer (FCL).
The basic equations of the fully connected hidden layers are
$$H^i = W^i Y^{(i-1)} + B^i \tag{5}$$
$$Y^i = f(H^i) \tag{6}$$
where $H^i$ is the weighted-sum vector, $B^i$ is the bias
vector, and $W^i$ is the weight matrix that represents the
interconnection between hidden layers. $Y^i$ denotes the output
vector of the $i$-th fully connected hidden layer.
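Equations (5) and (6) amount to one matrix-vector product followed by an element-wise activation. A minimal sketch, assuming a ReLU activation and a hypothetical helper `fc_layer`:

```python
import numpy as np

def relu(h):
    return np.maximum(h, 0.0)

def fc_layer(W, Y_prev, B, f=relu):
    """Return (Y, H) with H = W Y_prev + B as in (5) and Y = f(H) as in (6)."""
    H = W @ Y_prev + B
    return f(H), H

W = np.array([[1.0, -1.0],
              [0.5,  0.5],
              [-1.0, 0.0]])        # 3 neurons, 2 inputs
B = np.array([[0.0], [1.0], [0.5]])
Y0 = np.array([[2.0], [1.0]])
Y1, H1 = fc_layer(W, Y0, B)        # H1 = [1, 2.5, -1.5], Y1 = [1, 2.5, 0]
```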
D. Convolution Neural Networks Model
The convolution neural network shown in Fig. 3 is composed
of one convolution layer convL in series with $t$ fully connected
hidden layers L. $X$ is the input image to be recognized
by the CNN model. $Y^t$ is the CNN output corresponding to
the image recognition.

Fig. 3. Convolution Neural Network Architecture.
III. VECTOR-BASED CNN MODEL
This section aims to replace the classical convolution
operation by a matrix operation.
Definition: We define the vector expression of any matrix
$M \in \Re^{(n \times n)}$ as follows
$$M = \begin{bmatrix} M^{1T} \\ M^{2T} \\ \vdots \\ M^{nT} \end{bmatrix}, \qquad \bar{M}_{(n^2 \times 1)} = \begin{bmatrix} M^{1} \\ M^{2} \\ \vdots \\ M^{n} \end{bmatrix}$$
In this section, the output convolution map $Cp_j$ of
size $((u-v+1) \times (u-v+1))$ is transformed into a
vector $\bar{C}p_j$ of dimension $((u-v+1)^2 \times 1)$.
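Reading $M^{iT}$ as the $i$-th row of $M$, the vector expression stacks the rows of the matrix, each written as a column, on top of one another. In NumPy this corresponds to a row-major reshape; the helper name `vector_expression` is hypothetical.

```python
import numpy as np

def vector_expression(M):
    """Stack the rows of M (each as a column) into one (n*n x 1) vector."""
    return M.reshape(-1, 1)

M = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
M_bar = vector_expression(M)     # column vector 1, 2, ..., 9
```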
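Based on Appendix 2, the output convolved vector $\bar{C}p_j$ and
the output average pooling vector $\bar{P}_j$ are defined as follows
$$\bar{C}_j = Xx \cdot \bar{F}_j + \bar{B}f_j \tag{7}$$
$$\bar{C}p_j = f(\bar{C}_j) \tag{8}$$
$$\bar{P}_j = \begin{bmatrix} \frac{\bar{c}p_{j1} + \bar{c}p_{j2} + \bar{c}p_{j3} + \bar{c}p_{j4}}{4} \\ \frac{\bar{c}p_{j5} + \bar{c}p_{j6} + \bar{c}p_{j7} + \bar{c}p_{j8}}{4} \\ \vdots \\ \frac{\bar{c}p_{j(w^2-3)} + \bar{c}p_{j(w^2-2)} + \bar{c}p_{j(w^2-1)} + \bar{c}p_{jw^2}}{4} \end{bmatrix} \tag{9}$$
Note that $Xx$ is the $((u-v+1)^2 \times v^2)$ input image matrix,
$\bar{F}_j$ is the $(v^2 \times 1)$ filter vector, $\bar{B}f_j$ is the
$((u-v+1)^2 \times 1)$ convolution bias vector, and $w = u - v + 1$.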
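The idea behind (7), a sliding-window operation rewritten as one matrix-vector product, can be checked numerically. The sketch below builds a matrix of the stated size $((u-v+1)^2 \times v^2)$ by writing each $(v \times v)$ image patch as one row. The helper `build_Xx` and the row and column ordering it uses (patches visited row by row, filter flattened row by row, no flipping) are assumptions made for illustration and may differ from the ordering of Appendix 2.

```python
import numpy as np

def build_Xx(X, v):
    """One row per (v x v) patch of X, patches visited in row-major order."""
    u = X.shape[0]
    w = u - v + 1
    rows = [X[a:a + v, b:b + v].reshape(-1)
            for a in range(w) for b in range(w)]
    return np.stack(rows)                      # shape (w*w, v*v)

rng = np.random.default_rng(0)
u, v = 6, 3
w = u - v + 1
X = rng.standard_normal((u, u))
F = rng.standard_normal((v, v))

Xx = build_Xx(X, v)
F_bar = F.reshape(-1, 1)                       # (v*v x 1) filter vector
C_bar = Xx @ F_bar                             # matrix form, bias omitted

# Reference: explicit sliding window over the image.
C_loop = np.array([[np.sum(X[a:a + v, b:b + v] * F) for b in range(w)]
                   for a in range(w)])
```

Reshaping `C_bar` to (w × w) reproduces `C_loop`, so the matrix product and the sliding window give the same map under this ordering.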
A. Forward Propagation
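The CNN model developed in this study consists of one
convolution layer in series with three fully connected hidden
layers ($i = 1, 2, 3$).
Convolution Layer
Based on equations (7), (8), and (9), the output convolution
map and the average pooling map of a model consisting of $r
$ filter vectors can be written as follows
$$\bar{C}_{((r \times w^2) \times 1)} = Xo_{((r \times w^2) \times (r \times v^2))} \cdot \bar{F}_{((r \times v^2) \times 1)} + \bar{B}f_{((r \times w^2) \times 1)} \tag{10}$$
$$\bar{C}p = f(\bar{C}) \tag{11}$$
$$\bar{P}_{((r \times \frac{w^2}{4}) \times 1)} = \begin{bmatrix} \bar{P}_1 \\ \bar{P}_2 \\ \vdots \\ \bar{P}_r \end{bmatrix} \tag{12}$$
where $w = u - v + 1$,
$$Xo_{((r \times w^2) \times (r \times v^2))} = \begin{bmatrix} Xx & 0 & \cdots & 0 \\ 0 & Xx & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & Xx \end{bmatrix}$$
$$\bar{F}_{((r \times v^2) \times 1)} = \begin{bmatrix} \bar{F}_1 \\ \bar{F}_2 \\ \vdots \\ \bar{F}_r \end{bmatrix}, \qquad \bar{B}f_{((r \times w^2) \times 1)} = \begin{bmatrix} \bar{B}f_1 \\ \bar{B}f_2 \\ \vdots \\ \bar{B}f_r \end{bmatrix}$$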
Concatenation
Concatenation is the operation that defines the input of the
fully connected layer as a function of the $r$ pooling operations.
$$Y^0_{(m \times 1)} = \bar{P} \tag{13}$$
where $m = r \times \frac{w^2}{4}$.
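The chain (10) to (13) can be sketched end to end in NumPy. The sizes below are small hypothetical values, `Xx` is filled with random numbers in place of a real image matrix, and ReLU is assumed for $f$. The block-diagonal matrix $Xo$ is built with `np.kron`, and the average pooling follows (9) by averaging consecutive groups of four entries.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

rng = np.random.default_rng(1)
r, w, v = 3, 4, 3                                  # hypothetical small sizes
Xx = rng.standard_normal((w * w, v * v))           # stand-in for the image matrix
F_list = [rng.standard_normal((v * v, 1)) for _ in range(r)]
Bf_list = [rng.standard_normal((w * w, 1)) for _ in range(r)]

Xo = np.kron(np.eye(r), Xx)                        # r copies of Xx on the diagonal
F_bar = np.vstack(F_list)                          # stacked filter vectors
Bf_bar = np.vstack(Bf_list)                        # stacked bias vectors

C_bar = Xo @ F_bar + Bf_bar                        # (10)
Cp_bar = relu(C_bar)                               # (11)
P_bar = Cp_bar.reshape(-1, 4).mean(axis=1, keepdims=True)   # (9) and (12)
Y0 = P_bar                                         # (13)

# The stacked form agrees with applying (7) to each filter separately.
C_per_filter = np.vstack([Xx @ Fj + Bfj for Fj, Bfj in zip(F_list, Bf_list)])
m = r * w * w // 4
```

Because $w^2$ is a multiple of 4 here, no group of four entries straddles two filters, and `Y0` has the expected length $m = r \times w^2 / 4$.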
Fully Connected Layer
The fully connected hidden layer equations are derived
from [14].
Layer 1
$$Y^1_{(n \times 1)} = f(W^1_{(n \times m)} Y^0_{(m \times 1)} + B^1_{(n \times 1)}) \tag{14}$$
$n$ denotes the number of artificial neurons in the first fully
connected hidden layer.
Layer 2
$$Y^2_{(o \times 1)} = f(W^2_{(o \times n)} Y^1_{(n \times 1)} + B^2_{(o \times 1)}) \tag{15}$$
$o$ denotes the number of artificial neurons in the second
fully connected hidden layer.
Layer 3
$$Y^3_{(p \times 1)} = f(W^3_{(p \times o)} Y^2_{(o \times 1)} + B^3_{(p \times 1)}) \tag{16}$$
$p$ is the dimension of the label vector of the input data.
B. Back Propagation
To update the CNN parameters and perform the learning
process, a back propagation algorithm is developed to minimize
a cost function $E$. In our analysis, the mean squared
error cost function [12], [15] is used:
$$E = \frac{1}{2} (Y^3 - Y_d)^T (Y^3 - Y_d) \tag{17}$$
where $Y_d$ is the desired (labeled) output. Equation (18) gives
the gradient descent rule used to update the CNN parameters.
$$\ell_{new} = \ell_{old} - \alpha \left(\frac{\partial E}{\partial \ell_{old}}\right)^T \tag{18}$$

Here, $\ell$ stands for any of the CNN parameters being updated:
the convolution bias vector $\bar{B}f$, the filter vector $\bar{F}$,
the weight matrix $W$, or the fully connected bias vector $B$.
$\alpha$ is the learning rate; it can be chosen as a positive
constant or as a positive variable.
The update equations of the CNN parameters require the
computation of $\frac{\partial E}{\partial \ell_{old}}$. The
parameter updates for each layer are developed below.
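The cost (17) and the update rule (18) can be sketched on a toy problem in which the parameter $\ell$ is itself the output vector, so that $\partial E / \partial \ell = (\ell - Y_d)^T$. The function names `cost` and `grad_step` are hypothetical; the learning rate 0.3 is the value used in the simulations.

```python
import numpy as np

def cost(Y, Yd):
    """Cost (17): E = 1/2 (Y - Yd)^T (Y - Yd)."""
    d = Y - Yd
    return 0.5 * (d.T @ d).item()

def grad_step(param, dE_dparam, alpha):
    """Update (18): the gradient is a row vector, hence the transpose."""
    return param - alpha * dE_dparam.T

Yd = np.array([[0.0], [1.0]])
ell = np.array([[1.0], [0.0]])
alpha = 0.3

E0 = cost(ell, Yd)                 # 1.0
dE = (ell - Yd).T                  # row vector (1 x 2)
ell1 = grad_step(ell, dE, alpha)   # [[0.7], [0.3]]
E1 = cost(ell1, Yd)                # 0.49
```

One step shrinks the error vector by the factor $(1 - \alpha) = 0.7$, so the cost drops by $0.7^2 = 0.49$.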
Fully Connected Layer
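For the fully connected layer, the detailed development of
the back propagation is given in [14], where
$$\frac{\partial E}{\partial B^i} = \left[\left(W^{(i-1)T} \cdot \frac{\partial E}{\partial B^{(i-1)}}\right) * f'(H^i)\right]^T \tag{19}$$
$$\frac{\partial E}{\partial W^i} = \left(\frac{\partial E}{\partial B^i}\right)^T \cdot Y^{(i-1)T} \tag{20}$$
The operation $*$ is defined in Appendix 3.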
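The recursion can be checked against a finite-difference derivative. The sketch below uses the standard chain-rule form, in which the gradient of a layer is obtained from the layer that follows it in the forward pass; this indexing is an assumption and may differ from the layer numbering of [14]. A smooth activation (tanh) is also assumed so that the numerical check is well behaved, and all function names are hypothetical.

```python
import numpy as np

def f(h):
    return np.tanh(h)

def df(h):
    return 1.0 - np.tanh(h) ** 2

def forward(Ws, Bs, Y0):
    Ys, Hs = [Y0], []
    for W, B in zip(Ws, Bs):
        Hs.append(W @ Ys[-1] + B)          # (5)
        Ys.append(f(Hs[-1]))               # (6)
    return Ys, Hs

def backward(Ws, Ys, Hs, Yd):
    """Row-vector gradients dE/dB and matrix gradients dE/dW for every layer."""
    dB, dW = [None] * len(Ws), [None] * len(Ws)
    delta = (Ys[-1] - Yd) * df(Hs[-1])     # output layer, element-wise product
    for i in reversed(range(len(Ws))):
        dB[i] = delta.T                    # bias gradient as a row vector
        dW[i] = delta @ Ys[i].T            # (20): (dE/dB)^T times layer input^T
        if i > 0:
            delta = (Ws[i].T @ delta) * df(Hs[i - 1])
    return dB, dW

def cost(Ws, Bs, Y0, Yd):
    d = forward(Ws, Bs, Y0)[0][-1] - Yd
    return 0.5 * (d.T @ d).item()

rng = np.random.default_rng(0)
sizes = [4, 5, 3]
Ws = [rng.standard_normal((sizes[k + 1], sizes[k])) for k in range(2)]
Bs = [rng.standard_normal((sizes[k + 1], 1)) for k in range(2)]
Y0, Yd = rng.standard_normal((4, 1)), rng.standard_normal((3, 1))

Ys, Hs = forward(Ws, Bs, Y0)
dB, dW = backward(Ws, Ys, Hs, Yd)

# Central finite difference on one weight of the first layer.
eps = 1e-6
Wp = [W.copy() for W in Ws]
Wm = [W.copy() for W in Ws]
Wp[0][1, 2] += eps
Wm[0][1, 2] -= eps
numeric = (cost(Wp, Bs, Y0, Yd) - cost(Wm, Bs, Y0, Yd)) / (2 * eps)
```

The analytic entry `dW[0][1, 2]` and the value `numeric` agree to within the finite-difference error.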
Convolution Layer
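$$\left.\frac{\partial E_{(1 \times 1)}}{\partial \bar{B}f_{((r \times w^2) \times 1)}}\right|_{(1 \times (r \times w^2))} = \frac{\partial E_{(1 \times 1)}}{\partial \bar{C}_{((r \times w^2) \times 1)}} \frac{\partial \bar{C}_{((r \times w^2) \times 1)}}{\partial \bar{B}f_{((r \times w^2) \times 1)}} \tag{21}$$
$$\left.\frac{\partial E_{(1 \times 1)}}{\partial \bar{F}_{((r \times v^2) \times 1)}}\right|_{(1 \times (r \times v^2))} = \frac{\partial E_{(1 \times 1)}}{\partial \bar{C}_{((r \times w^2) \times 1)}} \frac{\partial \bar{C}_{((r \times w^2) \times 1)}}{\partial \bar{F}_{((r \times v^2) \times 1)}} \tag{22}$$
By substituting (10) into (21) and (22), we obtain
$$\left.\frac{\partial \bar{C}_{((r \times w^2) \times 1)}}{\partial \bar{B}f_{((r \times w^2) \times 1)}}\right|_{((r \times w^2) \times (r \times w^2))} = I \tag{23}$$
$$\left.\frac{\partial \bar{C}_{((r \times w^2) \times 1)}}{\partial \bar{F}_{((r \times v^2) \times 1)}}\right|_{((r \times w^2) \times (r \times v^2))} = Xo \tag{24}$$
Here, $I$ is the identity matrix.
Computation of $\frac{\partial E}{\partial \bar{C}}$
$$\left.\frac{\partial E_{(1 \times 1)}}{\partial \bar{C}_{((r \times w^2) \times 1)}}\right|_{(1 \times (r \times w^2))} = \frac{\partial E_{(1 \times 1)}}{\partial \bar{C}p_{((r \times w^2) \times 1)}} \frac{\partial \bar{C}p_{((r \times w^2) \times 1)}}{\partial \bar{C}_{((r \times w^2) \times 1)}} = \left.\frac{\partial E}{\partial \bar{C}p}\right|_{(1 \times (r \times w^2))} * f'(\bar{C}p)^T_{(1 \times (r \times w^2))} \tag{25}$$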
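Definition: Let us define the operation $Inc$ of any vector $U$ as
follows:
$$U_{(n \times 1)} = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix}, \qquad Inc(U) = \begin{bmatrix} \frac{u_1}{4} \\ \frac{u_1}{4} \\ \frac{u_1}{4} \\ \frac{u_1}{4} \\ \vdots \\ \frac{u_n}{4} \\ \frac{u_n}{4} \\ \frac{u_n}{4} \\ \frac{u_n}{4} \end{bmatrix}_{(4n \times 1)} \tag{26}$$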
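The $Inc$ operation repeats every entry four times and divides it by 4. The sketch below (with the hypothetical helper `inc`) also checks that $Inc$ is the transpose of the averaging map that pools consecutive groups of four entries, which is why it carries a gradient from the pooled vector back to the unpooled one.

```python
import numpy as np

def inc(U):
    """Repeat each entry of the column vector U four times and divide by 4."""
    return np.repeat(U, 4, axis=0) / 4.0

U = np.array([[4.0], [8.0]])
V = inc(U)                                   # 1, 1, 1, 1, 2, 2, 2, 2

# Averaging matrix A (n x 4n): row i averages entries 4i .. 4i+3.
n = U.shape[0]
A = np.kron(np.eye(n), np.full((1, 4), 0.25))
```

Here `A.T @ U` equals `inc(U)`: if a pooled vector is `A @ c`, the chain rule maps a gradient with respect to the pooled vector to `A.T` times that gradient.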
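Here, the $Inc$ operation is used to increase the size of
the average pooling vector map, where
$$\frac{\partial E}{\partial \bar{C}p} = Inc\left(\frac{\partial E}{\partial \bar{P}}\right) \tag{27}$$
Since $Y^0 = \bar{P}$,
$$\frac{\partial E}{\partial \bar{P}} = \frac{\partial E}{\partial B^1} W^1 \tag{28}$$
By substituting (28), (27), (25), (24), and (23) into (22) and (21),
we obtain
$$\frac{\partial E}{\partial \bar{B}f} = Inc\left(\frac{\partial E}{\partial B^1} W^1\right) * f'(\bar{C}p)^T \tag{29}$$
$$\frac{\partial E}{\partial \bar{F}} = \frac{\partial E}{\partial \bar{B}f} \cdot Xo \tag{30}$$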
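Equations (28) to (30) can be verified numerically on a small model with one convolution layer followed by a single fully connected layer. This is a sketch under stated assumptions: the sizes and random data are hypothetical, tanh replaces ReLU so that finite differences are well behaved, and the activation derivative is evaluated at the pre-activation $\bar{C}$ (for ReLU this gives the same values as evaluating it at $\bar{C}p$).

```python
import numpy as np

def f(h):
    return np.tanh(h)

def df(h):
    return 1.0 - np.tanh(h) ** 2

def inc(U):
    return np.repeat(U, 4, axis=0) / 4.0

def forward(F_bar, Bf_bar, Xo, W1, B1):
    C = Xo @ F_bar + Bf_bar                               # (10)
    Cp = f(C)                                             # (11)
    P = Cp.reshape(-1, 4).mean(axis=1, keepdims=True)     # (9), (12), (13)
    H1 = W1 @ P + B1
    return C, H1, f(H1)

def cost(F_bar, Bf_bar, Xo, W1, B1, Yd):
    d = forward(F_bar, Bf_bar, Xo, W1, B1)[2] - Yd
    return 0.5 * (d.T @ d).item()

rng = np.random.default_rng(0)
r, w2, v2, n = 2, 16, 9, 3                                # hypothetical sizes
Xo = np.kron(np.eye(r), rng.standard_normal((w2, v2)))
F_bar = rng.standard_normal((r * v2, 1))
Bf_bar = rng.standard_normal((r * w2, 1))
W1 = rng.standard_normal((n, r * w2 // 4))
B1 = rng.standard_normal((n, 1))
Yd = rng.standard_normal((n, 1))

C, H1, Y1 = forward(F_bar, Bf_bar, Xo, W1, B1)
dE_dB1 = ((Y1 - Yd) * df(H1)).T                           # row vector (1 x n)
dE_dP = dE_dB1 @ W1                                       # (28)
dE_dBf = inc(dE_dP.T).T * df(C).T                         # (27), (25), (29)
dE_dF = dE_dBf @ Xo                                       # (30)

# Central finite difference on one filter coefficient.
eps, k = 1e-6, 5
Fp, Fm = F_bar.copy(), F_bar.copy()
Fp[k] += eps
Fm[k] -= eps
numeric = (cost(Fp, Bf_bar, Xo, W1, B1, Yd)
           - cost(Fm, Bf_bar, Xo, W1, B1, Yd)) / (2 * eps)
```

The entry `dE_dF[0, k]` matches `numeric`, which supports reading (30) as a single row-vector-times-matrix product.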
IV. SIMULATION RESULTS AND DISCUSSION
In this section, MNIST handwritten digit data are used to test the
CNN performance using the proposed matrix operation. This database
consists of a training set of 60,000 images and a testing set of 10,000
images. The images are grayscale, of dimension $(28 \times 28)$. The
$(10 \times 1)$ output vector classifies the digits from 0 to 9. To enhance
the CNN performance, the following hyper-parameters are varied:
• CNN Width.
• CNN Height.
The performance of the CNN model is the ratio of the number of
correct classifications to the number of test images.
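The performance measure can be sketched as below. Reading the predicted class as the index of the largest output entry is an assumption, since the text does not state how the output vector is decoded; the helper name `performance` and the three-class toy data are hypothetical.

```python
import numpy as np

def performance(Y_out, labels):
    """Fraction of columns of Y_out whose largest entry sits at the true label."""
    predicted = np.argmax(Y_out, axis=0)
    return float(np.mean(predicted == labels))

# Toy example: 3 classes, 3 test samples (one column per sample).
Y_out = np.array([[0.90, 0.1, 0.2],
                  [0.05, 0.8, 0.7],
                  [0.05, 0.1, 0.1]])
labels = np.array([0, 1, 2])
score = performance(Y_out, labels)     # two of three samples are correct
```

On the 10,000 MNIST test images, a performance of 0.9883 corresponds to 9,883 correct classifications.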
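TABLE I
CNN PERFORMANCE ACCORDING TO THE VARIATION OF WIDTH AND
HEIGHT HYPER-PARAMETERS
| N° of training images | N° of filters | Size of filters | Performance |
|---|---|---|---|
| 5,000 | − | − | 0.9325 |
| 5,000 | 5 | $(3^2 \times 1)$ | 0.9414 |
| 5,000 | 5 | $(9^2 \times 1)$ | 0.9508 |
| 5,000 | 5 | $(13^2 \times 1)$ | 0.9478 |
| 5,000 | 10 | $(9^2 \times 1)$ | 0.9575 |
| 5,000 | 20 | $(9^2 \times 1)$ | 0.9582 |
| 10,000 | − | − | 0.9481 |
| 10,000 | 5 | $(3^2 \times 1)$ | 0.9570 |
| 10,000 | 5 | $(9^2 \times 1)$ | 0.9589 |
| 10,000 | 5 | $(13^2 \times 1)$ | 0.9579 |
| 10,000 | 10 | $(9^2 \times 1)$ | 0.9623 |
| 10,000 | 20 | $(9^2 \times 1)$ | 0.9677 |
| 30,000 | − | − | 0.9773 |
| 30,000 | 5 | $(3^2 \times 1)$ | 0.9785 |
| 30,000 | 5 | $(9^2 \times 1)$ | 0.9810 |
| 30,000 | 5 | $(13^2 \times 1)$ | 0.9798 |
| 30,000 | 10 | $(9^2 \times 1)$ | 0.9835 |
| 30,000 | 20 | $(9^2 \times 1)$ | 0.9871 |
| 60,000 | − | − | 0.9790 |
| 60,000 | 5 | $(3^2 \times 1)$ | 0.9831 |
| 60,000 | 5 | $(9^2 \times 1)$ | 0.9867 |
| 60,000 | 5 | $(13^2 \times 1)$ | 0.9859 |
| 60,000 | 10 | $(9^2 \times 1)$ | 0.9874 |
| 60,000 | 20 | $(9^2 \times 1)$ | 0.9883 |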
The simulated CNN model consists of:
• One convolution layer. The convolution activation function is the
ReLU. The pooling operation used is average pooling.
• A fully connected layer comprising 3 hidden layers. Each
hidden layer consists of 200 artificial neurons. The activation
function is the same as in the convolution layer. The learning
rate equals 0.3.
As shown in Table I, we tested the CNN performance by varying:
• The number of convolution filter vectors.
• The size of each filter.
• The number of training images.
Here, (−) denotes a model consisting only of a fully connected
layer.
The peak CNN performance is 0.9883. It was obtained with 20
filter vectors of size $(9^2 \times 1)$ and 60,000 training images.
Performance increased with the number of filters and with the
number of training images. Among the filter sizes tested with 5
filters, $(9^2 \times 1)$ outperformed both $(3^2 \times 1)$ and
$(13^2 \times 1)$.
V. CONCLUSION
In this paper, a new matrix operation that substitutes for the classical
convolution operation is developed. MNIST data of handwritten digits
are used to test the influence of the CNN hyper-parameters on the
model performance. The peak performance achieved is 0.9883.
It is obtained using a CNN model composed of one convolution
layer and three fully connected hidden layers. The simulation
results do not represent the optimal CNN hyper-parameter
configuration. Further increasing the number of convolution layers
and the size of the training dataset may enhance the CNN
performance.
APPENDIX
Appendix 1: Average and max-pooling operations
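$$Cp_j = \begin{bmatrix} cp_{11} & cp_{12} & \cdots & cp_{1(u-v+1)} \\ cp_{21} & cp_{22} & \cdots & cp_{2(u-v+1)} \\ \vdots & \vdots & \ddots & \vdots \\ cp_{(u-v+1)1} & cp_{(u-v+1)2} & \cdots & cp_{(u-v+1)(u-v+1)} \end{bmatrix} \tag{31}$$
The pooling map is defined as follows
$$P_{jAve} = \begin{bmatrix} P_{11} & P_{12} & \cdots & P_{1\frac{(u-v+1)}{2}} \\ P_{21} & P_{22} & \cdots & P_{2\frac{(u-v+1)}{2}} \\ \vdots & \vdots & \ddots & \vdots \\ P_{\frac{(u-v+1)}{2}1} & P_{\frac{(u-v+1)}{2}2} & \cdots & P_{\frac{(u-v+1)}{2}\frac{(u-v+1)}{2}} \end{bmatrix} \tag{32}$$
For the Average Pooling Operation,
$$P_{11} = \frac{cp_{11} + cp_{12} + cp_{21} + cp_{22}}{4}$$
$$P_{1\frac{(u-v+1)}{2}} = \frac{cp_{1(u-v)} + cp_{1(u-v+1)} + cp_{2(u-v)} + cp_{2(u-v+1)}}{4}$$
$$P_{\frac{(u-v+1)}{2}1} = \frac{cp_{(u-v)1} + cp_{(u-v)2} + cp_{(u-v+1)1} + cp_{(u-v+1)2}}{4}$$
$$P_{\frac{(u-v+1)}{2}\frac{(u-v+1)}{2}} = \frac{cp_{(u-v)(u-v)} + cp_{(u-v)(u-v+1)} + cp_{(u-v+1)(u-v)} + cp_{(u-v+1)(u-v+1)}}{4}$$
For the Max Pooling Operation,
$$P_{11} = \max(cp_{11}, cp_{12}, cp_{21}, cp_{22})$$
$$P_{1\frac{(u-v+1)}{2}} = \max(cp_{1(u-v)}, cp_{1(u-v+1)}, cp_{2(u-v)}, cp_{2(u-v+1)})$$
$$P_{\frac{(u-v+1)}{2}1} = \max(cp_{(u-v)1}, cp_{(u-v)2}, cp_{(u-v+1)1}, cp_{(u-v+1)2})$$
$$P_{\frac{(u-v+1)}{2}\frac{(u-v+1)}{2}} = \max(cp_{(u-v)(u-v)}, cp_{(u-v)(u-v+1)}, cp_{(u-v+1)(u-v)}, cp_{(u-v+1)(u-v+1)})$$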
Appendix 2: Matrix Operation
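For a CNN model consisting of an input image $X \in \Re^{(u \times u)}$
convolved with a filter $F \in \Re^{(v \times v)}$,
$$X = \begin{bmatrix} X^{1T} \\ X^{2T} \\ \vdots \\ X^{uT} \end{bmatrix}, \qquad F = \begin{bmatrix} F^{1T} \\ F^{2T} \\ \vdots \\ F^{vT} \end{bmatrix}$$
where,
$$X^{iT} = \begin{bmatrix} x^i_1 & x^i_2 & \cdots & x^i_u \end{bmatrix}, \qquad F^{iT} = \begin{bmatrix} f^i_1 & f^i_2 & \cdots & f^i_v \end{bmatrix}$$
We define $X^{iT}_{j \to k}$ as follows
$$X^{iT}_{j \to k} = \begin{bmatrix} x^i_j & x^i_{j+1} & \cdots & x^i_k \end{bmatrix}$$
The proposed matrix operation that substitutes for the classical
convolution operation is:
$$X \circledast F = Xx_{((u-v+1)^2 \times v^2)} \cdot \bar{F}_{(v^2 \times 1)} = \begin{bmatrix} \begin{bmatrix} X^{uT}_{1 \to v} & X^{(u-1)T}_{1 \to v} & \cdots & X^{(u-v+1)T}_{1 \to v} \\ X^{uT}_{2 \to (v+1)} & X^{(u-1)T}_{2 \to (v+1)} & \cdots & X^{(u-v+1)T}_{2 \to (v+1)} \\ \vdots & \vdots & \ddots & \vdots \\ X^{uT}_{(u-v+1) \to u} & X^{(u-1)T}_{(u-v+1) \to u} & \cdots & X^{(u-v+1)T}_{(u-v+1) \to u} \end{bmatrix} \\ \begin{bmatrix} X^{(u-1)T}_{1 \to v} & X^{(u-2)T}_{1 \to v} & \cdots & X^{(u-v)T}_{1 \to v} \\ X^{(u-1)T}_{2 \to (v+1)} & X^{(u-2)T}_{2 \to (v+1)} & \cdots & X^{(u-v)T}_{2 \to (v+1)} \\ \vdots & \vdots & \ddots & \vdots \\ X^{(u-1)T}_{(u-v+1) \to u} & X^{(u-2)T}_{(u-v+1) \to u} & \cdots & X^{(u-v)T}_{(u-v+1) \to u} \end{bmatrix} \\ \vdots \\ \begin{bmatrix} X^{vT}_{1 \to v} & X^{(v-1)T}_{1 \to v} & \cdots & X^{1T}_{1 \to v} \\ X^{vT}_{2 \to (v+1)} & X^{(v-1)T}_{2 \to (v+1)} & \cdots & X^{1T}_{2 \to (v+1)} \\ \vdots & \vdots & \ddots & \vdots \\ X^{vT}_{(u-v+1) \to u} & X^{(v-1)T}_{(u-v+1) \to u} & \cdots & X^{1T}_{(u-v+1) \to u} \end{bmatrix} \end{bmatrix} \begin{bmatrix} F^{1} \\ F^{2} \\ \vdots \\ F^{v} \end{bmatrix} \tag{33}$$
Each row of $Xx$ contains $v$ row segments of length $v$, and there
are $(u-v+1)$ blocks of $(u-v+1)$ rows each, so that $Xx$ has
dimension $((u-v+1)^2 \times v^2)$.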
Appendix 3: The definition of (∗) operation
Let’s define the operation (∗) for any vector U and V :
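$$U_{(n \times 1)} = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix} \quad \text{and} \quad V_{(n \times 1)} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}$$
$$U * V = \begin{bmatrix} u_1 v_1 \\ u_2 v_2 \\ \vdots \\ u_n v_n \end{bmatrix} \tag{34}$$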
REFERENCES
[1] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” Proceedings of the Advances in Neural Information Processing Systems, pp. 1097–1105, 2012.
[2] K. Fukushima, “Neocognitron: A Self-organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position,” Biol. Cybernetics 36, Springer, pp. 193–202, 1980.
[3] Y. LeCun, et al., “Handwritten digit recognition with a backpropagation network,” in Advances in Neural Information Processing Systems, pp. 396–404, 1990.
[4] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, pp. 2278–2324, 1998.
[5] D. Steinkraus, I. Buck, and P. Y. Simard, “Using GPUs for machine learning algorithms,” in Document Analysis and Recognition, 2005. Proceedings. Eighth International Conference, pp. 1115–1120, 2005.
[6] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” Conference paper at ICLR, pp. 1–14, 2015.
[7] K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” 2016 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9, 2016.
[8] S. Xie, R. Girshick, P. Dollar, Z. Tu, and K. He, “Aggregated Residual Transformations for Deep Neural Networks,” 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–9, 2017.
[9] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. Alemi, “Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning,” arXiv Prepr. arXiv1602.07261v2, vol. 131, no. 2, pp. 262–263, 2016.
[10] H. Jang, H-Jun. Yang, D-S. Jeong, and H. Lee, “Object classification using CNN for video traffic detection system,” 2015 21st Korea-Japan Joint Workshop on Frontiers of Computer Vision (FCV), 2015.
[11] H. Yanagisawa, T. Yamashita, and H. Watanabe, “A Study on Object Detection Method from Manga Images using CNN,” 2018 International Workshop on Advanced Image Technology (IWAIT), 2018.
[12] R. B. Arif, M. A. B. Siddique, M. M. R. Khan, and M. R. Oishe, “Study and Observation of the Variations of Accuracies for Handwritten Digits Recognition with Various Hidden Layers and Epochs using Convolutional Neural Network,” 4th International Conference on Electrical Engineering and Information and Communication Technology, 2018.
[13] V. Gullapalli, “A Comparison of Supervised and Reinforcement Learning Methods on a Reinforcement Learning Task,” Proceedings of the 1991 IEEE International Symposium on Intelligent Control, pp. 394–399, 1991.
[14] N. Wagaa and H. Kallel, “Recursive Supervised Artificial Neural Network Algorithm for Data Classification and Regression,” unpublished.
[15] M. A. B. Siddique, M. M. R. Khan, R. B. Arif, and Z. Ashrafi, “Study and Observation of the Variations of Accuracies for Handwritten Digits Recognition with Various Hidden Layers and Epochs using Neural Network Algorithm,” 4th International Conference on Electrical Engineering and Information and Communication Technology, pp. 118–123, 2018.
[16] V. E. Ismailov, “On the approximation by neural networks with bounded number of neurons in hidden layers,” Journal of Mathematical Analysis and Applications, pp. 963–969, 2014.
[17] M. Rastegari, V. Ordonez, J. Redmon, and A. Farhadi, “XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks,” Lecture Notes in Computer Science, Springer, pp. 525–542, 2016.
[18] D. Scherer, A. Müller, and S. Behnke, “Evaluation of Pooling Operations in Convolutional Architectures for Object Recognition,” International Conference on Artificial Neural Networks, Springer, pp. 92–101, 2010.
[19] I. B. Dlimi and H. Kallel, “Robust Neural Control for Robotic Manipulators,” International Journal of Enhanced Research in Science, Technology, and Engineering, vol. 5, no. 2, pp. 198–205, 2016.
[20] M. Dong, Y. Li, X. Tang, J. Xu, S. Bi, and A. Y. Cai, “Variable Convolution and Pooling Convolutional Neural Network for Text Sentiment Classification,” IEEE Access, 2020.
[21] X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” 13th International Conference on Artificial Intelligence and Statistics, pp. 249–256, 2010.
[22] K. He, X. Zhang, S. Ren, and J. Sun, “Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification,” 2015 IEEE International Conference on Computer Vision, pp. 1026–1034, 2015.
[23] P. Dangeti, “Statistics for Machine Learning: Techniques for exploring supervised, unsupervised, and reinforcement learning models with Python and R,” Packt Publishing, 2017.
[24] T. Takase, S. Oyama, and K. Masahito, “Effective neural network training with adaptive learning rate based on training loss,” Neural Networks, pp. 68–78, 2018.
[25] Y. LeCun, “The MNIST database of handwritten digits,” http://yann.lecun.com/exdb/mnist/, 1998.
[26] I. B. Dlimi and H. Kallel, “Optimal neural control for constrained robotic manipulators,” 2010 5th IEEE International Conference Intelligent Systems, pp. 302–308, 2010.
