Vector-Based Back Propagation Algorithm of
Supervised Convolution Neural Network
1st
Nesrine Wagaa
Department of Physics and Instrumentation
LARATSI Laboratory
National Institute of Applied Sciences and Technology (INSAT)
Centre Urbain Nord, Tunisia
nesrinewagah@gmail.com
2nd
Hichem Kallel
Department of Physics and Instrumentation
LARATSI Laboratory
INSAT & South Mediterranean University (MedTech)
Les Berges du Lac II, Tunisia
hichem.kallel@yahoo.com
Abstract—The primary goal of this paper is to analyze the
impact of the convolution operation on the model performance.
In this context, to avoid the mathematical complexities behind
the Convolution Neural Network (CNN) model, the classical
convolution operation is substituted by a new proposed matrix
operation. The model considered is composed of one convolution
layer in series with a set of fully connected hidden layers. The
network parameters (filters, weights, and biases) are updated
using the back propagation gradient descent algorithm. The
model performance is improved through the variation of the CNN width and height hyper-parameters. MNIST data are considered here for the classification of handwritten digits.
With a simple modification of the CNN hyper-parameters using
the new proposed matrix operation, a CNN performance of
98.83% was achieved.
Index Terms—Classical Convolution Operation, Matrix Operation, CNN Parameters, CNN Hyper-Parameters, Gradient Descent.
I. INTRODUCTION
A convolution neural network (CNN) or ConvNet is a
concrete case of deep neural networks [1]. It is inspired by
the hierarchical of the biological nervous system. ConvNet
is an old technique, which was already introduced at the
end of the 90s by Fukushima [2]. But the model was not
achieving good results until 1998 when LeCun et al designed
LeNet-5 [3], [4]. It was the first CNN modern architecture
capable to classify the handwritten digits from 0-9. In 2005 [5]
with the establishment of the General-purpose computing on
graphics processing units, (GPGPU) the performance of CNN
has significantly improved especially in image classification.
In 2012 [1], the AlexNet model is proposed. It is a very deep
network consisting of a huge number of layers. The AlexNet
was trained on two GPUs to speed up the learning process. It
was able to reduce the error rate to 15.3 %. Since then, more
other models are proposed to improve the CNN performance
such as VGGNet-16 [6], ResNet [7],etc [8], [9]. These various
types of CNN networks can be used to resolve the problems
of computer vision as object classification [10], detection [11],
and object tracking. The CNN networks are trained based
on big data together with strong hardware to realize needed
computations [5].
The architecture of a ConvNet consists of a set of hidden layers, typically categorized as convolution layers and fully connected hidden layers [12]. The CNN input is the images to be processed. For supervised learning, the CNN output is the predicted value or the class of the input data [12], [13]. The ConvNet model is trained using the back propagation algorithm [1], [2] to update the CNN parameters: the filter coefficients applied in the convolution layers to extract the image features, the weights that represent the interconnections between the artificial neurons of the fully connected hidden layers, and the bias values that define the activation level of the filter coefficients and artificial neurons.
In this paper, the CNN performance is enhanced through a new technique that substitutes a matrix operation for the classical convolution operation. We propose a compact back propagation method to update the parameters of a CNN model consisting of one convolution layer. For the fully connected hidden layers, a simple recursive update rule was previously developed [14]. The developed tools easily update the CNN parameters (filters, weights, and biases) [12], [17] and allow modification of the CNN hyper-parameters:
1) CNN width.
• Number of fully connected hidden layers [15].
2) CNN height.
• Size and number of convolution filters [12].
• Number of neurons in each fully connected hidden layer [15], [16], [19].
3) Striding and padding of the convolution operation [17].
4) Pooling operation [18], [20].
5) Activation function [21], [22].
6) Cost function [15], [23].
7) Learning rate [24].
In this context, we tested the CNN model performance through the variation of its hyper-parameters. The MNIST dataset of handwritten digits [25] is used by the CNN model to carry out the learning process.
978-1-7281-6999-6/20/$31.00 ©2020 IEEE
II. CLASSICAL CONVOLUTION NEURAL NETWORK (CNN) MODEL
A. Convolution Operation
The convolution operation between an image X ∈ ℜ^(u×u) and a filter F ∈ ℜ^(v×v) is defined as follows:

$$X \circledast F = C, \qquad C \in \Re^{\left(\frac{u-v+2\,\mathrm{Pad}}{s}+1\right)\times\left(\frac{u-v+2\,\mathrm{Pad}}{s}+1\right)} \tag{1}$$

where

$$C[a,b] = \sum_{k=0}^{u}\sum_{l=0}^{u} X[k,l]\,F[a-k,\,b-l] \tag{2}$$

Here, ⊛ denotes the convolution operation. The stride s is the number of pixels by which F slides over X. The padding Pad is the number of zeros applied around X. a, b, k, and l are the row and column indices of C and X.
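For concreteness, the sketch below illustrates the sliding-window operation of (1)-(2) in NumPy. It is our own illustration, not code from the paper: the names conv2d, stride, and pad are ours, and the filter is not flipped, so the loop computes the cross-correlation form commonly used in CNN implementations.

```python
import numpy as np

def conv2d(X, F, stride=1, pad=0):
    """Slide the filter F over the image X and return the output map C.

    The output side length follows (1): (u - v + 2*pad) / stride + 1.
    The patch is not flipped, i.e. this is the cross-correlation form.
    """
    u, v = X.shape[0], F.shape[0]
    Xp = np.pad(X, pad)                      # zeros applied around X
    w = (u - v + 2 * pad) // stride + 1      # output size per (1)
    C = np.zeros((w, w))
    for a in range(w):
        for b in range(w):
            patch = Xp[a * stride:a * stride + v, b * stride:b * stride + v]
            C[a, b] = np.sum(patch * F)      # sum of products, cf. (2)
    return C
```

With u = 28, v = 9, pad = 0 and stride = 1, this returns a 20 × 20 map, which matches w = u − v + 1 used later in Section III.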
B. Convolution Layer
The convolution layer shown in Fig. 1 consists of a convolution map and a pooling map. Here, the convolution operation uses padding Pad = 0 and stride s = 1.
Fig. 1. Convolution Layer (ConvL).
Convolution Map
The convolution map is designed as follows:
• An input image X ∈ ℜ^(u×u).
• r filters Fj ∈ ℜ^(v×v), j = 1, 2, ..., r.
• A bias matrix Bfj ∈ ℜ^((u−v+1)×(u−v+1)).
• A non-linear activation function f.
• An output matrix Cpj ∈ ℜ^((u−v+1)×(u−v+1)).
The output of the convolution map is defined as follows:

$$C_j = X \circledast F_j + Bf_j \tag{3}$$
$$Cp_j = f(C_j) \tag{4}$$
Pooling Map
A pooling map is an essential unit of the CNN architecture. This step is used to reduce the computational complexity of the network by shrinking the dimension of Cpj. Average pooling and max pooling are examples of the pooling operation [18], [19]. Typically, a kernel Kj of size (2×2) with a stride equal to 2 is applied to compute the average or the maximum value of each patch of Cpj. The output of the pooling operation, Pj, has size ((u−v+1)/2 × (u−v+1)/2). Appendix 1 presents the average and max-pooling operations.
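The 2 × 2 pooling with stride 2 described above can be sketched as follows. This is our illustration (the function name pool2d is not from the paper) and assumes that (u − v + 1) is even, as the stated output size requires.

```python
import numpy as np

def pool2d(Cp, mode="average"):
    """Reduce Cp with a (2 x 2) kernel and stride 2 (average or max)."""
    h = Cp.shape[0] // 2                     # (u - v + 1) / 2, assumed integer
    P = np.zeros((h, h))
    for i in range(h):
        for j in range(h):
            patch = Cp[2 * i:2 * i + 2, 2 * j:2 * j + 2]
            P[i, j] = patch.mean() if mode == "average" else patch.max()
    return P
```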
C. Fully Connected Layer
The convolution filters detect features of the input images, called local features. A fully connected layer is added in series with the convolution layer to recognize the input images [12], [15], [26]. As shown in Fig. 2, the fully connected layer has an input vector Y^0 corresponding to the concatenation of the r CNN pooling maps Pj. The output vector Y^t defines the image classes. Typically, a series of t fully connected hidden layers is added between the input and output vectors to enhance the CNN performance.
Fig. 2. Fully Connected layer (FCL).
The basic equations of the fully connected hidden layers are:

$$H^i = W^i Y^{(i-1)} + B^i \tag{5}$$
$$Y^i = f(H^i) \tag{6}$$

where H^i is the weighted-sum vector, B^i is the bias vector, W^i is the weight matrix representing the interconnections between hidden layers, and Y^i denotes the output vector of the selected fully connected hidden layer.
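A minimal sketch of one fully connected hidden layer implementing (5) and (6); the function name and the ReLU choice (the activation used later in Section IV) are our illustrative assumptions.

```python
import numpy as np

def relu(H):
    return np.maximum(0.0, H)

def fc_layer(W, Y_prev, B, f=relu):
    """One hidden layer: H^i = W^i Y^(i-1) + B^i (5), Y^i = f(H^i) (6)."""
    H = W @ Y_prev + B       # weighted-sum vector
    return f(H), H           # output vector and pre-activation
```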
D. Convolution Neural Networks Model
The convolution neural network shown in Fig. 3 is composed of one convolution layer ConvL in series with t fully connected hidden layers. X is the input image to be recognized by the CNN model, and Y^t is the CNN output corresponding to the image recognition.
Fig. 3. Convolution Neural Network Architecture.
III. VECTOR-BASED CNN MODEL
This section aims to replace the classical convolution operation by a matrix operation.
Definition: We define the vector expression of any matrix M ∈ ℜ^(n×n) as follows:

$$M = \begin{bmatrix} M^{1T} \\ M^{2T} \\ \vdots \\ M^{nT} \end{bmatrix}, \qquad \bar{M}_{(n^2\times 1)} = \begin{bmatrix} M^{1} \\ M^{2} \\ \vdots \\ M^{n} \end{bmatrix}$$

In this section, the output convolution map Cpj of size ((u−v+1)×(u−v+1)) is transformed into a vector C̄pj of dimension ((u−v+1)^2 × 1).
Based on Appendix 2, the convolved output vector C̄pj and the average-pooling output vector P̄j are defined as follows:

$$\bar{C}_j = Xx \cdot \bar{F}_j + \bar{B}f_j \tag{7}$$
$$\bar{C}p_j = f(\bar{C}_j) \tag{8}$$
$$\bar{P}_j = \begin{bmatrix} \frac{\bar{c}p_{j1}+\bar{c}p_{j2}+\bar{c}p_{j3}+\bar{c}p_{j4}}{4} \\ \frac{\bar{c}p_{j5}+\bar{c}p_{j6}+\bar{c}p_{j7}+\bar{c}p_{j8}}{4} \\ \vdots \\ \frac{\bar{c}p_{j(w^2-3)}+\bar{c}p_{j(w^2-2)}+\bar{c}p_{j(w^2-1)}+\bar{c}p_{jw^2}}{4} \end{bmatrix} \tag{9}$$

Note that Xx is the ((u−v+1)^2 × v^2) input-image matrix, F̄j is the (v^2 × 1) filter vector, and B̄fj is the ((u−v+1)^2 × 1) convolution bias vector.
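As an illustration of (7) and (8), the sketch below builds an Xx-like patch matrix and applies it to a filter vector. The row-major patch ordering used here is our own simplification; the exact ordering the paper uses is fixed in Appendix 2, so the names build_Xx and conv_as_matrix are illustrative only.

```python
import numpy as np

def build_Xx(X, v):
    """Stack the v*v pixels of every (v x v) patch of X as one row of Xx.

    Shape: ((u - v + 1)**2, v**2). Row-major ordering is an illustrative
    choice; the paper's Appendix 2 defines its own ordering.
    """
    u = X.shape[0]
    w = u - v + 1
    Xx = np.zeros((w * w, v * v))
    for a in range(w):
        for b in range(w):
            Xx[a * w + b, :] = X[a:a + v, b:b + v].ravel()
    return Xx

def conv_as_matrix(X, F_bar, Bf_bar, v, f=lambda z: np.maximum(0.0, z)):
    """Eq. (7)-(8): C_bar = Xx . F_bar + Bf_bar, then Cp_bar = f(C_bar)."""
    Xx = build_Xx(X, v)
    C_bar = Xx @ F_bar + Bf_bar
    return f(C_bar), Xx
```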
A. Forward Propagation
The CNN model developed in this study consists of one convolution layer in series with three fully connected hidden layers (i = 1, 2, 3).
Convolution Layer
Based on equations (7), (8), and (9), the output convolution map and the average pooling map of a model consisting of r filter vectors can be written as follows:
$$\bar{C}_{((r w^2)\times 1)} = Xo_{((r w^2)\times(r v^2))}\cdot\bar{F}_{((r v^2)\times 1)} + \bar{B}f_{((r w^2)\times 1)} \tag{10}$$
$$\bar{C}p = f(\bar{C}) \tag{11}$$
$$\bar{P}_{((r\frac{w^2}{4})\times 1)} = \begin{bmatrix} \bar{P}_1 \\ \bar{P}_2 \\ \vdots \\ \bar{P}_r \end{bmatrix} \tag{12}$$

where w = u − v + 1,

$$Xo_{((r w^2)\times(r v^2))} = \begin{bmatrix} Xx & 0 & \cdots & 0 \\ 0 & Xx & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & Xx \end{bmatrix}$$

$$\bar{F}_{((r v^2)\times 1)} = \begin{bmatrix} \bar{F}_1 \\ \bar{F}_2 \\ \vdots \\ \bar{F}_r \end{bmatrix}, \qquad \bar{B}f_{((r w^2)\times 1)} = \begin{bmatrix} \bar{B}f_1 \\ \bar{B}f_2 \\ \vdots \\ \bar{B}f_r \end{bmatrix}$$
Concatenation
Concatenation is the operation that defines the input of the fully connected layer as a function of the r pooling operations:

$$Y^0_{(m\times 1)} = \bar{P} \tag{13}$$

where m = r w^2 / 4.
Fully Connected Layer
The fully connected hidden layer equations are derived from [14].
Layer 1:
$$Y^1_{(n\times 1)} = f(W^1_{(n\times m)} Y^0_{(m\times 1)} + B^1_{(n\times 1)}) \tag{14}$$
where n denotes the number of artificial neurons in the first fully connected hidden layer.
Layer 2:
$$Y^2_{(o\times 1)} = f(W^2_{(o\times n)} Y^1_{(n\times 1)} + B^2_{(o\times 1)}) \tag{15}$$
where o denotes the number of artificial neurons in the second fully connected hidden layer.
Layer 3:
$$Y^3_{(p\times 1)} = f(W^3_{(p\times o)} Y^2_{(o\times 1)} + B^3_{(p\times 1)}) \tag{16}$$
where p is the dimension of the labeled output data.
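For completeness, a sketch of the forward pass (13)–(16) through the three fully connected layers; the dimensions m, n, o, p follow the text, while the function and variable names (and the ReLU activation used in Section IV) are our own illustrative choices.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def forward_fc(P_bar, W1, B1, W2, B2, W3, B3, f=relu):
    """Eq. (13)-(16): Y0 = P_bar, followed by three fully connected layers."""
    Y0 = P_bar                   # (m x 1), m = r * w**2 / 4
    Y1 = f(W1 @ Y0 + B1)         # (n x 1)
    Y2 = f(W2 @ Y1 + B2)         # (o x 1)
    Y3 = f(W3 @ Y2 + B3)         # (p x 1), dimension of the labeled data
    return Y1, Y2, Y3
```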
B. Back Propagation
To update the CNN parameters and perform the learning process, a back propagation algorithm is developed to minimize a cost function E. In our analysis, the mean squared error cost function [12], [15] is used:

$$E = \frac{1}{2}(Y^3 - Y_d)^T (Y^3 - Y_d) \tag{17}$$

Equation (18) shows the gradient descent method used to update the CNN parameters:

$$\ell_{new} = \ell_{old} - \alpha\left(\frac{\partial E}{\partial \ell_{old}}\right)^T \tag{18}$$
Here, ℓ_new represents the update of the convolution bias vector B̄f, the filter vector F̄, the weight matrix W, and the fully connected bias vector B. α is the learning rate, which can be chosen as a constant or a variable positive value.
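As a sketch, the update (18) applied to one parameter looks as follows; storing the gradient as a row vector (the convention of (19)–(25)) explains the transpose, and α = 0.3 is the value used later in Section IV. The function name is our own.

```python
def gd_update(l_old, dE_dl, alpha=0.3):
    """Eq. (18): l_new = l_old - alpha * (dE/dl_old)^T.

    l_old is a column-vector parameter (2-D NumPy array) and dE_dl its
    gradient stored in the transposed (row-vector) orientation.
    """
    return l_old - alpha * dE_dl.T
```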
We note that the update equations of the CNN parameters involve the computation of ∂E/∂ℓ_old. We develop below the parameter updates for each layer.
Fully Connected Layer
For the fully connected layer, the detailed development of the back propagation is proposed in [14], where

$$\frac{\partial E}{\partial B^i} = \left[\left(W^{(i-1)T}\cdot\frac{\partial E}{\partial B^{(i-1)}}\right) * f'(H^i)\right]^T \tag{19}$$
$$\frac{\partial E}{\partial W^i} = \left(\frac{\partial E}{\partial B^i}\right)^T \cdot Y^{(i-1)T} \tag{20}$$

The operation ∗ is defined in Appendix 3.
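The sketch below illustrates the chain rule behind (19) and (20) for one hidden layer, written with column vectors; the paper stores these gradients as row vectors with explicit transposes and indexes the recursion as in [14], so the names, the orientation, and the ReLU derivative here are our own assumptions. For the output layer with the MSE cost (17), the recursion would be seeded with delta3 = (Y3 − Yd) ∗ f′(H3).

```python
import numpy as np

def relu_prime(H):
    return (H > 0).astype(float)

def fc_gradients(delta_next, W_next, H, Y_prev):
    """Backpropagate through one fully connected hidden layer.

    delta_next : dE/dB of the layer already processed (closer to the output)
    W_next     : weight matrix of that layer
    Returns dE/dB (cf. (19)) and dE/dW (cf. (20)) of the current layer.
    """
    delta = (W_next.T @ delta_next) * relu_prime(H)   # elementwise (*), Appendix 3
    dE_dW = delta @ Y_prev.T                          # outer product, cf. (20)
    return delta, dE_dW
```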
Convolution Layer
$$\frac{\partial E_{(1\times 1)}}{\partial \bar{B}f_{((r w^2)\times 1)}}\Big|_{(1\times(r w^2))} = \frac{\partial E}{\partial \bar{C}}\,\frac{\partial \bar{C}}{\partial \bar{B}f} \tag{21}$$
$$\frac{\partial E_{(1\times 1)}}{\partial \bar{F}_{((r v^2)\times 1)}}\Big|_{(1\times(r v^2))} = \frac{\partial E}{\partial \bar{C}}\,\frac{\partial \bar{C}}{\partial \bar{F}} \tag{22}$$

By substituting (10) into (21) and (22), we obtain

$$\frac{\partial \bar{C}}{\partial \bar{B}f}\Big|_{((r w^2)\times(r w^2))} = I \tag{23}$$
$$\frac{\partial \bar{C}}{\partial \bar{F}}\Big|_{((r w^2)\times(r v^2))} = Xo \tag{24}$$

Here, I is the identity matrix.

Computing ∂E/∂C̄:

$$\frac{\partial E}{\partial \bar{C}}\Big|_{(1\times(r w^2))} = \frac{\partial E}{\partial \bar{C}p}\,\frac{\partial \bar{C}p}{\partial \bar{C}} = \frac{\partial E}{\partial \bar{C}p}\Big|_{(1\times(r w^2))} * f'(\bar{C}p)^T_{(1\times(r w^2))} \tag{25}$$
Definition: Let us define the operation Inc of any vector U as follows:

$$U_{(n\times 1)} = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix}, \qquad Inc(U) = \begin{bmatrix} \frac{u_1}{4} \\ \frac{u_1}{4} \\ \frac{u_1}{4} \\ \frac{u_1}{4} \\ \vdots \\ \frac{u_n}{4} \\ \frac{u_n}{4} \\ \frac{u_n}{4} \\ \frac{u_n}{4} \end{bmatrix}_{(4n\times 1)} \tag{26}$$

In this section, the Inc operation is used to increase the size of the gradient of the average pooling vector, where

$$\frac{\partial E}{\partial \bar{C}p} = Inc\left(\frac{\partial E}{\partial \bar{P}}\right) \tag{27}$$

Since Y^0 = P̄,

$$\frac{\partial E}{\partial \bar{P}} = \frac{\partial E}{\partial B^1} W^1 \tag{28}$$

By substituting (28), (27), (25), (24), and (23) into (22) and (21), we obtain

$$\frac{\partial E}{\partial \bar{B}f} = Inc\left(\frac{\partial E}{\partial B^1} W^1\right) * f'(\bar{C}p)^T \tag{29}$$
$$\frac{\partial E}{\partial \bar{F}} = \frac{\partial E}{\partial \bar{B}f}\cdot Xo \tag{30}$$
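A sketch of (26)–(30): Inc spreads each pooled-gradient entry over its four pooled inputs, and the convolution-layer gradients then follow from the matrix form (10). We write everything with column vectors, so the orientations differ from the row-vector form of the text; for ReLU, f′(C̄p) and f′(C̄) give the same mask. The function and variable names are ours.

```python
import numpy as np

def inc(U):
    """Eq. (26): repeat each entry four times and divide by 4."""
    return np.repeat(U, 4, axis=0) / 4.0

def relu_prime(z):
    return (z > 0).astype(float)

def conv_gradients(delta_B1, W1, Cp_bar, Xo):
    """Eq. (27)-(30), column-vector form.

    delta_B1 : dE/dB^1 of the first fully connected layer
    W1       : its weight matrix, so dE/dP_bar = W1^T delta_B1, cf. (28)
    """
    dE_dP = W1.T @ delta_B1                    # (28)
    dE_dC = inc(dE_dP) * relu_prime(Cp_bar)    # (27) combined with (25)
    dE_dBf = dE_dC                             # (23): dC_bar/dBf_bar = I, cf. (29)
    dE_dF = Xo.T @ dE_dC                       # (24) and (30)
    return dE_dBf, dE_dF
```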
IV. SIMULATION RESULTS AND DISCUSSION
In this section, the MNIST handwritten digits data are used to test the CNN performance with the proposed matrix operation. This database consists of a training set of 60,000 images and a testing set of 10,000 images. The images are grayscale, of dimension (28 × 28). The (10 × 1) output vector classifies the digits from 0 to 9. To enhance the CNN performance, the following hyper-parameters are varied:
• CNN width.
• CNN height.
The performance of the CNN model corresponds to the ratio between the total number of correct classifications and the number of test images.
TABLE I
CNN PERFORMANCE ACCORDING TO THE VARIATION OF THE WIDTH AND HEIGHT HYPER-PARAMETERS

N° of training images | N° of filters | Size of filters | Performance
5,000  | −  | −         | 0.9325
5,000  | 5  | (3² × 1)  | 0.9414
5,000  | 5  | (9² × 1)  | 0.9508
5,000  | 5  | (13² × 1) | 0.9478
5,000  | 10 | (9² × 1)  | 0.9575
5,000  | 20 | (9² × 1)  | 0.9582
10,000 | −  | −         | 0.9481
10,000 | 5  | (3² × 1)  | 0.9570
10,000 | 5  | (9² × 1)  | 0.9589
10,000 | 5  | (13² × 1) | 0.9579
10,000 | 10 | (9² × 1)  | 0.9623
10,000 | 20 | (9² × 1)  | 0.9677
30,000 | −  | −         | 0.9773
30,000 | 5  | (3² × 1)  | 0.9785
30,000 | 5  | (9² × 1)  | 0.9810
30,000 | 5  | (13² × 1) | 0.9798
30,000 | 10 | (9² × 1)  | 0.9835
30,000 | 20 | (9² × 1)  | 0.9871
60,000 | −  | −         | 0.9790
60,000 | 5  | (3² × 1)  | 0.9831
60,000 | 5  | (9² × 1)  | 0.9867
60,000 | 5  | (13² × 1) | 0.9859
60,000 | 10 | (9² × 1)  | 0.9874
60,000 | 20 | (9² × 1)  | 0.9883
The simulated CNN model consists of:
• One convolution layer. The convolution activation function is the ReLU, and the pooling operation used is average pooling.
• A fully connected part comprising 3 hidden layers, each formed of 200 artificial neurons. The activation function is the same as the one used in the convolution layer. The learning rate equals 0.3.
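For reference, the configuration described above can be summarized in a small settings dictionary (our illustration; the field names are not from the paper):

```python
config = {
    "input_size": 28,                  # MNIST images are 28 x 28, grayscale
    "num_filters": 20,                 # r, varied over {5, 10, 20} in Table I
    "filter_size": 9,                  # v, varied over {3, 9, 13} in Table I
    "pooling": "average",              # (2 x 2) kernel, stride 2
    "hidden_layers": [200, 200, 200],  # three fully connected hidden layers
    "activation": "relu",              # used in the convolution and FC parts
    "output_size": 10,                 # digits 0-9
    "learning_rate": 0.3,
    "cost": "mean_squared_error",
}
```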
As shown in Table I, we tested the CNN performance through the variation of:
• the number of convolution filter vectors,
• the size of each filter,
• the number of training images.
Here (−) denotes a model consisting of only fully connected layers.
The peak CNN performance is 0.9883. It is obtained by increasing the size and number of the convolution filter vectors and the number of training images.
V. CONCLUSION
In this paper, a new matrix operation that substitutes the classical convolution operation is developed. The MNIST data of handwritten digits are used to test the influence of the CNN hyper-parameters on the model performance. The peak performance achieved is 0.9883, obtained using a CNN model composed of one convolution layer and three fully connected hidden layers. The results deduced from the proposed simulations do not represent the optimal CNN hyper-parameter configuration. Further increasing the number of convolution layers and the size of the training dataset could enhance the CNN performance.
APPENDIX
Appendix 1: Average and max-pooling operations
$$Cp_j = \begin{bmatrix} cp_{11} & cp_{12} & \cdots & cp_{1(u-v+1)} \\ cp_{21} & cp_{22} & \cdots & cp_{2(u-v+1)} \\ \vdots & \vdots & \ddots & \vdots \\ cp_{(u-v+1)1} & cp_{(u-v+1)2} & \cdots & cp_{(u-v+1)(u-v+1)} \end{bmatrix} \tag{31}$$

The pooling map is defined as follows:

$$Pj_{Ave} = \begin{bmatrix} P_{11} & P_{12} & \cdots & P_{1\frac{u-v+1}{2}} \\ P_{21} & P_{22} & \cdots & P_{2\frac{u-v+1}{2}} \\ \vdots & \vdots & \ddots & \vdots \\ P_{\frac{u-v+1}{2}1} & P_{\frac{u-v+1}{2}2} & \cdots & P_{\frac{u-v+1}{2}\frac{u-v+1}{2}} \end{bmatrix} \tag{32}$$

For the average pooling operation,

$$P_{11} = \frac{cp_{11}+cp_{12}+cp_{21}+cp_{22}}{4}$$
$$P_{1\frac{u-v+1}{2}} = \frac{cp_{1(u-v)}+cp_{1(u-v+1)}+cp_{2(u-v)}+cp_{2(u-v+1)}}{4}$$
$$P_{\frac{u-v+1}{2}1} = \frac{cp_{(u-v)1}+cp_{(u-v)2}+cp_{(u-v+1)1}+cp_{(u-v+1)2}}{4}$$
$$P_{\frac{u-v+1}{2}\frac{u-v+1}{2}} = \frac{cp_{(u-v)(u-v)}+cp_{(u-v)(u-v+1)}+cp_{(u-v+1)(u-v)}+cp_{(u-v+1)(u-v+1)}}{4}$$

For the max pooling operation,

$$P_{11} = \max(cp_{11},\, cp_{12},\, cp_{21},\, cp_{22})$$
$$P_{1\frac{u-v+1}{2}} = \max(cp_{1(u-v)},\, cp_{1(u-v+1)},\, cp_{2(u-v)},\, cp_{2(u-v+1)})$$
$$P_{\frac{u-v+1}{2}1} = \max(cp_{(u-v)1},\, cp_{(u-v)2},\, cp_{(u-v+1)1},\, cp_{(u-v+1)2})$$
$$P_{\frac{u-v+1}{2}\frac{u-v+1}{2}} = \max(cp_{(u-v)(u-v)},\, cp_{(u-v)(u-v+1)},\, cp_{(u-v+1)(u-v)},\, cp_{(u-v+1)(u-v+1)})$$
Appendix 2: Matrix Operation
For a CNN model consisting of an input image X ∈ ℜ^(u×u) convolved with a filter F ∈ ℜ^(v×v),

$$X = \begin{bmatrix} X^{1T} \\ X^{2T} \\ \vdots \\ X^{uT} \end{bmatrix}, \qquad F = \begin{bmatrix} F^{1T} \\ F^{2T} \\ \vdots \\ F^{vT} \end{bmatrix}$$

where

$$X^{iT} = \begin{bmatrix} x^i_1 & x^i_2 & \cdots & x^i_u \end{bmatrix}, \qquad F^{iT} = \begin{bmatrix} f^i_1 & f^i_2 & \cdots & f^i_v \end{bmatrix}$$

We define $X^{iT}_{j\to k}$ as follows:

$$X^{iT}_{j\to k} = \begin{bmatrix} x^i_j & x^i_{j+1} & \cdots & x^i_k \end{bmatrix}$$

The proposed matrix operation that substitutes the classical convolution operation is:

$$X \circledast F = Xx\big|_{((u-v+1)^2\times v^2)}\cdot \bar{F}\big|_{(v^2\times 1)} =
\begin{bmatrix}
\begin{bmatrix}
X^{uT}_{1\to v} & X^{(u-1)T}_{1\to v} & \cdots & X^{(u-v+1)T}_{1\to v} \\
X^{uT}_{2\to(v+1)} & X^{(u-1)T}_{2\to(v+1)} & \cdots & X^{(u-v+1)T}_{2\to(v+1)} \\
\vdots & \vdots & \ddots & \vdots \\
X^{uT}_{(u-v+1)\to u} & X^{(u-1)T}_{(u-v+1)\to u} & \cdots & X^{(u-v+1)T}_{(u-v+1)\to u}
\end{bmatrix} \\[6pt]
\begin{bmatrix}
X^{(u-1)T}_{1\to v} & X^{(u-2)T}_{1\to v} & \cdots & X^{(u-v)T}_{1\to v} \\
X^{(u-1)T}_{2\to(v+1)} & X^{(u-2)T}_{2\to(v+1)} & \cdots & X^{(u-v)T}_{2\to(v+1)} \\
\vdots & \vdots & \ddots & \vdots \\
X^{(u-1)T}_{(u-v+1)\to u} & X^{(u-2)T}_{(u-v+1)\to u} & \cdots & X^{(u-v)T}_{(u-v+1)\to u}
\end{bmatrix} \\
\vdots \\
\begin{bmatrix}
X^{vT}_{1\to v} & X^{(v-1)T}_{1\to v} & \cdots & X^{1T}_{1\to v} \\
X^{vT}_{2\to(v+1)} & X^{(v-1)T}_{2\to(v+1)} & \cdots & X^{1T}_{2\to(v+1)} \\
\vdots & \vdots & \ddots & \vdots \\
X^{vT}_{(u-v+1)\to u} & X^{(v-1)T}_{(u-v+1)\to u} & \cdots & X^{1T}_{(u-v+1)\to u}
\end{bmatrix}
\end{bmatrix}
\cdot
\begin{bmatrix} F^1 \\ F^2 \\ \vdots \\ F^v \end{bmatrix} \tag{33}$$
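As a quick numerical sanity check of the idea behind (33) (not of the paper's exact index ordering), one can verify that a patch-matrix product reproduces the sliding-window result of Section II-A. The snippet below is our illustration and assumes the conv2d and build_Xx sketches given earlier in this document.

```python
import numpy as np

# Assumes the illustrative conv2d and build_Xx functions sketched earlier.
u, v = 7, 3
rng = np.random.default_rng(0)
X = rng.standard_normal((u, u))
F = rng.standard_normal((v, v))

C_direct = conv2d(X, F)                         # sliding-window form, Section II-A
C_matrix = build_Xx(X, v) @ F.ravel()           # matrix form, row-major ordering
assert np.allclose(C_direct.ravel(), C_matrix)  # same (u - v + 1)**2 values
```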
Appendix 3: The definition of the (∗) operation
Let us define the operation (∗) for any vectors U and V:

$$U_{(n\times 1)} = \begin{bmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{bmatrix}, \qquad V_{(n\times 1)} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix}, \qquad U * V = \begin{bmatrix} u_1 v_1 \\ u_2 v_2 \\ \vdots \\ u_n v_n \end{bmatrix} \tag{34}$$
REFERENCES
[1] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," Proceedings of the Advances in Neural Information Processing Systems, pp. 1097–1105, 2012.
[2] K. Fukushima, "Neocognitron: A Self-organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position," Biological Cybernetics, vol. 36, Springer, pp. 193–202, 1980.
[3] Y. LeCun et al., "Handwritten digit recognition with a backpropagation network," in Advances in Neural Information Processing Systems, pp. 396–404, 1990.
[4] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, pp. 2278–2324, 1998.
[5] D. Steinkraus, I. Buck, and P. Y. Simard, "Using GPUs for machine learning algorithms," in Proceedings of the Eighth International Conference on Document Analysis and Recognition, pp. 1115–1120, 2005.
[6] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," conference paper at ICLR, pp. 1–14, 2015.
[7] K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," 2016 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9, 2016.
[8] S. Xie, R. Girshick, P. Dollar, Z. Tu, and K. He, "Aggregated Residual Transformations for Deep Neural Networks," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–9, 2017.
[9] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. Alemi, "Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning," arXiv preprint arXiv:1602.07261v2, vol. 131, no. 2, pp. 262–263, 2016.
[10] H. Jang, H.-J. Yang, D.-S. Jeong, and H. Lee, "Object classification using CNN for video traffic detection system," 2015 21st Korea-Japan Joint Workshop on Frontiers of Computer Vision (FCV), 2015.
[11] H. Yanagisawa, T. Yamashita, and H. Watanabe, "A Study on Object Detection Method from Manga Images using CNN," 2018 International Workshop on Advanced Image Technology (IWAIT), 2018.
[12] R. B. Arif, M. A. B. Siddique, M. M. R. Khan, and M. R. Oishe, "Study and Observation of the Variations of Accuracies for Handwritten Digits Recognition with Various Hidden Layers and Epochs using Convolutional Neural Network," 4th International Conference on Electrical Engineering and Information and Communication Technology, 2018.
[13] V. Gullapalli, "A Comparison of Supervised and Reinforcement Learning Methods on a Reinforcement Learning Task," Proceedings of the 1991 IEEE International Symposium on Intelligent Control, pp. 394–399, 1991.
[14] N. Wagaa and H. Kallel, "Recursive Supervised Artificial Neural Network Algorithm for Data Classification and Regression," unpublished.
[15] M. A. B. Siddique, M. M. R. Khan, R. B. Arif, and Z. Ashrafi, "Study and Observation of the Variations of Accuracies for Handwritten Digits Recognition with Various Hidden Layers and Epochs using Neural Network Algorithm," 4th International Conference on Electrical Engineering and Information and Communication Technology, pp. 118–123, 2018.
[16] V. E. Ismailov, "On the approximation by neural networks with bounded number of neurons in hidden layers," Journal of Mathematical Analysis and Applications, pp. 963–969, 2014.
[17] M. Rastegari, V. Ordonez, J. Redmon, and A. Farhadi, "XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks," Lecture Notes in Computer Science, Springer, pp. 525–542, 2016.
[18] D. Scherer, A. Müller, and S. Behnke, "Evaluation of Pooling Operations in Convolutional Architectures for Object Recognition," International Conference on Artificial Neural Networks, Springer, pp. 92–101, 2010.
[19] I. B. Dlimi and H. Kallel, "Robust Neural Control for Robotic Manipulators," International Journal of Enhanced Research in Science Technology and Engineering, vol. 5, no. 2, pp. 198–205, 2016.
[20] M. Dong, Y. Li, X. Tang, J. Xu, S. Bi, and A. Y. Cai, "Variable Convolution and Pooling Convolutional Neural Network for Text Sentiment Classification," IEEE Access, 2020.
[21] X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," 13th International Conference on Artificial Intelligence and Statistics, pp. 249–256, 2010.
[22] K. He, X. Zhang, S. Ren, and J. Sun, "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification," 2015 IEEE International Conference on Computer Vision, pp. 1026–1034, 2015.
[23] P. Dangeti, "Statistics for Machine Learning: Techniques for exploring supervised, unsupervised, and reinforcement learning models with Python and R," Packt Publishing, 2017.
[24] T. Takase, S. Oyama, and K. Masahito, "Effective neural network training with adaptive learning rate based on training loss," Neural Networks, pp. 68–78, 2018.
[25] Y. LeCun, "The MNIST database of handwritten digits," http://yann.lecun.com/exdb/mnist/, 1998.
[26] I. B. Dlimi and H. Kallel, "Optimal neural control for constrained robotic manipulators," 2010 5th IEEE International Conference on Intelligent Systems, pp. 302–308, 2010.

More Related Content

Similar to Vector-Based Back Propagation Algorithm of.pdf

A comparison-of-first-and-second-order-training-algorithms-for-artificial-neu...
A comparison-of-first-and-second-order-training-algorithms-for-artificial-neu...A comparison-of-first-and-second-order-training-algorithms-for-artificial-neu...
A comparison-of-first-and-second-order-training-algorithms-for-artificial-neu...Cemal Ardil
 
Reconfiguration layers of convolutional neural network for fundus patches cla...
Reconfiguration layers of convolutional neural network for fundus patches cla...Reconfiguration layers of convolutional neural network for fundus patches cla...
Reconfiguration layers of convolutional neural network for fundus patches cla...journalBEEI
 
ADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHM
ADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHMADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHM
ADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHMWireilla
 
ADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHM
ADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHMADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHM
ADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHMijfls
 
VJAI Paper Reading#3-KDD2019-ClusterGCN
VJAI Paper Reading#3-KDD2019-ClusterGCNVJAI Paper Reading#3-KDD2019-ClusterGCN
VJAI Paper Reading#3-KDD2019-ClusterGCNDat Nguyen
 
Drobics, m. 2001: datamining using synergiesbetween self-organising maps and...
Drobics, m. 2001:  datamining using synergiesbetween self-organising maps and...Drobics, m. 2001:  datamining using synergiesbetween self-organising maps and...
Drobics, m. 2001: datamining using synergiesbetween self-organising maps and...ArchiLab 7
 
International journal of applied sciences and innovation vol 2015 - no 1 - ...
International journal of applied sciences and innovation   vol 2015 - no 1 - ...International journal of applied sciences and innovation   vol 2015 - no 1 - ...
International journal of applied sciences and innovation vol 2015 - no 1 - ...sophiabelthome
 
Algorithm Finding Maximum Concurrent Multicommodity Linear Flow with Limited ...
Algorithm Finding Maximum Concurrent Multicommodity Linear Flow with Limited ...Algorithm Finding Maximum Concurrent Multicommodity Linear Flow with Limited ...
Algorithm Finding Maximum Concurrent Multicommodity Linear Flow with Limited ...IJCNCJournal
 
De-convolution on Digital Images
De-convolution on Digital ImagesDe-convolution on Digital Images
De-convolution on Digital ImagesMd. Shohel Rana
 
A simple framework for contrastive learning of visual representations
A simple framework for contrastive learning of visual representationsA simple framework for contrastive learning of visual representations
A simple framework for contrastive learning of visual representationsDevansh16
 
Improved interpretability for Computer-Aided Assessment of Retinopathy of Pre...
Improved interpretability for Computer-Aided Assessment of Retinopathy of Pre...Improved interpretability for Computer-Aided Assessment of Retinopathy of Pre...
Improved interpretability for Computer-Aided Assessment of Retinopathy of Pre...Mara Graziani
 
Dl1 deep learning_algorithms
Dl1 deep learning_algorithmsDl1 deep learning_algorithms
Dl1 deep learning_algorithmsArmando Vieira
 
On image intensities, eigenfaces and LDA
On image intensities, eigenfaces and LDAOn image intensities, eigenfaces and LDA
On image intensities, eigenfaces and LDARaghu Palakodety
 
Performance Comparison of Image Retrieval Using Fractional Coefficients of Tr...
Performance Comparison of Image Retrieval Using Fractional Coefficients of Tr...Performance Comparison of Image Retrieval Using Fractional Coefficients of Tr...
Performance Comparison of Image Retrieval Using Fractional Coefficients of Tr...CSCJournals
 
Method for a Simple Encryption of Images Based on the Chaotic Map of Bernoulli
Method for a Simple Encryption of Images Based on the Chaotic Map of BernoulliMethod for a Simple Encryption of Images Based on the Chaotic Map of Bernoulli
Method for a Simple Encryption of Images Based on the Chaotic Map of BernoulliAIRCC Publishing Corporation
 
METHOD FOR A SIMPLE ENCRYPTION OF IMAGES BASED ON THE CHAOTIC MAP OF BERNOULLI
METHOD FOR A SIMPLE ENCRYPTION OF IMAGES BASED ON THE CHAOTIC MAP OF BERNOULLIMETHOD FOR A SIMPLE ENCRYPTION OF IMAGES BASED ON THE CHAOTIC MAP OF BERNOULLI
METHOD FOR A SIMPLE ENCRYPTION OF IMAGES BASED ON THE CHAOTIC MAP OF BERNOULLIijcsit
 

Similar to Vector-Based Back Propagation Algorithm of.pdf (20)

A comparison-of-first-and-second-order-training-algorithms-for-artificial-neu...
A comparison-of-first-and-second-order-training-algorithms-for-artificial-neu...A comparison-of-first-and-second-order-training-algorithms-for-artificial-neu...
A comparison-of-first-and-second-order-training-algorithms-for-artificial-neu...
 
Neural networks
Neural networksNeural networks
Neural networks
 
Reconfiguration layers of convolutional neural network for fundus patches cla...
Reconfiguration layers of convolutional neural network for fundus patches cla...Reconfiguration layers of convolutional neural network for fundus patches cla...
Reconfiguration layers of convolutional neural network for fundus patches cla...
 
[IJCT-V3I2P24] Authors: Osama H. Abdelwahed; and M. El-Sayed Wahed
[IJCT-V3I2P24] Authors: Osama H. Abdelwahed; and M. El-Sayed Wahed[IJCT-V3I2P24] Authors: Osama H. Abdelwahed; and M. El-Sayed Wahed
[IJCT-V3I2P24] Authors: Osama H. Abdelwahed; and M. El-Sayed Wahed
 
ADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHM
ADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHMADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHM
ADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHM
 
ADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHM
ADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHMADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHM
ADAPTIVE FUZZY KERNEL CLUSTERING ALGORITHM
 
VJAI Paper Reading#3-KDD2019-ClusterGCN
VJAI Paper Reading#3-KDD2019-ClusterGCNVJAI Paper Reading#3-KDD2019-ClusterGCN
VJAI Paper Reading#3-KDD2019-ClusterGCN
 
Drobics, m. 2001: datamining using synergiesbetween self-organising maps and...
Drobics, m. 2001:  datamining using synergiesbetween self-organising maps and...Drobics, m. 2001:  datamining using synergiesbetween self-organising maps and...
Drobics, m. 2001: datamining using synergiesbetween self-organising maps and...
 
International journal of applied sciences and innovation vol 2015 - no 1 - ...
International journal of applied sciences and innovation   vol 2015 - no 1 - ...International journal of applied sciences and innovation   vol 2015 - no 1 - ...
International journal of applied sciences and innovation vol 2015 - no 1 - ...
 
Algorithm Finding Maximum Concurrent Multicommodity Linear Flow with Limited ...
Algorithm Finding Maximum Concurrent Multicommodity Linear Flow with Limited ...Algorithm Finding Maximum Concurrent Multicommodity Linear Flow with Limited ...
Algorithm Finding Maximum Concurrent Multicommodity Linear Flow with Limited ...
 
28 01-2021-05
28 01-2021-0528 01-2021-05
28 01-2021-05
 
De-convolution on Digital Images
De-convolution on Digital ImagesDe-convolution on Digital Images
De-convolution on Digital Images
 
A simple framework for contrastive learning of visual representations
A simple framework for contrastive learning of visual representationsA simple framework for contrastive learning of visual representations
A simple framework for contrastive learning of visual representations
 
Improved interpretability for Computer-Aided Assessment of Retinopathy of Pre...
Improved interpretability for Computer-Aided Assessment of Retinopathy of Pre...Improved interpretability for Computer-Aided Assessment of Retinopathy of Pre...
Improved interpretability for Computer-Aided Assessment of Retinopathy of Pre...
 
Dl1 deep learning_algorithms
Dl1 deep learning_algorithmsDl1 deep learning_algorithms
Dl1 deep learning_algorithms
 
Path loss prediction
Path loss predictionPath loss prediction
Path loss prediction
 
On image intensities, eigenfaces and LDA
On image intensities, eigenfaces and LDAOn image intensities, eigenfaces and LDA
On image intensities, eigenfaces and LDA
 
Performance Comparison of Image Retrieval Using Fractional Coefficients of Tr...
Performance Comparison of Image Retrieval Using Fractional Coefficients of Tr...Performance Comparison of Image Retrieval Using Fractional Coefficients of Tr...
Performance Comparison of Image Retrieval Using Fractional Coefficients of Tr...
 
Method for a Simple Encryption of Images Based on the Chaotic Map of Bernoulli
Method for a Simple Encryption of Images Based on the Chaotic Map of BernoulliMethod for a Simple Encryption of Images Based on the Chaotic Map of Bernoulli
Method for a Simple Encryption of Images Based on the Chaotic Map of Bernoulli
 
METHOD FOR A SIMPLE ENCRYPTION OF IMAGES BASED ON THE CHAOTIC MAP OF BERNOULLI
METHOD FOR A SIMPLE ENCRYPTION OF IMAGES BASED ON THE CHAOTIC MAP OF BERNOULLIMETHOD FOR A SIMPLE ENCRYPTION OF IMAGES BASED ON THE CHAOTIC MAP OF BERNOULLI
METHOD FOR A SIMPLE ENCRYPTION OF IMAGES BASED ON THE CHAOTIC MAP OF BERNOULLI
 

Recently uploaded

Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxBoston Institute of Analytics
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 

Recently uploaded (20)

Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptxNLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
NLP Project PPT: Flipkart Product Reviews through NLP Data Science.pptx
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 

Vector-Based Back Propagation Algorithm of.pdf

  • 1. Vector-Based Back Propagation Algorithm of Supervised Convolution Neural Network 1st Nesrine Wagaa Department of Physics and Instrumentation LARATSI Laboratory National Institute of Applied Sciences and Technology (INSAT) Centre Urbain Nord, Tunisia nesrinewagah@gmail.com 2nd Hichem Kallel Department of Physics and Instrumentation LARATSI Laboratory INSAT & South Mediterranean University (MedTech) Les Berges du Lac II, Tunisia hichem.kallel@yahoo.com Abstract—The primary goal of this paper is to analyze the impact of the convolution operation on the model performance. In this context, to avoid the mathematical complexities behind the Convolution Neural Network (CNN) model, the classical convolution operation is substituted by a new proposed matrix operation. The model considered is composed of one convolution layer in series with a set of fully connected hidden layers. The network parameters (filters, weights, and biases) are updated using the back propagation gradient descent algorithm. The model performance is improved through the variation of the width and height CNN hyper-parameters. MNIST data are considered here for the classification of handwritten numbers. With a simple modification of the CNN hyper-parameters using the new proposed matrix operation, a CNN performance of 98.83% was achieved. Index Terms—Classical Convolution Operation , Matrix Op- eration, CNN Parameters, CNN Hyper-Parameters, Gradient Descent. I. INTRODUCTION A convolution neural network (CNN) or ConvNet is a concrete case of deep neural networks [1]. It is inspired by the hierarchical of the biological nervous system. ConvNet is an old technique, which was already introduced at the end of the 90s by Fukushima [2]. But the model was not achieving good results until 1998 when LeCun et al designed LeNet-5 [3], [4]. It was the first CNN modern architecture capable to classify the handwritten digits from 0-9. In 2005 [5] with the establishment of the General-purpose computing on graphics processing units, (GPGPU) the performance of CNN has significantly improved especially in image classification. In 2012 [1], the AlexNet model is proposed. It is a very deep network consisting of a huge number of layers. The AlexNet was trained on two GPUs to speed up the learning process. It was able to reduce the error rate to 15.3 %. Since then, more other models are proposed to improve the CNN performance such as VGGNet-16 [6], ResNet [7],etc [8], [9]. These various types of CNN networks can be used to resolve the problems of computer vision as object classification [10], detection [11], and object tracking. The CNN networks are trained based on big data together with strong hardware to realize needed computations [5]. The structure or the architecture of the ConvNet consists of a set of hidden layers. These layers are typically categorized as convolution, and fully connected hidden layers [12]. The CNN input is the images to be processed. For supervised learning, the CNN output is the predicted value or the classes of the input data [12], [13]. ConvNet model trains using the back propagation algorithm [1], [2] to update the CNN parameters which correspond to the filters coefficients applied during the convolution layers to extract the image features, the weights that represent the interconnection between the artificial neurons of the fully connected hidden layers and the bias values that define the activation level of the filters coefficients and artificial neurons. 
In this paper, the CNN performance is enhanced by the development of a new technique that substitutes the convo- lution operation based on matrix operation. We propose a compact back propagation method to update the parameters of a CNN model consisting of one convolution layer. For the fully connected hidden layers, a simple recursive update rule was previously developed [14]. The developed tools easily update the CNN parameters (filters, weights, and biases) [12], [17]and allow modification of the CNN hyper-parameters: 1) CNN Width. • Fully connected hidden layers numbers [15]. 2) CNN Height. • Size and numbers of convolution filters [12]. • Neurons number in each fully connected hidden layers [15], [16], [19]. 3) Striding and padding convolution operation. [17]. 4) Pooling operation [18], [20]. 5) Activation function [21], [22]. 6) Cost function [15], [23]. 7) Learning rate [24]. In this context, we tested the CNN model performance based on the variation of its hyper-parameters. The MNIST dataset of handwritten digits [25] is used by the CNN model to achieve the learning process. 978-1-7281-6999-6/20/$31.00 ©2020 IEEE Authorized licensed use limited to: Tsinghua University. Downloaded on December 19,2020 at 08:24:47 UTC from IEEE Xplore. Restrictions apply.
  • 2. II. CLASSICAL CONVOLUTION NEURAL NETWORK (CNN) MODEL A. Convolution Operation The convolution operation between an image X ∈ ℜ(u×u) and a filter F ∈ ℜ(v×v) is defined as follows X~F = C ( (u − v + 2Pad) + 1 s × (u − v + 2Pad) + 1 s ) (1) where, C[a, b] = u ∑ k=0 u ∑ l=0 X[k, l]F[a − k, b − l] (2) Here,~ denotes the convolution operation. The stride s denotes the number of pixels by which F is sliding over X. The padding Pad is the number of zeros applied around X. a, b, k, and l are the rows and columns indices of C and X. B. Convolution Layer The convolution layer shown in Fig.1 is a model consisting of a convolution map and a pooling map. Here, in the convolution operation the padding Pad = 0 and the stride s = 1. Fig. 1. Convolution Layer (ConvL). Convolution Map The convolution map is designed as follows: • An input image X ∈ ℜ(u×u) . • r filters Fj ∈ ℜ(v×v) ; j = 1, 2, · · · , r. • A bias matrix Bfj ∈ ℜ((u−v+1)×(u−v+1)) . • A non-linear activation function f. • An output matrix Cpj ∈ ℜ((u−v+1)×(u−v+1)) . The output convolution map is defined as follows Cj = X ~ Fj + Bfj (3) Cpj = f(Cj) (4) Pooling Map A pooling map is an essential unit of CNN architecture. This step is used to reduce the computational complexity of the network through the minimization of the Cpj dimension. Average pooling and max pooling are examples of the pooling operation [18], [19]. Typically, a kernel Kj of size (2×2) with a stride equals 2 can be applied to calculate the average or the maximum value for each patch of Cpj. The output pooling operation Pj has the size of (u−v+1 2 × u−v+1 2 ). In appendix 1, we present the average and max-pooling operations. C. Fully Connected Layer The convolution filters detect features of the input images, called local features. A fully connected layer added in series with the convolution layer to recognize the input images [12], [15], [26]. As shown in Fig.2, a fully connected layer has an input vector Y 0 corresponding to the concatenation of the r CNN pooling map Pj. The output vector Y t defines image classes. Typically, a series of t fully connected hidden layers are added between the input and output vector to enhance the CNN performance. Fig. 2. Fully Connected layer (FCL). The basic equations of fully connected hidden layers are as shown Hi = Wi Y (i−1) + Bi (5) Y i = f(Hi ) (6) Where, Hi is the weight sum vector, Bi defines the bias vector, Wi is the weight matrix that represents the intercon- nection between hidden layers . Y i denotes the output vector of a select fully connected hidden layer. D. Convolution Neural Networks Model The convolution neural network shown in Fig.3 is composed of one convolution layer convL in series with t fully connected hidden layers L. X is the input image that will be recognized by the CNN model. Y t is the CNN output corresponding to the image recognition. Authorized licensed use limited to: Tsinghua University. Downloaded on December 19,2020 at 08:24:47 UTC from IEEE Xplore. Restrictions apply.
  • 3. Fig. 3. Convolution Neural Network Architecture. III. VECTOR-BASED CNN MODEL This section aims to replace the classical convolution operation by matrix operation. Definition: We define the vector expression of any matrix M ∈ ℜ(n×n) as follows M =       M1T M2T . . . MnT       , M̄(n2×1) =      M1 M2 . . . Mn      In fact, in this section the output convolutions map Cpj of size ((u − v + 1) × (u − v + 1))will be transformed into a vector ¯ Cpj of dimension((u − v + 1)2 × 1). Based on appendix 2, the output convolved vector ¯ Cpj and the output average pooling vector ¯ Pj are defined as follows C̄j = Xx · ¯ Fj + B̄fj (7) ¯ Cpj = f ¯ (Cj) (8) ¯ Pj =       ( c̄pj1+c̄pj2+c̄pj3+c̄pj4 4 ) ( c̄pj5+c̄pj6+c̄pj7+c̄pj8 4 ) . . . ( c̄pj(w2−3)+c̄pj(w2−2)+c̄pj(w2−1)+c̄pjw2 4 )       (9) Not that Xx is ((u − v + 1)2 × v2 ) input image, ¯ Fj is (v2 × 1) vector filter and B̄fj is ((u − v + 1)2 × 1) bias convolved vector. A. Forward Propagation The CNN model developed in this study consists of one convolution layer in series with three fully connected hidden layers i = 3. Convolution Layer Based on equations (7), (8), and (9) the output convolution map and the average pooling map of a model consists of r vector filters can be written as follows C̄((r×w2)×1) = Xo((r×w2)×(r×v2))·F̄((r×v2)×1)+B̄f((r×w2)×1) (10) ¯ Cp = f ¯ (C) (11) P̄((r× w2 4 )×1) =      P̄1 P̄2 . . . P̄r      (12) where,w = u − v + 1 Xo((r×w2)×(r×v2) =      Xx 0 · · · 0 0 Xx · · · 0 . . . . . . ... . . . 0 0 · · · Xx      F̄(r×v2) =      F1 F2 . . . Fr      , B̄f(r×w2) =      Bf1 Bf2 . . . Bfr      Concatenation Concatenation is the operation that defines the input of the fully connected layer as a function of r pooling operations. Y 0 (m×1) = P̄ (13) where, m = r × w2 4 . Fully Connected Layer The fully connected hidden layers equations are derived from [14]. Layer 1 Y 1 (n×1) = f(W1 (n×m)Y 0 (m×1) + B1 (n×1)) (14) n denotes the number of artificial neurons in the first fully connected hidden layer. Layer 2 Y 2 (o×1) = f(W2 (o×n)Y 1 (n×1) + B2 (o×1)) (15) o denotes the number of artificial neurons in the second fully connected hidden layer. Layer 3 Y 3 (p×1) = f(W3 (p×o)Y 2 (o×1) + B3 (p×1)) (16) p is the dimension of the input labeled data. B. Back Propagation To update the CNN parameters and perform the learning process, a back propagation algorithm is developed to min- imize a cost function E. In our analysis, the mean squared error cost function [12], [15] is used. E = 1 2 (Y 3 − Yd)T (Y 3 − Yd) (17) Equation (18) shows the gradient descent method to update the CNN parameters. ℓnew = ℓold − α( ∂E ∂ℓold )T (18) Authorized licensed use limited to: Tsinghua University. Downloaded on December 19,2020 at 08:24:47 UTC from IEEE Xplore. Restrictions apply.
  • 4. Here, ℓnew represents the update of bias convolution vector B̄f , filter vector F̄, weight matrix W, and bias fully connected vector B. α is the learning rate, we can choose it as a constant or a variable with a positive value. We note that the update equations of the CNN parameters involve the computations of ( ∂E ∂ℓold ). We develop below the parameters update for each layer. Fully Connected Layer For the fully connected layer, the detailed development of the back propagation is proposed in [14], where ∂E ∂Bi = [(W(i−1)T · ∂E ∂B(i−1) ) ∗ f ′ (Hi )]T (19) ∂E ∂Wi = ( ∂E ∂Bi ) T · Y (i−1)T (20) The operation ∗ is defined in appendix 3. Convolution Layer ∂E(1×1) ∂B̄f((r×w2)×1) |(1×(r×w2)) = ∂E(1×1) ∂C̄((r×w2)×1) ∂C̄((r×w2)×1) ∂B̄f((r×w2)×1) (21) ∂E(1×1) ∂F̄((r×v2)×1) |(1×(r×v2)) = ∂E(1×1) ∂C̄((r×w2)×1) ∂C̄((r×w2)×1) ∂F̄((r×v2)×1) (22) By substituting (10) into (21) and (22), we obtain ∂C̄((r×w2)×1) ∂B̄f((r×w2)×1) |((r×w2)×(r×w2)) = I (23) ∂C̄((r×w2)×1) ∂F̄((r×v2)×1) |((r×w2)×(r×v2)) = Xo (24) Here, I is the identity matrix Computing of ∂E ∂C̄ ∂E(1×1) ∂C̄((r×w2)×1) |(1×(r×w2)) = ∂E(1×1) ∂C̄p((r×w2)×1) ∂C̄p((r×w2)×1) ∂C̄((r×w2)×1) = ∂E ∂C̄p |(1×(r×w2)) ∗ f′ ¯ (Cp)T (1×(r×w2)) (25) Definition: let’s define the operation Inc of any vector U as follows: U(n×1) =     u1 u2 . . . un     Inc(U) =               u1 4 u1 4 u1 4 u1 4 . . . un 4 un 4 un 4 un 4               (4n×1) (26) In this section, the Inc operation is used to increase the size of the average pooling vector map, where ∂E ∂C̄p = Inc( ∂E ∂P̄ ) (27) Since Y 0 =P̄ ∂E ∂P̄ = ∂E ∂B1 W1 (28) By substituting (28), (27),(25),(24), and (23) into (22) and (21) we obtain ∂E ∂B̄f = Inc( ∂E ∂B1 W1 ) ∗ f′ ¯ (Cp)T (29) ∂E ∂F̄ = ∂E ∂B̄f · Xo (30) IV. SIMULATIONS RESULTS AND DISCUSSION In this section, MNIST handwritten digits data are used to test the CNN performance using the proposed matrix operation. This database consists of a training set of 60,000 images and a testing set of 10,000 images. The images are a gray scale of dimension(28 × 28). The (10 × 1) output vector classifies the digits from 0 to 9. To enhance the CNN performance the following hyper-parameters are varied: • CNN Width. • CNN Height. The performance of the CNN model corresponds to the ratio between the total number of correct classifications and the number of test images. TABLE I CNN PERFORMANCE ACCORDING THE VARIATION OF WIDTH AND HEIGHT HYPER-PARAMETERS N° of training images N° of filters Size of filters Performance 5,000 − − 0.9325 5 (32 × 1) 0.9414 (92 × 1) 0.9508 (132 × 1) 0.9478 10 (92 × 1) 0.9575 20 (92 × 1) 0.9582 10,000 − − 0.9481 5 (32 × 1) 0.9570 (92 × 1) 0.9589 (132 × 1) 0.9579 10 (92 × 1) 0.9623 20 (92 × 1) 0.9677 30,000 − − 0.9773 5 (32 × 1) 0.9785 (92 × 1) 0.9810 (132 × 1) 0.9798 10 (92 × 1) 0.9835 20 (92 × 1) 0.9871 60,000 − − 0.9790 5 (32 × 1) 0.9831 (92 × 1) 0.9867 (132 × 1) 0.9859 10 (92 × 1) 0.9874 20 (92 × 1) 0.9883 The simulated CNN model consists of : • One convolution layer. The convolve activation function is the Relu. The used pooling operation is the average. • A fully connected layer. It comprises 3 hidden layers. Each hidden layer forms of 200 artificial neurons. The activation function is the same used on the convolution layer. The learning rate equals 0.3. Authorized licensed use limited to: Tsinghua University. Downloaded on December 19,2020 at 08:24:47 UTC from IEEE Xplore. Restrictions apply.
  • 5. As shown in Table 1, we tested the CNN performance through the variation of • Number of convolving filters vector. • Size of each filter. • Number of training images. Here (−) denotes a model consisting just with a fully connected layer. The peak of CNN performance is 0.9883. We obtained it by increasing the size and the number of convolved filters vector and the number of training images. V. CONCLUSION In this paper, a new matrix operation that substitutes the classical convolution operation is developed. MNIST data of handwritten digits is used to test the influence of the CNN hyper-parameters on the model performance. The peak of performance achieved is 0.9883. It is obtained using a CNN model composed of one convolution layer and three fully connected hidden layers. The results deduced from the simulation proposed do not represent the optimal CNN hyper-parameters configuration. Further increase in the number of convolution layers and the number of training dataset can be enhanced the CNN performance. APPENDIX Appendix 1: Average and max-pooling operations Cpj =     cp11 cp12 · · · cp1(u−v+1) cp21 cp22 · · · cp2(u−v+1) . . . . . . ... . . . cp(u−v+1)1 cp(u−v+1)2 · · · cp(u−v+1)(u−v+1)     (31) The pooling map defined as follows PjAve =       P11 P12 · · · P1 (u−v+1) 2 P21 P22 · · · P2 (u−v+1) 2 . . . . . . ... . . . P(u−v+1) 2 1 P(u−v+1) 2 2 · · · P(u−v+1) 2 (u−v+1) 2       (32) For the Average Pooling Operation, P11 = cp11+cp12+cp21+cp22 4 P1 (u−v+1) 2 = cp1(u−v)+cp1(u−v+1)+cp2(u−v)+cp2(u−v+1) 4 P(u−v+1) 2 1 = cp(u−v)1+cp(u−v)2+cp(u−v+1)1+cp(u−v+1)2 4 P(u−v+1) 2 (u−v+1) 2 = cp(u−v)(u−v)+cp(u−v)(u−v+1)+cp(u−v+1)(u−v)+cp(u−v+1)(u−v+1) 4 For the max Pooling Operation, P11 = max(cp11, cp12, cp21, cp22) P1 (u−v+1) 2 = max(cp1(u−v), cp1(u−v+1), cp2(u−v), cp2(u−v+1)) P(u−v+1) 2 1 = max(cp(u−v)1, cp(u−v)2, cp(u−v+1)1, cp(u−v+1)2) P(u−v+1) 2 (u−v+1) 2 = max(cp(u−v)(u−v), cp(u−v)(u−v+1), cp(u−v+1)(u−v), cp(u−v+1)(u−v+1)) Appendix 2: Matrix Operation For a CNN model consisting of an input image X ∈ ℜ(u×u) convolved with a filter F ∈ ℜ(v×v) X =       X1T X2T . . . XuT       , F =       F1T F2T . . . FvT       where, XiT = [ xi 1 xi 2 · · · xi u ] FiT = [ fi 1 fi 2 · · · fi v ] We define XiT j− →k as follows XiT j− →k = [ xi j xi j+1 · · · xi k ] The proposed matrix operation that substituted the classical con- volution operation is: X ~ F = Xx|((u−v+1)2×v2) · F̄|(v2×1) =                                                 XuT 1− →v X (u−1)T 1− →v · · · X1T 1− →v XuT 2− →(v+1) X (u−1)T 2− →(v+1) · · · X1T 2− →(v+1) . . . . . . ... . . . XuT (u−v+1)− →1 X (u−1)T (u−v+1)− →1 · · · X1T (u−v+1)− →1                 X (u−1)T 1− →v X (u−2)T 1− →v · · · X2T 1− →v X (u−1)T 2− →(v+1) X (u−2)T 2− →(v+1) · · · X2T 2− →(v+1) . . . . . . ... . . . X (u−1)T (u−v+1)− →1 X (u−2)T (u−v+1)− →1 · · · X2T (u−v+1)− →1         . . .         XvT 1− →v X (u−v)T 1− →v · · · X (u−v+1)T 1− →v XvT 2− →(v+1) X (u−v)T 2− →(v+1) · · · X (u−v+1)T 2− →(v+1) . . . . . . ... . . . XvT (u−v+1)− →1 X (u−v)T (u−v+1)− →1 · · · X (u−v+1)T (u−v+1)− →1                                                      F1 F2 . . . Fv      (33) Appendix 3: The definition of (∗) operation Let’s define the operation (∗) for any vector U and V : U(n×1) =     u1 u2 . . . un     and V(n×1) =     v1 v2 . . . vn     U ∗ V =     u1v1 u2v2 . . . 
REFERENCES

[1] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," Proceedings of the Advances in Neural Information Processing Systems, pp. 1097–1105, 2012.
[2] K. Fukushima, "Neocognitron: A Self-organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position," Biological Cybernetics, vol. 36, Springer, pp. 193–202, 1980.
[3] Y. LeCun et al., "Handwritten digit recognition with a backpropagation network," in Advances in Neural Information Processing Systems, pp. 396–404, 1990.
[4] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, pp. 2278–2324, 1998.
[5] D. Steinkraus, I. Buck, and P. Y. Simard, "Using GPUs for machine learning algorithms," Proceedings of the Eighth International Conference on Document Analysis and Recognition, pp. 1115–1120, 2005.
[6] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," conference paper at ICLR, pp. 1–14, 2015.
[7] K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–9, 2016.
[8] S. Xie, R. Girshick, P. Dollár, Z. Tu, and K. He, "Aggregated Residual Transformations for Deep Neural Networks," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1–9, 2017.
[9] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. Alemi, "Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning," arXiv preprint arXiv:1602.07261v2, vol. 131, no. 2, pp. 262–263, 2016.
[10] H. Jang, H.-J. Yang, D.-S. Jeong, and H. Lee, "Object classification using CNN for video traffic detection system," 2015 21st Korea-Japan Joint Workshop on Frontiers of Computer Vision (FCV), 2015.
[11] H. Yanagisawa, T. Yamashita, and H. Watanabe, "A Study on Object Detection Method from Manga Images using CNN," 2018 International Workshop on Advanced Image Technology (IWAIT), 2018.
[12] R. B. Arif, M. A. B. Siddique, M. M. R. Khan, and M. R. Oishe, "Study and Observation of the Variations of Accuracies for Handwritten Digits Recognition with Various Hidden Layers and Epochs using Convolutional Neural Network," 4th International Conference on Electrical Engineering and Information and Communication Technology, 2018.
[13] V. Gullapalli, "A Comparison of Supervised and Reinforcement Learning Methods on a Reinforcement Learning Task," Proceedings of the 1991 IEEE International Symposium on Intelligent Control, pp. 394–399, 1991.
[14] N. Wagaa and H. Kallel, "Recursive Supervised Artificial Neural Network Algorithm for Data Classification and Regression," unpublished.
[15] M. A. B. Siddique, M. M. R. Khan, R. B. Arif, and Z. Ashrafi, "Study and Observation of the Variations of Accuracies for Handwritten Digits Recognition with Various Hidden Layers and Epochs using Neural Network Algorithm," 4th International Conference on Electrical Engineering and Information and Communication Technology, pp. 118–123, 2018.
[16] V. E. Ismailov, "On the approximation by neural networks with bounded number of neurons in hidden layers," Journal of Mathematical Analysis and Applications, pp. 963–969, 2014.
[17] M. Rastegari, V. Ordonez, J. Redmon, and A. Farhadi, "XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks," Lecture Notes in Computer Science, Springer, pp. 525–542, 2016.
[18] D. Scherer, A. Müller, and S. Behnke, "Evaluation of Pooling Operations in Convolutional Architectures for Object Recognition," International Conference on Artificial Neural Networks, Springer, pp. 92–101, 2010.
[19] I. B. Dlimi and H. Kallel, "Robust Neural Control for Robotic Manipulators," International Journal of Enhanced Research in Science, Technology and Engineering, vol. 5, no. 2, pp. 198–205, 2016.
[20] M. Dong, Y. Li, X. Tang, J. Xu, S. Bi, and Y. Cai, "Variable Convolution and Pooling Convolutional Neural Network for Text Sentiment Classification," IEEE Access, 2020.
[21] X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," 13th International Conference on Artificial Intelligence and Statistics, pp. 249–256, 2010.
[22] K. He, X. Zhang, S. Ren, and J. Sun, "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification," 2015 IEEE International Conference on Computer Vision, pp. 1026–1034, 2015.
[23] P. Dangeti, "Statistics for Machine Learning: Techniques for exploring supervised, unsupervised, and reinforcement learning models with Python and R," Packt Publishing, 2017.
[24] T. Takase, S. Oyama, and M. Kurihara, "Effective neural network training with adaptive learning rate based on training loss," Neural Networks, pp. 68–78, 2018.
[25] Y. LeCun, "The MNIST database of handwritten digits," http://yann.lecun.com/exdb/mnist/, 1998.
[26] I. B. Dlimi and H. Kallel, "Optimal neural control for constrained robotic manipulators," 2010 5th IEEE International Conference on Intelligent Systems, pp. 302–308, 2010.