B.tech_project_ppt.pptx

INDIAN INSTITUTE OF ENGINEERING SCIENCE
AND TECHNOLOGY, SHIBPUR
Project Name :- “Hardware Implementation of CNN for Sign Language
Recognition”
Supervisor’s Name :- Dr. Ayan Banerjee (Associate Professor, Electronics and Telecommunication Engineering)
Team Members :- Supratik Mondal(510719009), Iraban Mukherjee(510719011), Arnab Dutta(510719002), Sarnava
Ghosh (510719062), Rohit Naskar(510719017)

PERCEPTRON / NEURONS
SINGLE LAYER NETWORK
MULTI LAYER NETWORK
NEURAL NETWORK CONVOLUTIONAL NEURAL
NETWORK(CNN)
Within Deep Learning, a Convolutional
Neural Network or CNN is a type of
artificial neural network, which is widely
used for image/object recognition and
classification. Deep Learning thus recognizes
objects in an image by using a CNN.

CNN BASIC ARCHITECTURE:
MOTIVATION OF THE
PROJECT:

APPLICATIONS OF CONVOLUTION NEURAL NETWORK

ALGORITHM USED :
Here the CNN (Convolution Neural Network) algorithm has been used for
sign language detection.
The functional blocks for CNN are :
▪ Convolution block
▪ Activation function (Relu) block
▪ Max pooling block
▪ Fully Connected Network (FCN) block
▪ Softmaxing block

Software Implementation :
A Python programme was used to test the precision with which a gesture (or image) may be properly
detected. Convolution arithmetic and Max Pooling routines are used to do this.
Convolution & Convolutional Layer :
In deep learning, a convolutional neural network (CNN or ConvNet) is a class of deep neural networks, that
are typically used to recognize patterns present in images but they are also used for spatial data analysis,
computer vision, natural language processing, signal processing, and various other purposes The architecture
of a Convolutional Network resembles the connectivity pattern of neurons in the Human Brain and was
inspired by the organization of the Visual Cortex. This specific type of Artificial Neural Network gets its
name from one of the most important operations in the network: convolution.
The first layer of a Convolutional Neural Network is always a Convolutional Layer. Convolutional layers
apply a convolution operation to the input, passing the result to the next layer. A convolution converts all the
pixels in its receptive field into a single value. For example, if you would apply a convolution to an image,
you will be decreasing the image size as well as bringing all the information in the field together into a single
pixel. The final output of the convolutional layer is a vector. Based on the type of problem we need to solve
and on the kind of features we are looking to learn, we can use different kinds of convolutions.

Pooling operations :
The output shape of a convolutional layer is influenced by the shape of its input as well as the
kernel used.In CNNs, pooling procedures are a crucial component in addition to discrete
convolutions themselves. By applying a function to summarize the subregions, such as taking
the average or the maximum value, pooling processes reduce the size of the feature maps. By
swiping a window across the input and feeding its content to a pooling function, pooling
operates. Pooling functions somewhat similarly to a discrete convolution, but uses a different
function in place of the linear combination represented by the kernel. Pooling comes in a
variety of forms, including minimum, maximum, average, and more.
Here, Max Pool is used.
Dropout :
A typical characteristic of CNN is Dropout Layer. The Dropout is a mask that nullifies the
contribution of some neurons towards the next layer and leaves unmodified all others.
Dropout refers to data, or noise, that's intentionally dropped from a neural network to
improve processing and time to results.
Filters :
The depth of a filter in a CNN must match the depth of input image. The number of colour channels
in a filter must remain same as the input image. Filter for each layer are randomly initialized based
on either Gaussian or Normal distribution.

Defining the CNN :
Saving the best Model :

Plot of the parameters of the best model :
Training Loss Correction :
SOFTWARE RESULTS:

Dropout :
Data Augmentation :
evaluate_model(best_model, X_test, y_test, label_binarizer)
225/225 [==============================] - 7s 28ms/step - loss: 9.8487 -
accuracy: 0.9689
Loss: 9.849 Accuracy: 0.969
Accuracy 96%

SOFTWARE to HARDWARE TRANSLATION:
• Hardware model gives us more flexibility than software model.
• The proper use of functionalities provides more optimization in realizing
the algorithms over software model.
• Hardware modelling is more efficient in terms of power.

HARDWARE IMPLEMENTATION :
▪ NUMBER SYSTEM FORMAT
▪ LAYER SPECIFICATIONS
▪ CONVOLUTION
▪ ReLU OPERATION
▪ FIFO (FAST IN FAST OUT)
▪ MAX POOLING

❖ NUMBER SYSTEM FORMAT:
In our work we have incorporated 17-bits fixed point number
system format for defining weights and pixel values.
S
Integer Bits
Sign Bit
Fractional Bits

❖ LAYER SPECIFICATIONS:
❖ Here we have incorporated five CNN layers with mentioned specifications (as
shown in the diagram) along with single layers FCN, with terminated by
softmaxing layer for probabilistic output

❖ INSIDE THE LAYERS:
Functional
blocks in each
layer :
▪ Weight buffer
▪ Convolution block
▪ Intermediate FIFO
▪ Relu Array block
▪ Max Pooling block
▪ Terminal FIFO

❖ CONVOLUTION :
▪ Convolution with single channel kernels ▪ Convolution with multiple channel kernels
❖ Let the dimension of input feature map is
(mxm) and that of kernel is (nxn).
❖ So, the dimension of the convolved output
is 🡺 (m – (n-1))x(m – (n-1)) [where,
stride = 1 ]

❖ PROPOSED CONVOLUTION ALGORITHM:
(By taking example of (4x4) image with three (3x3) kernels)
a1 a2 a3 a4
a5 a6 a7 a8
a9 a10 a11 a12
a13 a14 a15 a16
K1 K2 K3
K4 K5 K6
K7 K8 K9
J1 J2 J3
J4 J5 J6
J7 J8 J9
L1 L2 L3
L4 L5 L6
L7 L8 L9
This is a pipeline technique (with
parallelism) of convolution, where kernels
will take the position in the image matrix
which is empty from the SOP process at any
particular clock stamp.
∗

❖ PROPOSED CONVOLUTION ARCHITECTURE:
❖ INSIDE EACH CONVOLUTION BLOCK:

❖ INSIDE EACH STAGE BLOCK:
Functional blocks
in each stage :
▪ MUX Array
▪ Stride Generator (SG)
▪ Convolver
▪ Multi Input Adder
▪ Number of MUX
Arrays = Number of
SGs = Number of
convolvers =
k_number
▪ Adder inputs =
k_number

❖ FIFO (Fast In Fast Out) BLOCK:
▪ Intermediate FIFO Block ▪ Terminal FIFO Block

❖ RELU BLOCK:
Relu block for single channel
Relu Array block (consisting “k_number”
number of relu blocks) for “k_number”
number of channels

❖ MAX POOLING BLOCK:
Max-Pooling block for single
channel
Max-Pooling Array block (consisting
“k_number” number of max-pooling
blocks) for “k_number” number of
channels

❖ MAX DETECTOR BLOCK:
Functional blocks:
▪ Loadable Left Shift Registers
▪ Magnitude Comparator
▪ 2-to-1 Mux
▪ Data Register

The figure (fig. 1) displays the implementation
of the proposed convolution architecture (3-D
convolution with single kernel) for image size
🡺 4x4x3 and kernel size 🡺 3x3x3 (means, N=4,
k=3, n=3).
▪ Number of MUX-ARRAY = 3
▪ Number of CONVOLVER = 3
▪ Number of IMAGE BUFFER & C.U = 1
▪ Number of final FA = 1
▪ Dimension of the final FA = 3-input
The image and kernel matrices whose
convolution architecture has been displayed is
shown below:
0 0 1 2
3 4 1 1
2 3 4 8
10 12 14 18
0 1 1 0
0 1 1 0
0 1 1 0
0 1 1 0
1 1 2 2
3 3 4 4
5 6 7 8
9 9 10 20
1 1 1
1 1 1
1 1 1
1 1 1
1 1 1
1 1 1
1 1 1
1 1 1
1 1 1
56 67
115 142
OUTPUTS, WAVEFORM :

❖OUTPUT NEURONS:
Single Output Neuron Block
Output Neuron Block Array
▪ Data Registers
▪ Multiplier array
▪ Compressor array
▪ There will be “k_number”
number of output neurons in an
output neuron block array

❖SOFT-MAXING:
Here, the input – output relation in
the soft-max layer is:

Architecture of single soft-maxing
Architecture of Soft-maxing Array
▪ Exponential Block
▪ Adder Block
▪ Divider
▪ It consist of “k_number” number of
soft-maxing blocks

❖FPGA-to-DISPLAY MONITOR INTERFACING:
VGA controller system VGA-to-HDMI controller system
▪ Frame counter
▪ Stream Control
▪ Decoding Circuit
▪ VGA controller
▪ CMT
▪ VGA to HDMI

❖RESULTS AND DISCUSSIONS:
▪ HARDWARE RESULTS:
Sl. Number Tested Samples Accuracy (%)
1. 10 98.7
2. 50 90
3. 100 88.9
4. 150 88.5
5. 500 87
Vivado Wrokspace Timing Diagram
Viavdo generated accuracy Accuracy Table

FPGA BOARD DETAILS:
Field Programmable Gate Arrays (FPGAs) are semiconductor devices that are based
around a matrix of configurable logic blocks (CLBs) connected via programmable
interconnects. FPGAs can be reprogrammed to desired application or functionality
requirements after manufacturing.

Xilinx Zynq Ultrascale+
Zynq® UltraScale+™ MPSoC devices provide 64-bit
processor scalability while combining real-time
control with soft and hard engines for graphics, video,
waveform, and packet processing. Built on a common
real-time processor and programmable logic equipped
platform, three distinct variants include dual
application processor (CG) devices, quad application
processor and GPU (EG) devices, and video codec
(EV) devices, creating unlimited possibilities for
applications such as 5G Wireless, Next-generation
ADAS, and Industrial Internet-of-Things.

Camera Integration:
Here we are using see3cam provided with the
Xilinx Zynq Ultrascale+ ZCU104 evolution board the
camera is connected with the FPGA with USB port.
The camera module uses AXI protocol to communicate
With the FPGA board.

AXI Protocol:
 The AXI (Advanced extensible Interface) protocol is a popular one in the integration of
System-on-Chip (SoC) and digital design.
 The AXI protocol establishes a set of guidelines and norms for the interaction and
communication between various SoC or FPGA system components. It offers a standardised
interface for tying together IP (Intellectual Property) cores such CPUs, memory controllers,
DMA (Direct Memory Access) controllers, and unique logic blocks.

References:
1. “Efficient Fast Convolution Architectures for Convolutional Neural Network”, Weihong Xu, Zhongfeng Wang, Xiaohu You, and
Chuan Zhang
2. “Efficient Processing of Deep Neural Networks: A Tutorial and Survey”, Vivienne Sze; Yu-Hsin Chen; Tien-Ju Yang; Joel S.
Emer
3. “Use Vivado to build an Embedded System”, Lab Workbook
4. “Writing Basic Software Application”, Lab Workbook
5. “Zynq UltraScale+ Device Technical Reference Manual”, UG1085 (v2.2) December 4, 2020
6. https://www.avnet.com/opasdata/d120001/medias/docus/204/FMC_MULTICAM4_REVISION_Tutorial_2018_2_03.pdf
7. https://thedatabus.io/convolver
8. https://www.xilinx.com/products/boards-and-kits/zcu104.html
9. https://www.e-consystems.com/blog/camera/technology/mipi-camera-vs-usb-camera-a-detailed-comparison/
10. https://ietresearch.onlinelibrary.wiley.com/doi/full/10.1049/cds2.12074
CONCLUSION:
The sign language implementation design was implemented. This is used to bridge the gap between the people
with hearing and speaking disabilities with normal people. The project was run completely and successfully in
software level as a Python code using Convolution, Max Pooling and the results are obtained as shown above.
The project was also synthesized in Vivado and ran successfully. The camera and the ZCU104 board were
integrated and clear snaps were taken.

B.tech_project_ppt.pptx

Recommended

Recommended

More Related Content

Similar to B.tech_project_ppt.pptx

Similar to B.tech_project_ppt.pptx (20)

Recently uploaded

Recently uploaded (20)

B.tech_project_ppt.pptx