Research Methodology & Intellectual Property Rights Series 2
1. Major Project Presentation
on
REAL TIME OBJECT RECOGNITION FOR VISUALLY
IMPAIRED PEOPLE
Mahatma Gandhi Mission’s College Of
Engineering & Technology
A-09, Sector 62, Noida, Uttar Pradesh 201301
Submitted by:
Vikas Kumar Pandey (Roll No. 1900950310011)
Akshay Kumar (Roll No. 1900950310002)
Hariom (Roll No. 190950310006)
2. Content
Introduction
Problems faced by blind people
Literature review
Objective
Block diagram
YOLO algorithm
Block diagram of YOLO algorithm
Object detection
Database used
Methodology
Flow chart
Hardware used
Advantages of YOLO algorithm
Survey
Advantages
Conclusion
Future work
Reference
3. Introduction
The World Health Organization (WHO) conducted a survey covering around 7,889 million people.
The statistics showed that, among the population surveyed, 253 million were visually impaired.[4]
Many visually impaired people face numerous problems in our society.
The device developed can detect the objects in the user's surroundings.
A model has been proposed that helps visually impaired people detect objects in their surroundings. The output of the system is in audio form, which a blind user can easily understand.
4. Problem faced by blind people
Visually Impaired People confront many problems in recognizing the
objects.
Blind people doesn’t able to recognize the objects next to them.
This is developed to detect the objects in the user's surroundings.
It will also solve the problem of keeping a walking stick.
5. Literature review
1. “The authors in (Seema et al., 2016) suggested a smart system that guides a blind person.[1]”
• The system detects obstacles that could not be detected by the user's cane. However, the proposed system was designed only to protect the blind person in the area near his/her head.
Problem statement - A buzzer and a vibrator were employed as output modes for the user. This is useful only for obstacle detection at head level, without recognizing the type of obstacle.
6. Contd.
2. “A modification of several systems used in visual recognition was proposed in 2014.[2]”
• The authors used fast feature pyramids and provided findings on general object detection systems. The results showed that the proposed scheme can be used strictly for wide-spectrum images.
Problem statement - It does not succeed for narrow-spectrum images; hence, their work cannot serve as an efficient general object detector.
7. Contd.
3. “In (Nazli Mohajeri et al., 2011) the authors suggested a two-camera system to capture photos.”[3]
• However, the proposed system was tested under only three conditions and for three objects. Specific obstacles at distances of about 70 cm from the cameras were detected.
Problem statement - The results showed some range of error. Blind-assistance systems need to cover more cases with efficient and satisfactory results.
8. Objective
This project aims to relieve some of these problems using assistive technology.
Simply put, it is a technique for real-time stationary object recognition.
To make visually impaired people independent.
To provide a device for the detection of objects.
Our main aim is an object recognition function: the device should be able to detect certain items through the camera and return an audio output announcing what each item is. In order to recognize objects, machine learning has to be involved.
10. Object detection
Object detection is a task in computer vision that
involves detecting various objects in digital images or
videos.
Some of the objects detected include people, cars, chairs, stones,
buildings, and animals.
It identifies the objects in a specific image and
establishes the exact location of each object within the image.
11. EXISTING ALGORITHMS
Sr no | Algorithm | Advantage | Disadvantage
1 | ResNet | Solves the degradation problem through shortcut (skip) connections. | For a deeper network, the detection of errors becomes difficult.
2 | R-CNN | Very accurate at image recognition and classification. | Fails to encode the position and orientation of objects.
3 | Fast R-CNN | Saves time compared to traditional algorithms like Selective Search. | Still uses the Selective Search algorithm, which is slow and time-consuming.
4 | SSD | Makes more predictions and has better coverage of location, scale, and aspect ratios. | Shallow layers in a neural network may not generate enough high-level features to predict small objects.
5 | YOLO | Allows real-time object detection; the system trains in a single pass; more efficient and fast. | Struggles to detect close objects, because each grid cell can propose only 2 bounding boxes.
12. ALGORITHM SELECTION
The YOLOv4 performance was evaluated using previous YOLO versions (YOLOv3 and YOLOv2) as baselines.
The new YOLOv4 shows the best speed-to-accuracy balance compared to state-of-the-art object detectors.
In general, YOLOv4 surpasses all previous object detectors in terms of both speed and accuracy, ranging from 5 FPS to as much as 160 FPS.
The YOLOv4 algorithm achieves the highest accuracy among all other real-time object detection models while achieving 30 FPS or higher on a GPU.
13. YOLO algorithm
YOLO is an abbreviation for the term 'You Only Look Once'.
Created by Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi.
The YOLO algorithm detects and recognizes various objects in a picture.
Object detection in YOLO is framed as a regression problem and provides the class probabilities of the detected objects.
Prediction over the entire image is done in a single algorithmic run.
The YOLO algorithm has various variants, including Tiny YOLO and YOLOv1, v2, v3, and v4.
It is popular because of its speed and accuracy.
14. YOLO evolution
Algorithm Description
The original YOLO - YOLO was the first object detection network to combine the problem of drawing bounding boxes and identifying class labels in one end-to-end differentiable network.
YOLOv2 - YOLOv2 made a number of iterative improvements on top of YOLO, including BatchNorm, higher resolution, and anchor boxes.
YOLOv3 - YOLOv3 built upon previous models by adding an objectness score to bounding box prediction, adding connections to the backbone network layers, and making predictions at three separate levels of granularity to improve performance on smaller objects.
YOLOv4 - YOLOv4 is a one-stage detector with several components. It detects objects in real time, and its speed and accuracy are higher than those of earlier algorithms.
16. CSP DARKNET53
CSPDarknet53 is a convolutional neural network and
backbone for object detection.
It employs a strategy to partition the feature map of the
image into two parts and then merges them through a
cross-stage hierarchy.
The use of a split and merge strategy allows for more
gradient flow through the network.
17. SPATIAL PYRAMID POOLING
A CNN consists of some Convolutional (Conv)
layers followed by some Fully-Connected (FC)
layers. Conv layers don't require a fixed-size
input, but FC layers do, which forces the whole
network to accept a fixed input size.
The solution to this problem lies in the
Spatial Pyramid Pooling (SPP) layer. It is
placed between the last Conv layer and the
first FC layer and removes the fixed-size
constraint of the network.
The goal of the SPP layer is to pool the
variable-size features that come from the
Conv layer and generate fixed-length outputs
that will then be fed to the first FC layer of
the network.
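As an illustration only (not the project's code), a minimal NumPy sketch of spatial pyramid pooling: a feature map of any spatial size is max-pooled into fixed grids (here 1x1, 2x2, and 4x4, an assumed pyramid) and the results are concatenated into a fixed-length vector.

```python
import numpy as np

def spp(feature_map, levels=(1, 2, 4)):
    """Pool an (H, W, C) feature map into a fixed-length vector.

    For each pyramid level n, the map is divided into an n x n grid and
    max-pooled per cell, so the output length is C * sum(n*n) regardless
    of H and W.
    """
    H, W, C = feature_map.shape
    outputs = []
    for n in levels:
        # Bin edges that cover the whole map even when H or W is not divisible by n.
        h_edges = np.linspace(0, H, n + 1).astype(int)
        w_edges = np.linspace(0, W, n + 1).astype(int)
        for i in range(n):
            for j in range(n):
                cell = feature_map[h_edges[i]:h_edges[i+1], w_edges[j]:w_edges[j+1]]
                outputs.append(cell.max(axis=(0, 1)))  # one C-vector per cell
    return np.concatenate(outputs)

# Two different spatial sizes yield the same output length: C * (1 + 4 + 16) = 21 * C.
a = spp(np.random.rand(13, 13, 8))
b = spp(np.random.rand(19, 17, 8))
```

This is what lets the FC layers that follow receive a constant-length input regardless of the image size fed to the Conv layers.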
18. BAG OF FREEBIES AND SPECIALS
‘Bag of Freebies’ (BoF) is a general framework of training strategies for improving
the overall accuracy of an object detection model.
The set of techniques or methods that change the training strategy or training cost
for improvement of model accuracy is termed as Bag of Freebies.
Bag of Specials (BoS) can be considered as an add-on for any object detectors
present right now to make them more accurate.
19. METHODOLOGY
The steps of the object recognition system based on image processing are as follows
–
Image capturing
Image Acquisition
Object detection
YOLO algorithm
Prediction
21. Capturing image
The image is captured by the camera module;
for this purpose, objects are captured both in
real time and when stationary.
22. Image Acquisition
The image is captured by the digital camera
as an RGB image and is converted to a
grayscale version using intensity equation 1.
I = (R + G + B) / 3
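A minimal sketch of this conversion (equation 1) in Python with NumPy; the sample pixel values are made up for illustration:

```python
import numpy as np

def rgb_to_gray(rgb):
    """Convert an (H, W, 3) RGB image to grayscale via I = (R + G + B) / 3."""
    rgb = rgb.astype(np.float64)
    return (rgb[..., 0] + rgb[..., 1] + rgb[..., 2]) / 3.0

img = np.array([[[30, 60, 90], [255, 255, 255]]])  # a 1x2 RGB image
gray = rgb_to_gray(img)  # [[60.0, 255.0]]
```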
23. RESIDUAL BLOCKS
The image is divided into various grids, each with
a dimension of S x S.
It uses dimensions of 3 x 3, 13 x 13, and 19 x 19.
There are many grid cells of equal dimension, and every
grid cell detects the objects that appear within it.
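To illustrate the grid assignment (a sketch with an assumed 13 x 13 grid and a 416 x 416 input): the cell responsible for an object is the one containing the object's center point.

```python
def responsible_cell(cx, cy, img_size=416, S=13):
    """Return the (row, col) of the S x S grid cell containing the point (cx, cy)."""
    cell_size = img_size / S  # 32 pixels per cell for a 416x416 input and S = 13
    return int(cy // cell_size), int(cx // cell_size)

# An object centered at pixel (100, 250) falls in grid cell (7, 3).
row, col = responsible_cell(100, 250)
```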
24. LOCALIZATION
The term 'localization' refers to where in the
image the object is present. In YOLO object
detection we perform classification with localization;
i.e., a supervised learning algorithm is trained
not only to predict the class but also the bounding
box around the object in the image.
Classification + localization = object detection
25. BOUNDING BOXES
A bounding box is an outline that highlights an
object in an image.
Every bounding box in the image consists of
the following attributes:
• Bounding box center (bx, by)
• Height (bh)
• Width (bw)
• Class (for example, person, car, traffic light,
etc.). This is represented by the letter c.
26. BOUNDING BOXES - CONT...
Each cell of the 13x13 feature map detects objects
in the input image via its specified number of
bounding boxes.
In YOLOv4, each cell has 3 bounding boxes, so
the total number of bounding boxes using the 13x13
feature map would be:
(13x13)x3 = 507 bounding boxes.
The remaining bounding boxes are discarded, as
they don't localize the objects in the picture.
27. TARGET LABEL Y
The target label y for this supervised learning task is
explained as:
Y is a vector containing Pc, Bx, By, Bh, Bw, C1, ..., Cn.
Pc is the probability of the presence of a particular class in
the grid cell, with 0 <= Pc <= 1. Pc = 0 means that no
object is found; Pc = 1 means 100% probability that an
object is present.
(Bx, By) defines the mid-point of the object, and (Bh, Bw)
defines the height and width of the bounding box.
Also, if Pc > 0 then there will be n values of C, which
represent the classes of objects present in the image.
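A minimal sketch of such a target vector for an assumed 3-class problem (the class names and box values below are made up for illustration):

```python
# Target label y = [Pc, Bx, By, Bh, Bw, C1, C2, C3]
# Assumed classes: 1 = person, 2 = car, 3 = chair
def make_target(present, bx=0.0, by=0.0, bh=0.0, bw=0.0, cls=None, n_classes=3):
    """Build the target vector; box and class entries matter only when an object is present."""
    y = [1.0 if present else 0.0, bx, by, bh, bw] + [0.0] * n_classes
    if present and cls is not None:
        y[5 + cls - 1] = 1.0  # one-hot class indicator
    return y

y_car = make_target(True, bx=0.5, by=0.4, bh=0.3, bw=0.2, cls=2)
# y_car = [1.0, 0.5, 0.4, 0.3, 0.2, 0.0, 1.0, 0.0]
y_empty = make_target(False)  # Pc = 0; the remaining entries are "don't care"
```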
28. Intersection over union (IOU)
IOU (Intersection over Union) is a term used to
describe the extent of overlap of two boxes. The
greater the region of overlap, the greater the
IOU.
IOU is mainly used in applications related to
object detection, where we train a model to
output a box that fits perfectly around an object.
IOU is also used in the non-max suppression
algorithm.
IOU = Area of Overlap (Intersection) / Area of Union
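A minimal sketch of the IOU computation for two axis-aligned boxes given as (x1, y1, x2, y2) corners (the corner-based box format here is an assumption; YOLO's own boxes use center, width, and height):

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two (x1, y1, x2, y2) boxes."""
    # Corners of the overlap rectangle.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)  # zero when the boxes don't overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Two 2x2 boxes sharing a 1x1 corner: IOU = 1 / (4 + 4 - 1) = 1/7
score = iou((0, 0, 2, 2), (1, 1, 3, 3))
```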
29. NMS- NON MAX SUPRESSION
To select the best bounding box from the multiple predicted bounding
boxes, an algorithm called Non-Max Suppression is used to
"suppress" the less likely bounding boxes and keep only the best one.
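A greedy sketch of non-max suppression (the (x1, y1, x2, y2) box format and the 0.5 threshold are assumptions for illustration): keep the highest-scoring box, drop any box overlapping it beyond the IOU threshold, and repeat.

```python
def iou(a, b):
    """IOU of two (x1, y1, x2, y2) boxes."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2]-a[0])*(a[3]-a[1]) + (b[2]-b[0])*(b[3]-b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes kept after greedy non-max suppression."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Suppress boxes that overlap the chosen box too strongly.
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # the second box overlaps the first and is suppressed
```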
30. Prediction
YOLOv4 makes detections at 3 different points, i.e., layers 82, 94, and 106. The network
down-samples the input image by strides of 32, 16, and 8 at those points,
respectively.
After reaching a stride of 32, the network produces a 13x13 feature map for an
input image of size 416x416.
At another detection layer, where the stride is 16, we obtain a 26x26 output feature
map.
And a 52x52 feature map at the detection layer where the stride is 8. Thus, the total
number of bounding boxes produced by YOLOv4 for a 416x416 input image is:
((13x13)+(26x26)+(52x52))x3 = 10647 bounding boxes per image
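The box count above can be checked in a few lines, using the 416x416 input, the strides of 32/16/8, and 3 boxes per cell stated on this slide:

```python
def total_boxes(img_size=416, strides=(32, 16, 8), boxes_per_cell=3):
    """Sum the bounding boxes produced at each detection scale."""
    total = 0
    for s in strides:
        grid = img_size // s  # grids of 13, 26, 52 for a 416x416 input
        total += grid * grid * boxes_per_cell
    return total

n = total_boxes()  # 10647
```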
31. Database used
COCO dataset – COCO stands for "Common Objects In Context".
It is a large-scale image dataset containing 328,000 images of everyday objects
and humans.
The dataset contains annotations that machine learning models use to learn to
recognize, label, and describe objects.
COCO provides the following types of annotations:
• Object detection
• Captioning
• Key points
• Dense pose
32. Contd:
Object detection includes various approaches such as Fast R-CNN,
RetinaNet, and sliding-window detection, but none of the
aforementioned methods can detect objects in one single run. So there
comes another more efficient and faster algorithm: the YOLO algorithm.
35. Raspberry pi 3B+
The Raspberry Pi 3 Model B+ is the latest product in the Raspberry Pi 3
range, boasting a 64-bit quad-core processor running at 1.4GHz, dual-band
2.4GHz and 5GHz wireless LAN, and Bluetooth 4.2/BLE.
36. Camera module v2
The Raspberry Pi Camera v2 is a custom-designed add-on board for the
Raspberry Pi featuring a high-quality 8-megapixel Sony IMX219 image
sensor and a fixed-focus lens.
37. Advantages of the YOLO algorithm
The YOLO algorithm is important for the following reasons:
Speed: the algorithm improves the speed of detection because it can predict
objects in real time.
High accuracy: YOLO is a predictive technique that provides accurate results. It
uses a convolutional implementation, meaning that with a 3x3 grid (i.e., the
image divided into 9 grid cells) you don't need to run the algorithm 9 times to
validate the presence of an object in each grid cell; it is one single convolutional
pass.
Learning capabilities: the algorithm has excellent learning capabilities that
enable it to learn the representations of objects and apply them in object
detection.
38. Surveys
According to the National Federation of the Blind, blind people can use
all such devices easily, so they can also use our object recognition system.[5]
39. Advantage
This work is implemented using PTTS.
Easy to set up.
Open-source tools were used for this project.
Cheap and cost-efficient.
The project runs entirely on the device; no extra components need
to be purchased.
40. Conclusion
A simple object recognition system
based on the YOLO algorithm has been
proposed.
The system has been written using OpenCV.
41. Future Work
Enhancing the accuracy by building a model of features for each object class.
Working on using local features instead of template matching.
Enhancing the selection of the best frame to be processed for runtime application.
Adding more objects to the database.