ArtificialIntelligenceInObjectDetection-Report.pdf

See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/342733702
Artiﬁcial Intelligence In Object Detection - Report
Preprint · January 2020
1 author:
Ashish Kumar
National Taipei University of Technology
All content following this page was uploaded by Ashish Kumar on 07 July 2020.

Ashish Kumar
FC Report
Abstract— In this report on Artificial intelligence in object
detection, major developments in this research field are presented,
and my main research on face and motion detection is briefly
explained. Different architectures based on the convolutional
neural network, a class of deep neural network, are studied, and
different methodologies for object detection are presented and
compared. An approach to 3D modelling using fuzzy logic is also
presented.
Index Terms— 3D modelling, Object Detection, AI, Fuzzy logic
I. INTRODUCTION
In computer science, artificial intelligence (AI), sometimes
called machine intelligence, is intelligence demonstrated
by machines, in contrast to the natural intelligence displayed by
humans. AI textbooks define the field as the study of
"intelligent agents": any device that perceives its environment
and takes actions that maximize its chance of successfully
achieving its goals [1]. Colloquially, the term "artificial
intelligence" is often used to describe machines (or computers)
that mimic "cognitive" functions that humans associate with
the human mind, such as "learning" and "problem solving" [2].
The traditional problems of AI research include reasoning, knowledge
representation, planning, learning, natural language
processing, perception, and the ability to move and manipulate
objects [3]. General intelligence is among the field's long-term
goals [4]. Approaches include statistical
methods, computational intelligence, and traditional symbolic
AI. Many tools are used in AI, including versions of search and
mathematical optimization, artificial neural networks,
and methods based on statistics, probability, and economics.
A fuzzy control system is a control system based on fuzzy logic,
i.e. a mathematical system that analyzes analog input values in
terms of logical variables that take on continuous values
between 0 and 1, in contrast to classical or digital logic, which
operates on discrete values of either 1 or 0 (true or false,
respectively). Fuzzy logic is a form of many-valued logic in
which the truth values of variables may be any real
number between 0 and 1 both inclusive. It is employed to
handle the concept of partial truth, where the truth value may
range between completely true and completely false [5]. By
contrast, in Boolean logic, the truth values of variables may
only be the integer values 0 or 1. Fuzzy logic is based on the
observation that people make decisions based on imprecise and
non-numerical information. Fuzzy models or sets are
mathematical means of representing vagueness and imprecise
information. These models have the capability of recognizing,
representing, manipulating, interpreting, and utilizing data and
information that are vague and lack certainty.
Object detection is a computer vision technique whose aim is to detect
objects such as cars, buildings, and human beings. The objects can
generally be identified from either pictures or video feeds. Object
detection has been applied widely in video surveillance, self-driving
cars, and object/people tracking, and in computer vision tasks such as
face detection, face recognition, and video object co-segmentation.
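The fuzzy truth values described earlier in this section can be illustrated with a minimal Python sketch. It assumes a triangular membership function and the min/max/complement operators, which are one common choice among several; the temperature variable and its breakpoints are hypothetical and serve only as an example.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 outside (a, c), rising to 1 at the peak b.

    Assumes a < b < c. The shape is an illustrative assumption.
    """
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def fuzzy_and(p, q):
    """Fuzzy conjunction using the minimum operator."""
    return min(p, q)


def fuzzy_or(p, q):
    """Fuzzy disjunction using the maximum operator."""
    return max(p, q)


def fuzzy_not(p):
    """Fuzzy negation as the complement 1 - p."""
    return 1.0 - p


# Hypothetical linguistic terms for a temperature of 22 degrees.
warm = triangular(22.0, 15.0, 25.0, 35.0)  # partially true
hot = triangular(22.0, 20.0, 35.0, 50.0)   # only slightly true
print(warm, hot, fuzzy_and(warm, hot), fuzzy_or(warm, hot))
```

In contrast to Boolean logic, both statements are true to some degree at the same time, and the combined truth values stay within the interval from 0 to 1.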
My research project is based on face and motion detection using the
Python programming language, but this report mainly focuses on AI and
object detection.
The remainder of this report is structured as follows. Section II
discusses AI, machine learning (ML), and deep learning (DL). Section
III explains model architectures for object detection. Results and
discussion are presented in Section IV. Conclusions and future study
directions are given in Section V.
II. AI VS ML VS DL
Artificial Intelligence in Object Detection
Ashish Kumar
Department of Electrical Engineering and Computer Science
Taipei, Taiwan 10608
Email: t108998404@ntut.edu.tw

Fig. 1: Showing Artificial Intelligence is a superset within
which Machine Learning and Deep Learning belong
[Google images]
AI is human-like intelligence demonstrated by machines to perform
simple to complex tasks, whereas ML gives machines the ability to
learn and understand without being explicitly programmed. The idea
behind AI is to program machines to carry out tasks in more human or
smarter ways, and ML is the key to teaching computers to think and
understand as we do. DL, in turn, is the subset of ML that learns
with deep (many-layered) neural networks.
Methods for object detection generally fall into either machine
learning-based approaches or deep learning-based approaches.
For machine learning approaches, it is necessary to first define
features using one of the methods below and then use a technique
such as a support vector machine (SVM) to do the classification.
Deep learning techniques, on the other hand, can do end-to-end
object detection without specifically defined features and are
typically based on convolutional neural networks (CNN).
• Machine Learning approaches:
  - Viola–Jones object detection framework based on Haar features
  - Scale-invariant feature transform (SIFT)
  - Histogram of oriented gradients (HOG) features
• Deep Learning approaches:
  - Region Proposals (R-CNN, Fast R-CNN, Faster R-CNN)
  - Single Shot MultiBox Detector (SSD)
  - You Only Look Once (YOLO)
  - Single-Shot Refinement Neural Network for Object Detection (RefineDet)
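To make the hand-crafted feature step of the machine learning approaches more concrete, the following Python sketch computes a heavily simplified histogram of gradient orientations for one grayscale patch. It is not the full HOG descriptor (it has no cells, blocks, or block normalization), and the function name and the 9-bin default are assumptions made for illustration. A classifier such as an SVM would then be trained on vectors of this kind.

```python
import math


def orientation_histogram(patch, bins=9):
    """Simplified HOG-style feature for a 2D list of pixel intensities.

    Gradients use central differences on interior pixels; orientations
    are unsigned (0 to 180 degrees) and weighted by gradient magnitude.
    """
    height, width = len(patch), len(patch[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = patch[y][x + 1] - patch[y][x - 1]
            gy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue  # flat pixel, no orientation to record
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    total = sum(hist)
    # Normalize so the feature does not depend on overall contrast.
    return [v / total for v in hist] if total else hist


# A patch with a vertical edge: all gradient energy falls in the first bin.
vertical_edge = [[0, 0, 10, 10] for _ in range(4)]
print(orientation_histogram(vertical_edge))
```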
III. MODEL ARCHITECTURES FOR OBJECT
DETECTION
In this section some famous model architectures for object
detection and their main principles are shown:
1. R-CNN
To avoid having to classify a huge number of candidate regions, Ross
Girshick et al. proposed a method that uses selective search to
extract only about 2000 regions from the image, which they called
region proposals [6]. Instead of trying to classify a huge number of
regions, one can work with just these 2000. The region proposals are
generated using the selective search algorithm outlined below.
Selective Search:
1. Generate an initial sub-segmentation of the image to obtain many
candidate regions.
2. Use a greedy algorithm to recursively combine similar regions
into larger ones.
3. Use the generated regions to produce the final candidate region
proposals.
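The three steps above can be sketched in Python as follows. This is a toy version under strong assumptions: the initial sub-segmentation is given as input, each region is reduced to a bounding box, a mean intensity, and a pixel count, and similarity is just the difference in mean intensity. The real algorithm works on an image segmentation and combines several similarity measures; all names here are hypothetical.

```python
def merge_boxes(a, b):
    """Smallest box (x1, y1, x2, y2) that encloses both boxes."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def touches(a, b):
    """True if two boxes overlap or share a border."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def selective_search_sketch(regions):
    """regions: list of (box, mean_intensity, pixel_count) tuples.

    Returns every box seen during the greedy merging as a proposal.
    """
    regions = list(regions)
    proposals = [r[0] for r in regions]          # step 1: initial regions
    while len(regions) > 1:
        best = None
        for i in range(len(regions)):            # step 2: most similar neighbours
            for j in range(i + 1, len(regions)):
                if not touches(regions[i][0], regions[j][0]):
                    continue
                diff = abs(regions[i][1] - regions[j][1])
                if best is None or diff < best[0]:
                    best = (diff, i, j)
        if best is None:
            break                                # no neighbouring regions left
        _, i, j = best
        (box_i, mean_i, n_i), (box_j, mean_j, n_j) = regions[i], regions[j]
        merged = (merge_boxes(box_i, box_j),
                  (mean_i * n_i + mean_j * n_j) / (n_i + n_j),
                  n_i + n_j)
        regions = [r for k, r in enumerate(regions) if k not in (i, j)]
        regions.append(merged)
        proposals.append(merged[0])              # step 3: collect proposals
    return proposals


initial = [((0, 0, 4, 4), 10.0, 16), ((4, 0, 8, 4), 12.0, 16), ((8, 0, 12, 4), 200.0, 16)]
print(selective_search_sketch(initial)[-1])  # (0, 0, 12, 4)
```

Because every intermediate merge is kept, the output contains proposals at several scales, from the small initial regions up to the whole image.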
Fig. 2: RCNN: Regions with CNN features [6]
These 2000 candidate region proposals are warped into a square
and fed into a convolutional neural network that produces a
4096-dimensional feature vector as output. The CNN acts as a
feature extractor: the output dense layer holds the features
extracted from the image, and these features are fed into an SVM
to classify the presence of the object within that candidate
region proposal. In addition to predicting the presence of an
object within the region proposals, the algorithm also predicts
four offset values to increase the precision of the bounding box.
For example, given a region proposal, the algorithm might predict
the presence of a person even though the face of that person is
cut in half by the region proposal. The offset values help adjust
the bounding box of the region proposal in such cases.
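How four offset values can adjust a proposal is sketched below in Python. The sketch assumes the parameterization used in the R-CNN paper [6], in which the centre is shifted in proportion to the box size and the width and height are scaled exponentially. The function name and the example numbers are hypothetical.

```python
import math


def apply_offsets(proposal, offsets):
    """Refine a proposal box with four predicted offsets.

    proposal: (cx, cy, w, h) as centre coordinates, width, and height.
    offsets:  (dx, dy, dw, dh) as predicted by the regressor.
    """
    cx, cy, w, h = proposal
    dx, dy, dw, dh = offsets
    new_cx = cx + w * dx          # shift centre relative to box width
    new_cy = cy + h * dy          # shift centre relative to box height
    new_w = w * math.exp(dw)      # scale width, always positive
    new_h = h * math.exp(dh)      # scale height, always positive
    return (new_cx, new_cy, new_w, new_h)


# A proposal that cuts off the top of a person is moved up and made taller.
print(apply_offsets((50.0, 50.0, 20.0, 40.0), (0.0, -0.25, 0.0, math.log(1.5))))
```

Zero offsets leave the proposal unchanged, so the regressor only has to learn a small correction for each proposal.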
The major problems with R-CNN are as follows: (1) It still takes a
huge amount of time to train the network, since 2000 region
proposals have to be classified per image. (2) It cannot be used in
real time, as it takes around 47 seconds per test image. (3) The
selective search algorithm is fixed, so no learning happens at that
stage. This could lead to the generation of bad candidate region
proposals.
2. Fast R-CNN
Fig. 3: Architecture Fast – RCNN [7]
Ross Girshick addressed some of the drawbacks of R-CNN to build a
faster object detection algorithm called Fast R-CNN. The approach is
similar to R-CNN, but instead of feeding the region proposals to the
CNN, the input image is fed to the CNN to generate a convolutional
feature map. From the convolutional feature map, the region
proposals are identified and warped into squares, and an RoI pooling
layer reshapes them into a fixed size so that they can be fed into a
fully connected layer [7]. From the RoI feature vector, a softmax
layer predicts the class of the proposed region, and the offset
values for the bounding box are predicted as well. Fast R-CNN is
faster than R-CNN because the 2000 region proposals do not have to
be fed to the convolutional neural network every time. Instead, the
convolution operation is done only once per image, and a feature
map is generated from it.
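The role of the RoI pooling layer can be illustrated with a small Python sketch. It assumes a single-channel feature map stored as a nested list and a region given in feature-map cell coordinates that lies inside the map; the function name and the 2 x 2 output size are illustrative choices, not the layer sizes used in the paper.

```python
def roi_max_pool(feature_map, roi, out_h=2, out_w=2):
    """Max-pool one region of interest into a fixed out_h x out_w grid.

    feature_map: 2D list of numbers (one channel).
    roi: (x1, y1, x2, y2) in feature-map cells, with x2 and y2 exclusive.
    """
    x1, y1, x2, y2 = roi
    roi_h, roi_w = y2 - y1, x2 - x1
    pooled = []
    for i in range(out_h):
        # Bin boundaries: floor for the start, ceiling for the end.
        ys = y1 + (i * roi_h) // out_h
        ye = y1 + ((i + 1) * roi_h + out_h - 1) // out_h
        row = []
        for j in range(out_w):
            xs = x1 + (j * roi_w) // out_w
            xe = x1 + ((j + 1) * roi_w + out_w - 1) // out_w
            row.append(max(feature_map[y][x]
                           for y in range(ys, ye)
                           for x in range(xs, xe)))
        pooled.append(row)
    return pooled


fm = [
    [1, 2, 3, 4, 5, 6],
    [7, 8, 9, 10, 11, 12],
    [13, 14, 15, 16, 17, 18],
    [19, 20, 21, 22, 23, 24],
]
print(roi_max_pool(fm, (1, 0, 5, 4)))  # [[9, 11], [21, 23]]
print(roi_max_pool(fm, (0, 0, 3, 3)))  # [[8, 9], [14, 15]]
```

Both regions have different sizes, yet each is reduced to the same 2 x 2 output, which is what allows regions of any shape to feed a fully connected layer of fixed input size.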
Fig. 4: Comparison of object detection algorithms [online
source]
The graphs above show that Fast R-CNN is significantly faster than
R-CNN in both training and testing. At test time, however, Fast
R-CNN slows down significantly when the time for region proposals
is included, compared to when it is excluded. Region proposals
therefore become the bottleneck of the Fast R-CNN algorithm and
limit its performance.
3. Faster R-CNN
Fig. 5: Faster R-CNN Structure [8]
Both of the above algorithms (R-CNN and Fast R-CNN) use selective
search to find the region proposals. Selective search is a slow and
time-consuming process that affects the performance of the network.
Therefore, Shaoqing Ren et al. came up with an object detection
algorithm that eliminates the selective search algorithm and lets
the network learn the region proposals [8].
As in Fast R-CNN, the image is provided as input to a convolutional
network, which produces a convolutional feature map. Instead of
using the selective search algorithm on the feature map to identify
the region proposals, a separate network is used to predict them.
The predicted region proposals are then reshaped using an RoI
pooling layer, whose output is used to classify the image content
within each proposed region and to predict the offset values for
the bounding boxes.
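In the Faster R-CNN paper [8], the separate proposal network scores a fixed set of reference boxes, called anchors, at every position of the feature map. The Python sketch below only generates such anchors and does not include the network itself. The stride of 16 pixels, the three scales, and the three aspect ratios are assumed defaults for illustration, and the function name is hypothetical.

```python
import math


def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return anchor boxes (x1, y1, x2, y2) in image pixels.

    One anchor per (scale, ratio) pair is centred on every feature-map
    cell. ratio is height / width; each anchor has an area of scale ** 2.
    """
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Centre of this feature-map cell in image coordinates.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for scale in scales:
                for ratio in ratios:
                    w = scale / math.sqrt(ratio)
                    h = scale * math.sqrt(ratio)
                    anchors.append((cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2))
    return anchors


anchors = generate_anchors(2, 3)
print(len(anchors))  # 54 = 2 * 3 positions * 9 anchors each
```

The proposal network then predicts, for each anchor, how likely it is to contain an object and which offsets would fit it to that object.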
Fig. 6: Comparison of test-time speed of object detection
algorithms [online source]
The graph above shows that Faster R-CNN is much faster than its
predecessors. Therefore, it can even be used for real-time object
detection.
4. YOLO
All of the previous object detection algorithms use regions to
localize the object within the image. The network does not look
at the complete image, only at the parts of the image that have a
high probability of containing the object. YOLO (You Only Look
Once) is an object detection algorithm that differs considerably
from the region-based algorithms above. In YOLO, a single
convolutional network predicts the bounding boxes and the class
probabilities for these boxes [9].
Fig.7: YOLO: You Only Look Once [9]
In YOLO, an image is split into an S × S grid, and m bounding boxes
are taken within each grid cell. For each bounding box, the network
outputs a class probability and offset values for the bounding box.
The bounding boxes whose class probability is above a threshold
value are selected and used to locate the object within the image.
YOLO is much faster (45 frames per second) than the region-based
algorithms above. Its limitation is that it struggles with small
objects within the image; for example, it might have difficulty
detecting a flock of birds. This is due to the spatial constraints
of the algorithm.
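The grid and threshold steps described above can be sketched in Python. The network itself is left out: the detections are assumed to be already available as (label, score, box) tuples, and the grid size S = 7, the threshold of 0.5, and both function names are assumptions made for illustration.

```python
def responsible_cell(cx, cy, img_w, img_h, S=7):
    """Return (row, col) of the grid cell that contains a box centre."""
    col = min(int(cx / img_w * S), S - 1)  # clamp centres on the right edge
    row = min(int(cy / img_h * S), S - 1)  # clamp centres on the bottom edge
    return row, col


def filter_detections(detections, threshold=0.5):
    """Keep only detections whose class probability is above the threshold.

    detections: list of (label, score, box) tuples.
    """
    return [d for d in detections if d[1] > threshold]


# Hypothetical network output for a 448 x 448 image.
raw = [
    ("person", 0.91, (224, 224, 100, 300)),
    ("bottle", 0.32, (60, 400, 20, 50)),
    ("cup", 0.67, (400, 380, 40, 40)),
]
for label, score, (cx, cy, w, h) in filter_detections(raw):
    print(label, score, responsible_cell(cx, cy, 448, 448))
```

Since each grid cell can hold only a limited number of boxes, many small objects whose centres fall into the same cell compete with each other, which is one way to picture the spatial constraint mentioned above.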

Fig. 8: Real Time Object detection using Laptop Webcam
I have tried YOLO myself, and it works well for real-time object
detection compared to the others. I used it to detect the things
around me, such as a cellphone, bottles, mugs, and bowls, as well
as myself as a person. I also think YOLO has a weakness: if a thing
or a person blocks half of the frame, or overlaps another object,
YOLO does not detect the occluded object even though it is in the
frame. The reason could be that the weights are not trained to do
so, or that the data used does not contain such images. Although it
was developed for Linux- and iOS-based devices, it can also be used
on Microsoft Windows with Anaconda3, Cygwin, CUDA, and NVIDIA
cuDNN.
A paper by Prerna Dahiya and Kamal Kumar Ranga in the International
Journal of Science and Research (IJSR) addresses real-time object
detection and 3D modelling using fuzzy logic [10]. It uses
entropy-based selection of the optimum transformation of the input
data, a wavelet-based transformation, and fuzzy logic to visualize
and quantify the degree of difficulty of detecting objects,
together with a technique to detect and model the object. The
authors propose a system, OD3DM, that can detect, extract, and
model images in 3D. The experimental results on their collected
image dataset show that the proposed approach is more accurate and
efficient than traditional methods. Their model accurately detects
complex geometric structures and models them in 3D. A common
pattern detection technique provides efficient detection and
modelling of complex geometric objects. The whole implementation
was done with MATLAB fuzzy logic methods, which the authors report
give better and more accurate results than traditional approaches
[10].
Currently I am working on a project that applies computer vision
using Python. It detects moving objects from the computer or laptop
webcam and stores and visualizes the times when an object entered
and exited the video frame. This can work as an aid to CCTV, saving
memory and energy: instead of recording 24/7, the system records
only when an object is in its coverage area.
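The idea of logging entry and exit times can be sketched with simple frame differencing in Python. This is not the project's actual code: a real implementation would read webcam frames through a library such as OpenCV, whereas here frames are plain nested lists of grayscale values so that the example is self-contained. The function names and both thresholds are hypothetical.

```python
def has_motion(background, frame, pixel_delta=30, min_changed=4):
    """True if enough pixels differ from the static background frame."""
    changed = sum(
        1
        for bg_row, fr_row in zip(background, frame)
        for bg, px in zip(bg_row, fr_row)
        if abs(px - bg) > pixel_delta
    )
    return changed >= min_changed


def log_events(background, timed_frames):
    """Return a list of ("enter" | "exit", timestamp) events.

    timed_frames: iterable of (timestamp, frame) pairs in time order.
    """
    events = []
    present = False
    for timestamp, frame in timed_frames:
        moving = has_motion(background, frame)
        if moving and not present:
            events.append(("enter", timestamp))
        elif not moving and present:
            events.append(("exit", timestamp))
        present = moving
    return events


empty = [[0] * 4 for _ in range(4)]
with_object = [[0, 0, 0, 0], [0, 200, 200, 0], [0, 200, 200, 0], [0, 0, 0, 0]]
frames = [(0.0, empty), (1.0, with_object), (2.0, with_object), (3.0, empty)]
print(log_events(empty, frames))  # [('enter', 1.0), ('exit', 3.0)]
```

Only the two transition times are stored, so long stretches without activity cost no storage.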
IV. RESULTS & DISCUSSIONS
This report on artificial intelligence (AI) in object detection
shows different approaches used for object detection in the modern
world. As can be seen from the sections above, there are different
techniques, each with upgraded versions, that help in the detection
of objects as well as in real-time object detection.
V. CONCLUSIONS
Real-time object detection has been made easier by the various
object detection techniques, but more advancements can be made in
this research field. For example, in YOLOv3, anchor box offset
prediction, focal loss, and linear prediction instead of logistic
did not work, so future work could extend it by finding solutions
to these problems. There is always a possibility of improvement in
this world of researchers: nothing is perfect.
REFERENCES
[1] Poole, David; Mackworth, Alan; Goebel, Randy (1998). Computational
Intelligence: A Logical Approach. New York: Oxford University
Press. ISBN 978-0-19-510270-3.
[2] Russell, Stuart J.; Norvig, Peter (2009). Artificial Intelligence: A Modern
Approach (3rd ed.). Upper Saddle River, New Jersey: Prentice
Hall. ISBN 978-0-13-604259-4.
[3] Luger, George; Stubblefield, William (2004). Artificial Intelligence:
Structures and Strategies for Complex Problem Solving (5th ed.).
Benjamin/Cummings. ISBN 978-0-8053-4780-7.
[4] Kurzweil, Ray (1999). The Age of Spiritual Machines. Penguin
Books. ISBN 978-0-670-88217-5.
[5] Novák, V.; Perfilieva, I.; Močkoř, J. (1999). Mathematical Principles of
Fuzzy Logic. Dordrecht: Kluwer Academic. ISBN 978-0-7923-8595-0.
[6] Girshick et al., “Rich feature hierarchies for accurate object detection and
semantic segmentation”, CVPR 2014.
[7] Girshick, “Fast R-CNN”, ICCV 2015.
[8] Ren et al., “Faster R-CNN: Towards Real-Time Object Detection with
Region Proposal Networks”, NIPS 2015.
[9] Redmon et al., “You Only Look Once: Unified, Real-Time Object
Detection”, CVPR 2016.
[10] P. Dahiya and K. K. Ranga, “Real Time Object Detection and 3D Modeling
Using Fuzzy Logic”, IJSR, June 2014.

Fig. 9: Real Time Object detection using YOLO and Laptop
Webcam: (a) wrong detection of towel as person; (b) correct
detection of person as person