SlideShare a Scribd company logo
1 of 6
Download to read offline
See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/342733702
Artificial Intelligence In Object Detection - Report
Preprint · January 2020
CITATIONS
0
READS
5,699
1 author:
Ashish Kumar
National Taipei University of Technology
11 PUBLICATIONS 2 CITATIONS
SEE PROFILE
All content following this page was uploaded by Ashish Kumar on 07 July 2020.
The user has requested enhancement of the downloaded file.
Ashish Kumar
National Taipei University of Technology
FC Report
Abstract— In this report on Artificial intelligence in object
detection major developments in this research field are presented
also my main research based on face and motion detection is
explained a little bit. Different architectures based on convolutional
neural network a class of deep neural network is studied and
different methodologies for object detection are presented and
compared. Also an approach for 3D modelling using fuzzy logic in
presented.
Index Terms— 3D modelling, Object Detection., AI, Fuzzy logic
I. INTRODUCTION
N computer science, artificial intelligence (AI), sometimes
called machine intelligence, is intelligence demonstrated
by machines, in contrast to the natural intelligence displayed by
humans. AI textbooks define the field as the study of
"intelligent agents": any device that perceives its environment
and takes actions that maximize its chance of successfully
achieving its goals [1]. Colloquially, the term "artificial
intelligence" is often used to describe machines (or computers)
that mimic "cognitive" functions that humans associate with
the human mind, such as "learning" and "problem solving" [2].
The traditional problems of AI research
include reasoning, knowledge
representation, planning, learning, natural language
processing, perception and the ability to move and manipulate
objects [3]. General intelligence is among the field's long-term
goals[4]. Approaches include statistical
methods, computational intelligence, and traditional symbolic
AI. Many tools are used in AI, including versions of search and
mathematical optimization, artificial neural networks,
and methods based on statistics, probability and economics
A fuzzy control system is a control system based on fuzzy logic
i.e. a mathematical system that analyzes analog input values in
terms of logical variables that take on continuous values
between 0 and 1, in contrast to classical or digital logic, which
operates on discrete values of either 1 or 0 (true or false,
respectively). Fuzzy logic is a form of many-valued logic in
which the truth values of variables may be any real
number between 0 and 1 both inclusive. It is employed to
handle the concept of partial truth, where the truth value may
range between completely true and completely false [5]. By
contrast, in Boolean logic, the truth values of variables may
only be the integer values 0 or 1. Fuzzy logic is based on the
observation that people make decisions based on imprecise and
non-numerical information. Fuzzy models or sets are
mathematical means of representing vagueness and imprecise
information. These models have the capability of recognizing,
representing, manipulating, interpreting, and utilizing data and
information that are vague and lack certainty. Object detection
is a computer vision technique whose aim is to detect objects
such as cars, buildings, and human beings, just to mention a
few. The objects can generally be identified from either pictures
or video feeds. Object detection has been applied widely in
video surveillance, self-driving cars, and object/people
tracking. Object detection is widely used in computer vision
tasks such as face detection, face recognition and video object
co-segmentation.
My research project is basically based on face and motion
detection using python programming language. But in this
report I mainly focus on AI and Object detection.
The remainder of this report is structured as follows. In Section
II, AI, machine learning (ML) and deep learning (DL) are told.
Model architectures for object detection explained in section
III. Results and discussion are presented in section IV.
Conclusions and future study directions are concluded in
section V.
II. AI VS ML VS DL
Artificial Intelligence in Object Detection
Ashish Kumar
Department of Electrical Engineering and Computer Science
National Taipei University of Technology
Taipei, Taiwan 10608
Email: t108998404@ntut.edu.tw
I
Fig. 1: Showing Artificial Intelligence is a superset within
which Machine Learning and Deep Learning belong
[Google images]
AI is human intelligence demonstrated by machines to perform
simple to complex tasks where in ML it provides machines the
ability to learn and understand without being explicitly
programmed. The idea behind AI is to program machines to
carry out tasks in more human ways or smart ways in ML the
key to teach computers to think and understand like we do is
ML.
Methods for object detection generally fall into either machine
learning-based approaches or deep learning-based approaches.
For Machine Learning approaches, it becomes necessary to first
define features using one of the methods below, then using a
technique such as support vector machine (SVM) to do the
classification. On the other hand, deep learning techniques are
able to do end-to-end object detection without specifically
defining features, and are typically based on convolutional
neural networks (CNN).
 Machine Learning approaches:
 Viola–Jones object detection framework based
on Haar features
 Scale-invariant feature transform (SIFT)
 Histogram of oriented gradients (HOG) features
 Deep Learning approaches:
 Region Proposals (R-CNN, Fast R-CNN, Faster R-
CNN)
 Single Shot MultiBox Detector (SSD)
 You Only Look Once (YOLO)
 Single-Shot Refinement Neural Network for Object
Detection (RefineDet)
.
III. MODEL ARCHITECTURES FOR OBJECT
DETECTION
In this section some famous model architectures for object
detection and their main principles are shown:
1. R-CNN
To bypass the problem of selecting a huge number of
regions, Ross Girshick et al. proposed a method where we use
selective search to extract just 2000 regions from the image and
he called them region proposals. Therefore, now, instead of
trying to classify a huge number of regions, we can just work
with 2000 regions. These 2000 region proposals are generated
using the selective search algorithm which is written below [6].
Selective Search:
1. Generate initial sub-segmentation, we generate many candidate
regions
2. Use greedy algorithm to recursively combine similar regions
into larger ones
3. Use the generated regions to produce the final candidate region
proposals
Fig. 2: RCNN: Regions with CNN features [6]
These 2000 candidate region proposals are warped into a square
and fed into a convolutional neural network that produces a
4096-dimensional feature vector as output. The CNN acts as a
feature extractor and the output dense layer consists of the
features extracted from the image and the extracted features are
fed into an SVM to classify the presence of the object within that
candidate region proposal. In addition to predicting the presence
of an object within the region proposals, the algorithm also
predicts four values which are offset values to increase the
precision of the bounding box. For example, given a region
proposal, the algorithm would have predicted the presence of a
person but the face of that person within that region proposal
could’ve been cut in half. Therefore, the offset values help in
adjusting the bounding box of the region proposal.
Major Problems in R-CNN are as follows: (1) It still takes a huge
amount of time to train the network as you would have to
classify 2000 region proposals per image. (2) It cannot be
implemented real time as it takes around 47 seconds for each
test image. (3) The selective search algorithm is a fixed
algorithm. Therefore, no learning is happening at that stage.
This could lead to the generation of bad candidate region
proposals.
2. Fast R-CNN
Fig. 3: Architecture Fast – RCNN [7]
Ross Girshick et al. solved some of the drawbacks of R-CNN to
build a faster object detection algorithm and it was called Fast
R-CNN. The approach is similar to the R-CNN algorithm. But,
instead of feeding the region proposals to the CNN, here feeding
the input image to the CNN to generate a convolutional feature
map is there. From the convolutional feature map, one can
identify the region of proposals and warp them into squares and
by using a RoI pooling layer user can reshape them into a fixed
size so that it can be fed into a fully connected layer [7]. From
the RoI feature vector, user use a softmax layer to predict the
class of the proposed region and also the offset values for the
bounding box. The reason “Fast R-CNN” is faster than R-CNN
is because you don’t have to feed 2000 region proposals to the
convolutional neural network every time. Instead, the
convolution operation is done only once per image and a feature
map is generated from it.
Fig.4: Comparison of object detection algorithms [online
source]
From the above graphs, it is shown that Fast R-CNN is
significantly faster in training and testing sessions over R-CNN.
When you look at the performance of Fast R-CNN during testing
time, including region proposals slows down the algorithm
significantly when compared to not using region proposals.
Therefore, region proposals become bottlenecks in Fast R-CNN
algorithm affecting its performance.
3. Faster R-CNN
Fig. 5: Faster R-CNN Structure [8]
Both of the above algorithms (R-CNN & Fast R-CNN) uses
selective search to find out the region proposals. Selective search
is a slow and time-consuming process affecting the performance
of the network. Therefore, Shaoqing Ren et al. came up with an
object detection algorithm that eliminates the selective search
algorithm and lets the network learn the region proposals [8].
Similar to Fast R-CNN, the image is provided as an input to a
convolutional network which provides a convolutional feature
map. Instead of using selective search algorithm on the feature
map to identify the region proposals, a separate network is used
to predict the region proposals. The predicted region proposals
are then reshaped using a RoI pooling layer which is then used
to classify the image within the proposed region and predict the
offset values for the bounding boxes.
Fig.6 Comparison of test-time speed of object detection
algorithms [online source]
From the above graph, it can be seen that Faster R-CNN is much
faster than it’s predecessors. Therefore, it can even be used for
real-time object detection.
4. YOLO
All of the previous object detection algorithms use regions to
localize the object within the image. The network does not look
at the complete image. Instead, parts of the image which have
high probabilities of containing the object. YOLO or You Only
Look Once is an object detection algorithm much different from
the region based algorithms seen above. In YOLO a single
convolutional network predicts the bounding boxes and the class
probabilities for these boxes [9].
Fig.7: YOLO: You Only Look Once [9]
In YOLO we take an image and split it into an SxS grid, within
each of the grid we take m bounding boxes. For each of the
bounding box, the network outputs a class probability and offset
values for the bounding box. The bounding boxes having the
class probability above a threshold value is selected and used to
locate the object within the image. YOLO is orders of magnitude
faster (45 frames per second) than other object detection
algorithms. The limitation of YOLO algorithm is that it struggles
with small objects within the image, for example it might have
difficulties in detecting a flock of birds. This is due to the spatial
constraints of the algorithm.
Fig.8: Real Time Object detection using Laptop Webcam
I have tried yolo myself and it works good in real time object
detection compared to others. I used it personally to detect the
things around me like cellphone, bottle, mugs and bowls not to
forget myself as a person. I also think there is a gaffe in YOLO
as if a thing or a person is blocking the half frame for detection
or is overlapping a thing YOLO doesn’t detect it even though it
is in the frame. Reason could be the weights are not trained to
do so or the data used for detection doesn’t have such images.
Though the users made it on Linux and IOS based device but it
can also be used on Microsoft OS. Using anconda3, Cygwin,
Cuda and Nvidia – CUDNN.
Also in the International Journal of Science and Research
(IJSR) a paper written by Prerna Dahiya and Kamal Kumar
Ranga on real time object detection and 3D modelling using
fuzzy logic is there which used Entropy based selection of
optimum transformation of input data, wavelet based
transformation and fuzzy logic for visualizing and quantifying
the degree of difficulty of detecting objects and a technique to
detect the object and modeling of the object. Were authors
proposed a system OD3DM that can detect, extract and model
the images in 3D. The experimental results on collected image
dataset show that their proposed approach is more accurate and
efficient than traditional methods. they prepared the model
which accurately detects the complex geometric structures and
model it into 3D. Fuzzy logic and entropy based selection of
optimum based input data has been used to implement their
work. Common pattern detection technique provides efficient
detection and modeling of complex geometric objects. All of
the implementation were done in Matlab fuzzy Logic methods
which provide better and accurate results as compare to the
traditional approaches [10].
Currently I am working on a project which is an application of
Computer Vision using Python. Which can be used to detect
moving objects from the computer/laptop webcam which will
store and visualize the times when the object entered and exited
the video frame which will work as an aid to CCTV in saving
memory and energy as it won’t have to work 24*7 only when
an object is there in the coverage area of it then it will work.
IV. RESULTS & DISCUSSIONS
This report on artificial intelligence (AI) in object detection
shows different approaches in the modern word used for object
detection as can be seen from above sections there are different
techniques with their upgraded versions are there which help in
detection of object as well as real time object detection of
objects.
V. CONCLUSIONS
Real time object detection in today’s world has been made easy
using the various object detection techniques more
advancements can be done in this research field. for example:
in YOLO V3 Anchor box offset prediction, focal loss and liner
prediction instead of logistic didn’t work so in future it can
further be extended by finding solutions to their problems.
There is always a possibility of improvement in this world of
researchers Nothing Is Perfect.
REFERENCES
[1] Poole, David; Mackworth, Alan; Goebel, Randy (1998). Computational
Intelligence: A Logical Approach. New York: Oxford University
Press. ISBN 978-0-19-510270-3.
[2] Russell, Stuart J.; Norvig, Peter (2009). Artificial Intelligence: A Modern
Approach (3rd ed.). Upper Saddle River, New Jersey: Prentice
Hall. ISBN 978-0-13-604259-4.
[3] Luger, George; Stubblefield, William (2004). Artificial Intelligence:
Structures and Strategies for Complex Problem Solving (5th ed.).
Benjamin/Cummings. ISBN 978-0-8053-4780-7
[4] Kurzweil, Ray (1999). The Age of Spiritual Machines. Penguin
Books. ISBN 978-0-670-88217-5.
[5] Novák, V.; Perfilieva, I.; Močkoř, J. (1999). Mathematical principles of
fuzzy logic. Dordrecht: Kluwer Academic. ISBN 978-0-7923-8595-0
[6] Girshick et al, “Rich feature hierarchies for accurate object detection and
semantic segmentation”, CVPR 2014.
[7] Girshick, “Fast R-CNN”, ICCV 2015.
[8] Ren et al, “Faster R-CNN: Towards Real-Time Object Detection with
Region Proposal Networks”, NIPS 2015.
[9] Redmon et al, “You Only Look Once: Unified, Real-Time Object
Detection”, CVPR 2016.
[10] P. Dahiya et al, “Real Time Object Detection and 3D Modeling Using
Fuzzy Logic”, IJSR June 2014.
(a)
Fig. 9: Real Time Object detection using YOLO and Laptop
Webcam (a) wrong detection of towel as person (b) correct
detection of person as person
(b)
View publication stats

More Related Content

Similar to ArtificialIntelligenceInObjectDetection-Report.pdf

Unsupervised learning models of invariant features in images: Recent developm...
Unsupervised learning models of invariant features in images: Recent developm...Unsupervised learning models of invariant features in images: Recent developm...
Unsupervised learning models of invariant features in images: Recent developm...IJSCAI Journal
 
Real-Time Pertinent Maneuver Recognition for Surveillance
Real-Time Pertinent Maneuver Recognition for SurveillanceReal-Time Pertinent Maneuver Recognition for Surveillance
Real-Time Pertinent Maneuver Recognition for SurveillanceIRJET Journal
 
IRJET- Recognition of Handwritten Characters based on Deep Learning with Tens...
IRJET- Recognition of Handwritten Characters based on Deep Learning with Tens...IRJET- Recognition of Handwritten Characters based on Deep Learning with Tens...
IRJET- Recognition of Handwritten Characters based on Deep Learning with Tens...IRJET Journal
 
IRJET- A Survey on Smart Security System using Artificial Intelligence
IRJET- A Survey on Smart Security System using Artificial IntelligenceIRJET- A Survey on Smart Security System using Artificial Intelligence
IRJET- A Survey on Smart Security System using Artificial IntelligenceIRJET Journal
 
MULTIPLE HUMAN TRACKING USING RETINANET FEATURES, SIAMESE NEURAL NETWORK, AND...
MULTIPLE HUMAN TRACKING USING RETINANET FEATURES, SIAMESE NEURAL NETWORK, AND...MULTIPLE HUMAN TRACKING USING RETINANET FEATURES, SIAMESE NEURAL NETWORK, AND...
MULTIPLE HUMAN TRACKING USING RETINANET FEATURES, SIAMESE NEURAL NETWORK, AND...IAEME Publication
 
Scene Description From Images To Sentences
Scene Description From Images To SentencesScene Description From Images To Sentences
Scene Description From Images To SentencesIRJET Journal
 
IRJET- Identification of Scene Images using Convolutional Neural Networks - A...
IRJET- Identification of Scene Images using Convolutional Neural Networks - A...IRJET- Identification of Scene Images using Convolutional Neural Networks - A...
IRJET- Identification of Scene Images using Convolutional Neural Networks - A...IRJET Journal
 
IRJET- Real-Time Object Detection System using Caffe Model
IRJET- Real-Time Object Detection System using Caffe ModelIRJET- Real-Time Object Detection System using Caffe Model
IRJET- Real-Time Object Detection System using Caffe ModelIRJET Journal
 
10.1109@ICCMC48092.2020.ICCMC-000167.pdf
10.1109@ICCMC48092.2020.ICCMC-000167.pdf10.1109@ICCMC48092.2020.ICCMC-000167.pdf
10.1109@ICCMC48092.2020.ICCMC-000167.pdfmokamojah
 
IRJET- Wearable AI Device for Blind
IRJET- Wearable AI Device for BlindIRJET- Wearable AI Device for Blind
IRJET- Wearable AI Device for BlindIRJET Journal
 
最近の研究情勢についていくために - Deep Learningを中心に -
最近の研究情勢についていくために - Deep Learningを中心に - 最近の研究情勢についていくために - Deep Learningを中心に -
最近の研究情勢についていくために - Deep Learningを中心に - Hiroshi Fukui
 
Partial Object Detection in Inclined Weather Conditions
Partial Object Detection in Inclined Weather ConditionsPartial Object Detection in Inclined Weather Conditions
Partial Object Detection in Inclined Weather ConditionsIRJET Journal
 
Computer Vision: Visual Extent of an Object
Computer Vision: Visual Extent of an ObjectComputer Vision: Visual Extent of an Object
Computer Vision: Visual Extent of an ObjectIOSR Journals
 
IRJET- Weakly Supervised Object Detection by using Fast R-CNN
IRJET- Weakly Supervised Object Detection by using Fast R-CNNIRJET- Weakly Supervised Object Detection by using Fast R-CNN
IRJET- Weakly Supervised Object Detection by using Fast R-CNNIRJET Journal
 
Object Detetcion using SSD-MobileNet
Object Detetcion using SSD-MobileNetObject Detetcion using SSD-MobileNet
Object Detetcion using SSD-MobileNetIRJET Journal
 
Object Detection An Overview
Object Detection An OverviewObject Detection An Overview
Object Detection An Overviewijtsrd
 
Human Action Recognition using Contour History Images and Neural Networks Cla...
Human Action Recognition using Contour History Images and Neural Networks Cla...Human Action Recognition using Contour History Images and Neural Networks Cla...
Human Action Recognition using Contour History Images and Neural Networks Cla...IRJET Journal
 
IRJET- Object Detection and Recognition using Single Shot Multi-Box Detector
IRJET- Object Detection and Recognition using Single Shot Multi-Box DetectorIRJET- Object Detection and Recognition using Single Shot Multi-Box Detector
IRJET- Object Detection and Recognition using Single Shot Multi-Box DetectorIRJET Journal
 
Color Based Object Tracking with OpenCV A Survey
Color Based Object Tracking with OpenCV A SurveyColor Based Object Tracking with OpenCV A Survey
Color Based Object Tracking with OpenCV A SurveyYogeshIJTSRD
 

Similar to ArtificialIntelligenceInObjectDetection-Report.pdf (20)

Unsupervised learning models of invariant features in images: Recent developm...
Unsupervised learning models of invariant features in images: Recent developm...Unsupervised learning models of invariant features in images: Recent developm...
Unsupervised learning models of invariant features in images: Recent developm...
 
Real-Time Pertinent Maneuver Recognition for Surveillance
Real-Time Pertinent Maneuver Recognition for SurveillanceReal-Time Pertinent Maneuver Recognition for Surveillance
Real-Time Pertinent Maneuver Recognition for Surveillance
 
A350111
A350111A350111
A350111
 
IRJET- Recognition of Handwritten Characters based on Deep Learning with Tens...
IRJET- Recognition of Handwritten Characters based on Deep Learning with Tens...IRJET- Recognition of Handwritten Characters based on Deep Learning with Tens...
IRJET- Recognition of Handwritten Characters based on Deep Learning with Tens...
 
IRJET- A Survey on Smart Security System using Artificial Intelligence
IRJET- A Survey on Smart Security System using Artificial IntelligenceIRJET- A Survey on Smart Security System using Artificial Intelligence
IRJET- A Survey on Smart Security System using Artificial Intelligence
 
MULTIPLE HUMAN TRACKING USING RETINANET FEATURES, SIAMESE NEURAL NETWORK, AND...
MULTIPLE HUMAN TRACKING USING RETINANET FEATURES, SIAMESE NEURAL NETWORK, AND...MULTIPLE HUMAN TRACKING USING RETINANET FEATURES, SIAMESE NEURAL NETWORK, AND...
MULTIPLE HUMAN TRACKING USING RETINANET FEATURES, SIAMESE NEURAL NETWORK, AND...
 
Scene Description From Images To Sentences
Scene Description From Images To SentencesScene Description From Images To Sentences
Scene Description From Images To Sentences
 
IRJET- Identification of Scene Images using Convolutional Neural Networks - A...
IRJET- Identification of Scene Images using Convolutional Neural Networks - A...IRJET- Identification of Scene Images using Convolutional Neural Networks - A...
IRJET- Identification of Scene Images using Convolutional Neural Networks - A...
 
IRJET- Real-Time Object Detection System using Caffe Model
IRJET- Real-Time Object Detection System using Caffe ModelIRJET- Real-Time Object Detection System using Caffe Model
IRJET- Real-Time Object Detection System using Caffe Model
 
10.1109@ICCMC48092.2020.ICCMC-000167.pdf
10.1109@ICCMC48092.2020.ICCMC-000167.pdf10.1109@ICCMC48092.2020.ICCMC-000167.pdf
10.1109@ICCMC48092.2020.ICCMC-000167.pdf
 
IRJET- Wearable AI Device for Blind
IRJET- Wearable AI Device for BlindIRJET- Wearable AI Device for Blind
IRJET- Wearable AI Device for Blind
 
最近の研究情勢についていくために - Deep Learningを中心に -
最近の研究情勢についていくために - Deep Learningを中心に - 最近の研究情勢についていくために - Deep Learningを中心に -
最近の研究情勢についていくために - Deep Learningを中心に -
 
Partial Object Detection in Inclined Weather Conditions
Partial Object Detection in Inclined Weather ConditionsPartial Object Detection in Inclined Weather Conditions
Partial Object Detection in Inclined Weather Conditions
 
Computer Vision: Visual Extent of an Object
Computer Vision: Visual Extent of an ObjectComputer Vision: Visual Extent of an Object
Computer Vision: Visual Extent of an Object
 
IRJET- Weakly Supervised Object Detection by using Fast R-CNN
IRJET- Weakly Supervised Object Detection by using Fast R-CNNIRJET- Weakly Supervised Object Detection by using Fast R-CNN
IRJET- Weakly Supervised Object Detection by using Fast R-CNN
 
Object Detetcion using SSD-MobileNet
Object Detetcion using SSD-MobileNetObject Detetcion using SSD-MobileNet
Object Detetcion using SSD-MobileNet
 
Object Detection An Overview
Object Detection An OverviewObject Detection An Overview
Object Detection An Overview
 
Human Action Recognition using Contour History Images and Neural Networks Cla...
Human Action Recognition using Contour History Images and Neural Networks Cla...Human Action Recognition using Contour History Images and Neural Networks Cla...
Human Action Recognition using Contour History Images and Neural Networks Cla...
 
IRJET- Object Detection and Recognition using Single Shot Multi-Box Detector
IRJET- Object Detection and Recognition using Single Shot Multi-Box DetectorIRJET- Object Detection and Recognition using Single Shot Multi-Box Detector
IRJET- Object Detection and Recognition using Single Shot Multi-Box Detector
 
Color Based Object Tracking with OpenCV A Survey
Color Based Object Tracking with OpenCV A SurveyColor Based Object Tracking with OpenCV A Survey
Color Based Object Tracking with OpenCV A Survey
 

Recently uploaded

ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...JhezDiaz1
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfUjwalaBharambe
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
Romantic Opera MUSIC FOR GRADE NINE pptx
Romantic Opera MUSIC FOR GRADE NINE pptxRomantic Opera MUSIC FOR GRADE NINE pptx
Romantic Opera MUSIC FOR GRADE NINE pptxsqpmdrvczh
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfSpandanaRallapalli
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
 
Atmosphere science 7 quarter 4 .........
Atmosphere science 7 quarter 4 .........Atmosphere science 7 quarter 4 .........
Atmosphere science 7 quarter 4 .........LeaCamillePacle
 
AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.arsicmarija21
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
Planning a health career 4th Quarter.pptx
Planning a health career 4th Quarter.pptxPlanning a health career 4th Quarter.pptx
Planning a health career 4th Quarter.pptxLigayaBacuel1
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...Nguyen Thanh Tu Collection
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementmkooblal
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersSabitha Banu
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceSamikshaHamane
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 

Recently uploaded (20)

ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
Raw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptxRaw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptx
 
Romantic Opera MUSIC FOR GRADE NINE pptx
Romantic Opera MUSIC FOR GRADE NINE pptxRomantic Opera MUSIC FOR GRADE NINE pptx
Romantic Opera MUSIC FOR GRADE NINE pptx
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdf
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
Atmosphere science 7 quarter 4 .........
Atmosphere science 7 quarter 4 .........Atmosphere science 7 quarter 4 .........
Atmosphere science 7 quarter 4 .........
 
AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
Planning a health career 4th Quarter.pptx
Planning a health career 4th Quarter.pptxPlanning a health career 4th Quarter.pptx
Planning a health career 4th Quarter.pptx
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of management
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginners
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 

ArtificialIntelligenceInObjectDetection-Report.pdf

  • 1. See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/342733702 Artificial Intelligence In Object Detection - Report Preprint · January 2020 CITATIONS 0 READS 5,699 1 author: Ashish Kumar National Taipei University of Technology 11 PUBLICATIONS 2 CITATIONS SEE PROFILE All content following this page was uploaded by Ashish Kumar on 07 July 2020. The user has requested enhancement of the downloaded file.
  • 2. Ashish Kumar National Taipei University of Technology FC Report Abstract— In this report on Artificial intelligence in object detection major developments in this research field are presented also my main research based on face and motion detection is explained a little bit. Different architectures based on convolutional neural network a class of deep neural network is studied and different methodologies for object detection are presented and compared. Also an approach for 3D modelling using fuzzy logic in presented. Index Terms— 3D modelling, Object Detection., AI, Fuzzy logic I. INTRODUCTION N computer science, artificial intelligence (AI), sometimes called machine intelligence, is intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans. AI textbooks define the field as the study of "intelligent agents": any device that perceives its environment and takes actions that maximize its chance of successfully achieving its goals [1]. Colloquially, the term "artificial intelligence" is often used to describe machines (or computers) that mimic "cognitive" functions that humans associate with the human mind, such as "learning" and "problem solving" [2]. The traditional problems of AI research include reasoning, knowledge representation, planning, learning, natural language processing, perception and the ability to move and manipulate objects [3]. General intelligence is among the field's long-term goals[4]. Approaches include statistical methods, computational intelligence, and traditional symbolic AI. Many tools are used in AI, including versions of search and mathematical optimization, artificial neural networks, and methods based on statistics, probability and economics A fuzzy control system is a control system based on fuzzy logic i.e. a mathematical system that analyzes analog input values in terms of logical variables that take on continuous values between 0 and 1, in contrast to classical or digital logic, which operates on discrete values of either 1 or 0 (true or false, respectively). Fuzzy logic is a form of many-valued logic in which the truth values of variables may be any real number between 0 and 1 both inclusive. It is employed to handle the concept of partial truth, where the truth value may range between completely true and completely false [5]. By contrast, in Boolean logic, the truth values of variables may only be the integer values 0 or 1. Fuzzy logic is based on the observation that people make decisions based on imprecise and non-numerical information. Fuzzy models or sets are mathematical means of representing vagueness and imprecise information. These models have the capability of recognizing, representing, manipulating, interpreting, and utilizing data and information that are vague and lack certainty. Object detection is a computer vision technique whose aim is to detect objects such as cars, buildings, and human beings, just to mention a few. The objects can generally be identified from either pictures or video feeds. Object detection has been applied widely in video surveillance, self-driving cars, and object/people tracking. Object detection is widely used in computer vision tasks such as face detection, face recognition and video object co-segmentation. My research project is basically based on face and motion detection using python programming language. But in this report I mainly focus on AI and Object detection. The remainder of this report is structured as follows. In Section II, AI, machine learning (ML) and deep learning (DL) are told. Model architectures for object detection explained in section III. Results and discussion are presented in section IV. Conclusions and future study directions are concluded in section V. II. AI VS ML VS DL Artificial Intelligence in Object Detection Ashish Kumar Department of Electrical Engineering and Computer Science National Taipei University of Technology Taipei, Taiwan 10608 Email: t108998404@ntut.edu.tw I
  • 3. Fig. 1: Showing Artificial Intelligence is a superset within which Machine Learning and Deep Learning belong [Google images] AI is human intelligence demonstrated by machines to perform simple to complex tasks where in ML it provides machines the ability to learn and understand without being explicitly programmed. The idea behind AI is to program machines to carry out tasks in more human ways or smart ways in ML the key to teach computers to think and understand like we do is ML. Methods for object detection generally fall into either machine learning-based approaches or deep learning-based approaches. For Machine Learning approaches, it becomes necessary to first define features using one of the methods below, then using a technique such as support vector machine (SVM) to do the classification. On the other hand, deep learning techniques are able to do end-to-end object detection without specifically defining features, and are typically based on convolutional neural networks (CNN).  Machine Learning approaches:  Viola–Jones object detection framework based on Haar features  Scale-invariant feature transform (SIFT)  Histogram of oriented gradients (HOG) features  Deep Learning approaches:  Region Proposals (R-CNN, Fast R-CNN, Faster R- CNN)  Single Shot MultiBox Detector (SSD)  You Only Look Once (YOLO)  Single-Shot Refinement Neural Network for Object Detection (RefineDet) . III. MODEL ARCHITECTURES FOR OBJECT DETECTION In this section some famous model architectures for object detection and their main principles are shown: 1. R-CNN To bypass the problem of selecting a huge number of regions, Ross Girshick et al. proposed a method where we use selective search to extract just 2000 regions from the image and he called them region proposals. Therefore, now, instead of trying to classify a huge number of regions, we can just work with 2000 regions. These 2000 region proposals are generated using the selective search algorithm which is written below [6]. Selective Search: 1. Generate initial sub-segmentation, we generate many candidate regions 2. Use greedy algorithm to recursively combine similar regions into larger ones 3. Use the generated regions to produce the final candidate region proposals Fig. 2: RCNN: Regions with CNN features [6] These 2000 candidate region proposals are warped into a square and fed into a convolutional neural network that produces a 4096-dimensional feature vector as output. The CNN acts as a feature extractor and the output dense layer consists of the features extracted from the image and the extracted features are fed into an SVM to classify the presence of the object within that candidate region proposal. In addition to predicting the presence of an object within the region proposals, the algorithm also predicts four values which are offset values to increase the precision of the bounding box. For example, given a region proposal, the algorithm would have predicted the presence of a person but the face of that person within that region proposal could’ve been cut in half. Therefore, the offset values help in adjusting the bounding box of the region proposal. Major Problems in R-CNN are as follows: (1) It still takes a huge amount of time to train the network as you would have to classify 2000 region proposals per image. (2) It cannot be implemented real time as it takes around 47 seconds for each test image. (3) The selective search algorithm is a fixed algorithm. Therefore, no learning is happening at that stage. This could lead to the generation of bad candidate region proposals. 2. Fast R-CNN Fig. 3: Architecture Fast – RCNN [7] Ross Girshick et al. solved some of the drawbacks of R-CNN to build a faster object detection algorithm and it was called Fast R-CNN. The approach is similar to the R-CNN algorithm. But, instead of feeding the region proposals to the CNN, here feeding the input image to the CNN to generate a convolutional feature map is there. From the convolutional feature map, one can identify the region of proposals and warp them into squares and by using a RoI pooling layer user can reshape them into a fixed size so that it can be fed into a fully connected layer [7]. From
  • 4. the RoI feature vector, user use a softmax layer to predict the class of the proposed region and also the offset values for the bounding box. The reason “Fast R-CNN” is faster than R-CNN is because you don’t have to feed 2000 region proposals to the convolutional neural network every time. Instead, the convolution operation is done only once per image and a feature map is generated from it. Fig.4: Comparison of object detection algorithms [online source] From the above graphs, it is shown that Fast R-CNN is significantly faster in training and testing sessions over R-CNN. When you look at the performance of Fast R-CNN during testing time, including region proposals slows down the algorithm significantly when compared to not using region proposals. Therefore, region proposals become bottlenecks in Fast R-CNN algorithm affecting its performance. 3. Faster R-CNN Fig. 5: Faster R-CNN Structure [8] Both of the above algorithms (R-CNN & Fast R-CNN) uses selective search to find out the region proposals. Selective search is a slow and time-consuming process affecting the performance of the network. Therefore, Shaoqing Ren et al. came up with an object detection algorithm that eliminates the selective search algorithm and lets the network learn the region proposals [8]. Similar to Fast R-CNN, the image is provided as an input to a convolutional network which provides a convolutional feature map. Instead of using selective search algorithm on the feature map to identify the region proposals, a separate network is used to predict the region proposals. The predicted region proposals are then reshaped using a RoI pooling layer which is then used to classify the image within the proposed region and predict the offset values for the bounding boxes. Fig.6 Comparison of test-time speed of object detection algorithms [online source] From the above graph, it can be seen that Faster R-CNN is much faster than it’s predecessors. Therefore, it can even be used for real-time object detection. 4. YOLO All of the previous object detection algorithms use regions to localize the object within the image. The network does not look at the complete image. Instead, parts of the image which have high probabilities of containing the object. YOLO or You Only Look Once is an object detection algorithm much different from the region based algorithms seen above. In YOLO a single convolutional network predicts the bounding boxes and the class probabilities for these boxes [9]. Fig.7: YOLO: You Only Look Once [9] In YOLO we take an image and split it into an SxS grid, within each of the grid we take m bounding boxes. For each of the bounding box, the network outputs a class probability and offset values for the bounding box. The bounding boxes having the class probability above a threshold value is selected and used to locate the object within the image. YOLO is orders of magnitude faster (45 frames per second) than other object detection algorithms. The limitation of YOLO algorithm is that it struggles with small objects within the image, for example it might have difficulties in detecting a flock of birds. This is due to the spatial constraints of the algorithm.
  • 5. Fig.8: Real Time Object detection using Laptop Webcam I have tried yolo myself and it works good in real time object detection compared to others. I used it personally to detect the things around me like cellphone, bottle, mugs and bowls not to forget myself as a person. I also think there is a gaffe in YOLO as if a thing or a person is blocking the half frame for detection or is overlapping a thing YOLO doesn’t detect it even though it is in the frame. Reason could be the weights are not trained to do so or the data used for detection doesn’t have such images. Though the users made it on Linux and IOS based device but it can also be used on Microsoft OS. Using anconda3, Cygwin, Cuda and Nvidia – CUDNN. Also in the International Journal of Science and Research (IJSR) a paper written by Prerna Dahiya and Kamal Kumar Ranga on real time object detection and 3D modelling using fuzzy logic is there which used Entropy based selection of optimum transformation of input data, wavelet based transformation and fuzzy logic for visualizing and quantifying the degree of difficulty of detecting objects and a technique to detect the object and modeling of the object. Were authors proposed a system OD3DM that can detect, extract and model the images in 3D. The experimental results on collected image dataset show that their proposed approach is more accurate and efficient than traditional methods. they prepared the model which accurately detects the complex geometric structures and model it into 3D. Fuzzy logic and entropy based selection of optimum based input data has been used to implement their work. Common pattern detection technique provides efficient detection and modeling of complex geometric objects. All of the implementation were done in Matlab fuzzy Logic methods which provide better and accurate results as compare to the traditional approaches [10]. Currently I am working on a project which is an application of Computer Vision using Python. Which can be used to detect moving objects from the computer/laptop webcam which will store and visualize the times when the object entered and exited the video frame which will work as an aid to CCTV in saving memory and energy as it won’t have to work 24*7 only when an object is there in the coverage area of it then it will work. IV. RESULTS & DISCUSSIONS This report on artificial intelligence (AI) in object detection shows different approaches in the modern word used for object detection as can be seen from above sections there are different techniques with their upgraded versions are there which help in detection of object as well as real time object detection of objects. V. CONCLUSIONS Real time object detection in today’s world has been made easy using the various object detection techniques more advancements can be done in this research field. for example: in YOLO V3 Anchor box offset prediction, focal loss and liner prediction instead of logistic didn’t work so in future it can further be extended by finding solutions to their problems. There is always a possibility of improvement in this world of researchers Nothing Is Perfect. REFERENCES [1] Poole, David; Mackworth, Alan; Goebel, Randy (1998). Computational Intelligence: A Logical Approach. New York: Oxford University Press. ISBN 978-0-19-510270-3. [2] Russell, Stuart J.; Norvig, Peter (2009). Artificial Intelligence: A Modern Approach (3rd ed.). Upper Saddle River, New Jersey: Prentice Hall. ISBN 978-0-13-604259-4. [3] Luger, George; Stubblefield, William (2004). Artificial Intelligence: Structures and Strategies for Complex Problem Solving (5th ed.). Benjamin/Cummings. ISBN 978-0-8053-4780-7 [4] Kurzweil, Ray (1999). The Age of Spiritual Machines. Penguin Books. ISBN 978-0-670-88217-5. [5] Novák, V.; Perfilieva, I.; Močkoř, J. (1999). Mathematical principles of fuzzy logic. Dordrecht: Kluwer Academic. ISBN 978-0-7923-8595-0 [6] Girshick et al, “Rich feature hierarchies for accurate object detection and semantic segmentation”, CVPR 2014. [7] Girshick, “Fast R-CNN”, ICCV 2015. [8] Ren et al, “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks”, NIPS 2015. [9] Redmon et al, “You Only Look Once: Unified, Real-Time Object Detection”, CVPR 2016. [10] P. Dahiya et al, “Real Time Object Detection and 3D Modeling Using Fuzzy Logic”, IJSR June 2014.
  • 6. (a) Fig. 9: Real Time Object detection using YOLO and Laptop Webcam (a) wrong detection of towel as person (b) correct detection of person as person (b) View publication stats