đ This way, your report shows progression: from basics (this week) â to detailed algorithms + implementation (next week). Do you want me to also create a ready-to-submit write-up for next weekâs report (like the one I reframed for this week)?
Similar to đ This way, your report shows progression: from basics (this week) â to detailed algorithms + implementation (next week). Do you want me to also create a ready-to-submit write-up for next weekâs report (like the one I reframed for this week)?
đ This way, your report shows progression: from basics (this week) â to detailed algorithms + implementation (next week). Do you want me to also create a ready-to-submit write-up for next weekâs report (like the one I reframed for this week)?
1.
YOLO and ObjectDetection
Priyanshu Nishad
M.Tech CSE
Date: 08 September 2025
âą Object Detectionis a computer vision
technique used to identify and locate objects
within an image or video.
âą It not only recognizes what the object is but
also where it is located by drawing bounding
boxes.
4.
Object Detection Basics
âąObject detection involves two main steps:
âą 1. Classification â Determining what object is
present (e.g., cat, dog, car).
âą 2. Localization â Finding the position of the
object using bounding boxes.
5.
Traditional vs DeepLearning
âą Traditional methods: Haar Cascades, HOG +
SVM
âą âą Slower and less accurate
âą Deep Learning methods (e.g., YOLO)
âą âą Faster and highly accurate
âą âą Capable of real-time detection
6.
YOLO (You OnlyLook Once)
âą YOLO is one of the most popular algorithms
for object detection.
âą It processes the entire image in a single pass
to detect objects in real time.
7.
How YOLO Works
âąâą YOLO divides the image into grids.
âą âą Each grid cell predicts:
âą - Whether an object is present
âą - The class (type of object)
âą - The bounding box coordinates
8.
IMAGE
YOLO
3X3 IMAGE GRID
(3,3)GRID
(1,1 )
GRID
Bounding Box Values and
Confidence Score of the
Box
Bounding Box Values and
Confidence Score of the
Box
Working of YOLO
9.
METHODOLOGY:
ï§ First, animage is taken, and YOLO algorithm is applied. In our example, the image is
divided as grids of 3x3 matrixes. We can divide the image into any number grids,
depending on the complexity of the image.
ï§ Once the image is divided, each grid undergoes classification and localization of the
object. The objectness or the confidence score of each grid is found. If there is no proper
object found in the grid, then the objectness and bounding box value of the grid will be
zero or if there found an object in the grid then the objectness will be 1 and the bounding
box value will be its corresponding bounding values of the found object.
ï§ Bounding box predictions: YOLO algorithm is used for predicting the accurate
bounding boxes from the image. The image divides into S x S grids by predicting the
bounding boxes for each grid and class probabilities. Both image classification and object
localization techniques are applied for each grid of the image and each grid is assigned
with a label. Then the algorithm checks each grid separately and marks the label which has
an object in it and marks its bounding boxes. The labels of the gird without object are
marked as zero.
10.
Example image with3x3 grids
Consider the above example, an image is taken, and it is divided in the form of 3 x 3 matrixes. Each
grid is labelled, and each grid undergoes both image classification and objects localization techniques.
The label is considered as Y.
Y consists of 8 values :
Pc â Represents whether an object is present in the grid or not. If present pc=1 else 0. bx, by, bh, bw â
are the bounding boxes of the objects (if present). c1, c2, c3 â are the classes. If the object is a car then
c1 and c3 will be 0 and c2 will be 1.
.
Anchor Box: Byusing Bounding boxes for object detection, only one object can be identified by a
grid. So, for detecting more than one object we go for Anchor box.
ACCURACY IMPROVEMENT:
Consider the above picture, in that both the human and the carâs midpoint come under the same grid cell
Example image with Anchor Box
13.
For this case,we use the anchor box method. The red color grid cells are the two anchor boxes for
those objects. Any number of anchor boxes can be used for a single image to detect multiple objects.
In our case, we have taken two anchor boxes.
The above figure represents the anchor box of the image we considered. The vertical anchor box is for the human
and the horizontal one is the anchor box of the car. In this type of overlapping object detection, the label Y
contains 16 values i.e, the values of both anchor boxes.
Pc in both the anchor box represents the presence of the object. bx, by, bh, bw in both the anchor box represents
their corresponding bounding box values. The value of the class in anchor box 1 is (1, 0, 0) because the detected
object is a human. In the case of anchor box 2, the detected object is a car so the class value is (0, 1, 0). In this
case, the matrix form of Y will be Y= 3x3x16 or Y= 3x3x2x8. Because of two anchor box, it is 2x8
Anchor boxes Anchor box prediction values
14.
Design: The ideaof YOLO is to make a Convolutional neural network to predict a (7, 7, 30) tensor. It uses
a Convolutional neural network to scale back the spatial dimension to 7x7 with 1024 output channels at
every location. By using two fully connected layers it performs a linear regression to create a 7x7x2
bounding box prediction. Finally, a prediction is made by considering the high confidence score of a box.
The initial convolutional layers of the network extract feature from the image while the fully connected
layers predict the output probabilities and coordinates. Our network architecture is inspired by the
GoogLeNet model for image classiïŹcation. Our network has 24 convolutional layers followed by 2 fully
connected layers. However, instead of the inception modules used by GoogLeNet we simply use 1Ă1
reduction layers followed by 3Ă3 convolutional layers, similar to Linetal. The full network is shown in Fig.
15.
Advantages of YOLO
âąâą Extremely fast â supports real-time
detection
âą âą Detects multiple objects at the same time
âą âą Balanced performance between accuracy
and speed
16.
Applications of YOLOand Object Detection
âą âą Self-driving cars: Detecting pedestrians,
vehicles, traffic signals
âą âą Healthcare: Identifying tumors or
abnormalities
âą âą Industrial use: Detecting defects in products
âą âą Security: Surveillance and suspicious activity
detection
17.
Future Scope /Next Work
âą âą Implementation of YOLO in practical projects
âą âą Experimenting with different YOLO versions
(YOLOv5, YOLOv8)
âą âą Improving accuracy in complex
environments
YOLO is apowerful and efficient object
detection algorithm.
It combines classification and localization in
real time, making it a valuable tool for modern
computer vision applications.
Learning Models and
Familiarizingwith
frameworks and
Implementation
Implementation
Testing, Analysis and
Report writing
September October November December