Detailed Explanation of the YOLOv8 Architecture - Complete Breakdown.pptx

Detailed Explanation of the YOLOv8 Architecture
Ahmed R. A. Shamsan

YOLOv8 architecture Details | Blocks
Conv Block: the most commonly used block
• 2D Conv layer
• Batch normalization 2D
• SiLU Activation function
C2F Block
• Conv Blocks
• Bottleneck Blocks
SPPF Block
• Conv Block
• Three 2D max-pooling layers
• Conv Block
Detect Block
• Where the detection happens
Before explaining the whole YOLOv8 architecture, I will explain the details of the YOLOv8 blocks so that you can understand it more easily. The most commonly used block is the convolutional block. In YOLOv8, a convolutional block consists of a 2D convolutional layer, a 2D batch normalization layer, and a SiLU activation function. They are all used together as a single convolutional block.
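The two steps that follow the 2D convolution inside a convolutional block can be sketched in plain Python. This is only an illustration on a flat list of numbers, not the library code: the helper names `silu`, `batch_norm`, and `conv_block_tail` are hypothetical, and the learnable scale and shift of batch normalization are left out.

```python
import math


def silu(x: float) -> float:
    """SiLU activation: x multiplied by sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def batch_norm(values, eps=1e-5):
    """Normalize activations to zero mean and unit variance.

    Simplification: the learnable scale and shift are omitted.
    """
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


def conv_block_tail(conv_outputs):
    """Apply batch normalization, then SiLU, to the convolution outputs."""
    return [silu(v) for v in batch_norm(conv_outputs)]


if __name__ == "__main__":
    print(conv_block_tail([1.0, 2.0, 3.0]))
```

The order matters: the activation is applied to the normalized values, so the block always runs convolution, then normalization, then SiLU.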

The second block is the C2F block
• It starts with a convolutional block, whose resulting feature maps are then split: one part goes to the bottleneck block,
• whereas the other goes directly into the Concat block.
• In the C2F block we can have many bottleneck blocks. At the end there is another convolutional block.
• The bottleneck itself is a sequence of convolutional blocks with a shortcut.
• If you are familiar with the ResNet block, the bottleneck block is pretty similar. The difference is that the bottleneck can also be used without a shortcut.
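The shortcut and the split-then-concatenate flow described above can be sketched as simple bookkeeping. The helper names below are hypothetical. The channel arithmetic (each split half has half of the output channels, and every bottleneck contributes one more branch to the concatenation) reflects my reading of the Ultralytics implementation and should be treated as an assumption, since the slides do not state it.

```python
def bottleneck(x, f, shortcut=True):
    """Apply the transform f; add the input back when the shortcut is enabled."""
    y = f(x)
    if shortcut:
        return [a + b for a, b in zip(x, y)]
    return y


def c2f_channel_flow(c_out: int, n: int) -> dict:
    """Track channel counts through a C2F block (assumed split ratio 0.5)."""
    hidden = c_out // 2  # channels in each half after the split
    return {
        "after_first_conv": 2 * hidden,  # split into two halves of `hidden`
        "concat": (2 + n) * hidden,      # two halves plus n bottleneck outputs
        "after_last_conv": c_out,        # final conv maps back to c_out
    }


if __name__ == "__main__":
    double = lambda v: [2 * a for a in v]
    print(bottleneck([1, 2], double, shortcut=True))
    print(bottleneck([1, 2], double, shortcut=False))
    print(c2f_channel_flow(128, 3))
```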

The third block is the SPPF block
SPPF stands for Spatial Pyramid Pooling Fast.
It is a modification of SPP (Spatial Pyramid Pooling) with a higher speed.
Inside the SPPF there are
• a convolutional block at the beginning,
• followed by three 2D max-pooling layers.
• The interesting part is that every resulting feature map is concatenated right before the end. SPPF ends with a convolutional block.
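The pooling part of SPPF can be illustrated on a single-channel feature map stored as a list of lists. This is a sketch only. The kernel size of 5 with stride 1 and size-preserving padding is an assumption based on the Ultralytics defaults, and `max_pool_same` and `sppf_branches` are hypothetical names.

```python
def max_pool_same(fmap, k=5):
    """k x k max pooling with stride 1; border windows are clipped to the map,
    so the output keeps the input size."""
    h, w = len(fmap), len(fmap[0])
    r = k // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            window = [
                fmap[a][b]
                for a in range(max(0, i - r), min(h, i + r + 1))
                for b in range(max(0, j - r), min(w, j + r + 1))
            ]
            row.append(max(window))
        out.append(row)
    return out


def sppf_branches(fmap, k=5):
    """Return the input map plus three successively pooled maps.

    These four maps are what gets concatenated before the final conv block.
    """
    maps = [fmap]
    for _ in range(3):
        maps.append(max_pool_same(maps[-1], k))
    return maps


if __name__ == "__main__":
    fmap = [[0] * 6 for _ in range(6)]
    fmap[0][0] = 1
    for m in sppf_branches(fmap):
        print(m)
```

Because each pooling layer runs on the output of the previous one, the three pooled maps cover progressively larger neighborhoods while sharing the same resolution, which is what makes the concatenation possible.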

The Last Block Is The Detect Block
• This is where the detection happens.
• Unlike previous YOLO versions, YOLOv8 is an anchor-free model.
• The predictions happen in the grid cells.
• The detect block contains two tracks:
1. The first track is for bounding box prediction.
2. The other is for class prediction.
• Both tracks have the same block sequence: two convolutional blocks followed by a single 2D convolutional layer.

THE FUNDAMENTAL COMPONENTS OF A CONVOLUTIONAL NEURAL NETWORK | [1] KERNEL
• The kernel is a two-dimensional array. Kernels are usually called feature detectors.
• The values in the kernel are weights that can be updated during the training process.
• The kernel moves across the image and performs a dot product between the input and the values of the kernel to produce an output.
• The output is also known as a feature map.
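As a minimal sketch of the sliding dot product described above, the function below moves a kernel over a single-channel image with stride 1 and no padding. The name `conv2d` and the example values are illustrative only.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and take the
    dot product of the kernel with each window to build the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            value = sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(kh)
                for b in range(kw)
            )
            row.append(value)
        feature_map.append(row)
    return feature_map


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    kernel = [[1, 0], [0, -1]]
    print(conv2d(image, kernel))  # [[-4, -4], [-4, -4]]
```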

THE FUNDAMENTAL COMPONENTS OF A CONVOLUTIONAL NEURAL NETWORK | [2] STRIDE
• The stride is the distance the kernel is displaced at each step of the convolution process.
• The larger the stride, the smaller the resulting output. Convolution with stride one is demonstrated in this example.
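The relationship between stride and output size follows the standard convolution size formula, sketched below. The function name `conv_output_size` is hypothetical, and the input sizes are example values.

```python
def conv_output_size(size: int, kernel: int, stride: int, padding: int = 0) -> int:
    """Output length along one dimension: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


if __name__ == "__main__":
    print(conv_output_size(5, kernel=3, stride=1))  # 3
    print(conv_output_size(5, kernel=3, stride=2))  # 2
```

With the same 5-pixel input and 3-pixel kernel, moving from stride 1 to stride 2 shrinks the output from 3 to 2 positions.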

THE FUNDAMENTAL COMPONENTS OF A CONVOLUTIONAL NEURAL NETWORK | [3] PADDING
• Padding adds values around the outermost elements of the image. There are several types of padding:
1. Zeros padding is the default padding type. In zeros padding, the padded pixels have a value of zero.
2. In replication padding, the padded pixels have the same value as the closest real pixel. The padded corners have the same value as the real corners.
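The two padding types can be compared with a small sketch that adds a one-pixel border to a single-channel image. The function name `pad` and its `mode` strings are illustrative, not a library API.

```python
def pad(image, mode="zeros"):
    """Add a one-pixel border around the image using zeros or replication."""
    h, w = len(image), len(image[0])
    out = []
    for i in range(-1, h + 1):
        row = []
        for j in range(-1, w + 1):
            if 0 <= i < h and 0 <= j < w:
                row.append(image[i][j])  # real pixel
            elif mode == "zeros":
                row.append(0)  # padded pixel is zero
            elif mode == "replicate":
                # Clamp to the closest real pixel; corners copy real corners.
                row.append(image[min(max(i, 0), h - 1)][min(max(j, 0), w - 1)])
            else:
                raise ValueError(f"unknown padding mode: {mode}")
        out.append(row)
    return out


if __name__ == "__main__":
    image = [[1, 2], [3, 4]]
    print(pad(image, "zeros"))
    print(pad(image, "replicate"))
```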

The YOLOv8 architecture
Backbone
The deep learning architecture that acts as a feature extractor.
Neck
Combines the features acquired from the various layers of the backbone model.
Head
Predicts the classes and bounding box regions, which is the final output produced by the object detection model.

However, the neck is not explicitly defined in YOLOv8. The term neck is only written in the official YOLOv8 documentation.

In the YOLOv8 architecture file, yolov8.yaml, there are only two parts: the backbone and the head.

Next, I will explain the whole YOLOv8 architecture. This architecture drawing is based on the YOLOv8 architecture file, yolov8.yaml, which is located in the models/v8 folder. It is also heavily inspired by the drawing from RangeKing, a GitHub user who posted an issue in the YOLO GitHub repository. We made some modifications to the drawing to make it more readable and to align it with the YOLOv8 source code itself.

• The explanation of the architecture begins with the three parameters that define a YOLOv8 variant: depth multiple, width multiple, and max channels.
• The depth multiple parameter determines how many bottleneck blocks are in a C2F block.
• The width multiple and max channels parameters determine the output channels.
• The input to YOLO is an image with three channels.
• Next is the backbone. The name of the backbone in YOLO is not stated directly. The backbone is made up of numerous convolution layers that extract distinct features at various resolution levels.

Before continuing with the explanation of the layers in the backbone, I will explain the numbering used in the YOLOv8 architecture drawing. The numbering is based on the architecture file, yolov8.yaml. It starts in the backbone section and starts from zero. For example, this convolutional block is the first block in the architecture, so we assign it the number zero and write the number on the block in the drawing. This numbering continues until the last C2F block.

This backbone begins with
• Two convolutional blocks with kernel size 3, stride 2, and padding 1.
• The spatial resolution of the output is reduced when stride 2 is used.
• For example, if the input resolution of the first convolutional block is 640 x 640, the output resolution after processing will be 320 x 320.
• To obtain the output channel, use the following formula, which is taken from the code in tasks.py:
output channel = min(base output channel, mc) × w

• First we find the minimum value between the base output channel and max channels. The minimum value is then multiplied by the width multiple parameter.
• For example, we will calculate the output channel of the first convolutional block using the YOLOv8 variant with a width multiple of 1 and max channels of 512.
• The base output channel of the first convolutional block is 64, so the calculation is: first find the minimum of 64 and 512, which is 64, then multiply by 1. The result is 64, so 64 is the output channel of the first convolutional block.
• You can analyze the second convolutional block in the same way as the first one.

w = width multiple
mc = max channels
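The output channel calculation described above can be written as a short sketch. The function names are hypothetical. The extra step of rounding up to a multiple of 8 is my understanding of what the Ultralytics code does afterwards. It is an assumption that the slides do not mention, so it is kept as a separate helper.

```python
import math


def output_channels(base: int, width_multiple: float, max_channels: int) -> int:
    """min(base output channel, max channels) multiplied by the width multiple."""
    return int(min(base, max_channels) * width_multiple)


def make_divisible(x: float, divisor: int = 8) -> int:
    """Round up to the nearest multiple of `divisor` (assumed extra step)."""
    return math.ceil(x / divisor) * divisor


if __name__ == "__main__":
    # Worked example from the slides: w = 1, mc = 512, base = 64.
    print(output_channels(64, 1.0, 512))    # 64
    # A base larger than mc is capped before scaling.
    print(output_channels(1024, 1.0, 512))  # 512
```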

Next is the C2F block
It takes two parameters: shortcut and n.
• The shortcut parameter in this block is True, indicating that the shortcut will be used in the bottleneck blocks.
• n determines how many bottleneck blocks are used.
• The n value is calculated by multiplying the depth multiple by 3.
• Next is another convolutional block with kernel=3, stride=2, and padding=1.
• The next C2F block has shortcut=True and n = 6 multiplied by the depth multiple.
• The output of this block is also connected to a concat block.
• Next is another convolutional block with kernel=3, stride=2, and padding=1.
• Then comes another C2F block with shortcut=True and n = 6 multiplied by the depth multiple. This block's output is also connected to a concat block.
• Next is another convolutional block with kernel=3, stride=2, and padding=1.
• After that there is a C2F block with shortcut=True and n = 3 multiplied by the depth multiple. This block is connected to SPPF.
Spatial Pyramid Pooling Fast (SPPF) is used after the last convolution layer of the backbone. The main function of the SPPF is to generate a fixed feature representation of objects of various sizes in an image without resizing the image or introducing spatial information loss.

The neck
• First there is the upsampling layer. This layer is used to increase the feature map resolution of the SPPF to match the feature map resolution of this C2F block.
• The upsampled feature map is combined with the features from this C2F block using CONCAT. When using CONCAT, the number of channels is summed up, whereas the resolution is unchanged.
• For example, we will compute the concatenation of this C2F block feature map and this upsampled feature map for the YOLOv8 variant used here. The output of this C2F block is 40 x 40 x 512 and the upsampled output is 40 x 40 x 512, so the result of the concatenation is 40 x 40 x 1,024.
The following is C2F block (12) on the neck. This C2F block does not employ a shortcut, and its n value is 3 multiplied by the depth multiple.
The feature map of this C2F block is upsampled (13) to match the resolution of the feature map of C2F block (4). Using CONCAT (14), the upsampled feature map is combined with the features from C2F block (4).

Next, there is another C2F block (15). This block reduces the channel size of the feature map. The feature map of this block is used as an input for the DETECT block; this detect block is specialized for detecting small objects. The output of this block is also used as an input to this convolutional block (16, P3). The convolutional block uses kernel=3, stride=2, and padding=1, so it reduces the resolution of the feature map by half. Furthermore, CONCAT (17) is used to combine the feature map from this convolutional block (16, P3) with the feature map from C2F block (12).

Next, there is another C2F block (18). This block reduces the channel size of the feature map.
The feature map of this block is used as an input for the DETECT block; this detect block is specialized for detecting medium-sized objects.
The output of this block is also used as an input to this convolutional block (19). The convolutional block uses kernel=3, stride=2, and padding=1.
Next, CONCAT (20) is used to combine the feature map from convolutional block (19) with the feature map from the SPPF (9) block.
Finally, there is another C2F block (21). This block's feature map is used as an input for the DETECT block; this detect block is specialized for detecting large objects.

That's all the explanation about the YOLO architecture.
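As a rough check on the three detection inputs, the sketch below computes their grid sizes for a 640 x 640 image. The strides of 8, 16, and 32 are an assumption derived from counting the stride-2 convolutional blocks that precede each detect input in the walkthrough above. The function names are hypothetical, and the total assumes one prediction per grid cell, following the anchor-free description.

```python
def detect_grids(size=640, strides=(8, 16, 32)):
    """Grid resolution seen by each detect input (small, medium, large objects)."""
    return [(size // s, size // s) for s in strides]


def total_predictions(size=640, strides=(8, 16, 32)):
    """Number of grid cells over all three scales."""
    return sum(h * w for h, w in detect_grids(size, strides))


if __name__ == "__main__":
    print(detect_grids())       # [(80, 80), (40, 40), (20, 20)]
    print(total_predictions())  # 8400
```

The finest grid (80 x 80) has the most cells, which fits its role in detecting small objects, while the coarsest grid (20 x 20) handles large objects.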
