2. Authors
• Daniel Dlužnevskij: Department of Electronic Systems, Vilnius Gediminas
Technical University, Naugarduko g. 41, LT-03227 Vilnius, Lithuania
• Pavel Stefanovič: Department of Information Systems, Vilnius Gediminas
Technical University, Saulėtekio al. 11, LT-10223 Vilnius, Lithuania
• Simona Ramanauskaitė: Department of Information Technology, Vilnius
Gediminas Technical University, Saulėtekio al. 11, LT-10223 Vilnius, Lithuania
daniel.dluznevskij@stud.vilniustech.lt, pavel.stefanovic@vilniustech.lt,
simona.ramanauskaite@vilniustech.lt
4. Microsoft Common Objects in Context (COCO)
The COCO dataset by Lin et al. contains 330 000 images, of which more
than 200 000 are labeled by human annotators
(WEB (cocodataset.org), 2021).
13. iPhone 12 inference results
Model          Average time, ANE (ms)   Average time, GPU (ms)   Average time, CPU (ms)
YOLOv5s Int8   77                       82                       80
YOLOv5m Int8   106                      114                      148
YOLOv5l Int8   145                      181                      263
YOLOv5x Int8   341                      321                      441
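The average times above can be read as throughput by dividing 1000 ms by the per-image latency. The following is a minimal sketch of that arithmetic, assuming images are processed one at a time with no batching or pipelining; the names `LATENCY_MS`, `images_per_second` and `fastest_unit` are illustrative and do not come from the slides.

```python
# Average per-image latency in milliseconds, copied from the table above.
LATENCY_MS = {
    "YOLOv5s Int8": {"ANE": 77, "GPU": 82, "CPU": 80},
    "YOLOv5m Int8": {"ANE": 106, "GPU": 114, "CPU": 148},
    "YOLOv5l Int8": {"ANE": 145, "GPU": 181, "CPU": 263},
    "YOLOv5x Int8": {"ANE": 341, "GPU": 321, "CPU": 441},
}


def images_per_second(latency_ms: float) -> float:
    """Convert an average per-image latency (ms) to images per second.

    Assumption: strictly sequential, single-image inference.
    """
    return 1000.0 / latency_ms


def fastest_unit(model: str) -> str:
    """Return the compute unit with the lowest average latency for a model."""
    times = LATENCY_MS[model]
    return min(times, key=times.get)


if __name__ == "__main__":
    for model, times in LATENCY_MS.items():
        rates = ", ".join(
            f"{unit} {images_per_second(ms):.1f}" for unit, ms in times.items()
        )
        print(f"{model}: {rates} images/s (fastest: {fastest_unit(model)})")
```

Under this reading, YOLOv5s Int8 reaches roughly 13 images per second on the ANE, and YOLOv5x Int8 is the only model in the table for which the GPU is faster than the ANE.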
14. YOLOv5 models for real-time inference
Images per second (chart axis: 0 to 40)
Model     Colab v100   iPhone 12 ANE   iPhone 12 GPU   iPhone 12 CPU
YOLOv5s   37           12              12              12
YOLOv5m   31           9               8               6
YOLOv5l   24           6               5               3
YOLOv5x   20           2               3               2
18.
• Using dedicated dataset(s) yields better results;
• YOLOv5 proves suitable for mobile object detection;
• Non-optimized models are unsuitable for real-time object detection;
• Optimized models can run at up to 100 images per second on the
Apple Neural Engine.