SlideShare a Scribd company logo
1 of 51
K U L A
You only look once:
Unified, Real-time Object Detection
by Joseph Redmon, Santosh Divvala,
Ross Girshick, Ali Farhadi (CVPR 2016)
K U L A
from deepsystems.io
Pascal VOC2007 test sample results.
K U L A
Main Concept
* Object Detection
* Regression problem
* YOLO
* Only One Feedforward
* Global context
* Unified (Real-time detection)
* YOLO: 45 FPS
* Fast YOLO: 155 FPS
* General representation
* Robust on various background
* Other domain
K U L A
Previous Works: Repurpose classifier to perform detectio
Deformable Parts Models (DPM)
• Sliding window
R-CNN based methods
1) generate potential bounding boxes.
2) run classifiers on these proposed
boxes
3) post-processing (refinement,
elimination, rescore)
K U L A
Object detection as Regression Problem
YOLO: Single Regression Problem
Image → bounding box coordinate and class probability.
* Extremely Fast
* Global reasoning
* Generalizable representation
K U L A
Unified Detection
• All BBox, All classes
1) Image → S x S grids
2) Grid cell
→ B: BBoxes and Confidence score
x, y, w, h, confidence
→ C: class probabilities w.r.t #classes
K U L A
Unified Detection
• Predict one set of class
probabilities per grid cell,
regardless of the number of
boxes B.
• At test time,
individual box confidence
prediction
K U L A
Network Design
• Modified GoogLeNet
• 1x1 reduction layer (“Network in Network”)
K U L A
How it works?
from deepsystems.io
K U L A
from deepsystems.io
How it works?
K U L A
from deepsystems.io
How it works?
K U L A
from deepsystems.io
How it works?
K U L A
from deepsystems.io
How it works?
K U L A
from deepsystems.io
How it works?
K U L A
from deepsystems.io
How it works?
K U L A
from deepsystems.io
How it works?
K U L A
from deepsystems.io
How it works?
K U L A
from deepsystems.io
How it works?
K U L A
How it works?
from deepsystems.io
Total :
7*7*2 = 98 boxes
K U L A
Look at detection procedure
from deepsystems.io
K U L A
Look at detection procedure
from deepsystems.io
K U L A
Look at detection procedure
from deepsystems.io
K U L A
Look at detection procedure
from deepsystems.io
K U L A
Look at detection procedure
from deepsystems.io
K U L A
Look at detection procedure
from deepsystems.io
K U L A
Look at detection procedure
from deepsystems.io
K U L A
Look at detection procedure
from deepsystems.io
K U L A
Look at detection procedure
from deepsystems.io
K U L A
Look at detection procedure
from deepsystems.io
K U L A
Look at detection procedure
from deepsystems.io
K U L A
Look at detection procedure
from deepsystems.io
K U L A
Look at detection procedure
from deepsystems.io
K U L A
Look at detection procedure
from deepsystems.io
K U L A
Look at detection procedure
from deepsystems.io
K U L A
Look at detection procedure
from deepsystems.io
K U L A
Look at detection procedure
from deepsystems.io
K U L A
Look at detection procedure
from deepsystems.io
K U L A
Look at detection procedure
from deepsystems.io
K U L A
Limitation of YOLO
from deepsystems.io
• Group of small objects
• Unusual aspect ratios
• Coarse feature
• Localization error of bounding box
K U L A
Comparison to other Real-Time Systems
from deepsystems.io
K U L A
VOC Error
from deepsystems.io
K U L A
Combining Fast R-CNN and YOLO
from deepsystems.io
K U L A
VOC 2012 Leaderboard
from deepsystems.io
K U L A
Generalizability : Person Detection in Artwork
from deepsystems.io
K U L A
Generalizability : Person Detection in Artwork
from deepsystems.io
K U L A
Key Points
from deepsystems.io
1.Fast: YOLO - 45 fps, YOLO-tiny - 155 fps.
2.End-to-end training.
3.Makes more localization errors but is less likely to
predict false positives on background
4.Performance is lower than the current state of the art.
5.Combined Fast R-CNN + YOLO model is one of the
highest performing detection
6.methods.
7.Learns very general representations of objects: it
outperforms other detection methods,
8.including DPM and R-CNN, when generalizing from
natural images to other domains
K U L A
Appendix : Loss Function (sum-squared error)
from deepsystems.io
K U L A
from deepsystems.io
Appendix : Loss Function (sum-squared error)
K U L A
from deepsystems.io
Appendix : Loss Function (sum-squared error)
K U L A
from deepsystems.io
Appendix : Intersection over Union (IoU)
• IoU(pred, truth)=[0, 1]
K U L A
from deepsystems.io
Appendix : Sum-Squared Error (SSE)
sum of squared errors of prediction (SSE), is the sum of the squares of
residuals (deviations predicted from actual empirical values of data). It is a
measure of the discrepancy between the data and an estimation model. A
small RSS indicates a tight fit of the model to the data. It is used as an
optimality criterion in parameter selection and model selection.

More Related Content

What's hot

What's hot (20)

You only look once: Unified, real-time object detection (UPC Reading Group)
You only look once: Unified, real-time object detection (UPC Reading Group)You only look once: Unified, real-time object detection (UPC Reading Group)
You only look once: Unified, real-time object detection (UPC Reading Group)
 
Introduction to object detection
Introduction to object detectionIntroduction to object detection
Introduction to object detection
 
Yol ov2
Yol ov2Yol ov2
Yol ov2
 
YOLO
YOLOYOLO
YOLO
 
Object detection with deep learning
Object detection with deep learningObject detection with deep learning
Object detection with deep learning
 
Object detection
Object detectionObject detection
Object detection
 
Deep learning based object detection
Deep learning based object detectionDeep learning based object detection
Deep learning based object detection
 
Yolo
YoloYolo
Yolo
 
Single Shot Multibox Detector
Single Shot Multibox DetectorSingle Shot Multibox Detector
Single Shot Multibox Detector
 
Anatomy of YOLO - v1
Anatomy of YOLO - v1Anatomy of YOLO - v1
Anatomy of YOLO - v1
 
Object Detection and Recognition
Object Detection and Recognition Object Detection and Recognition
Object Detection and Recognition
 
Real-time object detection coz YOLO!
Real-time object detection coz YOLO!Real-time object detection coz YOLO!
Real-time object detection coz YOLO!
 
Yolov3
Yolov3Yolov3
Yolov3
 
Machine Learning - Convolutional Neural Network
Machine Learning - Convolutional Neural NetworkMachine Learning - Convolutional Neural Network
Machine Learning - Convolutional Neural Network
 
Faster R-CNN - PR012
Faster R-CNN - PR012Faster R-CNN - PR012
Faster R-CNN - PR012
 
Convolutional Neural Network (CNN)
Convolutional Neural Network (CNN)Convolutional Neural Network (CNN)
Convolutional Neural Network (CNN)
 
Yolo releases gianmaria
Yolo releases gianmariaYolo releases gianmaria
Yolo releases gianmaria
 
Object Detection Using R-CNN Deep Learning Framework
Object Detection Using R-CNN Deep Learning FrameworkObject Detection Using R-CNN Deep Learning Framework
Object Detection Using R-CNN Deep Learning Framework
 
Convolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep LearningConvolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep Learning
 
Deep Learning for Computer Vision: Object Detection (UPC 2016)
Deep Learning for Computer Vision: Object Detection (UPC 2016)Deep Learning for Computer Vision: Object Detection (UPC 2016)
Deep Learning for Computer Vision: Object Detection (UPC 2016)
 

Similar to You Only Look Once: Unified, Real-Time Object Detection

Python metaprogramming in linear time language for automated runtime verifica...
Python metaprogramming in linear time language for automated runtime verifica...Python metaprogramming in linear time language for automated runtime verifica...
Python metaprogramming in linear time language for automated runtime verifica...
ISSEL
 
Μεταπρογραµµατισµός κώδικα Python σε γλώσσα γραµµικού χρόνου για αυτόµατη επα...
Μεταπρογραµµατισµός κώδικα Python σε γλώσσα γραµµικού χρόνου για αυτόµατη επα...Μεταπρογραµµατισµός κώδικα Python σε γλώσσα γραµµικού χρόνου για αυτόµατη επα...
Μεταπρογραµµατισµός κώδικα Python σε γλώσσα γραµµικού χρόνου για αυτόµατη επα...
ISSEL
 
Μεταπρογραµµατισµός κώδικα Python σε γλώσσα γραµµικού χρόνου για αυτόµατη επα...
Μεταπρογραµµατισµός κώδικα Python σε γλώσσα γραµµικού χρόνου για αυτόµατη επα...Μεταπρογραµµατισµός κώδικα Python σε γλώσσα γραµµικού χρόνου για αυτόµατη επα...
Μεταπρογραµµατισµός κώδικα Python σε γλώσσα γραµµικού χρόνου για αυτόµατη επα...
ISSEL
 
Convolutional Neural Networks CNN
Convolutional Neural Networks CNNConvolutional Neural Networks CNN
Convolutional Neural Networks CNN
Abdullah al Mamun
 

Similar to You Only Look Once: Unified, Real-Time Object Detection (20)

Deep learning based object detection
Deep learning based object detectionDeep learning based object detection
Deep learning based object detection
 
A-13 Iomp-1.pptx
A-13 Iomp-1.pptxA-13 Iomp-1.pptx
A-13 Iomp-1.pptx
 
Review: You Only Look One-level Feature
Review: You Only Look One-level FeatureReview: You Only Look One-level Feature
Review: You Only Look One-level Feature
 
Polymorphism 9
Polymorphism 9Polymorphism 9
Polymorphism 9
 
Polymorphism 9
Polymorphism 9Polymorphism 9
Polymorphism 9
 
Object Detection - Míriam Bellver - UPC Barcelona 2018
Object Detection - Míriam Bellver - UPC Barcelona 2018Object Detection - Míriam Bellver - UPC Barcelona 2018
Object Detection - Míriam Bellver - UPC Barcelona 2018
 
Lec11 object-re-id
Lec11 object-re-idLec11 object-re-id
Lec11 object-re-id
 
Python metaprogramming in linear time language for automated runtime verifica...
Python metaprogramming in linear time language for automated runtime verifica...Python metaprogramming in linear time language for automated runtime verifica...
Python metaprogramming in linear time language for automated runtime verifica...
 
Μεταπρογραµµατισµός κώδικα Python σε γλώσσα γραµµικού χρόνου για αυτόµατη επα...
Μεταπρογραµµατισµός κώδικα Python σε γλώσσα γραµµικού χρόνου για αυτόµατη επα...Μεταπρογραµµατισµός κώδικα Python σε γλώσσα γραµµικού χρόνου για αυτόµατη επα...
Μεταπρογραµµατισµός κώδικα Python σε γλώσσα γραµµικού χρόνου για αυτόµατη επα...
 
Μεταπρογραµµατισµός κώδικα Python σε γλώσσα γραµµικού χρόνου για αυτόµατη επα...
Μεταπρογραµµατισµός κώδικα Python σε γλώσσα γραµµικού χρόνου για αυτόµατη επα...Μεταπρογραµµατισµός κώδικα Python σε γλώσσα γραµµικού χρόνου για αυτόµατη επα...
Μεταπρογραµµατισµός κώδικα Python σε γλώσσα γραµµικού χρόνου για αυτόµατη επα...
 
Learning do discover: machine learning in high-energy physics
Learning do discover: machine learning in high-energy physicsLearning do discover: machine learning in high-energy physics
Learning do discover: machine learning in high-energy physics
 
20211118 AI+ Remote Sensing
20211118 AI+ Remote Sensing20211118 AI+ Remote Sensing
20211118 AI+ Remote Sensing
 
YOLO v1
YOLO v1YOLO v1
YOLO v1
 
Deep Learning Hardware: Past, Present, & Future
Deep Learning Hardware: Past, Present, & FutureDeep Learning Hardware: Past, Present, & Future
Deep Learning Hardware: Past, Present, & Future
 
YOLOv4: A Face Mask Detection System
YOLOv4: A Face Mask Detection SystemYOLOv4: A Face Mask Detection System
YOLOv4: A Face Mask Detection System
 
Real Time Object Detection System with YOLO and CNN Models: A Review
Real Time Object Detection System with YOLO and CNN Models: A ReviewReal Time Object Detection System with YOLO and CNN Models: A Review
Real Time Object Detection System with YOLO and CNN Models: A Review
 
“Robust Object Detection Under Dataset Shifts,” a Presentation from Arm
“Robust Object Detection Under Dataset Shifts,” a Presentation from Arm“Robust Object Detection Under Dataset Shifts,” a Presentation from Arm
“Robust Object Detection Under Dataset Shifts,” a Presentation from Arm
 
物件偵測與辨識技術
物件偵測與辨識技術物件偵測與辨識技術
物件偵測與辨識技術
 
Convolutional Neural Networks CNN
Convolutional Neural Networks CNNConvolutional Neural Networks CNN
Convolutional Neural Networks CNN
 
Real Time Sign Language Recognition Using Deep Learning
Real Time Sign Language Recognition Using Deep LearningReal Time Sign Language Recognition Using Deep Learning
Real Time Sign Language Recognition Using Deep Learning
 

Recently uploaded

Recently uploaded (20)

Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 

You Only Look Once: Unified, Real-Time Object Detection

  • 1. K U L A You only look once: Unified, Real-time Object Detection by Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi (CVPR 2016)
  • 2. K U L A from deepsystems.io Pascal VOC2007 test sample results.
  • 3. K U L A Main Concept * Object Detection * Regression problem * YOLO * Only One Feedforward * Global context * Unified (Real-time detection) * YOLO: 45 FPS * Fast YOLO: 155 FPS * General representation * Robust on various background * Other domain
  • 4. K U L A Previous Works: Repurpose classifier to perform detectio Deformable Parts Models (DPM) • Sliding window R-CNN based methods 1) generate potential bounding boxes. 2) run classifiers on these proposed boxes 3) post-processing (refinement, elimination, rescore)
  • 5. K U L A Object detection as Regression Problem YOLO: Single Regression Problem Image → bounding box coordinate and class probability. * Extremely Fast * Global reasoning * Generalizable representation
  • 6. K U L A Unified Detection • All BBox, All classes 1) Image → S x S grids 2) Grid cell → B: BBoxes and Confidence score x, y, w, h, confidence → C: class probabilities w.r.t #classes
  • 7. K U L A Unified Detection • Predict one set of class probabilities per grid cell, regardless of the number of boxes B. • At test time, individual box confidence prediction
  • 8. K U L A Network Design • Modified GoogLeNet • 1x1 reduction layer (“Network in Network”)
  • 9. K U L A How it works? from deepsystems.io
  • 10. K U L A from deepsystems.io How it works?
  • 11. K U L A from deepsystems.io How it works?
  • 12. K U L A from deepsystems.io How it works?
  • 13. K U L A from deepsystems.io How it works?
  • 14. K U L A from deepsystems.io How it works?
  • 15. K U L A from deepsystems.io How it works?
  • 16. K U L A from deepsystems.io How it works?
  • 17. K U L A from deepsystems.io How it works?
  • 18. K U L A from deepsystems.io How it works?
  • 19. K U L A How it works? from deepsystems.io Total : 7*7*2 = 98 boxes
  • 20. K U L A Look at detection procedure from deepsystems.io
  • 21. K U L A Look at detection procedure from deepsystems.io
  • 22. K U L A Look at detection procedure from deepsystems.io
  • 23. K U L A Look at detection procedure from deepsystems.io
  • 24. K U L A Look at detection procedure from deepsystems.io
  • 25. K U L A Look at detection procedure from deepsystems.io
  • 26. K U L A Look at detection procedure from deepsystems.io
  • 27. K U L A Look at detection procedure from deepsystems.io
  • 28. K U L A Look at detection procedure from deepsystems.io
  • 29. K U L A Look at detection procedure from deepsystems.io
  • 30. K U L A Look at detection procedure from deepsystems.io
  • 31. K U L A Look at detection procedure from deepsystems.io
  • 32. K U L A Look at detection procedure from deepsystems.io
  • 33. K U L A Look at detection procedure from deepsystems.io
  • 34. K U L A Look at detection procedure from deepsystems.io
  • 35. K U L A Look at detection procedure from deepsystems.io
  • 36. K U L A Look at detection procedure from deepsystems.io
  • 37. K U L A Look at detection procedure from deepsystems.io
  • 38. K U L A Look at detection procedure from deepsystems.io
  • 39. K U L A Limitation of YOLO from deepsystems.io • Group of small objects • Unusual aspect ratios • Coarse feature • Localization error of bounding box
  • 40. K U L A Comparison to other Real-Time Systems from deepsystems.io
  • 41. K U L A VOC Error from deepsystems.io
  • 42. K U L A Combining Fast R-CNN and YOLO from deepsystems.io
  • 43. K U L A VOC 2012 Leaderboard from deepsystems.io
  • 44. K U L A Generalizability : Person Detection in Artwork from deepsystems.io
  • 45. K U L A Generalizability : Person Detection in Artwork from deepsystems.io
  • 46. K U L A Key Points from deepsystems.io 1.Fast: YOLO - 45 fps, YOLO-tiny - 155 fps. 2.End-to-end training. 3.Makes more localization errors but is less likely to predict false positives on background 4.Performance is lower than the current state of the art. 5.Combined Fast R-CNN + YOLO model is one of the highest performing detection 6.methods. 7.Learns very general representations of objects: it outperforms other detection methods, 8.including DPM and R-CNN, when generalizing from natural images to other domains
  • 47. K U L A Appendix : Loss Function (sum-squared error) from deepsystems.io
  • 48. K U L A from deepsystems.io Appendix : Loss Function (sum-squared error)
  • 49. K U L A from deepsystems.io Appendix : Loss Function (sum-squared error)
  • 50. K U L A from deepsystems.io Appendix : Intersection over Union (IoU) • IoU(pred, truth)=[0, 1]
  • 51. K U L A from deepsystems.io Appendix : Sum-Squared Error (SSE) sum of squared errors of prediction (SSE), is the sum of the squares of residuals (deviations predicted from actual empirical values of data). It is a measure of the discrepancy between the data and an estimation model. A small RSS indicates a tight fit of the model to the data. It is used as an optimality criterion in parameter selection and model selection.