Andrii Belas, Data Scientist, SMART business
• Machine learning expert, public speaker.
• Founder and mentor of SMART Data Science Academy; responsible for the technical development of the data science team and the architecture of all data science projects at SMART business.
• Microsoft Certified Professional in:
  • Big Data and Advanced Analytics
  • Cloud Data Science with Azure Machine Learning
  • Developing SQL Data Models
Experience:
• Deep Learning
• Computer Vision
• AI in Forecasting
• AI in Marketing
• Risk management
• Business Intelligence
Agenda
1. Overview
2. Business cases
3. Approaches
4. Frameworks
Image classification
Dogs vs. Cats (Kaggle)
K classes
Task: Assign the correct class label to the whole image
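As a minimal sketch of what "assign one label to the whole image" looks like in code: load a network pretrained on ImageNet and take the arg-max class. The torchvision model choice and the file name below are illustrative assumptions, not something prescribed by the slides.

```python
# Whole-image classification sketch: one label for the entire image.
# Assumes torchvision's ImageNet-pretrained ResNet-50 purely for illustration.
import torch
from torchvision import models, transforms
from PIL import Image

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

model = models.resnet50(pretrained=True).eval()

image = Image.open("dog_or_cat.jpg").convert("RGB")   # hypothetical file name
batch = preprocess(image).unsqueeze(0)                # shape: [1, 3, 224, 224]

with torch.no_grad():
    logits = model(batch)                 # one score per ImageNet class
print("predicted class index:", logits.argmax(dim=1).item())
```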
Another good challenge
Classification vs. Detection
Business process
1. Tagging the current assortment of Roshen/competitor SKUs (500 SKUs)
2. Tagging new Roshen/competitor SKUs
Training the recognition neural network (4-5 hours)
Deploying the model to the merchandisers' devices
• Planogram compliance control
• Out-of-shelf (missing display) control
• Audit of competitor prices, promos and planograms
Real-time reports for management
Assessment and forecast of the impact of ROSHEN and competitor planograms on sales
Example metrics for management
• Shelf-share estimation
• Appearance of a new product, price tags and promos across all points of sale
• Shelf-share compliance and own display racks in points of sale
• Compliance control of planograms, price tags and promos
• Correlation and forecast of the impact of planograms, promos and competitors on sales in a point of sale (advanced predictive analytics)
• Ranking of points of sale by these metrics
• Ranking of merchandising teams by these metrics
ImageNet Large Scale Visual Recognition Challenge
Classification
[Figure: the whole image is labeled "keyboard, mouse"]
Object Detection
[Figure: each keyboard and mouse instance gets its own labeled bounding box]
Where to begin
• Data
• Detection algorithm
• Evaluation approach
• Deployment
Tips
• Train on data like what you'll see in production
• Label your data well (don't miss anything)
• Avoid detecting very tiny objects in the image
• Labeling tool: https://github.com/Microsoft/VoTT
Open Data and Benchmarks
• Pascal VOC (20 classes, ~10-13K images)
http://host.robots.ox.ac.uk/pascal/VOC/
• MS COCO (80 classes, 123K images)
http://mscoco.org/
• ImageNet (200 classes, >500K images)
http://image-net.org/
• Cat Annotation Dataset (10K annotated cat images)
http://academictorrents.com/details/145ee4e1fe1acee71b122eab522d14528bbacaf7
Evaluation
• Compute average precision (AP) separately for each class, then average over classes (mAP).
• A detection is a true positive if its IoU (Intersection over Union) with a ground-truth box is greater than some threshold, usually 0.5 (AP@0.5).
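Since the true-positive criterion hinges on IoU, here is a small helper that computes it for two axis-aligned boxes given as (x1, y1, x2, y2); the example boxes are made up just to exercise it.

```python
# Intersection over Union (IoU) for two axis-aligned boxes (x1, y1, x2, y2).
def iou(box_a, box_b):
    # Coordinates of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# A detection counts as a true positive at AP@0.5 when IoU > 0.5:
print(iou((10, 10, 50, 50), (30, 30, 70, 70)) > 0.5)   # False for these toy boxes
```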
Evaluation Metrics – Precision vs Recall vs mAP
[Figure: precision-recall curve, with precision (0-100%) on the vertical axis and recall (0-1) on the horizontal axis]
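To connect the curve above to a single AP number, a simplified sketch (rectangle-rule area under the precision-recall curve; the official VOC/COCO protocols additionally interpolate precision). The per-detection TP/FP flags and ground-truth count below are hypothetical.

```python
import numpy as np

# Detections for one class, sorted by descending confidence; True = TP, False = FP.
# These flags and the ground-truth count are hypothetical, just to exercise the math.
is_tp = np.array([True, True, False, True, False])
num_ground_truth = 4

tp_cum = np.cumsum(is_tp)                 # running true-positive count
fp_cum = np.cumsum(~is_tp)                # running false-positive count
precision = tp_cum / (tp_cum + fp_cum)
recall = tp_cum / num_ground_truth

# AP as the area under the precision-recall curve (simple rectangle rule).
ap = float(np.sum(np.diff(np.concatenate(([0.0], recall))) * precision))
print("precision:", precision, "recall:", recall, "AP:", round(ap, 3))
# mAP is simply the mean of AP over all classes.
```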
The first efficient face detector (Viola-Jones algorithm, 2001)
• Their demo showed faces being detected in real time on a webcam feed; it was the most stunning demonstration of computer vision and its potential at the time.
• It was soon implemented in OpenCV, and face detection became synonymous with the Viola-Jones algorithm.
• Hand-coded features (eyes, nose, their locations and interactions)
• Poor results for non-frontal / non-ideal faces
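Since the Viola-Jones detector still ships with OpenCV as a pretrained Haar cascade, running it takes only a few lines; the input file name below is a placeholder.

```python
# Viola-Jones face detection via the Haar cascade bundled with OpenCV.
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

gray = cv2.cvtColor(cv2.imread("people.jpg"), cv2.COLOR_BGR2GRAY)
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:            # each detection is (x, y, width, height)
    print("face at", x, y, w, h)
```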
A much more efficient detection technique (Histograms of Oriented Gradients, 2005)
• Navneet Dalal and Bill Triggs invented HOG for pedestrian detection.
• Their feature descriptor, Histograms of Oriented Gradients (HOG), significantly outperformed existing algorithms on this task.
• Hand-coded features, just like before.
• For every single pixel, we look at the pixels directly surrounding it: the goal is to determine how dark the current pixel is compared to its neighbours.
• We then draw an arrow showing the direction in which the image is getting darker.
• We repeat this process for every single pixel in the image, so every pixel is replaced by an arrow. These arrows are called gradients.
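OpenCV bundles a HOG descriptor together with a pretrained linear-SVM people detector in the spirit of Dalal & Triggs, so the pedestrian-detection pipeline can be sketched as follows; the image name and the detectMultiScale parameters are illustrative choices.

```python
# HOG + linear SVM pedestrian detection, using the pretrained people detector
# that ships with OpenCV.
import cv2

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

image = cv2.imread("street.jpg")                      # placeholder file name
rects, weights = hog.detectMultiScale(image, winStride=(8, 8), scale=1.05)

for (x, y, w, h), weight in zip(rects, weights):
    print("pedestrian at", x, y, w, h, "weight", weight)
```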
HOG
ResNet and Transfer Learning
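A hedged sketch of what transfer learning on a ResNet typically looks like: freeze the pretrained backbone and train only a new classification head for your own classes. The class count (500, echoing the SKU case above) and the torchvision API are assumptions for illustration, not part of the slides.

```python
# Transfer learning sketch: reuse a pretrained ResNet as a feature extractor
# and train only a new classification head for our own classes.
import torch.nn as nn
from torchvision import models

num_classes = 500                      # e.g. the SKU catalogue mentioned earlier

model = models.resnet50(pretrained=True)
for param in model.parameters():       # freeze the pretrained backbone
    param.requires_grad = False

# Replace the final fully connected layer; only this part will be trained.
model.fc = nn.Linear(model.fc.in_features, num_classes)
```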
Brute-force approach
• We can take a classifier like VGGNet or Inception and turn it into an object detector by sliding a
small window across the image
• At each step you run the classifier to get a prediction of what sort of object is inside the current
window.
• Using a sliding window gives several hundred or thousand predictions for that image, but you only
keep the ones the classifier is the most certain about.
• This approach works but it’s obviously going to be very slow, since you need to run the classifier
many times.
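A sketch of that sliding-window loop; `classify` is a hypothetical stand-in for any image classifier (e.g. VGGNet or Inception) that returns a label and a confidence, and the image is assumed to be an H×W×3 array.

```python
# Sliding-window "brute force" detection sketch.
# classify(crop) is a hypothetical placeholder returning (label, confidence).
def sliding_window_detect(image, classify, window=224, stride=32, keep=0.9):
    height, width = image.shape[:2]
    detections = []
    for y in range(0, height - window + 1, stride):
        for x in range(0, width - window + 1, stride):
            crop = image[y:y + window, x:x + window]
            label, confidence = classify(crop)   # one classifier call per window
            if confidence >= keep:               # keep only confident windows
                detections.append((x, y, window, window, label, confidence))
    return detections   # hundreds or thousands of windows, hence very slow
```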
A better approach: R-CNN (2014)
• R-CNN creates bounding boxes, or region proposals, using a process called Selective Search
• At a high level, Selective Search looks at the image through windows of different sizes, and for each
size tries to group together adjacent pixels by texture, color, or intensity to identify objects.
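Selective Search is available in opencv-contrib-python, so generating R-CNN-style region proposals can be sketched like this (the image name is a placeholder):

```python
# Selective Search region proposals, as used by R-CNN.
# Requires opencv-contrib-python for the cv2.ximgproc module.
import cv2

image = cv2.imread("shelf.jpg")                       # placeholder file name
ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
ss.setBaseImage(image)
ss.switchToSelectiveSearchFast()                      # "quality" mode is slower
proposals = ss.process()                              # array of (x, y, w, h) boxes

print(len(proposals), "region proposals")
# R-CNN then warps each proposal and runs a CNN classifier on it.
```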
R-CNN
Fast R-CNN (2015)
Faster R-CNN (2016)
YOLO (2016)
• YOLO takes a completely different approach.
• It’s not a traditional classifier that is repurposed to be an object detector.
• YOLO actually looks at the image just once (hence its name: You Only Look Once) but in a clever way.
• YOLO divides up the image into a grid of 13 by 13 cells
YOLO (2016)
• Each of these cells is responsible for predicting 5 bounding boxes.
• A bounding box describes the rectangle that encloses an object.
• YOLO also outputs a confidence score that tells us how certain it is that the predicted bounding box
actually encloses some object.
• This score doesn’t say anything about what kind of object is in the box, just if the shape of the box is
any good.
YOLO (2016)
• For each bounding box, the cell also predicts a class.
• The confidence score for the bounding box and the class prediction are combined into one final score
that tells us the probability that this bounding box contains a specific type of object.
• For example, the big fat yellow box on the left is 85% sure it contains the object “dog”:
YOLO (2016)
• Since there are 13×13 = 169 grid cells and each cell predicts 5 bounding boxes, we end up with 845
bounding boxes in total.
• It turns out that most of these boxes will have very low confidence scores, so we only keep the boxes
whose final score is 30% or more (you can change this threshold depending on how accurate you want
the detector to be).
Non-Maximum Suppression
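Non-maximum suppression keeps the highest-scoring box, discards lower-scoring boxes that overlap it too heavily (by the same IoU measure used in evaluation), then repeats. A minimal greedy sketch with toy boxes:

```python
# Greedy non-maximum suppression over boxes given as (x1, y1, x2, y2).
def iou(a, b):
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    while order:
        best = order.pop(0)                       # highest-scoring remaining box
        kept.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return kept                                   # indices of the surviving boxes

# Two heavily overlapping boxes plus one separate box: only two survive.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
print(non_max_suppression(boxes, scores=[0.9, 0.8, 0.7]))   # -> [0, 2]
```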
YOLO (2016)
• You Only Look Once
• So we end up with 125 channels for every grid cell; for each of the 5 boxes these encode (see the decoding sketch below):
• x, y, width, height for the bounding box’s rectangle
• the confidence score
• the probability distribution over the classes
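Why 125: with the 20 Pascal VOC classes, each of the 5 boxes carries 4 coordinates + 1 confidence + 20 class probabilities = 25 values, and 5 × 25 = 125. A hedged decoding sketch follows; the random tensor stands in for real network output, and the 0.3 cut-off is the 30% threshold mentioned earlier.

```python
import numpy as np

# 125 channels per grid cell = 5 boxes * (4 coords + 1 confidence + 20 classes).
GRID, BOXES, CLASSES = 13, 5, 20

# Hypothetical network output, reshaped to [13, 13, 5, 25] for readability.
output = np.random.rand(GRID, GRID, BOXES, 4 + 1 + CLASSES)

detections = []
for row in range(GRID):
    for col in range(GRID):
        for b in range(BOXES):
            x, y, w, h = output[row, col, b, :4]
            confidence = output[row, col, b, 4]
            class_probs = output[row, col, b, 5:]
            score = confidence * class_probs.max()   # final per-box score
            if score >= 0.3:                          # keep boxes scoring 30%+
                detections.append((x, y, w, h, int(class_probs.argmax()), score))

# 13 * 13 * 5 = 845 candidate boxes before this filtering and NMS.
```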
YOLO
YOLOv2 (v3…)
SSD (2016)
Useful links
• https://www.youtube.com/watch?v=NrmMk1Myrxc – Amazon Go
• https://github.com/Microsoft/VoTT - labeling tool
• https://youtu.be/Cgxsv1riJhI - How computers learn to recognize objects instantly | Joseph
Redmon
• https://pjreddie.com/darknet/yolo/ - YOLO Official
• https://youtu.be/VOC3huqHrss - YOLO demo
• https://github.com/thtrieu/darkflow - TensorFlow implementation
• http://cs231n.stanford.edu/ - Convolutional Neural Networks for Visual Recognition (Stanford)
Questions?
Andrii Belas, "Overview of object detection approaches: cases, algorithms and software"