
Andrii Belas "Overview of object detection approaches: cases, algorithms and software"


Published at: Data Science Practice

  1. 1. Andrii Belas, Data Scientist, SMART business • Machine learning expert and public speaker • Founder and mentor of SMART Data Science Academy; responsible for the technical growth of the data science team and the architecture of all data science projects at SMART business • Microsoft Certified Professional in: Big Data and Advanced Analytics; Cloud Data Science with Azure Machine Learning; Developing SQL Data Models • Experience: Deep Learning, Computer Vision, AI in Forecasting, AI in Marketing, Risk Management, Business Intelligence
  2. 2. Agenda: 1. Overview 2. Business cases 3. Approaches 4. Frameworks
  3. 3. Image classification: Dogs vs. Cats (Kaggle), K classes. Task: assign the correct class label to the whole image
  4. 4. Another good challenge
  5. 5. Classification vs. Detection
  6. 6. Classification vs. Detection
  7. 7. (image slide)
  8. 8. (image slide)
  9. 9. Business process: 1. Tagging the current SKU assortment of Roshen/competitors (500 SKUs) 2. Tagging a new Roshen/competitor SKU; Training the recognition neural network (4-5 hours); Deploying the model to merchandisers' devices • Planogram compliance control • Missing-display (out-of-shelf) control • Audit of competitors' prices, promos, and planograms; Real-time reports for management; Assessment and forecast of the impact of ROSHEN and competitors' planograms on sales
  10. 10. Example KPIs for management: Shelf-share estimation; Appearance of a new product, price tags, and promos across all points of sale; Compliance of shelf share and own display stands at points of sale; Control of planogram, price-tag, and promo compliance; Correlation and forecast of the impact of planograms, promos, and competitors on sales at a point of sale (advanced predictive analytics); Ranking of points of sale by KPIs; Ranking of merchandising teams by KPIs
  11. 11. (image slide)
  12. 12. ImageNet Large Scale Visual Recognition Challenge
  13. 13. Classification: keyboard, mouse
  14. 14. Object Detection: keyboard, mouse, keyboard, mouse, mouse
  15. 15. Where to begin • Data • Detection algorithm • Evaluation approach • Deployment
  16. 16. Tips • Train on data like the data you'll see in production • Label your data well (don't miss anything) • Avoid detecting very tiny objects in the image
  17. 17. Open Data and Benchmarks • Pascal VOC (20 classes, ~10-13K images) • MS COCO (80 classes, 123K images) • ImageNet (200 classes, >500K images) • Cat Annotation Dataset (10K annotated cat images)
  18. 18. Evaluation • Compute average precision (AP) separately for each class, then average over classes • A detection is a true positive if its IoU (Intersection over Union) with a ground-truth box is greater than some threshold (usually 0.5), denoted AP@0.5
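The IoU matching rule above can be sketched in a few lines (a minimal illustration; `iou` is a hypothetical helper, with boxes given as corner coordinates):

```python
# Sketch of the IoU-based true-positive rule described on this slide.
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)   # overlap area (0 if disjoint)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# A detection counts as a true positive when iou(detection, ground_truth) > 0.5.
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143 → not a match at 0.5
```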
  19. 19. Evaluation Metrics
  20. 20. Evaluation Metrics
  21. 21. Evaluation Metrics – Precision vs Recall vs mAP (precision-recall curve: precision on the y-axis, recall from 0 to 1 on the x-axis)
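Average precision summarizes the precision-recall curve shown on this slide as the area under it. A minimal sketch of that computation (all-point interpolation; the function name and input format are illustrative, not from the slides):

```python
# AP as the area under the precision-recall curve, built by sweeping the
# ranked detections from highest to lowest score.
def average_precision(detections, num_gt):
    """detections: list of (score, is_true_positive); num_gt: ground-truth count."""
    detections = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    ap, prev_recall = 0.0, 0.0
    for _, is_tp in detections:
        if is_tp:
            tp += 1
        else:
            fp += 1
        precision = tp / (tp + fp)
        recall = tp / num_gt
        ap += precision * (recall - prev_recall)  # rectangle under the PR curve
        prev_recall = recall
    return ap

dets = [(0.9, True), (0.8, True), (0.7, False), (0.6, True)]
print(average_precision(dets, num_gt=4))  # → 0.6875
```

mAP (mean average precision) is then just this value averaged over all classes, as stated on slide 18.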
  22. 22. The first efficient Face Detector (Viola-Jones Algorithm, 2001) • Their demo showed faces being detected in real time on a webcam feed; it was the most stunning demonstration of computer vision and its potential at the time • It was soon implemented in OpenCV, and face detection became synonymous with the Viola-Jones algorithm • Hand-coded features (eyes, nose, locations, and interactions) • Poor results for non-frontal/non-ideal faces
  23. 23. A much more efficient detection technique (Histograms of Oriented Gradients, 2005) • Navneet Dalal and Bill Triggs invented "HOG" for pedestrian detection • Their feature descriptor, Histograms of Oriented Gradients (HOG), significantly outperformed existing algorithms in this task • Hand-coded features, just like before • For every single pixel, we look at the pixels directly surrounding it
  24. 24. A much more efficient detection technique (Histograms of Oriented Gradients, 2005) • The goal: how dark is the current pixel compared to the surrounding pixels? • We then draw an arrow showing the direction in which the image is getting darker • We repeat that process for every single pixel in the image • Every pixel is replaced by an arrow; these arrows are called gradients
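The per-pixel "arrow" described above is just a gradient magnitude and direction. A pure-Python sketch using central differences (illustrative only; real HOG implementations, e.g. in OpenCV, use optimized filters and then bin these orientations into histograms per cell):

```python
import math

def pixel_gradient(img, y, x):
    """Gradient magnitude and orientation (degrees) at pixel (y, x),
    using simple central differences; img is a 2-D list of intensities."""
    gx = img[y][x + 1] - img[y][x - 1]   # intensity change along the row
    gy = img[y + 1][x] - img[y - 1][x]   # intensity change along the column
    magnitude = math.hypot(gx, gy)
    orientation = math.degrees(math.atan2(gy, gx)) % 180  # unsigned, as in HOG
    return magnitude, orientation

img = [
    [0, 0, 0],
    [0, 50, 100],
    [100, 100, 100],
]
mag, angle = pixel_gradient(img, 1, 1)
print(mag, angle)  # ≈ 141.42, 45.0 — the image gets darker toward the top-left
```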
  25. 25. HOG
  26. 26. ResNet and Transfer Learning
  27. 27. Brute-force approach • We can take a classifier like VGGNet or Inception and turn it into an object detector by sliding a small window across the image • At each step, we run the classifier to get a prediction of what sort of object is inside the current window • A sliding window yields several hundred to several thousand predictions per image, but we keep only the ones the classifier is most certain about • This approach works, but it is obviously very slow, since the classifier must run many times
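The sliding-window loop on this slide can be sketched as follows (a minimal illustration; `classify` and `crop` stand in for any classifier and cropping routine and are hypothetical):

```python
# Enumerate the windows a brute-force detector would classify one by one.
def sliding_windows(img_w, img_h, win=64, stride=32):
    """Yield (x, y, w, h) windows covering the image."""
    for y in range(0, img_h - win + 1, stride):
        for x in range(0, img_w - win + 1, stride):
            yield (x, y, win, win)

windows = list(sliding_windows(224, 224))
print(len(windows))  # → 36 classifier calls for a single window size
# detections = [(w, classify(crop(img, w))) for w in windows]  # keep top scores
```

Even this single scale already means dozens of classifier runs; repeating it over multiple window sizes is what makes the approach so slow.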
  28. 28. A better approach: R-CNN (2014) • R-CNN creates bounding boxes, or region proposals, using a process called Selective Search • At a high level, Selective Search looks at the image through windows of different sizes and, for each size, tries to group adjacent pixels by texture, color, or intensity to identify objects
  29. 29. R-CNN
  30. 30. Fast R-CNN (2015)
  31. 31. Faster R-CNN (2016)
  32. 32. YOLO (2016) • YOLO takes a completely different approach • It is not a traditional classifier repurposed as an object detector • YOLO actually looks at the image just once (hence the name: You Only Look Once), but in a clever way • YOLO divides the image into a grid of 13 by 13 cells
  33. 33. YOLO (2016) • Each of these cells is responsible for predicting 5 bounding boxes • A bounding box describes the rectangle that encloses an object • YOLO also outputs a confidence score that tells us how certain it is that the predicted bounding box actually encloses some object • This score says nothing about what kind of object is in the box, only whether the shape of the box is any good
  34. 34. YOLO (2016) • For each bounding box, the cell also predicts a class • The confidence score for the bounding box and the class prediction are combined into one final score that tells us the probability that this bounding box contains a specific type of object • For example, the big yellow box on the left is 85% sure it contains the object "dog"
  35. 35. YOLO (2016) • Since there are 13×13 = 169 grid cells and each cell predicts 5 bounding boxes, we end up with 845 bounding boxes in total • Most of these boxes have very low confidence scores, so we keep only the boxes whose final score is 30% or more (this threshold can be changed depending on how accurate you want the detector to be)
  36. 36. Non-Maximum Suppression
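Non-maximum suppression prunes the overlapping boxes that survive the score threshold: any box that overlaps a higher-scoring box of the same class too much is dropped. A minimal greedy sketch (real implementations, e.g. `torchvision.ops.nms`, are vectorized):

```python
# Greedy NMS: keep the best-scoring box, suppress its near-duplicates, repeat.
def nms(boxes, scores, iou_thresh=0.5):
    """boxes: list of (x1, y1, x2, y2); returns indices of kept boxes."""
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union else 0.0

    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # → [0, 2]: the second box overlaps the first too much
```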
  37. 37. YOLO (2016) • You Only Look Once • We end up with 125 channels for every grid cell; for each of the 5 boxes: • x, y, width, height for the bounding box's rectangle • the confidence score • the probability distribution over the classes
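The arithmetic behind the 125 channels, assuming the 20 Pascal VOC classes of the common YOLOv2 configuration (the class count is an assumption; the slide does not state it):

```python
# Per-cell output size: 5 boxes, each carrying coordinates, a confidence
# score, and a class distribution.
boxes_per_cell = 5
coords = 4          # x, y, width, height
confidence = 1
num_classes = 20    # Pascal VOC (assumed)
channels = boxes_per_cell * (coords + confidence + num_classes)
print(channels)  # → 125
```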
  38. 38. YOLO
  39. 39. YOLOv2 (v3…)
  40. 40. SSD (2016)
  41. 41. Useful links • – Amazon Go • – labeling tool • – How computers learn to recognize objects instantly | Joseph Redmon • – YOLO Official • – YOLO demo • – TensorFlow implementation • – Convolutional Neural Networks for Visual Recognition (Stanford)
  42. 42. Questions?