
Multiple Object Tracking - Laura Leal-Taixe - UPC Barcelona 2018


https://telecombcn-dl.github.io/2018-dlcv/

Deep learning technologies are at the core of the current revolution in artificial intelligence for multimedia data analysis. The convergence of large-scale annotated datasets and affordable GPU hardware has made it possible to train neural networks for data analysis tasks that were previously addressed with hand-crafted features. Architectures such as convolutional neural networks, recurrent neural networks and Q-nets for reinforcement learning have reshaped the landscape of signal processing. This course will cover the basic principles and applications of deep learning to computer vision problems, such as image classification, object detection or image captioning.



  1. Prof. Laura Leal-Taixé: Multiple object tracking
  2. Dynamic Scene Understanding: understand every pixel of a video. Semantic segmentation (pixel labels such as road, tree, person, car).
  3. Dynamic Scene Understanding: understand every pixel of a video. Semantic segmentation vs. instance-based segmentation (road, tree, car, person 1, person 2, person 3).
  4. Dynamic Scene Understanding: understand every pixel of a video. Semantic segmentation, instance-based segmentation, multiple object tracking.
  5. Multiple object tracking. Goal: detect and track all objects in a scene.
  6. Tracking-by-detection: person detection on a video sequence.
  7. Tracking-by-detection: person detection on a video sequence (continued).
  8. Tracking with network flows: the video sequence is turned into a graphical model, with detections as nodes.
  9. Tracking with network flows: solving the minimum cost flow problem. L. Leal-Taixé et al., ICCVW 2011.
  10. Tracking with Linear Programming. Objective function: T* = argmin_T Σ_{i,j} C(i,j) f(i,j). L. Leal-Taixé et al., ICCVW 2011 (solving the minimum cost flow problem).
  11. Tracking with Linear Programming. In the objective T* = argmin_T Σ_{i,j} C(i,j) f(i,j), f(i,j) is a {0,1} indicator and C(i,j) are the costs, which are what will drive the tracking.
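A minimal sketch of the network-flow formulation behind these slides, assuming detections are already given per frame: each detection becomes a node (split into in/out to carry a detection cost), source/sink edges model track birth and death, and transition edges carry the linking costs C(i, j). The concrete costs, capacities, and the use of networkx are illustrative assumptions, not the paper's exact setup.

```python
import networkx as nx

def build_tracking_graph(detections, link_cost, det_cost=-10, entry_cost=20):
    """detections: list of frames, each a list of detection ids.
    link_cost(a, b): integer cost of linking detection a to detection b."""
    G = nx.DiGraph()
    G.add_node("S")  # source: track births
    G.add_node("T")  # sink: track deaths
    for t, frame in enumerate(detections):
        for d in frame:
            u_in, u_out = (t, d, "in"), (t, d, "out")
            # A negative cost on the detection edge rewards explaining the detection.
            G.add_edge(u_in, u_out, capacity=1, weight=det_cost)
            G.add_edge("S", u_in, capacity=1, weight=entry_cost)   # birth edge
            G.add_edge(u_out, "T", capacity=1, weight=entry_cost)  # death edge
            if t > 0:
                for p in detections[t - 1]:
                    # Transition edge between consecutive frames, cost C(i, j).
                    G.add_edge((t - 1, p, "out"), u_in,
                               capacity=1, weight=int(link_cost(p, d)))
    return G

# Example: two frames with two detections each and a toy linking cost.
dets = [["a", "b"], ["c", "d"]]
G = build_tracking_graph(dets, link_cost=lambda p, d: 5)
flow = nx.max_flow_min_cost(G, "S", "T")  # the flow f(i, j) on each edge
```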
  12. Measuring bounding box similarity. Classic measures: bounding box distance, appearance similarity. Proposed measures: modeling pedestrian motion and interactions.
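For concreteness, a small sketch of the two classic measures named on this slide: a bounding-box overlap score and an appearance similarity. The grayscale-histogram appearance model is only an illustrative assumption; the proposed motion and interaction measures come next.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def appearance_distance(patch_a, patch_b, bins=16):
    """Bhattacharyya distance between intensity histograms of two crops."""
    ha, _ = np.histogram(patch_a, bins=bins, range=(0, 255))
    hb, _ = np.histogram(patch_b, bins=bins, range=(0, 255))
    ha = ha / (ha.sum() + 1e-9)
    hb = hb / (hb.sum() + 1e-9)
    # 0 for identical histograms, close to 1 for non-overlapping ones.
    return 1.0 - float(np.sum(np.sqrt(ha * hb)))
```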
  13. Modeling pedestrian interaction: using a physics-based model of crowd motion; learning an image-based motion context; learning appearance and interactions with Deep Learning.
  14. Modeling pedestrian interaction: using a physics-based model of crowd motion; learning an image-based motion context; learning appearance and interactions with Deep Learning.
  15. Modeling pedestrian interaction: using a physics-based model of crowd motion; learning an image-based motion context; learning appearance and interactions with Deep Learning.
  16. The effect of undetected pedestrians: image vs. optical flow.
  17. Learning an image-based motion context: from image features to an estimation of the velocity, using Hough forests. L. Leal-Taixé, M. Fenzi, A. Kuznetsova, B. Rosenhahn, S. Savarese. CVPR 2014.
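The paper estimates pedestrian velocity from image features with Hough forests; as a rough stand-in (not the paper's method), the sketch below fits a plain random-forest regressor that maps placeholder patch features to a 2D velocity, just to show the features-to-velocity regression idea.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 64))   # placeholder: one feature vector per image patch
V = rng.normal(size=(1000, 2))    # placeholder: ground-truth velocity (vx, vy)

# A standard regression forest standing in for the Hough forest of the paper.
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X, V)
predicted_velocity = model.predict(X[:5])  # per-patch velocity estimates
```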
  18. Pedestrian velocity estimation: histogram of the errors for Optical Flow, the Social Force Model, and the proposed method.
  19. Modeling pedestrian interaction: using a physics-based model of crowd motion; learning an image-based motion context; learning appearance and interactions with Deep Learning.
  20. Learning appearance and interactions: a Siamese neural network scores detection pairs between frames t and t+1, and Linear Programming turns the scores into predicted associations. L. Leal-Taixé, C. Canton, K. Schindler. CVPRW 2016.
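A minimal PyTorch sketch of a pairwise matching network in the spirit of this slide: two person crops from frames t and t+1 are stacked along the channel axis and mapped to a match score. The layer sizes, channel counts, and RGB-only input-stacking variant are assumptions for illustration, not the exact architecture of the CVPRW 2016 paper.

```python
import torch
import torch.nn as nn

class MatchNet(nn.Module):
    """Scores whether two person crops from frames t and t+1 show the same person."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(6, 32, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(128), nn.ReLU(),  # infers the flattened size automatically
            nn.Linear(128, 1),              # logit: same person or not
        )

    def forward(self, patch_t, patch_t1):
        # Input-stacking variant: concatenate the two RGB crops along channels.
        x = torch.cat([patch_t, patch_t1], dim=1)
        return self.head(self.features(x))

# Usage: score one pair of 121x53 crops.
net = MatchNet()
logit = net(torch.rand(1, 3, 121, 53), torch.rand(1, 3, 121, 53))
```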
  21. Design: stacked input of size 10x121x53 (image and optical flow patches, I2, O2), local spatio-temporal features from convolutional layers C1-C3 and fully connected layers F4-F7, and a final matching prediction from a Gradient Boosting classifier on contextual features (height, width, distance, ...).
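The final prediction on this slide fuses the CNN matching output with simple contextual features via a Gradient Boosting classifier. Below is a sketch of that second stage; the feature columns and the random training data are placeholders standing in for real detection pairs.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
# Columns (placeholder): cnn_match_score, height_ratio, width_ratio, center_distance.
X = rng.normal(size=(500, 4))
y = rng.integers(0, 2, size=500)   # 1 = same person, 0 = different person

clf = GradientBoostingClassifier(n_estimators=100)
clf.fit(X, y)
match_probability = clf.predict_proba(X[:3])[:, 1]  # fed into the linear program as costs
```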
  22. Improvement directions • Better appearance models: Ristani & Tomasi. Features for Multi-Target Multi-Camera Tracking and Re-Identification. CVPR 2018. • Multi-detector fusion: Henschel et al. Fusion of Head and Full-Body Detectors for Multi-Object Tracking. CVPRW 2018.
  23. Can we learn it all? • Problem 1: dimensionality of the output. Neural networks can handle fixed-size outputs (tensors), but in tracking the number of detections per frame and the length of each trajectory are unknown. • Solution 1: see the Set Learning lecture. • Solution 2: recursive.
  24. Can we learn it all? • Solution 2: recursive. Using a Recurrent Neural Network to predict trajectory properties: birth/death, transition → motion model. • Does not work well... It is likely that more data is needed to train such a motion model. Milan et al. Online Multi-Target Tracking Using Recurrent Neural Networks. AAAI 2017.
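A rough sketch of the recurrent idea referenced here (in the spirit of Milan et al., AAAI 2017): an RNN consumes the past states of one target and predicts the next state plus an existence probability that models birth/death. The 4D box state, the single LSTM layer, and the hidden size are assumptions.

```python
import torch
import torch.nn as nn

class RNNMotionModel(nn.Module):
    def __init__(self, state_dim=4, hidden_dim=64):
        super().__init__()
        self.lstm = nn.LSTM(state_dim, hidden_dim, batch_first=True)
        self.next_state = nn.Linear(hidden_dim, state_dim)  # predicted box (x, y, w, h)
        self.existence = nn.Linear(hidden_dim, 1)           # birth/death logit

    def forward(self, past_states):
        out, _ = self.lstm(past_states)   # (batch, time, hidden)
        last = out[:, -1]
        return self.next_state(last), torch.sigmoid(self.existence(last))

# Usage: predict the next box of one target from its last 5 observed boxes.
model = RNNMotionModel()
next_box, alive_prob = model(torch.rand(1, 5, 4))
```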
  25. Can we learn it all? • Our Solution 2: recursive. One trajectory at a time, one frame at a time.
  26. Can we learn it all? Girshick. Fast R-CNN. ICCV 2015.
  27. Can we learn it all? • We use the detections from the last frame as proposals. • Where did the detection with ID 1 go in the next frame? P. Bergmann. Master Thesis. TUM 2018.
  28. Pros and cons • PRO: We can reuse an extremely well-trained regressor and get well-positioned bounding boxes. • CON: The regressor only shifts the box by a small quantity, so we need to compensate for large camera motions → optical flow. P. Bergmann. Master Thesis. TUM 2018.
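A high-level sketch of the regression-based tracking step described on slides 27 and 28, assuming a detector whose box-regression head can be re-run on arbitrary proposals. `regress_boxes` and `camera_motion_compensation` are placeholder callables standing in for that head and for the optical-flow alignment mentioned above.

```python
def track_step(active_tracks, frame_t1, regress_boxes, camera_motion_compensation,
               score_threshold=0.5):
    """Move each active track from frame t to frame t+1 by bounding-box regression."""
    # 1. Align the previous boxes to the new frame to absorb large camera motion.
    proposals = [camera_motion_compensation(trk["box"], frame_t1) for trk in active_tracks]
    # 2. Re-run only the detector's regression/classification head on these proposals.
    new_boxes, scores = regress_boxes(frame_t1, proposals)
    surviving = []
    for trk, box, score in zip(active_tracks, new_boxes, scores):
        if score >= score_threshold:
            # The identity is kept implicitly: the proposal came from this track.
            surviving.append({"id": trk["id"], "box": box})
        # Otherwise the track is paused or killed (occlusion or object left the scene).
    return surviving
```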
  29. Surprisingly good results • Extremely good MOTA improvement.
  30. Surprisingly good results • Quite a few ID switches, but still good identification overall.
  31. Conclusions • The old assumption that objects have small displacements still holds and can be used to reach state-of-the-art performance. • What is next? Hard cases: long occlusions, crowded scenes.
  32. Questions? Prof. Laura Leal-Taixé
