2. 2
Introduction
● Visual Object Tracking (VOT) is the task of tracking an object throughout a video, given the location of that object in the first frame.
● It has a wide range of applications, such as:
– Social Distance Tracking
– Human-robot Interaction
– Autonomous Driving
● VOT is required to be highly efficient and accurate.
3. 3
Datasets
● Object Tracking Benchmarks
– ground truth bounding boxes are axis-aligned
– for example: OTB50 and OTB100
● Visual Object Tracking challenges
– have rotated ground truth bounding boxes
– provide the orientation of an object
– contain fewer background pixels in the ground truth (GT) bounding boxes
– are used for action classification in anomaly detection
– for example: VOT2016, VOT2017, VOT2018, and VOT2019
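To illustrate why a rotated ground truth box contains fewer background pixels, the sketch below compares the area of a rotated box with the area of the axis-aligned box that encloses it. The function names and the (centre, width, height, angle) parameterisation are assumptions made for this example, not the annotation format of any particular benchmark.

```python
import math

def rotated_box_corners(cx, cy, w, h, angle_deg):
    """Corners of a w x h box centred at (cx, cy), rotated by angle_deg."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    half = ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2))
    return [(cx + dx * cos_a - dy * sin_a, cy + dx * sin_a + dy * cos_a)
            for dx, dy in half]

def enclosing_axis_aligned_box(corners):
    """Smallest axis-aligned box (x_min, y_min, x_max, y_max) around the corners."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return min(xs), min(ys), max(xs), max(ys)

# Hypothetical elongated object (40 x 10) lying at 45 degrees.
corners = rotated_box_corners(50, 50, 40, 10, 45)
x0, y0, x1, y1 = enclosing_axis_aligned_box(corners)
rotated_area = 40 * 10
aligned_area = (x1 - x0) * (y1 - y0)
print(f"rotated: {rotated_area}, axis-aligned: {aligned_area:.1f}")
```

For an elongated object at an oblique angle, the axis-aligned box is several times larger than the rotated one, and the extra area is background.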
4. 4
Literature Review
● SiamFC [1]
– fully convolutional Siamese network
– trained offline for object detection in videos
– has also been commonly used for visual object tracking
– does not handle object rotation
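The core matching step of a fully convolutional Siamese tracker can be sketched as a cross-correlation: the exemplar (target) features are slid over the search-region features, and the peak of the score map gives the new location. The toy below uses single-channel, hand-written feature maps; a real network such as SiamFC uses learned multi-channel embeddings, and the function names here are illustrative only.

```python
def cross_correlate(search, exemplar):
    """Slide the exemplar over the search map ('valid' positions only)."""
    sh, sw = len(search), len(search[0])
    eh, ew = len(exemplar), len(exemplar[0])
    scores = []
    for i in range(sh - eh + 1):
        row = []
        for j in range(sw - ew + 1):
            row.append(sum(search[i + u][j + v] * exemplar[u][v]
                           for u in range(eh) for v in range(ew)))
        scores.append(row)
    return scores

def argmax_2d(scores):
    """(row, col) of the highest score."""
    best = max((val, i, j) for i, row in enumerate(scores)
               for j, val in enumerate(row))
    return best[1], best[2]

exemplar = [[1, 2],
            [3, 4]]
# Toy search region: the exemplar pattern sits at row 1, column 2.
search = [[0, 0, 0, 0],
          [0, 0, 1, 2],
          [0, 0, 3, 4],
          [0, 0, 0, 0]]
score_map = cross_correlate(search, exemplar)
peak = argmax_2d(score_map)
```

The score map only says where the target is; it carries no information about orientation, which is why this formulation alone yields axis-aligned boxes.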
5. 5
Literature Review
● SiamMask [2]
– inspired by and derived from SiamRPN and SiamRPN++
– predicts a segmentation mask for tracking
– fits a rotated bounding box to the segmented mask with:
    ● minimum area
    ● a speed of 87 fps (real-time)
– used YouTube-VOS for training
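Fitting a minimum-area rotated rectangle to a mask can be sketched in plain Python as follows. This is an illustrative implementation, not SiamMask's own code: it builds the convex hull of the mask points and tries every hull edge as a rectangle side, relying on the fact that a minimum-area enclosing rectangle has a side collinear with a hull edge. In practice a library routine such as OpenCV's `cv2.minAreaRect` would normally be used instead.

```python
import math

def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def min_area_rect(points):
    """Return (area, angle_deg, width, height) of the smallest enclosing rectangle."""
    hull = convex_hull(points)
    best = None
    for k in range(len(hull)):
        (x0, y0), (x1, y1) = hull[k], hull[(k + 1) % len(hull)]
        theta = math.atan2(y1 - y0, x1 - x0)
        c, s = math.cos(theta), math.sin(theta)
        # Project hull points onto the edge direction and onto its normal.
        us = [x * c + y * s for x, y in hull]
        vs = [-x * s + y * c for x, y in hull]
        w, h = max(us) - min(us), max(vs) - min(vs)
        if best is None or w * h < best[0]:
            best = (w * h, math.degrees(theta), w, h)
    return best

# Toy "mask": a diamond, i.e. a square rotated by 45 degrees.
mask_points = [(0, 1), (1, 0), (2, 1), (1, 2), (1, 1)]
area, angle, width, height = min_area_rect(mask_points)
```

For the diamond, the rotated rectangle has area 2, whereas the axis-aligned box around the same points has area 4.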
6. 6
Literature Review
● Consensus-based Matching and Tracking (CMT) [3]
– an efficient approach
– uses key points of an object to scale and rotate bounding boxes for visual object tracking
– fails to handle deformable objects
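One way key points can drive the scale and rotation of a box is sketched below: for every pair of matched key points, compare the distance and direction between them in the first frame and in the current frame, then take the median over all pairs. This is a simplified illustration in the spirit of CMT, not the published implementation; the function name and the assumption that key points are already matched one-to-one are ours.

```python
import math
from itertools import combinations
from statistics import median

def estimate_scale_rotation(initial_pts, current_pts):
    """Median pairwise scale change and rotation (degrees) between matched key points."""
    scales, angles = [], []
    for i, j in combinations(range(len(initial_pts)), 2):
        (ax, ay), (bx, by) = initial_pts[i], initial_pts[j]
        (cx, cy), (dx, dy) = current_pts[i], current_pts[j]
        d0 = math.hypot(bx - ax, by - ay)
        if d0 == 0:
            continue  # coincident key points carry no scale or angle information
        scales.append(math.hypot(dx - cx, dy - cy) / d0)
        diff = math.atan2(dy - cy, dx - cx) - math.atan2(by - ay, bx - ax)
        # Wrap the angle difference into (-pi, pi].
        angles.append(math.atan2(math.sin(diff), math.cos(diff)))
    return median(scales), math.degrees(median(angles))

# Hypothetical key points, then the same points scaled by 2 and rotated by 30 degrees.
initial = [(0, 0), (4, 0), (4, 3), (0, 3)]
a = math.radians(30)
current = [(2 * (x * math.cos(a) - y * math.sin(a)) + 5,
            2 * (x * math.sin(a) + y * math.cos(a)) - 1) for x, y in initial]
scale, rotation = estimate_scale_rotation(initial, current)
```

The estimate assumes the key points move rigidly together, which is consistent with the limitation noted above for deformable objects.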
7. 7
Literature Review
● RAJSSC (Rotation Adaptive Joint Scale-Spatial Correlation) [4]
– uses a log-polar transformation such that:
    ● 36 rotation angles are selected for the angle estimation
– is 36 times less efficient than the Kernelized Correlation Filters (KCF) tracker [5]
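The appeal of the log-polar transformation is that rotation and scaling about the centre turn into simple shifts of the two log-polar coordinates, which a correlation filter can then search over. The sketch below checks this property for a single point; the helper names are illustrative and it does not reproduce RAJSSC itself.

```python
import math

def to_log_polar(x, y, cx=0.0, cy=0.0):
    """Map (x, y) to (log radius, angle in radians) about the centre (cx, cy)."""
    dx, dy = x - cx, y - cy
    return math.log(math.hypot(dx, dy)), math.atan2(dy, dx)

def rotate_and_scale(x, y, angle_deg, scale):
    """Rotate a point about the origin by angle_deg, then scale it."""
    a = math.radians(angle_deg)
    return (scale * (x * math.cos(a) - y * math.sin(a)),
            scale * (x * math.sin(a) + y * math.cos(a)))

rho0, theta0 = to_log_polar(3.0, 4.0)
x1, y1 = rotate_and_scale(3.0, 4.0, angle_deg=20, scale=2.0)
rho1, theta1 = to_log_polar(x1, y1)

angle_shift_deg = math.degrees(theta1 - theta0)  # rotation appears as a shift in angle
log_radius_shift = rho1 - rho0                   # scaling appears as a shift in log radius
```

Evaluating a fixed set of candidate angles, such as the 36 mentioned above, multiplies the per-frame work by roughly the number of candidates.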
8. 8
Literature Review
● Rotation-adaptive tracking method [6]
– assumes that the rotation angle is bounded to a range
● SA-Siam [7]
– it is a SiamBM [8] model designed on top of the rotation-adaptive tracking method [6]
– it proved efficient in terms of speed
– it limits the rotation of the bounding boxes to a restricted range of angles
9. 9
Proposed Architecture
● The original SiamMask network is inefficient
– it compares three well-known methods:
    ● min-max axis-aligned rectangle
    ● minimum bounding rectangle
    ● optimal bounding box
– fitting minimum-area rectangles to the target objects is quite accurate
– it fails to generate rotated bounding boxes that follow the object orientation
● SiamMask_E, the proposed algorithm:
– uses an ellipse-fitting technique to fit rotated bounding boxes
– estimates the box angle well according to the orientation of the object
– is efficient and works in real time
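The idea of reading an orientation from an ellipse fitted to the mask can be sketched with image moments: the ellipse that has the same second-order moments as the mask pixels has a major axis whose angle follows the object. The slides do not say which fitting routine SiamMask_E uses, so the moment-based estimate below is a stand-in chosen for simplicity, and the function name is hypothetical.

```python
import math

def mask_orientation(pixels):
    """Major-axis angle (degrees) of the ellipse with the same second moments as the pixels."""
    n = len(pixels)
    mx = sum(x for x, _ in pixels) / n
    my = sum(y for _, y in pixels) / n
    mu20 = sum((x - mx) ** 2 for x, _ in pixels) / n
    mu02 = sum((y - my) ** 2 for _, y in pixels) / n
    mu11 = sum((x - mx) * (y - my) for x, y in pixels) / n
    return math.degrees(0.5 * math.atan2(2 * mu11, mu20 - mu02))

# Toy masks given as lists of (x, y) pixel coordinates.
horizontal_bar = [(x, y) for x in range(10) for y in range(2)]
diagonal_line = [(i, i) for i in range(10)]

angle_bar = mask_orientation(horizontal_bar)
angle_diag = mask_orientation(diagonal_line)
```

The angle is measured from the x axis towards the y axis, so in image coordinates (y pointing down) a positive angle corresponds to a clockwise rotation on screen. A box aligned with this axis rotates with the object, unlike a minimum-area rectangle, whose angle depends only on the outline.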
15. 15
Performance Measures
1) Accuracy (A):
– the average overlap between the predicted bounding boxes and the ground truth
– the higher the accuracy, the better.
2) Robustness (R):
– the number of times the tracker loses the target object and has to resume tracking it
– the lower the robustness value, the better.
3) Expected Average Overlap (EAO):
– a compound measure that is a function of A and R.
– the higher the EAO, the better.
4) Speed:
– the average number of video frames the model processes per second (fps)
– the faster the model, the better.
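The accuracy, robustness, and speed measures can be sketched as follows. This is a simplified illustration, not the official VOT toolkit: boxes are axis-aligned (x_min, y_min, x_max, y_max) rather than rotated, a failure is counted whenever the overlap drops to zero, and EAO is omitted because its computation is more involved. All function names are assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = ix * iy
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def accuracy(predictions, ground_truth):
    """Mean overlap across frames."""
    overlaps = [iou(p, g) for p, g in zip(predictions, ground_truth)]
    return sum(overlaps) / len(overlaps)

def count_failures(predictions, ground_truth):
    """Number of frames with zero overlap (simplified notion of a tracking failure)."""
    return sum(1 for p, g in zip(predictions, ground_truth) if iou(p, g) == 0.0)

def frames_per_second(num_frames, elapsed_seconds):
    """Average processing speed."""
    return num_frames / elapsed_seconds

# Hypothetical three-frame sequence.
preds = [(0, 0, 2, 2), (0, 0, 2, 2), (0, 0, 2, 2)]
truth = [(0, 0, 2, 2), (1, 1, 3, 3), (5, 5, 7, 7)]
acc = accuracy(preds, truth)
failures = count_failures(preds, truth)
fps = frames_per_second(174, 2.0)
```

In this toy sequence the tracker matches the first frame exactly, partially overlaps the second, and loses the target in the third.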
22. 22
References
[1] L. Bertinetto, J. Valmadre, J. F. Henriques, A. Vedaldi, and P. H. Torr. Fully-convolutional siamese networks for
object tracking. In European conference on computer vision, pages 850–865. Springer, 2016.
[2] Q. Wang, L. Zhang, L. Bertinetto, W. Hu, and P. H. Torr. Fast online object tracking and segmentation: A unifying
approach. arXiv preprint arXiv:1812.05050, 2018.
[3] G. Nebehay and R. Pflugfelder. Consensus-based matching and tracking of key points for object tracking. In IEEE
Winter Conference on Applications of Computer Vision, pages 862–869. IEEE, 2014.
[4] M. Zhang, J. Xing, J. Gao, X. Shi, Q. Wang, and W. Hu. Joint scale-spatial correlation tracking with adaptive
rotation estimation. In Proceedings of the IEEE International Conference on Computer Vision Workshops, pages
32–40, 2015.
[5] J. F. Henriques, R. Caseiro, P. Martins, and J. Batista. High-speed tracking with kernelized correlation filters. IEEE
transactions on pattern analysis and machine intelligence, 37(3):583–596, 2014.
[6] L. Rout, D. Mishra, R. K. S. S. Gorthi, et al. Rotation adaptive visual object tracking with motion consistency. In
2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pages 1047–1055. IEEE, 2018.
[7] A. He, C. Luo, X. Tian, and W. Zeng. A two fold siamese network for real-time object tracking. In Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition, pages 4834–4843, 2018.
[8] A. He, C. Luo, X. Tian, and W. Zeng. Towards a better match in siamese network based visual object tracker. In
Proceedings of the European Conference on Computer Vision (ECCV), pages 0–0, 2018.