(Paper Review) A Versatile Learning-based 3D Temporal Tracker: Scalable, Robust, Online

A Versatile Learning-based 3D Temporal Tracker: Scalable, Robust, Online
A versatile learning-based 3D tracker with three strengths: scalability, robustness, and online learning
Myeonggyu Lee (이명규)

AGENDA
01 Introduction
02 Contributions
03 Main Idea
04 Evaluation
05 Conclusions & Limitations
06 Additional Slides

Introduction
Part 01

↳ Paper Info
• Published in the 2015 IEEE International Conference on Computer Vision (ICCV)
• Publisher: IEEE (Electronic ISSN: 2380-7504)
• Keywords (INSPEC): target tracking, computational complexity, learning (artificial intelligence), object tracking, pose estimation
• Authors: David Joseph Tan, Federico Tombari, Slobodan Ilic, Nassir Navab

Contributions
Part 02

↳ Related Works
• Using solely depth images
  • Energy minimization (ICP) [3, 5]
  • Learning-based algorithm [23]
• Using RGB-D data
  • Hand-held object tracking [10]
  • Particle filter approaches [6, 13]
• Using level-set optimization [17]

↳ Contributions
• Novel occlusion handling strategy
  • Increases the overall robustness
• Uses only one depth image to create the entire learning dataset ('3D online learning')
• Low tracking time, memory consumption, and computational cost
• Scalable to tracking a hundred objects in real time
• Less learning time than previous studies

Main Idea
Part 03

↳ Random Forest (1/2)
• An ensemble technique that improves learning performance.
• Derives its result by voting among multiple decision trees (or averaging their outputs, for regression).
• Builds the forest by training each tree on a bootstrap sample of the data.
• Picks the variables tested at each split at random, without favoring any of them.

↳ Random Forest (2/2)
• Advantages
  • Voting across multiple trees helps prevent overfitting.
  • Applies to both classification and regression problems.
• Disadvantages
  • Training is slow, and prediction can be too slow for real-time use.
  • Cannot make predictions beyond the range of the training data.
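The bootstrapping, averaging, and "no extrapolation" points above can be illustrated with a toy regression forest. This is a minimal sketch rather than the forest used in the paper: each tree is a one-split stump on a single input, so random variable selection is left out, and the function names (`fit_stump`, `fit_forest`, `predict_forest`) and the toy data are assumptions made for illustration.

```python
import random
import statistics


def fit_stump(xs, ys):
    """Fit a one-split regression tree on 1-D inputs by minimizing squared error."""
    best = None
    for threshold in sorted(set(xs))[1:]:  # skipping the minimum keeps both sides non-empty
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        left_mean, right_mean = statistics.fmean(left), statistics.fmean(right)
        sse = sum((y - left_mean) ** 2 for y in left)
        sse += sum((y - right_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # every input in the sample was identical
        mean = statistics.fmean(ys)
        return (float("inf"), mean, mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x < threshold else right_mean


def fit_forest(xs, ys, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest


def predict_forest(forest, x):
    """Regression forests average the trees; classification forests would vote."""
    return statistics.fmean([predict_stump(stump, x) for stump in forest])


# Toy data (assumed): y = 2x sampled on x = 0..9.
xs = list(range(10))
ys = [2.0 * x for x in xs]
forest = fit_forest(xs, ys)
far_prediction = predict_forest(forest, 100)  # query far outside the training range
```

Because every leaf stores a mean of training targets, `far_prediction` can never exceed `max(ys)`, even though the underlying trend keeps rising. This is the extrapolation limit listed under the disadvantages.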

↳ Object temporal tracking (overall idea)
• The objective is to solve the registration problem between the 3D points on the object and the 3D points from the depth image of the current frame.
• The parameters of $T_t$ are predicted from the individual displacement values $\epsilon_j^v$.
• The points $X_j$ are transformed with the accumulated transformation $\hat{T}_t = \prod_{i=0}^{t} T_i$.
[Figure: tracking pipeline. Object transformation from $\hat{T}_{t-1}$ and current frame $D_t$ at time $t$ → individual displacements $\epsilon_j^v(\hat{T}_{t-1}; D_t)$ → learned forest with $6 n_v$ trees → predicted transform parameters of $T_t$]
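The accumulated transformation $\hat{T}_t = \prod_{i=0}^{t} T_i$ can be sketched with plain 4x4 homogeneous matrices. The slide does not state the multiplication order, so right-multiplying each new increment is an assumption here, as are the helper names (`identity`, `translation`, `matmul4`, `accumulate`, `apply_transform`).

```python
def identity():
    return [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]


def translation(tx, ty, tz):
    """Homogeneous 4x4 matrix for a pure translation."""
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]


def matmul4(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def accumulate(increments):
    """Compose per-frame increments T_0, ..., T_t into one transform.

    Assumption: each new increment is multiplied on the right.
    """
    total = identity()
    for t_i in increments:
        total = matmul4(total, t_i)
    return total


def apply_transform(transform, point):
    """Apply a 4x4 homogeneous transform to a 3D point."""
    x, y, z = point
    return tuple(transform[i][0] * x + transform[i][1] * y
                 + transform[i][2] * z + transform[i][3] for i in range(3))


# Three per-frame increments, here pure translations for readability.
steps = [translation(1, 0, 0), translation(0, 2, 0), translation(0, 0, 3)]
t_hat = accumulate(steps)
moved = apply_transform(t_hat, (0, 0, 0))
```

With pure translations the order does not matter, but once rotations are involved the product is not commutative, so the chosen convention has to match the one used when the increments were predicted.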

↳ Learning from one viewpoint – Dataset
1. Among the pixels from $D_v$ that are on the object, $n_j$ pixels $\{x_j\}_{j=1}^{n_j}$ are selected, back-projected, and transformed to the object coordinate system.
  • The transformed set of points $X_v = \{X_j\}_{j=1}^{n_j}$ is used to compute the displacements.

↳ Learning from one viewpoint – Dataset
2. The parameters $\tau_r$ are drawn at random to compose $T_r$ and formulate $\hat{T}_r$.
  • Transform $X_j$ with $\hat{T}_r$ and compute the displacement vector $\epsilon_r^v = [\epsilon_j^v(\hat{T}_r; D_v)]_{j=1}^{n_j}$.
3. Construct the learning dataset $\mathcal{S} = \{(\epsilon_r^v, \tau_r)\}_{r=1}^{n_r}$ by accumulating $\epsilon_r^v$ and $\tau_r$ over $n_r$ random transformations.
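Steps 2 and 3 can be sketched as a loop that samples random parameters, perturbs the points, and stores each displacement vector with its label. Several things here are assumptions, not the paper's definitions: $\tau$ is taken to be three translations plus three Euler angles, the sampling ranges are arbitrary, and the displacement is a stand-in (the shift of each point along the z axis) because the slides do not spell out how $\epsilon_j^v$ is measured against the depth image.

```python
import math
import random


def rotation_matrix(rx, ry, rz):
    """Rotation R = Rz * Ry * Rx from Euler angles (an assumed parametrization)."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def transform(tau, point):
    """Apply the rigid transform described by tau = (tx, ty, tz, rx, ry, rz)."""
    rot = rotation_matrix(*tau[3:])
    return tuple(sum(rot[i][k] * point[k] for k in range(3)) + tau[i]
                 for i in range(3))


def displacement_vector(tau, points):
    """Stand-in displacement: how far each point moves along the z axis."""
    return [transform(tau, p)[2] - p[2] for p in points]


def build_dataset(points, n_r, max_shift=0.05, max_angle=math.radians(10), seed=0):
    """Collect n_r pairs (epsilon_r, tau_r) from randomly drawn parameters."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(n_r):
        tau = tuple(rng.uniform(-max_shift, max_shift) for _ in range(3))
        tau += tuple(rng.uniform(-max_angle, max_angle) for _ in range(3))
        dataset.append((displacement_vector(tau, points), tau))
    return dataset


# Hypothetical object points X_j and a small dataset S.
points = [(0.1, 0.0, 0.3), (0.0, 0.2, 0.3), (-0.1, -0.1, 0.4)]
dataset = build_dataset(points, n_r=100)
```

Each entry of `dataset` pairs one displacement vector of length $n_j$ with the six parameters that produced it, which is the shape of sample the trees are then trained on.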

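↳ Learning from one viewpoint – Dividing Tree
4. Each node splits its samples by thresholding $\epsilon$, optimizing one parameter in $\tau$ so that its values become more coherent within each child.
  • $\epsilon < \text{threshold} \rightarrow \mathcal{S}_l$, $\epsilon > \text{threshold} \rightarrow \mathcal{S}_r$
5. Candidate splits are tested with the information gain function; the highest information gain gives the best split.
  • $G = \epsilon(\mathcal{S}_N) - \sum_{i \in \{l, r\}} \frac{|\mathcal{S}_i|}{|\mathcal{S}_N|} \, \epsilon(\mathcal{S}_i)$, where $\epsilon(\mathcal{S})$ here measures how spread out the parameter values in $\mathcal{S}$ are.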
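The split test in steps 4 and 5 can be sketched as an exhaustive search over displacement entries and thresholds. The spread measure is assumed to be the standard deviation of the chosen parameter, which matches the stopping rule on the next slide but is not confirmed by this one; `best_split` and the toy samples are hypothetical.

```python
import statistics


def information_gain(parent, left, right):
    """G = spread(S_N) - sum over children of |S_i| / |S_N| * spread(S_i)."""
    weighted = sum(len(child) / len(parent) * statistics.pstdev(child)
                   for child in (left, right))
    return statistics.pstdev(parent) - weighted


def best_split(samples, param_index):
    """Return (gain, displacement index, threshold) of the best split.

    samples: list of (epsilon_vector, tau) pairs.
    param_index: which entry of tau this tree is trained to predict.
    """
    parent = [tau[param_index] for _, tau in samples]
    best = None
    for j in range(len(samples[0][0])):
        for threshold in sorted({eps[j] for eps, _ in samples}):
            left = [tau[param_index] for eps, tau in samples if eps[j] < threshold]
            right = [tau[param_index] for eps, tau in samples if eps[j] >= threshold]
            if not left or not right:
                continue  # a split must send samples to both children
            gain = information_gain(parent, left, right)
            if best is None or gain > best[0]:
                best = (gain, j, threshold)
    return best


# Toy samples: the first displacement entry separates the labels cleanly.
samples = [
    ([0.0, 5.0], (-1.0,)),
    ([0.1, 1.0], (-1.0,)),
    ([0.9, 4.0], (1.0,)),
    ([1.0, 2.0], (1.0,)),
]
gain, index, threshold = best_split(samples, param_index=0)
```

On this toy data the search picks the first displacement entry with threshold 0.9, because both children then contain a single parameter value and the weighted spread drops to zero.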

↳ Learning from one viewpoint – Dividing Tree
6. The tree stops growing at a node when either of the following holds:
  • the inherited learning dataset is too small, or
  • the standard deviation of the parameter is less than a threshold.
  The node then becomes a leaf and stores the mean and standard deviation of the parameter.
7. The procedure is repeated for each of the $n_v$ views of the object.
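The stopping rule in step 6 and the tree count implied by step 7 can be sketched as follows. The two thresholds are hypothetical values chosen for illustration, since the slides do not give them; the tree count uses the "$6 n_v$ trees" figure from the overview slide.

```python
import statistics

MIN_SAMPLES = 5   # hypothetical: below this, the inherited dataset is "too small"
MIN_STD = 0.01    # hypothetical: below this, the parameter is "coherent enough"


def leaf_or_none(param_values, min_samples=MIN_SAMPLES, min_std=MIN_STD):
    """Return a leaf (mean and standard deviation) if a stop condition holds."""
    std = statistics.pstdev(param_values)
    if len(param_values) < min_samples or std < min_std:
        return {"mean": statistics.fmean(param_values), "std": std}
    return None  # keep splitting this node


def forest_size(n_views, n_params=6):
    """One tree per transform parameter and per view gives 6 * n_v trees."""
    return n_params * n_views


small_node = leaf_or_none([0.0, 10.0])                      # too few samples
spread_node = leaf_or_none([float(i) for i in range(6)])    # enough samples, still spread out
```

Here `small_node` becomes a leaf because it inherits only two samples, while `spread_node` is `None` and would be split further. With the 642 views reported in the evaluation, `forest_size(642)` gives 3852 trees.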

Evaluation
Part 04

↳ Robustness
• Three benchmark datasets [6, 11, 23] are used to evaluate the robustness of the algorithm:
  1. Dataset [11], to compare CT [23] with the method from Section 3 of this paper.
  2. Dataset [6], to compare the results of this paper with the RGB-D filter approaches [6, 13, 20].
  3. The benchmark dataset of CT [23], to compare robustness when using depth images only.

↳ Robustness – Optimum Parameters
• The optimal learning parameters are 642 camera views with 2500 pairs of samples and labels.
• The error is lowest with 10 iteration rounds.

↳ Robustness – Synthetic Dataset
• Evaluation on the publicly available synthetic dataset [6].
• Each object comes with 1000 RGB-D images and ground-truth results, along with the model.
• 0.01 mm better translation and 1.01 better rotation, but note that this study uses only depth images.
• Only the object's model is used, without any prior knowledge of the environment.

↳ Robustness – Real Dataset
• Robustness is evaluated using only real depth images.
• Each sequence consists of 400 RGB-D images and a ground-truth pose tracked using a marker board.
• ICP is trapped in a local minimum in (c), while CT [23] and the method of this paper track the cat well.
• CT [23] fails to track under severe occlusion, as in (e), while the method of this paper keeps tracking.

↳ Computational Cost & Learning Time (1/2)
• Sec. 3.1 runs at 1.5 ms per frame on a single core of an Intel Core i7 CPU.
• Memory increases linearly with the number of camera views and the size of the training dataset.
• CT [23] requires 821.3 MB per forest, whereas this paper needs only 7.4 MB.
• Tracking 108 objects takes 33.7 ms per frame (about 30 fps) on an 8-core CPU, with 799 MB of memory usage.
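The figures on this slide can be cross-checked with a few lines of arithmetic. All inputs are the numbers quoted above; the variable names are only for illustration.

```python
frame_time_ms = 33.7      # time per frame when tracking 108 objects
n_objects = 108
total_memory_mb = 799.0   # memory used while tracking 108 objects
ct_forest_mb = 821.3      # CT [23], per forest
this_forest_mb = 7.4      # this paper, per forest

frames_per_second = 1000.0 / frame_time_ms
memory_per_object_mb = total_memory_mb / n_objects
memory_ratio = ct_forest_mb / this_forest_mb

print(f"{frames_per_second:.1f} fps")
print(f"{memory_per_object_mb:.1f} MB per object")
print(f"CT needs about {memory_ratio:.0f}x more memory per forest")
```

The frame rate comes out just under 30 fps, and 799 MB spread over 108 objects is about 7.4 MB each, which agrees with the per-forest figure. The per-forest memory of CT [23] is roughly 111 times larger.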

↳ Computational Cost & Learning Time (2/2)
• Learning time is linearly related to the number of camera views and the size of the training dataset.
• Sec. 3.1 with the optimum parameters takes 31.8 seconds to learn on an 8-core CPU.
  • Learning uses 2500 pairs of samples and labels with 642 camera views.
• CT [23] takes 12.3 hours.
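To put the two learning times on the same scale, the ratio can be computed directly from the numbers on this slide. This is plain arithmetic on the quoted figures and says nothing about whether the two measurements used the same hardware.

```python
ct_learning_hours = 12.3      # CT [23]
this_learning_seconds = 31.8  # Sec. 3.1 with the optimum parameters

ct_learning_seconds = ct_learning_hours * 3600.0
speedup = ct_learning_seconds / this_learning_seconds

print(f"CT: {ct_learning_seconds:.0f} s, this paper: {this_learning_seconds} s")
print(f"learning is roughly {speedup:.0f}x faster")
```

The ratio is on the order of 1400, so learning drops from half a day to about half a minute.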

↳ Learning Time – Online Learning
• Online learning is evaluated on the dataset in [6].
• Learning starts on the initial frame, using the ground-truth transformation.
  • It takes 1.3 seconds to learn 50 trees per parameter.
• Each subsequent frame likewise learns one more tree per parameter, taking 25.6 ms per frame.
• With an 8-core CPU, learning and tracking together take 26.8 ms per frame.
• Tracking is possible even when the object's model is not at hand (e.g. head pose estimation in Fig. 1(c)).

Conclusions & Limitations
Part 05

↳ Conclusions
• A real-time, scalable, and robust 3D tracking algorithm.
• Can be employed in both model-based and online 3D tracking.
• Flexible and versatile enough to adapt to a variety of 3D tracking applications.

↳ Limitations
• Highly symmetric objects lose some degrees of freedom.
  • Rotation around the axis of symmetry is ambiguous when viewed in a depth image.
  • The tracker then fails to estimate the full 3D pose with six degrees of freedom.
• The trees are not learned well enough if there is a problem with the initial frame.
  • Large holes or occlusions in the initial frames create problems.
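The symmetry limitation can be illustrated with a ring of points standing in for one cross-section of a rotationally symmetric object. This is a toy sketch, not the paper's analysis: the 12-point ring, the tolerance, and the helper names are all assumptions.

```python
import math


def ring(n, radius=1.0, z=0.0):
    """n points evenly spaced on a circle around the z axis."""
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n), z) for k in range(n)]


def rotate_z(points, angle):
    """Rotate points about the z axis, taken here as the axis of symmetry."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]


def same_point_set(a, b, tol=1e-9):
    """True if every point of a has a matching point in b (order ignored)."""
    return len(a) == len(b) and all(
        any(math.dist(p, q) < tol for q in b) for p in a)


step = 2 * math.pi / 12
symmetric = ring(12)
with_bump = symmetric + [(2.0, 0.0, 0.0)]  # one extra point breaks the symmetry

ambiguous = same_point_set(symmetric, rotate_z(symmetric, step))
observable = not same_point_set(with_bump, rotate_z(with_bump, step))
```

Rotating the symmetric ring by one step gives the same set of points, so geometry alone cannot tell the two poses apart. Adding a single off-ring point makes the rotation visible again, which is why only highly symmetric objects lose that degree of freedom.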

Thank you for Listening.
Email : brstar96@naver.com (or brstar96@soongsil.ac.kr)
Mobile : +82-10-8234-3179
