Real-time large-scale dense RGB-D SLAM with volumetric fusion
March 20, 2017
Dong-Won Shin
T. Whelan, M. Kaess, H. Johannsson, M. Fallon, J. J. Leonard, and J. McDonald, “Real-time large-scale dense RGB-D SLAM with volumetric fusion,” Int. J. Rob. Res., vol. 34, no. 4–5, pp. 598–626, Apr. 2015.
• Extended scale volumetric fusion
• Volume representation
• Volume shifting
• Camera pose estimation
• Geometric camera pose estimation
• Photometric camera pose estimation
• Combined camera pose estimation
• Loop closure
• Pose graph
• Place recognition
• Space deformation
• Optimisation
Contents
2
• KinectFusion
• Reconstructions of an unprecedented quality at real-time speeds
• Drawbacks
Problem Statement
4
• Restriction to a fixed, small area in space
• No means of explicitly incorporating loop closures
• Reliance on geometric information alone for camera pose estimation
Contributions
5
• Representing the volumetric reconstruction data structure in memory with a rolling cyclical buffer
• Optimizing the dense map by means of a non-rigid space deformation parameterized by a loop closure constraint
• Estimating a dense photometric camera constraint in conjunction with a dense geometric constraint and jointly optimizing for a camera pose estimate
• Kintinuous
• Spatially extended version of KinectFusion
• Advantages
• Flowchart
System Architecture
6
• Volume representation
• Truncated Signed Distance Function (TSDF)
• Raycasting
• Finding a zero-crossing along the ray
Extended Scale Volumetric Fusion
7
sdf_i = ||t_i − v_g|| − D_i(p)
If sdf_i > 0 then
  tsdf_i = min(1, sdf_i / max truncation)
Else
  tsdf_i = max(−1, sdf_i / min truncation)
(Figure: camera and depth image viewing a surface through the TSDF volume, with positive values on one side of the zero crossing and negative values on the other.)
Kintinuous_Code_Review/20170309212313
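To make the TSDF update above concrete, here is a minimal Python sketch (illustrative only; the Kintinuous implementation runs on the GPU and also stores per-voxel colour). The truncation distances and the weighted running-average fusion step follow the standard KinectFusion scheme and are assumptions, not values taken from the slide.

```python
import numpy as np

def truncated_sdf(cam_pos, voxel_pos, measured_depth,
                  max_truncation=0.03, min_truncation=0.03):
    """Truncated signed distance for one voxel (slide 7 formula).

    cam_pos, voxel_pos: 3D positions t_i and v_g; measured_depth: D_i(p),
    the depth image value at the voxel's projection. Truncation distances
    are illustrative (metres).
    """
    sdf = np.linalg.norm(cam_pos - voxel_pos) - measured_depth
    if sdf > 0:
        return min(1.0, sdf / max_truncation)
    return max(-1.0, sdf / min_truncation)

def fuse(tsdf_old, weight_old, tsdf_new, weight_new=1.0):
    """Standard weighted running average used in volumetric fusion."""
    weight = weight_old + weight_new
    return (tsdf_old * weight_old + tsdf_new * weight_new) / weight, weight
```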
• A structure like a cyclical buffer that virtually translates as the camera moves through the environment
• It is parameterised by an integer movement threshold m_s, defining a cubic movement boundary around g_i; crossing this boundary triggers a volume shift.
Volume Shifting
8
• Animation
• If we want to reconstruct an indoor 3D scene, the TSDF volume moves through it like this.
Volume Shifting
9
• Animation
• Volume-oriented representation
Volume Shifting
10
• Animation
• Volume-oriented representation
Volume Shifting
11
Convert the slice leaving the volume to a point cloud and save it to memory; the freed region becomes empty.
• Animation
• Volume-oriented representation
Volume Shifting
12
Convert the slice leaving the volume to a point cloud and save it to memory, then fill the new volume region.
Apply the same procedure along the y and z axes as well.
Kintinuous_Code_Review/20170308014243
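A minimal sketch of the rolling cyclical buffer described above, under the assumption that voxels are addressed modulo the volume size: shifting the volume only exports the slice that leaves it as a point-cloud slice and re-initialises that slice, rather than copying the whole volume. Sizes, thresholds, and the x-axis-only logic are illustrative.

```python
import numpy as np

class RollingTsdfVolume:
    """Cyclical-buffer TSDF volume that virtually translates with the camera."""

    def __init__(self, size=128, m_s=32):
        self.size = size                      # voxels per side
        self.m_s = m_s                        # integer movement threshold
        self.origin = np.zeros(3, dtype=int)  # g_i, volume origin in voxels
        self.tsdf = np.ones((size, size, size), dtype=np.float32)

    def buffer_index(self, voxel_xyz):
        """Map a global voxel coordinate into the cyclical buffer."""
        return tuple(v % self.size for v in voxel_xyz)

    def maybe_shift_x(self, cam_voxel_x):
        """Shift along x when the camera crosses the boundary around g_i.

        Returns the TSDF slices that left the volume; in Kintinuous these
        are converted to point-cloud slices and saved to memory.
        """
        exported = []
        while cam_voxel_x - self.origin[0] > self.m_s:
            idx = self.origin[0] % self.size
            exported.append(self.tsdf[idx].copy())  # slice leaving the volume
            self.tsdf[idx] = 1.0                    # becomes the new empty slice
            self.origin[0] += 1
        return exported
```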
• Camera pose
• Motion parameters ξ: Translation + Rotation
• 6 DOF
• A number of volumetric fusion systems use only depth information for CPE
• Real-time 3d reconstruction in dynamic scenes using point-based fusion
• KinectFusion: Real-time Dense Surface Mapping and Tracking
• Real-time camera tracking and 3d reconstruction using signed distance functions
• Scalable real-time volumetric surface reconstruction
• Problems of a reliance on geometric information alone for CPE
• Inability to function in corridor-like environments
• Scenes with few 3D features
• More robust pose estimate in more challenging scenes
• Dense geometric camera pose constraints
• Dense photometric constraints
Camera Pose Estimation (CPE)
16
• Point-to-plane error between vertices in the current depth frame and the predicted
raycast surface
• Correspondence finding: Projective data association
Geometric CPE
17 Kintinuous_Code_Review/20170309222011
• Linearizing the transformation around the identity
• 6×6 system of normal equations
• Cholesky decomposition to yield ξ
• Three-level coarse-to-fine depth map pyramid scheme
Geometric CPE
18
Point-to-Plane Algorithm
• Minimize the perpendicular distance from each source point to the tangent plane at its corresponding destination point
• Nonlinear least-squares problem, solved with the Levenberg-Marquardt method
s_i = (s_ix, s_iy, s_iz, 1)^T : source point
d_i = (d_ix, d_iy, d_iz, 1)^T : destination point
n_i = (n_ix, n_iy, n_iz, 0)^T : unit normal vector at d_i
K. L. Low, “Linear least-squares optimization for point-to-plane ICP surface registration,” Chapel Hill, 2004.
Point-to-Plane Algorithm
• Transformation matrix M
• Least-squares problem over 6 DOF (α, β, γ, t_x, t_y, t_z)
• However, α, β, and γ enter through nonlinear trigonometric functions, so a linear approximation is needed
20
M_opt = arg min_M Σ_i ((M ∙ s_i − d_i) ∙ n_i)², where M combines a rotation R(α, β, γ) and a translation (t_x, t_y, t_z)
Point-to-Plane Algorithm
• Approximated Transformation Matrix 𝑴
• Linearized expression for the i-th correspondence
21
(n_iz s_iy − n_iy s_iz) α + (n_ix s_iz − n_iz s_ix) β + (n_iy s_ix − n_ix s_iy) γ + n_ix t_x + n_iy t_y + n_iz t_z
= n_ix d_ix + n_iy d_iy + n_iz d_iz − n_ix s_ix − n_iy s_iy − n_iz s_iz
Point-to-Plane Algorithm
• Expanding to N correspondences and stacking the linearized equations gives the general least-squares form A x = b
• Optimum solution x_opt
• Iteratively perform the Levenberg-Marquardt optimization until it converges
Kintinuous_Code_Review/20170308014917
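To illustrate the linearized system above, the following sketch builds the per-correspondence rows from Low's coefficients and solves one least-squares step. It is a plain numpy illustration, not the Kintinuous GPU implementation; function and variable names are assumptions.

```python
import numpy as np

def point_to_plane_step(src, dst, normals):
    """One linearized point-to-plane ICP step (after Low, 2004).

    src, dst: (N, 3) corresponding source/destination points
    normals:  (N, 3) unit normals at the destination points
    Returns the small-angle motion x = (alpha, beta, gamma, tx, ty, tz).
    """
    A = np.empty((len(src), 6))
    # Rotational coefficients: cross(src, normal) per correspondence
    A[:, 0] = normals[:, 2] * src[:, 1] - normals[:, 1] * src[:, 2]
    A[:, 1] = normals[:, 0] * src[:, 2] - normals[:, 2] * src[:, 0]
    A[:, 2] = normals[:, 1] * src[:, 0] - normals[:, 0] * src[:, 1]
    A[:, 3:] = normals                              # translational coefficients
    b = np.einsum('ij,ij->i', normals, dst - src)   # signed point-to-plane errors
    # Solve the least-squares problem (a 6x6 normal-equation / Cholesky solve
    # would be equivalent, as noted on slide 18)
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x
```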
• Given two consecutive RGB-D frames I_{n−1} and I_n,
• Compute a rigid camera transformation between the two frames that maximises photo-consistency
• ℒ = the list of valid interest points
• T = the current estimate of the transformation from I_n to I_{n−1}
Photometric CPE
23
I = 0.299 · rgb_R + 0.587 · rgb_G + 0.114 · rgb_B
(Figure: I_{n−1} and I_n related by projection to 3D, transformation by T, and reprojection into the other frame.)
Kintinuous_Code_Review/20170308015138
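The following sketch illustrates the photo-consistency idea on this slide: convert RGB to intensity with the luminance weights above, back-project pixels of I_n using their depth, transform them by the current estimate T, reproject into I_{n−1}, and compare intensities. It is an assumption-laden illustration (dense pixels rather than the interest-point list ℒ, and nearest-neighbour lookup instead of interpolation); the names and helper functions are not from the paper.

```python
import numpy as np

def intensity(rgb):
    """Luminance conversion from the slide: 0.299 R + 0.587 G + 0.114 B."""
    return rgb[..., 0] * 0.299 + rgb[..., 1] * 0.587 + rgb[..., 2] * 0.114

def photometric_residuals(I_prev, I_cur, depth_cur, K, T):
    """Squared intensity differences after warping frame n into frame n-1.

    I_prev, I_cur: grayscale images; depth_cur: current-frame depth (metres)
    K: 3x3 intrinsics; T: 4x4 estimate of the transform from frame n to n-1.
    """
    h, w = depth_cur.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_cur
    valid = z > 0
    # Back-project current-frame pixels to 3D camera coordinates
    x = (u - K[0, 2]) / K[0, 0] * z
    y = (v - K[1, 2]) / K[1, 1] * z
    pts = np.stack([x, y, z, np.ones_like(z)], axis=-1)[valid]
    # Transform into the previous frame and project
    p = pts @ T.T
    up = K[0, 0] * p[:, 0] / p[:, 2] + K[0, 2]
    vp = K[1, 1] * p[:, 1] / p[:, 2] + K[1, 2]
    inside = (up >= 0) & (up < w - 1) & (vp >= 0) & (vp < h - 1) & (p[:, 2] > 0)
    # Nearest-neighbour lookup keeps the sketch short (bilinear in practice)
    diff = I_prev[vp[inside].astype(int), up[inside].astype(int)] - I_cur[valid][inside]
    return diff ** 2
```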
• The combined cost is the sum of the RGB-D (photometric) and ICP (geometric) costs
Combined CPE
24 Kintinuous_Code_Review/20170308015317
where
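A minimal sketch of the joint optimisation, assuming both costs have been linearized into Jacobian rows and residuals: the rows are simply stacked, with a hypothetical relative weight w_rgbd, and solved together for one 6-DOF increment. The paper's exact weighting is not reproduced here.

```python
import numpy as np

def combined_step(A_icp, b_icp, A_rgbd, b_rgbd, w_rgbd=1.0):
    """Jointly solve the stacked geometric + photometric linear systems.

    A_*, b_*: per-constraint Jacobian rows and residuals from the two costs.
    w_rgbd is an illustrative relative weight, not a value from the paper.
    Returns the 6-DOF increment minimising the combined least-squares cost.
    """
    A = np.vstack([A_icp, w_rgbd * A_rgbd])
    b = np.concatenate([b_icp, w_rgbd * b_rgbd])
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x
```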
• Problem
• Like all egomotion estimation systems, drift will accumulate over space and time
• Simple approach
• Associate each vertex in the mesh with the nearest camera pose
• Optimize the pose graph
• Reflect the camera pose transformations in the mesh vertices
• Another problem
• Sharp discontinuities at points on the surface where the association between camera
poses changes
• Ignores other important properties of the surface
• Solution
• Loop closure constraint
• Non-rigid method of correcting the map (deformation)
Loop Closure & Deformation
25
• Speeded Up Robust Feature (SURF) descriptors
• Bag-of-words-based DBoW loop detector
• DBoW (Database Bag of Words)
• An open-source C++ library for indexing images and converting them into a bag-of-words representation
• A hierarchical tree for approximating nearest neighbours in the image feature space and creating a visual vocabulary
• An image database with inverted and direct files to index images, enabling quick queries and feature comparisons
Place Recognition
26
D. Gálvez-López and J. D. Tardós, “Bags of binary words for fast place recognition in image sequences,” IEEE Trans. Robot., vol. 28, no. 5, pp. 1188–1197,
2012.
• Adding every RGB-D frame to the place recognition system is non-optimal
• Utilise a movement metric, sensitive to both rotation and translation, which indicates when to add a new frame to the place recognition system
• If the metric is above a threshold m_p, a new frame is added.
• Empirically, m_p = 0.3 provides good performance.
• Compute a set of SURF keypoints and associated descriptors
• The depth image d_i is also cached in memory using real-time lossless compression
• The existing bag-of-words descriptor database is queried
Place Recognition
27
where
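The metric formula itself is not reproduced on this slide; purely as an illustration, here is a hypothetical movement metric that sums the relative rotation angle and translation norm and gates new frames with m_p.

```python
import numpy as np

def movement_metric(T_prev, T_cur):
    """Hypothetical movement metric combining rotation and translation.

    T_prev, T_cur: 4x4 camera poses. The paper's exact metric is not shown
    on the slide; this simply sums the relative rotation angle (radians)
    and the relative translation norm (metres).
    """
    delta = np.linalg.inv(T_prev) @ T_cur
    angle = np.arccos(np.clip((np.trace(delta[:3, :3]) - 1.0) / 2.0, -1.0, 1.0))
    return angle + np.linalg.norm(delta[:3, 3])

def should_add_frame(T_prev, T_cur, m_p=0.3):
    """Add a frame to the place-recognition database only after enough motion."""
    return movement_metric(T_prev, T_cur) > m_p
```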
• SURF correspondence threshold
• FLANN
• RANSAC transformation estimation
• Given graph G and depth image 𝑑 𝑚,
• Approximate a 6-DOF relative transformation between the camera poses of frames i and m using a RANSAC-based 3-point algorithm
• Point cloud ICP
• Perform a non-linear ICP step between d_i and d_m
• Downsample each point cloud using a voxel grid filter
• Accept the final refined transformation if the mean squared L2 norm of all correspondence errors is below a threshold
• Empirically, the threshold = 0.01
• Once a loop closure candidate has passed all of the described tests, the relative
transformation constraint between the two camera poses is added to the pose
graph maintained by the iSAM module
Place Recognition
28 Kintinuous_Code_Review/20170308015520
• Non-rigid space deformation of the map
• Deformation graph
• Each node N_l has an associated position N_l^g and a set of neighbouring nodes N(N_l)
• Each node also stores an affine transformation in the form of a 3×3 matrix N_l^R and a 3×1 vector N_l^t
Space Deformation
29 R. W. Sumner, J. Schmid, and M. Pauly, “Embedded deformation for shape manipulation,” ACM Trans. Graph., vol. 26, no. 3, p. 80, 2007.
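For intuition, here is a sketch of how such a deformation graph is typically applied to a map vertex, following the embedded deformation formulation of the cited Sumner et al. paper. The k-nearest-node selection and inverse-distance blending weights are simplifications, not the exact Kintinuous weighting.

```python
import numpy as np

def deform_vertex(v, nodes, k=4):
    """Apply an embedded-deformation graph to one vertex (after Sumner et al. 2007).

    v:     (3,) vertex position
    nodes: list of dicts with keys 'g' (3,), 'R' (3,3), 't' (3,):
           node position, affine rotation, and translation.
    Blends the k nearest nodes with distance-based weights (illustrative).
    """
    g = np.array([n['g'] for n in nodes])
    d = np.linalg.norm(g - v, axis=1)
    idx = np.argsort(d)[:k]
    w = 1.0 / (d[idx] + 1e-6)
    w /= w.sum()
    out = np.zeros(3)
    for wi, i in zip(w, idx):
        n = nodes[i]
        # Each node maps v rigidly around its own position and contributes
        # proportionally to its weight.
        out += wi * (n['R'] @ (v - n['g']) + n['g'] + n['t'])
    return out
```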
• Pose graph optimisation
• Carried out by the iSAM framework
• Map deformation
• Cost functions over the deformation graph:
• 1) A term maximising rigidity in the deformation
• 2) A regularisation term
• 3) A constraint term that minimises the error on a set of user-specified vertex position constraints Q
Optimisation
32
(Figure: a deformation graph node with parameters [N_l^R, N_l^t, N_l^g] and its neighbouring nodes with [N_n^t, N_n^g].)
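For reference, the three terms have the following standard forms in the embedded deformation literature (a sketch after Sumner et al.; the exact weights used in Kintinuous are not shown on this slide):

```latex
E_{\mathrm{rot}} = \sum_{l} \left\lVert (N_l^{R})^{\top} N_l^{R} - I \right\rVert_F^2
\qquad
E_{\mathrm{reg}} = \sum_{l} \sum_{n \in \mathcal{N}(N_l)}
  \left\lVert N_l^{R}\,(N_n^{g} - N_l^{g}) + N_l^{g} + N_l^{t} - (N_n^{g} + N_n^{t}) \right\rVert_2^2
\qquad
E_{\mathrm{con}} = \sum_{p} \left\lVert \phi(v_p) - q_p \right\rVert_2^2
```

Here φ(v_p) denotes the deformed position of a constrained vertex v_p and q_p its target position given by the loop closure constraint; the final cost function referenced on the next slide is typically a weighted sum of these terms.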
• Final cost function
•
• Optimized by Gauss-Newton algorithm
• The Jacobian matrix in this problem is sparse
• Cholesky factorisation
• Then apply the optimised deformation graph N to all vertices over all cloud slices C
Optimisation
33 Kintinuous_Code_Review/20170308015853
• Trajectory estimation
• RGB-D Dataset from TUM (http://vision.in.tum.de/data/datasets/rgbd-dataset/download)
• Absolute trajectory RMSE
• measures the root-mean-square of the Euclidean distances between all estimated camera poses
and the ground truth poses associated by timestamp
Evaluation
34
(Figure: two-dimensional plots of estimated trajectories versus ground truth trajectories on the evaluated sequences.)
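A minimal sketch of the ATE RMSE described above, assuming the estimated and ground-truth poses have already been associated by timestamp and aligned into a common frame (the TUM benchmark tooling does both).

```python
import numpy as np

def ate_rmse(est_positions, gt_positions):
    """Absolute trajectory error (RMSE) over timestamp-associated poses.

    est_positions, gt_positions: (N, 3) camera positions, already associated
    by timestamp and expressed in a common (aligned) frame.
    """
    errors = np.linalg.norm(est_positions - gt_positions, axis=1)
    return np.sqrt(np.mean(errors ** 2))
```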
• Statistics on ATE on the datasets
• Mean over the ten runs
• Comparative evaluation
• DVO SLAM
• RGB-D SLAM
• Multi-resolution surfel maps (MRS)
• A high score on a camera trajectory benchmark does not always imply a high-quality surface reconstruction, due to the frame-to-model tracking component of the system.
Evaluation
35
• Surface reconstruction comparison
Evaluation
36
(Figure: surface reconstruction comparison between RGB-D SLAM and Kintinuous, showing point clouds and volume models.)
• Keyframe reprojection comparison
Evaluation
37
(Figure: keyframe reprojection comparison between DVO SLAM and Kintinuous point clouds.)
• Real-time dense SLAM system
• Frontend for camera pose estimation and surface reconstruction
• Backend for non-rigid map deformation and loop closure
• Extensive evaluation
• Both quantitatively and qualitatively on common benchmarks
• Ability to produce large-scale, dense, globally consistent maps in real time
• Limitation
• Reliance on projective data association for camera pose estimation
• Future work
• Real-time large-scale dense fused 3D reconstruction which supports online drift correction
• A globally consistent representation of the map at any time, allowing map re-use and re-fusing
Conclusion
40
• Kintinuous Code Review
• https://dongwonshin.blog/2017/02/24/paper-review-real-time-large-scale-dense-rgb-d-
slam-with-volumetric-fusion/
Appendix
41
Thank you
42

Editor's Notes

  • #14 How many voxels exist within one meter.
  • #15 β_i is the rotation around the y-axis of the camera pose at time i.
  • #18 ξ is pronounced "xi".
  • #20 Let's look at another algorithm, the point-to-plane algorithm. It minimizes the perpendicular distance from the source point to the tangent plane at the destination point, using a nonlinear least-squares formulation solved with the Levenberg-Marquardt method. The main difference from the point-to-point algorithm is that it considers the normal vector of the destination point when minimizing, and the cost function is as shown.
  • #21 Let's assume the transformation matrix M is as shown; it consists of a translation T and a rotation R. The cost function considering the normal vector then has 6 DOF. However, α, β, and γ appear through nonlinear trigonometric functions, so a linear approximation is needed.
  • #22 In the reference paper, the authors derive this approximated transformation matrix M̂. We can then write a linearized expression for the i-th correspondence as shown.
  • #23 If we expand that equation to N correspondences, we obtain this matrix equation. The modified form is the general least-squares problem, and we can find the optimum solution x_opt by iteratively performing the SVD optimization until it converges.
  • #24 Reproject the previous frame's colour image into 3D space and then project it into the current frame; this seems to refer to 3D warping.