Real-time large-scale dense RGB-D SLAM with volumetric fusion
March 20, 2017
Dong-Won Shin
T. Whelan, M. Kaess, H. Johannsson, M. Fallon, J. J. Leonard, and J. McDonald, “Real-time large-scale dense RGB-D SLAM with volumetric fusion,” Int. J. Rob. Res., vol. 34, no. 4–5, pp. 598–626, Apr. 2015.
• Extended scale volumetric fusion
• Volume representation
• Volume shifting
• Camera pose estimation
• Geometric camera pose estimation
• Photometric camera pose estimation
• Combined camera pose estimation
• Loop closure
• Pose graph
• Place recognition
• Space deformation
• Optimisation
Contents
2
• KinectFusion
• Reconstructions of an unprecedented quality at real-time speeds
• Drawbacks
Problem Statement
4
• Restriction to a fixed, small area in space
• No means of explicitly incorporating loop closures
• Reliance on geometric information alone for camera pose estimation
Contributions
5
• Representing the volumetric reconstruction data structure in memory with a rolling cyclical buffer
• Optimizing the dense map by means of a non-rigid space deformation parameterized by a loop closure constraint
• Estimating a dense photometric camera constraint in conjunction with a dense geometric constraint and jointly optimizing for a camera pose estimate
• Kintinuous
• Spatially extended version of KinectFusion
• Advantages
• Flowchart
System Architecture
6
• Volume representation
• Truncated Signed Distance Function (TSDF)
• Raycasting
• Finding a zero-crossing along the ray
Extended Scale Volumetric Fusion
7
sdf_i = ||t_i − v_g|| − D_i(p)
If sdf_i > 0 then
  tsdf_i = min(1, sdf_i / max truncation)
Else
  tsdf_i = max(−1, sdf_i / min truncation)
(Figure: camera and depth image viewing a surface through the TSDF volume, with positive values on one side of the zero crossing and negative values on the other.)
Kintinuous_Code_Review/20170309212313
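To make the TSDF update above concrete, here is a minimal Python sketch (illustrative only; the Kintinuous implementation runs on the GPU and also stores per-voxel colour). The truncation distances and the weighted running-average fusion step follow the standard KinectFusion scheme and are assumptions, not values taken from the slide.

```python
import numpy as np

def truncated_sdf(cam_pos, voxel_pos, measured_depth,
                  max_truncation=0.03, min_truncation=0.03):
    """Truncated signed distance for one voxel (slide 7 formula).

    cam_pos, voxel_pos: 3D positions t_i and v_g; measured_depth: D_i(p),
    the depth image value at the voxel's projection. Truncation distances
    are illustrative (metres).
    """
    sdf = np.linalg.norm(cam_pos - voxel_pos) - measured_depth
    if sdf > 0:
        return min(1.0, sdf / max_truncation)
    return max(-1.0, sdf / min_truncation)

def fuse(tsdf_old, weight_old, tsdf_new, weight_new=1.0):
    """Standard weighted running average used in volumetric fusion."""
    weight = weight_old + weight_new
    return (tsdf_old * weight_old + tsdf_new * weight_new) / weight, weight
```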
• A structure like a cyclical buffer that virtually translates as the camera moves through the environment
• It is parameterised by an integer movement threshold m_s, defining a cubic movement boundary around g_i; crossing this boundary triggers a volume shift.
Volume Shifting
8
• Animation
• If we want to reconstruct an indoor 3D scene, the TSDF volume moves through it like this.
Volume Shifting
9
• Animation
• Volume-oriented representation
Volume Shifting
10
• Animation
• Volume-oriented representation
Volume Shifting
11
Convert the slice leaving the volume to a point cloud and save it to memory; the freed region becomes empty.
• Animation
• Volume-oriented representation
Volume Shifting
12
Convert the slice leaving the volume to a point cloud and save it to memory, then fill the new volume region.
Apply the same procedure along the y and z axes as well.
Kintinuous_Code_Review/20170308014243
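A minimal sketch of the rolling cyclical buffer described above, under the assumption that voxels are addressed modulo the volume size: shifting the volume only exports the slice that leaves it as a point-cloud slice and re-initialises that slice, rather than copying the whole volume. Sizes, thresholds, and the x-axis-only logic are illustrative.

```python
import numpy as np

class RollingTsdfVolume:
    """Cyclical-buffer TSDF volume that virtually translates with the camera."""

    def __init__(self, size=128, m_s=32):
        self.size = size                      # voxels per side
        self.m_s = m_s                        # integer movement threshold
        self.origin = np.zeros(3, dtype=int)  # g_i, volume origin in voxels
        self.tsdf = np.ones((size, size, size), dtype=np.float32)

    def buffer_index(self, voxel_xyz):
        """Map a global voxel coordinate into the cyclical buffer."""
        return tuple(v % self.size for v in voxel_xyz)

    def maybe_shift_x(self, cam_voxel_x):
        """Shift along x when the camera crosses the boundary around g_i.

        Returns the TSDF slices that left the volume; in Kintinuous these
        are converted to point-cloud slices and saved to memory.
        """
        exported = []
        while cam_voxel_x - self.origin[0] > self.m_s:
            idx = self.origin[0] % self.size
            exported.append(self.tsdf[idx].copy())  # slice leaving the volume
            self.tsdf[idx] = 1.0                    # becomes the new empty slice
            self.origin[0] += 1
        return exported
```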
• Camera pose
• Motion parameters ξ: Translation + Rotation
• 6 DOF
• A number of volumetric fusion systems use only depth information for CPE
• Real-time 3d reconstruction in dynamic scenes using point-based fusion
• KinectFusion: Real-time Dense Surface Mapping and Tracking
• Real-time camera tracking and 3d reconstruction using signed distance functions
• Scalable real-time volumetric surface reconstruction
• Problems of a reliance on geometric information alone for CPE
• Inability to function in corridor-like environments
• Scenes with few 3D features
• More robust pose estimate in more challenging scenes
• Dense geometric camera pose constraints
• Dense photometric constraints
Camera Pose Estimation (CPE)
16
• Point-to-plane error between vertices in the current depth frame and the predicted
raycast surface
• Correspondence finding: Projective data association
Geometric CPE
17 Kintinuous_Code_Review/20170309222011
• Linearizing the transformation around the identity
• 6×6 system of normal equations
• Cholesky decomposition to yield ξ
• Three-level coarse-to-fine depth map pyramid scheme
Geometric CPE
18
Point-to-Plane Algorithm
• Minimize the perpendicular distance from each source point to the tangent plane at its corresponding destination point
• Nonlinear least-squares problem, solved with the Levenberg-Marquardt method
s_i = (s_ix, s_iy, s_iz, 1)^T : source point
d_i = (d_ix, d_iy, d_iz, 1)^T : destination point
n_i = (n_ix, n_iy, n_iz, 0)^T : unit normal vector at d_i
K. L. Low, “Linear least-squares optimization for point-to-plane ICP surface registration,” Chapel Hill, 2004.
Point-to-Plane Algorithm
• Transformation matrix M
• Least-squares problem over 6 DOF (α, β, γ, t_x, t_y, t_z)
• However, α, β, and γ enter through nonlinear trigonometric functions, so a linear approximation is needed
20
M_opt = arg min_M Σ_i ((M ∙ s_i − d_i) ∙ n_i)², where M combines a rotation R(α, β, γ) and a translation (t_x, t_y, t_z)
Point-to-Plane Algorithm
• Approximated Transformation Matrix 𝑴
• Linearized expression for the i-th correspondence
21
(n_iz s_iy − n_iy s_iz) α + (n_ix s_iz − n_iz s_ix) β + (n_iy s_ix − n_ix s_iy) γ + n_ix t_x + n_iy t_y + n_iz t_z
= n_ix d_ix + n_iy d_iy + n_iz d_iz − n_ix s_ix − n_iy s_iy − n_iz s_iz
Point-to-Plane Algorithm
• Expanding to N correspondences and stacking the linearized equations gives the general least-squares form A x = b
• Optimum solution x_opt
• Iteratively perform the Levenberg-Marquardt optimization until it converges
Kintinuous_Code_Review/20170308014917
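To illustrate the linearized system above, the following sketch builds the per-correspondence rows from Low's coefficients and solves one least-squares step. It is a plain numpy illustration, not the Kintinuous GPU implementation; function and variable names are assumptions.

```python
import numpy as np

def point_to_plane_step(src, dst, normals):
    """One linearized point-to-plane ICP step (after Low, 2004).

    src, dst: (N, 3) corresponding source/destination points
    normals:  (N, 3) unit normals at the destination points
    Returns the small-angle motion x = (alpha, beta, gamma, tx, ty, tz).
    """
    A = np.empty((len(src), 6))
    # Rotational coefficients: cross(src, normal) per correspondence
    A[:, 0] = normals[:, 2] * src[:, 1] - normals[:, 1] * src[:, 2]
    A[:, 1] = normals[:, 0] * src[:, 2] - normals[:, 2] * src[:, 0]
    A[:, 2] = normals[:, 1] * src[:, 0] - normals[:, 0] * src[:, 1]
    A[:, 3:] = normals                              # translational coefficients
    b = np.einsum('ij,ij->i', normals, dst - src)   # signed point-to-plane errors
    # Solve the least-squares problem (a 6x6 normal-equation / Cholesky solve
    # would be equivalent, as noted on slide 18)
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x
```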
• Given two consecutive RGB-D frames I_{n−1} and I_n,
• Compute a rigid camera transformation between the two frames that maximises photo-consistency
• ℒ = the list of valid interest points
• T = the current estimate of the transformation from I_n to I_{n−1}
Photometric CPE
23
I = 0.299 · rgb_R + 0.587 · rgb_G + 0.114 · rgb_B
(Figure: I_{n−1} and I_n related by projection to 3D, transformation by T, and reprojection into the other frame.)
Kintinuous_Code_Review/20170308015138
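The following sketch illustrates the photo-consistency idea on this slide: convert RGB to intensity with the luminance weights above, back-project pixels of I_n using their depth, transform them by the current estimate T, reproject into I_{n−1}, and compare intensities. It is an assumption-laden illustration (dense pixels rather than the interest-point list ℒ, and nearest-neighbour lookup instead of interpolation); the names and helper functions are not from the paper.

```python
import numpy as np

def intensity(rgb):
    """Luminance conversion from the slide: 0.299 R + 0.587 G + 0.114 B."""
    return rgb[..., 0] * 0.299 + rgb[..., 1] * 0.587 + rgb[..., 2] * 0.114

def photometric_residuals(I_prev, I_cur, depth_cur, K, T):
    """Squared intensity differences after warping frame n into frame n-1.

    I_prev, I_cur: grayscale images; depth_cur: current-frame depth (metres)
    K: 3x3 intrinsics; T: 4x4 estimate of the transform from frame n to n-1.
    """
    h, w = depth_cur.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_cur
    valid = z > 0
    # Back-project current-frame pixels to 3D camera coordinates
    x = (u - K[0, 2]) / K[0, 0] * z
    y = (v - K[1, 2]) / K[1, 1] * z
    pts = np.stack([x, y, z, np.ones_like(z)], axis=-1)[valid]
    # Transform into the previous frame and project
    p = pts @ T.T
    up = K[0, 0] * p[:, 0] / p[:, 2] + K[0, 2]
    vp = K[1, 1] * p[:, 1] / p[:, 2] + K[1, 2]
    inside = (up >= 0) & (up < w - 1) & (vp >= 0) & (vp < h - 1) & (p[:, 2] > 0)
    # Nearest-neighbour lookup keeps the sketch short (bilinear in practice)
    diff = I_prev[vp[inside].astype(int), up[inside].astype(int)] - I_cur[valid][inside]
    return diff ** 2
```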
• The combined cost is the sum of the RGB-D (photometric) and ICP (geometric) costs
Combined CPE
24 Kintinuous_Code_Review/20170308015317
where
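A minimal sketch of the joint optimisation, assuming both costs have been linearized into Jacobian rows and residuals: the rows are simply stacked, with a hypothetical relative weight w_rgbd, and solved together for one 6-DOF increment. The paper's exact weighting is not reproduced here.

```python
import numpy as np

def combined_step(A_icp, b_icp, A_rgbd, b_rgbd, w_rgbd=1.0):
    """Jointly solve the stacked geometric + photometric linear systems.

    A_*, b_*: per-constraint Jacobian rows and residuals from the two costs.
    w_rgbd is an illustrative relative weight, not a value from the paper.
    Returns the 6-DOF increment minimising the combined least-squares cost.
    """
    A = np.vstack([A_icp, w_rgbd * A_rgbd])
    b = np.concatenate([b_icp, w_rgbd * b_rgbd])
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x
```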
• Problem
• Like all egomotion estimation systems, drift will accumulate over space and time
• Simple approach
• Associate each vertex in the mesh with the nearest camera pose
• Optimize the pose graph
• Reflect the camera pose transformations in the mesh vertices
• Another problem
• Sharp discontinuities at points on the surface where the association between camera
poses changes
• Ignores other important properties of the surface
• Solution
• Loop closure constraint
• Non-rigid method of correcting the map (deformation)
Loop Closure & Deformation
25
• Speeded Up Robust Feature (SURF) descriptors
• Bag-of-words-based DBoW loop detector
• DBoW (Database Bag of Words)
• An open-source C++ library for indexing images and converting them into a bag-of-words representation
• A hierarchical tree for approximating nearest neighbours in the image feature space and creating a visual vocabulary
• An image database with inverted and direct files to index images, enabling quick queries and feature comparisons
Place Recognition
26
D. Gálvez-López and J. D. Tardós, “Bags of binary words for fast place recognition in image sequences,” IEEE Trans. Robot., vol. 28, no. 5, pp. 1188–1197,
2012.
• Adding every RGB-D frame to the place recognition system is non-optimal
• Utilise a movement metric, sensitive to both rotation and translation, which indicates when to add a new frame to the place recognition system
• If the metric is above a threshold m_p, a new frame is added.
• Empirically, m_p = 0.3 provides good performance.
• Compute a set of SURF keypoints and associated descriptors
• The depth image d_i is also cached in memory using real-time lossless compression
• The existing bag-of-words descriptor database is queried
Place Recognition
27
where
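The metric formula itself is not reproduced on this slide; purely as an illustration, here is a hypothetical movement metric that sums the relative rotation angle and translation norm and gates new frames with m_p.

```python
import numpy as np

def movement_metric(T_prev, T_cur):
    """Hypothetical movement metric combining rotation and translation.

    T_prev, T_cur: 4x4 camera poses. The paper's exact metric is not shown
    on the slide; this simply sums the relative rotation angle (radians)
    and the relative translation norm (metres).
    """
    delta = np.linalg.inv(T_prev) @ T_cur
    angle = np.arccos(np.clip((np.trace(delta[:3, :3]) - 1.0) / 2.0, -1.0, 1.0))
    return angle + np.linalg.norm(delta[:3, 3])

def should_add_frame(T_prev, T_cur, m_p=0.3):
    """Add a frame to the place-recognition database only after enough motion."""
    return movement_metric(T_prev, T_cur) > m_p
```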
• SURF correspondence threshold
• FLANN
• RANSAC transformation estimation
• Given graph G and depth image 𝑑 𝑚,
• Approximate a 6-DOF relative transformation between the camera poses of frames i and m using a RANSAC-based 3-point algorithm
• Point cloud ICP
• Perform a non-linear ICP step between d_i and d_m
• Downsample each point cloud using a voxel grid filter
• Accept the final refined transformation if the mean squared L2 norm of all correspondence errors is below a threshold
• Empirically, the threshold = 0.01
• Once a loop closure candidate has passed all of the described tests, the relative
transformation constraint between the two camera poses is added to the pose
graph maintained by the iSAM module
Place Recognition
28 Kintinuous_Code_Review/20170308015520
• Non-rigid space deformation of the map
• Deformation graph
• Each node N_l has an associated position N_l^g and a set of neighbouring nodes N(N_l)
• Each node also stores an affine transformation in the form of a 3×3 matrix N_l^R and a 3×1 vector N_l^t
Space Deformation
29 R. W. Sumner, J. Schmid, and M. Pauly, “Embedded deformation for shape manipulation,” ACM Trans. Graph., vol. 26, no. 3, p. 80, 2007.
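For intuition, here is a sketch of how such a deformation graph is typically applied to a map vertex, following the embedded deformation formulation of the cited Sumner et al. paper. The k-nearest-node selection and inverse-distance blending weights are simplifications, not the exact Kintinuous weighting.

```python
import numpy as np

def deform_vertex(v, nodes, k=4):
    """Apply an embedded-deformation graph to one vertex (after Sumner et al. 2007).

    v:     (3,) vertex position
    nodes: list of dicts with keys 'g' (3,), 'R' (3,3), 't' (3,):
           node position, affine rotation, and translation.
    Blends the k nearest nodes with distance-based weights (illustrative).
    """
    g = np.array([n['g'] for n in nodes])
    d = np.linalg.norm(g - v, axis=1)
    idx = np.argsort(d)[:k]
    w = 1.0 / (d[idx] + 1e-6)
    w /= w.sum()
    out = np.zeros(3)
    for wi, i in zip(w, idx):
        n = nodes[i]
        # Each node maps v rigidly around its own position and contributes
        # proportionally to its weight.
        out += wi * (n['R'] @ (v - n['g']) + n['g'] + n['t'])
    return out
```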
• Pose graph optimisation
• Carried out by the iSAM framework
• Map deformation
• Cost functions over the deformation graph:
• 1) A term maximising rigidity in the deformation
• 2) A regularisation term
• 3) A constraint term that minimises the error on a set of user-specified vertex position constraints Q
Optimisation
32
(Figure: a deformation graph node with parameters [N_l^R, N_l^t, N_l^g] and its neighbouring nodes with [N_n^t, N_n^g].)
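For reference, the three terms have the following standard forms in the embedded deformation literature (a sketch after Sumner et al.; the exact weights used in Kintinuous are not shown on this slide):

```latex
E_{\mathrm{rot}} = \sum_{l} \left\lVert (N_l^{R})^{\top} N_l^{R} - I \right\rVert_F^2
\qquad
E_{\mathrm{reg}} = \sum_{l} \sum_{n \in \mathcal{N}(N_l)}
  \left\lVert N_l^{R}\,(N_n^{g} - N_l^{g}) + N_l^{g} + N_l^{t} - (N_n^{g} + N_n^{t}) \right\rVert_2^2
\qquad
E_{\mathrm{con}} = \sum_{p} \left\lVert \phi(v_p) - q_p \right\rVert_2^2
```

Here φ(v_p) denotes the deformed position of a constrained vertex v_p and q_p its target position given by the loop closure constraint; the final cost function referenced on the next slide is typically a weighted sum of these terms.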
• Final cost function
•
• Optimized by Gauss-Newton algorithm
• The Jacobian matrix in this problem is sparse
• Cholesky factorisation
• Then apply the optimised deformation graph N to all vertices over all cloud slices C
Optimisation
33 Kintinuous_Code_Review/20170308015853
• Trajectory estimation
• RGB-D Dataset from TUM (http://vision.in.tum.de/data/datasets/rgbd-dataset/download)
• Absolute trajectory RMSE
• measures the root-mean-square of the Euclidean distances between all estimated camera poses
and the ground truth poses associated by timestamp
Evaluation
34
(Figure: two-dimensional plots of estimated trajectories versus ground truth trajectories on the evaluated sequences.)
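A minimal sketch of the ATE RMSE described above, assuming the estimated and ground-truth poses have already been associated by timestamp and aligned into a common frame (the TUM benchmark tooling does both).

```python
import numpy as np

def ate_rmse(est_positions, gt_positions):
    """Absolute trajectory error (RMSE) over timestamp-associated poses.

    est_positions, gt_positions: (N, 3) camera positions, already associated
    by timestamp and expressed in a common (aligned) frame.
    """
    errors = np.linalg.norm(est_positions - gt_positions, axis=1)
    return np.sqrt(np.mean(errors ** 2))
```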
• Statistics on ATE on the datasets
• Mean over the ten runs
• Comparative evaluation
• DVO SLAM
• RGB-D SLAM
• Multi-resolution surfel maps (MRS)
• A high score on a camera trajectory benchmark does not always imply a high-quality surface reconstruction, due to the frame-to-model tracking component of the system.
Evaluation
35
• Surface reconstruction comparison
Evaluation
36
(Figure: surface reconstruction comparison between RGB-D SLAM and Kintinuous, showing point clouds and volume models.)
• Keyframe reprojection comparison
Evaluation
37
(Figure: keyframe reprojection comparison between DVO SLAM and Kintinuous point clouds.)
• Real-time dense SLAM system
• Frontend for camera pose estimation and surface reconstruction
• Backend for non-rigid map deformation and loop closure
• Extensive evaluation
• Both quantitatively and qualitatively on common benchmarks
• Ability to produce large-scale, dense, globally consistent maps in real time
• Limitation
• Reliance on projective data association for camera pose estimation
• Future work
• Real-time large-scale dense fused 3D reconstruction which supports online drift correction
• A globally consistent representation of the map at any time, allowing map re-use and re-fusing
Conclusion
40
• Kintinuous Code Review
• https://dongwonshin.blog/2017/02/24/paper-review-real-time-large-scale-dense-rgb-d-
slam-with-volumetric-fusion/
Appendix
41
Thank you
42

Editor's Notes

  • #14 How many voxels exist within one meter.
  • #15 β_i is the rotation around the y-axis of the camera pose at time i.
  • #18 ξ is pronounced "xi".
  • #20 Let's look at another algorithm, the point-to-plane algorithm. It minimizes the perpendicular distance from the source point to the tangent plane at the destination point, using a nonlinear least-squares formulation solved with the Levenberg-Marquardt method. The main difference from the point-to-point algorithm is that it considers the normal vector of the destination point when minimizing, and the cost function is as shown.
  • #21 Let's assume the transformation matrix M is as shown; it consists of a translation T and a rotation R. The cost function considering the normal vector then has 6 DOF. However, α, β, and γ appear through nonlinear trigonometric functions, so a linear approximation is needed.
  • #22 In the reference paper, the authors derive this approximated transformation matrix M̂. We can then write a linearized expression for the i-th correspondence as shown.
  • #23 If we expand that equation to N correspondences, we obtain this matrix equation. The modified form is the general least-squares problem, and we can find the optimum solution x_opt by iteratively performing the SVD optimization until it converges.
  • #24 Reproject the previous frame's colour image into 3D space and then project it into the current frame; this seems to refer to 3D warping.