Visual SLAM and VO
Jakob Engel
Surreal Vision, Oculus Research
*Presentation reflects work done in collaboration with
Daniel Cremers at Technical University Munich and
Vladlen Koltun at Intel Research
Jakob Engel Visual SLAM 1
2
Jakob Engel Visual SLAM
Live Operation
[1] LSD-SLAM, Engel, Schoeps, Cremers ‘14
3
Jakob Engel Visual SLAM
Live Operation
monoSLAM PTAM MSCKF DTAM
Semi-Dense VO ORB-SLAM 2 SVO
LSD-SLAM OKVIS DSO
And many, many more!
4
• Geometric SLAM formulations.
• Indirect vs. direct error formulation.
• Geometry parametrization choices.
• Sparse vs. dense model.
• Optimization approach.
• Concrete example: DSO direct error formulation.
• System architecture of two examples: ORB-SLAM and DSO
• Different sensor combinations (IMU, mono vs. stereo, RGB-D)
• Evaluating SLAM / VO, loop-closure metric.
• Overall analysis & parameter studies:
Impact of field of view, image resolution, different noise sources, different algorithm and
model choices in controlled experiments: Analyzing results from 40’000 tracked trajectories
(100 million tracked frames, or 38 days of video at 30fps) on 50 sequences (190k frames).
Overview
Jakob Engel Visual SLAM
5
Everything is based on maximum likelihood estimation:
Find model parameters X that maximise the probability of observing Y.
X: camera poses, camera intrinsics, 3D geometry, ...
P(Y | X): probabilistic model (likelihood)
Y: observed data (measurements)
Indirect: Observations Y are 2D point positions.
=> P models (geometric) noise on point positions; the residual unit is pixels.
Direct: Observations Y are pixel intensities.
=> P models (photometric) noise on pixel intensities; the residual unit is radiance.
Jakob Engel Visual SLAM
Indirect vs. Direct
[2] DSO, Engel, Koltun, Cremers ‘17
6
Direct Model
LSD-SLAM [1], DSO [2], DTAM [9], REMODE [10], SVO [7] (front-end), [12], [13], [15], ...
• Measurement noise model: Gaussian white noise on the image.
• Model assumption: template intensity equals image intensity at the projected point.
• Negative log-likelihood (energy): summed over a set of template pixels.
Indirect Model
ORB-SLAM [3], PTAM [4], OKVIS [5], MSCKF [6], SVO [7] (back-end), monoSLAM [8], [11], ...
• Measurement noise model: Gaussian white noise on the 2D keypoint position.
• Model assumption: the 2D keypoint location equals the projection of the associated 3D point.
• Negative log-likelihood (energy): summed over a set of model points.
Jakob Engel Visual SLAM
Indirect vs. Direct
[2] DSO, Engel, Koltun,
Cremers ‘17
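The energy formulas on this slide did not survive extraction. A hedged reconstruction of the two standard formulations, in notation loosely following [2] (the symbols below are assumptions, not the slide's originals):

Indirect (Gaussian noise on 2D keypoint positions):
\[ \mathbf{p}_{ij} = \pi\big(T_j\,\mathbf{x}_i\big) + n_{ij}, \quad n_{ij} \sim \mathcal{N}(0,\Sigma_{ij}) \;\Rightarrow\; E_{\text{indirect}} = \sum_{i,j} \big\| \mathbf{p}_{ij} - \pi\big(T_j\,\mathbf{x}_i\big) \big\|^2_{\Sigma_{ij}} \]

Direct (brightness constancy with Gaussian noise on intensities):
\[ I_j\big(\pi(T_j\,\mathbf{x}_i)\big) = I_{\text{ref}}(\mathbf{p}_i) + n_{ij}, \quad n_{ij} \sim \mathcal{N}(0,\sigma^2) \;\Rightarrow\; E_{\text{direct}} = \sum_{i,j} \big\| I_j\big(\pi(T_j\,\mathbf{x}_i)\big) - I_{\text{ref}}(\mathbf{p}_i) \big\|^2 \]

Here \(\pi\) projects a 3D point to pixel coordinates, \(T_j\) is the pose of frame \(j\), \(\mathbf{x}_i\) the 3D point, and \(I_{\text{ref}}\) the template image.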
7
There are many ways to represent geometry: point cloud, depth maps,
TSDF, mesh, occupancy grid, ...
For real-time visual SLAM, the most important options are (non-exhaustive list):
• 3D Euclidean points (world coordinates)
• 3D Euclidean points (local coordinates in „host“ frame h)
• 3D inverse points (local coordinates in „host“ frame h)
• 1D inverse depth (local coordinates in „host“ frame h)
• 1D inverse distance, 3D inverse (distance + 2 angles), ...
(See the sketch below for typical definitions.)
Jakob Engel Visual SLAM
Geometry Parametrization
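The "with ..." definitions in the list above were dropped by the extraction; a hedged sketch of the typical parametrizations (notation assumed, with \(K\) the camera intrinsics and \(T_{wh}\) the host-to-world pose):

\[ \text{3D Euclidean, world:}\quad \mathbf{x} = (x, y, z)^\top \in \mathbb{R}^3 \]
\[ \text{3D Euclidean, local in host } h\text{:}\quad \mathbf{x}_h \in \mathbb{R}^3,\quad \mathbf{x} = T_{wh}\,\mathbf{x}_h \]
\[ \text{3D inverse, local:}\quad \mathbf{x}_h^{\text{inv}} = \Big(\tfrac{x}{z},\,\tfrac{y}{z},\,\tfrac{1}{z}\Big)^\top \]
\[ \text{1D inverse depth, local:}\quad d = \tfrac{1}{z} > 0,\quad \mathbf{x}_h = \tfrac{1}{d}\,K^{-1}(u, v, 1)^\top \text{ with } (u, v) \text{ fixed in the host frame} \]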
8
Local vs. World representation of points
Local representation is more flexible, as points move with the host frame (good
when fixing values or linearization points). However, residuals now depend
on two camera-to-world poses, and a „host“ frame must be defined.
Reduces the geometry-to-camera-pose correlation.
Jakob Engel Visual SLAM
Geometry Parametrization
Inverse vs. Euclidean
The inverse representation is more linear wrt. epipolar geometry and almost always
preferable.
1D vs 3D representation
1D obtained by fixing (u,v) in host frame. Correct for direct model when
using host as template image. Strictly speaking incorrect for indirect model
(see [2]).
[2] DSO, Engel, Koltun, Cremers ‘17
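A minimal sketch (not from the slides) of the 1D inverse-depth, host-frame representation in practice: a point is stored as a fixed pixel (u, v) plus one inverse-depth value in its host keyframe, and warping it into another frame depends on both camera-to-world poses, as noted above. Function and variable names are illustrative.

import numpy as np

def project_inverse_depth_point(u, v, inv_depth, K, T_world_host, T_world_target):
    """Warp a point stored as (u, v, inverse depth) in a host keyframe into
    a target keyframe: one unknown per point, residual depends on BOTH poses."""
    K_inv = np.linalg.inv(K)
    # Back-project: 3D point in host camera coordinates.
    p_host = (1.0 / inv_depth) * (K_inv @ np.array([u, v, 1.0]))
    # Relative pose target <- host.
    T_target_host = np.linalg.inv(T_world_target) @ T_world_host
    p_target = T_target_host[:3, :3] @ p_host + T_target_host[:3, 3]
    # Project into the target image.
    uv_hom = K @ p_target
    return uv_hom[:2] / uv_hom[2]

# Example: identity host pose, target translated 0.1 m to the right.
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
T_host = np.eye(4)
T_target = np.eye(4); T_target[0, 3] = 0.1
print(project_inverse_depth_point(320, 240, 0.5, K, T_host, T_target))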
9
Not synonymous with direct vs. indirect!
Dense+Indirect: Compute dense optical flow, then estimate
(dense) geometry from flow-correspondences.
Sparse+Direct: Just use a selected subset of pixels & group them.
• For dense, we need to integrate a prior, since (from passive vision
alone), dense geometry is not observable (white walls).
• Typically, this prior is some version of „the world is smooth“.
• Causes correlations between geometry parameters (and
significantly increases their number), which is bad for GN-like
optimization methods.
Jakob Engel Visual SLAM
Dense vs. Sparse
[2] DSO, Engel, Koltun, Cremers ‘17
10
Poses:
~5k Parameters.
Inverse Depth Values:
~100M Parameters.
Pose-Pose block
(sparse block-matrix)
Depth-Depth block:
very large, very sparse, but
irregular (depth residuals +
smoothness prior terms)
Pose-Depth correlations.
(sparse block-matrix)
(theoretical) Hessian Structure of LSD-SLAM [1]
Jakob Engel Visual SLAM
Dense vs. Sparse
[2] DSO, Engel, Koltun, Cremers ‘17
11
• As a result, real-time Dense or Semi-Dense approaches refrain
from jointly optimizing the complete model.
• They use alternating optimization (LSD-SLAM) or other
optimization methods like Primal-Dual (REMODE, DTAM).
• They favor large amounts of data to constrain the large number of
unknowns, ignoring or approximating correlations.
• This works well if the data is good. It fails if the configuration is
close-to-degenerate, since we may accumulate wrong
linearizations, which will not be corrected as we ignored
correlations & don‘t re-linearize.
Jakob Engel Visual SLAM
Dense vs. Sparse
12
Kalman Filter MSCKF [6] (indirect VIO), monoSLAM* [8] (indirect VO), Soatto et al. [11,12]
Causal integration of information over time, representing the current state as mean and covariance and
marginalizing old information. Today mostly used for lightweight visual-inertial odometry.
Real-time by early fixation of linearizations and marginalization of old states
Information Filter OKVIS* [5] (indirect VIO), DSO* [2] (direct VO)
Similar to EKF, but representing the state as minimum of a quadratic energy function. Today mostly
used for visual-inertial odometry (slightly more accurate and expensive than EKF).
Real-time by early (but slightly later than EKF) fixation of linearizations and marginalization of old states
Graph-Optimization ORB-SLAM* [3], PTAM* [4] (indirect SLAM), LSD-SLAM* [1] (direct SLAM), SVO*
Maintains the full map as a graph, typically optimizing in multiple layers: local active window (front-end) and
full map (back-end). Allows re-using old geometry and loop-closures (drift-free scene browsing), rather
than marginalizing everything outside the field of view.
Real-time by sub-sampling residuals / states (only optimize keyframes)
Primal-dual and others REMODE* [10], DTAM [9] (direct dense reconstruction)
Allows efficiently solving regularized dense systems wrt. depth, which are intractable for GN-based
optimization. Typically, poses are estimated with a separate optimization (sequential / alternating).
Real-time by GPU acceleration & fixing poses.
* open-source code by authors available
Jakob Engel Visual SLAM
Optimization
13
ORB-SLAM 2 [3]
• Fully indirect, sparse model (FAST corners +
ORB descriptors)
• Points represented as 3D euclidean points
in world coordinates
• Optimized using graph-optimization (g2o)
with local and global window.
Open-source code:
• monocular, stereo- and RGB-D visual SLAM
• Implicit and explicit loop-closure, re-
localization and map re-use.
Direct Sparse Odometry (DSO) [2]
• Fully direct, sparse model (plus
photometric calibration)
• Points represented as 1D inverse depth
in local frames
• Optimized using information filtering
(sliding window of keyframes)
Open-source code:
• monocular visual odometry
Jakob Engel Visual SLAM
Two Examples
Direct Sparse Odometry (DSO):
Use the photometric error over a small neighbourhood (8 pixels).
The energy sums over all host frames, the points hosted in them, and all observations of each point.
Formulation components: warped point, weights, exposure times, photometrically corrected image, and per-frame affine lighting parameters (see the sketch below).
Jakob Engel Visual SLAM 14
DSO: direct, sparse model
[2] DSO, Engel, Koltun, Cremers ‘17
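The DSO energy itself did not survive extraction. A reconstruction following the published formulation in [2]; treat the exact symbols as an approximation:

\[ E_{\text{photo}} = \sum_{i \in \mathcal{F}} \sum_{\mathbf{p} \in \mathcal{P}_i} \sum_{j \in \text{obs}(\mathbf{p})} E_{\mathbf{p}j} \]
\[ E_{\mathbf{p}j} = \sum_{\mathbf{p}' \in \mathcal{N}_{\mathbf{p}}} w_{\mathbf{p}'} \,\Big\| \big(I_j[\mathbf{p}''] - b_j\big) - \tfrac{t_j e^{a_j}}{t_i e^{a_i}} \big(I_i[\mathbf{p}'] - b_i\big) \Big\|_{\gamma} \]
\[ \mathbf{p}'' = \Pi_c\big( \mathbf{R}\,\Pi_c^{-1}(\mathbf{p}', d_{\mathbf{p}}) + \mathbf{t} \big) \]

with host frames \(\mathcal{F}\), points \(\mathcal{P}_i\) hosted in frame \(i\), observations \(\text{obs}(\mathbf{p})\), the 8-pixel neighbourhood \(\mathcal{N}_{\mathbf{p}}\), weights \(w_{\mathbf{p}'}\), exposure times \(t_i, t_j\), per-frame affine lighting parameters \((a, b)\), warped point \(\mathbf{p}''\) with inverse depth \(d_{\mathbf{p}}\), and the Huber norm \(\|\cdot\|_{\gamma}\).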
Indirect model: Sensor „measures“ 2D point positions.
Direct model: Sensor measures pixel radiance.
Geometric Calibration: 3D point -> 2D pixel coordinate.
Well understood, and always done (often refined online).
Photometric Calibration: Scene irradiance (lux) -> 8-12 bit pixel value.
Widely ignored: Features are invariant - brightness constancy is not.
pixel value I(x) = G( exposure time t · vignetting attenuation V(x) · irradiance B(x) ),
with camera response function G (gamma function).
Jakob Engel Visual SLAM 15
Photometric Calibration
[14] TUM monoVO Dataset, Engel, Usenko, Cremers ‘16
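Hedged reconstruction of the image formation model above and of the photometric correction a direct method can apply once G and V are calibrated, following the model used in [14]:

\[ I_i(\mathbf{x}) = G\big( t_i \, V(\mathbf{x}) \, B_i(\mathbf{x}) \big) \quad\Longrightarrow\quad I_i'(\mathbf{x}) := t_i \, B_i(\mathbf{x}) = \frac{G^{-1}\big(I_i(\mathbf{x})\big)}{V(\mathbf{x})} \]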
pixel value I(x) = G( exposure time t · vignetting attenuation V(x) · irradiance B(x) ),
with camera response function G (gamma function).
[Calibration plots: log(irradiance B), vignette V, hardware gamma G]
Noise: irradiance noise is not Gaussian but
Poisson-distributed (mean = variance). With a 0.5
gamma correction (square root), the variance becomes
approximately constant across pixel values.
Jakob Engel Visual SLAM 16
Photometric Calibration
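A minimal Python sketch of applying such a calibration to an 8-bit frame before feeding it to a direct method. The function name, the 256-entry lookup table for the inverse response G^-1, and the toy vignette are assumptions for illustration, not part of any particular codebase.

import numpy as np

def photometric_correction(img_u8, inv_response_lut, vignette, exposure_ms=None):
    """Map raw 8-bit pixel values to (exposure-scaled) irradiance:
    I'(x) = G^-1(I(x)) / V(x), optionally divided by the exposure time."""
    assert inv_response_lut.shape == (256,)           # calibrated G^-1
    irradiance = inv_response_lut[img_u8].astype(np.float32)
    irradiance /= np.maximum(vignette, 1e-3)          # undo vignetting V(x)
    if exposure_ms is not None:
        irradiance /= exposure_ms                     # undo exposure time t
    return irradiance

# Toy example: linear response, mild radial vignette.
h, w = 480, 640
lut = np.linspace(0.0, 255.0, 256, dtype=np.float32)
yy, xx = np.mgrid[0:h, 0:w]
r2 = ((xx - w / 2) ** 2 + (yy - h / 2) ** 2) / (w / 2) ** 2
vignette = np.clip(1.0 - 0.3 * r2, 0.1, 1.0)
frame = np.random.randint(0, 256, (h, w), dtype=np.uint8)
corrected = photometric_correction(frame, lut, vignette, exposure_ms=20.0)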
So... what else is there to visual SLAM / Odometry than picking a probabilistic
model (direct / indirect, sparse / dense), parameter representation (1D/3D,
inverse/Euclidean, local/world) and optimization method (EKF / Information
Filter / Graph-Optimization / Primal-Dual)?
What will make or break a real-time SLAM / VO system are all the (often heuristic)
nitty-gritty details (90% of effort and code, major impact on accuracy / robustness):
• How to select points / pixels to use?
• How to select (key-)frames to use?
• How to select observations / residuals to use (which points are visible where)?
• How to initialize values?
• Which energy terms / parameters are optimized / fixed / marginalized where
and when?
=> So far, this just gives you an energy function, and tells
you how to optimize it (assuming a good initialization).
Jakob Engel Visual SLAM 17
Summary
Point Selection
DSO:
Divide image into coarse grid, select pixel
with largest absolute gradient in each cell
(on finest resolution only).
Add new points to the optimization such
that a certain number of spatially well-
distributed points is observed in each
keyframe.
A few good, outlier-free, and spatially well-distributed points are typically better
than many noisy points containing a larger fraction of outliers (a selection sketch follows below).
ORB-SLAM2:
Divide image into coarse grid, select best
FAST-corner(s) from each cell, and across
scale-space pyramid.
Add new points to the optimization such
that a certain number of spatially well-
distributed points is observed in each
keyframe.
Similar for most systems.
Jakob Engel Visual SLAM 18
Nitty-Gritty Details
[2] DSO, Engel, Koltun, Cremers ’17;
[3] ORB-SLAM, Mur-Artal, Montiel, Tardos ‘15
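A minimal sketch (not the DSO implementation) of the grid-based selection described above for the DSO side: split the image into coarse cells and keep the pixel with the largest gradient magnitude in each cell, subject to a threshold. Cell size and threshold are illustrative.

import numpy as np

def select_points_by_gradient(img, cell=32, grad_thresh=12.0):
    """Pick at most one pixel per grid cell: the one with the largest
    image-gradient magnitude, if it exceeds the threshold."""
    img = img.astype(np.float32)
    gx = np.zeros_like(img); gy = np.zeros_like(img)
    gx[:, 1:-1] = 0.5 * (img[:, 2:] - img[:, :-2])   # central differences
    gy[1:-1, :] = 0.5 * (img[2:, :] - img[:-2, :])
    mag = np.hypot(gx, gy)
    points = []
    h, w = img.shape
    for y0 in range(0, h, cell):
        for x0 in range(0, w, cell):
            block = mag[y0:y0 + cell, x0:x0 + cell]
            idx = np.unravel_index(np.argmax(block), block.shape)
            if block[idx] > grad_thresh:
                points.append((x0 + idx[1], y0 + idx[0]))  # (u, v)
    return points

# Usage: points = select_points_by_gradient(gray_image, cell=32)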
Keyframe Selection
DSO:
Combined threshold on rotation,
translation (measured as optical flow) and
exposure change to last keyframe.
Keep sliding window of „active“ keyframes,
as mix of short-baseline and large-baseline
frames.
Critical for performance in practice. Lots of different approaches in literature.
ORB-SLAM2:
Threshold on the number of successfully tracked
points and the time elapsed since the last keyframe.
General strategy: take more keyframes
than needed, and sparsify afterwards
(survival of the fittest).
Keep window of „active“ keyframes as n
frames with largest co-visibility.
MSCKF: No keyframes at all.
Jakob Engel Visual SLAM 19
Nitty-Gritty Details
[2] DSO, Engel, Koltun, Cremers ’17;
[3] ORB-SLAM, Mur-Artal, Montiel, Tardos ‘15
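A minimal sketch of a DSO-style keyframe decision as described above; the inputs, weights, and threshold are invented for illustration and are not the actual DSO values.

def need_new_keyframe(mean_flow, mean_flow_no_rot, rel_brightness_change,
                      w_flow=1.0, w_trans=1.0, w_expo=1.0, threshold=1.0):
    """Return True if the current frame should become a keyframe.
    All inputs are measured against the last keyframe; the weighted sum
    mimics the 'combined threshold' on rotation, translation (as optical
    flow), and exposure change."""
    score = (w_flow * mean_flow
             + w_trans * mean_flow_no_rot
             + w_expo * rel_brightness_change)
    return score > threshold

# Example: little motion, but a strong exposure change still triggers a keyframe.
print(need_new_keyframe(mean_flow=0.2, mean_flow_no_rot=0.1,
                        rel_brightness_change=0.9))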
Residual Selection
DSO:
Decide on visibility by thresholding
photometric energy per-patch with
adaptive threshold.
Only attempt to add residuals within
sliding window of active frames, discard
everything else to enable efficient
optimization.
Determines actual energy terms optimized over.
ORB-SLAM2:
ORB descriptors are used for data
association. Accelerate & robustify
matching by exploiting geometric
constraints.
Attempt to observe each point in as many
Keyframes as possible, maximising number
of observations per point. Continuously
optimize this data-association by merging
points and adding new residuals where
possible.
Widely varying strategies.
Jakob Engel Visual SLAM 20
Nitty-Gritty Details
[2] DSO, Engel, Koltun, Cremers ’17;
[3] ORB-SLAM, Mur-Artal, Montiel, Tardos ‘15
Parameter Initialization
DSO:
Poses: Coarse-to-fine direct image alignment.
Geometry: Pixel-wise, exhaustive search
along epipolar line. Start with minimal
baseline frames, then increase baseline while
using prior from previous search. Use points
at any distance (no minimal parallax
required).
More difficult for direct approach due to non-linearity of energy.
Critical to avoid adding bad data to system.
ORB-SLAM2:
Poses: No good initialization needed.
Geometry: Closed-form triangulation from
initial matches. Only use points that are
sufficiently well localized in euclidean 3D
(sufficient observation parallax required).
Similar for most approaches
Jakob Engel Visual SLAM 21
Nitty-Gritty Details
[2] DSO, Engel, Koltun, Cremers ’17;
[3] ORB-SLAM, Mur-Artal, Montiel, Tardos ‘15
Optimization Strategy
DSO:
Front-End (at framerate):
Optimize newest frame‘s pose wrt. fixed
map.
Sliding Window (at keyframe-rate):
Optimize local window of keyframes and
observed points. Permanently marginalize
points once they leave the field of view, and
keyframes once they drop out of the active
window.
Loop-Closing:
Not supported, pure VO.
Here it can get tricky. Avoid adding biases or locking down estimated
parameters by premature linearization / fixation of parameters.
ORB-SLAM2:
Front-End (at framerate):
Optimize newest frame‘s pose wrt. fixed
map.
Local Window (at keyframe-rate):
Optimize local window of keyframe poses
and all observed points. Temporarily fix
absolute poses of periphery keyframes.
Loop-Closing (when required):
Pose-graph optimization followed by full
map optimization (BA).
MSCKF: one-off geometry
triangulation & marginalization,
„structureless“
Jakob Engel Visual SLAM 22
Nitty-Gritty Details
[2] DSO, Engel, Koltun, Cremers ’17;
[3] ORB-SLAM, Mur-Artal, Montiel, Tardos ‘15
• Stereo / multi-camera (incl. non-overlapping stereo):
Absolute scale observable, increases FOV. Best to use both static
constraints (across the static baseline) and temporal constraints (across
time), and thus not depend on a fixed stereo baseline.
• RGB-D cameras (ToF or structured light):
Metric geometry directly observable. Great for dense reconstruction,
however commercial sensors are hard to calibrate and have a limited field
of view. Can create „dense“ model without depending on regularization /
geometry prior.
• IMU:
In many ways complementary to vision. Tight IMU integration is always a
good idea in practice, and gives impressive results.
• Rolling-shutter camera(s): global vs. rolling shutter makes a difference!
Jakob Engel Visual SLAM 23
Sensor Choices
Important not just to get your paper accepted!
Should guide model selection and algorithm design choices.
SLAM / VO, due to its incremental nature, is a highly
chaotic process: small perturbations can have a big impact.
- Evaluate on many datasets in a wide range of „realistic“ environments
(indoor/outdoor, bright/dark, high-frequent/low-frequent, unique /
repetitive environments, ....). Avoid manual over-fitting to a few scenes.
- Augment your data (run forwards & backwards, different image crops).
- Exploit non-deterministic implementation to get a better sampling.
Jakob Engel Visual SLAM 24
Evaluating SLAM / VO
[14] TUM monoVO Dataset, Engel, Usenko, Cremers ‘16
- 50 sequences
- 105 minutes
- 190k frames
Jakob Engel Visual SLAM 25
TUM monoVO dataset
[14] TUM monoVO Dataset, Engel, Usenko, Cremers ‘16
Impossible to get good ground-truth.
But we can still use it to evaluate tracking accuracy!
1. Use LSD-SLAM (or any other SLAM / SfM software) to track start- and end-segment.
 they show the same scene with nice, loopy motion
 very precise alignment.
 „ground-truth“ poses for start- and end-segment.
2. Sim(3)-align the tracked start- and end-segments to the „gt“.
3. Explicitly compute accumulated error as Sim(3) transform:
Measure accumulated drift after a large loop.
(obviously before / without closing the loop)
Jakob Engel Visual SLAM 26
Evaluation Metric
[14] TUM monoVO Dataset, Engel, Usenko, Cremers ‘16
How to summarize this Sim(3) drift in one number?
„RMSE between tracked trajectory
(a) aligned to the start segment (blue), and
(b) aligned to the end segment (red).”
Proposed Alignment Error:
(all trajectories are normalized to have approximately length 100)
• Unifies translational, rotational and scale drift in
one number. Implicitly weighs rotation & scale by
how they would affect translation.
• More meaningful than just translational drift.
• Works for other sensor modalities (VIO, RGB-D,
Stereo, …)
• No set-up required, everyone can do this at home.
• Use in addition to other metrics (e.g. absolute
RMSE on datasets with MoCap groundtruth)
Jakob Engel Visual SLAM 27
Evaluation Metric
[14] TUM monoVO Dataset, Engel, Usenko, Cremers ‘16
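A minimal sketch of the proposed alignment error, assuming the tracked trajectory has already been Sim(3)-aligned once to the start-segment ground truth and once to the end-segment ground truth (the alignment step itself, e.g. via the Umeyama method, is omitted here). Names are illustrative.

import numpy as np

def alignment_error(positions_start_aligned, positions_end_aligned):
    """RMSE between the same tracked trajectory under the two Sim(3)
    alignments; accumulated drift over the loop shows up as a gap
    between the two versions. Both arrays have shape (N, 3)."""
    diff = positions_start_aligned - positions_end_aligned
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))

# Toy example: identical alignments -> zero drift.
traj = np.cumsum(np.random.randn(1000, 3) * 0.01, axis=0)
print(alignment_error(traj, traj))  # 0.0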
Jakob Engel Visual SLAM 28
Data Presentation
[14] TUM monoVO Dataset, Engel, Usenko, Cremers ‘16
LSD-SLAM & SVO fail on almost all of the sequences, hence we evaluate only ORB-SLAM and DSO.
How to summarize 50 sequences in one number?
 Average / median don‘t work well, in
particular if some sequences fail entirely.
 Tables are not practical for 50 sequences.
use Cumulative Error Plots!
(„how many sequences Y were tracked more accurately than X?“)
[50 sequences] x [forward, backward] x [5 runs] = 500 runs (~20 hours of video) per line.
Jakob Engel Visual SLAM 29
Data Presentation
[14] TUM monoVO Dataset, Engel, Usenko, Cremers ‘16
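A minimal matplotlib sketch (assumed data layout, not the paper's plotting code) of a cumulative error plot: for each method, sort the per-run alignment errors and plot, for every error threshold, how many runs stayed below it; failed runs never appear on the curve.

import numpy as np
import matplotlib.pyplot as plt

def cumulative_error_plot(errors_by_method, max_error=10.0):
    """errors_by_method: dict mapping method name -> 1D array of
    per-run alignment errors (failed runs can be np.inf)."""
    for name, errors in errors_by_method.items():
        errors = np.asarray(errors, dtype=float)
        errors = np.sort(errors[np.isfinite(errors)])  # failures never plotted
        counts = np.arange(1, len(errors) + 1)
        plt.step(errors, counts, where="post", label=name)
    plt.xlim(0, max_error)
    plt.xlabel("alignment error e_align")
    plt.ylabel("number of runs with error below threshold")
    plt.legend()
    plt.show()

# Toy data with a few failures marked as inf.
rng = np.random.default_rng(0)
cumulative_error_plot({
    "method A": np.concatenate([rng.gamma(2.0, 0.5, 480), [np.inf] * 20]),
    "method B": np.concatenate([rng.gamma(2.0, 0.8, 450), [np.inf] * 50]),
})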
Field of View
A larger FoV is better for both methods
(wrt. tracking under general motion; accurate geometry
at minimal parallax requires high angular resolution).
* Resolution fixed at 640x480.
** Raw (distorted) data at
1240 x 1024
Jakob Engel Visual SLAM 30
General Analysis
[14] TUM monoVO Dataset,
Engel, Usenko, Cremers ‘16
Image Resolution
ORB-SLAM‘s accuracy strongly depends on resolution.
DSO‘s accuracy only slightly depends on resolution; there is no gain above VGA.
* FoV fixed.
** Ignoring effect on compute requirement
Jakob Engel Visual SLAM 31
General Analysis
[14] TUM monoVO Dataset,
Engel, Usenko, Cremers ‘16
Let‘s play around a bit: Forward / Backward
ORB-SLAM seems to perform better when moving backwards.
(Impact of all the nitty-gritty details)
* Sequences mostly contain forward motion.
Jakob Engel Visual SLAM 32
General Analysis
[14] TUM monoVO Dataset,
Engel, Usenko, Cremers ‘16
Add geometric noise:
(bad calibration, rolling shutter, chromatic aberration, ...)
Corresponds to the indirect model!
25 ms readout time, 100 deg/s rotation,
Kinect camera geometry: delta = 6 px
Add photometric noise:
(motion blur, sensor noise / low light, ...)
Corresponds to the direct model!
20 ms exposure time, 100 deg/s rotation,
Kinect camera geometry: delta = 10 px
Jakob Engel Visual SLAM 33
Effect of different noise
[2] DSO, Engel ‘17
Point Selection
Relatively robust to gradient threshold.
Using only corners decreases robustness.
Jakob Engel Visual SLAM 34
Controlled parameter analysis
[2] DSO, Engel, Koltun, Cremers ‘17
Number of Points
1000 points are enough (default: 2k points).
Jakob Engel Visual SLAM 35
Controlled parameter analysis
[2] DSO, Engel, Koltun, Cremers ‘17
Number of Keyframes taken
More KFs -> better robustness (with too few KFs, tracking gets lost on strong self-occlusion).
Fewer KFs -> better accuracy (points & frames get marginalized later).
Jakob Engel Visual SLAM 36
Some more results
[2] DSO, Engel, Koltun, Cremers ‘17
Algorithm Component: Energy Pixel Pattern
1-3 are slightly worse, otherwise no big difference.
Jakob Engel Visual SLAM 37
Some more results
[2] DSO, Engel, Koltun, Cremers ‘17
Jakob Engel Visual SLAM 38
Summary
• Classify different systems according to:
• Direct vs. Indirect
• Sparse vs. Dense
• Geometry Representation
• Optimization approach
• All the “system-level” details to make a full SLAM / VO
system out of it.
• Evaluate on large datasets; use the loop-closure metric if
ground truth cannot be obtained.
• Use data to analyze systems and parameters.
Jakob Engel Visual SLAM 39
References
[1] J. Engel, T. Schoeps, and D. Cremers. LSD-SLAM: Large-scale direct monocular SLAM. In ECCV, 2014.
[2] J. Engel, V. Koltun, and D. Cremers. “Direct Sparse Odometry”. In: TPAMI, 2017
[3] R. Mur-Artal, J. Montiel, and J. Tardos. “ORB-SLAM: a versatile and accurate monocular SLAM system”. In: Transactions on Robotics
(TRO) 31.5 (2015), pp. 1147–1163
[4] G. Klein and D. Murray. “Parallel Tracking and Mapping for Small AR Workspaces”. In: International Symposium on Mixed and
Augmented Reality (ISMAR). 2007
[5] S. Leutenegger, S. Lynen, M. Bosse, R. Siegwart, and P. Furgale. “Keyframe-based visual–inertial odometry using nonlinear
optimization”. In: IJRR 34.3 (2015), pp. 314–334
[6] A.I. Mourikis and S.I. Roumeliotis. “A Multi-State Constraint Kalman Filter for Vision-aided Inertial Navigation”. In: ICRA. 2007
[7] C. Forster, M. Pizzoli, and D. Scaramuzza. “SVO: Fast Semi-Direct Monocular Visual Odometry”. In: International Conference on
Robotics and Automation (ICRA). 2014
[8] A.J. Davison, I. Reid, N. Molton, and O. Stasse. “MonoSLAM: Realtime single camera SLAM”. In: TPAMI 29.6 (2007), pp. 1052–1067
[9] R. Newcombe, S. Lovegrove, and A. Davison. “DTAM: Dense tracking and mapping in real-time”. In: International Conference on
Computer Vision (ICCV). 2011
[10] M. Pizzoli, C. Forster, and D. Scaramuzza. “REMODE: Probabilistic, Monocular Dense Reconstruction in Real Time”. In: ICRA. 2014
[11] H. Jin, P. Favaro, and S. Soatto. “Real-time 3-d motion and structure of point features: Front-end system for vision-based control and
interaction”. In: CVPR 2000
[12] H. Jin, P. Favaro, and S. Soatto. “A semi-direct approach to structure from motion”. In: The Visual Computer 19.6 (2003), pp. 377–394
[13] J. Engel, J. Sturm, and D. Cremers. “Semi-Dense Visual Odometry for a Monocular Camera”. In: International Conference on
Computer Vision (ICCV). 2013
[14] J. Engel, V. Usenko, and D. Cremers. “A Photometrically Calibrated Benchmark For Monocular Visual Odometry”. In:
arXiv:1607.02555. 2016
[15] K.J. Hanna. “Direct multi-resolution estimation of ego-motion and structure from motion”. In: Proceedings of the IEEE Workshop on
Visual Motion. 1991
For a more complete list of references and related work, see e.g. my PhD Thesis.

More Related Content

What's hot

Depth Fusion from RGB and Depth Sensors II
Depth Fusion from RGB and Depth Sensors IIDepth Fusion from RGB and Depth Sensors II
Depth Fusion from RGB and Depth Sensors IIYu Huang
 
DD18 - SEED - Raytracing in Hybrid Real-Time Rendering
DD18 - SEED - Raytracing in Hybrid Real-Time RenderingDD18 - SEED - Raytracing in Hybrid Real-Time Rendering
DD18 - SEED - Raytracing in Hybrid Real-Time RenderingElectronic Arts / DICE
 
Lec14 multiview stereo
Lec14 multiview stereoLec14 multiview stereo
Lec14 multiview stereoBaliThorat1
 
SLAM入門 第2章 SLAMの基礎
SLAM入門 第2章 SLAMの基礎SLAM入門 第2章 SLAMの基礎
SLAM入門 第2章 SLAMの基礎yohei okawa
 
Creating A Character in Uncharted: Drake's Fortune
Creating A Character in Uncharted: Drake's FortuneCreating A Character in Uncharted: Drake's Fortune
Creating A Character in Uncharted: Drake's FortuneNaughty Dog
 
Rosserial無線化への招待 〜Invitation to wirelessization by rosserial〜
Rosserial無線化への招待 〜Invitation to wirelessization by rosserial〜Rosserial無線化への招待 〜Invitation to wirelessization by rosserial〜
Rosserial無線化への招待 〜Invitation to wirelessization by rosserial〜Tatsuya Fukuta
 
Simultaneous Localization, Mapping and Self-body Shape Estimation by a Mobile...
Simultaneous Localization, Mapping and Self-body Shape Estimation by a Mobile...Simultaneous Localization, Mapping and Self-body Shape Estimation by a Mobile...
Simultaneous Localization, Mapping and Self-body Shape Estimation by a Mobile...Akira Taniguchi
 
object recognition for robots
object recognition for robotsobject recognition for robots
object recognition for robotss1240148
 
VSlam 2017 11_20(張閎智)
VSlam 2017 11_20(張閎智)VSlam 2017 11_20(張閎智)
VSlam 2017 11_20(張閎智)Hung-Chih Chang
 
Hable John Uncharted2 Hdr Lighting
Hable John Uncharted2 Hdr LightingHable John Uncharted2 Hdr Lighting
Hable John Uncharted2 Hdr Lightingozlael ozlael
 
Deep VO and SLAM
Deep VO and SLAMDeep VO and SLAM
Deep VO and SLAMYu Huang
 
Real Time Object Tracking
Real Time Object TrackingReal Time Object Tracking
Real Time Object TrackingVanya Valindria
 

What's hot (20)

Snakeppt1 copy
Snakeppt1   copySnakeppt1   copy
Snakeppt1 copy
 
Depth Fusion from RGB and Depth Sensors II
Depth Fusion from RGB and Depth Sensors IIDepth Fusion from RGB and Depth Sensors II
Depth Fusion from RGB and Depth Sensors II
 
DD18 - SEED - Raytracing in Hybrid Real-Time Rendering
DD18 - SEED - Raytracing in Hybrid Real-Time RenderingDD18 - SEED - Raytracing in Hybrid Real-Time Rendering
DD18 - SEED - Raytracing in Hybrid Real-Time Rendering
 
camera calibration
 camera calibration camera calibration
camera calibration
 
Mapping mobile robotics
Mapping mobile roboticsMapping mobile robotics
Mapping mobile robotics
 
swarm robotics
swarm roboticsswarm robotics
swarm robotics
 
Lec14 multiview stereo
Lec14 multiview stereoLec14 multiview stereo
Lec14 multiview stereo
 
SLAM入門 第2章 SLAMの基礎
SLAM入門 第2章 SLAMの基礎SLAM入門 第2章 SLAMの基礎
SLAM入門 第2章 SLAMの基礎
 
Creating A Character in Uncharted: Drake's Fortune
Creating A Character in Uncharted: Drake's FortuneCreating A Character in Uncharted: Drake's Fortune
Creating A Character in Uncharted: Drake's Fortune
 
Rosserial無線化への招待 〜Invitation to wirelessization by rosserial〜
Rosserial無線化への招待 〜Invitation to wirelessization by rosserial〜Rosserial無線化への招待 〜Invitation to wirelessization by rosserial〜
Rosserial無線化への招待 〜Invitation to wirelessization by rosserial〜
 
Simultaneous Localization, Mapping and Self-body Shape Estimation by a Mobile...
Simultaneous Localization, Mapping and Self-body Shape Estimation by a Mobile...Simultaneous Localization, Mapping and Self-body Shape Estimation by a Mobile...
Simultaneous Localization, Mapping and Self-body Shape Estimation by a Mobile...
 
Lecture 09: Localization and Mapping III
Lecture 09: Localization and Mapping IIILecture 09: Localization and Mapping III
Lecture 09: Localization and Mapping III
 
object recognition for robots
object recognition for robotsobject recognition for robots
object recognition for robots
 
VSlam 2017 11_20(張閎智)
VSlam 2017 11_20(張閎智)VSlam 2017 11_20(張閎智)
VSlam 2017 11_20(張閎智)
 
Hable John Uncharted2 Hdr Lighting
Hable John Uncharted2 Hdr LightingHable John Uncharted2 Hdr Lighting
Hable John Uncharted2 Hdr Lighting
 
AI Robotics
AI RoboticsAI Robotics
AI Robotics
 
Image Stitching for Panorama View
Image Stitching for Panorama ViewImage Stitching for Panorama View
Image Stitching for Panorama View
 
Deep VO and SLAM
Deep VO and SLAMDeep VO and SLAM
Deep VO and SLAM
 
Real Time Object Tracking
Real Time Object TrackingReal Time Object Tracking
Real Time Object Tracking
 
SIFT
SIFTSIFT
SIFT
 

Similar to =SLAM ppt.pdf

論文読み会@AIST (Deep Virtual Stereo Odometry [ECCV2018])
論文読み会@AIST (Deep Virtual Stereo Odometry [ECCV2018])論文読み会@AIST (Deep Virtual Stereo Odometry [ECCV2018])
論文読み会@AIST (Deep Virtual Stereo Odometry [ECCV2018])Masaya Kaneko
 
2022_01_TSMP_HPC_Asia_Final.pptx
2022_01_TSMP_HPC_Asia_Final.pptx2022_01_TSMP_HPC_Asia_Final.pptx
2022_01_TSMP_HPC_Asia_Final.pptxGhazal Tashakor
 
CyberSec_JPEGcompressionForensics.pdf
CyberSec_JPEGcompressionForensics.pdfCyberSec_JPEGcompressionForensics.pdf
CyberSec_JPEGcompressionForensics.pdfMohammadAzreeYahaya
 
FlameWorks GTC 2014
FlameWorks GTC 2014FlameWorks GTC 2014
FlameWorks GTC 2014Simon Green
 
Ice: lightweight, efficient rendering for remote sensing images
Ice: lightweight, efficient rendering for remote sensing imagesIce: lightweight, efficient rendering for remote sensing images
Ice: lightweight, efficient rendering for remote sensing imagesotb
 
LVTS - Image Resolution Monitor for Litho-Metrology
LVTS - Image Resolution Monitor for Litho-MetrologyLVTS - Image Resolution Monitor for Litho-Metrology
LVTS - Image Resolution Monitor for Litho-MetrologyVladislav Kaplan
 
Self Organizing Feature Map(SOM), Topographic Product, Cascade 2 Algorithm
Self Organizing Feature Map(SOM), Topographic Product, Cascade 2 AlgorithmSelf Organizing Feature Map(SOM), Topographic Product, Cascade 2 Algorithm
Self Organizing Feature Map(SOM), Topographic Product, Cascade 2 AlgorithmChenghao Jin
 
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...Johan Andersson
 
Survey on optical flow estimation with DL
Survey on optical flow estimation with DLSurvey on optical flow estimation with DL
Survey on optical flow estimation with DLLeapMind Inc
 
Conception_et_realisation_dun_site_Web_d.pdf
Conception_et_realisation_dun_site_Web_d.pdfConception_et_realisation_dun_site_Web_d.pdf
Conception_et_realisation_dun_site_Web_d.pdfSofianeHassine2
 
Efficient and Advanced Omniscient Debugging for xDSMLs (SLE 2015)
Efficient and Advanced Omniscient Debugging for xDSMLs (SLE 2015)Efficient and Advanced Omniscient Debugging for xDSMLs (SLE 2015)
Efficient and Advanced Omniscient Debugging for xDSMLs (SLE 2015)Benoit Combemale
 
DEEP LEARNING TECHNIQUES POWER POINT PRESENTATION
DEEP LEARNING TECHNIQUES POWER POINT PRESENTATIONDEEP LEARNING TECHNIQUES POWER POINT PRESENTATION
DEEP LEARNING TECHNIQUES POWER POINT PRESENTATIONSelvaLakshmi63
 
Seismic_Processing.pdf
Seismic_Processing.pdfSeismic_Processing.pdf
Seismic_Processing.pdfssuserdfad17
 
LIAO TSEN YUNG Cover Letter
LIAO TSEN YUNG Cover LetterLIAO TSEN YUNG Cover Letter
LIAO TSEN YUNG Cover LetterTsen Yung Liao
 
Explaining the decisions of image/video classifiers
Explaining the decisions of image/video classifiersExplaining the decisions of image/video classifiers
Explaining the decisions of image/video classifiersVasileiosMezaris
 
Neighborhood Component Analysis 20071108
Neighborhood Component Analysis 20071108Neighborhood Component Analysis 20071108
Neighborhood Component Analysis 20071108Ting-Shuo Yo
 

Similar to =SLAM ppt.pdf (20)

論文読み会@AIST (Deep Virtual Stereo Odometry [ECCV2018])
論文読み会@AIST (Deep Virtual Stereo Odometry [ECCV2018])論文読み会@AIST (Deep Virtual Stereo Odometry [ECCV2018])
論文読み会@AIST (Deep Virtual Stereo Odometry [ECCV2018])
 
2022_01_TSMP_HPC_Asia_Final.pptx
2022_01_TSMP_HPC_Asia_Final.pptx2022_01_TSMP_HPC_Asia_Final.pptx
2022_01_TSMP_HPC_Asia_Final.pptx
 
CyberSec_JPEGcompressionForensics.pdf
CyberSec_JPEGcompressionForensics.pdfCyberSec_JPEGcompressionForensics.pdf
CyberSec_JPEGcompressionForensics.pdf
 
FlameWorks GTC 2014
FlameWorks GTC 2014FlameWorks GTC 2014
FlameWorks GTC 2014
 
Thesis presentation
Thesis presentationThesis presentation
Thesis presentation
 
Ice: lightweight, efficient rendering for remote sensing images
Ice: lightweight, efficient rendering for remote sensing imagesIce: lightweight, efficient rendering for remote sensing images
Ice: lightweight, efficient rendering for remote sensing images
 
LVTS - Image Resolution Monitor for Litho-Metrology
LVTS - Image Resolution Monitor for Litho-MetrologyLVTS - Image Resolution Monitor for Litho-Metrology
LVTS - Image Resolution Monitor for Litho-Metrology
 
Self Organizing Feature Map(SOM), Topographic Product, Cascade 2 Algorithm
Self Organizing Feature Map(SOM), Topographic Product, Cascade 2 AlgorithmSelf Organizing Feature Map(SOM), Topographic Product, Cascade 2 Algorithm
Self Organizing Feature Map(SOM), Topographic Product, Cascade 2 Algorithm
 
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
The Intersection of Game Engines & GPUs: Current & Future (Graphics Hardware ...
 
Survey on optical flow estimation with DL
Survey on optical flow estimation with DLSurvey on optical flow estimation with DL
Survey on optical flow estimation with DL
 
Conception_et_realisation_dun_site_Web_d.pdf
Conception_et_realisation_dun_site_Web_d.pdfConception_et_realisation_dun_site_Web_d.pdf
Conception_et_realisation_dun_site_Web_d.pdf
 
Efficient and Advanced Omniscient Debugging for xDSMLs (SLE 2015)
Efficient and Advanced Omniscient Debugging for xDSMLs (SLE 2015)Efficient and Advanced Omniscient Debugging for xDSMLs (SLE 2015)
Efficient and Advanced Omniscient Debugging for xDSMLs (SLE 2015)
 
DEEP LEARNING TECHNIQUES POWER POINT PRESENTATION
DEEP LEARNING TECHNIQUES POWER POINT PRESENTATIONDEEP LEARNING TECHNIQUES POWER POINT PRESENTATION
DEEP LEARNING TECHNIQUES POWER POINT PRESENTATION
 
Reyes
ReyesReyes
Reyes
 
Seismic_Processing.pdf
Seismic_Processing.pdfSeismic_Processing.pdf
Seismic_Processing.pdf
 
LIAO TSEN YUNG Cover Letter
LIAO TSEN YUNG Cover LetterLIAO TSEN YUNG Cover Letter
LIAO TSEN YUNG Cover Letter
 
Explaining the decisions of image/video classifiers
Explaining the decisions of image/video classifiersExplaining the decisions of image/video classifiers
Explaining the decisions of image/video classifiers
 
3D reconstruction
3D reconstruction3D reconstruction
3D reconstruction
 
Neighborhood Component Analysis 20071108
Neighborhood Component Analysis 20071108Neighborhood Component Analysis 20071108
Neighborhood Component Analysis 20071108
 
mid_presentation
mid_presentationmid_presentation
mid_presentation
 

Recently uploaded

COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptxJIT KUMAR GUPTA
 
Introduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaOmar Fathy
 
8th International Conference on Soft Computing, Mathematics and Control (SMC ...
8th International Conference on Soft Computing, Mathematics and Control (SMC ...8th International Conference on Soft Computing, Mathematics and Control (SMC ...
8th International Conference on Soft Computing, Mathematics and Control (SMC ...josephjonse
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayEpec Engineered Technologies
 
Convergence of Robotics and Gen AI offers excellent opportunities for Entrepr...
Convergence of Robotics and Gen AI offers excellent opportunities for Entrepr...Convergence of Robotics and Gen AI offers excellent opportunities for Entrepr...
Convergence of Robotics and Gen AI offers excellent opportunities for Entrepr...ssuserdfc773
 
Basic Electronics for diploma students as per technical education Kerala Syll...
Basic Electronics for diploma students as per technical education Kerala Syll...Basic Electronics for diploma students as per technical education Kerala Syll...
Basic Electronics for diploma students as per technical education Kerala Syll...ppkakm
 
8086 Microprocessor Architecture: 16-bit microprocessor
8086 Microprocessor Architecture: 16-bit microprocessor8086 Microprocessor Architecture: 16-bit microprocessor
8086 Microprocessor Architecture: 16-bit microprocessorAshwiniTodkar4
 
Query optimization and processing for advanced database systems
Query optimization and processing for advanced database systemsQuery optimization and processing for advanced database systems
Query optimization and processing for advanced database systemsmeharikiros2
 
Online food ordering system project report.pdf
Online food ordering system project report.pdfOnline food ordering system project report.pdf
Online food ordering system project report.pdfKamal Acharya
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startQuintin Balsdon
 
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKARHAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKARKOUSTAV SARKAR
 
Max. shear stress theory-Maximum Shear Stress Theory ​ Maximum Distortional ...
Max. shear stress theory-Maximum Shear Stress Theory ​  Maximum Distortional ...Max. shear stress theory-Maximum Shear Stress Theory ​  Maximum Distortional ...
Max. shear stress theory-Maximum Shear Stress Theory ​ Maximum Distortional ...ronahami
 
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...drmkjayanthikannan
 
Employee leave management system project.
Employee leave management system project.Employee leave management system project.
Employee leave management system project.Kamal Acharya
 
Introduction to Data Visualization,Matplotlib.pdf
Introduction to Data Visualization,Matplotlib.pdfIntroduction to Data Visualization,Matplotlib.pdf
Introduction to Data Visualization,Matplotlib.pdfsumitt6_25730773
 
Path loss model, OKUMURA Model, Hata Model
Path loss model, OKUMURA Model, Hata ModelPath loss model, OKUMURA Model, Hata Model
Path loss model, OKUMURA Model, Hata ModelDrAjayKumarYadav4
 
Computer Graphics Introduction To Curves
Computer Graphics Introduction To CurvesComputer Graphics Introduction To Curves
Computer Graphics Introduction To CurvesChandrakantDivate1
 
Electromagnetic relays used for power system .pptx
Electromagnetic relays used for power system .pptxElectromagnetic relays used for power system .pptx
Electromagnetic relays used for power system .pptxNANDHAKUMARA10
 

Recently uploaded (20)

COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
 
Introduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS Lambda
 
8th International Conference on Soft Computing, Mathematics and Control (SMC ...
8th International Conference on Soft Computing, Mathematics and Control (SMC ...8th International Conference on Soft Computing, Mathematics and Control (SMC ...
8th International Conference on Soft Computing, Mathematics and Control (SMC ...
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
 
Convergence of Robotics and Gen AI offers excellent opportunities for Entrepr...
Convergence of Robotics and Gen AI offers excellent opportunities for Entrepr...Convergence of Robotics and Gen AI offers excellent opportunities for Entrepr...
Convergence of Robotics and Gen AI offers excellent opportunities for Entrepr...
 
Basic Electronics for diploma students as per technical education Kerala Syll...
Basic Electronics for diploma students as per technical education Kerala Syll...Basic Electronics for diploma students as per technical education Kerala Syll...
Basic Electronics for diploma students as per technical education Kerala Syll...
 
8086 Microprocessor Architecture: 16-bit microprocessor
8086 Microprocessor Architecture: 16-bit microprocessor8086 Microprocessor Architecture: 16-bit microprocessor
8086 Microprocessor Architecture: 16-bit microprocessor
 
Query optimization and processing for advanced database systems
Query optimization and processing for advanced database systemsQuery optimization and processing for advanced database systems
Query optimization and processing for advanced database systems
 
Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
 
Online food ordering system project report.pdf
Online food ordering system project report.pdfOnline food ordering system project report.pdf
Online food ordering system project report.pdf
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKARHAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
 
Max. shear stress theory-Maximum Shear Stress Theory ​ Maximum Distortional ...
Max. shear stress theory-Maximum Shear Stress Theory ​  Maximum Distortional ...Max. shear stress theory-Maximum Shear Stress Theory ​  Maximum Distortional ...
Max. shear stress theory-Maximum Shear Stress Theory ​ Maximum Distortional ...
 
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
 
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
Unit 4_Part 1 CSE2001 Exception Handling and Function Template and Class Temp...
 
Employee leave management system project.
Employee leave management system project.Employee leave management system project.
Employee leave management system project.
 
Introduction to Data Visualization,Matplotlib.pdf
Introduction to Data Visualization,Matplotlib.pdfIntroduction to Data Visualization,Matplotlib.pdf
Introduction to Data Visualization,Matplotlib.pdf
 
Path loss model, OKUMURA Model, Hata Model
Path loss model, OKUMURA Model, Hata ModelPath loss model, OKUMURA Model, Hata Model
Path loss model, OKUMURA Model, Hata Model
 
Computer Graphics Introduction To Curves
Computer Graphics Introduction To CurvesComputer Graphics Introduction To Curves
Computer Graphics Introduction To Curves
 
Electromagnetic relays used for power system .pptx
Electromagnetic relays used for power system .pptxElectromagnetic relays used for power system .pptx
Electromagnetic relays used for power system .pptx
 

=SLAM ppt.pdf

  • 1. Visual SLAM and VO Jakob Engel Surreal Vision, Oculus Research *Presentation reflects work done in collaboration with Daniel Cremers at Technical University Munich and Vladlen Koltun at Intel Research Jakob Engel Visual SLAM 1
  • 2. 2 Jakob Engel Visual SLAM Live Operation [1] LSD-SLAM, Engel, Schoeps, Cremers ‘14
  • 3. 3 Jakob Engel Visual SLAM Live Operation monoSLAM PATM MSCKF DTAM Semi-Dense VO ORB-SLAM 2 SVO LSD-SLAM OKVIS DSO And many, many more!
  • 4. 4 • Geometric SLAM formulations. • Indirect vs. direct error formulation. • Geometry parametrizations choices. • Sparse vs. dense model. • Optimization approach. • Concrete example: DSO direct error formulation. • System architecture of two examples: ORB-SLAM and DSO • Different sensor combinations (IMU, mono vs. stereo, RGB-D) • Evaluating SLAM / VO, loopclosure metric. • Overall analysis & parameter studies: Impact of field of view, image resolution, different noise sources, different algorithm and model choices in controlled experiments: Analyzing results from 40’000 tracked trajectories (100 million tracked frames, or 38 days of video at 30fps) on 50 sequences (190k frames). Overview Jakob Engel Visual SLAM
  • 5. 5 Everything is based on maximum likelihood estimation: Find model parameters X that maximise the probability of observing Y. Camera Poses, Camera Intrinsics, 3D Geometry, ... Indirect: Observations Y are 2D point positions. => P models (geometric) noise on point positions, residual unit is pixels. Direct: Observations Y are pixel Intensities. => P models (photometric) noise on pixel intensities, residual unit is radiance. Probabilistic model P(Y | X) (likelihood) Observed Data (Measurements) Jakob Engel Visual SLAM Indirect vs. Direct [2] DSO, Engel, Koltun, Cremers ‘17
  • 6. 6 Measurement Noise Model: Model Assumptions: Negative log- likelihood (Energy): (template intensity equals image intensity at projected point) (gaussian white noise on imge) Direct Model LSD-SLAM [1], DSO [2], DTAM [9], REMODE [10], SVO [7] (front-end), [12], [13], [15] ... (summed over a set of template pixel) (gaussian white noise 2D keypoint) (2D keypoint location equals projection of associated 3D point) Indirect Model (summed over a set of model points) ORB-SLAM [3], PTAM [4], OKVIS [5], MSCKF [6], SVO [7] (back-end), monoSLAM [8], [11], ... Jakob Engel Visual SLAM Indirect vs. Direct [2] DSO, Engel, Koltun, Cremers ‘17
  • 7. 7 There are many ways to represent geometry: pointcloud, depth maps, TSDF, Mesh, occupancy grid, …. For real-time visual SLAM most important options (non-exhaustive list): • 3D euclidean points (world coordinates) , with • 3D euclidean points (local coordinates in „host“ frame h) , with • 3D inverse points (local coordinates in „host“ frame h) , with • 1D inverse depth (local coordinates in „host“ frame h) , with • 1D inverse distance, 3D inverse (distance + 2 angles), ..... Jakob Engel Visual SLAM Geometry Parametrization
  • 8. 8 Local vs. World respresentation of points Local representation is more flexible as points move with host frame (good when fixing values or linearization points). However, residuals now depend on two camera-to-world poses. Requires to define a „host“ frame. Reduces geometry-camerapose correlation Jakob Engel Visual SLAM Geometry Parametrization Inverse vs. Euclidean Inverse representation more linear wrt. epipolar geometry, almost always preferable. 1D vs 3D representation 1D obtained by fixing (u,v) in host frame. Correct for direct model when using host as template image. Strictly speaking incorrect for indirect model (see [2]). [2] DSO, Engel, Koltun, Cremers ‘17
  • 9. 9 Not synonymous with direct vs. indirect! Dense+Indirect: Compute dense optical flow, then estimate (dense) geometry from flow-correspondences. Sparse+Direct: Just use a selected subset of pixels & group them. • For dense, we need to integrate a prior, since (from passive vision alone), dense geometry is not observable (white walls). • Typically, this prior is some version of „the world is smooth“. • Causes correlations between geometry parameters (and significantly increases their number), which is bad for GN-like optimization methods. => => Jakob Engel Visual SLAM Dense vs. Sparse [2] DSO, Engel, Koltun, Cremers ‘17
  • 10. 10 Poses: ~5k Parameters. Inverse Depth Values: ~100M Parameters. Pose-Pose block (sparse block-matrix) Depth-Depth block: very large, very sparse, but irregular (depth residuals + smoothness prior terms) Pose-Depth correlations. (sparse block-matrix) (theoretical) Hessian Structure of LSD-SLAM [1] Jakob Engel Visual SLAM Dense vs. Sparse [2] DSO, Engel, Koltun, Cremers ‘17
  • 11. 11 • As a result, real-time Dense or Semi-Dense appraoches refrain from jointly optimizing the complete model. • They use alternating optimization (LSD-SLAM) or other optimization method like Primal-Dual (REMODE, DTAM). • They favor large amounts of data to constrain the large number of unknowns, ignoring or approximating correlations. • This works well if the data is good. It fails if the configuration is close-to-degenerate, since we may accumulate wrong linearizations, which will not be corrected as we ignored correlations & don‘t re-linearize. Jakob Engel Visual SLAM Dense vs. Sparse
  • 12. 12 Kalman Filter MSCKF [6] (indirect VIO), monoSLAM* [8] (indirect VO)], Soatto et.al [11,12] Kausal integration of information over time, representing the current state as mean and covariance and marginalizing old information. Today mostly used for lightweight visual-inertial odometry. Real-time by early fixation of linearizations and marginalization of old states Information Filter OKVIS* [5] (indirect VIO), DSO* [2] (direct VO) Similar to EKF, but representing the state as minimum of a quadratic energy function. Today mostly used for visual-inertial odometry (slightly more accurate and expensive than EKF). Real-time by early (but slightly later than EKF) fixation of linearizations and marginalization of old states Graph-Optimization ORB-SLAM* [3], PTAM* [4] (indirect SLAM), LSD-SLAM* [1] (direct SLAM), SVO* Maintains full map as grap, typically optimizing in multiple layers: local active window (front-end) and full map (back-end). Allows re-using old geometry and loop-closures (drift-free scene browsing), rather than marginalizing everything outside the field of view. Real-time by sub-sampling residuals / states (only optimize Keyframes) Primal-dual and others REMODE* [10], DTAM [9] (direct dense reconstruction) Allows to efficiently solve regularized dense systems wrt. depth, which are intractible for GN-based optimization. Typicaly poses are estimated with a separate optimization (sequential / alternating). Real-time by GPU acceleration & fixing poses. * open-source code by authors available Jakob Engel Visual SLAM Optimization
  • 13. 13 ORB-SLAM 2 [3] • Fully indirect, sparse model (FAST corners + ORB descriptors) • Points represented as 3D euclidean points in world coordinates • Optimized using graph-optimization (g2o) with local and global window. Open-source code: • monocular, stereo- and RGB-D visual SLAM • Implicit and explicit loop-closure, re- localization and map re-use. Direct Sparse Odometry (DSO) [2] • Fully direct, sparse model (plus photometric calibration) • Points represented as 1D inverse depth in local frames • Optimized using information filtering (sliding window of keyframes) Open-source code: • monocular visual odometry Jakob Engel Visual SLAM Two Examples
  • 14. • Warped point: • Weights: • Expoisure times: • Photometrically corrected Image: • Per-Frame affine lighting parameters Direct Sparse Odometry (DSO): Use photometric error over a small neighbourhood (8 pixel, ). All host frames hosted points observations Jakob Engel Visual SLAM 14 DSO: direct, sparse model [2] DSO, Engel, Koltun, Cremers ‘17
  • 15. Indirect model: Sensor „measures“ 2D point positions. Direct model: Sensor measures pixel radiance. Geometric Calibration: 3D point -> 2D pixel coordinate. Well understood, and always done (often refined online). Photometric Calibration: Scene irradiance (lux) -> 8-12 bit pixel value. Widely ignored: Features are invariant - brightness constancy is not. pixel value camera response function (gamma function) pixel radiance attenuation factor (vignetting) exposure time Jakob Engel Visual SLAM 15 Photometric Calibration [14] TUM monoVO Dataset, Engel, Usenko, Cremers ‘16
  • 16. pixel value camera response function (gamma function) pixel irradiance attenuation factor (vignetting) exposure time log(irradiance B) Vignette V hardware gamma G Noise: Irradiance noise is not Gaussian but Poisson-distributed (mean = variance). With 0.5 gamma correction, variance becomes constant across pixel values. Jakob Engel Visual SLAM 16 Photometric Calibration
  • 17. So... what else is there to visual SLAM / Odometry than picking a probabilistic model (direct / indirect, sparse / dense), parameter representation (1D/3D, inverse/euclidean, local/world) and optimization method (EKF / Information Filster / Graph-Optimization / Primal-Dual)? What will make or break a real-time SLAM / VO system are all the (often heuristic) nitty-gritty details (90% of effort and code, major impact on accuracy / robustness): • How to select points / pixels to use? • How to select (key-)frames to use? • How to select observations / residuals to use (which points are visible where)? • How to initialize values? • Which energy terms / parameters are optimized / fixed / marginalized where and when? => So far, this just gives you an energy function, and tells you how to optimize it (assuming a good initializatoin). Jakob Engel Visual SLAM 17 Summary
  • 18. Point Selection DSO: Divide image into coarse grid, select pixel with largest absolute gradient in each cell (on finest resolution only). Add new points to the optimization such that a certain number spatially well- distributed of points is observed in each keyframe. Few good, outlier-free and spatially well-distributed points is typically better than many noisy points including a larger fraction of outliers. ORB-SLAM2: Divide image into coarse grid, select best FAST-corner(s) from each cell, and across scale-space pyramid. Add new points to the optimization such that a certain number of spatially well- distributed points is observed in each keyframe. Similar for most systems. Jakob Engel Visual SLAM 18 Nitty-Gritty Details [2] DSO, Engel, Koltun, Cremers ’17; [3] ORB-SLAM, Mur-Artal, Montiel, Tardos ‘15
  • 19. Keyframe Selection DSO: Combined threshold on rotation, translation (measured as optical flow) and exposure change to last keyframe. Keep sliding window of „active“ keyframes, as mix of short-baseline and large-baseline frames. Critical for performance in practice. Lots of different approaches in literature. ORB-SLAM2: Threshold on successfully tracked points and passed time between keyframes. General strategy: take more keyframes than needed, and sparsify afterwards (survival of the fittest). Keep window of „active“ keyframes as n frames with largest co-visibility. MSCKF: No keyframes at all. Jakob Engel Visual SLAM 19 Nitty-Gritty Details [2] DSO, Engel, Koltun, Cremers ’17; [3] ORB-SLAM, Mur-Artal, Montiel, Tardos ‘15
  • 20. Residual Selection DSO: Decide on visibility by thresholding photometric energy per-patch with adaptive threshold. Only attempt to add residuals within sliding window of active frames, discard everything else to enable efficient optimization. Determines actual energy terms optimized over. ORB-SLAM2: ORB descriptors are used for data association. Accelerate & robustify matching by exploiting geometric constraints. Attempt to observe each point in as many Keyframes as possible, maximising number of observations per point. Continuously optimize this data-association by merging points and adding new residuals where possible. Widely varying strategies. Jakob Engel Visual SLAM 20 Nitty-Gritty Details [2] DSO, Engel, Koltun, Cremers ’17; [3] ORB-SLAM, Mur-Artal, Montiel, Tardos ‘15
  • 21. Parameter Initialization DSO: Poses: Coarse-to-fine direct image alignment. Geometry: Pixel-wise, exhaustive search along epipolar line. Start with minimal baseline frames, then increase baseline while using prior from previous search. Use points at any distance (no minimal parallax required). More difficult for direct approach due to non-linearity of energy. Critical to avoid adding bad data to system. ORB-SLAM2: Poses: No good initialization needed. Geometry: Closed-form triangulation from initial matches. Only use points that are sufficiently well localized in euclidean 3D (sufficient observation parallax required). Similar for most approaches Jakob Engel Visual SLAM 21 Nitty-Gritty Details [2] DSO, Engel, Koltun, Cremers ’17; [3] ORB-SLAM, Mur-Artal, Montiel, Tardos ‘15
  • 22. Optimization Strategy DSO: Front-End (at framerate): Optimize newest frame‘s pose wrt. fixed map. Sliding Window (at keyframe-rate): Optimize local window of keyframes and observed points. Permanently marginalize points once they leave the field of view, and keyframes once they drop out of the active window. Loop-Closing: Not supported, pure VO. Here it can get tricky. Avoid adding biasses or locking down estimated parameters by premature linearization / fixation of parameters. ORB-SLAM2: Front-End (at framerate): Optimize newest frame‘s pose wrt. fixed map. Local Window (at keyframe-rate): Optimize local window of keyframe poses and all observed points. Temporarily fix absolute poses of periphery keyframes. Loop-Closing (when required): Pose-graph optimization followed by full map optimization (BA). MSCKF: one-off geometry triangolation & marginalization, „structureless“ Jakob Engel Visual SLAM 22 Nitty-Gritty Details [2] DSO, Engel, Koltun, Cremers ’17; [3] ORB-SLAM, Mur-Artal, Montiel, Tardos ‘15
  • 23. • Stereo / multi-camera (incl. non-overlapping stereo): Absolute scale observable, increases FOV. Best to use both static constraints (across static baseline), as well as temporal constraints (across time) thus not depend on a fixed stereo-baseline. • RBG-D cameras (ToF or structured light): Metric geometry directly observable. Great for dense reconstruction, however commercial sensors are hard to calibrate and have a limited field of view. Can create „dense“ model without depending on regularization / geometry prior. • IMU: In many ways complementary to vision. Tight IMU integration always is a good idea in practice, and gives impressive results. • Rolling shutter camera(s): global vs. Rolling shutter makes a difference! Jakob Engel Visual SLAM 23 Sensor Choices
• 24. Important not just to get your paper accepted! Should guide model selection and algorithm design choices. SLAM / VO, due to its incremental nature, is a highly chaotic process: small perturbations can have a big impact. - Evaluate on many datasets in a wide range of „realistic“ environments (indoor/outdoor, bright/dark, high-frequency/low-frequency texture, unique/repetitive environments, ....). Avoid manual over-fitting to a few scenes. - Augment your data (run forwards & backwards, different image crops). - Exploit non-deterministic implementation to get a better sampling. Jakob Engel Visual SLAM 24 Evaluating SLAM / VO [14] TUM monoVO Dataset, Engel, Usenko, Cremers ’16
  • 25. - 50 sequences - 105 minutes - 190k frames Jakob Engel Visual SLAM 25 TUM monoVO dataset [14] TUM monoVO Dataset, Engel, Usenko, Cremers ‘16
• 26. Impossible to get good ground truth. But we can still use it to evaluate tracking accuracy! 1. Use LSD-SLAM (or any other SLAM / SfM software) to track the start and end segments: they show the same scene with nice, loopy motion, which allows very precise alignment and gives „ground-truth“ poses for the start and end segments. 2. Sim(3)-align the tracked start and end segments to this „ground truth“. 3. Explicitly compute the accumulated error as a Sim(3) transform: measures accumulated drift after a large loop (obviously before / without closing the loop). Jakob Engel Visual SLAM 26 Evaluation Metric [14] TUM monoVO Dataset, Engel, Usenko, Cremers ’16
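The Sim(3) alignment in step 2 has a closed-form solution (Umeyama '91); a minimal numpy sketch, where the (N,3) interface for the two corresponding position sets is an assumption:

```python
import numpy as np

def align_sim3(source, target):
    """Find scale s, rotation R, translation t minimizing
    sum_i || target_i - (s * R @ source_i + t) ||^2  (Umeyama's method)."""
    n = source.shape[0]
    mu_s, mu_t = source.mean(0), target.mean(0)
    src, tgt = source - mu_s, target - mu_t

    cov = tgt.T @ src / n                         # 3x3 cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:  # avoid reflections
        S[2, 2] = -1
    R = U @ S @ Vt
    var_src = (src ** 2).sum() / n
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_t - s * R @ mu_s
    return s, R, t
```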
• 27. How to summarize this Sim(3) drift in one number? „RMSE between tracked trajectory (a) aligned to the start segment (blue), and (b) aligned to the end segment (red).” Proposed Alignment Error: (all trajectories are normalized to have approximately length 100) • Unifies translational, rotational and scale drift in one number. Implicitly weighs rotation & scale by how they would affect translation. • More meaningful than just translational drift. • Works for other sensor modalities (VIO, RGB-D, Stereo, …) • No set-up required, everyone can do this at home. • Use in addition to other metrics (e.g. absolute RMSE on datasets with MoCap ground truth). Jakob Engel Visual SLAM 27 Evaluation Metric [14] TUM monoVO Dataset, Engel, Usenko, Cremers ’16
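Putting it together, the proposed alignment error can be sketched as below, reusing the align_sim3 helper from the previous snippet; the interface (trajectory as an (N,3) array, segment lengths passed explicitly) is an assumption made for illustration.

```python
import numpy as np

def alignment_error(tracked, gt_start, gt_end, n_start, n_end):
    """e_align: Sim(3)-align the tracked trajectory once to the start segment
    and once to the end segment, then take the RMSE between the two aligned
    copies over the full trajectory."""
    s1, R1, t1 = align_sim3(tracked[:n_start], gt_start)   # fit to start segment
    s2, R2, t2 = align_sim3(tracked[-n_end:], gt_end)      # fit to end segment

    a = (s1 * (R1 @ tracked.T)).T + t1   # whole trajectory, start-aligned
    b = (s2 * (R2 @ tracked.T)).T + t2   # whole trajectory, end-aligned
    return np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1)))
```

The trajectory would be rescaled to length ~100 beforehand, as stated above, so that errors are comparable across sequences.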
  • 28. Jakob Engel Visual SLAM 28 Data Presentation [14] TUM monoVO Dataset, Engel, Usenko, Cremers ‘16
• 29. LSD-SLAM & SVO fail on almost all of the sequences, hence we evaluate only ORB-SLAM and DSO. How to summarize 50 sequences in one number? → Average / median don‘t work well, in particular if some sequences fail entirely. → Tables are not practical for 50 sequences. → Use Cumulative Error Plots! („how many sequences Y were tracked more accurately than X?“) [50 sequences] x [forward, backward] x [5 runs] = 500 runs (~20 hours of video) per line. Jakob Engel Visual SLAM 29 Data Presentation [14] TUM monoVO Dataset, Engel, Usenko, Cremers ’16
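Such a cumulative error plot is easy to produce; a small matplotlib sketch (the dictionary interface, the use of np.inf for failed runs, and the axis labels are assumptions):

```python
import numpy as np
import matplotlib.pyplot as plt

def cumulative_error_plot(errors_per_method):
    """errors_per_method: {method name: 1D array of per-run alignment errors,
    with np.inf for runs that failed}. For every threshold x, the curve shows
    how many runs achieved an error below x."""
    for name, errors in errors_per_method.items():
        e = np.sort(np.asarray(errors, dtype=float))
        e = e[np.isfinite(e)]                        # failed runs never count
        plt.plot(e, np.arange(1, len(e) + 1), label=name)
    plt.xlabel("alignment error e_align")
    plt.ylabel("number of runs with error below threshold")
    plt.legend()
    plt.show()
```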
• 30. Field of View A larger FoV is better for both methods (wrt. tracking under general motion; accurate geometry with minimal parallax requires large angular resolution). * Resolution fixed at 640x480. ** Raw (distorted) data at 1280 x 1024 Jakob Engel Visual SLAM 30 General Analysis [14] TUM monoVO Dataset, Engel, Usenko, Cremers ’16
• 31. Image Resolution ORB-SLAM‘s accuracy strongly depends on resolution. DSO‘s accuracy only slightly depends on resolution, with no gain above VGA. * FoV fixed. ** Ignoring effect on compute requirement Jakob Engel Visual SLAM 31 General Analysis [14] TUM monoVO Dataset, Engel, Usenko, Cremers ’16
• 32. Let‘s play around a bit: Forward / Backward ORB-SLAM seems to perform better when moving backwards. (Impact of all the nitty-gritty details.) * Sequences mostly contain forward motion. Jakob Engel Visual SLAM 32 General Analysis [14] TUM monoVO Dataset, Engel, Usenko, Cremers ’16
• 33. Add geometric noise (bad calibration, rolling shutter, chromatic aberration, ...): corresponds to the indirect model‘s noise assumption. Example: 25ms readout time, 100deg/s rotation, Kinect camera geometry: delta = 6 px. Add photometric noise (motion blur, sensor noise / low light, ...): corresponds to the direct model‘s noise assumption. Example: 20ms exposure time, 100deg/s rotation, Kinect camera geometry: delta = 10 px. Jakob Engel Visual SLAM 33 Effect of different noise [2] DSO, Engel ’17
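To run this kind of study yourself, the two perturbations can be synthesized roughly as in the sketch below (grayscale images assumed); this only illustrates the idea and is not the exact perturbation model used in the DSO experiments.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def add_photometric_noise(img, blur_sigma=1.5, noise_std=5.0):
    """Blur (as from motion blur) plus intensity noise (as from a low-light
    sensor). Parameter values are illustrative."""
    out = gaussian_filter(img.astype(float), blur_sigma)
    return out + np.random.normal(0.0, noise_std, img.shape)

def add_geometric_noise(img, delta_px=6.0, field_sigma=30.0):
    """Warp the image by a smooth random deformation field of a few pixels,
    mimicking rolling shutter or an inaccurate calibration."""
    h, w = img.shape
    dx = gaussian_filter(np.random.randn(h, w), field_sigma)
    dy = gaussian_filter(np.random.randn(h, w), field_sigma)
    dx *= delta_px / (np.abs(dx).max() + 1e-12)   # scale to the desired delta
    dy *= delta_px / (np.abs(dy).max() + 1e-12)
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    return map_coordinates(img.astype(float), [ys + dy, xs + dx], order=1)
```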
  • 34. Point Selection Relatively robust to gradient threshold. Using only corners decreases robustness. Jakob Engel Visual SLAM 34 Controlled parameter analysis [2] DSO, Engel, Koltun, Cremers ‘17
  • 35. Number of Points 1000 points are enough (default: 2k points). Jakob Engel Visual SLAM 35 Controlled parameter analysis [2] DSO, Engel, Koltun, Cremers ‘17
• 36. Number of Keyframes taken More KFs -> better robustness (with too few KFs, it gets lost on strong self-occlusion). Fewer KFs -> better accuracy (points & frames get marginalized later). Jakob Engel Visual SLAM 36 Some more results [2] DSO, Engel, Koltun, Cremers ’17
• 37. Algorithm Component: Energy Pixel patterns 1-3 are slightly worse, otherwise no big difference. Jakob Engel Visual SLAM 37 Some more results [2] DSO, Engel, Koltun, Cremers ’17
  • 38. Jakob Engel Visual SLAM 38 Summary • Classify different Systems according to • Direct vs. Indirect • Sparse vs. Dense • Geometry Representation • Optimization approach • All the “system-level” details to make a full SLAM / VO system out of it. • Evaluate on large datasets, use loop-closure metric if groundtruth cannot be obtained. • Use data to analyze systems and parameters.
• 39. Jakob Engel Visual SLAM 39 References
[1] J. Engel, T. Schoeps, and D. Cremers. “LSD-SLAM: Large-Scale Direct Monocular SLAM”. In: ECCV. 2014.
[2] J. Engel, V. Koltun, and D. Cremers. “Direct Sparse Odometry”. In: TPAMI. 2017.
[3] R. Mur-Artal, J. Montiel, and J. Tardos. “ORB-SLAM: a versatile and accurate monocular SLAM system”. In: Transactions on Robotics (TRO) 31.5 (2015), pp. 1147–1163.
[4] G. Klein and D. Murray. “Parallel Tracking and Mapping for Small AR Workspaces”. In: International Symposium on Mixed and Augmented Reality (ISMAR). 2007.
[5] S. Leutenegger, S. Lynen, M. Bosse, R. Siegwart, and P. Furgale. “Keyframe-based visual–inertial odometry using nonlinear optimization”. In: IJRR 34.3 (2015), pp. 314–334.
[6] A.I. Mourikis and S.I. Roumeliotis. “A Multi-State Constraint Kalman Filter for Vision-aided Inertial Navigation”. In: ICRA. 2007.
[7] C. Forster, M. Pizzoli, and D. Scaramuzza. “SVO: Fast Semi-Direct Monocular Visual Odometry”. In: International Conference on Robotics and Automation (ICRA). 2014.
[8] A.J. Davison, I. Reid, N. Molton, and O. Stasse. “MonoSLAM: Real-time single camera SLAM”. In: TPAMI 29.6 (2007), pp. 1052–1067.
[9] R. Newcombe, S. Lovegrove, and A. Davison. “DTAM: Dense tracking and mapping in real-time”. In: International Conference on Computer Vision (ICCV). 2011.
[10] M. Pizzoli, C. Forster, and D. Scaramuzza. “REMODE: Probabilistic, Monocular Dense Reconstruction in Real Time”. In: ICRA. 2014.
[11] H. Jin, P. Favaro, and S. Soatto. “Real-time 3-D motion and structure of point features: Front-end system for vision-based control and interaction”. In: CVPR. 2000.
[12] H. Jin, P. Favaro, and S. Soatto. “A semi-direct approach to structure from motion”. In: The Visual Computer 19.6 (2003), pp. 377–394.
[13] J. Engel, J. Sturm, and D. Cremers. “Semi-Dense Visual Odometry for a Monocular Camera”. In: International Conference on Computer Vision (ICCV). 2013.
[14] J. Engel, V. Usenko, and D. Cremers. “A Photometrically Calibrated Benchmark For Monocular Visual Odometry”. In: arXiv:1607.02555. 2016.
[15] K.J. Hanna. “Direct multi-resolution estimation of ego-motion and structure from motion”. In: Proceedings of the IEEE Workshop on Visual Motion. 1991.
For a more complete list of references and related work, see e.g. my PhD thesis.