DeepVO - Towards Visual Odometry with Deep Learning


Author: Sen Wang1,2, Ronald Clark2, Hongkai Wen2 and Niki Trigoni2 1. Edinburgh Centre for Robotics, Heriot-Watt University, UK 2. University of Oxford, UK Download this paper: http://senwang.gitlab.io/DeepVO/#paper Watch video: http://senwang.gitlab.io/DeepVO/#video


DeepVO
Towards End-to-End Visual Odometry with Deep
Recurrent Convolutional Neural Networks
National Chung Cheng University, Taiwan
Robot Vision Laboratory
2017/11/08
Jacky Liu

About this work
DeepVO : Towards Visual Odometry with Deep Learning
Sen Wang1,2, Ronald Clark2, Hongkai Wen2 and Niki Trigoni2
1. Edinburgh Centre for Robotics, Heriot-Watt University, UK
2. University of Oxford, UK
Download this paper: http://senwang.gitlab.io/DeepVO/#paper
Watch video: http://senwang.gitlab.io/DeepVO/#video

Contributions
1. Proving that monocular VO can be built with end-to-end training
2. The RCNN architecture can generalize to unseen environments
3. Complex motion can be modeled by the RCNN

Related works
Visual odometry
• Geometric: Sparse, Direct
• Learning

Related works
Sparse
• PTAM
• ORB-SLAM
Direct
• DTAM
Network
• CNN
• RNN
• LSTM

Network design
1. Traditional computer vision tasks learn from appearance and image context.
2. Visual odometry should learn from geometry. This is what the RCNN tries to address.

Network design

Preprocessing
• Normalize inputs (speeds up training) by subtracting the mean RGB values of the training set
• Resize images to a multiple of 64
• Stack two consecutive images to form a tensor
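The preprocessing steps above can be sketched in NumPy. This is a minimal illustration, not the authors' pipeline: the function names, the channel-last layout, the toy image size, and the nearest-multiple rounding rule are all assumptions made for the example.

```python
import numpy as np


def preprocess_pair(frame_a, frame_b, mean_rgb):
    """Mean-subtract two consecutive RGB frames and stack them channel-wise.

    frame_a, frame_b: uint8 arrays of shape (H, W, 3).
    mean_rgb: float array of shape (3,), the training-set mean per channel.
    Returns a float32 tensor of shape (H, W, 6).
    """
    a = frame_a.astype(np.float32) - mean_rgb  # broadcasts over H x W
    b = frame_b.astype(np.float32) - mean_rgb
    return np.concatenate([a, b], axis=-1)


def round_to_multiple_of_64(size):
    """Nearest multiple of 64 (at least 64); an assumed resizing target."""
    return max(64, int(round(size / 64)) * 64)


# Toy data standing in for a short image sequence (hypothetical sizes).
rng = np.random.default_rng(0)
frames = rng.integers(0, 256, size=(3, 128, 384, 3), dtype=np.uint8)

# Mean RGB over the whole (toy) training set.
mean_rgb = frames.reshape(-1, 3).mean(axis=0).astype(np.float32)

pair = preprocess_pair(frames[0], frames[1], mean_rgb)
```

Stacking along the channel axis gives the CNN both frames at every pixel location, so its first filters can respond to differences between them.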

CNN
• What does this work mean by learning "geometric" features?
=> Two consecutive monocular RGB images are stacked and fed into the CNN, which is expected to extract features from their concatenation.

RNN
• An RNN is not suited to learning sequential representations directly from high-dimensional raw data such as images.
• Hidden state: $h_k = \mathcal{H}(W_{xh} x_k + W_{hh} h_{k-1} + b_h)$
• Output: $y_k = W_{hy} h_k + b_y$
where $W$: weight matrix, $b$: bias vector, $k$: time index, $\mathcal{H}$: activation function
• Vanishing gradient problem
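The hidden-state and output recurrences above can be written out directly in NumPy. This is a sketch under stated assumptions: tanh stands in for the unspecified activation $\mathcal{H}$, and the dimensions (8 input features, 16 hidden units, 6 outputs) are arbitrary toy values, not those of the DeepVO network.

```python
import numpy as np


def rnn_step(x_k, h_prev, W_xh, W_hh, b_h, W_hy, b_y, activation=np.tanh):
    """One step of a vanilla RNN.

    h_k = H(W_xh x_k + W_hh h_{k-1} + b_h)
    y_k = W_hy h_k + b_y
    """
    h_k = activation(W_xh @ x_k + W_hh @ h_prev + b_h)
    y_k = W_hy @ h_k + b_y
    return h_k, y_k


def rnn_forward(xs, h0, params):
    """Run the recurrence over a sequence xs of shape (steps, input_dim)."""
    h = h0
    ys = []
    for x_k in xs:
        h, y = rnn_step(x_k, h, *params)
        ys.append(y)
    return np.stack(ys), h


# Toy dimensions (assumptions for illustration only).
rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim, steps = 8, 16, 6, 5
params = (
    rng.normal(scale=0.1, size=(hidden_dim, input_dim)),   # W_xh
    rng.normal(scale=0.1, size=(hidden_dim, hidden_dim)),  # W_hh
    np.zeros(hidden_dim),                                  # b_h
    rng.normal(scale=0.1, size=(output_dim, hidden_dim)),  # W_hy
    np.zeros(output_dim),                                  # b_y
)
xs = rng.normal(size=(steps, input_dim))
ys, h_last = rnn_forward(xs, np.zeros(hidden_dim), params)
```

Because the same W_hh is applied at every step, gradients flowing back through many steps are repeatedly scaled by it, which is where the vanishing gradient problem comes from.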

LSTM (Long short-term memory)
Depth is needed to learn high-level representations


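Cost function
Conditional probability of pose:
$p(Y_t \mid X_t) = p(y_1, \dots, y_t \mid x_1, \dots, x_t)$
$\theta^* = \arg\max_{\theta} \, p(Y_t \mid X_t; \theta)$
Cost (mean squared error):
$\theta^* = \arg\min_{\theta} \frac{1}{N} \sum_{i=1}^{N} \sum_{k=1}^{t} \| \hat{p}_k - p_k \|_2^2 + \kappa \, \| \hat{\varphi}_k - \varphi_k \|_2^2$
Ground truth pose $(p_k, \varphi_k)$ = (position, orientation); $\kappa$: scale factor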
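The cost function above can be evaluated numerically as follows. This is a hedged sketch: it assumes both the position and orientation terms sit inside the double sum, that positions and orientations are 3-vectors, and it uses kappa = 100.0 purely as an illustrative value. The function name and array layout are hypothetical.

```python
import numpy as np


def pose_mse_loss(p_hat, p, phi_hat, phi, kappa):
    """Position error plus kappa-weighted orientation error.

    All arrays have shape (N, t, 3): N samples, t time steps, 3 components.
    Returns (1/N) * sum_i sum_k (||p_hat - p||^2 + kappa * ||phi_hat - phi||^2).
    """
    pos_err = np.sum((p_hat - p) ** 2, axis=-1)      # squared L2 norm per step
    ori_err = np.sum((phi_hat - phi) ** 2, axis=-1)  # squared L2 norm per step
    n_samples = p.shape[0]
    return float(np.sum(pos_err + kappa * ori_err) / n_samples)


# Toy example: 2 samples, 3 time steps each.
p = np.zeros((2, 3, 3))
phi = np.zeros((2, 3, 3))
p_hat = np.ones((2, 3, 3))          # every position component off by 1
phi_hat = np.full((2, 3, 3), 0.1)   # every orientation component off by 0.1

loss = pose_mse_loss(p_hat, p, phi_hat, phi, kappa=100.0)
```

The scale factor balances the two terms: orientation errors are numerically much smaller than position errors, so without a weight the position term would dominate.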

Experimental results
DeepVO / VISO2

Training & testing
1. Dataset: KITTI VO/SLAM benchmark
(22 image sequences / 10 fps / dynamic objects)
2. 7410 training samples (image and trajectory pairs)
3. Implemented in Theano
4. Hardware: Nvidia Tesla K40 GPU
5. 200 epochs
6. Learning rate 0.001
7. Regularization: dropout / early stopping
8. CNN: transfer learning from FlowNet

Overfitting
• Orientation is more prone to overfitting

Comparison with traditional VO
• Open-source VO library LIBVISO2
• Monocular / Stereo

Trajectory (1/2)

Trajectory (2/2)
• No ground truth: Seq 11~19


Dynamic objects
• This work does not address how to handle dynamic objects
• Traditional VO: RANSAC (removes outliers)
• Get more training data

Conclusion
• End-to-end monocular VO based on deep learning
• Deep RCNN
• No need to carefully tune the parameters of the VO system
• Not intended as a replacement for the classic geometry-based approach
