
"Fundamentals of Monocular SLAM," a Presentation from Cadence

Edge AI and Vision Alliance

For the full video of this presentation, please visit:
https://www.embedded-vision.com/platinum-members/cadence/embedded-vision-training/videos/pages/may-2019-embedded-vision-summit-gadkari

For more information about embedded vision, please visit:
http://www.embedded-vision.com

Shrinivas Gadkari, Design Engineering Director at Cadence, presents the "Fundamentals of Monocular SLAM" tutorial at the May 2019 Embedded Vision Summit.

Simultaneous Localization and Mapping (SLAM) refers to a class of algorithms that enables a device with one or more cameras and/or other sensors to create an accurate map of its surroundings, to determine the device's location relative to its surroundings, and to track its path as it moves through this environment. This is a key capability for many new use cases and applications, especially in the domains of augmented reality, virtual reality and mobile robots. Monocular SLAM is a type of SLAM that relies exclusively on a monocular image sequence captured by a moving camera.

In this talk, Gadkari introduces the fundamentals of monocular SLAM algorithms, from input images to 3D map. He takes a close look at key components of monocular SLAM algorithms, including ORB (Oriented FAST and Rotated BRIEF) feature detection, fundamental matrix-based pose estimation, stitching together poses using translation estimation, and loop closure. He also discusses implementation considerations for these components, including the arithmetic precision required to achieve acceptable mapping and tracking accuracy.

"Fundamentals of Monocular SLAM," a Presentation from Cadence

1 of 29
Download to read offline
© 2019 Cadence Design Systems, Inc

Fundamentals of Monocular SLAM
Shrinivas Gadkari and Rajas Mhaskar
Cadence Design Systems
May 2019

SLAM: Problem Description

• A camera mounted on a mobile device (cell phone, robot, drone) is moving in an unknown region, capturing an image sequence of the surroundings. By analyzing the captured image sequence, we would like to:
  • Identify the distinct features (key points) in the surroundings, along with their locations (Mapping)
  • Trace the path taken by the moving camera in this region (Localization)
• Simultaneous Localization And Mapping = SLAM

[Figure: key points in the surroundings and the path of the moving camera]

SLAM: Block Diagram

[Block diagram: Sensors → ORB-based Keypoint Detection and Matching → Pose Estimation: Fundamental Matrix → Triangulation → Bundle Adjustment → Camera Position + Local Map → Path Estimation → Loop Closure + Global Bundle Adjustment]

Part-1: Key Point Estimation and Matching

• Given a sequence of images:
  • Find the important key points: (2D location + quantitative descriptor) per key point
  • Match key points in different frames: minimum "distance" between descriptors
• Final goal:
  o Locate key points
  o Track the 2D movement of the key points across the image sequence
• We use ORB (Oriented FAST, Rotated BRIEF) for key point detection and descriptor computation
• Match key points via the minimum Hamming distance between BRIEF descriptors

ORB-1: Image Pyramid

• Create a scaled image pyramid for each frame
• Basic idea: different key points (features) may be best detected at different image scales
• Example: consider the image scaling needed to best detect the following key points:
  • Corner of a room, corner of a table, corner of a laptop, corner of a cell phone
• Typical image pyramid:
  • 8 scale levels
  • Downscale the image by a factor of 1.2 at each level
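
As a rough illustration (not from the talk), here is a minimal image-pyramid sketch in Python with NumPy. The 8 levels and the 1.2 scale factor follow the slide; the nearest-neighbor resampling is a simplifying assumption, since the talk does not specify the resize filter.

    import numpy as np

    def build_pyramid(image, num_levels=8, scale_factor=1.2):
        """Build a scaled image pyramid; each level shrinks by scale_factor."""
        pyramid = [image.astype(np.float32)]
        for _ in range(num_levels - 1):
            prev = pyramid[-1]
            h = int(prev.shape[0] / scale_factor)
            w = int(prev.shape[1] / scale_factor)
            # Nearest-neighbor resampling keeps the sketch dependency-free;
            # a production implementation would smooth and interpolate.
            rows = (np.arange(h) * scale_factor).astype(int)
            cols = (np.arange(w) * scale_factor).astype(int)
            pyramid.append(prev[rows[:, None], cols])
        return pyramid
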
ORB-2: FAST9 (Features from Accelerated Segment Test)

• Consider the center pixel C in the window
• Consider a circle of 16 pixels around C
• To classify pixel C as a key point:
  • 9 or more contiguous pixels out of the 16 should be darker than pixel C, with intensity difference > Threshold
  • Or, 9 or more contiguous pixels out of the 16 should be brighter than pixel C, with intensity difference > Threshold
• The threshold can be, say, 20% of the pixel value at C
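
A minimal FAST9 segment-test sketch (my illustration, not the Cadence implementation): it scans the 16-pixel ring for a run of 9 or more contiguous pixels that are all brighter or all darker than the center by the threshold. The ring offsets are the standard radius-3 Bresenham circle used by FAST.

    import numpy as np

    # Standard 16-pixel Bresenham circle of radius 3 around the candidate pixel.
    CIRCLE = [(0, 3), (1, 3), (2, 2), (3, 1), (3, 0), (3, -1), (2, -2), (1, -3),
              (0, -3), (-1, -3), (-2, -2), (-3, -1), (-3, 0), (-3, 1), (-2, 2), (-1, 3)]

    def is_fast9_corner(img, y, x, rel_thresh=0.2):
        """FAST9 test: 9+ contiguous ring pixels all brighter or all darker than C."""
        c = float(img[y, x])
        t = rel_thresh * c  # slide suggests a threshold of ~20% of the center value
        ring = np.array([float(img[y + dy, x + dx]) for dx, dy in CIRCLE])
        for mask in (ring > c + t, ring < c - t):
            run, best = 0, 0
            # Doubling the ring lets a contiguous run wrap past the start index.
            for hit in np.concatenate([mask, mask]):
                run = run + 1 if hit else 0
                best = max(best, run)
            if best >= 9:
                return True
        return False
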

"Fundamentals of Monocular SLAM," a Presentation from Cadence

  • 1. © 2019 Cadence Design Systems, Inc Fundamentals of Monocular SLAM Shrinivas Gadkari and Rajas Mhaskar Cadence Design Systems May 2019
  • 2. © 2019 Cadence Design Systems, Inc SLAM: Problem description • A camera mounted on a mobile device (cell phone, robot, drone) is moving in an unknown region, capturing an image sequence of the surroundings. By analyzing the captured image sequence, we would like to: • Identify the distinct features (key points) in the surroundings, along with their locations (Mapping) • Trace the path taken by the moving camera in this region (Localization) • Simultaneous Localization And Mapping = SLAM 2 Path of the moving cameraKey Points
  • 3. © 2019 Cadence Design Systems, Inc SLAM: Block Diagram 3 Sensors ORB Based Keypoint Detection and Matching Pose Estimation: Fundamental Matrix Triangulation Bundle Adjustment Camera Position + Local Map Path Estimation Loop Closure + Global Bundle Adjustment
  • 4. © 2019 Cadence Design Systems, Inc • Given a sequence of images, • Find the important key points: (2D Location + Quantitative descriptor) per key point • Match Key Points in different frames : Minimum “distance” between descriptors • Final Goal: o Locate key points o Track the 2D movement of the key points across the image sequence • We use ORB (Oriented fast, Rotated BRIEF) for key point detection and descriptor computation • Match Key-Points via minimum Hamming distance between BRIEF descriptors 4 Part-1: Key Point Estimation and Matching
  • 5. © 2019 Cadence Design Systems, Inc ORB-1: Image Pyramid • Create a scaled image pyramid for each frame • Basic Idea: Different key points (Features) may be best detected at different image scales. • Example: Consider Image scaling needed to best detect the following Key-Points • Corner of room, corner of a table, corner of a laptop, corner of a cell phone • Typical Image Pyramid: • 8 level scaling • Down scale the image by a factor of 1.2 5
  • 6. © 2019 Cadence Design Systems, Inc ORB-2: FAST9: Features from Accelerated Segment Test • Consider the center pixel – C in the window. • Consider a circle of 16-pixels around C To classify the pixel C as a key point: • 9 or more contiguous pixels out of the 16 should be darker than pixel C, with intensity difference > Threshold • Or, 9 or more contiguous pixels out of the 16 should be brighter than pixel C, with intensity difference > Threshold • Threshold can be say 20% of pixel value at C 6
ORB-3: NMS: Non Maximum Suppression

• Typically, each corner tends to be detected at several adjacent locations/pixels
• Non-max suppression is used to retain the most dominant corner pixel
• Simple NMS on a 3x3 window: if the FAST score of the center pixel C is the maximum compared to its 8 neighbors, then the pixel is kept as a corner
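
A minimal 3x3 non-maximum-suppression sketch over a map of FAST scores (my illustration; the talk does not specify tie-breaking, so unique strict maxima are assumed here).

    import numpy as np

    def nms_3x3(score):
        """Keep a pixel only if its FAST score is the unique maximum of its 3x3 neighborhood."""
        h, w = score.shape
        keep = np.zeros_like(score, dtype=bool)
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                window = score[y - 1:y + 2, x - 1:x + 2]
                center = score[y, x]
                # Center must beat all 8 neighbors; uniqueness breaks flat plateaus.
                if center > 0 and center == window.max() and (window == center).sum() == 1:
                    keep[y, x] = True
        return keep
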
ORB-4: Rotation Invariance

• Problem: as the camera moves, there can be a non-trivial camera rotation
• The orientation of the surroundings for the same key point can be significantly different from frame to frame
• BRIEF descriptors are NOT rotationally invariant
• Need to explicitly correct for key point rotation:
  • Consider a circular patch of fixed radius around the key point
  • Find the orientation angle θ of the patch from the intensity moments about the center pixel; rotating the patch by −θ places its intensity centroid on the x-axis (zeroing the y-moment of the rotated patch)
  • θ = tan⁻¹(y-moment / x-moment)
    ▪ x-moment = Σ (x − x0) P(x,y)
    ▪ y-moment = Σ (y − y0) P(x,y)
    ▪ (x0, y0) are the center pixel coordinates, P(x,y) is the pixel value at (x, y)
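
A minimal intensity-centroid orientation sketch following the slide's moment definitions. The 15-pixel radius is an assumption, and atan2 replaces a bare tan⁻¹ so the quadrant of the angle is preserved.

    import numpy as np

    def patch_orientation(img, y0, x0, radius=15):
        """Orientation of a circular patch from its intensity moments (ORB-style)."""
        m_x = m_y = 0.0
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                if dx * dx + dy * dy <= radius * radius:  # circular mask
                    p = float(img[y0 + dy, x0 + dx])
                    m_x += dx * p  # x-moment about the center pixel
                    m_y += dy * p  # y-moment about the center pixel
        return np.arctan2(m_y, m_x)
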
ORB-5: BRIEF: Binary Robust Independent Elementary Features

• BRIEF is a 256-bit descriptor that characterizes the area surrounding the key point
• The BRIEF descriptor allows for key point matching using the Hamming distance
• Consider a patch of size 31x31 around the key point
• BRIEF descriptor: (b0, b1, ..., b255). Each bit bi is computed as:
  • bi = 1 if P(xi, yi) > P(xi', yi'), else bi = 0
  • (xi, yi) and (xi', yi') are predetermined test locations
• Test patterns are chosen empirically

[Figure: plot of the BRIEF test pattern used by ORB SLAM]
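
A minimal steered-BRIEF sketch. The random test pattern below is a stand-in: real ORB uses a fixed, empirically learned set of 256 test-point pairs, and the caller must keep key points far enough from the image border for the rotated tests to stay in bounds.

    import numpy as np

    rng = np.random.default_rng(0)
    # Hypothetical test pattern: 256 pairs of offsets inside a 31x31 patch.
    TESTS = rng.integers(-15, 16, size=(256, 4))

    def brief_descriptor(img, y0, x0, angle=0.0):
        """256-bit BRIEF descriptor; test locations are rotated by the patch angle."""
        c, s = np.cos(angle), np.sin(angle)
        bits = np.zeros(256, dtype=np.uint8)
        for i, (x1, y1, x2, y2) in enumerate(TESTS):
            # Rotating each test location by the key point angle ("steering")
            # is what makes the descriptor rotation-aware.
            rx1, ry1 = round(c * x1 - s * y1), round(s * x1 + c * y1)
            rx2, ry2 = round(c * x2 - s * y2), round(s * x2 + c * y2)
            bits[i] = img[y0 + ry1, x0 + rx1] > img[y0 + ry2, x0 + rx2]
        return bits
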
ORB-6: BRIEF Descriptor Matching

• The Hamming distance between two BRIEF descriptors is the number of positions at which the corresponding symbols differ
• The BRIEF descriptor of each key point in the current frame is compared with the BRIEF descriptors of the corner points in the previous frame (within a local vicinity)
• The two points with the minimum Hamming distance are considered a match
• Heuristics are used to eliminate incorrect matches
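
A minimal nearest-neighbor matcher by Hamming distance (my sketch). The max_distance cutoff is a hypothetical example of the rejection heuristics the slide mentions; the talk does not give a specific value.

    import numpy as np

    def match_keypoints(desc_curr, desc_prev, max_distance=50):
        """Match (N, 256) current descriptors to (M, 256) previous ones, bits as 0/1."""
        matches = []
        for i, d in enumerate(desc_curr):
            # XOR-then-count gives the Hamming distance to every previous descriptor.
            dists = np.count_nonzero(desc_prev != d, axis=1)
            j = int(np.argmin(dists))
            if dists[j] <= max_distance:  # hypothetical outlier-rejection heuristic
                matches.append((i, j, int(dists[j])))
        return matches
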
Part-2: Mapping and Localization

• What we have achieved so far:
  o Located key points in the image sequence
  o Tracked the 2D coordinates of each key point across the image sequence
• We now analyze the 2D location data of the key points in the image sequence to estimate the 3D coordinates of each key point, and the 3D location and orientation of the camera (the pose of the camera) in each frame
• Topics covered:
  o Basics of projection
  o Relative pose estimation
  o Global pose estimation
  o Loop closure
Basics of Projection - 1

• Key point Q(x, y, z) has coordinates (x1, y1, z1) in frame Fr1
• Normalized projected coordinates (x1', y1') of its projection Q1' on the image plane located at z1 = 1:
  • x1' = x1 / z1
  • y1' = y1 / z1
• Actual pixel coordinates (xp1, yp1) are related to (x1', y1') via:
  • xp1 = fx * x1' + cx
  • yp1 = fy * y1' + cy
• (fx, fy) account for the arbitrary location of the image plane along Z1 and for different scaling along X1 and Y1
• (cx, cy) is the location of the image center w.r.t. the top-left pixel
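
A minimal pinhole-projection sketch of these two steps; the intrinsics in the example call are made-up values, not from the talk.

    import numpy as np

    def project(point_cam, fx, fy, cx, cy):
        """Project a 3D point in camera coordinates to pixel coordinates."""
        x, y, z = point_cam
        xn, yn = x / z, y / z  # normalized coordinates on the z = 1 image plane
        return fx * xn + cx, fy * yn + cy

    # Example: a point 2 m in front of the camera, with assumed intrinsics.
    print(project(np.array([0.1, -0.05, 2.0]), fx=600.0, fy=600.0, cx=320.0, cy=240.0))
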
Basics of Projection - 2

• To compute (x1, y1, z1) from (x, y, z), adjust for the translation and rotation of frame Fr1:

  [x1]   [r11 r12 r13]   [x - tx1]
  [y1] = [r21 r22 r23] * [y - ty1]
  [z1]   [r31 r32 r33]   [z - tz1]

• In vector form: x1 = R1 * (x - t1)
  o R1 = rotation matrix of Fr1 w.r.t. Fr0, orthonormal
  o t1 = translation vector of frame Fr1 w.r.t. Fr0
  o x, x1 = coordinate vectors of point Q in Fr0 and Fr1
• Similarly, for frames Frk and Frk+1:

  xk = Rk * (x - tk)
  xk+1 = Rk+1 * (x - tk+1)
  xk+1 = Rk+1 * Rk^-1 * xk - Rk+1 * (tk+1 - tk)

• Let Rk+1,k = Rk+1 * Rk^-1 : the relative rotation matrix of Frk+1 w.r.t. Frk
• Let hk+1,k = -Rk+1 * (tk+1 - tk) : the relative rotated translation vector of Frk+1 w.r.t. Frk

  xk+1 = Rk+1,k * xk + hk+1,k
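
A direct transcription of these relations as a helper (a sketch; frame poses are the rotation matrices and translation vectors defined on the slide):

    import numpy as np

    def relative_pose(R_k, t_k, R_k1, t_k1):
        """Relative pose of frame k+1 w.r.t. frame k, per the slide:
        R_{k+1,k} = R_{k+1} * R_k^-1 and h_{k+1,k} = -R_{k+1} * (t_{k+1} - t_k),
        so that x_{k+1} = R_{k+1,k} * x_k + h_{k+1,k}."""
        R_rel = R_k1 @ R_k.T  # the inverse of an orthonormal rotation is its transpose
        h_rel = -R_k1 @ (t_k1 - t_k)
        return R_rel, h_rel
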
Basics of Projection - 3

• Expanding xk+1 = Rk+1,k * xk + hk+1,k gives:

  [xk+1]   [r11 r12 r13]   [xk]   [h14]
  [yk+1] = [r21 r22 r23] * [yk] + [h24]
  [zk+1]   [r31 r32 r33]   [zk]   [h34]

  (with all rij and hi4 taken from the (k+1, k) relative pose)

• We can rewrite this in terms of xk+1', yk+1', xk', yk':

  [xk+1' yk+1' 1] * Tk+1,k * Rk+1,k * [xk' yk' 1]^T = 0

• Where:

           [  0   -h34   h24]
  Tk+1,k = [ h34    0   -h14]
           [-h24   h14    0 ]
Basics of Projection - 4

• This is the fundamental equation of projective geometry:

  [xk+1' yk+1' 1] * Fk+1,k * [xk' yk' 1]^T = 0

• Where Fk+1,k = Tk+1,k * Rk+1,k is known as the fundamental (or essential) matrix
• Rk+1,k is an orthonormal rotation matrix, and Tk+1,k is skew-symmetric
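
A small sketch of this constraint (my illustration): the skew-symmetric T built from h, and the per-match epipolar residual, which should be near zero for correctly matched normalized coordinates.

    import numpy as np

    def skew(h):
        """Skew-symmetric T_{k+1,k} built from h = (h14, h24, h34), as on the slide."""
        return np.array([[0.0, -h[2], h[1]],
                         [h[2], 0.0, -h[0]],
                         [-h[1], h[0], 0.0]])

    def epipolar_residual(F, p_k, p_k1):
        """Residual of [x_{k+1}' y_{k+1}' 1] * F * [x_k' y_k' 1]^T for one match."""
        a = np.array([p_k1[0], p_k1[1], 1.0])
        b = np.array([p_k[0], p_k[1], 1.0])
        return float(a @ F @ b)
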
Relative Pose Estimation - 1

• Given sets of matching key point image coordinates (xk', yk') and (xk+1', yk+1'), we want to estimate the fundamental matrix Fk+1,k:

  [xk+1' yk+1' 1] * Fk+1,k * [xk' yk' 1]^T = 0

  [xk+1' yk+1' 1] * [f11 f12 f13]   [xk']
                    [f21 f22 f23] * [yk'] = 0
                    [f31 f32 f33]   [1  ]

• This represents a single linear equation in 9 unknowns for each pair of matching key point coordinates
• Use RANSAC-based least-squares estimation to solve for {f11, f12, f13, f21, f22, f23, f31, f32, f33}
• Need to set any one coefficient = 1 and solve for the other eight
• Any scaled version of the resulting coefficients is a valid Fk+1,k matrix
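
A minimal linear solve for F with f33 pinned to 1, as the slide suggests (the RANSAC loop that would wrap this to reject outlier matches is omitted for brevity):

    import numpy as np

    def estimate_F_linear(pts_k, pts_k1):
        """Least-squares F from matched normalized points; needs 8+ matches."""
        A, b = [], []
        for (x, y), (x1, y1) in zip(pts_k, pts_k1):
            # Expand [x1 y1 1] * F * [x y 1]^T = 0 in f11..f32, moving f33 = 1 to the RHS.
            A.append([x1 * x, x1 * y, x1, y1 * x, y1 * y, y1, x, y])
            b.append(-1.0)
        f, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
        return np.append(f, 1.0).reshape(3, 3)
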
Relative Pose Estimation - 2

• The fundamental matrix F is a product of two matrices, T and R:

              [  0   -h34   h24]   [r11 r12 r13]
  F = T * R = [ h34    0   -h14] * [r21 r22 r23]
              [-h24   h14    0 ]   [r31 r32 r33]

• T is a skew-symmetric matrix and R is the orthonormal rotation matrix
• To recover T:

  (Trace(F * F^T)/2) * I - (F * F^T) = [h14^2     h14*h24   h14*h34]
                                       [h14*h24   h24^2     h24*h34]
                                       [h14*h34   h24*h34   h34^2  ]

  (a) Compute the above matrix
  (b) The square root of the first diagonal element recovers h14
  (c) Dividing the row-1 entries by h14 gives (h14, h24, h34)
  (d) Another valid solution is (-h14, -h24, -h34)
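
This recovery reads off directly in code (a sketch; it assumes h14 ≠ 0, otherwise another diagonal element would have to seed the division):

    import numpy as np

    def recover_translation(F):
        """Recover h = (h14, h24, h34) up to sign from F, via M = h * h^T."""
        M = (np.trace(F @ F.T) / 2.0) * np.eye(3) - F @ F.T
        h14 = np.sqrt(M[0, 0])          # assumes h14 != 0
        h = M[0, :] / h14               # row 1 of h * h^T is h14 * (h14, h24, h34)
        return h, -h                    # both signs are valid solutions
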
Relative Pose Estimation - 3

• r1, r2, r3 are the rows of F, i.e., F = [r1^T r2^T r3^T]^T
• [Cofactor(F)]^T = [(r2 x r3)^T, (r3 x r1)^T, (r1 x r2)^T]^T
• Manipulating the matrices and using the facts that R is orthonormal and T is skew-symmetric, we can write:

  [Cofactor(F)]^T - (T * F) = (h14^2 + h24^2 + h34^2) * R

• Since we have recovered two possible values for T, we get two possible values for the rotation matrix R: R1 and R2
• The rotation matrices R1 and R2, along with (h14, h24, h34) and (-h14, -h24, -h34), give four possible factorizations of the fundamental matrix F
• Next we need to choose one of the four possible factorizations
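
A direct transcription of this identity (a sketch; conventions follow the slide, with T built from the candidate h):

    import numpy as np

    def recover_rotation(F, h):
        """Recover R from F and one candidate translation h, per the slide's identity."""
        T = np.array([[0.0, -h[2], h[1]],
                      [h[2], 0.0, -h[0]],
                      [-h[1], h[0], 0.0]])
        r1, r2, r3 = F
        cofactor_T = np.stack([np.cross(r2, r3), np.cross(r3, r1), np.cross(r1, r2)])
        # [Cofactor(F)]^T - T * F = |h|^2 * R, so divide by the squared magnitude of h.
        return (cofactor_T - T @ F) / (h @ h)
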
Relative Pose Estimation - 4

• The four possible factorizations of the fundamental matrix represent four possible geometric scenarios
• [Figure: the four configurations; A and B are the camera centers, while the T-shaped structure indicates the image plane and the normal to the image plane]
• Only one of the 4 poses produces positive z-coordinates in both frames
• The factorization that gives the maximum number of key points with positive values for (zk, zk+1) is selected
• Hence we now have Rk+1,k and hk+1,k = [h14, h24, h34]^T
• Note: we have not recovered the magnitude of hk+1,k
  • Any scaled version of hk+1,k is an equally valid solution
  • We set the magnitude of hk+1,k = 1
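
A sketch of the selection step. The triangulate argument is an assumed helper (not shown here) that returns the depth pair (zk, zk+1) of each matched key point under a candidate pose:

    def select_factorization(candidates, triangulate):
        """Pick the (R, h) candidate with the most points in front of both cameras."""
        def score(candidate):
            R, h = candidate
            return sum(zk > 0 and zk1 > 0 for zk, zk1 in triangulate(R, h))
        return max(candidates, key=score)
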
Relative Pose Estimation - 5 (Local Bundle Adjustment)

• Inputs:
  ▪ Set of 3D points (xk, yk, zk) for key points common to frames Frk and Frk+1
  ▪ Measured 2D image coordinates in both frames: (xk', yk') and (xk+1', yk+1')
  ▪ Pose of Frk+1 w.r.t. Frk: Rk+1,k and hk+1,k
• Objective: tune the 3D points and the pose of Frk+1 so as to minimize the total reprojection error:

  Σ [(xk/zk − xk')^2 + (yk/zk − yk')^2] + Σ [(xk+1/zk+1 − xk+1')^2 + (yk+1/zk+1 − yk+1')^2]

  where
  ▪ xk+1 = r11 * xk + r12 * yk + r13 * zk + h14
  ▪ yk+1 = r21 * xk + r22 * yk + r23 * zk + h24
  ▪ zk+1 = r31 * xk + r32 * yk + r33 * zk + h34
• Can be minimized via steepest-descent coefficient tuning
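
A minimal evaluation of this cost (a sketch; a bundle adjuster would minimize it over the 3D points and the pose, e.g. by the steepest descent the slide mentions):

    import numpy as np

    def reprojection_error(points_k, obs_k, obs_k1, R, h):
        """Total squared reprojection error over frames k and k+1, as on the slide.

        points_k: (N, 3) 3D points in frame k; obs_k, obs_k1: (N, 2) measured
        normalized image coordinates; R, h: pose of frame k+1 w.r.t. frame k."""
        err = 0.0
        for X, (xo, yo), (xo1, yo1) in zip(points_k, obs_k, obs_k1):
            x, y, z = X
            err += (x / z - xo) ** 2 + (y / z - yo) ** 2
            x1, y1, z1 = R @ X + h  # the same point expressed in frame k+1
            err += (x1 / z1 - xo1) ** 2 + (y1 / z1 - yo1) ** 2
        return err
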
Global Pose Estimation - 1

• Problem: fix the magnitude |h1,0| = 1, and estimate the magnitudes of the remaining hk+1,k
• Consider frames Fr0, Fr1 and Fr2, and key points common to the three frames
• Let (x0, y0, z0) be the 3D coordinates of a key point in frame Fr0, estimated by triangulation between frames Fr0 and Fr1
• Using the pose of Fr1, we can estimate the 3D coordinates of the key point in Fr1 as:

  [x1]   [r11 r12 r13]   [x0]   [h14]
  [y1] = [r21 r22 r23] * [y0] + [h24]
  [z1]   [r31 r32 r33]   [z0]   [h34]

• Similarly, we can triangulate frames Fr1 and Fr2 and estimate the 3D coordinates of the same key point in Fr1, assuming |h2,1| = 1, as (x*1, y*1, z*1)
• From these two 3D coordinate values, we get:

  |h2,1| = (x1 / x*1) = (y1 / y*1) = (z1 / z*1)

• We can proceed in a recursive manner and estimate the magnitudes of all hk+1,k
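
In code, the scale ratio can be pooled over many key points (a sketch; the median is my addition for robustness to noise, where the slide simply equates the ratios):

    import numpy as np

    def relative_scale(points_pair01, points_pair12):
        """Estimate |h_{2,1}| from the same key points triangulated at two scales.

        Both (N, 3) arrays hold coordinates of the same key points in frame Fr1:
        the first from triangulating (Fr0, Fr1) at the fixed global scale, the
        second from triangulating (Fr1, Fr2) under the assumption |h_{2,1}| = 1."""
        ratios = points_pair01 / points_pair12  # elementwise x1/x*1, y1/y*1, z1/z*1
        return float(np.median(ratios))
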
Global Pose Estimation - 2

• We can also recursively estimate the camera rotation for each frame:
  o Rk+1 = Rk+1,k * Rk
• The relative translation vector between frames Frk+1 and Frk is:

  (tk+1 − tk) = −Rk+1^-1 * hk+1,k

• Adding up the relative translation vectors, we can estimate the 3D trajectory of the camera path
• [Figure: example of an estimated camera trajectory; the blue curve is the true trajectory, while the red curve is the estimated trajectory]
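
A minimal trajectory-accumulation sketch chaining the relative poses per the recursions above:

    import numpy as np

    def accumulate_trajectory(relative_poses):
        """Chain (R_{k+1,k}, h_{k+1,k}) pairs into global rotations and camera positions."""
        R = np.eye(3)
        t = np.zeros(3)
        trajectory = [t.copy()]
        for R_rel, h_rel in relative_poses:
            R = R_rel @ R            # R_{k+1} = R_{k+1,k} * R_k
            t = t - R.T @ h_rel      # t_{k+1} = t_k - R_{k+1}^-1 * h_{k+1,k}; R^-1 = R^T
            trajectory.append(t.copy())
        return np.array(trajectory)
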
Loop Closure

• As we continue to analyze key points in frames, we start encountering cases wherein the key points seen in the current keyframe are almost identical to those seen in the starting keyframes
• At this stage we say that loop closure is detected, and we start the loop closure tuning
• The estimated trajectory and the true trajectory are significantly different due to the accumulated errors, as evident in the picture on the previous slide
• The objective of loop closure is thus to fuse the last keyframe position with the starting keyframe position by correcting the errors in the estimated poses of all the keyframes
Loop Closure - Phase A

• FrN and Fr0 are the two frames which have almost the same key points. The absolute pose of FrN with respect to Fr0 is HN,0
• The estimated fundamental matrix between frames FrN and Fr0 is:

          [  0   -h34   h24]   [r11 r12 r13]
  FN,0 =  [ h34    0   -h14] * [r21 r22 r23]
          [-h24   h14    0 ]   [r31 r32 r33]

• The estimated fundamental matrix is derived from an erroneous pose and will thus show significant deviation from the fundamental matrix equation. Thus the error function is:

  EN,0 = Σ ( [xN' yN' 1] * FN,0 * [x0' y0' 1]^T )^2

• Where (xN', yN') and (x0', y0') are the matching 2D coordinates of key points common to Fr0 and FrN
• We minimize EN,0 by tuning the estimated translation magnitudes as well as the rotation and translation for the pose of individual keyframes
Loop Closure - Phase B

• The cost function in Phase A only corrected the relative pose between frames FrN and Fr0, without accounting for any discrepancies in the intermediate frames
• Here we add some more components to the error function
• Consider frames Frk and Frk+2 and the common key points between these two frames. We then define the error component:

  EK,K+2 = Σ ( [xK+2' yK+2' 1] * FK+2,K * [xK' yK' 1]^T )^2

• The total cost function: C = Σ EK,K+2 + EN-1,0 + EN,1 + EN,0
• The objective is thus to minimize the cost function C by tuning the translation magnitudes |hk,k+1| and also the translation and rotation coefficients
• The pose tuning in Loop Closure Phase A and Phase B is also known as Global Pose Optimization
Concluding Remarks - 1

Summary and implementation considerations:
- ORB-based key point detection and matching:
  - Typically a fixed-point implementation, in a mix of 16-bit and 32-bit precision
  - Tends to be a significant contributor to the computational load
- All subsequent pose estimation, tuning, loop closure, 3D key point location estimation and bundle adjustment need floating-point arithmetic:
  - A single-precision implementation is possible
Concluding Remarks - 2

Vision DSP architectural takeaways:
- Speeding up floating-point computations:
  - Cadence Vision DSPs support 16-way vector floating-point instructions
  - The most recent Vision DSP doubles this capability
- Accelerating ORB:
  - The most recent Cadence Vision DSP adds special instructions to accelerate FAST9
References

[1] Richard Hartley and Andrew Zisserman, "Multiple View Geometry in Computer Vision," 2003.
[2] Raul Mur-Artal, J. M. M. Montiel and Juan D. Tardos, "ORB-SLAM: A Versatile and Accurate Monocular SLAM System."
[3] Raul Mur-Artal and Juan D. Tardos, "ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo and RGB-D Cameras."
[4] Berthold K. P. Horn, "Recovering Baseline and Orientation from 'Essential' Matrix."
Thank you!