
"Deploying Visual SLAM in Low-power Devices," a Presentation from CEVA




Ben Weiss, Customer Solutions Engineer in the CSG Group at CEVA, presents the "Deploying Visual SLAM in Low-power Devices" tutorial at the May 2019 Embedded Vision Summit.

Simultaneous localization and mapping (SLAM) technology has been evolving for quite some time, including visual SLAM, which relies primarily on image data. But implementing fast, accurate visual SLAM in embedded devices has been challenging due to high compute and precision requirements. Recent improvements in embedded processors enable deployment of visual SLAM in low-cost, low-power, mass-market systems, although implementing SLAM on such platforms remains challenging.

In this talk, Weiss explores the current state of visual SLAM algorithms and shows how CEVA processors and software enable easy migration of SLAM algorithms from research to cost- and power-optimized production systems.


  1. © 2019 CEVA Inc. Deploying Visual SLAM into Low-Power Devices. Ben Weiss, CEVA Inc., May 2019
  2. Agenda. In this talk, we’ll discuss: • Visual SLAM introduction • Real-time implementation challenges • How to overcome these challenges using the CEVA-SLAM SDK, powered by the CEVA-XM vision DSP and the NeuPro AI processor
  3. CEVA Intro: Signal Processing and AI Silicon IP. We innovate and license technologies to connect sensors and process data on devices. • Audio DSPs and voice software for any voice-enabled device • Imaging, computer vision, deep learning • Wi-Fi 802.11n/ac/ax: AP & client • Bluetooth 5: dual mode & BLE • Handsets, base stations, IoT • AI processors scaling from IoT to automotive
  4. Introduction
  5. Why Do We Need SLAM? • AR/VR and indoor navigation require accurate device position and orientation • GPS and inertial measurement units (IMUs) cannot meet this requirement • For truly immersive AR experiences or truly autonomous devices, we need something else: SLAM is the solution
  6. What Is SLAM? Simultaneous Localization and Mapping: the process of determining the position and orientation of a sensor with respect to its surroundings while simultaneously mapping the environment around that sensor.
  7. SLAM Productization Challenges • Leading SLAM solutions are designed for CPUs • SLAM processing imposes a high computational load • This leads to high power consumption (battery drain, short user experience) and low frame rates (lower accuracy, poor user experience). Solution: offload SLAM processing from the main CPU to an efficient DSP!
  8. Feature-based Visual SLAM
  9. Short Overview • Tracking: camera motion estimation; real-time constraints, high FPS, mostly fixed-point • Mapping: estimates the 3D positions of feature points; semi-real-time, mostly floating-point • Loop closure: global camera trajectory optimization; extremely high computational load, mostly floating-point
  10. Building Blocks. SLAM modules share similar operational building blocks, such as image processing: • Image pyramid • Feature detection (FAST9, DoG) • Feature descriptors (ORB, FREAK) • Descriptor matching • Dominant data type: 8-bit fixed-point
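FAST9, listed above, classifies a pixel as a corner when at least 9 contiguous pixels on a 16-pixel circle of radius 3 are all brighter than the center plus a threshold, or all darker than the center minus it. A minimal pure-Python sketch of that segment test follows. It is not CEVA's implementation, and it omits the non-maximum suppression (NMS) and the speed-ups (early rejection tests, SIMD) a real detector would use; the function name is illustrative.

```python
# Offsets (dx, dy) of the 16 pixels on a Bresenham circle of radius 3,
# listed clockwise starting from the pixel directly above the center.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]


def is_fast9_corner(img, x, y, t):
    """Return True if pixel (x, y) passes the FAST-9 segment test.

    img is a list of rows of 8-bit intensities, (x, y) must be at least
    3 pixels away from every border, and t is the intensity threshold.
    """
    p = img[y][x]
    # Classify each circle pixel: +1 brighter, -1 darker, 0 similar.
    states = []
    for dx, dy in CIRCLE:
        v = img[y + dy][x + dx]
        states.append(1 if v > p + t else (-1 if v < p - t else 0))
    # Look for 9 contiguous identical non-zero states. The circle wraps
    # around, so scanning the sequence twice catches runs across index 0.
    for target in (1, -1):
        run = 0
        for s in states + states:
            run = run + 1 if s == target else 0
            if run >= 9:
                return True
    return False
```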
  11. Building Blocks: Math Operations • Linear algebra, matrix manipulation • Linear equation solving • Dominant data type: floating-point
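As a hedged illustration of the "linear equation solving" block, here is textbook Gaussian elimination with partial pivoting on a small dense system, the kind of floating-point work that dominates the math blocks. A production SLAM solver would more likely exploit problem structure (for example, Cholesky factorization of symmetric normal equations), and the exact-zero singularity check is a simplification; the function name is illustrative.

```python
def solve(A, b):
    """Solve A x = b for a square, non-singular A by Gaussian elimination
    with partial pivoting. A and b are copied, not modified."""
    n = len(A)
    # Build the augmented matrix [A | b] in floating point.
    M = [list(map(float, row)) + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        # Partial pivoting: bring the row with the largest |value| in this
        # column onto the diagonal to limit round-off growth.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if M[pivot][col] == 0.0:  # simplified (exact-zero) singularity test
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / M[r][r]
    return x
```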
  12. Computer Vision DSP Advantages and Challenges
  13. Advantages • Strong ALU • Powerful MAC capabilities • Powerful floating-point capabilities • High-throughput memory access • Dedicated vision instructions • Efficient, low power consumption. Result: very fast and efficient processing
  14. Challenges. Typical processing challenges in SLAM building blocks: • Small-patch processing, such as FREAK descriptors • Non-consecutive memory accesses • Sparse matrix manipulation
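Sparse matrix manipulation is a good example of non-consecutive memory access. In the common compressed sparse row (CSR) format, each multiply reads the input vector through an index array, so the access pattern depends on the data rather than on a fixed stride. The sketch below is a plain CSR matrix-vector product, shown only to make that access pattern concrete; it does not describe how any CEVA library stores sparse matrices.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a sparse A stored in CSR form.

    values[k]  : k-th non-zero value
    col_idx[k] : column index of values[k]
    row_ptr[i] : position in values where row i starts (len = rows + 1)
    """
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # x[col_idx[k]] is an indirect, data-dependent load: consecutive
            # k values can touch arbitrary, non-adjacent entries of x.
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y
```

For example, the 3x4 matrix with rows `[4, 0, 0, 1]`, `[0, 0, 2, 0]`, `[3, 0, 0, 0]` is stored as `values=[4, 1, 2, 3]`, `col_idx=[0, 3, 2, 0]`, `row_ptr=[0, 2, 3, 4]`.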
  15. Challenges • Efficient processing can’t rely on a data cache • Using local data memory is more efficient: • Zero-delay memory access • Fine-tuned data prefetching (via double buffering) is critical • But limited local memory also leads to: • Image partitioning (tiling) • DMA configuration and monitoring • Processing overhead • Increased DDR traffic
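The tiling trade-off can be made concrete with a small example. The sketch below computes a 3x3 box filter tile by tile, copying each tile plus a one-pixel halo into a local buffer (standing in for a DMA transfer into local memory), and the result matches processing the whole image at once. Halo pixels are read more than once, which is one source of the processing overhead and extra DDR traffic listed above. The filter, tile sizes, and function names are illustrative assumptions, and double buffering (fetching the next tile while the current one is processed) is not modeled.

```python
def box3(img):
    """3x3 box sum with edge pixels clamped (reference, whole image)."""
    h, w = len(img), len(img[0])

    def px(y, x):
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    return [[sum(px(y + dy, x + dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1))
             for x in range(w)] for y in range(h)]


def box3_tiled(img, tile_h, tile_w, halo=1):
    """Same filter computed tile by tile: copy each tile plus a halo into a
    local buffer, filter it, and write back only the tile's own pixels."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for ty in range(0, h, tile_h):
        for tx in range(0, w, tile_w):
            # "DMA in": tile plus halo, cut off at the image borders so that
            # clamping inside box3 matches clamping on the full image.
            y0, y1 = max(ty - halo, 0), min(ty + tile_h + halo, h)
            x0, x1 = max(tx - halo, 0), min(tx + tile_w + halo, w)
            local = [row[x0:x1] for row in img[y0:y1]]
            filtered = box3(local)
            # "DMA out": keep only the pixels that belong to this tile.
            for y in range(ty, min(ty + tile_h, h)):
                for x in range(tx, min(tx + tile_w, w)):
                    out[y][x] = filtered[y - y0][x - x0]
    return out
```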
  16. Challenges. Processing modules must integrate with existing applications: • Memory sharing with the main CPU, which usually uses virtual memory • Data collection, packing, and transfer • Control, synchronization, and monitoring between CPU and DSP
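Data packing is easiest to see with an example. The sketch below packs keypoints into one contiguous, fixed-layout buffer so they could be moved in a single bulk transfer rather than many small ones. The record layout (float32 x/y, a uint8 pyramid level, 3 padding bytes, and a 32-byte ORB descriptor) is a hypothetical choice for illustration, not a CEVA-SLAM SDK format.

```python
import struct

# Hypothetical record layout for one keypoint: x, y as float32, pyramid level
# as uint8, 3 padding bytes, then a 32-byte (256-bit) ORB descriptor.
# "<" means little-endian with no implicit alignment padding.
KEYPOINT_FMT = "<ffB3x32s"
KEYPOINT_SIZE = struct.calcsize(KEYPOINT_FMT)  # 4 + 4 + 1 + 3 + 32 = 44 bytes


def pack_keypoints(keypoints):
    """Pack (x, y, level, descriptor_bytes) tuples into one contiguous buffer."""
    buf = bytearray(KEYPOINT_SIZE * len(keypoints))
    for i, (x, y, level, desc) in enumerate(keypoints):
        struct.pack_into(KEYPOINT_FMT, buf, i * KEYPOINT_SIZE, x, y, level, desc)
    return bytes(buf)


def unpack_keypoints(buf):
    """Inverse of pack_keypoints, as the receiving side would do it."""
    return [struct.unpack_from(KEYPOINT_FMT, buf, off)
            for off in range(0, len(buf), KEYPOINT_SIZE)]
```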
  17. Challenges: Summary • Efficient processing is not trivial • Integration with the application requires specific handling. So how can we harness the advantages of a vision DSP? We need a software framework that overcomes these challenges. Let’s see how this can be done.
  18. CEVA-SLAM SDK
  19. SLAM Framework. A SLAM acceleration framework requires: • Powerful hardware • Efficient software • Easy integration. The CEVA-SLAM SDK is optimized for CEVA-XM vision DSPs and NeuPro AI processors.
  20. Embedded SLAM Acceleration: Powerful Vision DSP • CEVA-XM processor family • 128 16-bit MACs/cycle • 32 floating-point operations/cycle • 512-bit memory access/cycle • Flexible random memory access: 32 addresses/cycle • Flexible image DMA • Dedicated vision instruction set
  21. Optimized Embedded Algorithms • Image pyramid • Feature detection • Feature descriptor • Feature matching (projection, grid search) • Linear equation solving • Matrix manipulation
  22. Optimized Embedded Algorithms (cont.) Feature matching: 1. Transform points 2. Select relevant candidates 3. Calculate the matching score (Hamming distance) 4. Select the best match. Typically, comparing 200 features against 2,000 candidate matches requires 200 × 2,000 = 400,000 match operations!
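The four matching steps can be sketched for the common case of matching projected map points against keypoints in the current frame. This is a hedged, brute-force illustration: the pinhole projection, the 15-pixel search radius, the 50-bit distance threshold, and all function names are assumptions, and descriptors are stored as Python integers. Its inner loop still visits every keypoint for every map point, the pattern behind the 200 × 2,000 operation count above.

```python
def project(point, R, t, fx, fy, cx, cy):
    """Pinhole projection of a 3D world point with camera pose (R, t)."""
    # Transform into the camera frame: Xc = R @ Xw + t.
    xc, yc, zc = (sum(R[i][j] * point[j] for j in range(3)) + t[i]
                  for i in range(3))
    if zc <= 0:
        return None  # point is behind the camera
    return (fx * xc / zc + cx, fy * yc / zc + cy)


def hamming(a, b):
    """Hamming distance between two descriptors stored as Python ints."""
    return bin(a ^ b).count("1")


def match_map_points(map_points, keypoints, pose, intrinsics,
                     radius=15.0, max_dist=50):
    """map_points: list of ((X, Y, Z), descriptor_int)
    keypoints:  list of ((u, v), descriptor_int) detected in the frame
    Returns {map_index: keypoint_index} for the accepted matches."""
    R, t = pose
    matches = {}
    for mi, (xyz, mdesc) in enumerate(map_points):
        # 1. Transform: predict where the map point should appear.
        uv = project(xyz, R, t, *intrinsics)
        if uv is None:
            continue
        best, best_d = None, max_dist + 1
        for ki, ((u, v), kdesc) in enumerate(keypoints):
            # 2. Select: only keypoints inside the search radius.
            if (u - uv[0]) ** 2 + (v - uv[1]) ** 2 > radius ** 2:
                continue
            # 3. Score with the Hamming distance.
            d = hamming(mdesc, kdesc)
            # 4. Keep the best-scoring candidate.
            if d < best_d:
                best, best_d = ki, d
        if best is not None:
            matches[mi] = best
    return matches
```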
  23. Optimized Embedded Algorithms (cont.) That’s a very intensive process. What can be done? 1. Focus on relevant data: a simple solution requires much redundant processing and data traffic to DDR. Taking only the relevant data for each candidate reduces the cost by 8-10x; smart data prefetch selects and fetches only the relevant candidates for each descriptor. 2. Accelerate processing using the DSP’s SIMD capabilities.
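One way to "focus on relevant data" is to bucket keypoints into a coarse image grid, so that a search only touches the cells overlapping its window, and then copy just those candidates' descriptors into a contiguous local buffer. The sketch below shows that idea with a hypothetical 16-pixel cell size and illustrative function names; it demonstrates the technique only and is not meant to reproduce the 8-10x figure quoted on the slide.

```python
from collections import defaultdict


def build_grid(keypoints, cell):
    """Bucket keypoint indices by the grid cell containing (u, v)."""
    grid = defaultdict(list)
    for i, (u, v) in enumerate(keypoints):
        grid[(int(u // cell), int(v // cell))].append(i)
    return grid


def candidates_near(grid, keypoints, u, v, radius, cell):
    """Indices of keypoints within `radius` of (u, v), visiting only the
    grid cells that overlap the square search window."""
    out = []
    for gx in range(int((u - radius) // cell), int((u + radius) // cell) + 1):
        for gy in range(int((v - radius) // cell), int((v + radius) // cell) + 1):
            for i in grid.get((gx, gy), ()):
                ku, kv = keypoints[i]
                if (ku - u) ** 2 + (kv - v) ** 2 <= radius ** 2:
                    out.append(i)
    return sorted(out)


def prefetch_descriptors(descriptors, indices):
    """Copy only the selected descriptors into one contiguous local list
    (a stand-in for fetching just the relevant data from DDR)."""
    return [descriptors[i] for i in indices]
```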
  24. Optimized Embedded Algorithms (cont.) Now let’s take a deeper look at the Hamming distance calculation. Processing consists of the Hamming distance calculation itself plus control/management code (candidate selection, sorting, etc.). The CEVA-XM6 has an efficient Hamming distance instruction that calculates a 256-bit Hamming value in half a cycle. [Timeline diagrams: vector processing followed by result management; Hamming instruction followed by result management]
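What a 256-bit Hamming distance computes can be written out as XOR followed by a population count. The sketch below does it in four 64-bit steps on 32-byte descriptors (the size of an ORB descriptor); the slide's dedicated instruction performs the equivalent of the whole loop at once. The function name is illustrative.

```python
def hamming256(a: bytes, b: bytes) -> int:
    """Hamming distance between two 256-bit (32-byte) descriptors,
    computed as four 64-bit XOR + popcount steps."""
    if len(a) != 32 or len(b) != 32:
        raise ValueError("descriptors must be exactly 32 bytes")
    dist = 0
    for off in range(0, 32, 8):
        wa = int.from_bytes(a[off:off + 8], "little")
        wb = int.from_bytes(b[off:off + 8], "little")
        dist += bin(wa ^ wb).count("1")  # popcount of the differing bits
    return dist
```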
  25. Optimized Embedded Algorithms (cont.) The powerful vector capabilities are underutilized. Solution: extend the vector processing period by processing multiple descriptors in parallel, 16 or 32 descriptor candidates at the same time! [Timeline diagrams: vector processing vs. result management]
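The idea of processing many candidates per vector step can be sketched by splitting the work into a branch-free scoring phase over a block of candidates and a single result-management step per block. Python does not execute the block in parallel; the 16-lane block width is an assumption chosen to mirror the slide, and the point is the structure, in which bookkeeping runs once per block rather than once per candidate.

```python
LANES = 16  # hypothetical vector width: candidates scored per "vector" step


def best_match_blocked(query, candidates, lanes=LANES):
    """Return (index, distance) of the candidate closest to `query`,
    or (-1, None) if there are no candidates. Descriptors are Python ints."""
    best_i, best_d = -1, None
    for start in range(0, len(candidates), lanes):
        block = candidates[start:start + lanes]
        # "Vector" phase: one distance per lane, no per-candidate branching.
        dists = [bin(query ^ c).count("1") for c in block]
        # "Result management" phase: one reduction per block. Strict "<"
        # keeps the earliest candidate on ties.
        d = min(dists)
        if best_d is None or d < best_d:
            best_i, best_d = start + dists.index(d), d
    return best_i, best_d
```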
  26. Optimized Embedded Algorithms (cont.) We now have efficient parallel value processing. But how can we access multiple descriptors located at random locations in parallel? The CEVA-XM6 includes a unique load/store mechanism that accesses 32 different addresses in parallel in a single cycle, which makes this possible. [Diagram: scattered features 1-5 in memory loaded into vector register lanes b0-b7]
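A gather load fills each vector lane from its own address, and a scatter store is its inverse. The sketch below only emulates those semantics sequentially, with a hypothetical 32-lane limit matching the "32 addresses/cycle" figure; it says nothing about how the CEVA-XM6 load/store unit is built, and the function names are illustrative.

```python
LANES = 32  # hypothetical vector width, matching "32 addresses/cycle"


def gather(memory, addresses):
    """Emulated gather load: lane i receives memory[addresses[i]]."""
    if len(addresses) > LANES:
        raise ValueError("more addresses than vector lanes")
    return [memory[a] for a in addresses]


def scatter(memory, addresses, values):
    """Emulated scatter store: memory[addresses[i]] = values[i] per lane.
    If two lanes share an address, the later lane wins in this emulation;
    real hardware may define that case differently."""
    if len(addresses) > LANES or len(addresses) != len(values):
        raise ValueError("lane count mismatch")
    for a, v in zip(addresses, values):
        memory[a] = v
```

For example, `gather(descriptor_table, candidate_indices)` would pull up to 32 scattered descriptors into one "register" in a single call.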
  27. Optimized Embedded Algorithms. Tracking performance example: 60 frames/sec consumes only 86 mW, using only a fraction of the DSP’s processing capacity. • Image pyramid: 8 levels, 1.2 scale ratio • FAST9: 8 levels, NMS, about 1,100 keypoints • ORB extraction: 200 features • ORB matching: radius search, 200 vs. 2,000 candidates. Frame size: 1280x720; DSP: CEVA-XM6, TSMC 16nm
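The pyramid parameters above imply a concrete amount of work per frame. Assuming level k is the 1280x720 frame scaled by 1/1.2^k and rounded to the nearest pixel (the rounding rule is an assumption), the eight levels total about 2.85 million pixels, roughly 3.1 times the full-resolution frame, with the smallest level at 357x201. The quoted 86 mW at 60 frames/sec also works out to about 1.43 mJ per frame.

```python
def pyramid_sizes(width, height, levels=8, scale=1.2):
    """(width, height) of each pyramid level; level k is scaled by
    1/scale**k and rounded to the nearest pixel (assumed rounding rule)."""
    return [(round(width / scale ** k), round(height / scale ** k))
            for k in range(levels)]


sizes = pyramid_sizes(1280, 720)              # parameters from the slide
total = sum(w * h for w, h in sizes)          # pixels processed per frame
energy_per_frame_mj = 86 / 60                 # 86 mW / 60 fps, in mJ/frame
```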
  28. Integration Framework • Easy-integration API, based on common APIs • CPU/DSP communication framework • Synchronization and monitoring • Task scheduling • Data collection, packing, and transfer. All tuned to work together. [Block diagram: CPU SLAM blocks (tracking, mapping, loop closure) connected through the SLAM interface to the DSP SLAM blocks and the SLAM manager; legend: provided by CEVA]
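The CPU/DSP split can be pictured as a task queue with a synchronization point. In the sketch below, a Python thread stands in for the DSP; the CPU side submits jobs and then blocks until all results arrive, and failures are passed back as exception objects instead of leaving the caller waiting forever. None of these names are CEVA-SLAM SDK APIs; this is a generic offload pattern under stated assumptions.

```python
import queue
import threading


def dsp_worker(tasks, results):
    """Hypothetical stand-in for the DSP side: take tasks, run them, post
    results. A None task is the shutdown signal."""
    while True:
        task = tasks.get()
        if task is None:
            break
        task_id, func, args = task
        try:
            value = func(*args)
        except Exception as exc:  # report failures instead of dying silently
            value = exc
        results.put((task_id, value))


def run_offloaded(jobs):
    """Hypothetical CPU side: submit a list of (func, args) jobs, then block
    until every result has arrived. Results keep submission order."""
    tasks, results = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=dsp_worker, args=(tasks, results))
    worker.start()
    for task_id, (func, args) in enumerate(jobs):
        tasks.put((task_id, func, args))
    tasks.put(None)  # tell the worker no more work is coming
    collected = {}
    for _ in range(len(jobs)):
        task_id, value = results.get()  # synchronization point
        collected[task_id] = value
    worker.join()
    return [collected[i] for i in range(len(jobs))]
```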
  29. Summary • Powerful and efficient hardware • Optimized and mature software functions • Easy-to-integrate framework: short time to market. For more information, visit CEVA’s booth.
  30. Resources • General: Visual SLAM algorithms: a survey from 2010 to 2016, 0.1186/s41074-017-0027-2 • CEVA materials: CEVA SLAM & ADK Overview, https://www.ceva-
  31. Thank You. Any Questions?