"Fast Deployment of Low-power Deep Learning on CEVA Vision Processors," a Presentation from CEVA


For the full video of this presentation, please visit:

For more information about embedded vision, please visit:

Yair Siegel, Director of Segment Marketing at CEVA, presents the "Fast Deployment of Low-power Deep Learning on CEVA Vision Processors" tutorial at the May 2016 Embedded Vision Summit.

Image recognition capabilities enabled by deep learning are benefiting more and more applications, including automotive safety, surveillance and drones. This is driving a shift towards running neural networks inside embedded devices, but squeezing deep learning into resource-limited devices poses numerous challenges. This presentation details a fast path for taking a neural network from research to an embedded implementation on a CEVA vision processor core, using CEVA's neural network software framework. Siegel explains how the CEVA framework integrates with existing deep learning development environments such as Caffe, and how it can be used to create low-power embedded systems with neural network capabilities.

Published in: Technology

  1. Fast Deployment of Low-power Deep Learning on CEVA Vision Processors. Yair Siegel, May 3, 2016. Copyright © 2016 CEVA.
  2. CEVA: the leading licensor of ultra-low-power signal processing IP for embedded devices
     • Imaging & Vision
     • Audio, Voice, Sensing
     • Connectivity
     • Communication
     >7 billion CEVA-powered devices shipped worldwide
  3. Scope
     • CEVA Deep Neural Network (CDNN) software framework
     • Accelerates machine learning deployment for embedded systems
     • Utilizes the CEVA-XM4 imaging & vision DSP
     • Targeted at object recognition and vision analytics
     • Automatic conversion from offline neural networks to real-time networks
     Headline figures: 30x lower power*, 15x lower memory bandwidth**, 30% faster processing*
     * vs. GPU-based systems; ** vs. typical implementation
  4. Presentation Outline
     1. Backgrounder
     2. CEVA Deep Neural Networks Introduction
     3. Neural Networks Development Flow
     4. AlexNet Example
     5. Summary
  5. CEVA in the Vision Space: Enabling Intelligent Vision Processing
     Application areas: 3D vision, computational photography, visual perception
     • Image signal processor (ISP)*, image registration, depth map generation, point cloud processing, 3D scanning, 3D content creation
     • Refocus image, video stabilization, low-light image enhancement, zoom, super-resolution, background removal, HDR
     • Deep learning (CNN, DNN); object detection, recognition & tracking; augmented reality (AR); natural user interface (NUI); context-aware algorithms; biometric authentication
     (Diagram labels: Left Image, Right Image, Depth Data; Images, Data, Encode*)
     * These are most appropriately implemented by external HW accelerators
  6. CEVA-XM4™ Imaging & Vision DSP
     • 4th-generation imaging and vision processor IP
     • Brings embedded systems closer to human vision and visual perception
     • Vector-type processor; combines fixed- and floating-point math; up to 4096-bit processing per cycle
     • Includes vision processor, libraries, tools and applications (CEVA, SW partners, service experts)
     • Mature: 10+ design wins, silicon available in Q2/2016
     • CNN-based algorithms combined with traditional algorithms
  7. Neural Networks Basics
     • The human brain is based on neural networks, used for any cognitive processing: visual, audio, other senses
     • Networks develop over time as data is collected and analyzed
     • "Training" phase: learning new types from examples
     • "The hunt" to mimic human perception in computers
     • Limiters: horsepower, efficient engine, algorithmic quality
     • Big progress here recently
     (Diagram labels: Input Layer, Hidden Layers, Output Layer; Neurons; Connections, Weights)
     "...a computing system made up of a number of simple, highly interconnected processing elements, which process information by their dynamic state response to external inputs."*
     * "Neural Network Primer: Part I" by Maureen Caudill, AI Expert, Feb. 1989
     High time for neural networks in embedded systems
  8. Deep Learning Neural Networks
     • Deep learning
        • Family of neural network methods with a high number of layers (hence "deep")
     • Convolutional Neural Networks (CNN)
        • Most popular deep learning neural network method
     • Benefits
        1. Best recognition quality (vs. alternatives)
        2. Re-trainable without code changes (implement once, use many times)
     • Caffe, a deep learning framework
        • Popular open source software framework, used to build, train, and activate neural networks
        • Targets expression, speed, and modularity
     Applications: Object Recognition, Driver Assistance (ADAS), Vision Analytics, Artificial Intelligence (AI), Augmented Reality / Virtual Reality
  9. Neural Network Embedded Challenges
     • Computation intensive
        • 1 Meg-Ops/layer (typical)
        • Training in floating point; limited performance in embedded
     • High memory bandwidth
        • Between layers, and fetching weights for layers
        • Example: AlexNet has 12MB in layers and 243MB of weights in FP
     • Multi-ROI processing using the same network
     • Evolving, TTM
        • Ability to modify the network and change its characteristics quickly
     All of the above in a cost- and energy-efficient form factor: a must-have for mass market adoption
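The 243MB weight figure on slide 9 can be sanity-checked with a little arithmetic. The sketch below assumes the layer shapes of the standard Caffe AlexNet model (the slides do not list them) and 4 bytes per weight for 32-bit floating point; the names `CONV_LAYERS`, `FC_LAYERS`, and `count_parameters` are illustrative only.

```python
# Assumed layer shapes for the standard Caffe AlexNet; not taken from the slides.
# Convolution entry: (name, output channels, input channels per group, kernel h, kernel w)
CONV_LAYERS = [
    ("conv1", 96, 3, 11, 11),
    ("conv2", 256, 48, 5, 5),    # 2 groups, so each filter sees 96 / 2 input channels
    ("conv3", 384, 256, 3, 3),
    ("conv4", 384, 192, 3, 3),   # 2 groups
    ("conv5", 256, 192, 3, 3),   # 2 groups
]
# Fully connected entry: (name, outputs, inputs)
FC_LAYERS = [
    ("fc6", 4096, 9216),         # 256 channels * 6 * 6 spatial positions
    ("fc7", 4096, 4096),
    ("fc8", 1000, 4096),
]


def count_parameters():
    """Return a dict of layer name -> number of weights plus biases."""
    counts = {}
    for name, out_c, in_c, kh, kw in CONV_LAYERS:
        counts[name] = out_c * in_c * kh * kw + out_c
    for name, out_f, in_f in FC_LAYERS:
        counts[name] = out_f * in_f + out_f
    return counts


counts = count_parameters()
total_params = sum(counts.values())
fp32_bytes = total_params * 4  # 4 bytes per 32-bit float
fc_share = sum(counts[n] for n in ("fc6", "fc7", "fc8")) / total_params

print(f"parameters: {total_params:,}")
print(f"FP32 size:  {fp32_bytes / 1e6:.1f} MB")
print(f"share held by fully connected layers: {fc_share:.1%}")
```

Under these assumptions the total is just under 244 million bytes, in line with the slide's 243MB, and almost all of it sits in the fully connected layers, which is why weight fetching dominates the bandwidth problem.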
  10. CEVA Deep Neural Network Flow with Caffe
  11. CEVA Deep Neural Network (CDNN) Features
     • CEVA Network Generator (offline)
        • Auto-converts for power efficiency
        • Floating- to fixed-point conversion
        • Adapts for embedded constraints
        • Keeps high accuracy (1% deviation)
     • Real-time Neural Network Libraries
        • Real-time algorithm development and deployment
        • Optimized for the CEVA-XM4 vision DSP
        • Any network portion/layer
        • Fixed or variable input sizes
        • On-the-fly bandwidth optimizations
     CDNN deliverables include real-time example models for image classification, localization, and object detection
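Slide 11 lists floating- to fixed-point conversion without describing how the CEVA Network Generator performs it. As a generic illustration only (not CEVA's method), the sketch below quantizes values to a signed 16-bit format with 8 fractional bits and evaluates a dot product with integer multiplies. The function names and the chosen bit widths are assumptions made for this example.

```python
def quantize(values, frac_bits=8, total_bits=16):
    """Convert floats to signed fixed-point integers, saturating on overflow."""
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return [max(lo, min(hi, round(v * scale))) for v in values]


def dequantize(fixed, frac_bits=8):
    """Convert fixed-point integers back to floats."""
    scale = 1 << frac_bits
    return [q / scale for q in fixed]


def dot_fixed(q_weights, q_inputs, frac_bits=8):
    """Integer multiply-accumulate; each product carries 2 * frac_bits fractional bits."""
    acc = sum(w * x for w, x in zip(q_weights, q_inputs))
    return acc / (1 << (2 * frac_bits))


weights = [0.5, -0.25, 0.125]
inputs = [1.0, 2.0, -4.0]

exact = sum(w * x for w, x in zip(weights, inputs))
fixed = dot_fixed(quantize(weights), quantize(inputs))
print(exact, fixed)

# A value that is not exactly representable picks up a small rounding error,
# bounded by half of one quantization step (1 / 2**(frac_bits + 1)).
error = abs(dequantize(quantize([0.1]))[0] - 0.1)
print(error)
```

The trade-off is the usual one: fewer fractional bits mean smaller weights and cheaper arithmetic but larger rounding error, so a real converter would choose the format per layer to keep accuracy within a target.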
  12. Real-Time CDNN Application Flow
  13. Simplified Developer Flow via CDNN
     Example application steps to run on the device using CDNN:
     a. Create CDNN CEVA handle: CDNNCreate()
     b. Create network model (based on CDNN conversion tool outputs): CDNNCreateNetwork()
     c. Initialize CDNN library (create a network and a memory database): CDNNInitialize()
     d. Execute the network (no need for re-initialization): CDNNNetworkClassify()
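The four steps on slide 13 follow a create, load, initialize once, classify many times pattern. The slides give only the function names, not their arguments or return values, so the sketch below does not call CDNN at all: `MockCdnnSession` and its methods are hypothetical stand-ins written for this example, and the "network" is a toy Python callable.

```python
class MockCdnnSession:
    """Hypothetical stand-in for a CDNN handle; NOT the real CDNN API."""

    def __init__(self):
        # Stands in for step a, CDNNCreate(): obtain a handle.
        self.network = None
        self.initialized = False
        self.init_count = 0

    def create_network(self, model):
        # Stands in for step b, CDNNCreateNetwork(): attach the converted model.
        self.network = model

    def initialize(self):
        # Stands in for step c, CDNNInitialize(): one-time setup.
        if self.network is None:
            raise RuntimeError("create_network() must be called first")
        self.initialized = True
        self.init_count += 1

    def classify(self, image):
        # Stands in for step d, CDNNNetworkClassify(): run inference, return top class index.
        if not self.initialized:
            raise RuntimeError("initialize() must be called first")
        scores = self.network(image)
        return max(range(len(scores)), key=scores.__getitem__)


def toy_model(image):
    """Toy two-class 'network': class 0 if the pixel sum is positive, else class 1."""
    total = sum(image)
    return [total, -total]


session = MockCdnnSession()          # step a
session.create_network(toy_model)    # step b
session.initialize()                 # step c, done once

frames = [[1, 2], [-3, 1], [0, 5]]
results = [session.classify(frame) for frame in frames]  # step d, repeated per frame
print(results, session.init_count)
```

The point of the pattern is that the costly setup in step c happens once, while step d runs for every frame without re-initialization, as the slide notes.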
  14. Neural Networks on CEVA-XM4
     • Reducing bandwidth
        • Compress via prior knowledge
        • Reduce network redundancies
        • E.g., AlexNet fully connected -> 6MB
        • Data reused on entry point
     • Programmability & time-to-market
        • Flexible solution supporting any network
        • Quick turn-around time via port automation
     • Performance optimization
        • Maximize MAC utilization
        • Combine small maps
        • Use fixed-point for higher performance
        • Utilize dedicated instructions
        • Parallel scatter-gather for activation layer
  15. Example CNN: AlexNet
     • Example based on the Caffe open source implementation for CNN
     Classification probabilities:
        Object               AlexNet on PC (floating point)   AlexNet on XM4 (fixed point)
        Labrador retriever   90.44%                           91.01%
        Golden retriever     4.45%                            3.98%
        Beagle               0.21%                            0.18%
        Kuvasz               0.12%                            0.10%
     Difference between the PC and XM4 probabilities: <1%
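The "<1%" annotation on slide 15 can be checked directly from the four rows of the table. The sketch below uses only the probabilities shown on the slide; the variable names are illustrative.

```python
# (floating-point probability on PC, fixed-point probability on XM4), in percent,
# copied from the slide's table.
RESULTS = {
    "Labrador retriever": (90.44, 91.01),
    "Golden retriever": (4.45, 3.98),
    "Beagle": (0.21, 0.18),
    "Kuvasz": (0.12, 0.10),
}

# Absolute difference in percentage points, rounded to the table's precision.
deviations = {
    name: round(abs(pc - xm4), 2) for name, (pc, xm4) in RESULTS.items()
}
max_deviation = max(deviations.values())

# The top-ranked class should be the same in both implementations.
top1_pc = max(RESULTS, key=lambda name: RESULTS[name][0])
top1_xm4 = max(RESULTS, key=lambda name: RESULTS[name][1])

for name, dev in deviations.items():
    print(f"{name}: {dev:.2f} percentage points")
print("largest deviation:", max_deviation)
print("same top-1 class:", top1_pc == top1_xm4)
```

For these four classes the largest gap is 0.57 percentage points and the ranking is unchanged, which is consistent with the slide's claim for this example.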
  16. CEVA-XM4 CDNN Development Platform
     (Diagram labels: XM4 FPGA; PCIe; i.MX6 host running Linux applications)
  17. CEVA-XM4 CDNN Demo
     • Live AlexNet object recognition (come visit our booth!)
     • Enables milliwatt products vs. watts on GPU
     (Diagram labels: Webcam FHD; FHD to 224x224 conversion; Input Images; iMX6 (Host); CEVA Host Link; PCIe; XM4 FPGA; CEVA Link; CDNN Engine; Data TCM; Code TCM; Code Cache; Shared Memory; DMA; DDR; HDMI; JBOX; USB; Daisy; PC Debugger)
  18. CEVA Deep Neural Network (CDNN) Summary
     • SW framework for real-time, efficient object recognition & vision analytics
     • Accelerates deep learning application deployment
     • Harnesses the CEVA-XM4 imaging & vision DSP
     • Lowest power & memory bandwidth solution
     • Enables real-time classification with pre-trained networks
        1. Receives network model & weights as input (via Caffe)
        2. Automatically converts to a real-time network, via the CEVA Network Generator
        3. Utilizes real-time network models in CNN applications on CEVA-XM4
  19. Backup Material
  20. CEVA: the leading licensor of ultra-low-power signal processing IP for embedded devices
     • More than 300 licensees to date
     • >7 billion CEVA-powered devices shipped worldwide to date
     • 100 licensees of Wi-Fi & Bluetooth IP, and more than 1 billion chips shipped
     • 3X the market share in DSP of any other DSP IP vendor
     • 1 in 3 handsets worldwide is powered by a CEVA DSP
     • 5 billion DSP cores in audio/voice devices shipped to date
     • >20 licensees for imaging and vision, shipping for the first time in 2016
  21. CEVA-XM4 Imaging & Vision IP Platform
     • Hardware layer: CEVA-XM4 DSP core; Hardware Development Kit
     • Software layer: App Dev. Kit (ADK); SW toolset; RTOS; CEVA-CV libraries; CEVA CNN Framework (CDNN); Android Framework (AMF); Host CV / OpenVX API; CPU-DSP Link (communication layer)
     • CEVA software products: Digital Video Stabilizer (DVS); Super-Resolution (SR)
     • Partner software products: Face Detection & Recognition; Universal Object Recognition; Pedestrian Detection; ADAS Algorithms (FCW, LDW); 3D Depth Map Creation
     (Other diagram labels: Auto system handle; Provides OEM differentiation; CPU offload; Source code provided)