"Deploying Deep Learning Models on Embedded Processors for Autonomous Systems with MATLAB," a Presentation from MathWorks

For the full video of this presentation, please visit:
https://www.embedded-vision.com/platinum-members/mathworks/embedded-vision-training/videos/pages/may-2019-embedded-vision-summit-hiremath-chou

For more information about embedded vision, please visit:
http://www.embedded-vision.com

Sandeep Hiremath, Product Manager, and Bill Chou, Senior Computer Vision Scientist, both of MathWorks, present the "Deploying Deep Learning Models on Embedded Processors for Autonomous Systems with MATLAB" tutorial at the May 2019 Embedded Vision Summit.

In this presentation, Hiremath and Chou explain how to deploy deep neural networks to memory- and power-constrained devices such as those used in robotics and automated driving. The workflow starts with algorithm design in MATLAB, which is widely used by engineers and scientists for its expressive power and ease of use. The algorithm may combine deep learning networks with traditional computer vision techniques, and it can be tested and verified within MATLAB.
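The example pipeline in the talk pairs a deep learning object detector with a bounding box processing stage. The slides do not say what that stage does. One traditional technique often used there is greedy non-maximum suppression (NMS), which discards lower-scoring boxes that overlap a higher-scoring one. The following is a minimal sketch in Python for illustration only, not code from the talk. It assumes boxes in [x, y, width, height] pixel format and a hypothetical overlap threshold of 0.5:

```python
from typing import List, Tuple

# Box format assumed here: (x, y, width, height) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], overlap: float = 0.5) -> List[int]:
    """Greedy non-maximum suppression; returns kept indices, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep box i only if no already-kept (higher-scoring) box overlaps it too much.
        if all(iou(boxes[i], boxes[k]) <= overlap for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(10, 10, 50, 50), (12, 12, 50, 50), (100, 100, 30, 30)]
    print(nms(boxes, [0.9, 0.8, 0.7]))  # -> [0, 2]
```

Sorting by score first means each suppression decision only has to check boxes that have already been kept.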

Next, the networks are trained using MATLAB's GPU and parallel computing support, on the desktop, on a local compute cluster, or in the cloud. In the deployment phase, code generation tools automatically generate optimized code that can target embedded GPUs such as NVIDIA Jetson and DRIVE AGX Xavier, Intel-based CPU platforms, or ARM-based embedded platforms. The generated code calls target-specific libraries that are highly optimized for each target's architecture and memory model.
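Training on multiple GPUs is commonly data-parallel. Each GPU computes gradients on its own shard of a mini-batch, and the gradients are averaged before the weights are updated. In MATLAB, this mode is selected through the ExecutionEnvironment option of trainingOptions, for example 'multi-gpu' or 'parallel'. The Python sketch below only illustrates the averaging step and is not MATLAB's implementation. It assumes a hypothetical one-parameter model y = w*x with a mean-squared-error loss:

```python
from typing import List, Sequence


def grad_mse(w: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """d/dw of mean((w*x - y)^2) for a hypothetical 1-parameter model y_hat = w*x."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def data_parallel_grad(w: float, xs: Sequence[float], ys: Sequence[float], workers: int) -> float:
    """Simulate data-parallel training: one equal shard per worker (GPU), local
    gradients computed independently, then averaged (the all-reduce step)."""
    if len(xs) % workers != 0:
        raise ValueError("mini-batch size must divide evenly across workers")
    shard = len(xs) // workers
    local: List[float] = [
        grad_mse(w, xs[i * shard:(i + 1) * shard], ys[i * shard:(i + 1) * shard])
        for i in range(workers)
    ]
    return sum(local) / workers
```

With equal-size shards, the average of the shard gradients equals the full mini-batch gradient. This is why the sketch rejects batches that do not split evenly. Unequal shards would need a size-weighted average.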

  1. © 2019 MathWorks, Inc. Deploying Deep Learning Models on Embedded Processors for Autonomous Systems with MATLAB. Bill Chou, Sandeep Hiremath, MathWorks, May 2019
  2. Autonomous Systems
  3. Autonomous Systems: Perception, Planning, Control
  4. Deep Learning for Perception in Autonomous Systems: Perception, Planning, Control (deep learning, sensor fusion, path planning, sensor models & model predictive control)
  5. Deep Learning in Automated Driving
  6. Outline: Ground Truth Labeling; Network Design and Training; C/C++ and CUDA Code Generation; Hardware Targeting (CPUs and GPUs). Key takeaways: Platform Productivity; Framework Interoperability; Optimized C/C++ and CUDA; Hardware Targeting; Processor-in-the-Loop (PIL) Testing
  7. Example Used in Today's Talk: Perception in an Autonomous Application, with stages Input, Lane Detection, Coordinate Transform, Object Detection, Bounding Box Processing, Output
  8. Outline: Ground Truth Labeling; Network Design and Training; C/C++ and CUDA Code Generation; Hardware Targeting (CPUs and GPUs)
  9. Ground Truth Labeling App
  10. Automate Labeling: Lane Markers, Vehicle Bounding Boxes
  11. Perception in an Autonomous Application: Deep Learning Models (Lane Detection, Object Detection)
  12. Importing Pre-trained Models: >> net = alexnet, or import pre-trained networks (AlexNet, ResNet-50); modify network layers; re-train the network with training data; detector object
  13. Interactive Network Design
  14. Accelerated Training: train on a single CPU, a single GPU, multiple GPUs, or cloud GPUs, then evaluate the trained network
  15. Network Evaluation
  16. Lane and Object Detectors Running in MATLAB
  17. Lane and Object Detectors Running in MATLAB
  18. Outline: Ground Truth Labeling; Network Design and Training; C/C++ and CUDA Code Generation; Hardware Targeting (CPUs and GPUs)
  19. Perception in an Autonomous Application: Input, Lane Detection, Coordinate Transform, Object Detection, Bounding Box Processing, Output
  20. Input, Logic, Output
  21. Multi-Platform Deep Learning Deployment: Data Center, Workstation, NVIDIA Jetson, NVIDIA DRIVE, Raspberry Pi
  22. Multi-Platform Deep Learning Deployment: GPU Coder targets NVIDIA GPUs; MATLAB Coder targets Intel CPUs and ARM Cortex-A CPUs
  23. Perception in an Autonomous Application: Generate Code from Non-Deep Learning Parts (Generate Optimized CUDA/C++ Code)
  24. 2200+ Functions for C/C++, 380+ Functions for CUDA: Communications Toolbox, DSP System Toolbox, Image Processing Toolbox, Computer Vision Toolbox, Signal Processing Toolbox, Sensor Fusion and Tracking Toolbox, Wavelet Toolbox, WLAN Toolbox, Phased Array System Toolbox, Statistics and Machine Learning Toolbox, Core Math, Fixed-Point Designer, Automated Driving Toolbox, Robotics System Toolbox, 5G Toolbox
  25. Mapped to Optimization Libraries: GPU Coder for NVIDIA GPUs (cuBLAS, cuFFT, cuSolver, Thrust, cuDNN, TensorRT); MATLAB Coder for Intel and ARM Cortex-A CPUs (MKL-DNN, ARM Compute Library, FFTW, BLAS, OpenCV)
  26. GPUs: Automatically Extract Parallelism from MATLAB. (1) Scalarized MATLAB ("for-all" loops) and (2) vectorized MATLAB (math operators and library functions): infer CUDA kernels from MATLAB loops. (3) Composite functions in MATLAB: library replacement (maps to cuBLAS, cuFFT, cuSolver, cuDNN, TensorRT)
  27. GPU Coder Compiler Transforms & Optimizations: Front End, Traditional Compiler Optimizations, Control-Flow Graph Intermediate Representation, CUDA Kernel Lowering, MATLAB Library Function Mapping, Parallel Loop Creation, CUDA Kernel Creation, cudaMemcpy Minimization, Shared Memory Mapping, CUDA Code Emission. Loop Optimizations: Scalarization, Loop Perfectization, Loop Interchange, Loop Fusion, Scalar Replacement
  28. Perception in an Autonomous Application: Generate Code from Deep Learning Networks. Generate Optimized Inference Code with Deep Learning Network Optimizations: Layer Fusion, Memory Optimization, Network Re-architecture
  29. Deep Learning Network Optimizations: the original network (Conv, Batch Norm, ReLU, Add, Conv, ReLU, Max Pool, Max Pool) is first reduced by layer fusion for optimized computation (Fused Conv, Fused Conv, BatchNormAdd, Max Pool, Max Pool), then by buffer minimization for optimized memory (intermediate buffers A to E, where later layers reuse buffers A and B)
  30. Supported Pretrained Networks: SegNet, ResNet-50, VGG-19, Inception-v3, SqueezeNet, VGG-16, AlexNet, GoogLeNet, ResNet-101
  31. Optimized Deep Learning Libraries & Runtimes: GPU Coder uses cuDNN and TensorRT on NVIDIA GPUs; MATLAB Coder uses MKL-DNN on Intel CPUs and the ARM Compute Library on ARM Cortex-A CPUs
  32. Example applications: Semantic Segmentation, Defective Product Detection, Blood Smear Segmentation
  33. Generating CUDA Code and Running on a Titan V GPU
  34. How is the Performance?
  35. Chart: single image inference on Titan V using cuDNN, GPU Coder (R2019a) compared with TensorFlow (1.13.0), MXNet (1.4.0), and PyTorch (1.0.0). Test setup: Intel® Xeon® CPU 3.6 GHz; NVIDIA libraries CUDA 10 and cuDNN 7
  36. TensorRT Accelerates Inference on Titan V. Chart: single image inference with ResNet-50 on Titan V using cuDNN, TensorRT (FP32), and TensorRT (INT8), GPU Coder compared with TensorFlow
  37. Chart: single image inference on CPU (Linux) for MATLAB, MATLAB Coder, TensorFlow, MXNet, and PyTorch. Test setup: Intel® Xeon® CPU 3.6 GHz; TensorFlow 1.6.0, MXNet 1.2.1, PyTorch 0.3.1
  38. Outline: Ground Truth Labeling; Network Design and Training; C/C++ and CUDA Code Generation; Hardware Targeting (CPUs and GPUs)
  39. Access Target Peripherals from MATLAB: the host machine accesses peripheral data on Jetson AGX Xavier, DRIVE AGX, and Raspberry Pi boards
  40. Deploy Application to Target Boards: generated CUDA code and generated C/C++ code are deployed from the host machine to Jetson AGX Xavier, DRIVE AGX, and Raspberry Pi
  41. Deploy Application to Jetson AGX Xavier: generated CUDA code is deployed from the host machine, and the target displays the video feed
  42. Deploy Application to Jetson AGX Xavier
  43. Processor-in-the-Loop (PIL) Testing on Hardware Boards: the host machine deploys generated CUDA code to Jetson AGX Xavier, sends inputs, and compares results through data exchange
  44. Musashi Seimitsu Industry Co., Ltd.: Detect Abnormalities in Automotive Parts. MATLAB use in project: preprocessing of captured images; image annotation for training; deep learning based analysis; various transfer learning methods (combinations of CNN models and classifiers); estimation of defect area using Class Activation Map (CAM); abnormality/defect classification; deployment to NVIDIA Jetson using GPU Coder. Automated visual inspection of 1.3 million bevel gears per month
  45. Summary: Ground Truth Labeling; Network Design and Training; C/C++ and CUDA Code Generation; Hardware Targeting (CPUs and GPUs). Key takeaways: Platform Productivity; Framework Interoperability; Optimized C/C++ and CUDA; Hardware Targeting; Processor-in-the-Loop (PIL) Testing
  46. Thank You
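Slide 26 says GPU Coder infers CUDA kernels from scalarized ("for-all") MATLAB loops. Such a loop qualifies when every iteration is independent, so each iteration can become one GPU thread. The Python sketch below is illustrative only; saxpy_loop and saxpy_grid are hypothetical names. It runs the same loop body both ways and emulates the CUDA thread index blockIdx.x * blockDim.x + threadIdx.x sequentially:

```python
from typing import List


def saxpy_loop(alpha: float, x: List[float], y: List[float]) -> List[float]:
    """Scalarized 'for-all' loop: each iteration writes only out[i] and reads
    nothing that another iteration writes, so iterations are independent."""
    out = [0.0] * len(x)
    for i in range(len(x)):
        out[i] = alpha * x[i] + y[i]
    return out


def saxpy_grid(alpha: float, x: List[float], y: List[float], block_dim: int = 4) -> List[float]:
    """Sequential emulation of a 1-D CUDA launch of the same loop body:
    one thread per iteration, indexed as blockIdx.x * blockDim.x + threadIdx.x."""
    n = len(x)
    grid_dim = (n + block_dim - 1) // block_dim  # ceil(n / block_dim) blocks
    out = [0.0] * n
    for block_idx in range(grid_dim):       # on a GPU, blocks and threads run concurrently
        for thread_idx in range(block_dim):
            i = block_idx * block_dim + thread_idx
            if i < n:                        # the last block may contain idle threads
                out[i] = alpha * x[i] + y[i]
    return out
```

The bounds check matters because the grid is rounded up to whole blocks, so the last block can contain threads with no iteration to run.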
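Two of the loop optimizations on slide 27, loop fusion and scalar replacement, can be shown on a small example. Fusion is legal here because iteration i of the second loop reads only t[i], which iteration i of the first loop produces. After fusion, the temporary array t is no longer needed and becomes a per-iteration scalar. On a GPU this would typically save one kernel launch and a round trip through global memory. The following is a minimal Python sketch with hypothetical function names, not GPU Coder output:

```python
from typing import List


def unfused(a: List[float]) -> List[float]:
    """Two separate loops (two kernels on a GPU) with a temporary array between them."""
    t = [0.0] * len(a)
    for i in range(len(a)):
        t[i] = 2.0 * a[i]
    c = [0.0] * len(a)
    for i in range(len(a)):          # reads the temporary written by the first loop
        c[i] = t[i] + 1.0
    return c


def fused(a: List[float]) -> List[float]:
    """Loop fusion: both bodies share one loop (one kernel).
    Scalar replacement: the temporary array becomes a per-iteration scalar."""
    c = [0.0] * len(a)
    for i in range(len(a)):
        t = 2.0 * a[i]
        c[i] = t + 1.0
    return c
```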
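Slides 28 and 29 list layer fusion among the network optimizations. A standard example of this technique is folding an inference-time batch normalization layer into the convolution before it. With s = gamma / sqrt(var + eps), the fused layer uses weights s*w and bias s*(b - mean) + beta. The slides do not detail which fusions GPU Coder performs. The Python sketch below therefore only demonstrates the arithmetic for a single channel of a 1-D convolution; real layers apply it per output channel:

```python
import math
from typing import List, Tuple


def conv1d(x: List[float], w: List[float], b: float) -> List[float]:
    """Single-channel 1-D 'valid' convolution (cross-correlation, as CNN layers compute it) plus bias."""
    k = len(w)
    return [sum(w[j] * x[i + j] for j in range(k)) + b for i in range(len(x) - k + 1)]


def batch_norm(y: List[float], gamma: float, beta: float, mean: float, var: float,
               eps: float = 1e-5) -> List[float]:
    """Inference-time batch normalization with stored statistics."""
    s = gamma / math.sqrt(var + eps)
    return [s * (v - mean) + beta for v in y]


def fold_batch_norm(w: List[float], b: float, gamma: float, beta: float, mean: float,
                    var: float, eps: float = 1e-5) -> Tuple[List[float], float]:
    """Fold batch norm into the preceding conv: w' = s*w, b' = s*(b - mean) + beta,
    with s = gamma / sqrt(var + eps)."""
    s = gamma / math.sqrt(var + eps)
    return [s * wi for wi in w], s * (b - mean) + beta
```

Folding is exact only at inference time, when the batch norm statistics are fixed constants.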
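Slide 29 also shows buffer minimization, where later layers reuse buffers A and B instead of allocating new ones. One simple way to decide when a buffer can be reused is liveness analysis: a tensor's buffer becomes free after the last layer that reads it. The Python sketch below implements a greedy version of that idea. It works over a hypothetical graph description of (output, inputs) pairs in execution order and is not GPU Coder's actual allocator:

```python
from typing import Dict, List, Tuple

# Hypothetical graph description: (output_tensor, [input_tensors]) in execution order.
Layer = Tuple[str, List[str]]


def assign_buffers(layers: List[Layer]) -> Dict[str, int]:
    """Greedy liveness-based buffer assignment: a tensor's buffer returns to the
    free pool after the last layer that reads it; later outputs reuse pooled buffers."""
    last_use: Dict[str, int] = {}
    for step, (_, inputs) in enumerate(layers):
        for name in inputs:
            last_use[name] = step
    free: List[int] = []
    assignment: Dict[str, int] = {}
    next_id = 0
    for step, (out, inputs) in enumerate(layers):
        # Allocate the output first so it never aliases one of its own inputs
        # (this sketch assumes no in-place layers).
        if free:
            assignment[out] = free.pop()
        else:
            assignment[out] = next_id
            next_id += 1
        for name in sorted(set(inputs)):
            # The network input lives in an externally provided buffer, so it is skipped.
            if name in assignment and last_use[name] == step:
                free.append(assignment[name])
    return assignment
```

For a plain chain of layers, the sketch alternates between two buffers. A skip connection keeps its source tensor alive longer and forces a third.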
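Slide 36 compares TensorRT in FP32 and INT8 modes. INT8 inference stores tensors as 8-bit integers plus a floating-point scale. The Python sketch below shows symmetric per-tensor quantization, taking the scale from the largest absolute value. This is a simplification: TensorRT chooses INT8 scales through a calibration step on representative input data.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor INT8 quantization with max-abs calibration:
    scale = max|v| / 127, q = clamp(round(v / scale), -127, 127)."""
    amax = max((abs(v) for v in values), default=0.0)
    scale = amax / 127.0 if amax > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize_int8(q: List[int], scale: float) -> List[float]:
    """Map INT8 codes back to approximate real values."""
    return [qi * scale for qi in q]
```

Each dequantized value lies within half a quantization step (scale / 2) of the original, provided no value is clipped.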
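In processor-in-the-loop testing (slide 43), the host sends the same inputs to the board and compares the board's outputs with those computed on the host. GPU arithmetic can reorder floating-point operations, so the comparison is usually made within a tolerance rather than exactly. The following is a minimal Python sketch of that comparison step. The tolerance defaults are hypothetical, and the data exchange with the board is not shown:

```python
import math
from typing import Sequence, Tuple


def compare_outputs(reference: Sequence[float], target: Sequence[float],
                    rel_tol: float = 1e-4, abs_tol: float = 1e-5) -> Tuple[bool, float]:
    """Compare target-board outputs against host reference outputs element-wise.
    Returns (all elements within tolerance, maximum absolute error)."""
    if len(reference) != len(target):
        raise ValueError("output length mismatch between host and target")
    max_err = max((abs(r - t) for r, t in zip(reference, target)), default=0.0)
    ok = all(math.isclose(r, t, rel_tol=rel_tol, abs_tol=abs_tol)
             for r, t in zip(reference, target))
    return ok, max_err
```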
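Slide 44 mentions estimating defect area with a Class Activation Map (CAM). The original CAM formulation applies to a network that ends in global average pooling followed by a fully connected layer. There, the map for a class is the sum of the last convolutional feature maps, each weighted by that class's fully connected weight. The Python sketch below computes that sum and thresholds the normalized map into a rough region mask. The threshold of 0.5 is a hypothetical choice, and the slide does not say how the project derived defect areas.

```python
from typing import List

Grid = List[List[float]]


def class_activation_map(feature_maps: List[Grid], class_weights: List[float]) -> Grid:
    """CAM: sum over channels k of w_k * f_k(y, x), where f_k are the last conv
    feature maps and w_k the fully connected weights for the class of interest."""
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for fmap, wk in zip(feature_maps, class_weights):
        for y in range(h):
            for x in range(w):
                cam[y][x] += wk * fmap[y][x]
    return cam


def defect_mask(cam: Grid, threshold: float = 0.5) -> List[List[bool]]:
    """Normalize the CAM to [0, 1] and threshold it into a binary region estimate."""
    lo = min(min(row) for row in cam)
    hi = max(max(row) for row in cam)
    span = hi - lo if hi > lo else 1.0
    return [[(v - lo) / span >= threshold for v in row] for row in cam]
```

In practice, the map is usually upsampled to the input image size before thresholding.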
