Developing and Deploying Deep Learning Based Computer Vision Systems - Alka Nair - MathWorks

Deep learning is enabling a wide range of computer vision applications, from advanced driver assistance systems to sophisticated medical diagnostic devices. However, designing and deploying these applications involves many challenges, such as handling large datasets, developing optimized models, performing GPU computing effectively, and deploying deep learning models efficiently to embedded boards like NVIDIA Jetson. This session illustrates how MATLAB supports all phases of this workflow, from algorithm design to automatic generation of portable, optimized CUDA code, helping engineers and scientists address the challenges commonly encountered in deep learning workflows.

Published in: Technology

  1. Developing and Deploying Deep Learning Based Computer Vision Systems. Alka Nair, Application Engineer. © 2017 The MathWorks, Inc.
  2. Demos: AlexNet, vehicle detection, people detection, lane detection. Frame rates shown: ~30 fps (Tegra X1), ~66 fps (Tegra X1), ~20 fps (K40c), ~130 fps (K40c)
  3. Deep Learning Applications in Computer Vision: classification (HIGHWAY_SCENE), semantic segmentation, rain detection and removal, human-aware navigation for robots
  4. Lane Detection on a Tesla K40 GPU
  5. End-to-End Application: Lane Detection
     Image → lane detection CNN (transfer learning from AlexNet, a 1000-class classification network) → post-processing (find left/right lane points)
     ▪ Output of the CNN: parabolic lane coefficients in world coordinates (left-lane coefficients and right-lane coefficients)
     ▪ Coefficients follow y = ax^2 + bx + c
     MATLAB: a single platform for deep learning training & deployment
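As a minimal sketch of the post-processing step on this slide, the snippet below evaluates y = ax^2 + bx + c for a left and a right lane boundary using base MATLAB's polyval. The coefficient values, the sampling range, and the variable names are illustrative assumptions, not values from the talk.

```matlab
% Hypothetical coefficient vectors [a b c], standing in for the CNN output.
leftCoeffs  = [-0.001  0.01  1.8];   % left lane boundary
rightCoeffs = [-0.001  0.01 -1.8];   % right lane boundary

% Sample points ahead of the vehicle (assumed range and spacing).
x = 3:3:30;

% polyval evaluates a*x.^2 + b*x + c at every element of x.
yLeft  = polyval(leftCoeffs,  x);
yRight = polyval(rightCoeffs, x);

% Lateral distance between the two boundaries at each sample point.
% It is constant here only because both curves share the same a and b.
laneWidth = yLeft - yRight;
```

The resulting (x, yLeft) and (x, yRight) pairs are the left and right lane points that a later stage could project back into the image for display.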
  6. Deep Learning Challenges
     Big Data
     ▪ Handling large amounts of data
     ▪ Labeling thousands of images & videos
     Training and Testing Deep Neural Networks
     ▪ Accessing reference models from research
     ▪ Understanding network behavior
     ▪ Tuning hyperparameters and refining architectures
     ▪ Training takes hours to days
     Seamless deployment onto embedded hardware
     Real-world systems use more than deep learning
     Deep learning frameworks do not include “classical” computer vision
     Not a deep learning expert
  7. Access and Handle Large Sets of Images
     Easily manage large sets of images
     - Single line of code to access images
     - Operates on disk, database, big-data file system
     imageData = imageDatastore('vehicles')
     Organize images in folders (~10,000 images, 5 folders)
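The single-line access shown on this slide can be extended into a small sketch, assuming a folder named 'vehicles' that holds one subfolder per class. The folder layout and the 70/30 split ratio are assumptions for illustration.

```matlab
% Assumes a folder 'vehicles' with one subfolder per class (hypothetical layout).
imds = imageDatastore('vehicles', ...
    'IncludeSubfolders', true, ...
    'LabelSource', 'foldernames');   % label each image by its folder name

% Summarize how many images each class contains.
labelCounts = countEachLabel(imds);
disp(labelCounts)

% Hold out 30% of each class for validation (the ratio is an assumption).
[imdsTrain, imdsVal] = splitEachLabel(imds, 0.7, 'randomized');

% Images are read from disk only when requested.
img = readimage(imdsTrain, 1);
```

Because the datastore stores file locations rather than pixel data, the same code should work whether the collection holds a hundred images or many thousands.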
  8. Handle big image collections without big changes: images in a local directory, images on HDFS
  9. Accelerating Ground Truth Labeling
  10. Generate Training Data from Labeled Images
      Ground truth exported from the Ground Truth Labeler app: labeled lane boundaries in image coordinates
      → Parabolic lane boundary modeling (>> findParabolicLaneBoundaries)
      → Lane boundary models in world coordinates, corresponding to the parabola coefficients (a, b, c) of the left and right lanes
  11. End-to-End Application: Lane Detection
      Image → lane detection CNN (transfer learning from AlexNet, a 1000-class classification network) → post-processing (find left/right lane points)
      ▪ Output of the CNN: parabolic lane coefficients in vehicle coordinates (left-lane coefficients and right-lane coefficients)
      ▪ Coefficients follow y = ax^2 + bx + c
      MATLAB: a single platform for deep learning training & deployment
  13. Transfer Learning Workflow
      Load pretrained network: early layers and last layers, trained on 1 million images with 1000s of classes
      → Replace final layers: new layers for fewer classes, which learn faster
      → Train network: 100s of images, 10s of classes, with training options
      → Predict and assess the accuracy of the trained network
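The four steps of this workflow can be sketched as follows for a classification task. This is a sketch under stated assumptions rather than the code used in the talk: it assumes the AlexNet support package is installed, a hypothetical 'vehicles' folder with one subfolder of RGB images per class, and illustrative, untuned training options.

```matlab
% Hypothetical data set: one subfolder of RGB images per class.
imds = imageDatastore('vehicles', ...
    'IncludeSubfolders', true, 'LabelSource', 'foldernames');
% AlexNet expects 227-by-227-by-3 inputs, so resize each image as it is read.
imds.ReadFcn = @(file) imresize(imread(file), [227 227]);
[imdsTrain, imdsTest] = splitEachLabel(imds, 0.7, 'randomized');

% 1. Load the pretrained network.
net = alexnet;

% 2. Keep everything except the last three layers and append new ones
%    sized for the new, smaller set of classes.
numClasses = numel(categories(imdsTrain.Labels));
layers = [
    net.Layers(1:end-3)
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer];

% 3. Train with illustrative (untuned) options.
options = trainingOptions('sgdm', ...
    'InitialLearnRate', 1e-4, ...
    'MaxEpochs', 10, ...
    'MiniBatchSize', 32);
trainedNet = trainNetwork(imdsTrain, layers, options);

% 4. Predict on held-out images and assess accuracy.
predicted = classify(trainedNet, imdsTest);
accuracy = mean(predicted == imdsTest.Labels);
```

For a regression task such as predicting lane coefficients, the replaced layers would presumably end in a fully connected layer followed by regressionLayer instead of the softmax and classification layers.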
  14. Import Pre-Trained Models and Network Architectures
      Pretrained models
      ▪ AlexNet
      ▪ VGG-16
      ▪ VGG-19
      ▪ GoogLeNet
      ▪ ResNet-50
      ▪ Inception-v3
      ▪ ResNet-101
      Import models from frameworks
      ▪ Caffe Model Importer (including Caffe Model Zoo)
        – importCaffeLayers
        – importCaffeNetwork
      ▪ TensorFlow-Keras Model Importer
        – importKerasLayers
        – importKerasNetwork
      Download from within MATLAB
      net = alexnet; net = vgg16; net = vgg19; net = googlenet; net = resnet50; net = inceptionv3; net = resnet101;
  15. Visualizations for Understanding Network Behavior
      Filters … Activations, Deep Dream
      ▪ Custom visualizations
        – Example: Class Activation Maps (http://cnnlocalization.csail.mit.edu/Zhou_Learning_Deep_Features_CVPR_2016_paper.pdf)
  16. Augment Training Images
      imageAugmenter = imageDataAugmenter('RandRotation',[-180 180])
      Supported operations: rotation, reflection, scaling, shearing, translation, colour pre-processing, resize / random crop / centre crop
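A slightly fuller augmentation setup might look like the sketch below. Only the 'RandRotation' range comes from the slide. The other ranges, the 'vehicles' folder, and the 227-by-227-by-3 output size are assumptions, and augmentedImageDatastore requires R2018a or later (R2017b offered augmentedImageSource for the same purpose).

```matlab
% Random transformations applied on the fly during training.
augmenter = imageDataAugmenter( ...
    'RandRotation',     [-180 180], ...   % from the slide
    'RandXReflection',  true, ...         % random left-right flip
    'RandXScale',       [0.9 1.1], ...    % assumed ranges from here on
    'RandYScale',       [0.9 1.1], ...
    'RandXShear',       [-10 10], ...
    'RandXTranslation', [-10 10], ...
    'RandYTranslation', [-10 10]);

% Hypothetical labeled image folder.
imds = imageDatastore('vehicles', ...
    'IncludeSubfolders', true, 'LabelSource', 'foldernames');

% Resize every image to the network input size and apply the augmenter.
augimds = augmentedImageDatastore([227 227 3], imds, ...
    'DataAugmentation', augmenter);
```

The augmented datastore can then be passed to trainNetwork in place of the original datastore, so the images on disk stay unchanged.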
  20. Training Deep Neural Networks: trainingOptions
      ▪ Plot training metrics
        – Training accuracy, smoothed training accuracy, validation accuracy
        – Training loss, smoothed training loss, and validation loss
      ▪ Debug training
        – Stop and check current state
        – Save / load checkpoint networks
        – Custom output function (stopping condition, visualization, etc.)
      ▪ Bayesian optimization for hyperparameter tuning
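The plotting, checkpointing, and custom-output features listed on this slide map onto trainingOptions name-value pairs roughly as sketched below. The validation folder name, the checkpoint location, the 99% threshold, and the helper stopAtTargetAccuracy are hypothetical choices for illustration.

```matlab
% Hypothetical validation set and checkpoint folder.
imdsVal = imageDatastore('vehiclesVal', ...
    'IncludeSubfolders', true, 'LabelSource', 'foldernames');
checkpointDir = fullfile(tempdir, 'checkpoints');
if ~exist(checkpointDir, 'dir')
    mkdir(checkpointDir);
end

options = trainingOptions('sgdm', ...
    'Plots', 'training-progress', ...      % live accuracy and loss curves
    'ValidationData', imdsVal, ...         % adds validation metrics to the plot
    'ValidationFrequency', 50, ...         % validate every 50 iterations
    'CheckpointPath', checkpointDir, ...   % save a checkpoint network each epoch
    'OutputFcn', @stopAtTargetAccuracy);   % custom stopping condition

function stop = stopAtTargetAccuracy(info)
% Stop training once the mini-batch accuracy reaches 99 percent.
stop = info.State == "iteration" && ...
       ~isempty(info.TrainingAccuracy) && info.TrainingAccuracy >= 99;
end
```

The output function runs at every iteration and receives a struct describing the current training state. Returning true ends training early.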
  21. Transfer Learning Workflow: predict and assess network accuracy. MATLAB provides evaluation frameworks for different classes of deep learning problems
  22. Lane Detection
  23. Deep Learning on CPU, GPU, Multi-GPU and Clusters
      ▪ Single CPU
      ▪ Single CPU, single GPU
      ▪ Single CPU, multiple GPUs
      ▪ On-prem server with GPUs
      ▪ Cloud GPUs (AWS, Azure, etc.)
      Deep Learning on Cloud Whitepaper
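In trainingOptions, the choice among these hardware configurations is made through the 'ExecutionEnvironment' option. The sketch below only constructs the option objects. The GPU and parallel settings need Parallel Computing Toolbox and suitable hardware when training actually runs, and the variable names are illustrative.

```matlab
% Single CPU
optsCPU   = trainingOptions('sgdm', 'ExecutionEnvironment', 'cpu');

% Single GPU
optsGPU   = trainingOptions('sgdm', 'ExecutionEnvironment', 'gpu');

% Multiple GPUs on one machine
optsMulti = trainingOptions('sgdm', 'ExecutionEnvironment', 'multi-gpu');

% Workers of the current parallel pool, local or on a cluster
optsPar   = trainingOptions('sgdm', 'ExecutionEnvironment', 'parallel');
```

The rest of the training script stays the same across these settings, since only the options object passed to trainNetwork changes.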
  24. Training in MATLAB Is Fast
      MATLAB is more than 4x faster than TensorFlow
      Benchmark: AlexNet CNN architecture trained on the ImageNet dataset, using a batch size of 32, on a Windows 10 desktop with a single NVIDIA GPU (Titan Xp). TensorFlow version 1.2.0.
  26. Algorithm Design to Embedded Deployment Workflow: Conventional Approach
      1. Desktop GPU, high-level language: deep learning framework; large, complex software stack
      2. Desktop GPU, C++: C/C++; low-level APIs; application-specific libraries
      3. Embedded GPU, C++: C/C++; target-optimized libraries; optimize for memory & speed
      Challenges
      • Integrating multiple libraries and packages
      • Verifying and maintaining multiple implementations
      • Algorithm & vendor lock-in
  27. GPU Coder for Deployment: New Product in R2017b
      GPU Coder: accelerated implementation of parallel algorithms on GPUs
      ▪ Neural networks (deep learning, machine learning): 7x faster than state-of-art
      ▪ Image processing and computer vision (image filtering, feature detection/extraction): 700x faster than CPUs for feature extraction
      ▪ Signal processing and communications (FFT, filtering, cross correlation): 20x faster than CPUs for FFTs
  28. Algorithm Design to Embedded Deployment Workflow with GPU Coder
      1. Functional test: MATLAB algorithm (functional reference)
      2. Deployment unit-test (desktop GPU, C++): build type .mex; call CUDA from MATLAB directly
      3. Deployment integration-test (desktop GPU, C++): build type .lib; call CUDA from a hand-coded C++ main()
      4. Real-time test (embedded GPU): build type cross-compiled .lib; call CUDA from a hand-coded C++ main()
  29. CUDA Code Generation from GPU Coder
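As a rough sketch of what CUDA code generation with GPU Coder involves, the example below pairs an entry-point function with the commands that build a MEX file from it. The function name alexnet_predict and the input size are assumptions for illustration, and a working setup also needs a supported NVIDIA GPU, the CUDA toolkit, and cuDNN.

```matlab
% --- File alexnet_predict.m: entry-point function for code generation ---
function out = alexnet_predict(in) %#codegen
% Load the network once and reuse it on subsequent calls.
persistent net;
if isempty(net)
    net = coder.loadDeepLearningNetwork('alexnet');
end
out = predict(net, in);
end

% --- Commands to run at the MATLAB prompt (shown here as comments) ---
% cfg = coder.gpuConfig('mex');     % build a MEX file callable from MATLAB
% cfg.TargetLang = 'C++';           % deep learning code generation targets C++
% codegen -config cfg alexnet_predict -args {ones(227,227,3,'single')}
% scores = alexnet_predict_mex(single(im));   % im: a 227-by-227-by-3 image
```

Switching the build type from 'mex' to 'lib' in coder.gpuConfig produces a static library instead, which is the form that a hand-coded C++ main() would link against.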
  31. End-to-End Application: Lane Detection
      Image → lane detection CNN (transfer learning from AlexNet, a 1000-class classification network) → post-processing (find left/right lane points)
      ▪ Output of the CNN: parabolic lane coefficients in world coordinates (left-lane coefficients and right-lane coefficients)
      ▪ Coefficients follow y = ax^2 + bx + c
      MATLAB: a single platform for deep learning training & deployment
      https://tinyurl.com/ybaxnxjg
  32. AlexNet Inference on Intel CPUs
      Chart comparing MATLAB (R2017b Release 2), TensorFlow, MXNet, and Caffe2
  33. AlexNet Inference on NVIDIA Titan Xp
      Chart: frames per second vs. batch size for MATLAB GPU Coder (R2017b), MATLAB (R2017b), TensorFlow (1.2.0), Caffe2 (0.8.1), and MXNet (0.10); speedup annotations: 2x, 7x, 5x
      Testing platform: CPU Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz; GPU Pascal Titan Xp; cuDNN v5
  34. AlexNet Inference on NVIDIA GPUs
      Chart: memory usage (GB) vs. batch size (1, 16, 32, 64), showing CPU resident memory and GPU peak memory (nvidia-smi) for Py-Caffe, GPU Coder, TensorFlow, MATLAB w/ PCT, and C++-Caffe
      Hardware: CPU Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50 GHz; GPU Tesla K40c
  35. Design Deep Neural Networks in MATLAB and Deploy with GPU Coder
      Design Deep Learning & Vision Algorithms: highlights
      ▪ Manage large image sets
      ▪ Easy access to models like AlexNet, GoogLeNet
      ▪ Pre-built training frameworks
      ▪ Automate ground truth labeling with apps
      High Performance Deployment: highlights
      ▪ Automate optimized CUDA code generation with GPU Coder
      ▪ Deployed models up to 4.5x faster than Caffe2 and 7x faster than TensorFlow
  36. Deep Learning Onramp: free introductory course
  37. Visit the MathWorks booth to learn more: classification (HIGHWAY_SCENE), detection (car), regression (lane), semantic segmentation
