1
© 2017 The MathWorks, Inc.
Developing and Deploying Deep Learning Based
Computer Vision Systems
Alka Nair
Application Engineer
2
AlexNet
Vehicle detection
People detection
Lane detection
~30 fps (Tegra X1)
~66 fps (Tegra X1)
~20 fps (K40c)
~130 fps (K40c)
3
Deep Learning Applications in Computer Vision
HIGHWAY_SCENE
Classification
Semantic Segmentation
Rain Detection and Removal
Human Aware Navigation for Robots
4
Lane Detection on a Tesla K40 GPU
5
End-to-End Application: Lane Detection
Transfer Learning
AlexNet – 1000-class classification
Lane detection CNN
Post-processing (find left/right lane points)
Image
Parabolic lane coefficients in world coordinates
Left lane coefficients
Right lane coefficients
The CNN outputs the coefficients (a, b, c) of each lane parabola y = ax^2 + bx + c
MATLAB: A SINGLE PLATFORM FOR DEEP LEARNING TRAINING & DEPLOYMENT
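A minimal MATLAB sketch of what the post-processing step does with the network output. It assumes the CNN returns six numbers ordered [aL bL cL aR bR cR] and that x points forward and y to the left of the vehicle. Neither the ordering nor the axis convention is stated on the slide, and the coefficient values are made up. The built-in polyval evaluates each parabola at sampled distances to produce left and right lane points:

```matlab
% Hypothetical CNN output ordered [aL bL cL aR bR cR] for y = a*x^2 + b*x + c
pred = [0.001 -0.02 1.8  0.001 -0.02 -1.7];

leftCoeffs  = pred(1:3);    % left-lane parabola [a b c]
rightCoeffs = pred(4:6);    % right-lane parabola [a b c]

% Sample both boundaries at distances x ahead of the vehicle (assumed metres)
x = (3:0.5:30)';
yLeft  = polyval(leftCoeffs,  x);   % polyval evaluates a*x.^2 + b*x + c
yRight = polyval(rightCoeffs, x);

% Lateral distance between the boundaries at each sample
laneWidth = yLeft - yRight;
```

Because both boundaries here share a and b, their lateral gap is the constant cL - cR = 3.5. Real predictions will generally differ in all six coefficients.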
6
Deep Learning Challenges
Big Data
▪ Handling large amounts of data
▪ Labeling thousands of images & videos
Training and Testing Deep Neural Networks
▪ Accessing reference models from research
▪ Understanding network behavior
▪ Tuning hyperparameters and refining architectures
▪ Training takes hours to days
Seamless Deployment onto embedded hardware
▪ Real-world systems use more than deep learning
▪ Deep learning frameworks do not include “classical” computer vision
Not a deep learning expert
7
Access Large Sets of Images
Handle Large Sets of Images
Easily manage large sets of images
- Single line of code to access images
- Operates on disk, database, big-data file system
imageData = imageDatastore('vehicles')
Organize Images in Folders
(~10,000 images, 5 folders)
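The single-line access above can be sketched end to end. This self-contained example uses a hypothetical temporary folder tree as a stand-in for the ~10,000-image 'vehicles' collection. It writes a few random images into one subfolder per class, then lets imageDatastore take the labels from the folder names:

```matlab
% Hypothetical stand-in for the 'vehicles' folder: one subfolder per class.
root = fullfile(tempdir, 'vehicles_demo');
classes = {'car', 'truck'};
for k = 1:numel(classes)
    folder = fullfile(root, classes{k});
    if ~exist(folder, 'dir')
        mkdir(folder);
    end
    for n = 1:3
        img = uint8(randi(255, 64, 64, 3));            % random RGB image
        imwrite(img, fullfile(folder, sprintf('img%02d.png', n)));
    end
end

% One call indexes every image; labels come from the subfolder names.
imds = imageDatastore(root, ...
    'IncludeSubfolders', true, ...
    'LabelSource', 'foldernames');

disp(countEachLabel(imds))      % table with Label and Count columns
img1 = readimage(imds, 1);      % images are read from disk only when requested
```

The same call also accepts a remote location such as 'hdfs://myserver/vehicles' (hypothetical host and path). That is what lets the HDFS case work without code changes.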
8
Handle big image collections without big changes
Images in local directory
Images on HDFS
9
Accelerating Ground Truth Labeling
10
Generate Training Data from Labeled Images
Labeled Lane Boundaries in Image Coordinates
Ground Truth Exported from Ground Truth Labeler App
Parabolic Lane Boundary Modeling
>> findParabolicLaneBoundaries
Lane Boundary Models in World Coordinates
These correspond to the coefficients (a, b, c) of the parabolas representing the left and right lanes.
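Turning each labeled image into a regression target can be sketched as follows. This swaps in a plain least-squares fit (polyfit) in place of findParabolicLaneBoundaries, so it is not what that function does internally. The labeled points, the "true" coefficients and the noise level are all hypothetical, and x forward / y left in metres is an assumed convention:

```matlab
% Hypothetical labeled boundary points in vehicle coordinates (metres):
% x forward, y lateral. Real points would come from the exported ground truth.
rng(0);                                   % reproducible noise
x = (0:1:30)';
trueLeft  = [0.002 -0.01  1.8];           % made-up "true" [a b c]
trueRight = [0.002 -0.01 -1.7];
yLeftPts  = polyval(trueLeft,  x) + 0.02*randn(size(x));
yRightPts = polyval(trueRight, x) + 0.02*randn(size(x));

% Least-squares fit of y = a*x^2 + b*x + c to each boundary
leftFit  = polyfit(x, yLeftPts,  2);      % returns [a b c]
rightFit = polyfit(x, yRightPts, 2);

% One regression target per image: [aL bL cL aR bR cR]
target = [leftFit rightFit];
```

Stacking one such six-element row per image gives the response matrix for training the lane-detection CNN as a regression network.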
11
End-to-End Application: Lane Detection
Transfer Learning
AlexNet – 1000-class classification
Lane detection CNN
Post-processing (find left/right lane points)
Image
Parabolic lane coefficients in vehicle coordinates
Left lane coefficients
Right lane coefficients
The CNN outputs the coefficients (a, b, c) of each lane parabola y = ax^2 + bx + c
MATLAB: A SINGLE PLATFORM FOR DEEP LEARNING TRAINING & DEPLOYMENT
13
Transfer Learning Workflow
Load pretrained network
Early layers / Last layers
(trained on ~1 million images, 1000s of classes)
Replace final layers
New layers: fewer classes, learn faster
(100s of images, 10s of classes)
Train network (training options)
Trained network
Predict and assess network accuracy
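The "Replace final layers" step can be sketched with the Deep Learning Toolbox calls below, modeled on the common AlexNet transfer-learning pattern. The sketch assumes the AlexNet support package is installed. numClasses = 5 is a hypothetical value matching the five image folders, and the learn-rate factor of 20 is an illustrative choice that makes the new layer learn faster than the transferred ones:

```matlab
% Requires the AlexNet support package for Deep Learning Toolbox.
net = alexnet;
inputSize = net.Layers(1).InputSize;      % [227 227 3]

% Keep everything except the last three layers (fc8, softmax, classification output).
layersTransfer = net.Layers(1:end-3);

numClasses = 5;   % hypothetical: one class per image folder
layers = [
    layersTransfer
    fullyConnectedLayer(numClasses, ...
        'WeightLearnRateFactor', 20, ...  % new layer learns faster than transferred ones
        'BiasLearnRateFactor',   20)
    softmaxLayer
    classificationLayer];
```

The lane-detection network regresses six coefficients instead of classifying. One plausible adaptation (an assumption, not necessarily what the demo uses) is fullyConnectedLayer(6) followed by regressionLayer, with no softmax layer.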
14
Import Pre-Trained Models and Network Architectures
Pretrained Models
▪ AlexNet
▪ VGG-16
▪ VGG-19
▪ GoogLeNet
▪ ResNet-50
▪ Inception-v3
▪ ResNet-101
Import Models from Frameworks
▪ Caffe Model Importer (including Caffe Model Zoo)
– importCaffeLayers
– importCaffeNetwork
▪ TensorFlow-Keras Model Importer
– importKerasLayers
– importKerasNetwork
Download from within MATLAB
net = alexnet;
net = vgg16;
net = vgg19;
net = googlenet;
net = resnet50;
net = inceptionv3;
net = resnet101;
15
Visualizations for Understanding Network Behavior
▪ Custom visualizations
– Example: Class Activation Maps
Filters
…
Activations
Deep Dream
http://cnnlocalization.csail.mit.edu/Zhou_Learning_Deep_Features_CVPR_2016_paper.pdf
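The activation and Deep Dream visualizations can be sketched with Deep Learning Toolbox calls, assuming the AlexNet support package and Image Processing Toolbox (for montage) are installed. The random input image is a placeholder for a real 227x227 RGB frame. Layer 23 is AlexNet's fc8 layer, and channel 1 is an arbitrary class chosen for illustration:

```matlab
net = alexnet;                                   % AlexNet support package
img = uint8(randi(255, 227, 227, 3));            % placeholder; use a real image

% conv1 activations as an H-by-W-by-channels array (55x55x96 for AlexNet)
act = activations(net, img, 'conv1', 'OutputAs', 'channels');

% Show each of the 96 conv1 channels as one grayscale tile
act = reshape(rescale(act), [55 55 1 96]);
montage(act);                                    % Image Processing Toolbox

% Deep Dream: synthesize an image that strongly activates channel 1 of fc8
dream = deepDreamImage(net, 23, 1);
```

Bright tiles in the montage mark conv1 filters that respond strongly to the input. Early-layer filters tend to pick out edges and colour blobs.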
16
Augment Training Images
imageAugmenter = imageDataAugmenter('RandRotation',[-180 180])
Rotation
Reflection
Scaling
Shearing
Translation
Colour pre-processing
Resize / Random crop / Centre crop
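A sketch combining several of the listed augmentations with resizing to AlexNet's input size. The in-memory images and labels are hypothetical, and the ranges are illustrative. augmentedImageDatastore is the name used from R2018a; R2017b provided the same idea as augmentedImageSource.

```matlab
augmenter = imageDataAugmenter( ...
    'RandRotation',     [-180 180], ...  % rotation (degrees)
    'RandXReflection',  true, ...        % horizontal reflection
    'RandXScale',       [0.9 1.1], ...   % scaling
    'RandYScale',       [0.9 1.1], ...
    'RandXShear',       [-10 10], ...    % shearing (degrees)
    'RandXTranslation', [-8 8], ...      % translation (pixels)
    'RandYTranslation', [-8 8]);

% Hypothetical in-memory data: four random 256x256 RGB images, two classes
X = uint8(randi(255, 256, 256, 3, 4));
Y = categorical({'car'; 'truck'; 'car'; 'truck'});

% Resize to 227x227 and apply a fresh random transform on every read
auds = augmentedImageDatastore([227 227], X, Y, 'DataAugmentation', augmenter);
batch = read(auds);          % table of augmented images and their labels
```

These geometric transforms suit classification labels, which do not change when an image is rotated or flipped. For the lane-coefficient regression targets they would change the lane geometry, so the labels would no longer match without a corresponding transform of the coefficients.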
20
Training Deep Neural Networks
trainingOptions
▪ Plot training metrics
– Training accuracy, smoothed training accuracy, validation accuracy
– Training loss, smoothed training loss, and validation loss
▪ Debug training
– Stop and check current state
– Save / load checkpoint networks
– Custom output function (stopping condition, visualization, etc.)
▪ Bayesian optimization for hyperparameter tuning
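Several of the listed features map onto trainingOptions name-value pairs. In this sketch the solver, learning rate, epoch count, checkpoint folder and the 99% stopping threshold are illustrative assumptions. The custom output function returns true to stop training:

```matlab
checkpointDir = fullfile(tempdir, 'checkpoints');   % hypothetical location
if ~exist(checkpointDir, 'dir')
    mkdir(checkpointDir);                           % folder must exist beforehand
end

opts = trainingOptions('sgdm', ...
    'InitialLearnRate', 1e-4, ...                   % small rate for pretrained weights
    'MaxEpochs',        10, ...
    'MiniBatchSize',    32, ...
    'Plots',            'training-progress', ...    % live accuracy and loss plots
    'CheckpointPath',   checkpointDir, ...          % saves checkpoint networks here
    'OutputFcn',        @(info) ~isempty(info.TrainingAccuracy) && ...
                                info.TrainingAccuracy >= 99);  % custom stop rule

% trainedNet = trainNetwork(trainingData, layers, opts);  % trainingData: e.g. a datastore
```

TrainingAccuracy is empty at the start and end of training, and for regression networks, so the output function only stops a classification run.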
21
MATLAB Provides Evaluation Frameworks for Different Classes of Deep Learning Problems
22
Lane Detection
23
Deep learning on CPU, GPU, multi-GPU and clusters
Single CPU
Single CPU, single GPU
Single CPU, multiple GPUs
On-prem server with GPUs
Cloud GPUs (AWS, Azure, etc.)
Deep Learning on Cloud Whitepaper
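Switching between these hardware setups is mostly a single training option. The mini-batch size and learning rate below are illustrative assumptions, scaled up on the common rule of thumb that each GPU should receive a reasonably sized share of every mini-batch. GPU and multi-GPU training also require Parallel Computing Toolbox and supported NVIDIA hardware:

```matlab
% 'auto' uses a GPU when one is available; other values are
% 'cpu', 'gpu', 'multi-gpu' (all local GPUs) and 'parallel' (cluster or cloud pool).
opts = trainingOptions('sgdm', ...
    'ExecutionEnvironment', 'multi-gpu', ...  % split each mini-batch across local GPUs
    'MiniBatchSize',        256, ...          % larger batch so each GPU gets a share
    'InitialLearnRate',     4e-4);            % often scaled up with the batch size
```

The network definition and training call stay the same. Only the options object changes when moving from a single CPU to multiple GPUs or a cluster.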
24
Training in MATLAB is fast
MATLAB is more than 4x faster than TensorFlow
AlexNet CNN architecture trained on the ImageNet dataset using a batch size of 32, on a Windows 10 desktop with a single NVIDIA GPU (Titan Xp); TensorFlow version 1.2.0.
26
Algorithm Design to Embedded Deployment Workflow
Conventional Approach
1. Desktop GPU: high-level language, deep learning framework (large, complex software stack)
2. Desktop GPU: C++; C/C++ low-level APIs, application-specific libraries
3. Embedded GPU: C++; C/C++ target-optimized libraries; optimize for memory & speed
Challenges
• Integrating multiple libraries and packages
• Verifying and maintaining multiple implementations
• Algorithm & vendor lock-in
27
GPU Coder for Deployment: New Product in R2017b
GPU Coder
Accelerated implementation of parallel algorithms on GPUs
Neural Networks
Deep learning, machine learning
7x faster than state-of-the-art
Image Processing and Computer Vision
Image filtering, feature detection/extraction
700x faster than CPUs for feature extraction
Signal Processing and Communications
FFT, filtering, cross-correlation
20x faster than CPUs for FFTs
28
Algorithm Design to Embedded Deployment Workflow with GPU Coder
MATLAB algorithm (functional reference)
1. Functional test
2. Deployment unit-test: desktop GPU, C++; build type .mex; call CUDA from MATLAB directly
3. Deployment integration-test: desktop GPU, C++; build type .lib; call CUDA from (C++) hand-coded main()
4. Real-time test: embedded GPU; build type cross-compiled .lib; call CUDA from (C++) hand-coded main()
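Steps 2 and 3 for the lane network can be outlined as follows. laneNetPredict and laneNet.mat are hypothetical names. The calls used (coder.loadDeepLearningNetwork, coder.gpuConfig, coder.DeepLearningConfig, codegen) follow later GPU Coder releases. The deep learning code generation API has changed since R2017b, so treat this as an outline rather than the deck's exact procedure. The entry-point function loads the trained network once into a persistent variable and runs prediction:

```matlab
function out = laneNetPredict(in) %#codegen
% Hypothetical entry point; save as laneNetPredict.m.
% 'laneNet.mat' is an assumed MAT-file containing the trained lane network.
persistent net;
if isempty(net)
    net = coder.loadDeepLearningNetwork('laneNet.mat');
end
out = predict(net, in);   % six lane coefficients for one 227x227x3 frame
end
```

The build type from the table then becomes a configuration choice:

```matlab
cfg = coder.gpuConfig('mex');      % step 2: .mex callable from MATLAB
% cfg = coder.gpuConfig('lib');    % step 3: static .lib for a hand-coded main()
cfg.TargetLang = 'C++';
cfg.DeepLearningConfig = coder.DeepLearningConfig('cudnn');
codegen -config cfg laneNetPredict -args {ones(227,227,3,'single')} -report
```

The generated laneNetPredict_mex can be compared directly against laneNetPredict in MATLAB, which is the point of the unit-test step. Cross-compiling for an embedded GPU (step 4) additionally needs a target-specific hardware support package.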
29
CUDA Code Generation from GPU Coder
30
31
End-to-End Application: Lane Detection
https://tinyurl.com/ybaxnxjg
32
AlexNet Inference on Intel CPUs
MATLAB
(R2017b Release 2)
TensorFlow
MXNet
Caffe2
33
AlexNet Inference on NVIDIA Titan Xp
[Chart: frames per second vs. batch size for MATLAB GPU Coder (R2017b), MATLAB (R2017b), TensorFlow (1.2.0), MXNet (0.10) and Caffe2 (0.8.1); speed-up callouts 2x, 7x, 5x]
Testing platform
CPU: Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50 GHz
GPU: Pascal Titan Xp
cuDNN v5
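Frames-per-second figures like these come from timing inference at several batch sizes. This sketch times MATLAB's own predict with timeit. It does not time GPU Coder-generated code, and it is not necessarily how the published benchmark was run. Random single-precision input stands in for real frames, and the AlexNet support package is assumed:

```matlab
net = alexnet;                                   % AlexNet support package
batchSizes = [1 16 32 64];                       % batch sizes used in the charts
fps = zeros(size(batchSizes));
for k = 1:numel(batchSizes)
    n = batchSizes(k);
    X = rand(227, 227, 3, n, 'single');          % random input batch
    % timeit runs the call several times and returns a typical duration (s);
    % 'MiniBatchSize' keeps the whole batch in a single forward pass
    t = timeit(@() predict(net, X, 'MiniBatchSize', n));
    fps(k) = n / t;                              % frames per second
end
disp(table(batchSizes', fps', 'VariableNames', {'BatchSize', 'FPS'}))
```

Throughput usually rises with batch size until the GPU is saturated. This is why the comparisons are reported per batch size rather than as a single number.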
34
AlexNet Inference on NVIDIA GPUs
[Chart: memory usage (GB) vs. batch size (1, 16, 32, 64), showing CPU resident memory and GPU peak memory (nvidia-smi) for Py-Caffe, GPU Coder, TensorFlow, MATLAB w/ PCT and C++-Caffe]
CPU: Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50 GHz
GPU: Tesla K40c
35
Design Deep Neural Networks in MATLAB and Deploy with GPU Coder
Design Deep Learning & Vision Algorithms
Highlights
▪ Manage large image sets
▪ Easy access to models like AlexNet and GoogLeNet
▪ Pre-built training frameworks
▪ Apps to automate ground truth labeling
High Performance Deployment
Highlights
▪ Automate optimized CUDA code generation with GPU Coder
▪ Deployed models up to 4.5x faster than Caffe2 and 7x faster than TensorFlow
36
Deep Learning Onramp
Free Introductory Course
37
Visit MathWorks Booth to Learn More
HIGHWAY_SCENE
Classification
Car Car Car
Detection
Lane Lane
Regression
Semantic Segmentation
