NVIDIA Deep Learning Institute (DLI): Object Detection with Jetson

1
Instructor
DLI Robotics
Workshop
“Pixels to Action”
Wednesday, May 10, 2017

2
DEEP LEARNING INSTITUTE
DLI Mission
Helping people solve challenging
problems using AI and deep learning.
• Developers, data scientists and
engineers
• Self-driving cars, healthcare and
robotics
• Training, optimizing, and deploying
deep neural networks

3
DLI TRAINING OFFERINGS
SELF-PACED LABS
Online labs offer on-demand 24/7 access to introductory concepts.
Prerequisites: Developer, Data Scientist or Researcher. Not designed for non-developers.
INSTRUCTOR-LED WORKSHOPS
Both beginner and intermediate labs are offered. Typically “Getting Started” and two more advanced labs constitute a day-long workshop.
Prerequisites: same as self-paced labs.
5-DAY COURSES
Coming later in 2017.
Industry-specific courses teach students to fine-tune a neural network to deploy on a specific platform (e.g. NVIDIA Drive PX2).
Prerequisites: vary by industry.

4
ITINERARY
04:00 – 04:30 Deep Learning Introduction and Pre-Work demo
04:30 – 05:00 Image Classification deployed to Jetson TX1
05:00 – 05:20 JetBot Presentation & Demo
05:20 – 05:50 Object Detection on Jetson TX1
05:50 – 06:00 Wrap-up / Q&A / Additional Jetson time

6
WHAT THIS LAB IS
• An introduction to:
• Deep Learning
• Workflow of an end-to-end Deep Learning pipeline
• Deploying trained DNNs
• Hands-on exercises using Caffe/TensorRT on Jetson TX1

7
WHAT THIS LAB IS NOT
• Intro to machine learning from first principles
• Rigorous mathematical formalism of neural networks
• Survey of all the features and options of Caffe, DIGITS, or other
tools
• A deep dive into the hardware of Jetson TX1

8
ASSUMPTIONS
• No background in Deep Learning needed
• No robotics experience needed
• Understand how to:
• Work in a Linux command-line environment
• Write basic programs in Python and C/C++

9
TAKEAWAYS
• Understanding of the workflow of Deep Learning
• Ability to deploy a trained convolutional neural network
• Understanding of “pixels to actions” on a Jetson TX1 platform

10
DEEP LEARNING DEVELOPMENT CYCLE

11
IMAGE CLASSIFICATION
— Classify an entire image as one class
— Works better for close up images
— We’ll be using AlexNet initially
“trained a large, deep convolutional neural
network to classify the 1.3 million high-
resolution images in the LSVRC-2010
ImageNet training set into the 1000 different
classes. On the test data, we achieved top-1
and top-5 error rates of 39.7% and 18.9%
which is considerably better than the previous
state-of-the-art results”
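The top-1 and top-5 error rates quoted above can be illustrated with a minimal sketch. It assumes each image comes with a list of per-class scores; the function name `top_k_error` and the toy scores are hypothetical and not part of the lab code.

```python
def top_k_error(scores_per_image, true_labels, k):
    """Fraction of images whose true label is NOT among the k highest-scoring classes."""
    misses = 0
    for scores, label in zip(scores_per_image, true_labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        if label not in ranked[:k]:
            misses += 1
    return misses / len(true_labels)


# Toy data: 3 images, 4 classes (values are made up for illustration).
scores = [
    [0.1, 0.6, 0.2, 0.1],  # true class 1 is ranked first
    [0.5, 0.1, 0.3, 0.1],  # true class 2 is ranked second
    [0.4, 0.3, 0.2, 0.1],  # true class 3 is ranked last
]
labels = [1, 2, 3]

top1 = top_k_error(scores, labels, k=1)  # 2 of 3 images missed
top2 = top_k_error(scores, labels, k=2)  # 1 of 3 images missed
print(f"top-1 error: {top1:.3f}, top-2 error: {top2:.3f}")
```

With 1000 ImageNet classes the same function would be called with k=1 and k=5.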

12
CREATING OUR MODEL - SUMMARY

13
LOAD DATASET – IMAGENET DATA
Different options will be presented based upon the task

14
CHOOSE MODEL – ALEXNET IN CAFFE
Differences may exist between model tasks
Can anneal the learning rate
Define custom layers with Python
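As a rough illustration of learning rate annealing, the sketch below uses a step-decay schedule of the kind Caffe calls the "step" policy, where the rate is multiplied by gamma every stepsize iterations. The slide does not say which policy the workshop model used, so the policy and the numbers here are assumptions.

```python
def step_lr(base_lr, gamma, stepsize, iteration):
    """Step decay: base_lr * gamma ** floor(iteration / stepsize)."""
    return base_lr * gamma ** (iteration // stepsize)


# Hypothetical settings: start at 0.01 and divide by 10 every 1000 iterations.
for it in (0, 999, 1000, 2500):
    print(it, step_lr(base_lr=0.01, gamma=0.1, stepsize=1000, iteration=it))
```

The rate stays flat within each block of 1000 iterations and drops at each boundary, which produces a staircase-shaped learning rate curve.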

15
TRAINED MODEL
Annealed learning rate
Loss function and accuracy during training

16
DEPLOYING OUR MODEL – YOUR TURN

17
TASK 1
1. Open a terminal on the Jetson
2. Go to ~/01-classification
3. Run ./static_classify.sh grannysmith.jpg
— Try other images in the folder
4. Try nvidia.jpg
— What is different about this image?
Static Image Classification

19
TASK 2
1. Run ./webCamClassify.py
— Test some convenient items
2. Find IDs with their associated class names in synset_words.txt
3. Modify webCamClassify.py and change the object IDs
— Function is alert
— Try using “computer mouse” or “granny smith” or “bottle cap”
4. Re-run ./webCamClassify.py
— Should see alerts when holding up recognized object
Live Camera Classification
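Step 2 can be sketched as a small lookup. This assumes synset_words.txt follows the common layout of one class per line ("wnid name, name, ...") and that a class ID is the zero-based line position; both are assumptions about the lab files. The three-line sample and the helper names are hypothetical.

```python
# Hypothetical excerpt; the real file has one line per ImageNet class.
SAMPLE = """n07742313 Granny Smith
n03793489 mouse, computer mouse
n02877765 bottlecap, bottle cap
"""


def load_classes(text):
    """Return (index, wnid, [names]) for every non-blank line."""
    lines = [line for line in text.splitlines() if line.strip()]
    classes = []
    for index, line in enumerate(lines):
        wnid, _, names = line.partition(" ")
        classes.append((index, wnid, [n.strip().lower() for n in names.split(",")]))
    return classes


def find_ids(classes, wanted):
    """Indices of all classes that list `wanted` as one of their names."""
    wanted = wanted.lower()
    return [index for index, _, names in classes if wanted in names]


classes = load_classes(SAMPLE)
print(find_ids(classes, "computer mouse"))
```

On the Jetson, the text would come from reading synset_words.txt instead of SAMPLE, and the resulting IDs would differ from the ones in this excerpt.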

20
HOW TO STOP TERMINATOR
Bananas

21
OBJECT DETECTION
— Draws bounding boxes around objects
within an image
— We will use a DetectNet-based model
trained to recognize “bottles”
— Trained on Azure K80 GPUs
— MS COCO dataset converted to KITTI
format for DIGITS to use
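The COCO-to-KITTI conversion mentioned above can be sketched for a single annotation. COCO stores a box as [x, y, width, height], while a KITTI label line has 15 space-separated fields with the box given as left, top, right, bottom. This is an illustrative sketch, not the script used to prepare the workshop data; the 3D fields are zero-filled here on the assumption that only the 2D box matters for this detector.

```python
def coco_bbox_to_kitti_line(class_name, bbox):
    """Build one KITTI label line from a COCO-style [x, y, width, height] box."""
    x, y, w, h = bbox
    left, top, right, bottom = x, y, x + w, y + h
    fields = [
        class_name,                   # object type
        "0.00",                       # truncated
        "0",                          # occluded
        "0.00",                       # alpha (observation angle)
        f"{left:.2f}", f"{top:.2f}", f"{right:.2f}", f"{bottom:.2f}",  # 2D box
        "0.00", "0.00", "0.00",       # 3D dimensions (unused in this sketch)
        "0.00", "0.00", "0.00",       # 3D location (unused in this sketch)
        "0.00",                       # rotation_y (unused in this sketch)
    ]
    return " ".join(fields)


line = coco_bbox_to_kitti_line("bottle", [10, 20, 30, 40])
print(line)
```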

22
TASK 3
1. cd ~/02-detection; mkdir bottlenet
2. Extract ~/bottlenet.tgz to ~/02-detection/bottlenet
1. cd bottlenet; tar xzf ~/bottlenet.tgz
— This simulates downloading the model from Azure and deploying it
3. From ~/02-detection, run ./imageDetect.py sodagroup.jpg
4. You should see an image with bounding boxes drawn on bottles.
Static Object Detection

23
TASK 4
1. Modify imageDetect.py to do something different with its inference.
— For example:
— Print the number of bottles
— Calculate the % of the image that consists of bottles.
— Display the results differently than big blue opaque rectangles.
— Come up with your own algorithm!
Detect Multiple Bottles
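Two of the suggestions above, counting bottles and computing the share of the image they cover, can be sketched as follows. The sketch assumes detections are available as (x1, y1, x2, y2) pixel boxes; the actual data structure inside imageDetect.py may differ, and `coverage_percent` is a hypothetical helper. Overlapping boxes are counted once by marking pixels in a mask.

```python
def coverage_percent(boxes, width, height):
    """Percentage of the image covered by at least one box (overlaps counted once)."""
    covered = [[False] * width for _ in range(height)]
    for x1, y1, x2, y2 in boxes:
        # Clip each box to the image before marking its pixels.
        for y in range(max(0, y1), min(height, y2)):
            row = covered[y]
            for x in range(max(0, x1), min(width, x2)):
                row[x] = True
    pixels = sum(sum(row) for row in covered)
    return 100.0 * pixels / (width * height)


# Two hypothetical detections that overlap in a 25x25 region of a 100x100 image.
boxes = [(0, 0, 50, 50), (25, 25, 75, 75)]
count = len(boxes)
percent = coverage_percent(boxes, width=100, height=100)
print(f"{count} bottles covering {percent:.2f}% of the image")
```

A mask is slower than summing box areas but avoids double counting when bottles overlap.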

24
TASK 5
1. Run ./webCamDetect.py
— Hold up a bottle in front of the camera and look for rectangles on screen and a printout
indicating whether bottles are detected.
2. Modify code to change the output based on number of bottles
— If really ambitious, change the output based on the size of the bounding box
Live Camera Object Detection
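One possible shape for step 2 is a function that turns a frame's detections into a message. The box format (x1, y1, x2, y2), the function name `describe`, and the 25% "near" threshold are all assumptions made for illustration and are not taken from webCamDetect.py.

```python
def describe(boxes, frame_area, near_fraction=0.25):
    """Summarise detections by count and by the size of the largest box."""
    if not boxes:
        return "no bottles"
    largest = max((x2 - x1) * (y2 - y1) for x1, y1, x2, y2 in boxes)
    # Treat a box that fills a large share of the frame as a nearby bottle.
    distance = "near" if largest / frame_area >= near_fraction else "far"
    noun = "bottle" if len(boxes) == 1 else "bottles"
    return f"{len(boxes)} {noun}, largest is {distance}"


frame_area = 640 * 480
print(describe([(0, 0, 320, 480)], frame_area))
print(describe([(0, 0, 10, 10), (0, 0, 20, 20)], frame_area))
```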

26
NVIDIA DEEP LEARNING SOFTWARE PLATFORM
[Diagram labels: Training Data; Training Framework (Data Management, Training, Model Assessment); Trained Neural Network; NVIDIA Deep Learning SDK; TensorRT; deployment targets Embedded, Automotive, Data center]
developer.nvidia.com/deep-learning-software

27
NVIDIA TensorRT
High-performance deep learning inference for production
deployment
developer.nvidia.com/tensorrt
High performance neural network inference engine
for production deployment
Generate optimized and deployment-ready models for
datacenter, embedded and automotive platforms
Deliver high-performance, low-latency inference demanded
by real-time services
Deploy faster, more responsive and memory efficient deep
learning applications with INT8 and FP16 optimized
precision support
[Chart: GoogLeNet, CPU-only vs Tesla P40 + TensorRT. Y axis: images/second (0 to 7,000). X axis: batch size (2, 8, 128). Series: CPU-only, Tesla P40 + TensorRT (FP32), Tesla P40 + TensorRT (INT8). Headline: up to 36x more images/sec.]
CPU: 1 socket E5-2690 v4 @2.6 GHz, HT on
GPU: 2 socket E5-2698 v3 @2.3 GHz, HT off, 1 P40 card in the box

28
TENSORRT
Networks Supported
• Image Classification (AlexNet, GoogLeNet, VGG, ResNet)
• Object Detection
• Segmentation
Not Yet Supported
• RNN/LSTM
• 3D convolutions
• Custom user layers

29
TENSORRT
Layer Types Supported
• Convolution: Currently only 2D convolutions
• Activation: ReLU, tanh and sigmoid
• Pooling: max and average
• Scale: similar to Caffe Power layer (shift+scale*x)^p
• ElementWise: sum, product or max of two tensors
• LRN: cross-channel only
• Fully-connected: with or without bias
• SoftMax: cross-channel only
• Deconvolution
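The arithmetic behind two of the listed layer types can be sketched on plain Python lists: the power formula (shift+scale*x)^p and the three element-wise operations. This only illustrates the math and is not TensorRT API code; the function names are hypothetical.

```python
def power_layer(x, shift=0.0, scale=1.0, power=1.0):
    """Apply (shift + scale * v) ** power to every element."""
    return [(shift + scale * v) ** power for v in x]


def elementwise(a, b, op):
    """Combine two equally sized tensors element by element."""
    ops = {
        "sum": lambda p, q: p + q,
        "prod": lambda p, q: p * q,
        "max": max,
    }
    return [ops[op](p, q) for p, q in zip(a, b)]


print(power_layer([1.0, 2.0], shift=1.0, scale=2.0, power=2.0))
print(elementwise([1, 5], [3, 2], "max"))
```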

30
TENSORRT
Workflow
Training Framework → Neural Network → Optimization using TensorRT → Plan → Runtime using TensorRT

31
TENSORRT
Optimizations
• Fuse network layers
• Eliminate concatenation layers
• Kernel specialization
• Auto-tuning for target platform
• Tuned for given batch size
Trained neural network → Optimized inference runtime

32
GRAPH OPTIMIZATION
Unoptimized network
[Diagram: input, concat, max pool, four 1x1 conv, one 3x3 conv, one 5x5 conv, concat, next input. Each conv is followed by separate bias and relu nodes.]

33
GRAPH OPTIMIZATION
Vertical fusion
[Diagram: each conv, bias, relu chain is fused into a single CBR node, giving four 1x1 CBR, one 3x3 CBR, and one 5x5 CBR. The concat and max pool nodes are unchanged.]
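Vertical fusion can be sketched on a toy representation where one branch of the network is a list of op names. The sketch assumes CBR stands for convolution, bias, ReLU applied in that order, and it only rewrites names; the actual optimizer replaces the three operations with one fused kernel. The function `fuse_vertical` is hypothetical.

```python
def fuse_vertical(ops):
    """Collapse every 'KxK conv', 'bias', 'relu' run into a single 'KxK CBR' op."""
    fused = []
    i = 0
    while i < len(ops):
        is_conv = ops[i].endswith(" conv")
        if is_conv and ops[i + 1:i + 3] == ["bias", "relu"]:
            # Keep the kernel-size prefix and rename the fused op.
            fused.append(ops[i][:-len("conv")] + "CBR")
            i += 3
        else:
            fused.append(ops[i])
            i += 1
    return fused


branch = ["1x1 conv", "bias", "relu", "5x5 conv", "bias", "relu"]
fused_branch = fuse_vertical(branch)
print(fused_branch)
```

Ops that are not part of a complete conv, bias, relu run, such as max pool, pass through unchanged.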

34
GRAPH OPTIMIZATION
Horizontal fusion
[Diagram: three of the 1x1 CBR nodes are merged into one, leaving 3x3 CBR, 5x5 CBR, and two 1x1 CBR nodes. The concat and max pool nodes are unchanged.]

35
GRAPH OPTIMIZATION
Concat elision
[Diagram: both concat nodes are removed, leaving input, max pool, 3x3 CBR, 5x5 CBR, two 1x1 CBR, and next input.]

36
INT8 PRECISION
New in TensorRT
PERFORMANCE
Up to 3x more images/sec with INT8 precision
[Chart: images/second (0 to 7000) vs batch size (2, 4, 128), FP32 vs INT8]
EFFICIENCY
Deploy 2x larger models with INT8 precision
[Chart: memory in MB (0 to 1400) vs batch size (2, 4, 128), FP32 vs INT8]
ACCURACY
Deliver full accuracy with INT8 precision
[Chart: % accuracy (0% to 100%) for Top 1 and Top 5, FP32 vs INT8]
GoogLeNet, FP32 vs INT8 precision + TensorRT on Tesla P40 GPU, 2 Socket Haswell E5-2698 v3 @2.3 GHz with HT off
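To give a feel for what INT8 precision means, the sketch below applies simple symmetric quantization: values are scaled so the largest magnitude maps to 127, rounded to integers, and scaled back. This is a generic illustration under that assumption and is not a description of how TensorRT chooses its scaling; the function names are hypothetical.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the INT8 values."""
    return [q * scale for q in quantized]


weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
worst = max(abs(r - w) for r, w in zip(restored, weights))
print(q, f"max round-trip error: {worst:.5f}")
```

Each stored value needs one byte instead of four, and the round-trip error is bounded by half the scale, which is why accuracy can stay close to FP32 when the value range is chosen well.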

37
TENSORRT AVAILABILITY
http://developer.nvidia.com/tensorrt
Register to download
Sign up for early access testing
Learn More
TensorRT 1 stable version and testing for TensorRT 2

38
TASK 6-1
1. Change to ~/03-detection-RT/jetson-inference/data/networks
2. mkdir bottlenet; cd bottlenet
3. Extract bottlenet: tar xzf ~/bottlenet.tgz
4. Remove the Python layer:
patch -p0 < ../../../../deploy_bottlenet.patch
5. If you want to see what the patch did:
cat ../../../../deploy_bottlenet.patch
Use TensorRT for Better Performance

39
TASK 6-2
1. Change to ~/03-detection/jetson-inference/build/aarch64/bin
2. Run ./detectnet-camera bottlenet
1. Notice the 5x speedup in performance!
3. Modify
~/03-detection/jetson-inference/detectnet-camera/detectnet-camera.cpp
1. Change display algorithm as previous tasks
2. Go to ~/03-detection/jetson-inference/build and run make to recompile
Use TensorRT for Better Performance

40
UPDATING YOUR DATASET
— Find images/video that do not work correctly
— Label them
— Use tools such as Sloth, MathWorks, or others for labeling images
— Upload new data to Azure
— Retrain network to improve accuracy
— Deploy

41
WHAT’S NEXT
• Use / practice what you learned
• Move your data to Azure and start training
• Buy a Jetson!
• Discuss with peers practical applications of DNN
• Reach out to Microsoft and the Deep Learning Institute

42
WHAT’S NEXT WITH DLI
TAKE SURVEY
…for the chance to win an NVIDIA SHIELD TV. Check your email for a link.
ACCESS ONLINE LABS
Check your email for details to access more DLI training online.
ATTEND WORKSHOP
Visit www.nvidia.com/dli for workshops in your area.
JOIN DEVELOPER PROGRAM
Visit https://developer.nvidia.com/join for more.

43
JETSON TX2 DEVELOPER KIT
GTC Show Special: Just $399*
EDU Discount: Just $299**
Available at the GTC Gear Store all week
*Limit five per person
**Limit one per student/instructor

44
May 8 - 11, 2017 | Silicon Valley | #GTC17
www.gputechconf.com
Enjoy the world’s most important event for GPU developers, May 8 – 11, 2017 in Silicon Valley
INNOVATE
Hear about disruptive innovations from startups
DISCOVER
See how GPUs are creating amazing breakthroughs in important fields such as deep learning and AI
CONNECT
Connect with technology experts from NVIDIA and other leading organizations
LEARN
Gain insight and valuable hands-on training through hundreds of sessions and research posters

45
FINAL TASK
1. cd /home
2. sudo /usr/local/clean_home/restore.sh
(password is ubuntu)
Restore Jetson
