Fast deep learning at your fingertips
Faster deep learning solutions from training to inference using Intel® Deep Learning SDK

Nir Lotan
Machine Learning Product Manager, Advanced Analytics, Intel

Dr. Amitai Armon
Chief Data Scientist, Advanced Analytics, Intel
Deep Learning Is Everywhere at Intel
Manufacturing, Processor Design, Sales & Marketing, Health Analytics, AI Products, Perceptual Computing
Deep neural networks are solving real-life cognitive tasks:
visual understanding (e.g., detecting a "person" in an image), natural language processing, and speech recognition.
Deep Learning
The model is inspired by a multi-layer network of neurons (the network topology).
Deep Learning Steps
Step 1: Training (in the data center, over hours/days/weeks)
Lots of labeled input data (e.g., images labeled "person") are used to create a "deep neural net" math model. Output: a trained model.
Step 2: Inference (at the end point or in the data center, instantaneous)
New input from cameras and sensors is fed through the trained neural network model. Output: a classification, e.g., 97% person, 2% traffic light.
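As a rough illustration of these two steps (a minimal NumPy sketch only, not the Intel Deep Learning SDK or the frameworks discussed later; the data, sizes, and labels are made up), the following trains a tiny two-layer network on labeled data and then runs inference on a new input:

    # Minimal sketch of training followed by inference, using a tiny two-layer
    # softmax classifier in NumPy on synthetic "labeled" data.
    import numpy as np

    rng = np.random.default_rng(0)

    # Step 1: Training -- lots of labeled input data -> trained model
    n, d, h, classes = 1000, 20, 32, 2            # samples, features, hidden units, classes
    X = rng.normal(size=(n, d))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)       # synthetic labels ("person" vs. "not person")

    W1 = rng.normal(scale=0.1, size=(d, h)); b1 = np.zeros(h)
    W2 = rng.normal(scale=0.1, size=(h, classes)); b2 = np.zeros(classes)

    def forward(X):
        hidden = np.maximum(0, X @ W1 + b1)                      # ReLU layer
        logits = hidden @ W2 + b2
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        return hidden, p / p.sum(axis=1, keepdims=True)          # softmax probabilities

    lr = 0.1
    for _ in range(200):                                         # plain gradient descent
        hidden, probs = forward(X)
        grad_logits = probs.copy(); grad_logits[np.arange(n), y] -= 1; grad_logits /= n
        gW2 = hidden.T @ grad_logits; gb2 = grad_logits.sum(0)
        grad_hidden = (grad_logits @ W2.T) * (hidden > 0)
        gW1 = X.T @ grad_hidden; gb1 = grad_hidden.sum(0)
        W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2

    # Step 2: Inference -- new input from a "sensor" -> classification
    x_new = rng.normal(size=(1, d))
    _, p = forward(x_new)
    print(f"{p[0, 1]:.0%} person, {p[0, 0]:.0%} not-person")     # analogous to "97% person" above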
Deep learning today is not really accessible…
…and can be overwhelming.
Intel® Deep Learning SDK
Easily develop and deploy deep learning solutions using Intel® Architecture and popular frameworks.
Intel® Deep Learning SDK – Workflow
Training: Data Prep → Build a Model → Model Training, with compression, visualizations, algorithmic features, and multi-node support.
Inference: Model Optimizer → Inference Engine.
Our Vision: Democratize Deep Learning
Allow every data scientist and developer to easily deploy open-source deep learning frameworks optimized for Intel® Architecture, delivering end-to-end capabilities, a rich user experience, and tools to boost productivity.
Four pillars: Plug & Train, Maximize Performance, Productivity Tools, Accelerate Deployment.
Plug & Train – an easy-to-use installer
Install on Linux (CentOS/Ubuntu) or Mac.
Install from Linux, Mac, or Windows.
Use the tool remotely via the Chrome browser from any platform.
Maximize Performance
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems,
components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated
purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance. *Other names and brands may be the property of others.
Configurations:
2-socket system with Intel® Xeon® Processor E5-2699 v4 (22 cores, 2.2 GHz), 128 GB memory, Red Hat* Enterprise Linux 6.7, BVLC Caffe, Intel® Optimized Caffe framework, Intel® MKL 11.3.3, Intel® MKL 2017
Intel® Xeon Phi™ Processor 7250 (68 Cores, 1.4 GHz, 16GB MCDRAM), 128 GB memory, Red Hat* Enterprise Linux 6.7, Intel® Optimized Caffe framework, Intel® MKL 2017
All numbers measured without taking data manipulation into account.
[Chart: Caffe/AlexNet single-node training performance speedup – training with Intel-optimized frameworks. Relative to Intel® Xeon® E5-2699 v4 out-of-the-box: +Intel MKL 11.3.3 gives 5.8x, +Intel MKL 2017 gives 12x (a further ~2.1x), and Intel® Xeon Phi™ 7250 +Intel MKL 2017 gives 24x (a further ~2x).]
Example: Deep-learning training with Intel-Optimized Caffe* on Intel® Xeon® Processor E5 v4 and Intel® Xeon Phi™.
Multi-Node Training
[Diagram: a browser connects through Jupyter notebook services to the DLSDK service; training runs in DLSDK containers orchestrated by Kubernetes across Node 1, Node 2, Node 3, …, each node with its data on the file system.]
Performance boost with distributed training.
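To give a feel for what the distributed setup buys you (a conceptual sketch only, simulated in one process; the SDK's actual Kubernetes-based multi-node mechanism is not shown here), data-parallel training lets each node compute gradients on its own shard of the data and averages them before every weight update:

    # Conceptual sketch of data-parallel distributed training, simulated in one process:
    # each "node" computes a gradient on its own data shard; the gradients are averaged
    # (an all-reduce) and the shared weights are updated.
    import numpy as np

    rng = np.random.default_rng(1)
    n_nodes, n, d = 3, 1200, 10
    X = rng.normal(size=(n, d))
    w_true = rng.normal(size=d)
    y = X @ w_true + 0.1 * rng.normal(size=n)

    shards = np.array_split(np.arange(n), n_nodes)   # each node holds one shard of the data
    w = np.zeros(d)
    lr = 0.1

    for step in range(200):
        grads = []
        for idx in shards:                           # "per-node" gradient of the squared loss
            Xi, yi = X[idx], y[idx]
            grads.append(2 * Xi.T @ (Xi @ w - yi) / len(idx))
        w -= lr * np.mean(grads, axis=0)             # average across nodes, then update

    print("parameter error:", np.linalg.norm(w - w_true))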
Productivity Tools
Step-by-step wizard, interactive notebook, model visualization, and model compression.
Accelerate Deployment
• Optimize:
  • Imports trained models from all popular DL frameworks, regardless of the training hardware
  • Model canonicalization, compression, and quantization (see the quantization sketch below)
• Deploy:
  • One API across all Intel hardware and systems
  • Friendly inference solution: low footprint, easy API, control meeting functional safety requirements
  • Optimizes inference execution per target hardware under the hood
Ease of use + embedded friendly + extra performance boost
Intel Deep Learning Deployment Tool
[Diagram: (1) a trained model is optimized – compressed and quantized; (2) the Inference Engine runs it across CPU, GEN, FPGA, and more.]
Enables full utilization of IA inference while abstracting the hardware from developers.
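To illustrate the compress-and-quantize idea named above (a hedged NumPy sketch of generic symmetric 8-bit post-training weight quantization; it is not the Deployment Tool's actual algorithm, and the layer size is made up):

    # Symmetric 8-bit post-training quantization of one weight matrix: store int8
    # values plus a single float scale, and dequantize at inference time.
    import numpy as np

    rng = np.random.default_rng(2)
    W = rng.normal(scale=0.05, size=(256, 256)).astype(np.float32)    # trained fp32 weights (stand-in)

    scale = np.abs(W).max() / 127.0                     # map the largest weight to +/-127
    W_int8 = np.clip(np.round(W / scale), -127, 127).astype(np.int8)  # 4x smaller to store

    W_deq = W_int8.astype(np.float32) * scale           # what an inference engine would compute with
    print("bytes fp32:", W.nbytes, "bytes int8:", W_int8.nbytes)
    print("max abs quantization error:", np.abs(W - W_deq).max())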
Use Cases
This is our dataset: "Hands" and "Not-Hands".
Thanks to the Intel "Hands in VR" team for sharing the use case and dataset.
Demo
http://software.intel.com/deep-learning-sdk/
Deep Learning Evolution
Reminder: Deep learning is leading today's AI
It has achieved breakthroughs in visual understanding and in natural language processing.
Illustrating the deep-learning training and inference process (diagram as above).
New AI methods try to cope with additional challenges
(1) Labeled data is scarce, and more labeled samples need to be generated (e.g., using a GAN).
Illustrating the training and inference process (diagram as above, with a GAN added).
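As a hedged, minimal illustration of the GAN idea (a PyTorch toy on 1-D data, made up for this section and unrelated to the SDK or the talk's demo), a generator learns to produce samples the discriminator cannot distinguish from real data, which is how additional realistic samples can be synthesized:

    # Minimal GAN sketch: generator G maps noise to samples, discriminator D scores
    # samples; the two are trained in alternation on a toy 1-D "real" distribution.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    real_dist = lambda n: torch.randn(n, 1) * 0.5 + 3.0          # "real" data: N(3, 0.5)

    G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))   # noise -> sample
    D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))   # sample -> logit
    opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
    opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
    bce = nn.BCEWithLogitsLoss()

    for step in range(3000):
        # Discriminator step: real samples labeled 1, generated samples labeled 0
        real = real_dist(64)
        fake = G(torch.randn(64, 8)).detach()
        d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()

        # Generator step: try to make the discriminator label generated samples as real
        fake = G(torch.randn(64, 8))
        g_loss = bce(D(fake), torch.ones(64, 1))
        opt_g.zero_grad(); g_loss.backward(); opt_g.step()

    samples = G(torch.randn(1000, 8))
    print("generated mean/std:", samples.mean().item(), samples.std().item())  # should move toward ~3.0 / ~0.5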
New AI methods try to cope with additional challenges
(2) Some tasks are learned better through trial and error than through examples (reinforcement learning, RL).
Illustrating the training and inference process (diagram as above, with RL added).
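A hedged sketch of learning by trial and error (tabular Q-learning on a made-up 5-state chain; purely illustrative, not from the talk): the agent receives no labeled examples, only a reward when it reaches the goal, and still learns a policy:

    # Tabular Q-learning on a 5-state chain; actions: 0 = move left, 1 = move right.
    # Reward is given only on reaching the rightmost (goal) state.
    import numpy as np

    rng = np.random.default_rng(3)
    n_states, n_actions = 5, 2
    Q = np.zeros((n_states, n_actions))
    alpha, gamma, eps = 0.1, 0.95, 0.2      # learning rate, discount, exploration rate

    for episode in range(500):
        s = 0
        while s != n_states - 1:                                 # episode ends at the goal
            a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
            s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            r = 1.0 if s_next == n_states - 1 else 0.0           # reward only at the goal
            Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])   # Q-learning update
            s = s_next

    print("greedy policy (1 = move right):", Q[:-1].argmax(axis=1))      # expected: all 1s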
New AI methods try to cope with additional challenges
(3) Deployment at the edge requires a lower memory footprint and faster inference (model compression and low precision).
Illustrating the training and inference process (diagram as above, with model compression & low precision added).
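Low-precision quantization was sketched earlier; as a hedged sketch of the other compression idea mentioned here (magnitude pruning, illustrative only, with made-up layer sizes), the smallest weights are zeroed out so the layer can be stored and executed sparsely at the edge:

    # Magnitude pruning of one dense layer: keep only the largest 10% of weights.
    import numpy as np

    rng = np.random.default_rng(4)
    W = rng.normal(scale=0.05, size=(512, 512)).astype(np.float32)   # a trained dense layer (stand-in)

    keep = 0.10                                                      # fraction of weights to keep
    threshold = np.quantile(np.abs(W), 1 - keep)
    W_pruned = np.where(np.abs(W) >= threshold, W, 0.0).astype(np.float32)

    nonzero = np.count_nonzero(W_pruned)
    print(f"weights kept: {nonzero}/{W.size} ({nonzero / W.size:.0%})")
    # Storing only the surviving weights in a sparse/compressed format is what
    # actually shrinks the memory footprint for edge deployment.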
New AI methods try to cope with additional challenges
(4) Using the model often requires adapting it to a new type of data (transfer learning).
Illustrating the training and inference process (diagram as above, with transfer learning added).
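A hedged PyTorch sketch of transfer learning (illustrative only; the backbone, sizes, and data are stand-ins, not anything shipped with the SDK): reuse a network pretrained on the original task, freeze it, and train only a small new head on the new type of data:

    # Transfer learning sketch: freeze a "pretrained" backbone and train a new head.
    import torch
    import torch.nn as nn

    # Stand-in for a backbone pretrained on the original task (hypothetical sizes).
    backbone = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 32), nn.ReLU())
    for p in backbone.parameters():
        p.requires_grad = False                 # freeze: keep the learned features

    head = nn.Linear(32, 2)                     # new head for the new 2-class task
    opt = torch.optim.Adam(head.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    # A small labeled set from the new domain (synthetic here).
    X_new = torch.randn(200, 128)
    y_new = torch.randint(0, 2, (200,))

    for _ in range(100):
        logits = head(backbone(X_new))          # frozen backbone extracts features; only the head gets gradients
        loss = loss_fn(logits, y_new)
        opt.zero_grad(); loss.backward(); opt.step()

    print("final loss on the new task:", loss.item())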
New AI methods try to cope with additional challenges
=> There are multiple ongoing developments in AI, and we should be ready.
Illustrating the training and inference process (diagram as above, now annotated with GAN, RL, model compression & low precision, and transfer learning).
Download, use, and provide feedback
http://software.intel.com/deep-learning-sdk/ (or search for: Intel Deep Learning SDK)
If you have a specific use case you want to try with us, let me know.
Thank you!