May 7th 2019
This document contains information on products, services and/or processes in development. All information provided here is subject to change without
notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps.
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at
intel.com, or from the OEM or retailer. No computer system can be absolutely secure.
Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual
performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete information about
performance and benchmark results, visit http://www.intel.com/performance.
Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified circumstances and configurations, may
affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction.
Statements in this document that refer to Intel’s plans and expectations for the quarter, the year, and the future, are forward-looking statements that
involve a number of risks and uncertainties. A detailed discussion of the factors that could affect Intel’s results and plans is included in Intel’s SEC filings,
including the annual report on Form 10-K.
The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications.
Current characterized errata are available on request.
Performance estimates were obtained prior to implementation of recent software patches and firmware updates intended to address exploits referred to as
"Spectre" and "Meltdown." Implementation of these updates may make these results inapplicable to your device or system.
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and
confirm whether referenced data are accurate.
Intel, the Intel logo, Pentium, Celeron, Atom, Core, Xeon and others are trademarks of Intel Corporation in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others.
© 2018 Intel Corporation.
Legal notices & disclaimers
Questions? Ask us!
Michael Zephyr, Developer Evangelist (michael.zephyr@intel.com)
Ben Odom, Developer Evangelist (benjamin.j.odom@intel.com)
Mecit Gungor, Developer Evangelist (abdulmecit.gungor@intel.com)
Agenda
• Intel® AI Academy
• Intel® AI Portfolio
• Intel AI Use Cases
• ML/DL Introduction
• Training on Caffe*/TensorFlow* with Intel optimizations
• Introduction to the Intel® OpenVINO™ Toolkit
• Introduction to the Intel® Movidius™ Neural Compute Stick and SDK
• Overview of Intel® Optimized Caffe* and TensorFlow*
• Intel® AI DevCloud
© 2019 Intel Corporation
The AI journey | Business imperative | Intel AI
AI is the driving force
The analytics curve runs from hindsight to foresight:
• Descriptive analytics (hindsight)
• Diagnostic analytics (insight)
• Predictive analytics (foresight)
• Prescriptive analytics (forecast)
• Cognitive analytics (act/adapt)
Why AI now? The data deluge (2019):
• Internet user: 25 GB per month¹
• Smart car: 50 GB per day²
• Smart hospital: 3 TB per day²
• Airplane data: 40 TB per day²
• Smart factory: 1 PB per day²
• City safety: 50 PB per day²
These data feed business, operational, and security insights.
1. Source: http://www.cisco.com/c/en/us/solutions/service-provider/vni-network-traffic-forecast/infographic.html
2. Source: https://www.cisco.com/c/dam/m/en_us/service-provider/ciscoknowledgenetwork/files/547_11_10-15-DocumentsCisco_GCI_Deck_2014-2019_for_CKN__10NOV2015_.pdf
What is AI?
There are many different approaches to AI. Deep learning is a subset of machine learning, which is itself a subset of AI.
Training: lots of tagged data (human, bicycle, strawberry) passes forward through the model; the error between the prediction ("Bicycle"?) and the label ("Strawberry") propagates backward to update the weights.
Inference: new data passes forward through the trained model weights to produce a prediction ("Bicycle"?).
AI will transform every vertical (source: Intel forecast):
• Consumer: smart assistants, chatbots, search, personalization, augmented reality, robots
• Health: enhanced diagnostics, drug discovery, patient care, research, sensory aids
• Finance: algorithmic trading, fraud detection, research, personal finance, risk mitigation
• Retail: support, experience, marketing, merchandising, loyalty, supply chain, security
• Government: defense, data insights, safety & security, resident engagement, smarter cities
• Energy: oil & gas exploration, smart grid, operational improvement, conservation
• Transport: in-vehicle experience, automated driving, aerospace, shipping, search & rescue
• Industrial: factory automation, predictive maintenance, precision agriculture, field automation
• Other: advertising, education, gaming, professional & IT services, telco/media, sports
The AI journey
1. Challenge
2. Approach
3. Values
4. People
5. Technology
6. Data
7. Model
8. Deploy
Partner with Intel to accelerate your AI journey.
Proof of Concept: Image Recognition
Seismic Reflection Analysis
Client:
A leading developer of software solutions to the global oil and gas industry.
Challenge:
Automate identification of fault lines within seismic reflection data.
Solution:
Built a proof of concept that is trained using seismic reflection data and can predict
the probability of finding fault lines on previously unseen images.
Performs pixel-wise semantic segmentation of SEG-Y formatted data
Model trained using supervised learning
Advantages:
Automation enables analysis of vast amounts of data faster
Could identify potentially rewarding locations from subtle clues in the data
Proof of Concept: Image Recognition
Oil Rig “Inspector Assist” System
Client:
Multinational oil and gas company
Challenge
The customer operates a number of offshore oil rigs,
and uses submersible vehicles to take video footage to
ensure their infrastructure is healthy and safe.
Since reviews of this footage are time consuming and
prone to errors, a more efficient solution for detecting
potential problems is needed.
Solution
Built models to detect and classify bolts according to
level of corrosion.
Advantages
Video footage can be condensed to 10% of its original length by filtering out unimportant frames and highlighting potential problem areas, enabling inspectors to perform their jobs more efficiently.
(Figure: level of corrosion, low to high.)
Breaking barriers between AI theory and reality
Partner with Intel to accelerate your AI journey:
• Simplify AI via our robust community: Intel AI Builders, Intel AI Developer Program, Intel AI DevCloud (www.intel.ai)
• Speed up development with open AI software: nGraph, OpenVINO™ toolkit, Nauta™, BigDL, Intel® MKL-DNN, ML libraries
• Deploy AI anywhere with unprecedented hardware choice, from Intel CPUs to GPUs* and accelerators
• Choose any approach from analytics to deep learning
• Tame your data deluge with our data layer expertise
• Scale with confidence on the platform for IT & cloud
Deploy AI anywhere
with unprecedented hardware choice, from device to edge to multi-cloud: CPUs for graphics, media & analytics acceleration; GPUs for dedicated media/vision; and, where needed, added acceleration with NNP-L (dedicated DL training), NNP-I (dedicated DL inference), FPGAs (flexible acceleration*), and dedicated hardware for automated driving.
*FPGA: (1) first to market to accelerate evolving AI workloads; (2) AI plus other system-level workloads like AI + I/O ingest, networking, security, pre/post-processing; (3) low-latency, memory-constrained workloads like RNN/LSTM.
GNA = Gaussian Neural Accelerator
All products, computer systems, dates, and figures are preliminary based on current expectations, and are subject to change without notice. Images are examples of intended applications but not an exhaustive list.
Speed up development
with open AI software

TOOLKITS (app developers): deep learning deployment
• Intel® Distribution of OpenVINO™ Toolkit¹: deep learning inference deployment on CPU/GPU/FPGA/VPU for Caffe*, TensorFlow*, MXNet*, ONNX*, Kaldi*
• Nauta (Beta): open source, scalable, and extensible distributed deep learning platform built on Kubernetes

LIBRARIES (data scientists):
• Deep learning frameworks: optimized for CPU & more; status & installation guides available; more framework optimizations underway (e.g. PaddlePaddle*, CNTK* & more)
• Machine learning (ML): Python (Scikit-learn, Pandas, NumPy), R (Cart, Random Forest, e1071), distributed (MLlib on Spark, Mahout)
• Analytics & ML: Intel® Distribution for Python* (optimized for machine learning), Intel® Data Analytics Acceleration Library (includes machine learning)

KERNELS (library developers):
• Deep learning graph compiler: Intel® nGraph™ Compiler (Beta), an open source compiler for deep learning model computations optimized for multiple devices (CPU, GPU, NNP) from multiple frameworks (TF, MXNet, ONNX)
• Deep learning: Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN), open source DNN functions for CPU / integrated graphics

1 An open source version is available at: 01.org/openvinotoolkit *Other names and brands may be claimed as the property of others.
Developer personas shown represent the primary user base for each row, but are not mutually exclusive.
All products, computer systems, dates, and figures are preliminary based on current expectations, and are subject to change without notice.
Optimization Notice
software.intel.com/ai
Intel® AI Academy
For developers, students, instructors and startups: learn, develop, teach, share.
• Learn: get smarter using online tutorials, webinars, student kits and support forums
• Develop: get 4 weeks' FREE access to the Intel® AI DevCloud, use your existing Intel® Xeon® processor-based cluster, or use a public cloud service
• Teach: educate others using available course materials, hands-on labs, and more
• Share: showcase your innovation at industry & academic events and online via the Intel AI community forum
Learn more on DevMesh
Opportunities to share your projects as an Intel® Student Ambassador:
▪ Industry events via sponsored speakerships
▪ Student workshops
▪ Ambassador labs
▪ Intel® Developer Mesh
AI Builders: ecosystem
100+ AI partners spanning:
• Horizontal: business intelligence & analytics, vision, conversational bots, AI tools & consulting, AI PaaS
• Vertical: healthcare, financial services, retail, transportation, news, media & entertainment, agriculture, legal & HR, robotic process automation
• Cross-vertical: OEMs, system integrators
Builders.intel.com/ai
Other names and brands may be claimed as the property of others.
The data science process
Source: https://en.wikipedia.org/wiki/Data_science
What is machine learning?
Applying algorithms to observed data to make predictions based on that data.
Machine learning: two methods, supervised and unsupervised learning
Supervised: we train the model by feeding it the correct answers ("ground truth"); the model learns and finally predicts.
Unsupervised: data is given to the model without the right answers; the model makes sense of the data on its own, and hopefully teaches you something you were not aware of.
Types of supervised and unsupervised learning
• Supervised: classification, regression
• Unsupervised: clustering, recommendation
Classification
Predict a label for an entity with a given set of features, e.g. spam prediction, sentiment analysis.
Clustering
Group entities with similar features.
Example: market segmentation, plotting playtime in hours against age to separate no gamers, casual gamers, and serious gamers.
Minimum Mean Squared Error
Example: fitting box office revenue against budget (both in units of $10^8$). Choose the intercept and slope that minimize the mean squared error:

$$\min_{\beta_0,\beta_1}\ \frac{1}{m}\sum_{i=1}^{m}\left(\beta_0+\beta_1 x_{\mathrm{obs}}^{(i)}-y_{\mathrm{obs}}^{(i)}\right)^2$$
Gradient Descent
Start with a cost function $J(\beta)$, then gradually move towards its global minimum.
Gradient Descent with Linear Regression
▪ Each point can be iteratively calculated from the previous one by stepping down the gradient of the cost surface $J(\beta_0,\beta_1)$:

$$\omega_{k+1}=\omega_k-\alpha\nabla\,\frac{1}{2}\sum_{i=1}^{m}\left(\beta_0+\beta_1 x_{\mathrm{obs}}^{(i)}-y_{\mathrm{obs}}^{(i)}\right)^2$$

so $\omega_2=\omega_1-\alpha\nabla J(\omega_1)$, $\omega_3=\omega_2-\alpha\nabla J(\omega_2)$, and so on from the starting point $\omega_0$.
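The update rule above can be sketched in plain Python/NumPy. The toy data below (a noisy line with slope near 2 and intercept near 1) and the learning rate are invented for illustration:

```python
import numpy as np

# Toy data: roughly y = 2x + 1 with a little noise (illustrative values only)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
m = len(x)

beta0, beta1 = 0.0, 0.0   # starting point omega_0
alpha = 0.05              # learning rate

for _ in range(2000):
    err = beta0 + beta1 * x - y
    # Gradients of J = (1/(2m)) * sum((beta0 + beta1*x - y)^2)
    grad0 = err.mean()
    grad1 = (err * x).mean()
    # omega_{k+1} = omega_k - alpha * grad J(omega_k)
    beta0 -= alpha * grad0
    beta1 -= alpha * grad1

print(beta0, beta1)  # converges toward the least-squares fit (slope ~2, intercept ~1)
```

With a small enough learning rate each step shrinks the cost, which is why the sequence of points walks down the bowl toward the global minimum.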
Why Deep Learning: What is wrong with linear classifiers?
XOR is the counter example to all linear models; we need non-linear functions.

X1 X2 | y
 0  0 | 0
 0  1 | 1
 1  0 | 1
 1  1 | 0

Plotted in the (X1, X2) plane, the positive points (0,1) and (1,0) and the negative points (0,0) and (1,1) cannot be separated by a single straight line.
Source: https://medium.com/towards-data-science/introducing-deep-learning-and-neural-networks-deep-learning-for-rookies-1-bd68f9cf5883
We need layers, usually lots, with non-linear transformations
XOR = (X1 AND NOT X2) OR (NOT X1 AND X2)
The network: both inputs feed a hidden unit (weights +1, +1, threshold 1.5) and the output unit (weights +1, +1, threshold 0.5); the hidden unit feeds the output with weight -2. Each unit thresholds its weighted sum to 0 or 1.
Input (1, 0):
Hidden: (1 x 1) + (0 x 1) = 1 < 1.5 → 0
Output: (1 x 1) + (0 x -2) + (0 x 1) = 1 > 0.5 → 1
Input (1, 1):
Hidden: (1 x 1) + (1 x 1) = 2 > 1.5 → 1
Output: (1 x 1) + (1 x -2) + (1 x 1) = 0 < 0.5 → 0
The same small network therefore reproduces the full XOR truth table.
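The two walkthroughs above can be checked for all four inputs in a few lines of Python; the weights (+1, +1, -2) and thresholds (1.5, 0.5) are exactly the ones on the slide:

```python
def step(z, threshold):
    # Hard threshold: fire (1) only when the weighted sum exceeds the threshold
    return 1 if z > threshold else 0

def xor_net(x1, x2):
    h = step(x1 * 1 + x2 * 1, 1.5)                # hidden unit (acts like AND)
    return step(x1 * 1 + h * -2 + x2 * 1, 0.5)    # output unit

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", xor_net(a, b))  # reproduces the XOR truth table: 0, 1, 1, 0
```

The non-linearity is the threshold itself; without it, stacking the two layers would collapse back into a single linear classifier.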
“Deep learning is a set of algorithms in
machine learning that attempt to model
high-level abstractions in data by using
architectures composed of multiple
non-linear transformations.”
- Wikipedia*
This is a burgeoning domain called deep learning.
In the machine learning world, we use neural networks; the idea comes from biology.
Each layer learns something.
Motivation for Neural Nets
▪ Use biology as inspiration for a mathematical model
▪ Get signals from previous neurons
▪ Generate signals (or not) according to inputs
▪ Pass signals on to next neurons
▪ By layering many neurons, we can create complex models
The small XOR network above (thresholds 1.5 and 0.5, weights +1 and -2) is already an example of this layering.
Each layer learns something
Successive layers (layer 1, layer 2, ..., layer N) learn increasingly abstract features for classes such as faces, cars, elephants, and chairs; a final fully connected layer produces the prediction (e.g. "elephant").
Basic neuron visualization
Inputs x1, x2, x3 are weighted by w1, w2, w3 and summed with a bias b:
z = x1·w1 + x2·w2 + x3·w3 + b
The neuron's output is f(z), where f is the activation function.
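A minimal sketch of this neuron, assuming a sigmoid activation (any activation function would do); the input values and weights below are made up:

```python
import math

def neuron(x, w, b):
    # z = x1*w1 + x2*w2 + x3*w3 + b, then squash with the activation f
    z = sum(xi * wi for xi, wi in zip(x, w)) + b
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid activation f(z)

out = neuron([1.0, 2.0, 3.0], [0.5, -0.25, 0.1], b=0.2)
print(out)  # z = 0.5, and sigmoid(0.5) is about 0.622
```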
Types of activation functions
▪ Sigmoid function: smooth transition in output between (0, 1)
▪ Tanh function: smooth transition in output between (-1, 1)
▪ ReLU function: f(x) = max(x, 0)
▪ Step function: output jumps from 0 to 1 at a threshold
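Each of the four functions above is a one-liner; a sketch:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))   # smooth, output in (0, 1)

def tanh(x):
    return math.tanh(x)                 # smooth, output in (-1, 1)

def relu(x):
    return max(x, 0.0)                  # f(x) = max(x, 0)

def step(x):
    return 1.0 if x >= 0 else 0.0       # hard 0/1 threshold at 0

print(sigmoid(0.0), tanh(0.0), relu(-2.0), step(3.0))
```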
Why neural nets?
▪ Why not just use a single neuron? Why do we need a larger network?
▪ A single neuron (like logistic regression) only permits a linear decision boundary.
▪ Most real-world problems are considerably more complicated!
Feedforward Neural Network
Inputs x1, x2, x3 pass through two layers of sigmoid (σ) units to produce outputs ŷ1, ŷ2, ŷ3.
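A feedforward pass like the one sketched above is just repeated matrix-vector products with a sigmoid in between. The layer sizes and random weights here are arbitrary placeholders:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(x, layers):
    # Each layer is a (weights, bias) pair; apply sigmoid at every layer
    for W, b in layers:
        x = sigmoid(W @ x + b)
    return x

rng = np.random.default_rng(0)
layers = [
    (rng.normal(size=(4, 3)), rng.normal(size=4)),  # 3 inputs -> 4 hidden units
    (rng.normal(size=(3, 4)), rng.normal(size=3)),  # 4 hidden -> 3 outputs
]
y_hat = feedforward(np.array([0.5, -1.0, 2.0]), layers)
print(y_hat)  # three outputs, each squashed into (0, 1) by the sigmoid
```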
Convolutional Neural Nets
Primary Ideas behind Convolutional Neural Networks:
– Let the Neural Network learn which kernels are most useful
– Use same set of kernels across entire image (translation invariance)
– Reduces number of parameters and “variance” (from bias-variance point of view)
– Can Think of Kernels as “Local Feature Detectors”
Vertical line detector:
-1  1 -1
-1  1 -1
-1  1 -1

Horizontal line detector:
-1 -1 -1
 1  1  1
-1 -1 -1

Corner detector:
-1 -1 -1
-1  1  1
-1  1  1
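Applying the vertical-line detector above to a small image shows how a kernel responds to its feature; the 5x5 test image is made up for illustration:

```python
import numpy as np

# The vertical-line detector kernel from the slide
vertical = np.array([[-1, 1, -1],
                     [-1, 1, -1],
                     [-1, 1, -1]])

def conv2d(image, kernel):
    # Valid (no padding) 2D cross-correlation: slide the kernel over the image
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

img = np.zeros((5, 5))
img[:, 2] = 1.0                  # a vertical line down column 2
response = conv2d(img, vertical)
print(response)                  # +3 where the kernel is centered on the line, -3 beside it
```

The same kernel slides over the whole image, which is the weight sharing (translation invariance) described above.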
CNN for Digit Recognition
Source: http://cs231n.github.io/
Convolutional Neural Networks (CNN) for Image Recognition
Pooling: max-pool
▪ For each distinct patch, represent it by the maximum
▪ 2x2 max-pool shown below
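A 2x2 max-pool can be written in a few lines of NumPy; the input matrix is an invented example:

```python
import numpy as np

def max_pool_2x2(x):
    # Represent each distinct (non-overlapping) 2x2 patch by its maximum
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.array([[1, 3, 2, 4],
              [5, 6, 7, 8],
              [9, 2, 1, 0],
              [3, 4, 5, 6]])
print(max_pool_2x2(x))  # [[6 8]
                        #  [9 6]]
```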
LeNet-5
How many total weights in the network?
Conv1: 1*6*5*5 + 6 = 156
Conv3: 6*16*5*5 + 16 = 2416
FC1: 400*120 + 120 = 48120
FC2: 120*84 + 84 = 10164
FC3: 84*10 + 10 = 850
Total: = 61706
Less than a single FC layer with [1200x1200] weights!
Note that Convolutional Layers have relatively few weights.
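The counts above follow directly from the "weights per connection plus one bias per output" rule; a quick check in Python:

```python
def conv_params(in_ch, out_ch, k):
    # Weights shared across spatial positions, plus one bias per output channel
    return in_ch * out_ch * k * k + out_ch

def fc_params(n_in, n_out):
    # One weight per connection, plus one bias per output neuron
    return n_in * n_out + n_out

total = (conv_params(1, 6, 5)      # Conv1: 156
         + conv_params(6, 16, 5)   # Conv3: 2416
         + fc_params(400, 120)     # FC1:   48120
         + fc_params(120, 84)      # FC2:   10164
         + fc_params(84, 10))      # FC3:   850
print(total)  # 61706
```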
Differences Between CNN and Fully Connected Networks
Convolutional Neural Network
– Each neuron connected to a small set of nearby neurons in the previous layer
– Uses same set of weights for each neuron
– Ideal for spatial feature recognition, e.g. image recognition
– Cheaper on resources due to fewer connections
Fully Connected Neural Networks
– Each neuron is connected to every neuron in the previous layer
– Every connection has a separate weight
– Not optimal for detecting features
– Computationally intensive: heavy memory usage
Natural and man-made disasters create
havoc and grief. Lost and abandoned
pets/livestock only add to the emotional
toll.
How do you find your beloved dog after a
flood? What happens to your daughter’s
horse?
Our charter is to unite pets with their
families.
Animal ID Startup
We need your help creating a way to
identify animals. Initial product is
focused on cat/dog breed identification.
Your app will be used by rescuers and the
public to document found animals and to
search for lost pets.
Welcome aboard!
Your Job: Data Scientist
Artificial Intelligence Development Cycle
Data acquisition and
organization
Integrate trained models
with application code
Create models
Adjust models to meet
performance and accuracy
objectives
Intel® Deep Learning Deployment Toolkit Provides Deployment from Intel® Edge to Cloud
Deep Learning: Training vs. Inference
Training: lots of labeled data (human, bicycle, strawberry) passes forward through the model; the error between the prediction ("Bicycle"?) and the label ("Strawberry") propagates backward to update the model weights.
Inference: new data passes forward through the trained model weights to produce a prediction ("Bicycle"?).
Did you know? In most cases, training requires a very large data set and a deep neural network (many layers) to achieve the highest accuracy; accuracy grows with data set size.
The Intel® Deep Learning Deployment Toolkit provides deployment from Intel® edge to cloud.
Benefits of Intel® Distribution of OpenVINO™ toolkit
• Integrate deep learning: unleash CNN-based deep learning inference using a common API, 30+ pre-trained models, & computer vision algorithms; validated on more than 100 public/custom models.
• Speed development: reduce time using a library of optimized OpenCV* & OpenVX* functions, & 15+ samples; develop once, deploy for current & future Intel-based devices.
• Innovate & customize: use OpenCL™ kernels/tools to add your own unique code; customize layers without the overhead of frameworks.
• Accelerate performance: access Intel computer vision accelerators, speed code performance, with support for heterogeneous execution; maximize the power of Intel® processors: CPU, GPU/Intel® Processor Graphics, FPGA, VPU.
Deep learning revenue is estimated to grow from $655M in 2016 to $35B by 2025¹. ¹Tractica 2Q 2017
OpenVX and the OpenVX logo are trademarks of the Khronos Group Inc.
OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos
Choosing the "Right" Hardware
Power/performance efficiency varies:
▪ Running the right workload on the right piece of hardware gives higher efficiency
▪ Hardware acceleration is a must
▪ Heterogeneous computing?
Tradeoffs:
▪ Power/performance
▪ Price
▪ Software flexibility, portability
Vision processing efficiency rises roughly 1x → 10x → 100x as computation flexibility narrows, from CPU to GPU to vision DSPs/FPGA to dedicated hardware.
Deep Learning vs. Traditional Computer Vision
Traditional computer vision:
▪ Based on selection and connection of computational filters to abstract key features and correlate them to an object
▪ Works well with well-defined objects and controlled scenes
▪ Difficult to predict critical features in larger numbers of objects or varying scenes
Deep learning computer vision:
▪ Based on application of a large number of filters to an image to extract features
▪ Features in the object(s) are analyzed with the goal of associating each input image with an output node for each type of object
▪ Values assigned to an output node represent the probability that the image is the object associated with that node
OpenVINO™ toolkit has tools for an end-to-end vision pipeline
The application sits on top of three routes:
• API solution: the Intel® Deep Learning Deployment Toolkit (Model Optimizer + Inference Engine) with pre-trained optimized deep learning models, running on CPU, GPU, FPGA, or VPU
• Computer vision libraries: OpenCV* / OpenVX* (CPU, GPU)
• Direct coding solution: custom code (new filters/algorithms or optimization/fusing steps) in OpenCL™ C/C++ via the Intel® SDK for OpenCL™, plus the Intel® Media SDK (CPU, GPU)
All three sit on the Intel hardware abstraction layer.
IR = Intermediate Representation file; GPU = Intel CPU with integrated graphics processing unit/Intel® Processor Graphics; VPU = Intel® Movidius™ Vision Processing Unit
OpenVX and the OpenVX logo are trademarks of the Khronos Group Inc.
OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos
Application development with OpenVINO™ Toolkit
1. Train: train a DL model. Currently supports Caffe*, MXNet*, TensorFlow*.
2. Prepare/Optimize: the Model Optimizer converts, optimizes, and prepares the model for inference (device-agnostic, generic optimization), producing an IR file.
3. Inference: the Inference Engine is a lightweight API to use in applications for inference.
4. Optimize/Heterogeneous: the Inference Engine supports multiple devices for heterogeneous flows (device-level optimization): CPU (Intel® Xeon®/Intel® Core™/Intel Atom®, via MKL-DNN), GPU (via clDNN), FPGA (via DLA), and Intel® Movidius™ Myriad™ 2/X (via the Intel® Movidius™ API).
5. Extend: the Inference Engine supports extensibility and allows custom kernels for the various devices (C++ on CPU, OpenCL™ on GPU, OpenCL™/TBD on FPGA, TBD on VPU).
Azure ML → Edge flow
using Azure IoT Edge + Azure ONNX RT + OpenVINO Execution Provider
In the cloud, AzureML takes MSFT's pre-trained topologies & models or the user's custom topologies & models (ONNX, Caffe, TensorFlow, ...), converts them via the ONNX model converters into an ONNX model, and distributes it through the Azure Container Registry and Azure IoT Hub.
On the edge device (an OS running Azure IoT Edge), the ONNX Runtime with the OpenVINO Execution Provider runs the inference scripts on the OpenVINO IE libs, with local resource access to optimized DL libraries (MKL-DNN; clDNN and media libs) and device resource access to accelerators: CPU, GPU, FPGA (DLA), and Movidius (Myriad).
Intel components, MSFT components, and users' custom components each contribute to this stack.
The need for 'intelligence at the edge'!
Computer vision and AI at the edge:
"What are you? I am asking the 'cloud' if I should vacuum you too."
"I'll scratch you down to your motors, if you come any closer!"
Intel® Neural Compute Stick 2: featuring the Intel® Movidius™ Myriad™ X VPU
A self-sufficient, all-in-one processor that features the powerful Neural Compute Engine and 16 programmable SHAVE cores that deliver class-leading performance for deep neural network inference applications.
• Neural Compute Engine: an entirely new deep neural network (DNN) inferencing engine that offers flexible interconnect and ease of configuration for on-device DNNs and computer vision applications
• 16 SHAVE programmable cores: VLIW (DSP) programmable processors optimized for complex vision & imaging workloads
• CMX memory (2.5 MB, 450 GB/s bandwidth): homogeneous memory design for low power, ultra-low latency, sustained high performance, and locally stored data
• CPU cluster (RT RISC): RISC processors, RTOS schedulers, pipeline managers, sensor control frameworks
• System support functions operating on frames and tiles, CODEC, compression and security; plus CV acceleration, pixel processing, interfaces, LPDDR, and always-on (AON) blocks
Intel® Neural Compute Stick 2
High performance & low power for AI inference. More cores. More AI inference.
Powered by the Intel® Movidius™ Myriad™ X VPU, optimized by the Intel® Distribution of OpenVINO™ toolkit.
✓ Start quickly with plug-and-play simplicity
✓ Develop on common frameworks and out-of-box sample applications
✓ Prototype on any platform with a USB port
✓ Operate without cloud compute dependence
Boost productivity. Simplify prototyping. Discover efficiencies.
Order now from Mouser Electronics for $99 MSRP*: Where to buy
*MSRP is not a guarantee of final retail price. MSRP may be changed in the future based upon economic conditions.
Intel-optimized TensorFlow applies three kinds of optimizations:
1. Operator optimizations
2. Graph optimizations
3. System optimizations
Operator optimizations
In TensorFlow, the computation graph is a data-flow graph. Example: Weights and the input feed a MatMul node, whose output is combined with Bias in an Add node, followed by a ReLU node.
Operator optimizations
Replace default (Eigen) kernels with highly optimized kernels (using Intel® MKL-DNN). Intel® MKL-DNN has optimized a set of TensorFlow operations. The library is open source (https://github.com/intel/mkl-dnn) and downloaded automatically when building TensorFlow.

Forward             | Backward
Conv2D              | Conv2DGrad
ReLU, TanH, ELU     | ReLUGrad, TanHGrad, ELUGrad
MaxPooling          | MaxPoolingGrad
AvgPooling          | AvgPoolingGrad
BatchNorm           | BatchNormGrad
LRN                 | LRNGrad
MatMul, Concat      |
Graph optimizations: fusion
Before merge: Input and Filter feed Conv2D, whose output feeds BiasAdd together with Bias. After merge: Input, Filter, and Bias feed a single fused Conv2DWithBias node.
Graph optimizations: fusion
Before merge: Input and Filter feed Conv2D, whose output feeds ReLU. After merge: Input and Filter feed a single fused Conv2DWithRelu node.
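Fusion changes the cost of the graph, not its result. A NumPy sketch (with an invented 6x6 input) showing that a fused conv + bias + ReLU kernel matches the three separate ops:

```python
import numpy as np

def conv2d(x, w):
    # Valid 2D cross-correlation, analogous to a single Conv2D op
    kh, kw = w.shape
    out = np.empty((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

def conv2d_bias_relu(x, w, b):
    # "Fused" kernel: bias-add and ReLU applied in the same pass, so the
    # intermediate tensor is never written out and re-read between ops
    return np.maximum(conv2d(x, w) + b, 0.0)

rng = np.random.default_rng(1)
x, w, b = rng.normal(size=(6, 6)), rng.normal(size=(3, 3)), 0.1

unfused = np.maximum(conv2d(x, w) + b, 0.0)  # three separate graph nodes
fused = conv2d_bias_relu(x, w, b)            # one fused node
print(np.allclose(unfused, fused))           # identical results, fewer passes
```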
Graph optimizations: layout propagation
Converting to/from an optimized layout can be less expensive than operating on an unoptimized layout. All MKL-DNN operators use highly optimized layouts for TensorFlow tensors.
Initial graph: Input and Filter feed Conv2D, then ReLU, then Shape. After layout conversions: Input and Filter are each passed through a Convert node into the MKL-DNN layout before MklConv2D; its output is Converted back, Converted again into MklReLU, and finally Converted back to the TensorFlow layout before Shape.
Graph optimizations: layout propagation
Did you notice anything wrong with the previous graph? Problem: redundant conversions.
After layout propagation, the back-to-back Convert nodes between MklConv2D and MklReLU are removed; the tensor stays in the MKL-DNN layout between the two Mkl ops and is converted back to the TensorFlow layout only once, before Shape.
System optimizations: load balancing
TensorFlow graphs offer opportunities for parallel execution.
Threading model:
1. inter_op_parallelism_threads = max number of operators that can be executed in parallel
2. intra_op_parallelism_threads = max number of threads to use for executing an operator
3. OMP_NUM_THREADS = MKL-DNN equivalent of intra_op_parallelism_threads

tf.ConfigProto is used to set the inter_op_parallelism_threads and intra_op_parallelism_threads configurations of the Session object:

>>> config = tf.ConfigProto()
>>> config.intra_op_parallelism_threads = 56
>>> config.inter_op_parallelism_threads = 2
>>> tf.Session(config=config)

Performance guide: https://www.tensorflow.org/performance/performance_guide#tensorflow_with_intel_mkl_dnn
System optimizations: load balancing
Incorrect settings of the threading model parameters can lead to over- or under-subscription, leading to poor performance:

OMP: Error #34: System unable to allocate necessary resources for OMP thread:
OMP: System error #11: Resource temporarily unavailable
OMP: Hint: Try decreasing the value of OMP_NUM_THREADS.

Solution: set these parameters for your model manually, following the guidelines in the TensorFlow performance guide.
Setting the threading model correctly
We provide best settings for popular CNN models (https://ai.intel.com/tensorflow-optimizations-intel-xeon-scalable-processor). Example setting MKL variables with Python os.environ:

os.environ["KMP_BLOCKTIME"] = "1"
os.environ["KMP_AFFINITY"] = "granularity=fine,compact,1,0"
os.environ["KMP_SETTINGS"] = "0"
os.environ["OMP_NUM_THREADS"] = "56"

Performance guide: https://www.tensorflow.org/performance/performance_guide#tensorflow_with_intel_mkl_dnn
Summary: Convolutional Neural Networks with TensorFlow
• Getting Intel-optimized TensorFlow is easy.
• The TensorFlow performance guide is the best source of performance tips.
• Intel-optimized TensorFlow improves TensorFlow CPU performance by up to 14X.
• Stay tuned for updates: https://ai.intel.com/tensorflow
Leverage the advantages of Intel's end-to-end AI offerings
• Training:
• Take advantage of Intel® Xeon® Scalable processors for training deep neural networks
• Download and install Intel® Optimized Caffe*
• Download and install TensorFlow* with Intel's optimizations (pre-built wheels for Intel architecture)
• Inference:
• Download and install the Intel® Movidius™ Neural Compute Stick SDK
• Take advantage of AI courses and training available on Intel® Developer Zone
88

Microsoft Build 2019- Intel AI Workshop

  • 1.
  • 2.
    This document containsinformation on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps. Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at intel.com, or from the OEM or retailer. No computer system can be absolutely secure. Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete information about performance and benchmark results, visit http://www.intel.com/performance. Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction. Statements in this document that refer to Intel’s plans and expectations for the quarter, the year, and the future, are forward-looking statements that involve a number of risks and uncertainties. A detailed discussion of the factors that could affect Intel’s results and plans is included in Intel’s SEC filings, including the annual report on Form 10-K. The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. Performance estimates were obtained prior to implementation of recent software patches and firmware updates intended to address exploits referred to as "Spectre" and "Meltdown." Implementation of these updates may make these results inapplicable to your device or system. 
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document. Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced data are accurate. Intel, the Intel logo, Pentium, Celeron, Atom, Core, Xeon and others are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others. © 2018 Intel Corporation. Legalnotices&disclaimers
  • 3.
  • 4.
    4 Agenda • Intel® AIAcademy • Intel® AI Portfolio • Intel AI Use Cases • ML/DL Introduction • Training on Caffe*/Tensorflow* with Intel optimizations • Introduction to the Intel® OpenVINO™ Toolkit • Introduction to the Intel® Movidius™ Neural Compute Stick and SDK • Overview of Intel® Optimized Caffe* and Tensorflow* • Intel® AI DevCloud
  • 6.
    © 2019 IntelCorporation Theai journey Business imperative Intel AI
  • 7.
    AIIsthedrivingforce Foresight Predictive Analytics Forecast Prescriptive Analytics Act/adapt Cognitive Analytics Hindsight Descriptive Analytics insight Diagnostic Analytics WhyAInow? AnalyticsCurve 25GBper month Internet User 1 Datadeluge(2019)Insights Business Operational Security 50GBper day Smart Car 2 3TB per day Smart Hospital 2 40TBper day Airplane Data 2 1pBper day Smart Factory 2 50PBper day City Safety 2 1. Source: http://www.cisco.com/c/en/us/solutions/service-provider/vni-network-traffic-forecast/infographic.html 2. Source: https://www.cisco.com/c/dam/m/en_us/service-provider/ciscoknowledgenetwork/files/547_11_10-15-DocumentsCisco_GCI_Deck_2014-2019_for_CKN__10NOV2015_.pdf
  • 8.
  • 9.
    Consumer Health FinanceRetail Government Energy Transport Industrial Other Smart Assistants Chatbots Search Personalization Augmented Reality Robots Enhanced Diagnostics Drug Discovery Patient Care Research Sensory Aids Algorithmic Trading Fraud Detection Research Personal Finance Risk Mitigation Support Experience Marketing Merchandising Loyalty Supply Chain Security Defense Data Insights Safety & Security Resident Engagement Smarter Cities Oil & Gas Exploration Smart Grid Operational Improvement Conservation In-Vehicle Experience Automated Driving Aerospace Shipping Search & Rescue Factory Automation Predictive Maintenance Precision Agriculture Field Automation Advertising Education Gaming Professional & IT Services Telco/Media Sports Source: Intel forecast AIwilltransform
  • 10.
© 2019 Intel Corporation. The AI journey: business imperative – Intel AI
  • 11.
  • 12.
Proof of Concept: Image Recognition – Seismic Reflection Analysis. Client: a leading developer of software solutions for the global oil and gas industry. Challenge: automate identification of fault lines within seismic reflection data. Solution: built a proof of concept that is trained on seismic reflection data and can predict the probability of finding fault lines in previously unseen images; performs pixel-wise semantic segmentation of SEG-Y formatted data; model trained using supervised learning. Advantages: automation enables faster analysis of vast amounts of data; could identify potentially rewarding locations from subtle clues in the data.
  • 13.
Proof of Concept: Image Recognition – Oil Rig “Inspector Assist” System. Client: multinational oil and gas company. Challenge: the customer operates a number of offshore oil rigs and uses submersible vehicles to take video footage to ensure their infrastructure is healthy and safe. Since reviews of this footage are time consuming and prone to errors, a more efficient solution for detecting potential problems is needed. Solution: built models to detect and classify bolts according to level of corrosion (low to high). Advantages: video footage can be condensed to 10% of its original length by filtering out unimportant frames and highlighting potential problem areas, enabling inspectors to perform their jobs more efficiently.
  • 16.
© 2019 Intel Corporation. The AI journey: business imperative – Intel AI
  • 17.
Breaking barriers between AI theory and reality. Software: speed up development with open AI software (nGraph, OpenVINO™ toolkit, Nauta™, BigDL, Intel® MKL-DNN, ML libraries, Intel AI DevCloud). Hardware: deploy AI anywhere with unprecedented hardware choice (Intel CPU, GPU, accelerators). Community: simplify AI via our robust community (Intel AI Builders, Intel AI Developer Program). Choose any approach from analytics to deep learning; tame your data deluge with our data layer expertise; partner with Intel to accelerate your AI journey; scale with confidence on the platform for IT & cloud. www.intel.ai
  • 19.
Deploy AI anywhere with unprecedented hardware choice – from device to edge to multi-cloud, and/or add acceleration: dedicated media/vision, automated driving, dedicated DL training (NNP-L), dedicated DL inference (NNP-I), flexible acceleration (FPGA), and graphics, media & analytics acceleration (GPU). *FPGA: (1) first to market to accelerate evolving AI workloads; (2) AI plus other system-level workloads like AI + I/O ingest, networking, security, pre/post-processing, etc.; (3) low-latency memory-constrained workloads like RNN/LSTM. ¹GNA = Gaussian Neural Accelerator. All products, computer systems, dates, and figures are preliminary based on current expectations, and are subject to change without notice. Images are examples of intended applications but not an exhaustive list.
  • 21.
© 2019 Intel Corporation. Speed up development with open AI software. TOOLKITS (app developers) – deep learning deployment: Intel® Distribution of OpenVINO™ Toolkit¹ (deep learning inference deployment on CPU/GPU/FPGA/VPU for Caffe*, TensorFlow*, MXNet*, ONNX*, Kaldi*) and Nauta (Beta) (open source, scalable, and extensible distributed deep learning platform built on Kubernetes). LIBRARIES (data scientists) – deep learning frameworks optimized for CPU & more (status & installation guides; more framework optimizations underway, e.g. PaddlePaddle*, CNTK* & more); machine learning (ML): Python (Scikit-learn, Pandas, NumPy), R (Cart, Random Forest, e1071), distributed (MlLib on Spark, Mahout); analytics & ML: Intel® Distribution for Python* (Intel distribution optimized for machine learning) and Intel® Data Analytics Acceleration Library (including machine learning). KERNELS (library developers) – deep learning graph compiler: Intel® nGraph™ Compiler (Beta) (open source compiler for deep learning model computations optimized for multiple devices – CPU, GPU, NNP – from multiple frameworks – TF, MXNet, ONNX); deep learning: Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN) (open source DNN functions for CPU/integrated graphics). ¹An open source version is available at 01.org/openvinotoolkit. *Other names and brands may be claimed as the property of others. Developer personas shown represent the primary user base for each row, but are not mutually exclusive. All products, computer systems, dates, and figures are preliminary based on current expectations, and are subject to change without notice. Optimization Notice
  • 23.
Intel® AI Academy – for developers, students, instructors and startups. Learn: get smarter using online tutorials, webinars, student kits and support forums. Develop: get 4 weeks' FREE access to the Intel® AI DevCloud, use your existing Intel® Xeon® processor-based cluster, or use a public cloud service. Teach: educate others using available course materials, hands-on labs, and more. Share: showcase your innovation at industry & academic events and online via the Intel AI community forum. software.intel.com/ai
  • 24.
Learn more on DevMesh. Opportunities to share your projects as an Intel® Student Ambassador: ▪ Industry events via sponsored speakerships ▪ Student workshops ▪ Ambassador labs ▪ Intel® Developer Mesh
  • 25.
AI Builders: an ecosystem of 100+ AI partners. Horizontal: business intelligence & analytics, vision, conversational bots, AI tools & consulting, AI PaaS. Vertical: healthcare, financial services, retail, transportation, news/media & entertainment, agriculture, legal & HR, robotic process automation. Cross-vertical: OEMs, system integrators. Builders.intel.com/ai. Other names and brands may be claimed as the property of others.
  • 27.
  • 28.
What is machine learning? Applying algorithms to observed data and making predictions based on that data.
  • 29.
  • 30.
  • 31.
CLASSIFICATION: predict a label for an entity with a given set of features. Examples: spam prediction, sentiment analysis.
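To make the idea concrete, here is a minimal, hypothetical classification sketch (not from the deck): a nearest-centroid classifier that predicts a label from two toy "spam-like" features, number of links and number of exclamation marks.

```python
# Minimal nearest-centroid classifier (toy data, illustrative only).

def centroid(points):
    """Mean of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def train(labeled):
    """labeled: dict mapping label -> list of feature vectors."""
    return {label: centroid(pts) for label, pts in labeled.items()}

def predict(model, x):
    """Assign x to the label whose centroid is closest (squared Euclidean)."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(model, key=lambda label: dist2(model[label], x))

# Hypothetical features: (number of links, number of exclamation marks).
model = train({
    "spam":     [(8, 5), (9, 7), (7, 6)],
    "not_spam": [(0, 1), (1, 0), (2, 1)],
})
print(predict(model, (6, 5)))  # → spam
print(predict(model, (1, 1)))  # → not_spam
```

Real systems would use a library classifier (e.g. logistic regression), but the shape of the problem is the same: features in, label out.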
  • 32.
CLUSTERING: group entities with similar features. Example: market segmentation of gamers by age and play time in hours – no gamers, casual gamers, serious gamers.
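A minimal sketch of the segmentation idea, using 1-D k-means on hypothetical weekly play-time data (the deck's actual dataset and algorithm are not shown; this is Lloyd's algorithm with an assumed initialization):

```python
# 1-D k-means (Lloyd's algorithm) on toy play-time data.

def kmeans_1d(points, centers, iters=20):
    """Cluster scalars around `centers` (the initial guess)."""
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)), key=lambda j: abs(p - centers[j]))
            clusters[i].append(p)
        # Update step: each center moves to its cluster's mean.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

playtime = [0, 1, 2, 1, 25, 30, 28, 90, 95, 100]  # hypothetical hours/week
centers, clusters = kmeans_1d(playtime, centers=[0, 50, 100])
print(centers)  # roughly one center per segment: non / casual / serious gamers
```

The three recovered centers land near the three natural groups in the data, mirroring the no/casual/serious gamer segments on the slide.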
  • 33.
Minimum Mean Squared Error: fit a line to budget vs. box office (both in units of $10^8$) by solving
$$\min_{\beta_0,\beta_1} \frac{1}{m}\sum_{i=1}^{m}\left(\beta_0 + \beta_1 x_{\mathrm{obs}}^{(i)} - y_{\mathrm{obs}}^{(i)}\right)^2$$
  • 34.
Gradient Descent: start with a cost function $J(\beta)$, then gradually move towards the minimum (for a convex cost, the global minimum).
  • 35.
Gradient Descent with Linear Regression ▪ Each point can be iteratively calculated from the previous one, descending the surface $J(\beta_0, \beta_1)$:
$$\omega_{k+1} = \omega_k - \alpha\,\nabla\, \frac{1}{2}\sum_{i=1}^{m}\left(\beta_0 + \beta_1 x_{\mathrm{obs}}^{(i)} - y_{\mathrm{obs}}^{(i)}\right)^2$$
so $\omega_0 \rightarrow \omega_1 \rightarrow \omega_2 \rightarrow \omega_3 \rightarrow \dots$
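The update rule above can be run directly. This is a sketch with synthetic noiseless data and an assumed learning rate; the data is generated from y = 1 + 2x, so the parameters should converge to β₀ = 1, β₁ = 2:

```python
# Batch gradient descent on MSE for simple linear regression.

def gradient_step(b0, b1, xs, ys, alpha):
    m = len(xs)
    # Partial derivatives of J = (1/m) * sum((b0 + b1*x - y)^2)
    g0 = (2 / m) * sum(b0 + b1 * x - y for x, y in zip(xs, ys))
    g1 = (2 / m) * sum((b0 + b1 * x - y) * x for x, y in zip(xs, ys))
    return b0 - alpha * g0, b1 - alpha * g1

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]   # exactly y = 1 + 2x
b0, b1 = 0.0, 0.0
for _ in range(5000):
    b0, b1 = gradient_step(b0, b1, xs, ys, alpha=0.02)
print(round(b0, 3), round(b1, 3))  # → approximately 1.0 2.0
```

Each call to `gradient_step` is one ω → ω − α∇J move from the slide.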
  • 37.
Why Deep Learning – what is wrong with linear classifiers? XOR: the counter-example to all linear models; we need non-linear functions. Truth table: X1=0, X2=0 → y=0; X1=0, X2=1 → y=1; X1=1, X2=0 → y=1; X1=1, X2=1 → y=0. No single line can separate the positive and negative points. Source: https://medium.com/towards-data-science/introducing-deep-learning-and-neural-networks-deep-learning-for-rookies-1-bd68f9cf5883
  • 38.
We need layers – usually lots – with non-linear transformations (threshold to 0 or 1). XOR = (X1 AND NOT X2) OR (NOT X1 AND X2). The network: the two inputs feed a hidden unit with weights +1, +1 and threshold 1.5 (an AND gate); the output unit takes the inputs with weights +1, +1 and the hidden unit with weight −2, thresholded at 0.5. Example with X1=1, X2=0: hidden sum (1×1) + (0×1) = 1 < 1.5, so the hidden unit outputs 0; output sum (1×1) + (0×−2) + (0×1) = 1 > 0.5, so the network outputs 1.
  • 39.
The same network with X1=1, X2=1: hidden sum (1×1) + (1×1) = 2 > 1.5, so the hidden unit outputs 1; output sum (1×1) + (1×−2) + (1×1) = 0 < 0.5, so the network outputs 0 – exactly XOR.
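The two-layer XOR network from these slides can be written out in a few lines – hidden AND unit with threshold 1.5, output unit with weights +1, −2, +1 and threshold 0.5:

```python
# The slides' hand-built XOR network with hard-threshold activations.

def step(z, threshold):
    """Fire (1) iff the weighted sum exceeds the threshold."""
    return 1 if z > threshold else 0

def xor_net(x1, x2):
    h = step(x1 * 1 + x2 * 1, 1.5)                 # hidden unit: AND(x1, x2)
    return step(x1 * 1 + h * -2 + x2 * 1, 0.5)     # OR minus twice the AND

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, xor_net(x1, x2))  # → 0, 1, 1, 0
```

A single thresholded neuron cannot produce this truth table; the one hidden non-linearity is what makes XOR representable.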
  • 40.
“Deep learning is a set of algorithms in machine learning that attempt to model high-level abstractions in data by using architectures composed of multiple non-linear transformations.” – Wikipedia*. This is a brewing domain called deep learning. In the machine learning world, we use neural networks; the idea comes from biology, and each layer learns something.
  • 41.
Motivation for neural nets ▪ Use biology as inspiration for a mathematical model ▪ Get signals from previous neurons ▪ Generate signals (or not) according to inputs ▪ Pass signals on to next neurons ▪ By layering many neurons, we can create complex models
  • 42.
Each layer learns something: early layers learn low-level features, later layers learn object parts (faces, cars, elephants, chairs), and a final fully connected layer produces the prediction ("elephant") – layer 1 → layer 2 → … → layer N → fully connected layer → prediction.
  • 44.
Basic neuron visualization: inputs x1, x2, x3 with weights w1, w2, w3 plus a bias b (on a constant input of 1) are summed, z = x1·w1 + x2·w2 + x3·w3 + b, then passed through an activation function f(z).
  • 45.
Types of activation functions ▪ Sigmoid function – smooth transition in output between (0,1) ▪ Tanh function – smooth transition in output between (−1,1) ▪ ReLU function – f(x) = max(x,0) ▪ Step function – output jumps from 0 to 1 at a threshold
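The four activations above, written as plain scalar functions (a sketch – real frameworks provide vectorized versions of these):

```python
import math

def sigmoid(x):            # smooth, output in (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):               # smooth, output in (-1, 1)
    return math.tanh(x)

def relu(x):               # f(x) = max(x, 0)
    return max(x, 0.0)

def step(x):               # hard 0/1 threshold at zero
    return 1.0 if x > 0 else 0.0

print(sigmoid(0.0), tanh(0.0), relu(-3.0), step(2.0))  # → 0.5 0.0 0.0 1.0
```

Sigmoid and tanh are differentiable everywhere (needed for gradient descent); the step function is not, which is one reason modern networks train with smooth or piecewise-linear activations instead.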
  • 46.
Why neural nets? ▪ Why not just use a single neuron? Why do we need a larger network? ▪ A single neuron (like logistic regression) only permits a linear decision boundary. ▪ Most real-world problems are considerably more complicated!
  • 47.
  • 48.
Convolutional Neural Nets – primary ideas behind convolutional neural networks: – let the neural network learn which kernels are most useful – use the same set of kernels across the entire image (translation invariance) – reduces the number of parameters and “variance” (from a bias–variance point of view) – can think of kernels as “local feature detectors”. Examples: vertical line detector [[−1, 1, −1], [−1, 1, −1], [−1, 1, −1]]; horizontal line detector [[−1, −1, −1], [1, 1, 1], [−1, −1, −1]]; corner detector [[−1, −1, −1], [−1, 1, 1], [−1, 1, 1]]
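"Kernels as local feature detectors" can be demonstrated by sliding the vertical-line kernel over a tiny image (a sketch with valid padding and stride 1; the image is hypothetical):

```python
# Apply the slides' 3x3 vertical-line kernel to a small image.

VERTICAL = [[-1, 1, -1],
            [-1, 1, -1],
            [-1, 1, -1]]

def conv2d_valid(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            row.append(sum(kernel[a][b] * image[i + a][j + b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out

# A 5x5 image with a bright vertical line in column 2.
image = [[0, 0, 1, 0, 0] for _ in range(5)]
for row in conv2d_valid(image, VERTICAL):
    print(row)  # → [-3, 3, -3] on every row
```

The response is strongly positive exactly where the kernel's +1 column sits on the line and negative elsewhere – the kernel "detects" the vertical feature. A CNN learns such kernel values instead of hand-coding them.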
  • 49.
  • 50.
  • 51.
Pooling: max-pool ▪ For each distinct patch, represent it by its maximum ▪ A 2×2 max-pool replaces each 2×2 patch with a single value, halving each spatial dimension
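As a runnable sketch, 2×2 max-pooling with stride 2 on a hypothetical 4×4 feature map:

```python
# 2x2 max-pool, stride 2: each distinct 2x2 patch becomes its maximum.

def maxpool2x2(image):
    return [[max(image[i][j], image[i][j + 1],
                 image[i + 1][j], image[i + 1][j + 1])
             for j in range(0, len(image[0]) - 1, 2)]
            for i in range(0, len(image) - 1, 2)]

image = [[1, 3, 2, 1],
         [4, 6, 5, 0],
         [7, 2, 9, 8],
         [1, 0, 3, 4]]
print(maxpool2x2(image))  # → [[6, 5], [7, 9]]
```

The 4×4 input becomes a 2×2 output, keeping the strongest activation in each region while discarding its exact position.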
  • 52.
LeNet-5: how many total weights in the network? Conv1: 1×6×5×5 + 6 = 156. Conv3: 6×16×5×5 + 16 = 2,416. FC1: 400×120 + 120 = 48,120. FC2: 120×84 + 84 = 10,164. FC3: 84×10 + 10 = 850. Total: 61,706 – less than a single FC layer with 1200×1200 weights! Note that convolutional layers have relatively few weights.
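The slide's count can be reproduced as arithmetic: a conv layer has in_ch × out_ch × k × k weights plus out_ch biases, and an FC layer has n_in × n_out weights plus n_out biases.

```python
# Reproduce the LeNet-5 parameter count from the slide.

def conv_params(in_ch, out_ch, k):
    return in_ch * out_ch * k * k + out_ch

def fc_params(n_in, n_out):
    return n_in * n_out + n_out

layers = {
    "conv1": conv_params(1, 6, 5),    # 156
    "conv3": conv_params(6, 16, 5),   # 2416
    "fc1":   fc_params(400, 120),     # 48120
    "fc2":   fc_params(120, 84),      # 10164
    "fc3":   fc_params(84, 10),       # 850
}
print(sum(layers.values()))  # → 61706
```

Note that the two conv layers together contribute under 5% of the parameters; the FC layers dominate, which is the slide's point about weight sharing.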
  • 53.
Differences between CNNs and fully connected networks. Convolutional neural network: – each neuron is connected to a small set of nearby neurons in the previous layer – uses the same set of weights for each neuron – ideal for spatial feature recognition, e.g. image recognition – cheaper on resources due to fewer connections. Fully connected neural network: – each neuron is connected to every neuron in the previous layer – every connection has a separate weight – not optimal for detecting features – computationally intensive, with heavy memory usage.
  • 55.
Animal ID Startup. Natural and man-made disasters create havoc and grief, and lost and abandoned pets/livestock only add to the emotional toll. How do you find your beloved dog after a flood? What happens to your daughter's horse? Our charter is to unite pets with their families.
  • 56.
Your job: data scientist. We need your help creating a way to identify animals. The initial product is focused on cat/dog breed identification. Your app will be used by rescuers and the public to document found animals and to search for lost pets. Welcome aboard!
  • 59.
Artificial intelligence development cycle: acquire and organize data; create models; adjust models to meet performance and accuracy objectives; integrate trained models with application code. The Intel® Deep Learning Deployment Toolkit provides deployment from the Intel® edge to the cloud.
  • 60.
Deep learning: training vs. inference. Training: lots of labeled data (human, bicycle, strawberry); forward and backward passes adjust the model weights until the error on questions like “bicycle?” is small. Inference: a single forward pass through the trained model answers “bicycle?” on new data. Did you know? Training requires a very large data set and a deep neural network (many layers) to achieve the highest accuracy in most cases; accuracy grows with data set size.
  • 61.
Benefits of the Intel® Distribution of OpenVINO™ toolkit. Integrate deep learning: unleash CNN-based deep learning inference using a common API, 30+ pre-trained models, and computer vision algorithms; validated on more than 100 public/custom models. Speed development: reduce time using a library of optimized OpenCV* and OpenVX* functions and 15+ samples; develop once, deploy for current and future Intel-based devices. Innovate & customize: use OpenCL™ kernels/tools to add your own unique code; customize layers without the overhead of frameworks; access Intel computer vision accelerators. Accelerate performance: speed code performance with support for heterogeneous execution; maximize the power of Intel® processors – CPU, GPU/Intel® Processor Graphics, FPGA, VPU. Deep learning revenue is estimated to grow from $655M in 2016 to $35B by 2025¹. ¹Tractica 2Q 2017. OpenVX and the OpenVX logo are trademarks of the Khronos Group Inc. OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos. Copyright © 2019, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice
  • 62.
Choosing the “right” hardware: power/performance efficiency varies. ▪ Running the right workload on the right piece of hardware → higher efficiency ▪ Hardware acceleration is a must ▪ Heterogeneous computing? Tradeoffs: ▪ power/performance ▪ price ▪ software flexibility, portability. Vision processing efficiency spans roughly ×1 (CPU) to ×10 (GPU) to ×100 (vision DSPs, FPGAs, dedicated hardware), trading computation flexibility for power efficiency.
  • 63.
Deep learning vs. traditional computer vision. Traditional computer vision: ▪ based on selection and connection of computational filters to abstract key features and correlate them to an object ▪ works well with well-defined objects and controlled scenes ▪ difficult to predict critical features in larger numbers of objects or varying scenes. Deep learning computer vision: ▪ based on application of a large number of filters to an image to extract features ▪ features in the object(s) are analyzed with the goal of associating each input image with an output node for each type of object ▪ values are assigned to output nodes representing the probability that the image is the object associated with that node. The OpenVINO™ toolkit has tools for an end-to-end vision pipeline: pre-trained optimized deep learning models; the Intel® Deep Learning Deployment Toolkit (Model Optimizer producing an IR file, plus the Inference Engine API solution); computer vision libraries (OpenCV*/OpenVX*); and direct coding solutions (custom code for new filters/algorithms or optimization/fusing steps via OpenCL™ C/C++, the Intel® SDK for OpenCL™ Applications, and the Intel® Media SDK) – all on an Intel hardware abstraction layer spanning CPU, GPU, FPGA, and VPU. IR = Intermediate Representation file. GPU = Intel CPU with integrated graphics processing unit/Intel® Processor Graphics. VPU = Intel® Movidius™ Vision Processing Unit. OpenVX and the OpenVX logo are trademarks of the Khronos Group Inc. OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos.
  • 64.
Application development with the OpenVINO™ Toolkit. Train: train a DL model; currently supports Caffe*, MXNet*, and TensorFlow*. Prepare/Optimize: the Model Optimizer converts, optimizes, and prepares the model for inference (device-agnostic, generic optimization). Inference: the Inference Engine is a lightweight API to use in applications for inference, with device plugins – MKL-DNN for CPU (Intel® Xeon®/Intel® Core™/Intel Atom®), clDNN for GPU, DLA for FPGA, and the Intel® Movidius™ API for Myriad™ 2/X. Optimize/Heterogeneous: the Inference Engine supports multiple devices for heterogeneous flows (device-level optimization). Extend: the Inference Engine supports extensibility and allows custom kernels for various devices (C++ on CPU, OpenCL™ on GPU, OpenCL™/TBD on FPGA, TBD on Myriad).
  • 65.
Azure ML → edge flow using Azure IoT Edge + Azure ONNX Runtime + the OpenVINO Execution Provider. MSFT's pre-trained topologies & models, or users' custom topologies & models (ONNX, Caffe, TensorFlow, …), are converted to ONNX models and registered in Azure ML; the Azure Container Registry and Azure IoT Hub deliver them to the edge device, which runs an OS with Azure IoT Edge, the ONNX Runtime with the OpenVINO Execution Provider, OpenVINO Inference Engine libraries, and inference scripts. Local resource access goes to optimized DL libraries (MKL-DNN on CPU; clDNN and media libs on GPU), and device resource access goes to accelerators (DLA on FPGA; Myriad on Movidius). (Intel, MSFT, and users' custom components.)
  • 67.
The need for ‘intelligence at the edge’! “What are you? I am asking the ‘cloud’ if I should vacuum you too.” “I'll scratch you down to your motors if you come any closer!”
  • 68.
Computer vision and AI at the edge
  • 69.
Intel® Neural Compute Stick 2: featuring the Intel® Movidius™ Myriad™ X VPU – a self-sufficient, all-in-one processor with the powerful Neural Compute Engine and 16 programmable SHAVE cores that deliver class-leading performance for deep neural network inference applications. Neural Compute Engine: an entirely new deep neural network (DNN) inferencing engine that offers flexible interconnect and ease of configuration for on-device DNNs and computer vision applications. 16 SHAVE programmable cores: VLIW (DSP) programmable processors optimized for complex vision & imaging workloads. CMX memory (2.5 MB, up to 450 GB/s bandwidth): homogeneous memory design for low power, ultra-low latency, sustained high performance, and locally stored data. CPU cluster (RT RISC): RISC processors, RTOS schedulers, pipeline managers, sensor control frameworks. System support functions operate on frames, tiles, CODECs, compression and security; plus CV acceleration, pixel processing, interfaces, LPDDR, and always-on (AON) support.
  • 70.
Intel® Neural Compute Stick 2: high performance & low power for AI inference. More cores, more AI inference – powered by the Intel® Movidius™ Myriad™ X VPU and optimized by the Intel® Distribution of OpenVINO™ toolkit. ✓ Start quickly with plug-and-play simplicity ✓ Develop on common frameworks and out-of-box sample applications ✓ Prototype on any platform with a USB port ✓ Operate without cloud compute dependence. Boost productivity, simplify prototyping, discover efficiencies. Where to buy: order now from Mouser Electronics for $99 MSRP*. *MSRP is not a guarantee of final retail price. MSRP may be changed in the future based upon economic conditions.
  • 71.
  • 72.
  • 73.
1. Operator optimizations 2. Graph optimizations 3. System optimizations
  • 74.
Operator optimizations. In TensorFlow, the computation graph is a data-flow graph. Example: Weights and input feed a MatMul node, then a Bias Add, then ReLU.
  • 75.
Operator optimizations: replace default (Eigen) kernels with highly optimized kernels (using Intel® MKL-DNN). Intel® MKL-DNN provides optimized versions of a set of TensorFlow operations. The library is open source (https://github.com/intel/mkl-dnn) and downloaded automatically when building TensorFlow. Optimized forward ops: Conv2D; ReLU, TanH, ELU; MaxPooling; AvgPooling; BatchNorm; LRN; MatMul; Concat. Optimized backward ops: Conv2DGrad; ReLUGrad, TanHGrad, ELUGrad; MaxPoolingGrad; AvgPoolingGrad; BatchNormGrad; LRNGrad.
  • 76.
  • 77.
  • 78.
Graph optimizations: layout propagation. Converting to/from an optimized layout can be less expensive than operating on an unoptimized layout, and all MKL-DNN operators use highly optimized layouts for TensorFlow tensors. In the initial graph, Conv2D takes Input and Filter and feeds ReLU and Shape; after layout conversion, each MKL op (MklConv2D, MklReLU) is bracketed by Convert nodes that translate between TensorFlow's default layout and the MKL-DNN layout.
  • 79.
Graph optimizations: layout propagation. Did you notice anything wrong with the previous graph? Problem: redundant conversions. Layout propagation removes the back-to-back Convert nodes between consecutive MKL ops (MklConv2D → MklReLU), leaving conversions only at the graph's boundaries (e.g. before Shape).
  • 80.
System optimizations: load balancing. TensorFlow graphs offer opportunities for parallel execution. Threading model: 1. inter_op_parallelism_threads = max number of operators that can be executed in parallel 2. intra_op_parallelism_threads = max number of threads to use for executing an operator 3. OMP_NUM_THREADS = MKL-DNN equivalent of intra_op_parallelism_threads
  • 81.
Performance guide. tf.ConfigProto is used to set the inter_op_parallelism_threads and intra_op_parallelism_threads configurations of the Session object:

>>> config = tf.ConfigProto()
>>> config.intra_op_parallelism_threads = 56
>>> config.inter_op_parallelism_threads = 2
>>> tf.Session(config=config)

https://www.tensorflow.org/performance/performance_guide#tensorflow_with_intel_mkl_dnn
  • 82.
System optimizations: load balancing. Incorrect setting of threading-model parameters can lead to over- or under-subscription, leading to poor performance. Solution: set these parameters for your model manually, following the guidelines on the TensorFlow webpage. A typical over-subscription failure looks like: OMP: Error #34: System unable to allocate necessary resources for OMP thread: OMP: System error #11: Resource temporarily unavailable OMP: Hint: Try decreasing the value of OMP_NUM_THREADS.
  • 83.
Performance guide: setting the threading model correctly. We provide best settings for popular CNN models (https://ai.intel.com/tensorflow-optimizations-intel-xeon-scalable-processor). Example setting MKL variables with Python's os.environ:

os.environ["KMP_BLOCKTIME"] = "1"
os.environ["KMP_AFFINITY"] = "granularity=fine,compact,1,0"
os.environ["KMP_SETTINGS"] = "0"
os.environ["OMP_NUM_THREADS"] = "56"

https://www.tensorflow.org/performance/performance_guide#tensorflow_with_intel_mkl_dnn
  • 84.
  • 85.
Summary: convolutional neural networks with TensorFlow. Getting Intel-optimized TensorFlow is easy. The TensorFlow performance guide is the best source of performance tips. Intel-optimized TensorFlow improves TensorFlow CPU performance by up to 14×. Stay tuned for updates: https://ai.intel.com/tensorflow
  • 87.
Leverage the advantages of Intel's end-to-end AI offerings. Training: • take advantage of Intel® Xeon® Scalable processors for training deep neural networks • download and install Intel® Optimized Caffe* • download and install TensorFlow* with Intel's optimizations (pre-built wheels for Intel architecture). Inference: • download and install the Intel® Movidius™ Neural Compute Stick SDK. Take advantage of AI courses and training available on the Intel® Developer Zone.
  • 88.