Microsoft Build 2019- Intel AI Workshop

This document contains information on products, services and/or processes in development. All information provided here is subject to change without
notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps.
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at
intel.com, or from the OEM or retailer. No computer system can be absolutely secure.
Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual
performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete information about
performance and benchmark results, visit http://www.intel.com/performance.
Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified circumstances and configurations, may
affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction.
Statements in this document that refer to Intel’s plans and expectations for the quarter, the year, and the future, are forward-looking statements that
involve a number of risks and uncertainties. A detailed discussion of the factors that could affect Intel’s results and plans is included in Intel’s SEC filings,
including the annual report on Form 10-K.
The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications.
Current characterized errata are available on request.
Performance estimates were obtained prior to implementation of recent software patches and firmware updates intended to address exploits referred to as
"Spectre" and "Meltdown." Implementation of these updates may make these results inapplicable to your device or system.
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and
confirm whether referenced data are accurate.
Intel, the Intel logo, Pentium, Celeron, Atom, Core, Xeon and others are trademarks of Intel Corporation in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others.
© 2018 Intel Corporation.
Legalnotices&disclaimers

3
Questions?Askus!
MichaelZephyr
Developer Evangelist
michael.zephyr@intel.com
BenOdom
benjamin.j.odom@intel.com
MecitGungor
abdulmecit.gungor@intel.com

4
Agenda
• Intel® AI Academy
• Intel® AI Portfolio
• Intel AI Use Cases
• ML/DL Introduction
• Training on Caffe*/Tensorflow* with Intel optimizations
• Introduction to the Intel® OpenVINO™ Toolkit
• Introduction to the Intel® Movidius™ Neural Compute Stick and SDK
• Overview of Intel® Optimized Caffe* and Tensorflow*
• Intel® AI DevCloud

© 2019 Intel Corporation
Theai
journey
Business
imperative
Intel
AI

AIIsthedrivingforce
Foresight
Predictive
Analytics
Forecast
Prescriptive
Analytics
Act/adapt
Cognitive
Analytics
Hindsight
Descriptive
Analytics
insight
Diagnostic
Analytics
WhyAInow?
AnalyticsCurve
25GBper month
Internet User
1
Datadeluge(2019) Insights
Business
Operational
Security
50GBper day
Smart Car
2
3TB per day
Smart Hospital
2
40TBper day
Airplane Data
2
1pBper day
Smart Factory
2
50PBper day
City Safety
2
1. Source: http://www.cisco.com/c/en/us/solutions/service-provider/vni-network-traffic-forecast/infographic.html
2. Source: https://www.cisco.com/c/dam/m/en_us/service-provider/ciscoknowledgenetwork/files/547_11_10-15-DocumentsCisco_GCI_Deck_2014-2019_for_CKN__10NOV2015_.pdf

Deep
learning
Machine
learning
AI
Model
Weights
Forward
“Bicycle”?
??????
Lots of
tagged
data
Forward
Backward
“Strawberry”
“Bicycle”?
Error
Human Bicycle
Strawberry
Training:
Inference:
Many different approaches to AI
WhatisAI?

Consumer Health Finance Retail Government Energy Transport Industrial Other
Smart
Assistants
Chatbots
Search
Personalization
Augmented
Reality
Robots
Enhanced
Diagnostics
Drug
Discovery
Patient Care
Research
Sensory
Aids
Algorithmic
Trading
Fraud
Detection
Research
Personal
Finance
Risk Mitigation
Support
Experience
Marketing
Merchandising
Loyalty
Supply Chain
Security
Defense
Data
Insights
Safety &
Security
Resident
Engagement
Smarter
Cities
Oil & Gas
Exploration
Smart
Grid
Operational
Improvement
Conservation
In-Vehicle
Experience
Automated
Driving
Aerospace
Shipping
Search &
Rescue
Factory
Automation
Predictive
Maintenance
Precision
Agriculture
Field
Automation
Advertising
Education
Gaming
Professional &
IT Services
Telco/Media
Sports
Source: Intel forecast
AIwilltransform

TheAI
journey
2.Approach
3.values7.model
4.People
8.Deploy
1.Challenge
6.Data
5.Technology
Partner with Intel
to accelerate
your AI journey

Proof of Concept: Image Recognition
Seismic Reflection Analysis
Client:
A leading developer of software solutions to the global oil and gas industry.
Challenge:
Automate identification of fault lines within seismic reflection data.
Solution:
Built a proof of concept that is trained using seismic reflection data and can predict
the probability of finding fault lines on previously unseen images.
Performs pixel-wise semantic segmentation of SEG-Y formatted data
Model trained using supervised learning
Advantages:
Automation enables analysis of vast amounts of data faster
Could identify potentially rewarding locations from subtle clues in the data

Proof of Concept: Image Recognition
Oil Rig “Inspector Assist” SystemClient
Multinational oil and gas company
Challenge
The customer operates a number of offshore oil rigs,
and uses submersible vehicles to take video footage to
ensure their infrastructure is healthy and safe.
Since reviews of this footage are time consuming and
prone to errors, a more efficient solution for detecting
potential problems is needed.
Solution
Built models to detect and classify bolts according to
level of corrosion.
Advantages
Video footage can be condensed to 10% of its original length
by filtering out unimportant frames and highlight potential
problem areas, enabling inspectors to perform their jobs
more efficiently.
Level of Corrosion
Low High

Software
hardware
community
nGraph
OpenVINO™
toolkit
Nauta™
ML Libraries
Intel AI
Builders
Intel AI
Developer
Program
BreakingbarriersbetweenAITheoryandreality
Simplify AI
via our robust community
Choose any approach
from analytics to deep learning
Tame your data deluge
with our data layer expertise
Deploy AI anywhere
with unprecedented HW choice
Speed up development
with open AI software
Partner with Intel to accelerate your AI journey
Scale with confidence
on the platform for IT & cloud
Intel
GPU
*
*
*
*
*
Intel AI
DevCloud
BigDL
Intel®
MKL-DNN
www.intel.ai

Dedicated
Media/vision
Automated
Driving
Dedicated
DLTraining
Flexible
Acceleration
Dedicated
DLinference
Graphics,Media&
AnalyticsAcceleration
*FPGA: (1) First to market to accelerate evolving AI workloads (2) AI+other system level workloads like AI+I/O ingest, networking, security, pre/post-processing, etc (3) Low latency memory constrained workloads like RNN/LSTM
1GNA=Gaussian Neural Accelerator
All products, computer systems, dates, and figures are preliminary based on current expectations, and are subject to change without notice.
Images are examples of intended applications but not an exhaustive list.
device
Edge
Multi-cloud
NNP-L
NNP-I
GPU
And/OR
ADD
ACCELERATION
DeployAIanywhere
with unprecedented hardware choice

21
© 2019 Intel Corporation
1 An open source version is available at: 01.org/openvinotoolkit *Other names and brands may be claimed as the property of others.
Developer personas show above represent the primary user base for each row, but are not mutually-exclusive
All products, computer systems, dates, and figures are preliminary based on current expectations, and are subject to change without notice.
TOOLKITS
App
developers
libraries
Data
scientists
Kernels
Library
developers
DEEP LEARNING DEPLOYMENT
Intel® Distribution of OpenVINO™ Toolkit1 Nauta (Beta)
Deep learning inference deployment
on CPU/GPU/FPGA/VPU for
Caffe*, TensorFlow*, MXNet*, ONNX*, Kaldi*
Open source, scalable, and extensible
distributed deep learning platform
built on Kubernetes
DEEP LEARNING FRAMEWORKS
Optimized for CPU & more
Status & installation guides
More framework optimizations
underway (e.g. PaddlePaddle*,
CNTK* & more)
MACHINE LEARNING (ML)
Python R Distributed
• Scikit-
learn
• Pandas
• NumPy
• Cart
• Random
Forest
• e1071
• MlLib (on Spark)
• Mahout
ANALYTICS & ML
Intel®
Distribution
for Python*
Intel® Data
Analytics
Library
Intel distribution
optimized for
machine learning
Intel® Data Analytics
Acceleration Library
(incl machine learning)
DEEP LEARNING GRAPH COMPILER
Intel® nGraph™ Compiler (Beta)
Open source compiler for deep learning model
computations optimized for multiple devices (CPU, GPU,
NNP) from multiple frameworks (TF, MXNet, ONNX)
DEEP LEARNING
Intel® Math Kernel
Library for Deep
Neural Networks
(Intel® MKL-DNN)
Open source DNN functions for
CPU / integrated graphics
*
*
*
*
FOR
*
*
Speedupdevelopment
with open AI software
Optimization Notice

software.intel.com/ai
Get 4-weeks FREE access to
the Intel® AI DevCloud, use
your existing Intel® Xeon®
Processor-based cluster, or
use a public cloud service
Intel®AIacademy
For developers, students, instructors and startups
teach Share
Developlearn
Showcase your innovation
at industry & academic
events and online via the
Intel AI community forum
Get smarter using
online tutorials,
webinars, student kits
and support forums
Educate others using
available course
materials, hands-on
labs, and more

24
LearnMoreonDevMesh
Opportunities to
share your projects
as an Intel® Student
Ambassador
▪ Industry events via
sponsored speakerships
▪ Student Workshops
▪ Ambassador Labs
▪ Intel® Developer Mesh

AIbuilders:ecosystem
BUSINESS
INTELLIGENCE
&ANALYTCS
VISION CONVERSATIONALBOTS AITOOLS&CONSULTING AIPaaS
HEALTHCARE FINANCIAL
SERVICES
RETAIL TRANSPORTATION NEWS,MEDIA&
ENTERTAINMENT
AGRICULTURE LEGAL&HR ROBOTIC
PROCESS
AUTOMOATION
oem Systemintegrators
CROSSVERTICAL
VERTICAL
HORIZONTAL
Builders.intel.com/ai
Other names and brands may be claimed as the property of others.
100+AI Partners

27
Source: https://en.wikipedia.org/wiki/Data_science
Thedatascienceprocess

28
Applying Algorithms to observed data and make predictions based on data.
Whatismachinelearning?

29
Machinelearning2methods:Supervised&UnsupervisedLearning
Supervised:Wetrainthemodel.Wefeedthemodelwithcorrectanswers.Model
Learnsandfinallypredicts.
Wefeedthemodelwith“groundtruth”.
Unsupervised:Dataisgiventothemodel.Rightanswersarenotprovidedtothe
model.Themodelmakessenseofthedatagiventoit.
Hopefullyteachesyousomethingyouwerenotawareof

30
TypesofSupervisedandUnsupervisedlearning
Classification
Regression
Clustering
Recommendation
SUPERVISED UNSUPERVISED

31
CLASSIFICATION
Predict a label for an entity with a given set of features.
SPAM
prediction sentimentanalysis

3
2
MarketSegmentation
Playtime
inhours
Age
Causal
Gamers
No
Gamers
Serious
Gamers
10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90
CLUSTERING
Group entities with similar features

33
Minimum Mean Squared Error
0.0
1.0
2.0
1.0 2.0
Budget
BoxOffice
x108
x108
min
𝛽0,𝛽1
1
𝑚
෍
𝑖=1
𝑚
𝛽0 + 𝛽1 𝑥 𝑜𝑏𝑠
(𝑖)
− 𝑦𝑜𝑏𝑠
(𝑖)
2

34
Gradient Descent
Start with a cost function J(𝛽):
𝑱 𝜷
𝜷
Then gradually move towards the minimum.
Global Minimum

35
▪ Each point can be iteratively
calculated from the previous one
3
Gradient Descent with Linear Regression
𝐽 𝛽0, 𝛽1
𝛽1
𝛽0
𝜔0
𝜔1𝜔2 = 𝜔1 − 𝛼𝛻
1
2
෍
𝑖=1
𝑚
(𝑖)
(𝑖)
2
𝜔2
𝜔3 = 𝜔2 − 𝛼𝛻
1
2
෍
𝑖=1
𝑚
(𝑖)
(𝑖)
2 𝜔3

37
3
Why Deep Learning – What is wrong with Linear
Classifiers?
XOR
The counter
example to all
models
We need non-
linear functions
X1 X2
0 0 0
y
0 1 1
1 0 1
1 1 0
0
X1
X2
0
1
1
Source: https://medium.com/towards-data-science/introducing-deep-learning-and-neural-networks-deep-learning-for-rookies-1-bd68f9cf5883
+
+-
-

38
3
1.5 0.5
Input
Input
+1
+1
+1
+1
-2
Output
X1 X2
0 0 0
y
0 1 1
1 0 1
1 1 0
We Need Layers Usually Lots with Non-linear
Transformations
Threshold to 0 or 1
XOR = (X1 and not X2) OR (Not X1 and X2)
1
0
1 x 1
0 x 1
1 x 1
0 x 1
1 < 1.5
0 x -2
(1 x 1) + (0 x 1) < 1.5 = 0
( 1x1) + (0x-2) + (0x1)= 1 > 0.5 =1

39
1.5 0.5
Input
Input
+1
+1
+1
+1
-2
Output
X1 X2
0 0 0
y
0 1 1
1 0 1
1 1 0
We Need Layers Usually Lots with Non-linear
Transformations
Threshold to 0 or 1
XOR = (X1 and not X2) OR (Not X1 and X2)
1
1
1 x 1
1 x 1
1 x 1
1 x 1
2 > 1.5
1 x -2
(1 x 1) + (1 x 1) = 2 > 1.5
(1x1) + (1x -2) + (1x1) = 0 < .5 =0

40
“Deep learning is a set of algorithms in
machine learning that attempt to model
high-level abstractions in data by using
architectures composed of multiple
non-linear transformations.”
- Wikipedia*
4
This is a brewing domain called Deep Learning
In the machine learning world, we use neural networks. The idea comes from biology.
Each layer learns something.

41
1.5 0.5
Input
Input
+1
+1
+1
+1
-2 Output
≈
Motivation for Neural Nets
▪ Use biology as inspiration for mathematical model
▪ Get signals from previous neurons
▪ Generate signals (or not) according to inputs
▪ Pass signals on to next neurons≈
▪ By layering many neurons, can create complex model

42
4
Each Layer Learns Something
Elephant
Faces
Cars
Elephants
Chairs
Fully
Connected
layer
PredictionLayer 1 Layer 2 Layer N

b
w3
Activation
Function
x1
x2
x3
w1
w2
z = x1w1+ x2w2+ x3w3+b
f(z)
1
BasicNeuronVisualization
44

45
▪ Sigmoid function
– Smooth transition in output between (0,1)
▪ Tanh function
– Smooth transition in output between (-1,1)
▪ ReLU function
– f(x) = max(x,0)
▪ Step function
– f(x) = (0,1)
TypesofActivationFunctions

46
WhyNeuralNets?
▪ Why not just use a single neuron? Why do we need a larger network?
▪ A single neuron (like logistic regression) only permits a linear decision boundary.
▪ Most real-world problems are considerably more complicated!

FeedforwardNeuralNetwork
𝑥1
𝑥2
𝑥3
𝜎
𝜎
𝜎
𝜎
𝜎
𝜎
𝜎
𝜎
ො𝑦1
ො𝑦2
ො𝑦3
47

48
Convolutional Neural Nets
Primary Ideas behind Convolutional Neural Networks:
– Let the Neural Network learn which kernels are most useful
– Use same set of kernels across entire image (translation invariance)
– Reduces number of parameters and “variance” (from bias-variance point of view)
– Can Think of Kernels as “Local Feature Detectors”
Vertical Line Detector
-1 1 -1
-1 1 -1
-1 1 -1
Horizontal Line Detector
-1 -1 -1
1 1 1
-1 -1 -1
Corner Detector
-1 -1 -1
-1 1 1
-1 1 1

Source: http://cs231n.github.io/
ConvolutionalNeuralNetworks(CNN)forImageRecognition
50

Pooling:Max-pool
▪ For each distinct patch, represent it by the maximum
▪ 2x2 Max-Pool shown below
51

LeNet-5
How many total weights in the network?
Conv1: 1*6*5*5 + 6 = 156
Conv3: 6*16*5*5 + 16 = 2416
FC1: 400*120 + 120 = 48120
FC2: 120*84 + 84 = 10164
FC3: 84*10 + 10 = 850
Total: = 61706
Less than a single FC layer with [1200x1200] weights!
Note that Convolutional Layers have relatively few weights.
52

53
5
Convolutional Neural Network
– Each neuron connected to a small set of
nearby neurons in the previous layer
– Uses same set of weights for each neuron
– Ideal for spatial feature recognition, Ex: Image
recognition
– Cheaper on resources due to fewer
connections
Fully Connected Neural Networks
– Each neuron is connected to every neuron in the
previous layer
– Every connection has a separate weight
– Not optimal for detecting features
– Computationally intensive – heavy memory usage
Differences Between CNN and Fully Connected Networks

Natural and man-made disasters create
havoc and grief. Lost and abandoned
pets/livestock only add to the emotional
toll.
How do you find your beloved dog after a
flood? What happens to your daughter’s
horse?
Our charter is to unite pets with their
families.
Animal ID Startup

We need your help creating a way to
identify animals. Initial product is
focused on cat/dog breed identification.
Your app will be used by rescuers and the
public to document found animals and to
search for lost pets.
Welcome aboard!
YourJob:DataScientist

59
ArtificialIntelligenceDevelopmentCycle
Data aquisition and
organization
Integrate trained models
with application code
Create models
Adjust models to meet
performance and accuracy
objectives
Intel® Deep Learning Deployment Toolkit Provides Deployment from Intel® Edge to Cloud

Lots of
Labeled
Data!
Training
Inference
Forward
Backward
Model Weights
Forward
“Bicycle”?
“Strawberry”
“Bicycle”?
Error
Human
Bicycle
Strawberry
??????
Data Set Size
Accuracy
Did You Know?
Training requires a very large
data set and deep neural
network (many layers) to
achieve the highest accuracy in
most cases
DeepLearning:Trainingvs.Inference
60

Copyright © 2019, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Optimization NoticeOptimization Notice
Unleash CNN-based deep learning inference
using a common API, 30+ pre-trained models,
& computer vision algorithms. Validated
on more than 100 public/custom models.
61
Benefits of Intel® Distribution of OpenVINO™ toolkit
Reduce time using a library of optimized OpenCV*
& OpenVX* functions, & 15+ samples.
Develop once, deploy for current
& future Intel-based devices.
Use OpenCL™ kernels/tools to add your own
unique code. Customize layers without the
overhead of frameworks.
Access Intel computer vision accelerators.
Speed code performance.
Supports heterogeneous execution.
AcceleratePerformance
OpenVX and the OpenVX logo are trademarks of the Khronos Group Inc.
OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos
IntegrateDeeplearning
Maximize the Power of Intel® Processors: CPU, GPU/Intel® Processor Graphics, FPGA,VPU
speeddevelopment Innovate&customize
Deep learning revenue is estimated to grow from $655M in 2016 to $35B by 2025¹.
1Tractica 2Q 2017

62
Choosingthe“Right”Hardware
Power/Performance Efficiency Varies
▪ Running the right workload on the
right piece of hardware → higher
efficiency
▪ Hardware acceleration is a must
▪ Heterogeneous computing?
Tradeoffs
▪ Power/performance
▪ Price
▪ Software flexibility, portability
PowerEfficiency
Computation Flexibility
Dedicated
Hardware
GPU
CPU
X1
X10
X100 Vision Processing
Efficiency
Vision DSPs
FPGA

63
▪ Based on selection and connections of computational filters to
abstract key features and correlating them to an object.
▪ Works well with well defined objects and controlled scene.
▪ Difficult to predict critical features in larger number of objects or
varying scenes.
Traditional Computer Vision
▪ Based on application of a large number of filters to an image to
extract features.
▪ Features in the object(s) are analyzed with the goal of associating
each input image with an output node for each type of object.
▪ Values are assigned to output node representing the probability
that the image is the object associated with the output node.
Deep Learning Computer Vision
Pre-trained
Optimized Deep
Learning Models
OpenVINO™ toolkit
Intel® Deep Learning Deployment Toolkit
Model
Optimizer
Inference
Engine
API Solution
Computer Vision
Libraries
OpenCV*/OpenVX*
OpenCV* OpenVX*
Direct Coding Solution
Custom Code
new filters/algorithms or
optimizations/fusing steps
OpenCL™ C/C++
Intel® SDK for
OpenCL™
Application
Intel®
Media SDK
API Solution
CPU GPU FPGA VPU CPU GPU CPU GPU
Intel
Hardware
Abstraction
Layer
OpenVX and the OpenVX logo are trademarks of the Khronos Group Inc.
OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos
IR = Intermediate Representation File GPU = Intel CPU with integrated graphics processing unit/Intel® Processor Graphics VPU = Intel® Movidius™ Vision Processing Unit
Deep Learning vs. Traditional Computer Vision
OpenVINO™ toolkit has tools for an end-to-end vision pipeline
IR
File

Optimize/
Heterogeneous
Inference engine
supports multiple
devices for
heterogeneous flows.
(device-level
optimization)
Prepare
Optimize
Model optimizer:
▪ Converting
▪ Optimizing
▪ Preparing to
inference
(device agnostic,
generic optimization)
Inference
Inference engine
lightweight API
to use in
applications for
inference.
MKL-
DNN
cl-DNN
CPU: Intel®
Xeon®/Intel®
Core™/Intel Atom®
GPU
FPGA
Myriad™ 2/X
DLA
Intel®
Movidius™
API
Train
Train a DL model.
Currently supports:
▪ Caffe*
▪ Mxnet*
▪ TensorFlow*
Extend
Inference engine
supports
extensibility
and allows
custom kernels
for various
devices.
Extensibility
C++
Extensibility
OpenCL™
Extensibility
OpenCL™/TB
D
Extensibility
TBD
ApplicationdevelopmentwithOpenVINO™Toolkit
64

AzureML→EdgeFlow
usingAZUREIOTEdge+AZUREONNXRT+OpenVINOExecutionProvider
MSFT’s pre-trained
topologies & models
User’s custom
topologies &
models
ONNX, CAFFE, TENSORFLOW, …
AzureML
Intel component
MSFT components
Users’ custom
components
OS with Azure IoT Edge
OpenVINO IE Libs
Inference Scripts
CPU
MKL
DNN
CLDNN,
Media libs
DLA Myriad
Localresourceaccessto
optimizedDLlibraries
Deviceresourceaccess
toaccelerators
GPU FPGA Movidius
Azure
Container
Registry
Azure IOT
Hub
ONNX Runtime
OpenVINO EP
ONNX
Model
ONNX Model Converters
Edge

Theneedfor‘intelligenceattheedge’!
What are you? I am asking
the ‘cloud’ if I should
vacuum you too.
I’ll scratch you down to
your motors, if you
come any closer!

Computer Vision and AI at the edge

Intel® Neural Compute Stick 2 69
CMX (2.5 MB to 450 GB/s Bandwidth)
Neural
Compute
Engine
CV
Accelerat-
ion
Pixel
Processin
g
System Support
Functions
Interfaces
16 SHAVE
Programmable Cores
CPU
Cluster
RT
RISC
LPDDR
AON
Intel®NeuralComputestick2:
Featuringtheintel®Movidius™myriad™xvpu
System support
functions operate
frames, tiles, CODEC,
compression and
security
Homogeneous memory
design for low-power,
UL latency, sustained
High Performance, and
locally stored data
VLIW (DSP)
programmable
processors are
optimized for complex
vision & imaging
workloads
An entirely new deep
neural network (DNN)
inferencing engine that
offers flexible
interconnect and ease of
configuration for on-
device DNNs and
computer vision
applications
RISC Processors, RTOS
Schedulers, Pipeline
Managers, Sensor
Control Frameworks
A self-sufficient, all-in-one processor that features the powerful Neural Compute Engine and 16 programmable SHAVE cores
that deliver class-leading performance for deep neural network inference applications.

Intel® Neural Compute Stick 2 70
HighPerformance&LowPowerforAIInference
Intel®neuralcomputestick2
Order now from Mouser Electronics
for $99 MSRP*: Where to buy
*MSRP is not a guarantee of final retail price. MSRP may be changed in the future based upon economic conditions.
+
Intel®
Movidius™
Myriad™ X
VPU
Intel® Distribution of
OpenVINO™ toolkit
Optimized by
Powered by
MORE CORES. MORE AI INFERENCE.
✓ Start quickly with plug-and-play
simplicity
✓ Develop on common frameworks
and out-of-box sample applications
✓ Prototype on any platform with a
USB port
✓ Operate without cloud compute
dependence
Boost
productivity
Simplify
prototyping
Discover
efficiencies

1. Operator optimizations
2. Graph optimizations
3. System optimizations

Operatoroptimizations
In TensorFlow, computation graph is a data-flow graph.
MatMul
Examples Weights
Bias
Add
ReLU

Operatoroptimizations
Replace default (Eigen) kernels by highly-optimized kernels (using Intel® MKL-
DNN)
Intel® MKL-DNN has optimized a set of TensorFlow operations.
Library is open-source (https://github.com/intel/mkl-dnn) and downloaded
automatically when building TensorFlow.
Forward Backward
Conv2D Conv2DGrad
Relu, TanH, ELU ReLUGrad, TanHGrad,
ELUGrad
MaxPooling MaxPoolingGrad
AvgPooling AvgPoolingGrad
BatchNorm BatchNormGrad
LRN LRNGrad
MatMul, Concat

Graphoptimizations:fusion
Conv2D
BiasAdd
Input Filter
Bias
Conv2DWithBias
Input Filter Bias
Before Merge After Merge

Graphoptimizations:fusion
Conv2D
ReLU
Input Filter
Conv2DWithRelu
Input Filter
Before Merge After Merge

Graphoptimizations:layoutpropagation
Converting to/from optimized layout can be less expensive than operating on un-
optimized layout.
All MKL-DNN operators use highly-optimized layouts for TensorFlow tensors.
Conv2D
ReLU
Input Filter
Shape
MklConv2D
Input Filter
Convert
Convert Convert
MklReLU
Convert
Shape
Convert
Initial Graph After Layout Conversions

Graphoptimizations:layoutpropagation
Did you notice anything wrong with previous graph?
Problem: redundant conversions
MklConv2D
Input Filter
Convert
Convert Convert
MklReLU
Convert
Shape
Convert
MklConv2D
Input Filter
Convert Convert
MklReLU
Convert
Shape
After Layout Conversion After Layout Propagation

TensorFlow graphs offer opportunities
for parallel execution.
Threading model
1. inter_op_parallelism_threads =
max number of operators that
can be executed in parallel
2. intra_op_parallelism_threads =
max number of threads to use
for executing an operator
3. OMP_NUM_THREADS = MKL-DNN
equivalent of
intra_op_parallelism_threads
Systemoptimizations:loadbalancing
MklConv2D
Input Filter
Convert Convert
MklReLU
Convert
Shape

>>> config = tf.ConfigProto()
>>> config.intra_op_parallelism_threads = 56
>>> config.inter_op_parallelism_threads = 2
>>> tf.Session(config=config)
tf.ConfigProto is used to set the inter_op_parallelism_threads and
intra_op_parallelism_threads configurations of the Session object.
https://www.tensorflow.org/performance/performance_guide#tensorflow_with_intel_mkl_dnn
performanceGUIDE

Systemoptimizations:loadbalancing
Incorrect setting of threading model parameters can lead to over-
or under-subscription, leading to poor performance.
Solution:
Set these parameters for your model manually.
Guidelines on TensorFlow webpage
OMP: Error #34: System unable
to allocate necessary resources
for OMP thread:
OMP: System error #11: Resource
temporarily unavailable
OMP: Hint: Try decreasing the
value of OMP_NUM_THREADS.

performanceGUIDE
Setting the threading model correctly
We provide best settings for popular CNN models. (https://ai.intel.com/tensorflow-optimizations-
intel-xeon-scalable-processor)
os.environ["KMP_BLOCKTIME"] = "1"
os.environ["KMP_AFFINITY"] = "granularity=fine,compact,1,0"
os.environ["KMP_SETTINGS"] = "0"
os.environ["OMP_NUM_THREADS"] = “56"
https://www.tensorflow.org/performance/performance_
guide#tensorflow_with_intel_mkl_dnn
Example setting MKL variables with python os.environ :

performanceGUIDE
https://www.tensorflow.org/performance/performance_guide#tensorflow_with_intel_mkl_dnn

85
Summary
Convolutional Neural Network with TensorFlow
Getting Intel-optimized TensorFlow is easy.
TensorFlow performance guide is the best source on performance tips.
Intel-optimized TensorFlow improves TensorFlow CPU performance by up to 14X.
Stay tuned for updates - https://ai.intel.com/tensorflow

87
LeveragetheadvantagesofIntel’send-to-endAIofferings
• Training:
• Take advantage of Intel® Xeon™ Scalable Processors for training Deep Neural Networks
• Download and Install Intel® Optimized Caffe*
• Download and install Tensorflow* with Intel’s optimizations
• Pre-built wheels for Intel Architecture
• Inference
• Download and Install the Intel® Movidius™ Neural Compute Stick SDK
• Take advantage of AI courses and training available on Intel® Developer Zone

Microsoft Build 2019- Intel AI Workshop

Recommended

Recommended

More Related Content

What's hot

What's hot (17)

Similar to Microsoft Build 2019- Intel AI Workshop

Similar to Microsoft Build 2019- Intel AI Workshop (20)

More from Intel® Software

More from Intel® Software (20)

Recently uploaded

Recently uploaded (20)

Microsoft Build 2019- Intel AI Workshop