The deep learning tour - Q1 2017
Eran Shlomo, IPP tech lead, Haifa
eran.shlomo@intel.com, eran@dataloop.ai
About me
Haifa IoT Ignition Lab and IPP (Intel Ingenuity Partnership Program) tech lead.
Intel Perceptual Computing.
Compute, cloud, and embedded expert.
Maker and entrepreneur.
Focus on data science and machine learning in recent years.
Soon to work on dataloop.ai.
Agenda
What is deep learning?
Why now?
Different network topologies and their usage
The tools race
The processors (HW) race
Buzzwords alignment attempt
[Diagram: nested circles – AI ⊃ machine learning ⊃ supervised learning ⊃ deep learning]
• AI: machine reasoning
• Machine learning: automated tasks
• Supervised learning: trained based on data
• Deep learning: neural networks
Classic programming: input + logic → output. Machine learning: input + output → logic.
Deep learning – basic anatomy
• Data driven
• Training a model
• Input, output, and hidden neurons
[Diagram: input layer → hidden layer(s) → output layer]
Deep learning → many hidden (deep) layers
The essence of deep learning
[Diagram: inputs X_i feed a hidden layer through weights W_ij^(1), which feeds outputs Y_i through weights W_ij^(2); b (bias) is omitted in the drawing.]
Y = f(X) = WX + b
A deep network is essentially a function we train to detect some pattern.
Data is becoming the fuel behind new SW.
BK (Brian Krzanich, Intel CEO) – “Data is the new oil”
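To make "a deep network is essentially a function" concrete, here is a minimal forward pass of a two-layer network in numpy. The layer sizes and the ReLU/softmax choices are illustrative assumptions, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: 4 inputs, 8 hidden units, 3 output classes.
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)   # first-layer weights/bias
W2, b2 = rng.normal(size=(3, 8)), np.zeros(3)   # second-layer weights/bias

def f(x):
    """Y = f(X): matrix multiplies plus nonlinearities."""
    h = np.maximum(0, W1 @ x + b1)        # hidden layer (ReLU)
    z = W2 @ h + b2                       # output layer (logits)
    return np.exp(z) / np.exp(z).sum()    # softmax over classes

y = f(rng.normal(size=4))
print(y, y.sum())  # a probability vector that sums to 1
```

Training adjusts W1, b1, W2, b2 so that f maps inputs to the desired pattern.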
Why now?
ILSVRC top-5 error (%), 2010-2016:
• 2010: 28.2 (shallow)
• 2011: 25.8 (shallow)
• 2012: 16.4 (AlexNet, 8 layers)
• 2013: 11.7
• 2014: 6.7 (22 layers)
• 2015: 3.57 (152 layers)
• 2016: 2.99 (ensemble)
And: data.
Neural networks – background and inspiration
It is pretty common to compare neural networks to how our brain works:
• It couples well with the term AI.
• There is some sense in it, as many studies show, yet we are still far from really understanding how the brain works.
[Diagram: inputs X1..Xn, weights W1..Wn, one neuron computing f(Σ_{k=0}^{n} W_k X_k)]
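A minimal sketch of that single neuron in Python; the sigmoid choice for f is an assumption, as the slide leaves the activation unspecified:

```python
import numpy as np

def neuron(x, w, f=lambda z: 1 / (1 + np.exp(-z))):
    """One neuron: f(sum_k w_k * x_k), with a sigmoid as the default f."""
    return f(np.dot(w, x))

print(neuron(np.array([1.0, 2.0, 3.0]), np.array([0.5, -0.25, 0.1])))
```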
Network topologies
• There are many network topologies.
• The basic principles apply to most:
  • supervised training
  • hidden units
  • backpropagation training
• Training on data generates a model, later used for inference on unseen data:
  • minimize a cost function (see the sketch below)
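To make "minimize a cost function" concrete, here is a hedged sketch of gradient descent on a single linear neuron with a squared-error cost; the toy data, learning rate, and step count are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))                 # 100 samples, 3 features
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=100)   # noisy targets

w, lr = np.zeros(3), 0.1
for step in range(200):
    pred = X @ w                              # forward pass
    grad = 2 * X.T @ (pred - y) / len(y)      # gradient of mean squared error
    w -= lr * grad                            # step down the cost surface
print(w)  # converges toward true_w as the cost is minimized
```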
Some basic intuition
Model capacity → number of parameters.
Generally HW (compute/memory) limits the capacity.
From the paper: "An Analysis of Deep Neural Network Models for Practical Applications".
[Diagram: a cycle – more compute & data → bigger model → higher accuracy]
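A quick way to see "capacity = number of parameters": count the weights and biases in a stack of fully connected layers. The layer sizes below are an illustrative assumption:

```python
def fc_param_count(layer_sizes):
    """Weights plus biases across consecutive fully connected layers."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# e.g. a 784-300-100-10 network (sizes chosen for illustration)
print(fc_param_count([784, 300, 100, 10]))  # 266,610 parameters
```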
Model fit scenarios
[Three fit plots over the same data (x = 0..15, y = 0..160):]
• Good model
• Underfit / high bias
• Overfit / high variance
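The three scenarios can be reproduced with polynomial fits of different degree; the data and degree choices below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 15, 20)
y = x**2 + 5 * rng.normal(size=x.size)        # quadratic trend plus noise

for degree, label in [(1, "underfit / high bias"),
                      (2, "good model"),
                      (15, "overfit / high variance")]:
    coeffs = np.polyfit(x, y, degree)          # least-squares polynomial fit
    resid = y - np.polyval(coeffs, x)
    print(f"degree {degree:2d} ({label}): train error {np.mean(resid**2):.1f}")
# The degree-15 fit has the lowest *training* error precisely because it
# models the noise; on fresh data it would do worst.
```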
Training a model → bias/variance “games”
We can look at our model error as follows:
[Diagram: total error = model error (bias) + noise being modeled (variance), plotted against capacity]
Our error usually comes from a combination of these two. These are all equivalent:
• High variance = modeling noise = not enough data = model too big = overfit
• High bias = model too simple = underfit
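A hedged numeric illustration of the trade-off, averaging many refits over resampled noisy data; the setup is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0, 1, 30)
truth = np.sin(2 * np.pi * x)                 # the pattern we want to learn

for degree in (1, 3, 12):                     # simple -> flexible models
    fits = []
    for _ in range(200):                      # many noisy training sets
        y = truth + 0.3 * rng.normal(size=x.size)
        fits.append(np.polyval(np.polyfit(x, y, degree), x))
    fits = np.array(fits)
    bias2 = np.mean((fits.mean(axis=0) - truth) ** 2)
    var = fits.var(axis=0).mean()
    print(f"degree {degree:2d}: bias^2 {bias2:.3f}, variance {var:.3f}")
# Low degree: high bias, low variance. High degree: the reverse.
```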
Basic network types
Fully connected networks
A very basic/generic network: full connectivity between nodes.
Used as a building block in more complex topologies.
High-level task: maps features into classes.
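A minimal fully connected classifier, sketched here with Keras (tf.keras); the layer sizes and the 784-feature input are illustrative assumptions:

```python
from tensorflow import keras

# Features in, class probabilities out: "maps features into classes".
model = keras.Sequential([
    keras.layers.Dense(128, activation="relu", input_shape=(784,)),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="sgd", loss="categorical_crossentropy")
model.summary()  # parameter counts reflect the full node-to-node connectivity
```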
Convolutional neural networks
On very simple images, fully connected networks work pretty well with the images converted into vectors, but:
• Simple images (~10x10) work well; bigger images (~100x100) don't.
• Too many parameters are needed to train FC networks that way; it is not practical. A 100x100 image has 10K pixels, so a 2-layer FC network with a same-size hidden layer already needs 10,000 x 10,000 = 100M parameters.
Enter convolutional neural networks:
• They encode spatial dependency, a kind of weight sharing.
• Two main parts:
  • Conv/subsample layers act as feature generators
  • FC layers map the feature ensemble into classes
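A hedged Keras sketch of that two-part structure; the filter counts and kernel sizes are illustrative assumptions:

```python
from tensorflow import keras

model = keras.Sequential([
    # Part 1: conv/subsample layers act as feature generators
    keras.layers.Conv2D(16, (3, 3), activation="relu",
                        input_shape=(100, 100, 1)),
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Conv2D(32, (3, 3), activation="relu"),
    keras.layers.MaxPooling2D((2, 2)),
    # Part 2: FC layers map the feature ensemble into classes
    keras.layers.Flatten(),
    keras.layers.Dense(10, activation="softmax"),
])
model.summary()  # ~174K parameters, versus ~100M for the naive FC approach
```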
Recurrent neural networks
In general, neural networks work well on bounded domains, i.e. the kind of data collected for training.
In order to predict time-series data (like stocks, ...) we need a time factor.
RNNs:
• Neurons are self-connected.
• Backpropagated through time.
• Each time step is now considered a layer.
• Issue: we need a deep network → many layers → vanishing gradient problem
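A hedged numpy sketch of the unrolling: every time step reuses the same weights and behaves like one more layer. Sizes and the tanh activation are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
Wx, Wh = rng.normal(size=(8, 3)), rng.normal(size=(8, 8))  # shared weights
h = np.zeros(8)                                            # initial state

sequence = rng.normal(size=(20, 3))      # 20 time steps, 3 features each
for x_t in sequence:                     # each step acts as one "layer"
    h = np.tanh(Wx @ x_t + Wh @ h)       # Wh is the neuron's self-connection
print(h)  # the final state summarizes the whole sequence; gradients pushed
          # back through 20 such tanh layers shrink fast -> vanishing gradient
```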
Long short-term memory networks
Solve the vanishing gradient problem: long memory by default.
Contain gates that act as decision points.
Usually LSTMs are preferred over plain RNNs: more compute is needed per time step, but overall accuracy is better.
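In Keras the swap from a plain RNN to an LSTM is a one-layer change; the sequence shape and unit counts below are illustrative assumptions:

```python
from tensorflow import keras

# Plain recurrent layer vs. gated LSTM over the same (20 steps, 3 features):
rnn = keras.Sequential([keras.layers.SimpleRNN(8, input_shape=(20, 3)),
                        keras.layers.Dense(1)])
lstm = keras.Sequential([keras.layers.LSTM(8, input_shape=(20, 3)),
                         keras.layers.Dense(1)])
# The LSTM's gates give its recurrent layer ~4x the parameters, hence more
# compute per time step, but gradients survive over long sequences.
print(rnn.count_params(), lstm.count_params())
```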
Tools
Where we are, in a technology-timeline perspective:
Assembly → C (compiler) → C++ (OOP) → Java (managed) → Python (runtime) → model protos → high level (e.g. Keras) → ???? ???? ????
The programming language
Science, data science, and deep learning are very close friends.
All are frontend languages with a performant backend language (C++).
3 main languages.
My personal take: Python is the leading language:
• Free
• Won the deep learning community
• Most of the new tools/frameworks are Python-friendly
• Production-friendly
• Easy low-level binding
Frameworks
Big frameworks supported by an environment:
• Caffe, TensorFlow, MXNet, Keras, Torch, CNTK, Theano
Other toolkits:
• nnet, MXNet, darch, deepnet, H2O, Neural Network Toolbox
Good comparison reference: https://github.com/zer0n/deepframeworks
The big data/cloud arena
All major cloud providers have ML services, deep learning model development included.
Many other dedicated cloud services, some already acquired by tier-1 providers:
• Nervana
• Databricks
• Turi (GraphLab)
• H2O
• ...
The HW arena
Currently NVIDIA rules.
Market top-level segmentation:
• Training – building the model; data center
• Inference – running the model; also edge/client
In the short term Intel is positioned to take significant inference market share (SW moves only, on existing x86 HW).
The (rough) deep learning compute math
• We have model capacity.
• We have chip capacity.
• Throughput = chip capacity / model capacity
But the story has a few twists. It turns out that:
• Models can work well with low-precision parameters.
• There are a lot of sparse areas.
• Memory plays a significant role as well (see the sketch below).
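A hedged back-of-the-envelope version of that division; the chip and model numbers are illustrative assumptions, not vendor figures:

```python
# Rough throughput = chip capacity / model capacity.
chip_flops = 10e12   # assume a ~10 TFLOP/s accelerator (illustrative)
model_flops = 4e9    # assume ~4 GFLOPs per image for a large CNN (illustrative)
print(f"~{chip_flops / model_flops:.0f} inferences/sec (compute-bound ceiling)")
# In practice precision, sparsity, and memory bandwidth shift this number,
# which is exactly the twists listed above.
```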
A new compute-architecture wave is coming:
• Handle 16-, 8-, 4-, 2-, and 1-bit networks
• Expect a 100-300x effective compute boost
• Memory path adjustments
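A hedged numpy sketch of why fewer bits help, quantizing float32 weights to int8; the symmetric scheme and scale choice are illustrative assumptions:

```python
import numpy as np

w = np.random.default_rng(5).normal(size=1000).astype(np.float32)

scale = np.abs(w).max() / 127             # map the float range onto int8
w_int8 = np.round(w / scale).astype(np.int8)
w_back = w_int8.astype(np.float32) * scale

print(w.nbytes, "->", w_int8.nbytes, "bytes")       # 4x less memory traffic
print("max quantization error:", np.abs(w - w_back).max())
# Smaller weights mean more of them per memory fetch and cheaper multiplies:
# the source of the projected effective-compute boost.
```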
The race to AI silicon has kicked off
Everybody is playing: startups, technology companies (verticals), corporations.
Segments of the game:
• ASIC vs FPGA
• Edge vs cloud
• Inference vs training
• Network-generic vs network-specific
• Model architectures / ecosystem
Deep learning @ Intel
The AI era – a new AI group:
• Academia
• Development
• Training and programs
A lot of HW/SW activity; the public ones:
• Knights Mill
• Intel FPGA SDK
eran.shlomo@intel.com, eran@dataloop.ai
Editor's Notes
• #4 (buzzwords slide): smart home, Industry 4.0, retail, autonomous cars, robotics, medical, FinTech, cognitive computing, 5G, wearables