Deep Learning and Business Models
Tran Quoc Hoan
The University of Tokyo
VNITC@2015-09-13
2
Today's agenda
Deep Learning boom
Essentials of Deep Learning
Deep Learning - the newest developments
Deep Learning and Business Models
3
What is Deep Learning?
Deep Learning is a new trend of machine learning that enables
machines to unravel high-level abstractions in large amounts of data.
Image recognition
Speech recognition
Customer-centric management
Natural language processing
Drug discovery & toxicology
Deep Learning Boom
“If we knew what it was we were doing, it would not
be called research, would it?” –Albert Einstein
Big News - ILSVRC 2012
ILSVRC 2012 (ImageNet Large Scale Visual Recognition Challenge)
Error rate by team:
SuperVision: 15.315%
ISI: 26.172%
OXFORD_VGG: 26.979%
XRCE/INRIA: 27.058%
UnivofAmsterdam: 29.576%
Image source: http://cs.stanford.edu/people/karpathy/cnnembed/
5
6
Big News - Google Brain
Self-taught learning from unlabelled YouTube videos on 16,000 processors.
An AI “grandmother cell” (2012): individual neurons became selective for cats and for human faces.
7
Data and Machine Learning
[Chart: performance vs. amount of data. Most learning algorithms plateau as data grows; the new AI method (Deep Learning) keeps improving with more data.]
Image source: http://cs229.stanford.edu/materials/CS229-DeepLearning.pdf
8
New boom for the Giants & Startups
Giants: Google Brain project (2012), Facebook AI Research Lab (Dec. 2013), Project Adam (2014)
Startups: Enlitic, Ersatz Labs, MetaMind, Nervana Systems, Skymind, DeepMind, PFN (Japan)
One of the 10 Breakthrough Technologies 2013 (MIT Technology Review)
9
Pioneers (spanning theory + algorithms through implementation)
Geoffrey Hinton (Toronto, Google)
Yoshua Bengio (Montreal)
Yann LeCun (New York, Facebook)
Andrew Ng (Stanford, Baidu)
Jeffrey Dean (Google)
Le Viet Quoc (Stanford, Google)
10
Sub-summary
・More than 2,000 papers in 2014-2015, and deep learning deployed in 47 Google services.
Essentials of Deep Learning
“If we knew what it was we were doing, it would not
be called research, would it?” –Albert Einstein
12
The basic calculation
A single unit takes inputs x1, x2, x3 plus a constant +1 input (the bias), weights them by parameters w1, w2, w3, w4, and applies an activation function f:
z = f(x1·w1 + x2·w2 + x3·w3 + w4)
wi: weight parameters
Computing z from the inputs this way is forward propagation.
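A minimal sketch of this calculation in Python (the sigmoid choice of f is an assumption; the slide leaves f unspecified):

```python
import numpy as np

def forward(x, w):
    """Forward propagation for one unit.
    x: inputs (3,), w: weights (4,); w[3] is the weight on the +1 bias input.
    """
    s = np.dot(x, w[:3]) + w[3]      # weighted sum plus bias
    return 1.0 / (1.0 + np.exp(-s))  # activation f: sigmoid (assumed)

z = forward(np.array([0.5, -1.0, 2.0]), np.array([0.1, 0.4, -0.2, 0.3]))
```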
13
Neural network (1/2)
Stack units into layers: an input layer (x1, x2, x3, +1), hidden layers (h1, h2, then z1, z2, each with a +1 bias unit), and an output layer y; every unit applies f to a weighted sum of the layer below.
"Put deeper": each deeper layer represents a more comprehensive combination of the small parts detected by the layers before it.
Example output: Prob(cat) = 99%
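A hedged sketch of the forward pass through such a stack (layer sizes, random weights, and the sigmoid are illustrative assumptions):

```python
import numpy as np

def layer(x, W, b):
    """One layer: affine transform followed by the activation f (sigmoid assumed)."""
    return 1.0 / (1.0 + np.exp(-(W @ x + b)))

rng = np.random.default_rng(0)
x = rng.normal(size=3)                                      # inputs x1, x2, x3
h = layer(x, rng.normal(size=(2, 3)), rng.normal(size=2))  # hidden units h1, h2
z = layer(h, rng.normal(size=(2, 2)), rng.normal(size=2))  # hidden units z1, z2
y = layer(z, rng.normal(size=(1, 2)), rng.normal(size=1))  # output, e.g. Prob(cat)
```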
14
Neural network (2/2)
Supervised learning: the same architecture, with weight sets {w1ij}, {w2ij}, {w3ij} between layers and outputs y1, y2, …, yk (e.g. the probability of each of the k classes in a classification task).
• The loss function E measures how close the output labels are to the true labels.
• For N input samples, E is accumulated over all of the data.
Find the parameters {w} that minimize E → How?
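As one concrete instance (the slide leaves the loss unspecified; mean squared error over the N samples is an assumption):

```latex
E = \frac{1}{N} \sum_{n=1}^{N} \left\lVert y^{(n)} - y^{*(n)} \right\rVert^2
```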
15
Backward propagation
Propagate the error backward from the loss E (measured against the true label y*) to update the parameters. For a weight w feeding the pre-activation s of output y, the chain rule gives
∂E/∂w = (∂E/∂y) · (∂y/∂s) · (∂s/∂w)
∂E/∂y: how E changes when y moves
(∂E/∂y)·(∂y/∂s): how E changes when s moves
∂E/∂w: how E changes when w moves
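A numeric sketch of the chain rule for a single weight (sigmoid activation and squared error are assumptions made for illustration):

```python
import numpy as np

x, w, y_true = 1.5, 0.8, 1.0
s = x * w                         # pre-activation
y = 1.0 / (1.0 + np.exp(-s))      # output (sigmoid assumed)
E = 0.5 * (y - y_true) ** 2       # squared-error loss (assumed)

dE_dy = y - y_true                # how E changes when y moves
dy_ds = y * (1.0 - y)             # sigmoid derivative
ds_dw = x
dE_dw = dE_dy * dy_ds * ds_dw     # chain rule: ∂E/∂w
w = w - 0.1 * dE_dw               # one gradient step
```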
16
Vanishing gradient problem
[Diagram: many layers between the inputs (x1, x2, x3, +1) and the output y, compared against the true label y* through the loss E.]
The error signal vanishes during back-propagation by the time it reaches the lower layers.
・This is why neural network architectures had reached their limit before the deep learning boom.
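A back-of-the-envelope illustration (20 layers and the sigmoid are arbitrary choices): the backward signal is multiplied by the activation's derivative at every layer, and the sigmoid's derivative never exceeds 0.25.

```python
grad = 1.0
for _ in range(20):      # one factor per layer on the backward pass
    grad *= 0.25         # sigmoid derivative is at most 0.25
print(grad)              # ~9.1e-13: the signal has effectively vanished
```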
17
Features representation
・Deep learning removed much of the time-consuming manual feature-selection work from classification tasks.
Previous methods: feature selection (by hand) → parameter learning → answer
Deep Learning: feature selection + parameter learning (performed jointly) → answer
Convolutional Neural Network
The most popular kind of deep learning model.
Convolution layers emphasize salient features; sub-sampling (pooling) layers reduce resolution.
(ConvNet diagram from the Torch Tutorial)
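To make the convolution step concrete, a minimal sketch of a single 2-D kernel with "valid" borders (the edge-detecting kernel is an illustrative choice):

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over the image, emphasizing the feature it encodes."""
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

edge_kernel = np.array([[1.0, -1.0]])            # responds to vertical edges
feature_map = conv2d(np.random.rand(8, 8), edge_kernel)
```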
18
How to train parameters
Trainable parameters:
・the values in each kernel of a convolutional layer
・the weight values and biases in the fully connected layers
Optimization: Stochastic Gradient Descent (SGD) (others: AdaGrad, Adam, …)
Parameter update, with learning rate β and gradient ∂E/∂W from backward propagation:
W ← W − β · ∂E/∂W
SGD - Mini batch
Batch learning: use all samples for every update (slow; prone to over-fitting).
Mini-batch: update parameters with a small subset of samples at a time.
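A sketch of the resulting training loop (the gradient function grad_E, batch size, and learning rate are placeholder assumptions):

```python
import numpy as np

def sgd(W, X, Y, grad_E, beta=0.01, batch_size=32, epochs=10):
    """Mini-batch SGD: W <- W - beta * dE/dW, estimated on small subsets."""
    n = len(X)
    for _ in range(epochs):
        order = np.random.permutation(n)       # reshuffle every epoch so each
        for start in range(0, n, batch_size):  # mini-batch gradient stays an
            idx = order[start:start + batch_size]  # unbiased estimate
            W = W - beta * grad_E(W, X[idx], Y[idx])
    return W
```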
20
Training tricks
・Data normalization
・Reduce the learning rate after some iterations
・Momentum SGD
21
22
Auto-encoder (1/2)
[Diagram: a deep network ending in output y.]
How should these parameters be initialized?
23
Auto-encoder (2/2)
Encode the input x into a hidden representation h(x), then decode it back to a reconstruction x'.
Reconstruction error: E = ||x − x'||
The ability to reconstruct the input makes a good initialization for the parameters.
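A minimal sketch of the encode/decode round trip (sigmoid encoder, linear decoder with tied weights; both are assumed choices):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(4, 8))   # 8 inputs -> 4 hidden units
x = rng.random(8)

h = 1.0 / (1.0 + np.exp(-(W @ x)))       # encode: h(x)
x_rec = W.T @ h                          # decode with tied weights: x'
E = np.linalg.norm(x - x_rec)            # reconstruction error ||x - x'||
```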
24
Robust algorithms
・Data augmentation: more (noisy) training data → a more robust model.
・Not all nodes in the network need to be active at once (mimicking the human brain) → the dropout concept.
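A sketch of the dropout concept at training time (the drop probability p = 0.5 and the inverted-dropout scaling are common conventions assumed here, not taken from the slide):

```python
import numpy as np

def dropout(h, p=0.5, training=True):
    """Randomly silence units with probability p; scale the survivors."""
    if not training:
        return h                      # at test time all nodes are active
    mask = (np.random.rand(*h.shape) >= p)
    return h * mask / (1.0 - p)       # scaling keeps expectations equal

h = np.array([0.2, 0.9, 0.5, 0.7])
print(dropout(h))
```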
Deep Learning - The Newest Developments
“If we knew what it was we were doing, it would not
be called research, would it?” –Albert Einstein
26
In Services
Google Translate app: deep learning inside (works offline).
(July 29, 2015 version)
27
In Services
A service I implemented: http://yelpio.hongo.wide.ad.jp/
28
In Medical Imaging
Segmentation
(Image omitted here due to the image's copyright.)
In Autonomous Devices
Self-driving cars
http://blogs.nvidia.com/blog/2015/02/24/deep-learning-drive/
29
In Speech Interfaces
"99% is very different from 95%" (Andrew Ng, Baidu)
https://medium.com/s-c-a-l-e/how-baidu-mastered-mandarin-with-deep-learning-and-lots-of-data-1d94032564a5
30
In Drug Discovery
[Pipeline: a chemical compound and its assay data (from the PubChem Database) are encoded as a binary fingerprint (e.g. 10010000110011) plus activity labels (active/inactive); a deep neural net then predicts drug activity against multiple targets.]
http://on-demand.gputechconf.com/gtc/2015/presentation/S5813-Nobuyuki-Ota.pdf
31
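A sketch of how such a fingerprint becomes network input (only the bit string comes from the slide; the rest is illustrative):

```python
import numpy as np

fingerprint = "10010000110011"                     # from the slide
x = np.array([int(b) for b in fingerprint], dtype=float)
# x feeds the input layer of a deep neural net whose outputs are
# activity predictions (active/inactive) for multiple targets.
```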
Applications in IoT (Internet of Things)
・More difficult tasks in AI, robotics, and information processing.
・Huge amounts of time-series data and sensor/device state data → RNN (recurrent neural network)
・Supervised data is difficult to obtain → VAE (variational auto-encoder)
・Taking action in changing conditions and environments → Deep Reinforcement Learning
Image source: http://on-demand.gputechconf.com/gtc/2015/presentation/S5813-Nobuyuki-Ota.pdf
32
RNN (1/3)
Recurrent Neural Network: a loop inside the neural network.
Represents time-series data and sequences of inputs (speech models, natural language models, …).
At each step the network maps input x(t) to output y(t) through a recurrent state:
z(t) = f( z(t-1), x(t) )
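A sketch of that recurrence (tanh and the weight shapes are assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
Wz, Wx = rng.normal(size=(4, 4)), rng.normal(size=(4, 3))

def step(z_prev, x_t):
    """z(t) = f(z(t-1), x(t)): combine previous state and current input."""
    return np.tanh(Wz @ z_prev + Wx @ x_t)

z = np.zeros(4)
for x_t in rng.normal(size=(5, 3)):   # a length-5 input sequence
    z = step(z, x_t)
```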
33
RNN (2/3)
Unroll the RNN along the time axis: t = 1, 2, 3, …, T.
How to learn the parameters? Back-propagation through time (BPTT), combined with Long Short-Term Memory (LSTM) units.
34
RNN (3/3)
Neural Turing Machine [A. Graves, 2014]: a neural network with learned parameters for coupling to external memories.
Applications: COPY, PRIORITY SORT
35
36
VAE (variational auto-encoder)
Learn a mapping from a latent variable z to a complicated distribution on x:
p(x) = ∫ p(x,z) dz, where p(x,z) = p(x|z) p(z)
p(z) is something simple, and p(x|z) = f(z) is a neural network (the decoder); an encoder maps x back to z.
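For training, the VAE maximizes a variational lower bound on log p(x); this standard bound (not written on the slide) links the encoder q(z|x) and the decoder p(x|z):

```latex
\log p(x) \;\ge\; \mathbb{E}_{q(z|x)}\!\left[\log p(x|z)\right] - \mathrm{KL}\!\left(q(z|x)\,\middle\|\,p(z)\right)
```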
VAE
Example: generate face images from 29 hidden variables
http://vdumoulin.github.io/morphing_faces/online_demo.html
37
VAE
Example: learning digit identity and writing style.
Experiment (PFN): given a digit x, produce the remaining digits in the writing style of x.
VAE is useful in semi-supervised learning (especially when training data are scarce).
Image from ディープラーニングが活かすIoT, Preferred Networks, Inc., Interop 2015 seminar, 2015/06/09
39
Reinforcement Learning (RL)
An agent (car, robot, …) interacts with an environment (street, factory, …): in state s(t) it takes action a(t) and receives reward r(t).
Total reward in the future:
R = r(t) + µ·r(t+1) + µ²·r(t+2) + …  (µ < 1)
Design the next action to maximize R.
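A tiny sketch of computing that discounted total reward from a finished episode (the reward values and µ = 0.9 are made-up numbers):

```python
rewards = [1.0, 0.0, 0.5, 2.0]   # r(t), r(t+1), ... (illustrative)
mu = 0.9                          # discount factor, mu < 1

R = 0.0
for r in reversed(rewards):       # R(t) = r(t) + mu * R(t+1)
    R = r + mu * R
print(R)
```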
40
Deep Reinforcement Learning (RL)
Q-learning: needs the expected future reward Q(s, a) of action a at state s (Bellman update equation):
Q(s, a) ← Q(s, a) + β ( r + µ max_a' Q(s', a') − Q(s, a) )
Deep Q-Network [V. Mnih, 2015]: represent Q(s, a) by a deep neural network Q(s, a, w).
Minimize: L(w) = E[ ( r + µ max_a' Q(s', a', w) − Q(s, a, w) )² ]
Useful when there are many states.
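A sketch of the tabular Bellman update above (the state/action counts, β, and µ are illustrative):

```python
import numpy as np

n_states, n_actions = 10, 4
Q = np.zeros((n_states, n_actions))
beta, mu = 0.1, 0.9

def q_update(s, a, r, s_next):
    """One Bellman update: move Q(s,a) toward r + mu * max_a' Q(s',a')."""
    target = r + mu * np.max(Q[s_next])
    Q[s, a] += beta * (target - Q[s, a])

q_update(s=0, a=2, r=1.0, s_next=3)
```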
41
PFN - demo https://www.youtube.com/watch?v=RH2TmreYkdA
Deep Learning and Business Models
“If we knew what it was we were doing, it would not
be called research, would it?” –Albert Einstein
The future of the field
・Better hardware and bigger machine clusters
・Better implementations and optimizations
・Understanding video, text, and signals
・Development of business models
Deep learning today is just the tip of the iceberg.
43
Model 1 - Framework development
・Bridge the gap between algorithms and implementation.
・Provide common interfaces and an understandable API for users.
Example: Pylearn2
44
Model 2 - Building deep-learning-capable hardware
New hardware (chipset) architectures for Deep Learning:
・Synopsys: processors' convolutional-network capabilities
http://www.bdti.com/InsideDSP/2015/04/21/Synopsys
・Qualcomm Zeroth Platform: cognitive computing and a custom CPU drive the Snapdragon 820 processors
https://www.qualcomm.com/news/snapdragon/2015/03/02/cognitive-computing-and-custom-cpu-drive-next-gen-snapdragon-processors
・NVIDIA DIGITS DevBox
https://developer.nvidia.com/devbox
45
Model 3 - Deep Intelligence for IoT
[Diagram: Sensing, Analysis, Action, Feedback, Union.]
Image from ディープラーニングが活かすIoT, Preferred Networks, Inc., Interop 2015 seminar, 2015/06/09
Model 4 - Hosted API and deep learning service
Data in → Model out
https://www.metamind.io/vision/train
https://www.metamind.io/language/train
Usability + Tuning + Scale-out
47
Model 5 - Personal Deep Learning
Data in → Label out
DeepFace (Facebook): facial verification & tagging
http://www.adweek.com/socialtimes/deepface/433401
Applies previously developed solutions; almost free.
48
Summary
Deep Learning has the capability to find patterns in data by enabling a wide range of abstraction.
Deep Learning has shown significant results in voice and image recognition compared with conventional machine learning methods.
Deep Learning has potential applications and business models in some important key sectors.
The challenges are your ideas!
49
“Thank you for listening.”
• Some good materials for learning
• Papers, code, blogs, seminars, online courses
• For beginners: 深層学習 (機械学習プロフェッショナルシリーズ)
• Deep Learning - an MIT Press book in preparation
• http://www.iro.umontreal.ca/~bengioy/dlbook/
50
