Deep Learning and Business Models
Tran Quoc Hoan
The University of Tokyo
VNITC@2015-09-13
Deep Learning And Business Models (VNITC 2015-09-13)

Deep Learning essential concepts and business models in the future.

Published in: Education

  1. Deep Learning and Business Models. Tran Quoc Hoan, The University of Tokyo. VNITC, 2015-09-13.
  2. Today's agenda: Deep Learning boom; Essentials of Deep Learning; Deep Learning - the newest; Deep Learning and Business Models.
  3. What is Deep Learning? Deep Learning is a recent trend in machine learning that enables machines to uncover high-level abstractions in large amounts of data. Application areas: image recognition, speech recognition, customer-centric management, natural language processing, drug discovery and toxicology.
  4. Deep Learning Boom. “If we knew what it was we were doing, it would not be called research, would it?” (Albert Einstein)
  5. Big News - ILSVRC 2012 (ImageNet Large Scale Visual Recognition Challenge). Error rates: SuperVision 15.315%, ISI 26.172%, OXFORD_VGG 26.979%, XRCE/INRIA 27.058%, Univ. of Amsterdam 29.576%. Image source: http://cs.stanford.edu/people/karpathy/cnnembed/
  6. Big News - Google Brain. Self-taught learning from unlabelled YouTube videos on 16,000 computers. "AI grandmother cell" (2012). Figure labels: (Cat), (Human).
  7. Data and Machine Learning. Figure: performance versus amount of data, comparing most learning algorithms with the new AI method (Deep Learning). Image source: http://cs229.stanford.edu/materials/CS229-DeepLearning.pdf
  8. New boom for the Giants & Startups: Google Brain project (2012); Facebook AI Research Lab (Dec. 2013); Project Adam (2014); Enlitic; Ersatz Labs; MetaMind; Nervana Systems; Skymind; DeepMind; PFN (Japan). Deep learning was named one of the 10 breakthrough technologies of 2013 (MIT Technology Review).
  9. Pioneers: Geoffrey Hinton (Toronto, Google); Yoshua Bengio (Montreal); Yann LeCun (New York, Facebook); Andrew Ng (Stanford, Baidu); Jeffrey Dean (Google); Le Viet Quoc (Stanford, Google). Figure axis: from theory + algorithms to implementation.
  10. Sub-summary: more than 2,000 papers in 2014-2015, and deep learning in use in 47 Google services.
  11. Essentials of Deep Learning. “If we knew what it was we were doing, it would not be called research, would it?” (Albert Einstein)
  12. The basic calculation (forward propagation). Inputs x1, x2, x3 and a constant +1 are multiplied by the weight parameters wi, summed, and passed through an activation function f: z = f(x1w1 + x2w2 + x3w3 + w4).
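The forward calculation on this slide can be sketched in a few lines of Python. The slide does not name a specific activation function, so the logistic sigmoid used here is an assumption, as are the function names and the example numbers.

```python
import math

def sigmoid(a):
    """Logistic activation; one common choice for f (an assumption, the slide names none)."""
    return 1.0 / (1.0 + math.exp(-a))

def neuron(x, w, f=sigmoid):
    """Compute z = f(x1*w1 + x2*w2 + x3*w3 + w4).

    The last entry of w is the weight on the constant +1 input (the bias).
    """
    *weights, bias = w
    return f(sum(xi * wi for xi, wi in zip(x, weights)) + bias)

# Illustrative inputs and weights: the weighted sum is 0, so the output is f(0).
z = neuron([1.0, 2.0, 3.0], [0.5, -0.25, 0.0, 0.0])
```

Passing a different `f` (for example `lambda a: a`) swaps the activation without touching the weighted sum.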
  13. Neural network (1/2). Input layer (x1, x2, x3, +1), hidden layers (units h1, h2, z1, z2, each layer with a +1 bias unit and activation f), output layer y, e.g. Prob(cat) = 99%. Going deeper: deeper layers represent more comprehensive combinations of small parts.
  14. Neural network (2/2). Supervised learning: a network with weights {w1ij}, {w2ij}, {w3ij} maps the inputs x1, x2, x3 to outputs y1, y2, ..., yk (e.g. the probability of each of k classes in a classification task). The loss function E measures the mismatch between output labels and true labels over N input data. Goal: find parameters {w} that minimize E. How?
  15. Backward propagation. Propagate backward from the loss E (computed from the output y and the true label y*) to update the parameters. ∂E/∂y: how E changes when y moves. (∂E/∂y)(∂y/∂s): how E changes when s moves. ∂E/∂w = (∂E/∂y)(∂y/∂s)(∂s/∂w): how E changes when w moves.
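A minimal sketch of this chain rule for a single weight, checked against a finite-difference estimate. The concrete choices are assumptions made for illustration only: s = w * r, y = sigmoid(s), and a squared-error loss E = 0.5 * (y - y*)^2.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def loss(w, r, y_true):
    """E = 0.5 * (y - y*)^2 with s = w * r and y = sigmoid(s) (illustrative choices)."""
    y = sigmoid(w * r)
    return 0.5 * (y - y_true) ** 2

def grad_w(w, r, y_true):
    """dE/dw = (dE/dy) * (dy/ds) * (ds/dw), one factor per backward step."""
    y = sigmoid(w * r)
    dE_dy = y - y_true        # how E changes when y moves
    dy_ds = y * (1.0 - y)     # derivative of the sigmoid
    ds_dw = r                 # s = w * r
    return dE_dy * dy_ds * ds_dw

w, r, y_true = 0.3, 2.0, 1.0
analytic = grad_w(w, r, y_true)

# Central finite difference as an independent check of the chain rule.
eps = 1e-6
numeric = (loss(w + eps, r, y_true) - loss(w - eps, r, y_true)) / (2 * eps)
```

The two values should agree to several decimal places; a gradient check of this kind is a common way to validate a hand-written backward pass.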
  16. Vanishing gradient problem. The error signal vanishes as it is back-propagated to the lower layers. Neural network architectures had reached this limit before the deep learning boom.
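The effect can be illustrated with a toy chain of sigmoid units, one per layer. This is a sketch under simplifying assumptions (scalar units, every weight equal to 1, hypothetical function names), not a model of any particular network: each layer multiplies the gradient by a factor of at most 0.25, so the product shrinks quickly with depth.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def gradient_through_chain(x, depth, w=1.0):
    """d(output)/d(input) for a chain of `depth` sigmoid units, each with weight w."""
    grad = 1.0
    a = x
    for _ in range(depth):
        a = sigmoid(w * a)
        grad *= w * a * (1.0 - a)  # local derivative of one layer
    return grad

# Gradient reaching the input for chains of increasing depth.
grads = {d: gradient_through_chain(0.5, d) for d in (1, 5, 10)}
```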
  17. Feature representation. Deep learning reduces the time humans spend on feature selection in classification tasks. Previous methods: feature selection → parameter learning → answer. Deep Learning: feature selection + parameter learning → answer.
  18. Convolutional Neural Network: the most popular kind of deep learning model. Diagram labels: emphasize special features; sub-sampling. (ConvNet diagram from Torch Tutorial.)
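As a rough illustration of the two operations named in the diagram, the sketch below applies one small kernel to an image and then sub-samples the result with 2x2 max pooling. The kernel, the image, and the use of max pooling for sub-sampling are assumptions chosen for the example; the slide does not specify them.

```python
def conv2d(image, kernel):
    """'Valid' 2-D convolution as used in ConvNets (no kernel flip, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def max_pool(fmap, size=2):
    """Sub-sampling: keep the largest value in each non-overlapping size x size window."""
    return [[max(fmap[i + di][j + dj] for di in range(size) for dj in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]

# A 5x5 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# A kernel that responds to a dark-to-bright step from left to right.
kernel = [[-1, 1],
          [-1, 1]]

feature_map = conv2d(image, kernel)  # 4x4, non-zero only at the edge
pooled = max_pool(feature_map)       # 2x2 after sub-sampling
```

The feature map is non-zero only where the edge sits, and pooling keeps that response while shrinking the map.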
  19. How to train parameters. Trainable parameters: the values in each kernel of the convolutional layers, and the weights and biases of the fully connected layers. Method: Stochastic Gradient Descent (SGD); others: AdaGrad, Adam, ... Parameter update: W ← W - ß ∂E/∂W, where ß is the learning rate and ∂E/∂W is obtained by backward propagation.
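The update rule W ← W - ß ∂E/∂W can be sketched directly. The quadratic loss E(w) = (w - 3)^2 and the function name `sgd_step` are illustrative assumptions; in a real network the gradient would come from backward propagation.

```python
def sgd_step(weights, grads, lr):
    """One update W <- W - lr * dE/dW, applied element-wise."""
    return [w - lr * g for w, g in zip(weights, grads)]

# Toy problem: minimize E(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w = [0.0]
for _ in range(100):
    grad = [2.0 * (w[0] - 3.0)]
    w = sgd_step(w, grad, lr=0.1)
```

With this learning rate the distance to the minimum shrinks by a constant factor on every step; too large a value of ß would make the iteration diverge instead.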
  20. SGD - Mini batch. Batch learning: every update uses all samples (over-fitting). Mini batch: each update uses only a subset of the samples.
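The difference between the two modes is only how many samples feed each update. A minimal sketch, assuming a one-parameter linear model y = w * x with squared error and hypothetical helper names (`minibatches`, `train`):

```python
import random

def minibatches(samples, batch_size, rng):
    """Yield shuffled, non-overlapping batches that together cover all samples once."""
    shuffled = samples[:]
    rng.shuffle(shuffled)
    for start in range(0, len(shuffled), batch_size):
        yield shuffled[start:start + batch_size]

def train(samples, batch_size, epochs=50, lr=0.05, seed=0):
    """Fit y = w * x by gradient descent, one parameter update per batch."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(epochs):
        for batch in minibatches(samples, batch_size, rng):
            # Mean gradient of (w * x - y)^2 over the batch.
            grad = sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= lr * grad
    return w

# Noise-free data generated from y = 2 * x.
data = [(k / 10.0, 2.0 * k / 10.0) for k in range(1, 21)]
w_mini = train(data, batch_size=4)           # 5 updates per epoch
w_batch = train(data, batch_size=len(data))  # 1 update per epoch (batch learning)
```

Setting `batch_size` to the dataset size recovers batch learning; smaller values give more, noisier updates per pass over the data.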
  21. Training tricks: data normalization; reduce the learning rate after some iterations; momentum SGD.
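Two of these tricks, momentum and a learning rate that is reduced after some iterations, can be combined in a short sketch. The step-decay schedule, the hyperparameter values, and the toy loss are assumptions chosen for illustration.

```python
def momentum_sgd(grad_fn, w0, lr=0.1, momentum=0.9,
                 decay=0.5, decay_every=50, steps=200):
    """Momentum SGD with step decay: lr is multiplied by `decay` every `decay_every` steps."""
    w, v = w0, 0.0
    for step in range(steps):
        if step > 0 and step % decay_every == 0:
            lr *= decay                      # reduce the learning rate after some iterations
        v = momentum * v - lr * grad_fn(w)   # velocity accumulates past gradients
        w += v
    return w

# Toy loss E(w) = (w - 3)^2 with gradient 2 * (w - 3).
w = momentum_sgd(lambda w: 2.0 * (w - 3.0), w0=0.0)
```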
  22. Auto-encoder (1/2). How to initialize these parameters?
  23. Auto-encoder (2/2). The ability to reconstruct the input makes a good initialization for the parameters. Encode x to h(x), decode h(x) to x'. Reconstruction error: E = ||x - x'||.
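The encode, decode, and reconstruction-error steps can be sketched with a tiny linear auto-encoder. Using a linear map with tied weights (the decoder is the transpose of the encoder) is an assumption made to keep the example short; the slide does not specify the form of h(x).

```python
import math

def encode(x, W):
    """h(x) = W x: map the input to a lower-dimensional code."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]

def decode(h, W):
    """x' = W^T h: map the code back to input space (tied weights, an assumption)."""
    return [sum(W[k][j] * h[k] for k in range(len(h))) for j in range(len(W[0]))]

def reconstruction_error(x, W):
    """E = ||x - x'||, the Euclidean distance between input and reconstruction."""
    x_rec = decode(encode(x, W), W)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_rec)))

# A 3 -> 2 encoder that keeps the first two coordinates and drops the third.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]

err_ok = reconstruction_error([1.0, 2.0, 0.0], W)   # nothing lost
err_bad = reconstruction_error([1.0, 2.0, 2.0], W)  # third coordinate is lost
```

Training an auto-encoder means adjusting W to make this error small on the training inputs; the resulting weights can then serve as an initialization.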
  24. Robust algorithms. Data augmentation: more noisy data → more robust model. Nodes in the network do not all need to be active (mimicking the human brain) → the Dropout concept.
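A minimal sketch of the dropout idea: during training each node is switched off at random, and at test time all nodes are used. The "inverted" scaling by 1 / (1 - p) shown here is one common convention and is an assumption, as are the function and parameter names.

```python
import random

def dropout(activations, p_drop, rng, training=True):
    """Zero each activation with probability p_drop during training.

    Kept activations are scaled by 1 / (1 - p_drop) so the expected value
    is unchanged; at test time the input is returned as-is.
    """
    if not training or p_drop == 0.0:
        return list(activations)
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)
out = dropout([1.0] * 1000, p_drop=0.5, rng=rng)
test_out = dropout([1.0, 2.0], p_drop=0.5, rng=rng, training=False)
```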
  25. Deep Learning - The Newest. “If we knew what it was we were doing, it would not be called research, would it?” (Albert Einstein)
  26. In Services: Google Translate App, with deep learning inside (works offline), July 29, 2015 version.
  27. In Services: a service I implemented, http://yelpio.hongo.wide.ad.jp/
  28. In Medical Imaging: segmentation. (Images shown offline only, due to image copyright.)
  29. In Automatic Devices: self-driving car. http://blogs.nvidia.com/blog/2015/02/24/deep-learning-drive/
  30. In Speech Interfaces: 99% accuracy is very different from 95% (Andrew Ng, Baidu). https://medium.com/s-c-a-l-e/how-baidu-mastered-mandarin-with-deep-learning-and-lots-of-data-1d94032564a5
  31. In Drug Discovery. Chemical compounds from the PubChem database are encoded as fingerprints (e.g. 10010000110011); assay data provide activity labels (active = 1, inactive = 0). A deep neural net trained on fingerprint + activity predicts drug activity for multiple targets. http://on-demand.gputechconf.com/gtc/2015/presentation/S5813-Nobuyuki-Ota.pdf
  32. Applications in IoT (Internet of Things). More difficult tasks in AI, robotics and information processing. Huge amounts of time-series data and sensor/device state data → RNN (recurrent neural network). Supervised data is difficult to obtain → VAE (variational auto-encoder). Taking actions under given conditions and environments → Deep Reinforcement Learning. Image source: http://on-demand.gputechconf.com/gtc/2015/presentation/S5813-Nobuyuki-Ota.pdf
  33. RNN (1/3). Recurrent Neural Network: a neural network with a loop inside. It models time-series data and sequences of inputs (speech models, natural language models, ...). With input x(t), output y(t) and hidden state z(t): z(t) = f( z(t-1), x(t) ).
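The recurrence z(t) = f( z(t-1), x(t) ) can be sketched with a scalar state. The tanh nonlinearity, the weight values, and the function names are assumptions made for illustration.

```python
import math

def rnn_step(z_prev, x, w_z=0.5, w_x=1.0, b=0.0):
    """One step of the recurrence: z(t) = tanh(w_z * z(t-1) + w_x * x(t) + b)."""
    return math.tanh(w_z * z_prev + w_x * x + b)

def run_rnn(xs, z0=0.0):
    """Feed a sequence through the loop and return the hidden state at every step."""
    states, z = [], z0
    for x in xs:
        z = rnn_step(z, x)
        states.append(z)
    return states

# An input pulse at t = 1 followed by zeros: the state carries a fading memory of it.
states = run_rnn([1.0, 0.0, 0.0])
```

The second and third states are non-zero even though their inputs are zero, which is the "loop inside" at work.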
  34. RNN (2/3). The RNN unfolded over time steps t = 1, t = 2, t = 3, ..., t = T. How to learn the parameters? Back-propagation through time (BPTT) + Long Short-Term Memory.
  35. RNN (3/3). Neural Turing Machine [A. Graves, 2014]: a neural network that can be coupled to external memory, with parameters that control the coupling. Applications: COPY, PRIORITY SORT.
  36. VAE (variational auto-encoder). Learn a mapping from some latent variable z to a complicated distribution on x: p(x) = ∫ p(x, z) dz, where p(x, z) = p(x|z) p(z), p(z) is something simple, and p(x|z) = f(z) is a neural network.
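For reference, the slide's marginal likelihood can be written out together with the lower bound that a VAE is usually trained to maximize. The encoder distribution q(z | x) does not appear on the slide; it is introduced here as an assumption following the standard VAE formulation.

```latex
p(x) = \int p(x, z)\,dz = \int p(x \mid z)\,p(z)\,dz,
\qquad
\log p(x) \;\ge\;
\mathbb{E}_{q(z \mid x)}\!\big[\log p(x \mid z)\big]
- \mathrm{KL}\big(q(z \mid x)\,\|\,p(z)\big)
```

The first term rewards accurate reconstruction of x from z; the second keeps the encoder's distribution close to the simple prior p(z).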
  37. VAE example: generate face images from 29 hidden variables. http://vdumoulin.github.io/morphing_faces/online_demo.html
  38. VAE example: learning from digit and writing style. Experiment (PFN): given a digit x, produce the remaining digits in the writing style of x. VAE is useful in semi-supervised learning (especially when training data are scarce). Image from ディープラーニングが活かすIoT, Preferred Networks, Inc., 2015/06/09, Interop 2015 seminar.
  39. Reinforcement Learning (RL). An agent (car, robot, ...) takes action a(t) in an environment (street, factory, ...) and receives state s(t) and reward r(t), which it uses to design its next action. Total future reward: R = r(t) + µ r(t+1) + µ^2 r(t+2) + ... (µ < 1).
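The total future reward R can be computed for a finite reward sequence as follows. Truncating the sum at the end of the list and the function name are assumptions of this sketch.

```python
def discounted_return(rewards, mu):
    """R = r(t) + mu * r(t+1) + mu^2 * r(t+2) + ... for a finite reward list.

    Accumulating from the last reward backward avoids computing powers of mu.
    """
    total = 0.0
    for r in reversed(rewards):
        total = r + mu * total
    return total

R = discounted_return([1.0, 1.0, 1.0], mu=0.5)  # 1 + 0.5 + 0.25
```

Because µ < 1, rewards further in the future count for less, and the sum stays bounded even for long sequences.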
  40. Deep Reinforcement Learning (RL). Q-learning needs the expected future reward Q(s, a) of action a at state s (Bellman update equation): Q(s, a) ← Q(s, a) + ß ( r + µ max_a' Q(s', a') - Q(s, a) ). Deep Q-Learning Network [V. Mnih, 2015]: approximate Q(s, a) with a deep neural network Q(s, a, w). Minimize: L(w) = E[ (r + µ max_a' Q(s', a', w) - Q(s, a, w))^2 ]. Useful when there are many states.
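The tabular form of the Bellman update can be sketched with a dictionary as the Q-table. The state and action names, the values of ß (`lr`) and µ (`mu`), and the function name are hypothetical; a Deep Q-Network replaces the table with a neural network Q(s, a, w).

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, lr=0.5, mu=0.9):
    """Q(s,a) <- Q(s,a) + lr * (r + mu * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += lr * (r + mu * best_next - Q[(s, a)])
    return Q[(s, a)]

Q = defaultdict(float)        # unseen (state, action) pairs start at 0
actions = ["left", "right"]

first = q_update(Q, "s0", "right", 1.0, "s1", actions)   # only the immediate reward counts
q_update(Q, "s1", "right", 2.0, "s2", actions)           # learn that s1 is valuable
second = q_update(Q, "s0", "right", 1.0, "s1", actions)  # value of s1 now flows back to s0
```

The second update of ("s0", "right") is larger than the first because the estimate for the next state has improved, which is how reward information propagates backward through states.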
  41. PFN demo: https://www.youtube.com/watch?v=RH2TmreYkdA
  42. Deep Learning and Business Models. “If we knew what it was we were doing, it would not be called research, would it?” (Albert Einstein)
  43. The future of the field: better hardware and bigger machine clusters; better implementations and optimizations; understanding video, text and signals; development of business models. Deep learning is just the tip of the iceberg.
  44. Model 1 - Framework development. Bridge the gap between algorithms and implementation. Provide common interfaces and an understandable API for users. Example: Pylearn2.
  45. Model 2 - Building deep-learning-capable hardware. New hardware (chipset) architectures for Deep Learning. Synopsys: processors' convolutional network capabilities, http://www.bdti.com/InsideDSP/2015/04/21/Synopsys. Qualcomm Zeroth Platform: cognitive computing and custom CPU drive Snapdragon 820 processors, https://www.qualcomm.com/news/snapdragon/2015/03/02/cognitive-computing-and-custom-cpu-drive-next-gen-snapdragon-processors. NVIDIA DIGITS DevBox: https://developer.nvidia.com/devbox
  46. Model 3 - Deep Intelligence for IoT. Diagram labels: Sensing, Action, Feedback, Analysis, Union. Image from ディープラーニングが活かすIoT, Preferred Networks, Inc., 2015/06/09, Interop 2015 seminar.
  47. Model 4 - Hosted API and deep learning service: data in, model out. Examples: https://www.metamind.io/vision/train, https://www.metamind.io/language/train. Value: usability + tuning + scale-out.
  48. Model 5 - Personal Deep Learning: data in, label out. Example: DeepFace (Facebook), facial verification & tagging, http://www.adweek.com/socialtimes/deepface/433401. Applies previous solutions; almost free.
  49. Summary. Deep Learning can find patterns in data by enabling a wide range of abstraction. Deep Learning has shown significant results in voice and image recognition compared with conventional machine learning methods. Deep Learning has potential applications and business models in several key sectors. The challenges are your ideas!
  50. Thank you for listening. Some good materials for learning: papers, code, blogs, seminars, online courses. For beginners: 深層学習 (機械学習プロフェッショナルシリーズ) [Deep Learning, Machine Learning Professional Series]. Deep Learning, an MIT Press book in preparation: http://www.iro.umontreal.ca/~bengioy/dlbook/