[N]-Shot Learning
in Deep Learning Models
davinnovation@gmail.com
Learning process: Human vs. Deep Learning
What they need…
a human: a single picture book | a deep learning model: 60,000 training examples (MNIST)
In Traditional Deep Learning…
http://pinktentacle.com/tag/waseda-university/
One Shot Learning
Give the learning model the ability to infer from very little data
One Shot Learning
Train on one (or a few) examples, and it still works well
One Shot Learning – Many approaches
Transfer Learning
Domain Adaptation
Imitation Learning
…..
It started from Meta Learning
'Meta' means something that is "self-referencing":
'knowing about knowing'
Today…
Neural Turing Machine
https://arxiv.org/abs/1410.5401
Basic structure, analogous to a Turing machine:
[Diagram: External Input -> Controller -> External Output; the Controller accesses a Memory matrix through Read Heads and Write Heads]
Neural Turing Machine
https://arxiv.org/abs/1410.5401
[Diagram: the Controller is a neural network (e.g. an RNN) between External Input/Output and the Read/Write Heads over Memory]
Memory: an N x M block (N locations, each an M-dimensional vector), written M_t, where t is the time step.
Neural Turing Machine
https://arxiv.org/abs/1410.5401
Read Heads
[Diagram: Controller with Read/Write Heads over Memory]
Weight vector w_t over memory locations i = 0, 1, ...: Σ_i w_t(i) = 1, 0 ≤ w_t(i) ≤ 1, e.g. w_t = (0.9, 0.1, 0, ...)
Read vector: r_t ← Σ_i w_t(i) M_t(i)
Example: with memory columns M_t(0) = (1, 2, 1), M_t(1) = (2, 1, 3) and w_t = (0.9, 0.1), the read vector is r_t = (1.1, 1.9, 1.2).
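A minimal NumPy sketch of this read step (a sketch only; the memory is laid out with locations as rows and the names are illustrative, not from the paper):

```python
import numpy as np

def ntm_read(memory, w):
    """Read from an N x M memory with an attention weighting w over the N locations."""
    # memory: (N, M) matrix M_t; w: (N,) weights that sum to 1
    return w @ memory  # r_t = sum_i w_t(i) * M_t(i)

# Example matching the slide (memory locations as rows):
M_t = np.array([[1.0, 2.0, 1.0],
                [2.0, 1.0, 3.0]])
w_t = np.array([0.9, 0.1])
print(ntm_read(M_t, w_t))  # -> [1.1 1.9 1.2]
```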
Neural Turing Machine
https://arxiv.org/abs/1410.5401
Write Heads: Erase (erase first, then add)
[Diagram: Controller (RNN) with Read/Write Heads over Memory]
Weight vector w_t: Σ_i w_t(i) = 1, 0 ≤ w_t(i) ≤ 1, e.g. w_t = (0.9, 0.1, 0, ...)
Erase vector e_t, 0 < e_t < 1 (elementwise):
M'_t(i) ← M_{t-1}(i) [1 − w_t(i) e_t]
Example: with e_t ≈ (1, 0, 1) and w_t = (0.9, 0.1), the memory columns (1, 2, 1) and (2, 1, 3) become (0.1, 2, 0.1) and (1.8, 1, 2.7).
(For the worked example, e_t is set to values 'nearly' 0 and 1; it cannot be exactly 0 or 1.)
Neural Turing Machine
https://arxiv.org/abs/1410.5401
Write Heads: Add (after Erase)
[Diagram: Controller (RNN) with Read/Write Heads over Memory]
Add vector a_t, 0 < a_t < 1 (elementwise):
M_t(i) ← M'_t(i) + w_t(i) a_t
Example: with a_t ≈ (1, 1, 0) and w_t = (0.9, 0.1), the erased columns (0.1, 2, 0.1) and (1.8, 1, 2.7) become (1, 2.9, 0.1) and (1.9, 1.1, 2.7).
(Again, a_t is set to values 'nearly' 0 and 1 for the example; it cannot be exactly 0 or 1.)
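A minimal NumPy sketch of the erase-then-add write step under the two update rules above (locations as rows again; e_t and a_t are set to 'nearly' 0/1 values as in the slide):

```python
import numpy as np

def ntm_write(memory, w, e, a):
    """Erase-then-add write to an N x M memory (sketch of the NTM write step)."""
    # memory: (N, M); w: (N,); e and a: (M,) with entries in (0, 1)
    erased = memory * (1.0 - np.outer(w, e))  # M'_t(i) = M_{t-1}(i) * [1 - w_t(i) e_t]
    return erased + np.outer(w, a)            # M_t(i)  = M'_t(i) + w_t(i) a_t

M_prev = np.array([[1.0, 2.0, 1.0],
                   [2.0, 1.0, 3.0]])
w_t = np.array([0.9, 0.1])
e_t = np.array([0.999, 0.001, 0.999])  # 'nearly' (1, 0, 1)
a_t = np.array([0.999, 0.999, 0.001])  # 'nearly' (1, 1, 0)
print(ntm_write(M_prev, w_t, e_t, a_t))  # approx. [[1.0, 2.9, 0.1], [1.9, 1.1, 2.7]]
```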
Neural Turing Machine
https://arxiv.org/abs/1410.5401
Addressing Mechanism: how the weight vector w_t is calculated
[Diagram: Controller (RNN) with Read/Write Heads over Memory]
Content-based addressing >
- based on the similarity between the current memory values and the values emitted by the controller
Location-based addressing >
- based on the memory location itself. For example, for f(x, y) = x × y, the variables x and y are stored at different addresses, retrieved, and fed to a multiplication algorithm regardless of what they contain.
The NTM uses both mechanisms concurrently.
Neural Turing Machine
https://arxiv.org/abs/1410.5401
Addressing Mechanism: how the weight vector w_t is calculated, using both mechanisms concurrently
[Diagram: addressing pipeline. Previous state w_{t-1} and memory M_t feed in, together with the controller outputs k_t, β_t, g_t, s_t, γ_t; Content Addressing produces w^c_t, Interpolation produces w^g_t, Convolutional Shift produces w'_t, Sharpening produces the final w_t]
k_t : key vector
β_t : key strength
g_t : interpolation gate, in (0, 1)
s_t : shift weighting (a distribution over integer shifts)
γ_t : sharpening weight
Content Addressing is the content-based part; Interpolation, Convolutional Shift and Sharpening are the location-based part.
Neural Turing Machine
https://arxiv.org/abs/1410.5401
Addressing Mechanism: Content Addressing (inputs k_t, β_t, M_t; output w^c_t)
[Diagram: the same addressing pipeline and parameter legend as the previous slide, with the Content Addressing stage highlighted]
w^c_t(i) ← exp(β_t K(k_t, M_t(i))) / Σ_j exp(β_t K(k_t, M_t(j)))
K(u, v) = (u · v) / (|u| |v|) : cosine similarity
Example: with k_t = (3, 2, 1) and memory columns (1, 2, 1) and (2, 1, 3):
β_t = 0 -> w^c_t = (0.5, 0.5)
β_t = 5 -> w^c_t = (.61, .39)
β_t = 50 -> w^c_t = (.98, .02)
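A small sketch of content addressing as a softmax over cosine similarities scaled by the key strength (illustrative code, not from the paper):

```python
import numpy as np

def content_addressing(memory, key, beta):
    """w^c_t: softmax of beta * cosine_similarity(key, memory location)."""
    sim = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8)
    logits = beta * sim
    w = np.exp(logits - logits.max())  # subtract max for numerical stability
    return w / w.sum()

M_t = np.array([[1.0, 2.0, 1.0],
                [2.0, 1.0, 3.0]])
k_t = np.array([3.0, 2.0, 1.0])
for beta in (0.0, 5.0, 50.0):
    print(beta, content_addressing(M_t, k_t, beta))
# beta=0 gives the uniform [0.5 0.5]; larger beta concentrates the weights
# on the location most similar to the key.
```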
Neural Turing Machine
https://arxiv.org/abs/1410.5401
Addressing Mechanism: Interpolation (inputs w^c_t, w_{t-1}, g_t; output w^g_t)
[Diagram: the same addressing pipeline and parameter legend, with the Interpolation stage highlighted]
w^g_t ← g_t w^c_t + (1 − g_t) w_{t-1}
Example: with w^c_t = (.61, .39) and w_{t-1} = (.01, .90):
g_t = 0 -> w^g_t = (.01, .90)
g_t = 0.5 -> w^g_t = (.36, .64)
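The interpolation step is a single convex combination; a minimal sketch:

```python
import numpy as np

def interpolate(w_content, w_prev, g):
    """w^g_t: blend the content weights with the previous weights via the gate g in (0, 1)."""
    return g * w_content + (1.0 - g) * w_prev

w_c = np.array([0.61, 0.39])
w_prev = np.array([0.01, 0.90])
print(interpolate(w_c, w_prev, 0.0))  # gate closed: the previous weights pass through unchanged
print(interpolate(w_c, w_prev, 0.5))  # an even mix of the two weightings
```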
Neural Turing Machine
https://arxiv.org/abs/1410.5401
Addressing Mechanism: Convolutional Shift (inputs w^g_t, s_t; output w'_t)
[Diagram: the same addressing pipeline and parameter legend, with the Convolutional Shift stage highlighted]
w'_t(i) ← Σ_{j=0}^{N−1} w^g_t(j) s_t(i − j) // circular convolution
Example: with w^g_t = (.01, .90) and s_t = (0, 1), the weighting is rotated: w'_t(0) = 0.9 and w'_t(1) = .01.
s_t = (0, 1) : right shift; s_t = (1, 0) : left shift; s_t = (0.5, 0.5) : spread shift (the weighting is blurred over neighbouring locations).
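A direct (unvectorised) sketch of the circular convolution, with the shift distribution s_t given over the same N slots (illustrative layout):

```python
import numpy as np

def conv_shift(w_gated, shift):
    """w'_t: circular convolution of the gated weights with the shift distribution s_t."""
    n = len(w_gated)
    w_shifted = np.zeros(n)
    for i in range(n):
        for j in range(n):
            w_shifted[i] += w_gated[j] * shift[(i - j) % n]  # w'_t(i) = sum_j w^g_t(j) s_t(i - j)
    return w_shifted

w_g = np.array([0.01, 0.90])
print(conv_shift(w_g, np.array([0.0, 1.0])))  # rotate by one slot -> [0.9 0.01]
print(conv_shift(w_g, np.array([1.0, 0.0])))  # no rotation       -> [0.01 0.9]
print(conv_shift(w_g, np.array([0.5, 0.5])))  # spread            -> [0.455 0.455]
```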
Neural Turing Machine
https://arxiv.org/abs/1410.5401
Addressing Mechanism: Sharpening (inputs w'_t, γ_t; output w_t)
[Diagram: the same addressing pipeline and parameter legend, with the Sharpening stage highlighted]
w_t(i) ← w'_t(i)^{γ_t} / Σ_j w'_t(j)^{γ_t}
Example: with w'_t = (.01, .90):
γ_t = 0 -> w_t = (0.5, 0.5)
γ_t = 1 -> w_t = (.01, .90)
γ_t = 2 -> w_t = (.0001, .9998)
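Sharpening is a power followed by renormalisation; a minimal sketch:

```python
import numpy as np

def sharpen(w_shifted, gamma):
    """w_t: raise the shifted weights to the power gamma and renormalise."""
    w = w_shifted ** gamma
    return w / w.sum()

w_s = np.array([0.01, 0.90])
print(sharpen(w_s, 1.0))  # ~[0.011 0.989]: only renormalised
print(sharpen(w_s, 2.0))  # ~[0.0001 0.9999]: concentrated on the strongest slot
```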
Neural Turing Machine
https://arxiv.org/abs/1410.5401
Controller
[Diagram: the Controller block between External Input/Output and the Read/Write Heads is a neural network, here an RNN / LSTM]
Experiment – Copy train time
Experiment – Copy result
Input: random 8-bit binary vectors
NTM > LSTM: the NTM learns the copy task quicker and reproduces the sequence more cleanly.
Memory-Augmented
Neural Networks https://arxiv.org/pdf/1605.06065.pdf
The NTM is a form of 'meta-learning'
=> use memory augmentation to build a model that learns rapidly from few examples.
Memory-Augmented
Neural Networks https://arxiv.org/pdf/1605.06065.pdf
Memory-Augmented Neural Networks (MANN)
[Diagram: the same Controller / Read Heads / Write Heads / Memory architecture as the NTM]
Recall the NTM addressing mechanism: content-based addressing (similarity between the current memory values and the values emitted by the controller) and location-based addressing (addressing by memory location, as when the variables of f(x, y) = x × y are stored at fixed addresses), used concurrently.
=> MANN does not use location-based addressing
=> it only uses content-based addressing.
Memory-Augmented
Neural Networks https://arxiv.org/pdf/1605.06065.pdf
MANN
[Diagram: the NTM addressing pipeline (previous state w_{t-1}, memory M_t, controller outputs k_t, β_t, g_t, s_t, γ_t; Content Addressing -> Interpolation -> Convolutional Shift -> Sharpening) shown twice: first as in the NTM, then with the content-addressing output relabelled w^r_t, since MANN keeps only the content-based stage and uses its output directly as the read weights]
Memory-Augmented
Neural Networks https://arxiv.org/pdf/1605.06065.pdf
MANN
Content Addressing (inputs k_t, M_t; output w^r_t)
[Diagram: Controller (RNN) with Read/Write Heads over Memory; addressing pipeline with the Content Addressing stage highlighted]
w^r_t(i) ← exp(K(k_t, M_t(i))) / Σ_j exp(K(k_t, M_t(j)))
K(u, v) = (u · v) / (|u| |v|) : cosine similarity
Example: with k_t = (3, 2, 1) and memory columns (1, 2, 1) and (2, 1, 3), w^r_t = (.53, .47).
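A short sketch of the MANN read weights, assuming (as the slide shows) a plain softmax over cosine similarities with no key-strength term:

```python
import numpy as np

def mann_read_weights(memory, key):
    """w^r_t: softmax over cosine similarities between the key and each memory row."""
    sim = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8)
    w = np.exp(sim - sim.max())
    return w / w.sum()

M_t = np.array([[1.0, 2.0, 1.0],
                [2.0, 1.0, 3.0]])
k_t = np.array([3.0, 2.0, 1.0])
print(mann_read_weights(M_t, k_t))  # -> ~[0.52 0.48]
```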
Memory-Augmented
Neural Networks https://arxiv.org/pdf/1605.06065.pdf
Write Heads: Least Recently Used Access (LRUA)
[Diagram: Controller (RNN) with Read/Write Heads over Memory, reusing the write-head example from the NTM slides]
Usage weights, with decay parameter γ:
w^u_t ← γ w^u_{t-1} + w^r_t + w^w_t
Least-used weights (least used memory):
w^{lu}_t(i) = 0 if w^u_t(i) is big enough, 1 otherwise
Write weights, computed from the MANN read weights (σ is the sigmoid function, α a scalar gate parameter):
w^w_t ← σ(α) w^r_{t-1} + (1 − σ(α)) w^{lu}_{t-1}
Memory update (k_t : write vector):
M_t(i) ← M_{t-1}(i) + w^w_t(i) k_t
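A minimal sketch of one LRUA write step under the update rules above; the number of slots treated as 'least used' (n_least) and the gate value alpha are illustrative assumptions, and the previous read weights are reused in the usage update for brevity:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lrua_write(memory, key, w_r_prev, w_u_prev, alpha, gamma, n_least=1):
    """One LRUA write: write to the previously read and/or least-used locations."""
    # Least-used indicator from the previous usage weights.
    threshold = np.sort(w_u_prev)[n_least - 1]
    w_lu_prev = (w_u_prev <= threshold).astype(float)   # 1 for the least-used slots
    # Write weights: blend previous read weights and least-used weights with a sigmoid gate.
    w_w = sigmoid(alpha) * w_r_prev + (1.0 - sigmoid(alpha)) * w_lu_prev
    # Memory update: M_t(i) = M_{t-1}(i) + w^w_t(i) k_t
    memory = memory + np.outer(w_w, key)
    # Usage decays and is refreshed by reads and writes (previous reads reused here).
    w_u = gamma * w_u_prev + w_r_prev + w_w
    return memory, w_w, w_u

M = np.zeros((4, 3))
M, w_w, w_u = lrua_write(M, key=np.array([3.0, 2.0, 1.0]),
                         w_r_prev=np.array([0.7, 0.3, 0.0, 0.0]),
                         w_u_prev=np.array([0.9, 0.4, 0.1, 0.2]),
                         alpha=0.0, gamma=0.95)
print(w_w)  # half follows the previous read, half goes to the least-used slot (index 2)
```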
Experiment – Data
One episode:
Input: (x_0, null), (x_1, y_0), (x_2, y_1), ..., (x_T, y_{T−1})
Output: (y'_0), (y'_1), (y'_2), ..., (y'_T)
The model learns p(y_t | x_t; D_{1:t−1}; θ): the label for x_t only arrives at the next step, so the network has to store the sample-label binding in memory to answer correctly the next time that class appears.
[Diagram: the RNN receives (x_0, null) and emits (y'_0), then receives (x_1, y_0) and emits (y'_1), and so on.]
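A tiny sketch of how one such episode can be assembled, with the label always delayed by one step (toy data, illustrative names):

```python
import numpy as np

def make_episode(images, labels):
    """Pair each input x_t with the previous step's label y_{t-1} ('null' at t = 0)."""
    inputs, targets = [], []
    prev_label = None                   # the "null" label for the first step
    for x, y in zip(images, labels):
        inputs.append((x, prev_label))  # (x_t, y_{t-1})
        targets.append(y)               # the model should output y'_t = y_t
        prev_label = y
    return inputs, targets

xs = [np.array([0.1, 0.2]), np.array([0.3, 0.1]), np.array([0.1, 0.2])]
ys = ["A", "B", "A"]
inputs, targets = make_episode(xs, ys)
print(inputs[1], targets[1])  # -> (array([0.3, 0.1]), 'A') B : x_1 arrives with y_0
```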
Experiment – Data
Omniglot dataset: 1,600+ classes
-> 1,200 classes for training, 423 classes for testing (images downscaled to 20x20)
+ rotation augmentation
Experiment – Classification Result
Trained with one-hot vector label representations.
Five randomly chosen labels per episode, trained for 100,000 episodes (each episode uses newly sampled classes).
'Instance' = how many times that class has appeared so far within the episode.
Active One-Shot https://cs.stanford.edu/~woodward/papers/active_one_shot_learning_2016.pdf
MANN + a decision to 'predict or pass (request the label)'
Just like a quiz show!
Active One-Shot
https://cs.stanford.edu/~woodward/papers/active_one_shot_learning_2016.pdf
At each step the RNN receives (y_t, x_{t+1}) if it requested the label at step t, or (0, x_{t+1}) if it predicted, and outputs either a label request [0, 1] or a prediction (y'_t, 0).
Request the label, [0, 1] : r_t = −0.05
Predict, (y'_t, 0) : r_t = +1 if y'_t = y_t, −1 otherwise
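A sketch of the reward signal for the predict-or-request decision described above:

```python
def active_one_shot_reward(action, y_true, y_pred=None):
    """Reward for one step: request the label (small cost) or commit to a prediction."""
    if action == "request_label":
        return -0.05                           # r_t = -0.05 for asking
    return 1.0 if y_pred == y_true else -1.0   # r_t = +1 if y'_t = y_t, else -1

print(active_one_shot_reward("request_label", y_true="A"))        # -> -0.05
print(active_one_shot_reward("predict", y_true="A", y_pred="A"))  # -> 1.0
```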
Results https://cs.stanford.edu/~woodward/papers/active_one_shot_learning_2016.pdf
Want some more?
http://proceedings.mlr.press/v37/romera-paredes15.pdf
https://arxiv.org/pdf/1606.04080.pdf
Reference
• https://tensorflow.blog/tag/one-shot-learning/
• http://www.modulabs.co.kr/DeepLAB_library/11115
• https://www.youtube.com/watch?v=CzQSQ_0Z-QU
• https://www.slideshare.net/JisungDavidKim/oneshot-learning
• https://norman3.github.io/papers/docs/neural_turing_machine.html
• https://www.slideshare.net/webaquebec/guillaume-chevalier-deep-learning-avec-tensor-flow
• https://www.slideshare.net/ssuser06e0c5/metalearning-with-memory-augmented-neural-networks