Meta-Learning with Memory Augmented Neural Networks
1. Meta-Learning with Memory
Augmented Neural Networks
Master Seminar II: Data Analytics
Presented By:
Sakshi Singh (305238)
Seminar II: Data Analytics
2. OUTLINE
• Introduction
• Related work
• Methodology
• Model
• Experiments and Results
• Conclusion
• Future Work
• References
Sakshi Singh (Universität Hildesheim)
3. Paper Introduction
• Title: Meta-Learning with Memory-Augmented Neural Networks
• Authors: Adam Santoro (Google DeepMind), Sergey Bartunov (Google DeepMind,
National Research University Higher School of Economics), Matthew Botvinick
(Google DeepMind), Daan Wierstra (Google DeepMind) and Timothy Lillicrap
(Google DeepMind)
• Publication date: June 19th, 2016
• Published by: ICML'16: Proceedings of the 33rd International Conference on
International Conference on Machine Learning
4. Introduction
• Traditional gradient-based networks: require a lot of data and learn by
slowly, iteratively updating their parameters
• Meta-learning: learning to learn
• Memory-Augmented Neural Networks (MANN):
• refers to a class of networks equipped with external memory
• rapidly integrate new data to make accurate predictions after only a few samples
5. Why MANN?
• It can store information in a form that is:
• Stable (reliably accessible whenever needed)
• Element-wise addressable (only the relevant piece of information is accessed)
• The number of parameters is not tied to the size of the memory.
• Approach in this paper:
• Slowly learn representation of raw data
• Rapidly bind never-seen-before data to external memory
6. Related work
• Hochreiter, Sepp, Younger, A. Steven, and Conwell, Peter R. “Learning
to learn using gradient descent”. ICANN 2001, pp. 87–94. Springer (2001)
• Graves, Alex, Wayne, Greg, and Danihelka, Ivo. “Neural Turing
Machines”. arXiv preprint arXiv:1410.5401 (2014)
• Lake, Brenden M., Salakhutdinov, Ruslan, and Tenenbaum, Joshua B.
“Human-level concept learning through probabilistic program
induction”. Science, 350(6266):1332–1338 (2015)
7. Methodology
• Learning is done in episodes
• Example:
• Human’s lifespan = 1 episode
• Evolution = meta-learning
• Learning during episode = very important
• Episodes contain similar but distinct problems
8. Episode Setup
1. Pick out different characters (Omniglot) and assign random labels
[Figure: sample Omniglot characters with randomly assigned labels 3, 4, 1, 2]
9. Episode Setup
2. Shuffle the samples into a random sequence, which differs from episode to episode
10. Episode Setup
3. Pass the samples one by one to the neural network, which has to predict the correct label for each
11. Episode Setup
• Dataset: D = {d_t} = {(x_t, y_t)}, t = 1, …, T
• y_t is presented in a time-offset manner:
• Regression: y_t is the real-valued function value for x_t
• Classification: y_t is the class label for x_t
• Input sequence:
• (x_1, null), (x_2, y_1), …, (x_T, y_{T−1})
• y_t serves both as target at step t and as input at step t+1
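The episode setup above (random label assignment, shuffling, and time-offset targets) can be sketched as follows; `build_episode` and its arguments are illustrative names, not from the paper:

```python
import random

def build_episode(class_pool, n_classes, shots, rng):
    """Build one episode: sample classes, assign labels at random,
    shuffle the presentation order, and offset the labels by one step.
    `class_pool` maps a class name to its list of samples (in the paper,
    Omniglot characters); all names here are illustrative."""
    classes = rng.sample(sorted(class_pool), n_classes)
    perm = rng.sample(range(n_classes), n_classes)
    labels = {c: perm[i] for i, c in enumerate(classes)}  # random labels, re-drawn each episode
    stream = [(s, labels[c]) for c in classes for s in rng.sample(class_pool[c], shots)]
    rng.shuffle(stream)  # random presentation order
    # Time offset: the input at step t carries the label from step t-1.
    inputs = [(x, None if t == 0 else stream[t - 1][1]) for t, (x, _) in enumerate(stream)]
    targets = [y for _, y in stream]
    return inputs, targets
```

Because labels are shuffled every episode, the network cannot memorize sample-to-label mappings in its weights; it must learn to bind them within an episode.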
12. Episode Setup
Feed-forward neural networks (no feedback) → Solution 1: Recurrent Neural Networks
13. Episode Setup
Solution 2: Augmented memory
• RNN: acts as the controller
• External memory: acts like RAM
• The controller network
• sends data to external memory (write)
• queries external memory to retrieve data (read)
14. Model: NTM
Neural Turing Machines
• Content-based addressing approach
• Controller interacts with external memory using read-write heads
• Capable of
• long-term storage via slow weight updates
• short-term storage via external memory
15. Content-based approach: NTM
• Controller produces a key k_t to store and retrieve data via the read-write heads
• Read weights: w_t^r(i) = softmax over the cosine similarity K(k_t, M_t(i))
• Memory r_t is retrieved using r_t ← Σ_i w_t^r(i)·M_t(i)
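A minimal sketch of this content-based read, assuming cosine similarity followed by a softmax as described in the paper (array shapes and names are illustrative):

```python
import numpy as np

def content_read(memory, key):
    """Content-based read: w_t^r(i) = softmax_i K(k_t, M_t(i)) with
    cosine similarity K, then r_t = sum_i w_t^r(i) * M_t(i)."""
    sims = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8)
    w = np.exp(sims - sims.max())
    w /= w.sum()            # softmax over memory rows
    return w @ memory, w    # retrieved vector r_t and read weights w_t^r
```

The softmax concentrates the read on the memory rows most similar to the key, so retrieval degrades gracefully rather than failing on an inexact match.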
16. LRUA
• Least Recently Used Access (LRUA)
• A pure content-based memory writer
• Writes to one of two locations:
• Least used location: preserves recently encoded information
• Most recently used location: updates memory with newer, possibly more relevant information
17. LRUA
• Usage weight: w_t^u ← γ·w_{t−1}^u + w_t^r + w_t^w
• Least-used weight: w_t^{lu}(i) = 1 if w_t^u(i) is among the n smallest usage weights, else 0
• Write weight: w_t^w ← σ(α)·w_{t−1}^r + (1 − σ(α))·w_{t−1}^{lu}
• Writing to memory: M_t(i) ← M_{t−1}(i) + w_t^w(i)·k_t
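A sketch of one LRUA write step under these update rules; the variable names and shapes are illustrative, not the authors' code:

```python
import numpy as np

def lrua_write(memory, key, w_r_prev, w_lu_prev, alpha):
    """One LRUA write: interpolate between the previous read weights and
    the previous least-used weights with a scalar gate alpha,
      w_t^w = sigmoid(alpha) * w_{t-1}^r + (1 - sigmoid(alpha)) * w_{t-1}^lu
    then add the keyed content into memory,
      M_t(i) = M_{t-1}(i) + w_t^w(i) * k_t."""
    g = 1.0 / (1.0 + np.exp(-alpha))           # sigmoid gate
    w_w = g * w_r_prev + (1.0 - g) * w_lu_prev
    return memory + np.outer(w_w, key), w_w
```

With the gate near 1 the write lands on the last-read location (updating it); near 0 it lands on the least-used location (preserving recent entries).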
18. Experiments
• Datasets:
• Classification: Omniglot
• Regression: Gaussian Process (GP)
• Data augmented by translating & rotating images by 90°, 180°, 270°
• Training classes: 1200
• Testing classes: 423
19. Omniglot Classification (1)
• Training:
• 100,000 episodes on 5 random classes
• Testing:
• on never-before-seen classes
• one-hot vectors used as class labels
20. MANN Vs. Human
• Performance of MANN surpassed
human performance
• Educated guessing in MANN: when a
sample produced a key that matched
nothing in memory well, the sample was
likely from a class not yet seen, which
raises the probability of guessing its
label correctly on the first instance.
21. Omniglot Classification (2)
• Class-string representation: lets the number of classes per episode
increase arbitrarily
• Each label is a string of five characters, uniformly sampled from {‘a’, ’b’, ’c’, ’d’, ’e’}
• Produces random string labels
• Each character is one-hot encoded and concatenated: string length 25 (5 values = 1, rest = 0)
• Total of 5^5 = 3125 possible labels
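This label scheme can be reproduced in a few lines; `encode_label` is a hypothetical helper, not from the paper:

```python
import itertools

ALPHABET = "abcde"

def encode_label(label):
    """Concatenate a one-hot encoding of each of the 5 characters:
    a length-25 binary vector with exactly 5 ones."""
    vec = [0] * (len(label) * len(ALPHABET))
    for pos, ch in enumerate(label):
        vec[pos * len(ALPHABET) + ALPHABET.index(ch)] = 1
    return vec

# All 5**5 = 3125 possible label strings.
all_labels = ["".join(p) for p in itertools.product(ALPHABET, repeat=5)]
```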
22. [Figure: test accuracy for LSTM vs. MANN, comparing one-hot vector and class-string label representations, for 5 random classes/episode (length 50) and 15 random classes/episode (length 100)]
24. Experiment: Persistent Memory
• Wiping the external memory from
episode to episode proved the better strategy
(a) → 5 classes per episode, sequence
length 50: with memory preserved,
learning was very slow and accuracy
never spiked
(b) → 10 classes per episode, sequence
length 75: accuracy was comparable
25. Experiment: Curriculum Training
• Done to scale up classification capabilities
• Initially 15 classes per episode
• After every 10,000 episodes, the number of
classes per episode was increased by one
• Result: the network maintained high accuracy
• Tested with 50 classes per episode, increasing
up to 100 (max)
• Result: network performance decayed as the
number of classes grew
26. Regression
• Functions generated from a GP with fixed
hyperparameters
• Each episode:
• presentation of x values (1D, 2D or 3D)
• time-offset function values f(x_{t−1})
• the network binds x values to their function values
• A GP baseline can perform complex queries over all
data points
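A sketch of how such a regression episode could be generated, assuming an RBF kernel with fixed, illustrative hyperparameter values:

```python
import numpy as np

def sample_gp_episode(n_points, rng, length_scale=0.5):
    """Draw x values, sample f from a GP prior with an RBF kernel and
    fixed hyperparameters, then offset the targets by one time step
    (f(x_{t-1}) is fed alongside x_t). Hyperparameter values are
    illustrative, not the paper's."""
    x = rng.uniform(-1.0, 1.0, size=n_points)
    d = x[:, None] - x[None, :]
    K = np.exp(-0.5 * (d / length_scale) ** 2) + 1e-6 * np.eye(n_points)  # RBF + jitter
    f = rng.multivariate_normal(np.zeros(n_points), K)
    offset = np.concatenate(([0.0], f[:-1]))  # time-offset targets
    return x, f, offset
```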
27. Conclusion
• Central contribution: demonstrated the ability of MANN to do meta-
learning
• Gradual, incremental learning encodes background knowledge and
external memory binds information to new tasks
• Content-based memory access approach
• MANN outperformed the compared baselines (LSTM, feed-forward networks, kNN and humans)
• External memory yields better results
28. Future Work
• Meta-learning can discover optimal memory addressing procedures
• Training on wider range of tasks
• MANN performance in meta-learning tasks requiring active learning
• Exploring requirements of robust performance
• Accessing maximum capacity of the network
29. References
1. Santoro, Adam, Bartunov, Sergey, Botvinick, Matthew, Wierstra, Daan, and Lillicrap,
Timothy. “Meta-Learning with Memory-Augmented Neural Networks”. ICML (2016)
2. Graves, Alex, Wayne, Greg, and Danihelka, Ivo. “Neural Turing Machines”. arXiv
preprint arXiv:1410.5401 (2014)
3. Meta learning with memory augmented neural network and NPI,
https://www.youtube.com/watch?v=s3iV4SK3CuU, accessed on 19.05.2020
4. Meta-Learning and One-Shot Learning,
https://www.youtube.com/watch?v=KUWywwvQv8E, accessed on 19.05.2020
30. THANK YOU!
Questions?