1) The Meta Network model proposes a two-level learning approach for few-shot learning. It includes a slow-learning meta-learner and a fast-learning base learner.
2) The meta-learner learns to generate fast weights for the base learner using gradient-based meta information from previous tasks. It stores these weights in a memory indexed by task embeddings.
3) Experiments on few-shot classification datasets like Omniglot and MiniImageNet demonstrate the model can learn new concepts from very few examples through fast adaptation of the base learner's weights.
ICML 2017 Meta network
1. Meta Network
ICML 2017 citation: 0
Tsendsuren Munkhdalai
Hong Yu
University of Massachusetts, MA, USA
Katy Lee @ Datalab 2017.09.11
1
2. Background
• two-level learning in meta learning:
• slow learning of a meta-level model performing across tasks
• rapid learning of a base-level model acting within each task
2
3. Motivation
• We want the neural network to learn and to generalize to a new task or concept from a single example on the fly.
3
4. Related Work
• Santoro et al., Meta-learning with memory-augmented neural networks. ICML 2016 -> uses the external memory as a temporary memory
4
5. • Ravi, Sachin and Larochelle, Hugo. Optimization as a model for few-shot learning. ICLR 2017
5
6. • Ravi, Sachin and Larochelle, Hugo. Optimization as a model for few-shot learning. ICLR 2017
6
7. • Ravi, Sachin and Larochelle, Hugo. Optimization as a model for few-shot learning. ICLR 2017
7
10. Base Learner b: overview
• Unlike standard neural nets, b is parameterized by slow weights W and example-level fast weights W*
• slow weights: updated via a learning algorithm during training
• fast weights: generated by the meta learner for every input
29
11. How Base Learner b provides meta info
• x' belongs to the support set
• The meta information is derived from the base learner in the form of the loss gradient information (reconstructed below):
30
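A reconstruction in LaTeX of the equation this slide points to, assembled from the speaker notes further down (loss over the support examples and its gradient as meta information); the exact typesetting on the original slide is an assumption:

\mathcal{L}_i = \mathrm{loss}_{task}\big(b(x'_i; W),\, y'_i\big), \qquad i = 1, \dots, N
\nabla_i = \nabla_W \, \mathcal{L}_i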
13. Meta Learner
• function: collects meta information and produces per-example fast weights W* for the base learner during training
• fast weight W* generating function m with parameters Z
• fast weight Q* generating function d with parameters G
• a dynamic representation learning function u (Q, Q*)
32
14. Meta Learner: m
• m learns the mapping from the loss gradient to the fast weights for the support set (not the final ones output to the base learner)
• we store the fast weights in a memory M which is indexed with task-dependent embeddings R = {r'_i}_{i=1}^N of the support examples, obtained by u
33
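A compact LaTeX restatement of what m produces and how the memory is laid out, assembled from this slide and the speaker notes (a sketch, not the slide's own equation):

W^*_i = m(Z, \nabla_i), \qquad M = \{W^*_i\}_{i=1}^{N}, \qquad R = \{r'_i\}_{i=1}^{N}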
16. Meta Learner: u
• a dynamic representation learning network u (Q, Q*)
• We generate the fast weights Q* on a per-task basis as follows (reconstructed below):
35
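A LaTeX sketch of the per-task generation of Q*, based on the speaker notes (sample T < N support examples, take the gradients of the representation loss loss_emb, and let d summarize them); the exact argument structure of loss_emb is not given in the notes and is left abstract here:

Q^* = d\big(G,\ \{\nabla_i\}_{i=1}^{T}\big), \qquad \nabla_i = \nabla_Q\, \mathrm{loss}_{emb}\ \ \text{(gradient for the } i\text{-th sampled support example)}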
17. Meta Learner
• Once the fast weights are generated, the task-dependent support set input representations are computed as:
• r'_i = u(Q, Q*, x'_i)
• Q and Q* are integrated using the layer augmentation method (a sketch follows below)
• => Memory is ready!
36
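A minimal NumPy sketch of the layer augmentation mentioned above, following the speaker notes (slow and fast weights applied separately, mapped through the non-linearity, then aggregated by element-wise sum); the fully connected form and the absence of bias terms are assumptions:

import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def augmented_layer(x, W_slow, W_fast):
    # Slow and fast weights act as feature detectors in two numeric domains;
    # the non-linearity maps both into the same domain before the element-wise sum.
    return relu(W_slow @ x) + relu(W_fast @ x)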
18. Meta Learner
• After the fast weights W*_i are stored in the memory M and the index R is constructed
• given an input x_i in the training set / test set:
1. embed x_i using the dynamic representation u: r_i = u(Q, Q*, x_i)
2. read the memory with soft attention (see the sketch below)
37
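A LaTeX sketch of the soft-attention read in step 2, using the cosine-similarity attention and softmax normalization mentioned in the speaker notes; the exact equation on the original slide is an assumption:

a_i = \mathrm{attention}(R, r_i)\ \ \text{(cosine similarity against each stored } r'_j\text{)}, \qquad W^*(x_i) = \mathrm{softmax}(a_i)^{\top} M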
20. Model
• base learner b (slow weights W, fast weights W*): 5 conv layers with 64 filters, max-pooling, 2 FC layers
• representation learning network u (slow weights Q, fast weights Q*): 5 conv layers with 64 filters, max-pooling, 2 FC layers
• fast weight generators m(Z) and d(G): 3 FC layers with 20 neurons
• memory M with index R
39
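A hedged PyTorch sketch of the base learner b as listed on this slide (5 conv layers with 64 3x3 filters, ReLU and 2x2 max-pooling, then two FC layers); padding, the hidden FC width, ceil-mode pooling and the input resolution are assumptions, and the fast-weight augmentation of the last layers is omitted:

import torch.nn as nn

def make_base_learner(n_classes: int, in_channels: int = 1) -> nn.Module:
    layers = []
    channels = in_channels
    for _ in range(5):
        layers += [nn.Conv2d(channels, 64, kernel_size=3, padding=1),
                   nn.ReLU(),
                   nn.MaxPool2d(2, ceil_mode=True)]  # ceil_mode keeps small feature maps non-empty
        channels = 64
    layers += [nn.Flatten(),
               nn.LazyLinear(64), nn.ReLU(),  # first FC layer (width assumed)
               nn.LazyLinear(n_classes)]      # second FC layer; softmax applied in the loss
    return nn.Sequential(*layers)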
21. • W, W*: slow and fast weights for the base learner
• Q, Q*: slow and fast weights for representation learning
• Z: slow weights for m, a NN that generates the fast weights W*
• G: slow weights for d, a NN that generates the fast weights Q*
40
27. Omniglot Previous Split
• following the Matching Networks experimental settings
• 1200 training classes, 423 testing classes, 20 examples per class
• three variations of MetaNet
• MetaNet+: additional task-level fast weights for the base learner
• MetaNet
• MetaNet-: no task-level fast weights W* for the meta learner
46
38. Generalization Experiment
• N-way training and K-way testing
• Rapid Parameterization of Fixed Weight
• Meta-level continual learning
57
39. Generalization Experiment
• replace the base learner with a new CNN during evaluation
• the fast weights of the new CNN are generated by the meta learner that was trained to parameterize the old base learner (target CNN)
58
44. Conclusion
• pro:
• Interesting model; slow and fast weights have different functions
• Solid experiments
• con:
• The paper is kind of hard to read
63
45. Future Work
• meta information other than the gradient
• detect the task/domain automatically
64
Editor's Notes
@author
The goal of a meta-level learner is to acquire generic knowledge of different tasks. The knowledge can then be transferred to the base-level learner to provide generalization in the context of a single task.
—————-
The base and meta-level models can be framed in a single learner (Schmidhuber, 1987) (??)
or in separate learners (Bengio et al., 1990; Hochreiter et al., 2001).
It must learn to hold data samples in memory until the appropriate labels are presented at the next time-step, after which sample-class information can be bound and stored for later use
@relation to this work
@TODO put more related work
we propose an LSTM-based meta-learner model to learn the exact optimization algorithm used to train another learner neural network classifier in the few-shot regime.
They split the data into meta-train and meta-test sets. During the meta-train phase, they train the meta-learner, which acts as the optimizer.
@relation to this work
They split the data into meta-train and meta-test sets. During the meta-train phase, they train the meta-learner, which acts as the optimizer.
@relation to this work
when considering each dataset D ∈ D_meta-train, the training objective (for the meta learner?) we use is the loss L_test of the produced classifier on D's test set D_test. While iterating over the examples in D's training set D_train, at each time step t the LSTM meta-learner receives (∇_{θ_{t-1}} L_t, L_t) from the learner (the classifier) and proposes the new set of parameters θ_t. The process repeats for T steps, after which the classifier and its final parameters are evaluated on the test set to produce the loss that is then used to train the meta-learner.
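For context, the parameter-update rule of that LSTM meta-learner, written in LaTeX from memory of Ravi & Larochelle (2017) rather than taken from these notes, so treat it as an assumption: the cell state plays the role of the learner's parameters,

\theta_t = f_t \odot \theta_{t-1} - i_t \odot \nabla_{\theta_{t-1}} \mathcal{L}_t

with a learned input gate i_t acting as the learning rate and a learned forget gate f_t replacing the fixed terms of plain gradient descent.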
M?
@TODO shortly explain
Very important
working memory -> short-term memory -> long-term memory?
@TODO example?
fast weight A clears the states by the contextual overlay
intermediate steps between the recurrent computation
fast weight clears things (from the state?) up from the contextual overlay, storing the useful history information of **this sequence**
instead of using iterative learning backprop
one-shot learning borrowing from the Hopfield network
fast weight is retrieving information from the Hopfield network
youtube video 06:30
It was downsampled to 48 × 48 greyscale. The full dataset contains 15 views since facial expressions were not discernible from the more extreme viewpoints. The resulting dataset contained > 100,000 images. 317 identities appeared in the training set with the remaining 20 identities in the test set.
Given the input face image, the goal is to classify the subject’s facial expression into one of the six different categories: neutral, smile, surprise, squint, disgust and scream.
- Not only does the dataset have unbalanced numbers of labels, some of the expressions, for example squint and disgust, are very hard to distinguish. In order to perform well on this task, the models need to generalize over different lighting conditions and viewpoints.
where do we store the information when examining the eyebrow
in convnet:
in fast weight: stack-like mechanism
@TODO biologically feasible?
The term in square brackets is just the scalar product of an earlier hidden state vector, h(τ), with the current hidden state vector, h_s(t+1), during the iterative inner loop. So at each iteration of the inner loop, the fast weight matrix is exactly equivalent to attending to past hidden vectors in proportion to their scalar product with the current hidden vector, weighted by a decay factor. During the inner loop iterations, attention will become more focussed on past hidden states that manage to attract the current hidden state.
-> does "attract" here mean that they look alike?
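For reference, the fast-weight update and the bracketed attention term the quote refers to, reconstructed in LaTeX from memory of Ba et al. (2016) (an assumption, not taken from these notes):

A(t) = \lambda A(t-1) + \eta\, h(t) h(t)^{\top}, \qquad A(t)\, h_s(t+1) = \eta \sum_{\tau=1}^{t} \lambda^{t-\tau} \big[\, h(\tau)^{\top} h_s(t+1) \,\big]\, h(\tau)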
Hopfield net is not that efficient; it can only store about log n patterns
compare with associative LSTM
(train with backprop) -> use the Hopfield net more efficiently
@TODO is this page necessary?
There are three major components of this model: the memory, the meta learner and the base learner
the meta learner is used to produce the fast weights for the base learner when each training example comes in
the meta learner will make use of the support set to set up its own per-task fast weights Q*
in lots of few-shot learning papers, one learns a network that maps a small labelled support set and an unlabelled example to its label
in this setting, they use only one example in the support set so it's cheap to obtain
Let’s take a look at the base learner here
the difference between a normal neural network and this base learner is that it has fast weights
we will see how it makes use of the support set to give meta information to the meta learner
The base learner uses a representation of meta information obtained by using a support set, to provide the meta learner with feedback about the new input task.
the symbols with a prime (') are the support set
Here L_i is the loss for support examples {x'_i, y'_i}_{i=1}^N. N is the number of support examples in the task set (typically a single instance per class in the one-shot learning setup). ∇_i is the loss gradient with respect to parameters W and is our meta information. Note that the loss function loss_task is generic and can take any form, such as a cumulative reward in reinforcement learning. For our one-shot classification setup we use cross-entropy loss.
The meta learner takes in the gradient information ∇_i and generates the fast parameters W* as in Equation 1 (and stores them in the memory).
- Alternatively the base learner can take as input the task-specific representations {r_i}_{i=1}^L produced by the dynamic representation learning network, effectively reducing the number of MetaNet parameters and leveraging shared representations. In this case, the base learner is forced to operate in the dynamic task space constructed by u instead of building new representations from the raw inputs {x_i}_{i=1}^L
Now we can take a look at the meta learner; we can see how it receives the meta info, learns a good representation function to embed the example, and stores the fast weights in the memory for future training use.
@TODO: Question: does the memory keep accumulating? Does each new task wipe out the old entries?
This is not the final fast weight
∇_i is from the support set
Next: how Q* is generated
The representation learning function u is a neural net parameterized by slow weights Q and task-level fast weights Q*. It uses the representation loss loss_emb to capture a representation learning objective and to obtain the gradients as meta information. We generate the fast weights Q* on a per-task basis as follows:
@external
This is not the final fast weight
∇_i is from the support set
The fast weights are then stored in a memory M = {W*_i}_{i=1}^N. The memory M is indexed with task-dependent embeddings R = {r'_i}_{i=1}^N of the support examples {x'_i}_{i=1}^N, obtained by the dynamic representation learning function u.
these are the fast weights generated from the support set, with the representations of the support set examples as the index
@Do the fast weights of different tasks keep accumulating?
@hard to understand
d denotes a neural net parameterized by G that accepts variable-sized input. First, we sample T examples (T < N) {x'_i, y'_i}_{i=1}^T from the support set and obtain the loss gradient as meta information. Then d observes the gradient corresponding to each sampled example and summarizes it into the task-specific parameters
the way it computes Q* connects the knowledge of Q* and Q
attention: cosine similarity
norm: softmax
Q and W learn gradually across tasks
Q* focuses on the task
W* focuses on the example
@TODO double check
Intuitively, the fast and slow weights in the layer-augmented neural net can be seen as feature detectors operating in two distinct numeric domains. The application of the non-linearity maps them into the same domain, which is [0, ∞) in the case of ReLU, so that the activations can be aggregated and processed further. Our aggregation function here is element-wise sum.
network architecture
Omniglot
we used a CNN with 64 filters as the base learner b. This CNN has 5 convolutional layers, each of which is a 3 x 3 convolution with 64 filters, followed by a ReLU non-linearity, a 2 x 2 max-pooling layer, a fully connected (FC) layer, and a softmax layer.
Another CNN with the same architecture is used to define the dynamic representation learning function u, from which we take the output of the FC layer as the task dependent representation r.
We trained a similar CNNs architecture with 32 filters for the experiment on Mini-ImageNet.
However for computational efficiency as well as to demonstrate the flexibility of MetaNet, the last three layers of these CNN models were augmented by fast weights.
For the networks d and m, we used a single-layer LSTM with 20 hidden units and a three-layer MLP with 20 hidden units and ReLU non-linearity. As in Andrychowicz et al. (2016), the parameters G and Z of d and m are shared across the coordinates of the gradients ∇ and the gradients are normalized using the same preprocessing rule (with p = 7). The MetaNet parameters θ are optimized with ADAM. The initial learning rate was set to 10^-3. The model parameters θ were randomly initialized from the uniform distribution over [-0.1, 0.1).
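The preprocessing rule referred to here, reconstructed in LaTeX from memory of Andrychowicz et al. (2016) and not from these notes, so treat it as an assumption:

\nabla \rightarrow \begin{cases} \left(\dfrac{\log(|\nabla|)}{p},\ \mathrm{sign}(\nabla)\right) & \text{if } |\nabla| \ge e^{-p} \\ \left(-1,\ e^{p}\,\nabla\right) & \text{otherwise} \end{cases}

with p = 7 as stated above.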
train W, Q, Z, G
i from 1 to N: support set
those with star are fast weight
with respect to W
@put images here
@why sometimes sample, sometimes not
support set
those with star are fast weight
those with star are fast weight
@TODO MetaNet+ details
three variations of MetaNet as ablation exp.
@TODO MetaNet+ details
@TODO read two papers
Q* is useful
additional task level weight is not that helpful
@TODO MetaNet+ details
three variations of MetaNet as ablation exp.
@TODO MetaNet+ details
three variations of MetaNet as ablation exp.
@TODO MetaNet+ details
three variations of MetaNet as ablation exp.
train < test, decrease
test > train, increase
- how to augment the softmax?
@TODO MetaNet+ details
three variations of MetaNet as ablation exp.
We replaced the entire base learner with a new CNN during evaluation. The slow weights of this network remained fixed. The fast weights are generated by the meta learner that was trained to parameterize the old base learner, and are used to augment the fixed slow weights.
The small CNN has 32 filters and the large CNN has 128 filters.
target: 64
@TODO during evaluation? based on support set -> predict
the big (orange) and small (blue) CNNs are the new base learners
the target CNN (64 filters) is the original base learner that is learned along with the meta learner
The performance difference between these models is large in earlier training iterations. However, as the meta learner sees more one-shot learning trials, the test accuracies of the base learners converge. These results show that MetaNet effectively learns to parameterize a neural net with fixed weights.
**500 classes MNIST**, acc: 72% after 2400 MNIST trials
**500 classes MNIST**, acc: 72% after 2400 MNIST trials
reverse transfer learning
train on omniglot -> mnist
The positive values indicate that the training on the second problem automatically improves the performance of the earlier task exhibiting the reverse transfer property. Therefore, we can conclude that MetaNet successfully performs reverse transfer. At the same time, it is skilled on MNIST one-shot classification. The MNIST training accuracy reaches over 72% after 2400 MNIST trials.
However, reverse transfer happens only up to a certain point in MNIST training (2400 trials). After that, the meta weights start to forget the Omniglot information. As a result, from 2800 trials onwards, the Omniglot test accuracy drops. Nevertheless, even after 7600 MNIST trials, at which point the MNIST training accuracy reached over 90%, the Omniglot performance drop was only 1.7%.