Meta Networks
ICML 2017 citation: 0
Tsendsuren Munkhdalai
Hong Yu
University of Massachusetts, MA, USA
Katy Lee @ Datalab 2017.09.11
1
Background
• two-level learning in meta learning:
• slow learning of a meta-level model performing
across tasks
• rapid learning of a base-level model acting within
each task
2
Motivation
• We want the neural network to learn and
generalize to a new task or concept from a single
example on the fly.
3
Related Work
• Santoro et al., Meta-learning with memory-
augmented neural networks. ICML 2016 -> uses
external memory as a temporary memory
4
• Ravi, Sachin and Larochelle, Hugo. Optimization as
a model for few-shot learning. ICLR 2017
5
• Ravi, Sachin and Larochelle, Hugo. Optimization as
a model for few-shot learning. ICLR 2017
6
• Ravi, Sachin and Larochelle, Hugo. Optimization as
a model for few-shot learning. ICLR 2017
7
Model
per example
per task
26
Model
W W*
27
Base Learner b:
overview
• Unlike standard neural nets, b is parameterized by
slow weights W and example-level fast weights W*
• slow weights: updated via a learning algorithm
during training
• fast weights: generated by the meta learner for
every input.
29
How Base Learner b
provides meta info
• x' belongs to the support set
• The meta information is derived from the base
learner in the form of the loss gradient:
30
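The slide's equation is not reproduced in this export; as a hedged illustration (PyTorch-style, with b taken to be a forward function of the input and the slow weights W — the names are mine, not the paper's code), the meta information for one support example could be computed like this:

```python
import torch
import torch.nn.functional as F

def support_meta_info(b, W, x_support, y_support):
    """Meta information for one support example: the gradient of the support loss
    with respect to the slow weights W (cross-entropy, matching the one-shot
    classification setup described on the slide)."""
    logits = b(x_support, W)                 # forward pass with slow weights only
    loss = F.cross_entropy(logits, y_support)
    return torch.autograd.grad(loss, W)      # tuple of gradients, one per tensor in W
```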
Model
W W*
u (Q, Q*)
m(Z) d(G)
index R
31
Meta Learner
• function: collects meta information and produces per-example
fast weights W* for the base learner during training
• fast weight W* generating function m with parameters Z
• fast weight Q* generating function d with parameters G
• a dynamic representation learning function u (Q, Q*)
32
• m learns the mapping from the loss gradient to the
fast weights for the support set (not the final ones
output to the base learner)
• we store the fast weights in a memory M which is
indexed with task-dependent embeddings
R = {r'i}, i = 1..N, of the support examples, obtained by u
Meta Learner: m
33
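A minimal sketch of m, assuming (per the speaker notes) a three-layer MLP with 20 hidden units whose parameters Z are shared across gradient coordinates; the gradient preprocessing used in the paper is omitted here for brevity:

```python
import torch
import torch.nn as nn

class FastWeightGeneratorM(nn.Module):
    """m(Z): maps a per-example loss gradient to example-level fast weights W*_i."""
    def __init__(self, hidden=20):
        super().__init__()
        # applied coordinate-wise, so the same small MLP covers gradients of any size
        self.net = nn.Sequential(
            nn.Linear(1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, grad_flat):                                # grad_flat: (num_params,)
        return self.net(grad_flat.unsqueeze(-1)).squeeze(-1)     # W*_i, same shape as the gradient
```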
Memory
index R
r1 W*1
r2 W*2
r3 W*3
r4 W*4
34
Meta Learner: u
• a dynamic representation learning network u (Q,
Q*)
• We generate the fast weights Q* on a per task basis
as follows:
35
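A sketch of this per-task step under my reading of the slide and the notes: sample T support examples, take the gradients of a representation (embedding) loss with respect to Q as meta information, and let d(G) summarize them into Q*. Here embedding_loss is a hypothetical helper and d is treated as a black-box module:

```python
import torch

def flatten_grads(grads):
    """Hypothetical helper: concatenate a tuple of gradient tensors into one vector."""
    return torch.cat([g.reshape(-1) for g in grads])

def generate_task_fast_weights(u, Q, d, embedding_loss, x_support, y_support, T):
    """Generate task-level fast weights Q* from T (< N) sampled support examples."""
    idx = torch.randperm(len(x_support))[:T]
    grads = []
    for i in idx:
        emb = u(x_support[i:i + 1], Q)                     # embed with slow weights Q only
        loss = embedding_loss(emb, y_support[i:i + 1])     # representation-learning loss
        grads.append(flatten_grads(torch.autograd.grad(loss, Q)))
    return d(torch.stack(grads))                           # d(G) summarizes the gradients -> Q*
```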
Meta Learner
• Once the fast weights are generated, the task-
dependent support set input representations are
computed as:
• r'i = u(Q, Q*, x'i)
• Q and Q* are integrated using the layer
augmentation method
• => Memory is ready!
36
Meta Learner
• After the fast weights Wi* are stored in the memory M and
the index R is constructed,
• given an input xi in the training set / test set:
1. embed xi using the dynamic representation function u: ri = u(Q, Q*, xi)
2. read the memory with soft attention
37
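A hedged sketch of the soft-attention read (the speaker notes say the attention uses cosine similarity followed by a softmax); R holds the N support-example embeddings and M the N stored fast weights, flattened to vectors:

```python
import torch
import torch.nn.functional as F

def read_memory(r, R, M):
    """Soft-attention read of the fast-weight memory.
    r: (d,) embedding of the current input, R: (N, d) index, M: (N, P) stored W*_i."""
    att = F.cosine_similarity(r.unsqueeze(0), R, dim=1)   # similarity to each memory slot
    att = F.softmax(att, dim=0)                           # attention weights over the slots
    return att @ M                                        # (P,) mixed fast weights W*
```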
Layer Augmentation
38
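This slide is an image in the original deck; from the speaker notes, a layer-augmented layer runs the slow and fast weights through the nonlinearity separately and aggregates with an element-wise sum. A minimal sketch of one such fully connected layer (the linear form is my simplification):

```python
import torch
import torch.nn.functional as F

def augmented_linear(x, W_slow, W_fast):
    """Layer augmentation: slow and fast paths are combined after the ReLU,
    using element-wise sum as the aggregation function."""
    return F.relu(x @ W_slow.t()) + F.relu(x @ W_fast.t())
```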
Model
W W*
u (Q, Q*)
m(Z) d(G)
index R
b
5 conv layers with 64 filters
max-pooling
2 FC layers
u
5 conv layers with 64 filters
max-pooling
2 FC layers
d, m
3 FC layers with 20
neurons
39
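A sketch of the CNN used for b and u as described on this slide and in the notes (five 3x3 conv layers with 64 filters, each followed by ReLU and 2x2 max-pooling, then two fully connected layers); the input channels, FC width, and ceil-mode pooling are assumptions to keep the sketch runnable on small greyscale inputs:

```python
import torch.nn as nn

def make_cnn(n_classes, filters=64):
    """Base learner b / embedding network u from the architecture slide."""
    layers, in_ch = [], 1                      # greyscale input assumed
    for _ in range(5):
        layers += [nn.Conv2d(in_ch, filters, kernel_size=3, padding=1),
                   nn.ReLU(),
                   nn.MaxPool2d(2, ceil_mode=True)]
        in_ch = filters
    layers += [nn.Flatten(),
               nn.LazyLinear(64), nn.ReLU(),   # first FC layer (width assumed)
               nn.Linear(64, n_classes)]       # second FC layer -> softmax logits
    return nn.Sequential(*layers)
```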
• W, W*: slow and fast weights for the base learner
• Q, Q*: slow and fast weights for representation
learning
• Z: slow weights for m, a NN that generates the fast
weights W*
• G: slow weights for d, a NN that generates the fast
weights Q*
40
41
On support set
memory ready now!
N: size of support set
42
On training set
W, Q, Z, G
L: size of training set
R: index of memory
43
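Putting the last two slides together, a rough end-to-end sketch of one training trial, reusing support_meta_info, flatten_grads, and read_memory from the earlier sketches; b_aug and u_aug stand for layer-augmented forward functions and are hypothetical names, and Q_star is assumed to come from generate_task_fast_weights:

```python
import torch
import torch.nn.functional as F

def metanet_trial(b_aug, u_aug, m, W, Q, Q_star, support, train):
    """One MetaNet trial: build the memory on the support set, then read it with
    soft attention for every training example. Backpropagating through the
    returned loss is what trains the slow weights (W, Q, Z, G)."""
    x_s, y_s = support
    R, M = [], []
    for i in range(len(x_s)):
        # meta information: loss gradient of the base learner on one support example
        grads = support_meta_info(lambda x, w: b_aug(x, w, None), W,
                                  x_s[i:i + 1], y_s[i:i + 1])
        M.append(m(flatten_grads(grads)))                    # example-level fast weights W*_i
        R.append(u_aug(x_s[i:i + 1], Q, Q_star).squeeze(0))  # index entry r'_i
    R, M = torch.stack(R), torch.stack(M)                    # memory is ready

    x_t, y_t = train
    loss = 0.0
    for i in range(len(x_t)):
        r = u_aug(x_t[i:i + 1], Q, Q_star).squeeze(0)        # embed the training input
        W_star = read_memory(r, R, M)                        # soft-attention read
        loss = loss + F.cross_entropy(b_aug(x_t[i:i + 1], W, W_star), y_t[i:i + 1])
    return loss
```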
One-shot Learning exp.
• Omniglot previous split
• Omniglot standard split.
• Mini-ImageNet
44
• Omniglot previous split
• Omniglot standard split.
• Mini-ImageNet
One-shot Learning exp.
45
Omniglot Previous Split
• following the Matching Networks experimental settings
• 1200 training classes, 423 testing classes, 20 examples
per class
• three variations of MetaNet
• MetaNet+: additional task-level fast weights for the base learner
• MetaNet
• MetaNet-: no task-level fast weights Q* for the meta learner
46
Omniglot Previous Split
47
• Omniglot previous split
• Omniglot standard split
• Mini-ImageNet
One-shot Learning exp.
48
Omniglot standard split
• 1200 -> 964 training classes, 423 -> 659 testing
classes
49
Omniglot standard split
50
One-shot Learning exp.
• Omniglot previous split
• Omniglot standard split.
• Mini-ImageNet
51
Mini-ImageNet
• 64 training classes, 20 testing classes
• 600 examples per class
52
Mini-ImageNet
53
Generalization Experiment
• N-way training and K-way testing
• Rapid Parameterization of Fixed Weight
• Meta-level continual learning
54
Generalization Experiment
• N-way training and K-way testing
• Rapid Parameterization of Fixed Weight
• Meta-level continual learning
55
N-way training and K-way
testing
56
Generalization Experiment
• N-way training and K-way testing
• Rapid Parameterization of Fixed Weight
• Meta-level continual learning
57
Generalization Experiment
• replace the base learner with a new CNN during
evaluation
• the fast weights of the new CNN are generated by the
meta learner that was trained to parameterize the old
base learner (the target CNN)
58
Rapid Parameterization of
Fixed Weight
• small: 32 filters
• target: 64 filters
• big: 128 filters
59
Generalization Experiment
• N-way training and K-way testing
• Rapid Parameterization of Fixed Weight
• Meta-level continual learning
60
Meta-level continual learning
• train and test on Omniglot -> train on MNIST -> train
and test on Omniglot again
61
Meta-level continual learning
accuracy difference = after − before (on Omniglot)
62
Conclusion
• Pros:
• Interesting model; slow and fast weights serve
different functions
• Solid experiments
• Cons:
• The paper is rather hard to read
63
Future Work
• explore meta information other than the loss gradient
• detect the task/domain automatically
64

Editor's Notes

  1. @author
  2. The goal of a meta-level learner is to acquire generic knowledge of different tasks. The knowledge can then be transferred to the base-level learner to provide generalization in the context of a single task. —————- The base and meta-level models can be framed in a single learner (Schmidhuber, 1987) (??) or in separate learners (Bengio et al., 1990; Hochreiter et al., 2001).
  3. It must learn to hold data samples in memory until the appropriate labels are presented at the next time-step, after which sample-class information can be bound and stored for later use @relation to this work @TODO put more related work
  4. we propose an LSTM-based meta-learner model to learn the exact optimization algorithm used to train another learner neural network classifier in the few-shot regime. They split the data into meta-train and meta-test. During the meta-train phase, they train the meta-learner to optimise the optimizer @relation to this work
  5. They split the data into meta-train and meta-test. During the meta-train phase, they train the meta-learner to optimise the optimizer @relation to this work
  6. when considering each dataset D ∈ D_meta−train, the training objective (for the meta learner?) we use is the loss L_test of the produced classifier on D’s test set D_test. While iterating over the examples in D’s training set D_train, at each time step t the LSTM meta-learner receives (∇_{θ_{t−1}} L_t, L_t) from the learner (the classifier) and proposes the new set of parameters θ_t. The process repeats for T steps, after which the classifier and its final parameters are evaluated on the test set to produce the loss that is then used to train the meta-learner. M? @TODO shortly explain
  7. Very important.
  8. working memory -> short term memory -> long term memory?
  9. @TODO example?
  10. fast weight A clears the states by the contextual overlay
  11. intermediate steps between the recurrent computation fast weight clears things(from the state?) up from the contextual overlay, storing the useful history information of **this sequence**
  12. instead of using iterative learning with backprop; one-shot learning borrowing from Hopfield networks; the fast weight is retrieving information from the Hopfield network (YouTube video 06:30)
  13. It was downsampled to 48 × 48 greyscale. The full dataset contains 15 views since facial expressions were not discernible from the more extreme viewpoints. The resulting dataset contained > 100,000 images. 317 identities appeared in the training set with the remaining 20 identities in the test set. Given the input face image, the goal is to classify the subject’s facial expression into one of the six different categories: neutral, smile, surprise, squint, disgust and scream. - Not only does the dataset have unbalanced numbers of labels, some of the expressions, for example squint and disgust, are very hard to distinguish. In order to perform well on this task, the models need to generalize over different lighting conditions and viewpoints.
  14. where do we store the information when examining the eyebrow in a convnet: in the fast weights: a stack-like mechanism
  15. @TODO biologically feasible? The term in square brackets is just the scalar product of an earlier hidden state vector, h(τ), with the current hidden state vector, h_s(t+1), during the iterative inner loop. So at each iteration of the inner loop, the fast weight matrix is exactly equivalent to attending to past hidden vectors in proportion to their scalar product with the current hidden vector, weighted by a decay factor. During the inner loop iterations, attention will become more focussed on past hidden states that manage to attract the current hidden state. -> does 'attract' here mean 'looks similar'?
  16. Hopfield nets are not that efficient and can only store about log n patterns; compared with the associative LSTM (trained with backprop) -> use the Hopfield net more efficiently
  17. @TODO is this page necessary?
  18. There are three major components of this model, the memory, the meta-learner and the base learner the meta learner is used to produce the fast weight for the base learner when each training example comes in the meta learner will make use of the support set to set up its own per task fast weight Q* in lots of few shot learning paper, learns a network that maps a small labelled support set and an unlabelled example to its label in this setting, they use only one example in support set so it’s cheap to obtain
  19. Let’s take a look at the base learner here
  20. the difference between normal neural network and this base learner is that it has fast weight
  21. we will see how it makes use of the support set to give meta-information to the meta learner. The base learner uses a representation of meta information, obtained by using a support set, to provide the meta learner with feedback about the new input task. The primed symbols denote the support set. Here L_i is the loss for support examples {x'_i, y'_i}_{i=1}^N. N is the number of support examples in the task set (typically a single instance per class in the one-shot learning setup). ∇_i is the loss gradient with respect to parameters W and is our meta information. Note that the loss function loss_task is generic and can take any form, such as a cumulative reward in reinforcement learning. For our one-shot classification setup we use cross-entropy loss. The meta learner takes in the gradient information ∇_i and generates the fast parameters W* as in Equation 1 (and stores them in the memory). - Alternatively the base learner can take as input the task-specific representations {r_i}_{i=1}^L produced by the dynamic representation learning network, effectively reducing the number of MetaNet parameters and leveraging shared representations. In this case, the base learner is forced to operate in the dynamic task space constructed by u instead of building new representations from the raw inputs {x_i}_{i=1}^L
  22. Now we can take a look at the meta learner: we can see how it receives the meta info, learns a good representation function to embed the example, and stores the fast weights in the memory for later use during training. @TODO: Question: does the memory keep accumulating? Does each new task wipe out the old entries?
  23. This is not the final fast weight; delta_i here is from the support set. Next: how Q* is generated. The representation learning function u is a neural net parameterized by slow weights Q and task-level fast weights Q*. It uses the representation loss loss_emb to capture a representation learning objective and to obtain the gradients as meta information. We generate the fast weights Q* on a per-task basis as follows:
  24. @external This is not the final fast weight; delta_i here is from the support set. The fast weights are then stored in a memory M = {W*_i}_{i=1}^N. The memory M is indexed with task-dependent embeddings R = {r'_i}_{i=1}^N of the support examples {x'_i}_{i=1}^N, obtained by the dynamic representation learning function u.
  25. these are the fast weights generated from the support set, with the representations of the support set examples as the index @do the fast weights of different tasks keep accumulating?
  26. @hard to understand d denotes a neural net parameterized by G that accepts variable-sized input. First, we sample T examples (T < N) {x'_i, y'_i}_{i=1}^T from the support set and obtain the loss gradients as meta information. Then d observes the gradient corresponding to each sampled example and summarizes them into the task-specific parameters.
  27. the way it computes Q* , connects the knowledge of Q* and Q
  28. attention: cosine similarity norm: softmax Q and W will learn gradually across task Q* focus on task W* focus on example
  29. @TODO double check Intuitively, the fast and slow weights in the layer augmented neural net can be seen as feature detectors operating in two distinct numeric domains. The application of the non-linearity maps them into the same domain, which is [0, ∞) in the case of ReLU, so that the activations can be aggregated and processed further. Our aggregation function here is element-wise sum.
  30. network architecture: for Omniglot we used a CNN with 64 filters as the base learner b. This CNN has 5 convolutional layers, each of which is a 3 x 3 convolution with 64 filters, followed by a ReLU non-linearity, a 2 x 2 max-pooling layer, a fully connected (FC) layer, and a softmax layer. Another CNN with the same architecture is used to define the dynamic representation learning function u, from which we take the output of the FC layer as the task dependent representation r. We trained a similar CNN architecture with 32 filters for the experiment on Mini-ImageNet. However, for computational efficiency as well as to demonstrate the flexibility of MetaNet, the last three layers of these CNN models were augmented by fast weights. For the networks d and m, we used a single-layer LSTM with 20 hidden units and a three-layer MLP with 20 hidden units and ReLU non-linearity. As in Andrychowicz et al. (2016), the parameters G and Z of d and m are shared across the coordinates of the gradients ∇ and the gradients are normalized using the same preprocessing rule (with p = 7). The MetaNet parameters θ are optimized with ADAM. The initial learning rate was set to 10^-3. The model parameters θ were randomly initialized from the uniform distribution over [-0.1, 0.1).
  31. train W, Q, Z, G i from 1 to N: support set
  32. those with star are fast weight
  33. with respect to W @put images here @why sometimes sample, sometimes not support set those with star are fast weight
  34. those with star are fast weight
  35. @TODO MetaNet+ details three variations of MetaNet as ablation exp.
  36. @TODO MetaNet+ details @TODO read two papers Q* is useful additional task level weight is not that helpful
  37. @TODO MetaNet+ details three variations of MetaNet as ablation exp.
  38. @TODO MetaNet+ details three variations of MetaNet as ablation exp.
  39. @TODO MetaNet+ details three variations of MetaNet as ablation exp.
  40. train < test, decrease test > train, increase - how to augment the softmax?
  41. @TODO MetaNet+ details three variations of MetaNet as ablation exp.
  42. We replaced the entire base learner with a new CNN during evaluation. The slow weights of this network remained fixed. The fast weights are generated by the meta learner that is trained to parameterize the old base learner and are used to augment the fixed slow weights. The small CNN has 32 filters and the large CNN has 128 filters. target: 64 filters
  43. @TODO during evaluation? based on support set -> predict; the big (orange) and small (blue) CNNs are the new base learners; the target CNN (64 filters) is the original base learner that is learned along the way. The performance difference between these models is large in earlier training iterations. However, as the meta learner sees more one-shot learning trials, the test accuracies of the base learners converge. These results show that MetaNet effectively learns to parameterize a neural net with fixed weights.
  44. **500 classes MNIST**, acc: 72% after 2400 MNIST trials
  45. **500 classes MNIST**, acc: 72% after 2400 MNIST trials. Reverse transfer learning: train on Omniglot -> MNIST. The positive values indicate that the training on the second problem automatically improves the performance of the earlier task, exhibiting the reverse transfer property. Therefore, we can conclude that MetaNet successfully performs reverse transfer. At the same time, it is skilled at MNIST one-shot classification. The MNIST training accuracy reaches over 72% after 2400 MNIST trials. However, reverse transfer happens only up to a certain point in MNIST training (2400 trials). After that, the meta weights start to forget the Omniglot information. As a result, from 2800 trials onwards, the Omniglot test accuracy drops. Nevertheless, even after 7600 MNIST trials, at which point the MNIST training accuracy reached over 90%, the Omniglot performance drop was only 1.7%.
  46. ablation study on all components?