1) The Meta Network model proposes a two-level learning approach for few-shot learning. It includes a slow-learning meta-learner and a fast-learning base learner.
2) The meta-learner learns to generate fast weights for the base learner using gradient-based meta information from previous tasks. It stores these weights in a memory indexed by task embeddings.
3) Experiments on few-shot classification datasets like Omniglot and MiniImageNet demonstrate the model can learn new concepts from very few examples through fast adaptation of the base learner's weights.
ICML 2017 Meta network
1. Meta Network
ICML 2017 citation: 0
Tsendsuren Munkhdalai
Hong Yu
University of Massachusetts, MA, USA
Katy Lee @ Datalab 2017.09.11
1
2. Background
• two-level learning in meta learning:
• slow learning of a meta-level model performing across tasks
• rapid learning of a base-level model acting within each task
2
3. Motivation
• We want the neural network to learn and to generalize to a new task or concept from a single example on the fly.
3
4. Related Work
• Santoro et al., Meta-learning with memory-augmented neural networks. ICML 2016 -> uses the external memory as a temporary memory
4
5. • Ravi, Sachin and Larochelle, Hugo. Optimization as a model for few-shot learning. ICLR 2017
5
6. • Ravi, Sachin and Larochelle, Hugo. Optimization as a model for few-shot learning. ICLR 2017
6
7. • Ravi, Sachin and Larochelle, Hugo. Optimization as a model for few-shot learning. ICLR 2017
7
10. Base Learner b: overview
• Unlike standard neural nets, b is parameterized by slow weights W and example-level fast weights W*
• slow weights: updated via a learning algorithm during training
• fast weights: generated by the meta learner for every input
29
11. How Base Learner b provides meta info
• x' belongs to the support set
• The meta information is derived from the base learner in the form of the loss gradient information (reconstructed below):
30
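A reconstruction in LaTeX of the equation this slide points to, assembled from the speaker notes further down (loss over the support examples and its gradient as meta information); the exact typesetting on the original slide is an assumption:

\mathcal{L}_i = \mathrm{loss}_{task}\big(b(x'_i; W),\, y'_i\big), \qquad i = 1, \dots, N
\nabla_i = \nabla_W \, \mathcal{L}_i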
13. Meta Learner
• function: collects meta information and produces per-example fast weights W* for the base learner during training
• fast weight W* generating function m with parameters Z
• fast weight Q* generating function d with parameters G
• a dynamic representation learning function u (Q, Q*)
32
14. Meta Learner: m
• m learns the mapping from the loss gradient to the fast weights for the support set (not the final ones output to the base learner)
• we store the fast weights in a memory M which is indexed with task-dependent embeddings R = {r'_i}_{i=1}^N of the support examples, obtained by u
33
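A compact LaTeX restatement of what m produces and how the memory is laid out, assembled from this slide and the speaker notes (a sketch, not the slide's own equation):

W^*_i = m(Z, \nabla_i), \qquad M = \{W^*_i\}_{i=1}^{N}, \qquad R = \{r'_i\}_{i=1}^{N}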
16. Meta Learner: u
• a dynamic representation learning network u (Q, Q*)
• We generate the fast weights Q* on a per-task basis as follows (reconstructed below):
35
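A LaTeX sketch of the per-task generation of Q*, based on the speaker notes (sample T < N support examples, take the gradients of the representation loss loss_emb, and let d summarize them); the exact argument structure of loss_emb is not given in the notes and is left abstract here:

Q^* = d\big(G,\ \{\nabla_i\}_{i=1}^{T}\big), \qquad \nabla_i = \nabla_Q\, \mathrm{loss}_{emb}\ \ \text{(gradient for the } i\text{-th sampled support example)}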
17. Meta Learner
• Once the fast weights are generated, the task-dependent support set input representations are computed as:
• r'_i = u(Q, Q*, x'_i)
• Q and Q* are integrated using the layer augmentation method (a sketch follows below)
• => Memory is ready!
36
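A minimal NumPy sketch of the layer augmentation mentioned above, following the speaker notes (slow and fast weights applied separately, mapped through the non-linearity, then aggregated by element-wise sum); the fully connected form and the absence of bias terms are assumptions:

import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def augmented_layer(x, W_slow, W_fast):
    # Slow and fast weights act as feature detectors in two numeric domains;
    # the non-linearity maps both into the same domain before the element-wise sum.
    return relu(W_slow @ x) + relu(W_fast @ x)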
18. Meta Learner
• After the fast weights W*_i are stored in the memory M and the index R is constructed
• given an input x_i in the training set / test set:
1. embed x_i using the dynamic representation u: r_i = u(Q, Q*, x_i)
2. read the memory with soft attention (see the sketch below)
37
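A LaTeX sketch of the soft-attention read in step 2, using the cosine-similarity attention and softmax normalization mentioned in the speaker notes; the exact equation on the original slide is an assumption:

a_i = \mathrm{attention}(R, r_i)\ \ \text{(cosine similarity against each stored } r'_j\text{)}, \qquad W^*(x_i) = \mathrm{softmax}(a_i)^{\top} M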
20. Model
• base learner b (slow weights W, fast weights W*): 5 conv layers with 64 filters, max-pooling, 2 FC layers
• representation learning network u (slow weights Q, fast weights Q*): 5 conv layers with 64 filters, max-pooling, 2 FC layers
• fast weight generators m(Z) and d(G): 3 FC layers with 20 neurons
• memory M with index R
39
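A hedged PyTorch sketch of the base learner b as listed on this slide (5 conv layers with 64 3x3 filters, ReLU and 2x2 max-pooling, then two FC layers); padding, the hidden FC width, ceil-mode pooling and the input resolution are assumptions, and the fast-weight augmentation of the last layers is omitted:

import torch.nn as nn

def make_base_learner(n_classes: int, in_channels: int = 1) -> nn.Module:
    layers = []
    channels = in_channels
    for _ in range(5):
        layers += [nn.Conv2d(channels, 64, kernel_size=3, padding=1),
                   nn.ReLU(),
                   nn.MaxPool2d(2, ceil_mode=True)]  # ceil_mode keeps small feature maps non-empty
        channels = 64
    layers += [nn.Flatten(),
               nn.LazyLinear(64), nn.ReLU(),  # first FC layer (width assumed)
               nn.LazyLinear(n_classes)]      # second FC layer; softmax applied in the loss
    return nn.Sequential(*layers)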
21. • W, W*: slow and fast weights for the base learner
• Q, Q*: slow and fast weights for representation learning
• Z: slow weights for m, a NN that generates the fast weights W*
• G: slow weights for d, a NN that generates the fast weights Q*
40
27. Omniglot Previous Split
• following the Matching Networks experimental settings
• 1200 training classes, 423 testing classes, 20 examples per class
• three variations of MetaNet
• MetaNet+: additional task-level fast weights for the base learner
• MetaNet
• MetaNet-: no task-level fast weights W* for the meta learner
46
38. Generalization Experiment
• N-way training and K-way testing
• Rapid Parameterization of Fixed Weight
• Meta-level continual learning
57
39. Generalization Experiment
• replace the base learner with a new CNN during evaluation
• the fast weights of the new CNN are generated by the meta learner that was trained to parameterize the old base learner (target CNN)
58
44. Conclusion
• pro:
• Interesting model; slow and fast weights have different functions
• Solid experiments
• con:
• The paper is kind of hard to read
63
45. Future Work
• meta information other than the gradient
• detect the task/domain automatically
64
Editor's Notes
@author
The goal of a meta-level learner is to acquire generic knowledge of different tasks. The knowledge can then be transferred to the base-level learner to provide generalization in the context of a single task.
—————-
The base and meta-level models can be framed in a single learner (Schmidhuber, 1987) (??)
or in separate learners (Bengio et al., 1990; Hochreiter et al., 2001).
It must learn to hold data samples in memory until the appropriate labels are presented at the next time-step, after which sample-class information can be bound and stored for later use
@relation to this work
@TODO put more related work
we propose an LSTM-based meta-learner model to learn the exact optimization algorithm used to train another learner neural network classifier in the few-shot regime.
They split the data into meta-train and meta-test sets. During the meta-train phase, they train the meta-learner, which acts as the optimizer.
@relation to this work
They split the data into meta-train and meta-test sets. During the meta-train phase, they train the meta-learner, which acts as the optimizer.
@relation to this work
when considering each dataset D ∈ D_meta-train, the training objective (for the meta learner?) we use is the loss L_test of the produced classifier on D's test set D_test. While iterating over the examples in D's training set D_train, at each time step t the LSTM meta-learner receives (∇_{θ_{t-1}} L_t, L_t) from the learner (the classifier) and proposes the new set of parameters θ_t. The process repeats for T steps, after which the classifier and its final parameters are evaluated on the test set to produce the loss that is then used to train the meta-learner.
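For context, the parameter-update rule of that LSTM meta-learner, written in LaTeX from memory of Ravi & Larochelle (2017) rather than taken from these notes, so treat it as an assumption: the cell state plays the role of the learner's parameters,

\theta_t = f_t \odot \theta_{t-1} - i_t \odot \nabla_{\theta_{t-1}} \mathcal{L}_t

with a learned input gate i_t acting as the learning rate and a learned forget gate f_t replacing the fixed terms of plain gradient descent.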
M?
@TODO shortly explain
Very important
working memory -> short-term memory -> long-term memory?
@TODO example?
fast weight A clears the states by the contextual overlay
intermediate steps between the recurrent computation
fast weight clears things (from the state?) up from the contextual overlay, storing the useful history information of **this sequence**
instead of using iterative learning backprop
one-shot learning borrowing from the Hopfield network
fast weight is retrieving information from the Hopfield network
youtube video 06:30
It was downsampled to 48 × 48 greyscale. The full dataset contains 15 views since facial expressions were not discernible from the more extreme viewpoints. The resulting dataset contained > 100,000 images. 317 identities appeared in the training set with the remaining 20 identities in the test set.
Given the input face image, the goal is to classify the subject’s facial expression into one of the six different categories: neutral, smile, surprise, squint, disgust and scream.
- Not only does the dataset have unbalanced numbers of labels, some of the expressions, for example squint and disgust, are very hard to distinguish. In order to perform well on this task, the models need to generalize over different lighting conditions and viewpoints.
where do we store the information when examining the eyebrow
in convnet:
in fast weight: stack-like mechanism
@TODO biologically feasible?
The term in square brackets is just the scalar product of an earlier hidden state vector, h(τ), with the current hidden state vector, h_s(t+1), during the iterative inner loop. So at each iteration of the inner loop, the fast weight matrix is exactly equivalent to attending to past hidden vectors in proportion to their scalar product with the current hidden vector, weighted by a decay factor. During the inner loop iterations, attention will become more focussed on past hidden states that manage to attract the current hidden state.
-> does "attract" here mean that they look alike?
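For reference, the fast-weight update and the bracketed attention term the quote refers to, reconstructed in LaTeX from memory of Ba et al. (2016) (an assumption, not taken from these notes):

A(t) = \lambda A(t-1) + \eta\, h(t) h(t)^{\top}, \qquad A(t)\, h_s(t+1) = \eta \sum_{\tau=1}^{t} \lambda^{t-\tau} \big[\, h(\tau)^{\top} h_s(t+1) \,\big]\, h(\tau)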
Hopfield net is not that efficient; it can only store about log n patterns
compare with associative LSTM
(train with backprop) -> use the Hopfield net more efficiently
@TODO is this page necessary?
There are three major components of this model: the memory, the meta learner and the base learner
the meta learner is used to produce the fast weights for the base learner when each training example comes in
the meta learner will make use of the support set to set up its own per-task fast weights Q*
in lots of few-shot learning papers, one learns a network that maps a small labelled support set and an unlabelled example to its label
in this setting, they use only one example in the support set so it's cheap to obtain
Let’s take a look at the base learner here
the difference between a normal neural network and this base learner is that it has fast weights
we will see how it makes use of the support set to give meta information to the meta learner
The base learner uses a representation of meta information obtained by using a support set, to provide the meta learner with feedback about the new input task.
the symbols with a prime (') are the support set
Here L_i is the loss for support examples {x'_i, y'_i}_{i=1}^N. N is the number of support examples in the task set (typically a single instance per class in the one-shot learning setup). ∇_i is the loss gradient with respect to parameters W and is our meta information. Note that the loss function loss_task is generic and can take any form, such as a cumulative reward in reinforcement learning. For our one-shot classification setup we use cross-entropy loss.
The meta learner takes in the gradient information ∇_i and generates the fast parameters W* as in Equation 1 (and stores them in the memory).
- Alternatively the base learner can take as input the task-specific representations {r_i}_{i=1}^L produced by the dynamic representation learning network, effectively reducing the number of MetaNet parameters and leveraging shared representations. In this case, the base learner is forced to operate in the dynamic task space constructed by u instead of building new representations from the raw inputs {x_i}_{i=1}^L
Now we can take a look at the meta learner; we can see how it receives the meta info, learns a good representation function to embed the example, and stores the fast weights in the memory for future training use.
@TODO: Question: does the memory keep accumulating? Does each new task wipe out the old entries?
This is not the final fast weight
∇_i is from the support set
Next: how Q* is generated
The representation learning function u is a neural net parameterized by slow weights Q and task-level fast weights Q*. It uses the representation loss loss_emb to capture a representation learning objective and to obtain the gradients as meta information. We generate the fast weights Q* on a per-task basis as follows:
@external
This is not the final fast weight
∇_i is from the support set
The fast weights are then stored in a memory M = {W*_i}_{i=1}^N. The memory M is indexed with task-dependent embeddings R = {r'_i}_{i=1}^N of the support examples {x'_i}_{i=1}^N, obtained by the dynamic representation learning function u.
these are the fast weights generated from the support set, with the representations of the support set examples as the index
@Do the fast weights of different tasks keep accumulating?
@hard to understand
d denotes a neural net parameterized by G that accepts variable-sized input. First, we sample T examples (T < N) {x'_i, y'_i}_{i=1}^T from the support set and obtain the loss gradient as meta information. Then d observes the gradient corresponding to each sampled example and summarizes it into the task-specific parameters
the way it computes Q* connects the knowledge of Q* and Q
attention: cosine similarity
norm: softmax
Q and W learn gradually across tasks
Q* focuses on the task
W* focuses on the example
@TODO double check
Intuitively, the fast and slow weights in the layer-augmented neural net can be seen as feature detectors operating in two distinct numeric domains. The application of the non-linearity maps them into the same domain, which is [0, ∞) in the case of ReLU, so that the activations can be aggregated and processed further. Our aggregation function here is element-wise sum.
network architecture
Omniglot
we used a CNN with 64 filters as the base learner b. This CNN has 5 convolutional layers, each of which is a 3 x 3 convolution with 64 filters, followed by a ReLU non-linearity, a 2 x 2 max-pooling layer, a fully connected (FC) layer, and a softmax layer.
Another CNN with the same architecture is used to define the dynamic representation learning function u, from which we take the output of the FC layer as the task dependent representation r.
We trained a similar CNNs architecture with 32 filters for the experiment on Mini-ImageNet.
However for computational efficiency as well as to demonstrate the flexibility of MetaNet, the last three layers of these CNN models were augmented by fast weights.
For the networks d and m, we used a single-layer LSTM with 20 hidden units and a three-layer MLP with 20 hidden units and ReLU non-linearity. As in Andrychowicz et al. (2016), the parameters G and Z of d and m are shared across the coordinates of the gradients ∇ and the gradients are normalized using the same preprocessing rule (with p = 7). The MetaNet parameters θ are optimized with ADAM. The initial learning rate was set to 10^-3. The model parameters θ were randomly initialized from the uniform distribution over [-0.1, 0.1).
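The preprocessing rule referred to here, reconstructed in LaTeX from memory of Andrychowicz et al. (2016) and not from these notes, so treat it as an assumption:

\nabla \rightarrow \begin{cases} \left(\dfrac{\log(|\nabla|)}{p},\ \mathrm{sign}(\nabla)\right) & \text{if } |\nabla| \ge e^{-p} \\ \left(-1,\ e^{p}\,\nabla\right) & \text{otherwise} \end{cases}

with p = 7 as stated above.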
train W, Q, Z, G
i from 1 to N: support set
those with star are fast weight
with respect to W
@put images here
@why sometimes sample, sometimes not
support set
those with star are fast weight
those with star are fast weight
@TODO MetaNet+ details
three variations of MetaNet as ablation exp.
@TODO MetaNet+ details
@TODO read two papers
Q* is useful
additional task level weight is not that helpful
@TODO MetaNet+ details
three variations of MetaNet as ablation exp.
@TODO MetaNet+ details
three variations of MetaNet as ablation exp.
@TODO MetaNet+ details
three variations of MetaNet as ablation exp.
train < test, decrease
test > train, increase
- how to augment the softmax?
@TODO MetaNet+ details
three variations of MetaNet as ablation exp.
We replaced the entire base learner with a new CNN during evaluation. The slow weights of this network remained fixed. The fast weights are generated by the meta learner that was trained to parameterize the old base learner, and are used to augment the fixed slow weights.
The small CNN has 32 filters and the large CNN has 128 filters.
target: 64
@TODO during evaluation? based on support set -> predict
the big (orange) and small (blue) CNNs are the new base learners
the target CNN (64 filters) is the original base learner that is learned along with the meta learner
The performance difference between these models is large in earlier training iterations. However, as the meta learner sees more one-shot learning trials, the test accuracies of the base learners converge. These results show that MetaNet effectively learns to parameterize a neural net with fixed weights.
**500 classes MNIST**, acc: 72% after 2400 MNIST trials
**500 classes MNIST**, acc: 72% after 2400 MNIST trials
reverse transfer learning
train on omniglot -> mnist
The positive values indicate that the training on the second problem automatically improves the performance of the earlier task exhibiting the reverse transfer property. Therefore, we can conclude that MetaNet successfully performs reverse transfer. At the same time, it is skilled on MNIST one-shot classification. The MNIST training accuracy reaches over 72% after 2400 MNIST trials.
However, reverse transfer happens only up to a certain point in MNIST training (2400 trials). After that, the meta weights start to forget the Omniglot information. As a result, from 2800 trials onwards, the Omniglot test accuracy drops. Nevertheless, even after 7600 MNIST trials, at which point the MNIST training accuracy reached over 90%, the Omniglot performance drop was only 1.7%.