This deck reviews Model-Agnostic Meta-Learning (MAML), an algorithm for fast adaptation of neural networks to new tasks. MAML learns an initialization that can be fine-tuned to a new task with only a small number of gradient steps: the meta-learner optimizes the initial parameters so that a single gradient update on a new task substantially reduces its loss. MAML is model-agnostic, requiring no specific architecture, and applies to classification, regression, and reinforcement learning.
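As a concrete illustration of this description, here is a minimal first-order MAML sketch on a toy family of scalar linear-regression tasks (y = a·x). The task family, scalar model, and step sizes are illustrative choices, not the paper's experiments, and the full algorithm also backpropagates through the inner step (second-order terms), which this first-order sketch omits.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss_and_grad(w, x, y):
    """MSE loss of the scalar linear model f(x) = w*x, and its gradient in w."""
    err = w * x - y
    return np.mean(err ** 2), 2 * np.mean(err * x)

alpha, beta = 0.1, 0.05   # inner (adaptation) and outer (meta) step sizes
w = 0.0                   # meta-learned initialization

for step in range(2000):
    a = rng.uniform(0.5, 2.0)               # sample a task: y = a*x
    x_s = rng.uniform(-1, 1, 10)            # small support set
    _, g = loss_and_grad(w, x_s, a * x_s)
    w_task = w - alpha * g                  # learner: one gradient step
    x_q = rng.uniform(-1, 1, 10)            # query set for the meta-update
    _, g_meta = loss_and_grad(w_task, x_q, a * x_q)
    w -= beta * g_meta                      # meta-learner: first-order update
```

After meta-training, a single inner step from w should reduce the loss on a fresh task from the same family.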
Meta-learning, or learning how to learn, is our innate ability to learn new, ever more complex tasks very efficiently by building on prior experience. It is a very exciting direction for machine learning (and AI in general). In this tutorial, I introduce the main concepts and state of the art.
PR12-094: Model-Agnostic Meta-Learning for fast adaptation of deep networks - Taesu Kim
Paper review: "Model-Agnostic Meta-Learning for fast adaptation of deep networks" by C. Finn et al. (ICML 2017)
Presented at Tensorflow-KR paper review forum (#PR12) by Taesu Kim
Paper link: https://arxiv.org/abs/1703.03400
Video link: https://youtu.be/fxJXXKZb-ik (in Korean)
http://www.neosapience.com
2. Contents
1. Meta-Learning Problem Setup (Definition & Goal)
2. Prior Work
3. Model-Agnostic Meta-Learning
1. Characteristics
2. Intuition
4. Approach
1. Supervised Learning
2. Gradient of Gradient
3. Reinforcement Learning
5. Experiment
6. Discussion
3. Meta-Learning Problem Set-up
• Definition of meta-learning:
• A learner is trained by a meta-learner so that it can learn many different
tasks.
• Goal:
• The learner quickly learns new tasks from a small amount of new data.
1. Problem Set-up
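In the few-shot setting, this problem set-up is usually instantiated as N-way K-shot tasks: a small support set to adapt on and a query set to evaluate the adapted learner. A minimal sketch of task sampling; the `label -> examples` dataset layout and all names here are illustrative, not from the paper:

```python
import random

def sample_task(dataset, n_way=5, k_shot=1, n_query=5):
    """Sample an N-way K-shot task from a dict mapping label -> examples."""
    labels = random.sample(sorted(dataset), n_way)
    support, query = [], []
    for new_label, label in enumerate(labels):
        examples = random.sample(dataset[label], k_shot + n_query)
        # relabel classes 0..n_way-1 so each task is self-contained
        support += [(x, new_label) for x in examples[:k_shot]]
        query += [(x, new_label) for x in examples[k_shot:]]
    return support, query
```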
12. Prior Work
• Learning an update function or update rule
• LSTM optimizer
(Learning to learn by gradient descent by gradient descent)
• Meta LSTM optimizer
(Optimization as a model for few-shot learning)
• Few-shot (or meta-) learning for specific tasks
• Generative modeling (Neural Statistician)
• Image classification (Matching Net., Prototypical Net.)
• Reinforcement learning
(Benchmarking deep reinforcement learning for continuous control)
2. Prior Work
13. Prior Work
• Memory-augmented meta-learning
• Memory-augmented models on many tasks
(Meta-learning with memory-augmented neural networks)
• Parameter initialization of deep networks
• Exploring sensitivity while maintaining internal representations
(Overcoming catastrophic forgetting in NN)
• Data-dependent Initializer
(Data-Dependent Initialization of CNN)
• Learned initialization
(Gradient-based hyperparameter optimization through reversible learning)
2. Prior Work
14. Model-Agnostic Meta-Learning (MAML)
• Goal:
• Quickly adapt to new tasks drawn from the task distribution
with only a small amount of data and
only a few gradient steps,
even a single gradient step.
• Learner:
• Learns a new task using a single gradient step.
• Meta-learner:
• Learns a generalized parameter initialization of the model.
3. Model-Agnostic Meta-Learning (MAML)
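This two-level structure (learner = one gradient step, meta-learner = optimize the initialization) can be sketched minimally under a simplifying assumption of scalar parameters and quadratic per-task losses, so all gradients, including the meta-gradient, can be written by hand; the paper itself uses neural networks, and the function names here are this sketch's own.

```python
def inner_update(theta, grad_fn, alpha=0.01):
    """Learner: adapt to one task with a single gradient step."""
    return theta - alpha * grad_fn(theta)

def maml_outer_loop(tasks, theta=0.0, alpha=0.01, beta=0.1, steps=200):
    """Meta-learner: move the initialization theta so that one inner step
    works well on every task. Each task i has toy loss L_i(t) = (t - c_i)^2,
    so the gradient and the meta-gradient are analytic."""
    for _ in range(steps):
        meta_grad = 0.0
        for c in tasks:
            grad_fn = lambda t, c=c: 2.0 * (t - c)
            theta_i = inner_update(theta, grad_fn, alpha)   # adapted params
            # d L_i(theta_i) / d theta = L_i'(theta_i) * (1 - alpha * L_i'')
            meta_grad += 2.0 * (theta_i - c) * (1.0 - alpha * 2.0)
        theta -= beta * meta_grad / len(tasks)
    return theta

tasks = [-1.0, 3.0]           # each task's optimum is its own c_i
theta_star = maml_outer_loop(tasks)
print(round(theta_star, 3))   # 1.0
```

With these quadratic tasks the learned initialization converges to the point from which one gradient step serves every task best: the mean of the task optima.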
15. Characteristics of MAML
• The MAML learner’s weights are updated using the gradient,
rather than a learned update rule.
• Needs no additional parameters and requires no particular learner
architecture.
• Fast adaptability through a good parameter initialization.
• Explicitly optimizes the model to learn internal representations
(i.e. representations suitable for many tasks).
• Maximizes the sensitivity of new-task losses to the model parameters.
3. Model-Agnostic Meta-Learning (MAML)
16. Characteristics of MAML
• Model-agnostic (works no matter what the model is)
• Classification & regression with differentiable losses, policy-gradient RL.
• The model must be parameterized.
• No other assumptions on the form of the model.
• Task-agnostic (works no matter what the task is)
• Applicable to any task where knowledge transfer is possible.
• No other assumption is required.
3. Model-Agnostic Meta-Learning (MAML)
17. Intuition of MAML
• Some internal representations
are more transferable than others.
• The desired model parameter set is a θ such that:
applying one (or a small number of) gradient steps to θ
on a new task will produce maximally effective behavior.
→ Find θ that commonly decreases the loss of each task
after adaptation.
3. Model-Agnostic Meta-Learning (MAML)
18. Supervised Learning
• Notations
• Model: f_θ (model-agnostic)
• Task distribution: p(T)
• Task: T = {L(x_1, y_1, …, x_K, y_K), (x, y) ∼ q}
• For each task:
• Data distribution: q (K samples are drawn from q)
• Loss function: L
• MSE for regression
• Cross entropy for classification
• Any other differentiable loss function can be used.
4. Approach
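The two concrete loss choices named above can be written out directly; a minimal pure-Python sketch over K predictions (the function names are this sketch's own):

```python
import math

def mse_loss(preds, targets):
    """MSE for regression: sum of squared errors over the K samples."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets))

def cross_entropy_loss(probs, labels):
    """Cross entropy for classification: -sum of log p(correct class),
    where probs[i] is a probability vector and labels[i] the true class."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels))

print(mse_loss([0.5, 1.0], [0.0, 1.0]))                 # 0.25
print(round(cross_entropy_loss([[0.9, 0.1]], [0]), 4))  # 0.1054
```

Any function of this shape works in MAML as long as it is differentiable in the model parameters.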
20. Gradient of Gradient
• From line 10 in Algorithm 2, the meta-update is
θ ← θ − β ∇_θ Σ_{T_i ∼ p(T)} L_{T_i}(f_{θ'_i})
(Recall: θ'_i = θ − α ∇_θ L_{T_i}(f_θ))
• Since L is differentiable, the chain rule gives
∇_θ L_{T_i}(f_{θ'_i}) = (I − α ∇²_θ L_{T_i}(f_θ)) ∇_{θ'_i} L_{T_i}(f_{θ'_i})
• Calculation of the Hessian matrix is required.
→ MAML suggests a 1st-order approximation.
4. Approach
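The chain-rule step above can be checked numerically. A scalar sketch with an illustrative quadratic loss L(θ) = (θ − 3)², whose 1×1 "Hessian" is exactly 2, confirms that the meta-gradient through one inner step matches a finite-difference estimate:

```python
# Illustrative scalar task loss (an assumption of this sketch): L(t) = (t - 3)^2
def loss(t):
    return (t - 3.0) ** 2

def grad(t):          # L'(t)
    return 2.0 * (t - 3.0)

hessian = 2.0         # L''(t): constant for this quadratic (the 1x1 "Hessian")
alpha = 0.1           # inner-loop step size

def meta_loss(theta):
    """Loss after one inner gradient step: L(theta - alpha * L'(theta))."""
    return loss(theta - alpha * grad(theta))

theta = 0.0
theta_prime = theta - alpha * grad(theta)
# Chain rule: d/dtheta L(theta') = (1 - alpha * L''(theta)) * L'(theta')
exact = (1.0 - alpha * hessian) * grad(theta_prime)

# Finite-difference check of the same derivative
eps = 1e-6
numeric = (meta_loss(theta + eps) - meta_loss(theta - eps)) / (2 * eps)
print(round(exact, 4), round(numeric, 4))  # -3.84 -3.84
```

The (1 − α·L'') factor is exactly the Hessian term that makes the full meta-gradient expensive for large networks.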
27. 1st Order Approximation
• Update rule of MAML:
θ ← θ − β ∇_θ Σ_{T_i} L_{T_i}(f_{θ'_i})
The outer ∇_θ needs the Hessian, and the Hessian makes MAML slow.
• Update rule of MAML with the 1st-order approximation:
θ ← θ − β Σ_{T_i} ∇_{θ'_i} L_{T_i}(f_{θ'_i})
(Regard θ'_i as a constant with respect to θ.)
• Recall the gradient of gradient: in the 1st-order approximation,
the factor (I − α ∇²_θ L_{T_i}(f_θ)) is regarded as the identity matrix I.
4. Approach
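Under the same toy quadratic-loss assumption, the 1st-order approximation simply drops the (I − α∇²L) factor and uses the gradient evaluated at the adapted parameters:

```python
def grad(t):                      # gradient of the toy loss L(t) = (t - 3)^2
    return 2.0 * (t - 3.0)

alpha, hessian = 0.1, 2.0         # L'' = 2 for this quadratic
theta = 0.0
theta_prime = theta - alpha * grad(theta)   # adapted parameters

exact = (1.0 - alpha * hessian) * grad(theta_prime)   # full meta-gradient
first_order = grad(theta_prime)   # 1st order: treat theta_prime as constant

# Same sign and similar size; the approximation skips the Hessian entirely.
print(round(exact, 2), round(first_order, 2))  # -3.84 -4.8
```

For small α the two directions nearly coincide, which is why the approximation loses little performance while avoiding second derivatives.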
31. Reinforcement Learning
• Notations
• Policy of an agent: f_θ, such that a_t ∼ f_θ(x_t)
• Task distribution: p(T)
• Task: T = {L(e_1, e_2, …, e_K), x_1 ∼ q(x_1), x_{t+1} ∼ q(x_{t+1} | x_t, a_t), H}
• For each task:
• Episode: e = (x_1, a_1, …, x_H, a_H)
• Transition distribution: q (K trajectories are drawn from q and f_θ)
• Episode length: H
• Loss function: L = the (negative) expectation of the sum of rewards
4. Approach
e = (x1, a1, . . . , xH, aH)<latexit sha1_base64="RTvl4VSKDR1Czf8GOPpJeflQwCo=">AAACDnicbVBNS8MwGE7n15xfVY9egkOZMEYrgnoQBl6GpwnWDbZS0jTbwtK0JKlslP0DL/4VLx5UvHr25r8x3XrQzQcSHp7nfd/kffyYUaks69soLC2vrK4V10sbm1vbO+bu3r2MEoGJgyMWibaPJGGUE0dRxUg7FgSFPiMtf3id+a0HIiSN+J0ax8QNUZ/THsVIackzj7t+xIKUTOAVrIw8uwpRdnWDSMkqHHmNTGiceGbZqllTwEVi56QMcjQ980uPwElIuMIMSdmxrVi5KRKKYkYmpW4iSYzwEPVJR1OOQiLddLrPBB5pJYC9SOjDFZyqvztSFEo5Dn1dGSI1kPNeJv7ndRLVu3BTyuNEEY5nD/USBlUEs3BgQAXBio01QVhQ/VeIB0ggrHSEJR2CPb/yInFOa5c1+/asXL/J0yiCA3AIKsAG56AOGqAJHIDBI3gGr+DNeDJejHfjY1ZaMPKeffAHxucP0DKZkQ==</latexit><latexit sha1_base64="RTvl4VSKDR1Czf8GOPpJeflQwCo=">AAACDnicbVBNS8MwGE7n15xfVY9egkOZMEYrgnoQBl6GpwnWDbZS0jTbwtK0JKlslP0DL/4VLx5UvHr25r8x3XrQzQcSHp7nfd/kffyYUaks69soLC2vrK4V10sbm1vbO+bu3r2MEoGJgyMWibaPJGGUE0dRxUg7FgSFPiMtf3id+a0HIiSN+J0ax8QNUZ/THsVIackzj7t+xIKUTOAVrIw8uwpRdnWDSMkqHHmNTGiceGbZqllTwEVi56QMcjQ980uPwElIuMIMSdmxrVi5KRKKYkYmpW4iSYzwEPVJR1OOQiLddLrPBB5pJYC9SOjDFZyqvztSFEo5Dn1dGSI1kPNeJv7ndRLVu3BTyuNEEY5nD/USBlUEs3BgQAXBio01QVhQ/VeIB0ggrHSEJR2CPb/yInFOa5c1+/asXL/J0yiCA3AIKsAG56AOGqAJHIDBI3gGr+DNeDJejHfjY1ZaMPKeffAHxucP0DKZkQ==</latexit><latexit sha1_base64="RTvl4VSKDR1Czf8GOPpJeflQwCo=">AAACDnicbVBNS8MwGE7n15xfVY9egkOZMEYrgnoQBl6GpwnWDbZS0jTbwtK0JKlslP0DL/4VLx5UvHr25r8x3XrQzQcSHp7nfd/kffyYUaks69soLC2vrK4V10sbm1vbO+bu3r2MEoGJgyMWibaPJGGUE0dRxUg7FgSFPiMtf3id+a0HIiSN+J0ax8QNUZ/THsVIackzj7t+xIKUTOAVrIw8uwpRdnWDSMkqHHmNTGiceGbZqllTwEVi56QMcjQ980uPwElIuMIMSdmxrVi5KRKKYkYmpW4iSYzwEPVJR1OOQiLddLrPBB5pJYC9SOjDFZyqvztSFEo5Dn1dGSI1kPNeJv7ndRLVu3BTyuNEEY5nD/USBlUEs3BgQAXBio01QVhQ/VeIB0ggrHSEJR2CPb/yInFOa5c1+/asXL/J0yiCA3AIKsAG56AOGqAJHIDBI3gGr+DNeDJejHfjY1ZaMPKeffAHxucP0DKZkQ==</latexit><latexit 
sha1_base64="RTvl4VSKDR1Czf8GOPpJeflQwCo=">AAACDnicbVBNS8MwGE7n15xfVY9egkOZMEYrgnoQBl6GpwnWDbZS0jTbwtK0JKlslP0DL/4VLx5UvHr25r8x3XrQzQcSHp7nfd/kffyYUaks69soLC2vrK4V10sbm1vbO+bu3r2MEoGJgyMWibaPJGGUE0dRxUg7FgSFPiMtf3id+a0HIiSN+J0ax8QNUZ/THsVIackzj7t+xIKUTOAVrIw8uwpRdnWDSMkqHHmNTGiceGbZqllTwEVi56QMcjQ980uPwElIuMIMSdmxrVi5KRKKYkYmpW4iSYzwEPVJR1OOQiLddLrPBB5pJYC9SOjDFZyqvztSFEo5Dn1dGSI1kPNeJv7ndRLVu3BTyuNEEY5nD/USBlUEs3BgQAXBio01QVhQ/VeIB0ggrHSEJR2CPb/yInFOa5c1+/asXL/J0yiCA3AIKsAG56AOGqAJHIDBI3gGr+DNeDJejHfjY1ZaMPKeffAHxucP0DKZkQ==</latexit>
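The task tuple above (a loss over episodes, an initial-state distribution, a transition distribution, and a horizon H) can be written down as a small data structure. A hedged Python sketch; all names here are illustrative, not from the paper's code:

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

State, Action = float, float
Episode = List[Tuple[State, Action]]  # e = (x1, a1, ..., xH, aH)

@dataclass
class Task:
    loss: Callable[[List[Episode]], float]        # L(e1, ..., eK)
    init_dist: Callable[[], State]                # x1 ~ q(x)
    transition: Callable[[State, Action], State]  # x_{t+1} ~ q(x | x_t, a_t)
    horizon: int                                  # H

def rollout(task: Task, policy: Callable[[State], Action]) -> Episode:
    """Sample one episode of length H from the task."""
    x = task.init_dist()
    episode = []
    for _ in range(task.horizon):
        a = policy(x)
        episode.append((x, a))
        x = task.transition(x, a)
    return episode
```

In supervised learning the horizon is H = 1 and the transition is trivial, which is how the same definition covers classification, regression, and RL.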
33. Experiment on Few-Shot Classification
• Omniglot (Lake et al., 2011)
• 50 different alphabets, 1,623 characters in total.
• 20 instances of each character, drawn by 20 different people.
• 1,200 characters for training, 423 for test.
• Mini-Imagenet (Ravi & Larochelle, 2017)
• Classes for each set: train=64, validation=12, test=24.
5. Experiment
35. Effect of 1st Order Approximation
• With the 1st-order approximation, computation is roughly 33% faster.
• Performances are similar: most results are not significantly different,
and some are even slightly better.
5. Experiment
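The first-order approximation simply drops the Hessian term from the meta-gradient. A 1-D toy with quadratic train/test losses (the constants are made up for illustration) shows the two meta-gradients side by side:

```python
# First-order MAML on a 1-D toy problem. All constants are illustrative.
a_tr, c_tr = 2.0, 1.0   # L_train(theta) = a_tr * (theta - c_tr)**2
a_te, c_te = 1.0, 3.0   # L_test(theta)  = a_te * (theta - c_te)**2
alpha = 0.01            # inner step size

theta = 0.0
g_train = 2 * a_tr * (theta - c_tr)       # inner gradient
theta_prime = theta - alpha * g_train     # adapted parameter

g_test = 2 * a_te * (theta_prime - c_te)  # gradient at the adapted point

# Full meta-gradient: d L_test(theta') / d theta
#   = (1 - alpha * H_train) * g_test, where H_train = 2 * a_tr is the Hessian.
full_grad = (1 - alpha * 2 * a_tr) * g_test
# The first-order approximation drops the Hessian term:
fo_grad = g_test
```

With a small inner step size and a small Hessian, the two gradients are close (here they differ by about 4%), which is consistent with the similar performances reported in the experiments.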
38. Experiments on Regression
• Sinusoid function:
• Amplitude (A) and phase (ɸ) are varied between tasks
• A in [0.1, 5.0]
• ɸ in [0, π]
• x in [-5.0, 5.0]
• Loss function: Mean Squared Error (MSE)
• Regressor: 2 hidden layers with 40 units and ReLU
• Training
• Use only 1 gradient step for learner
• K = 5 or 10 examples (5-shot or 10-shot learning)
• Fixed inner step size (α = 0.01); Adam as the meta-optimizer.
5. Experiment
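As a rough sketch of this setup, the task sampler and a single inner MSE gradient step look as follows. To keep it short, this uses a linear model rather than the 2-hidden-layer MLP described above, so it is illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_task():
    """Sample a sinusoid task: amplitude and phase vary between tasks."""
    A = rng.uniform(0.1, 5.0)
    phi = rng.uniform(0.0, np.pi)
    return lambda x: A * np.sin(x + phi)

def sample_shots(f, K):
    """Draw K labeled points with inputs x ~ U[-5, 5]."""
    x = rng.uniform(-5.0, 5.0, size=K)
    return x, f(x)

def inner_step(theta, x, y, alpha=0.01):
    """One MSE gradient step for a linear model y_hat = theta[0]*x + theta[1]."""
    err = theta[0] * x + theta[1] - y
    grad = np.array([np.mean(2 * err * x), np.mean(2 * err)])
    return theta - alpha * grad

f = sample_task()
x, y = sample_shots(f, K=10)       # 10-shot learning
theta = np.zeros(2)
theta_adapted = inner_step(theta, x, y)
```

In full MAML, the meta-objective would then be the loss of `theta_adapted` on fresh samples from the same task, differentiated back through `inner_step`.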
39. Results of 10-Shot Learning Regression
5. Experiment
[MAML] [Pretraining + Fine-tuning]
The red line is the ground truth.
The model must fit this sine function from only a few (10) samples.
40. Results of 10-Shot Learning Regression
5. Experiment
The plots above show the pre-update predictions of the two models
(the MAML meta-parameters, and the jointly trained parameters
of vanilla multi-task pretraining).
[MAML] [Pretraining + Fine-tuning]
41. Results of 10-Shot Learning Regression
5. Experiment
After 1 gradient step update.
[MAML] [Pretraining + Fine-tuning]
42. Results of 10-Shot Learning Regression
5. Experiment
[MAML] [Pretraining + Fine-tuning]
After 10 gradient-step updates.
43. Results of 10-Shot Learning Regression
5. Experiment
* step size 0.02
MAML quickly adapts to the new samples,
but the pretrained model fails to adapt.
[MAML] [Pretraining + Fine-tuning]
44. Results of 5-Shot Learning Regression
5. Experiment
* step size 0.01
[MAML] [Pretraining + Fine-tuning]
In the 5-shot setting, the difference is even more pronounced.
MAML also makes good predictions in input ranges where no samples were observed.
46. MAML Needs Only One Gradient Step
5. Experiment
The vanilla pretrained model adapts slowly,
but MAML adapts quickly, even within a single gradient step.
The meta-parameters θ of MAML are more sensitive (primed for fast adaptation)
than the pre-update parameters of the vanilla pretrained model.
48. MSE of the Meta-Parameters
• The performance of the meta-parameters themselves did not improve much
during training.
• However, the performance after a single gradient update starting from
the meta-parameters improved as training progressed.
5. Experiment
49. Experiments on Reinforcement Learning
• rllab benchmark suite
• Neural network policy with two hidden layers of size 100 with ReLU
• Gradient updates are computed with the vanilla policy gradient (REINFORCE);
trust-region policy optimization (TRPO) is used as the meta-optimizer.
• Comparison
• Pretraining one policy on all of the tasks and fine-tuning
• Training a policy from randomly initialized weights
• Oracle policy
5. Experiment
53. Why 1st Order Approximation Works?
• [Montufar14] The objective function of a ReLU network is locally linear.
• The same holds when other piecewise-linear activations
(maxout, leaky ReLU, ...) are used.
• Any composition of linear or piecewise-linear functions is itself
piecewise linear, and hence locally linear.
• Therefore the Hessian matrix is almost zero everywhere,
and the second-order term carries little information.
6. Discussion
[Figure: magnified view of the objective.
Solid: shallow network with 20 hidden units;
dotted: deep network (2 hidden layers) with 10 hidden units per layer.]
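The local-linearity claim can be checked numerically: away from its "kinks", a ReLU network has a vanishing second derivative. A small sketch with a random 1-input ReLU network (hypothetical weights, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

# A random ReLU network f: R -> R with one hidden layer of 10 units.
W1, b1 = rng.normal(size=10), rng.normal(size=10)
W2 = rng.normal(size=10)

def f(x):
    return W2 @ np.maximum(W1 * x + b1, 0.0)

# f is piecewise linear, with "kinks" where a hidden unit switches on/off.
kinks = -b1 / W1
# Pick an evaluation point well away from every kink.
candidates = np.linspace(-2.0, 2.0, 9)
x0 = candidates[np.argmax([np.min(np.abs(kinks - c)) for c in candidates])]

# Finite-difference second derivative: zero for a locally linear function.
eps = 1e-4
second = (f(x0 + eps) - 2 * f(x0) + f(x0 - eps)) / eps**2
```

With a smooth activation such as sigmoid, the same finite difference would generally be nonzero, which is consistent with the ReLU-vs-sigmoid comparison that follows.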
55. Why 1st Order Approximation Works?
• What happens if we change the activation function from ReLU to sigmoid?
6. Discussion

[ MSE Comparison ]
                          ReLU      sigmoid
w/o 1st-order approx.     0.1183    0.1579
w/  1st-order approx.     0.1183    0.1931

[ Learning Time Comparison (minutes) ]
                          ReLU               sigmoid
w/o 1st-order approx.     61                 96
w/  1st-order approx.     53 (13% faster)    52 (46% faster)
57. Why the Authors Focused on Deep Networks?
• The authors say their method (MAML) is model-agnostic.
• But why did they focus on deep networks?
• If the model is not a deep network, what problems can occur?
6. Discussion
58. Why the Authors Focused on Deep Networks?
• A simple linear regression model (y = θx) can be used with MAML:
it has a parametrized, differentiable loss function.
• Let
• Then, the meta-parameter θ will be 1.25.
6. Discussion
61. Why the Authors Focused on Deep Networks?
• Let's extend the setting.
• The sample is drawn from   for the task  .
• Then the meta-parameter  .
• MAML uses a fixed learning rate, so more than 2 gradient steps are needed
when the task parameter is far from the meta-parameter.
• However, because a deep model has many parameters, there may exist
meta-parameters that can quickly adapt to any task parameter.
• This is why we think the authors focused on deep models.
6. Discussion
64. Why the Authors Focused on Deep Networks?
• However, using a deep model doesn't eliminate the problem completely.
• The authors mention a generalization problem for extrapolated tasks:
the further a task lies outside the training distribution, the worse the performance.
• How can we solve this problem?
6. Discussion
65. Update Meta Parameters with N-steps
• MAML uses only one inner gradient step when updating the meta-parameters.
• The authors note that the meta-parameters can also be updated
using n gradient steps.
6. Discussion
66. Update Meta Parameters with N-steps
• Would updating with N gradient steps give better performance?
• If so, why didn't the authors use this method in their experiments?
• With N inner steps, MAML would find meta-parameters that are sensitive
under an N-step update.
• However, this requires (N + 1)-order partial derivatives
(cf. the Hessian matrix holds 2nd-order partial derivatives),
which is computationally expensive.
6. Discussion
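Combining the two ideas, an N-step inner loop under the first-order approximation needs no higher-order derivatives at all: the meta-gradient is just the test gradient at the N-step-adapted parameters. A 1-D toy sketch (constants are made up for illustration):

```python
# N-step inner adaptation with the 1st-order approximation (toy 1-D case).
a_tr, c_tr = 2.0, 1.0   # L_train(theta) = a_tr * (theta - c_tr)**2
a_te, c_te = 1.0, 3.0   # L_test(theta)  = a_te * (theta - c_te)**2
alpha, N = 0.01, 3      # inner step size and number of inner steps

theta = 0.0
theta_n = theta
for _ in range(N):
    theta_n -= alpha * 2 * a_tr * (theta_n - c_tr)   # inner gradient steps

# 1st-order meta-gradient: evaluate the test gradient at theta_n and
# treat theta_n as a constant w.r.t. theta (no (N+1)-order derivatives).
fo_meta_grad = 2 * a_te * (theta_n - c_te)
```

Each extra inner step only adds one more gradient evaluation, which matches the roughly linear growth in learning time reported in the experiments.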
69. Update Meta Parameters with N-steps
• How about using the 1st-order approximation with ReLU activations
and updating the meta-parameters with N steps?
6. Discussion
70. Update Meta Parameters with N-steps
• N-step updates with the 1st-order approximation (ReLU activation).
6. Discussion

                          1-step   2-steps   3-steps
Learning Time (minutes)   53       73        92

• The performance of the meta-parameters themselves gets worse as N grows,
• but the final post-adaptation performance may be unrelated to N.
(More experiments are needed to confirm this.)
73. Unsupervised Learning
• The authors say MAML is model-agnostic.
• They showed that MAML works for supervised learning.
• How about unsupervised learning?
• We can use an autoencoder: it lets unsupervised learning
be treated as supervised learning.
6. Discussion
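The autoencoder trick in code: each unlabeled input becomes its own regression target, so the same MSE-based inner loop used for supervised regression applies unchanged. A minimal linear-autoencoder sketch (shapes and learning rate are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))           # unlabeled data

# Reconstruction recasts the data as supervised pairs (x, target = x).
W_enc = 0.1 * rng.normal(size=(4, 2))   # encoder weights
W_dec = 0.1 * rng.normal(size=(2, 4))   # decoder weights

def recon_loss(W_enc, W_dec):
    return np.mean(((X @ W_enc) @ W_dec - X) ** 2)

# One plain gradient step on the reconstruction (MSE) loss.
lr = 0.05
H = X @ W_enc
err = H @ W_dec - X
g_dec = 2 * H.T @ err / X.size
g_enc = 2 * X.T @ (err @ W_dec.T) / X.size
before = recon_loss(W_enc, W_dec)
W_enc, W_dec = W_enc - lr * g_enc, W_dec - lr * g_dec
after = recon_loss(W_enc, W_dec)
```

This inner step is exactly the supervised MSE update, so in principle MAML's inner/outer loops could be applied to it without modification.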
75. How to Make MAML Faster?
• MAML is a slow algorithm, even with the 1st-order approximation.
• How can we make the MAML framework faster?
• [Zhenguo 2017] suggests optimizing the learning rate α:
instead of treating α as a hyperparameter,
train the vector α as parameters alongside the other parameters.
• It outperforms vanilla MAML.
6. Discussion
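The Meta-SGD idea in miniature: make the inner step size a learnable parameter and differentiate the meta-objective with respect to it, exactly as for θ (in Meta-SGD, α is a vector with one entry per parameter). A 1-D toy sketch with made-up quadratic losses:

```python
# Toy Meta-SGD sketch: the inner step size alpha becomes a learnable
# parameter, updated by the meta-optimizer together with theta.
a_tr, c_tr = 2.0, 1.0   # L_train(theta) = a_tr * (theta - c_tr)**2
a_te, c_te = 1.0, 3.0   # L_test(theta)  = a_te * (theta - c_te)**2

def meta_objective(theta, alpha):
    """L_test evaluated after one inner step of size alpha."""
    g_tr = 2 * a_tr * (theta - c_tr)
    theta_prime = theta - alpha * g_tr
    return a_te * (theta_prime - c_te) ** 2

theta, alpha, beta = 0.0, 0.01, 0.005   # beta is the meta learning rate
g_tr = 2 * a_tr * (theta - c_tr)
theta_prime = theta - alpha * g_tr
g_te = 2 * a_te * (theta_prime - c_te)

d_theta = (1 - alpha * 2 * a_tr) * g_te   # meta-gradient w.r.t. theta
d_alpha = -g_tr * g_te                    # meta-gradient w.r.t. alpha

theta -= beta * d_theta                   # both updated in the outer loop
alpha -= beta * d_alpha
```

Since the inner update is `theta - alpha * g_tr`, the derivative of the meta-objective with respect to α is just `-g_tr` times the test gradient at the adapted point, so learning α adds almost no extra computation.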
77. Application
• MAML is well suited to few-shot learning problems.
• Can you think of applications of few-shot learning?
• We think one of the most promising applications is real-estate price prediction.
• There are many transactions for big apartments, but few for small apartments,
so small-apartment prices are difficult to predict.
• MAML could address this insufficient sample size for small apartments.
• Other data-scarce problems could be tackled by MAML in the same way.
6. Discussion
80. Reference
• Finn, Chelsea, Pieter Abbeel, and Sergey Levine. "Model-agnostic
meta-learning for fast adaptation of deep networks." arXiv preprint
arXiv:1703.03400 (2017).
• Montufar, Guido F., et al. "On the number of linear regions of deep
neural networks." Advances in neural information processing
systems. 2014.
• Baldi, Pierre. "Autoencoders, unsupervised learning, and deep
architectures." Proceedings of ICML workshop on unsupervised and
transfer learning. 2012.
• Li, Zhenguo, et al. "Meta-SGD: Learning to learn quickly for few-shot
learning." arXiv preprint arXiv:1707.09835 (2017).