Lifelong Learning for Multi-task RL
2020. 10. 12
Jeong-Gwan Lee
Contents
• Lifelong Learning
• Efficient Lifelong Learning Algorithm (ELLA [1]) [ICML'13]
• Online Multi-Task Learning for PG Methods (PG-ELLA [2]) [ICML'14]
• Limitations & Future Directions
[1] Ruvolo, Paul, and Eric Eaton. "ELLA: An efficient lifelong learning algorithm." ICML, 2013.
[2] Ammar, Haitham Bou, et al. "Online multi-task learning for policy gradient methods." ICML, 2014.
Motivation
• Transfer Learning
• Data from the source domain helps with learning the target domain
• Less data is needed in the target domain
• Tasks must be similar
• Unidirectional: Source → Target
• Multi-Task Learning
• Given M tasks, train them jointly and simultaneously
• Increases overall performance across all tasks
• Lifelong Learning
• Learn multiple tasks sequentially over a lifetime (not in parallel)
• Learn new tasks using previously learned knowledge
• Can be evaluated at any time on any previously seen task
[Figure: diagram contrasting Transfer Learning, Multi-task Learning, and Lifelong Learning — a task provider, with no interest in the agent's performance, feeds tasks to an agent that maintains a knowledge base.]
ELLA[1]: An Efficient Lifelong Learning Algorithm
• Multi-task Supervised Learning Problem
• Linear Regression
• Logistic Regression
[1] Ruvolo, Paul, and Eric Eaton. "ELLA: An efficient lifelong learning algorithm." ICML, 2013.
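The per-task loss functions for these two settings appeared as equations on the slide image; a standard reconstruction (not copied from the deck) is:

```latex
% Linear regression (squared loss) and logistic regression (log loss) for task t,
% with n_t examples (x_i^{(t)}, y_i^{(t)}) and parameters \theta:
\mathcal{L}_{\text{lin}}^{(t)}(\theta) = \frac{1}{n_t}\sum_{i=1}^{n_t}
  \big( \theta^{\top} x_i^{(t)} - y_i^{(t)} \big)^{2},
\qquad
\mathcal{L}_{\text{log}}^{(t)}(\theta) = \frac{1}{n_t}\sum_{i=1}^{n_t}
  \log\!\big( 1 + \exp\big( -y_i^{(t)}\, \theta^{\top} x_i^{(t)} \big) \big),
\quad y_i^{(t)} \in \{-1, +1\}.
```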
Task Parameters
• Assume each task's parameters factor as 𝜽(t) = 𝑳 𝒔(t)
• 𝑳 is a latent matrix shared across all tasks
• 𝑑 is the input dimension
• Each of the 𝑘 columns is a latent basis vector
• 𝒔(t) is a task-specific weight vector
• Each weight selects how much each basis contributes to the task
[Figure: the latent matrix 𝑳 (d × k) multiplied by a task weight vector 𝒔(t) (k × 1) gives the task parameters 𝜽(t) (d × 1); example columns correspond to behaviors such as "Run", "Wag tail & scratch hind leg", and "Wag tail & Run & Bark".]
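The factorization above was shown as a diagram; written out (a reconstruction consistent with the dimensions on the slide):

```latex
\theta^{(t)} = L\, s^{(t)},
\qquad L \in \mathbb{R}^{d \times k},
\qquad s^{(t)} \in \mathbb{R}^{k},
\qquad \theta^{(t)} \in \mathbb{R}^{d}.
```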
Learning Objective
• Goal: learn the latent matrix 𝑳 and the task weight vectors 𝒔(t) efficiently
• Each task weight vector 𝒔(t) is encouraged to be sparse so that the latent matrix 𝑳 captures maximally reusable bases.
• Since Eq. (1) is not jointly convex in 𝑳 and the 𝒔(t), alternating optimization over the two convex subproblems is used:
(a) While holding 𝑳 fixed, update 𝒔(t)
(b) While holding the 𝒔(t) fixed, update 𝑳
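Eq. (1) itself was an image on the slide; the batch multi-task objective from the ELLA paper has roughly the following form, where ℒ is the task loss, 𝜇 controls the sparsity of 𝒔(t), and 𝜆 regularizes 𝑳:

```latex
e_T(L) = \frac{1}{T}\sum_{t=1}^{T}\;\min_{s^{(t)}}
\Bigg\{ \frac{1}{n_t}\sum_{i=1}^{n_t}
\mathcal{L}\big(f(x_i^{(t)};\, L\, s^{(t)}),\, y_i^{(t)}\big)
+ \mu \,\lVert s^{(t)} \rVert_1 \Bigg\}
+ \lambda \,\lVert L \rVert_{\mathrm{F}}^2
\tag{1}
```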
First Inefficiency
• First inefficiency: while updating 𝑳 and 𝒔(t), the objective depends explicitly on all previously seen tasks through an inner summation over their training data, which becomes increasingly expensive as more tasks are observed.
• However, once we obtain the optimal parameter 𝜽(t) for a task (e.g., 𝑡 = 1), that optimum does not change until task 𝑡 = 1 is revisited.
• Let the task-specific objective for task 𝑡 be its per-task loss plus the ℓ1 penalty on 𝒔(t).
• The naive alternating procedure looks as follows (see the sketch after this list):
• Iteration 0: initialize 𝑳0.
• Iteration 1: sample a task (e.g., 𝑡 = 1); update 𝒔(1) with 𝑳0 fixed through the task-1 objective; then update 𝑳1 with 𝒔(1) fixed through 𝑒T(𝑳) (inner summation over all tasks).
• Iteration 2: sample a task (e.g., 𝑡 = 2); update 𝒔(2) with 𝑳1 fixed; re-optimize 𝒔(1) with 𝑳1 fixed; then update 𝑳2 with 𝒔(1), 𝒔(2) fixed through 𝑒T(𝑳) (inner summation again).
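A minimal Python/NumPy sketch of the naive alternating scheme, to make the cost explicit: every iteration re-optimizes every previously seen 𝒔(t) against that task's full dataset before touching 𝑳. The squared-error loss and the ridge stand-in for the LASSO subproblem are assumptions made only so the sketch runs on its own, not part of the original procedure.

```python
import numpy as np

def naive_alternating_step(L, tasks, mu=0.1, lam=0.1):
    """One naive alternating-optimization step over ALL seen tasks.

    tasks: list of (X, y) pairs, one per previously seen task.
    Illustrates the first inefficiency: the cost grows with the
    number of tasks (and their data) on every single iteration.
    """
    d, k = L.shape
    S = []
    # (a) Re-optimize every task weight vector s^(t) with L fixed
    #     -- the "inner summation" over all previous tasks.
    for X, y in tasks:
        Z = X @ L
        # Ridge stand-in for the l1-penalized subproblem (assumption).
        s = np.linalg.solve(Z.T @ Z + mu * np.eye(k), Z.T @ y)
        S.append(s)
    # (b) Update L with all s^(t) fixed: least squares over vec(L).
    A = lam * np.eye(d * k)
    b = np.zeros(d * k)
    for (X, y), s in zip(tasks, S):
        Phi = np.kron(X, s[None, :])      # predictions are Phi @ L.reshape(-1)
        A += Phi.T @ Phi / len(y)
        b += Phi.T @ y / len(y)
    L_new = np.linalg.solve(A, b).reshape(d, k)
    return L_new, S
```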
Resolving First Inefficiency
• Solution: replace each task's loss with its second-order Taylor expansion around that task's optimal single-task predictor 𝜽(t)⋆.
• The linear term of the expansion is dropped, since the gradient at the optimal point is 0.
• Therefore, Eq. (1) is replaced by a surrogate in which each task contributes only through 𝜽(t)⋆ and the Hessian 𝑫(t) of its loss evaluated at 𝜽(t)⋆.
• Once 𝜽(t)⋆ and 𝑫(t) are computed at that iteration, re-optimization is cheap until task 𝑡 is revisited.
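The Taylor expansion and the resulting surrogate objective were images on the slides; in the notation of the ELLA paper they are approximately:

```latex
% Second-order expansion of the task-t loss
% g^{(t)}(\theta) = \tfrac{1}{n_t}\sum_i \mathcal{L}(f(x_i^{(t)};\theta),\, y_i^{(t)})
% around its minimizer \theta^{(t)\star} (the linear term vanishes there):
g^{(t)}(\theta) \approx g^{(t)}(\theta^{(t)\star})
  + \tfrac{1}{2}\,(\theta - \theta^{(t)\star})^{\top}
    \nabla^2 g^{(t)}(\theta^{(t)\star})\,(\theta - \theta^{(t)\star})

% Substituting \theta = L s^{(t)} and dropping constants, Eq. (1) becomes
\hat{e}_T(L) = \frac{1}{T}\sum_{t=1}^{T}\;\min_{s^{(t)}}
\Big\{ \big\lVert \theta^{(t)\star} - L\,s^{(t)} \big\rVert_{D^{(t)}}^{2}
  + \mu\,\lVert s^{(t)} \rVert_1 \Big\}
  + \lambda\,\lVert L \rVert_{\mathrm{F}}^{2},
\qquad
\lVert v \rVert_{A}^{2} \equiv v^{\top} A\, v,
\quad D^{(t)} = \tfrac{1}{2}\nabla^2 g^{(t)}(\theta^{(t)\star}).
```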
Second Inefficiency
• To evaluate a single candidate 𝑳, all of the 𝒔(t)'s would have to be recomputed, which becomes expensive as the number of learned tasks T increases.
• Solution: compute 𝒔(t) only when task 𝑡 is encountered, and keep it fixed afterward (𝑳 converges as the iteration count 𝒎 increases, so stale 𝒔(t)'s remain adequate).
[Figure: the "Original" scheme re-optimizes every 𝒔(1), …, 𝒔(T) each time 𝑳 is updated; the "Proposed" scheme computes 𝒔(t) once, by solving a LASSO problem, when task 𝑡 is encountered and then only updates 𝑳. Remaining question: how to compute 𝑳?]
How to Update 𝑳?
• Setting the derivative of the surrogate objective with respect to 𝑳 to zero (matrix differentiation) yields a linear system in the column-wise vectorization of 𝑳: 𝑨 vec(𝑳) = 𝒃, with 𝑨 of size (dk × dk) and 𝒃 of size (dk × 1), so the updated vectorization of 𝑳 is 𝑨⁻¹𝒃.
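𝑨 and 𝒃 can be maintained incrementally rather than rebuilt from all tasks; each time a task 𝑡 is observed, the ELLA paper updates them roughly as follows (vec denotes column-wise stacking, ⊗ the Kronecker product):

```latex
A \leftarrow A + \big(s^{(t)} s^{(t)\top}\big) \otimes D^{(t)},
\qquad
b \leftarrow b + \operatorname{vec}\!\big(s^{(t)\top} \otimes \big(\theta^{(t)\star\top} D^{(t)}\big)\big),
\qquad
\operatorname{vec}(L) = \Big(\tfrac{1}{T}A + \lambda\, I_{kd}\Big)^{-1} \tfrac{1}{T}\, b .
```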
How to Update 𝑳?
1. Receive task 𝒕 with new data 𝑿new, 𝒚new
2. Compute the optimal single-task predictor 𝜽(t)⋆ and the Hessian 𝑫(t)
3. Update the task weight vector 𝒔(t) (solving the LASSO problem with 𝑳 fixed)
4. Update the latent matrix 𝑳:
• Remove task 𝑡's previous contribution (its old 𝜽(t)⋆, 𝑫(t), 𝒔(t)) from 𝑨 and 𝒃
• Add the contribution of the re-computed 𝜽(t)⋆, 𝑫(t), 𝒔(t), then recover 𝑳 from 𝑨⁻¹𝒃
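A compact sketch of steps 1–4 for the linear-regression case (Python/NumPy; scikit-learn's `Lasso` stands in for the sparse-coding step, and the closed-form single-task fit, the regularization constants, and the class layout are illustrative assumptions, not the reference implementation):

```python
import numpy as np
from sklearn.linear_model import Lasso

class ELLA:
    def __init__(self, d, k, mu=1e-2, lam=1e-2):
        self.d, self.k, self.mu, self.lam = d, k, mu, lam
        self.L = np.random.randn(d, k)
        self.A = np.zeros((d * k, d * k))
        self.b = np.zeros(d * k)
        self.T = 0              # number of distinct tasks seen so far
        self.tasks = {}         # t -> (theta_star, D, s)

    def observe(self, t, X, y):
        n = len(y)                                    # 1. new data for task t
        # 2. single-task optimum and D = 1/2 * Hessian of the mean squared loss
        theta = np.linalg.solve(X.T @ X / n + 1e-8 * np.eye(self.d), X.T @ y / n)
        D = X.T @ X / n
        # 4a. if the task was seen before, subtract its old contribution
        if t in self.tasks:
            self._accumulate(*self.tasks[t], sign=-1.0)
        else:
            self.T += 1
        # 3. sparse task weights: minimize ||theta - L s||_D^2 + mu * ||s||_1
        C = np.linalg.cholesky(D + 1e-8 * np.eye(self.d))   # D = C C^T
        s = Lasso(alpha=self.mu, fit_intercept=False).fit(C.T @ self.L, C.T @ theta).coef_
        # 4b. add the recomputed contribution and refresh L
        self.tasks[t] = (theta, D, s)
        self._accumulate(theta, D, s, sign=+1.0)
        A = self.A / self.T + self.lam * np.eye(self.d * self.k)
        self.L = np.linalg.solve(A, self.b / self.T).reshape(self.d, self.k)

    def _accumulate(self, theta, D, s, sign):
        # Row-major-vec equivalents of the Kronecker updates for A and b.
        self.A += sign * np.kron(D, np.outer(s, s))
        self.b += sign * np.kron(D @ theta, s)
```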
Policy Gradient RL
• Goal: learn an optimal policy that maximizes the expected return
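The objective and its gradient appeared as equations on the slide; the standard REINFORCE-style forms, for a trajectory 𝜏 of horizon H sampled from policy 𝜋_𝜃, are:

```latex
\mathcal{J}(\theta) = \mathbb{E}_{\tau \sim p_{\pi_\theta}}\big[ \mathcal{R}(\tau) \big]
= \mathbb{E}\Big[ \textstyle\sum_{h=0}^{H-1} \gamma^{h} r_h \Big],
\qquad
\nabla_\theta \mathcal{J}(\theta) =
\mathbb{E}_{\tau}\Big[ \mathcal{R}(\tau)\, \textstyle\sum_{h=0}^{H-1}
\nabla_\theta \log \pi_\theta(a_h \mid s_h) \Big].
```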
PG-ELLA [2]
• Following ELLA:
1. Obtain the optimal policy parameters 𝜶(t) for task 𝑡, and take the second-order Taylor term from the policy-gradient objective (its Hessian at 𝜶(t)) instead of a supervised loss
2. To evaluate 𝑳, only compute the 𝒔(t) of the most recently encountered task
• Definition
[2] Ammar, Haitham Bou, et al. "Online multi-task learning for policy gradient methods." ICML, 2014.
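The definitions on this slide were images; structurally, PG-ELLA reuses the ELLA surrogate with the supervised quantities swapped for policy-gradient ones, roughly as below (a hedged reconstruction, where 𝛤(t) denotes the second-order/Hessian term of task 𝑡's policy-gradient objective, or of its lower bound, evaluated at 𝜶(t)):

```latex
\hat{e}_T(L) = \frac{1}{T}\sum_{t=1}^{T}\;\min_{s^{(t)}}
\Big\{ \big\lVert \alpha^{(t)} - L\,s^{(t)} \big\rVert_{\Gamma^{(t)}}^{2}
  + \mu\,\lVert s^{(t)} \rVert_1 \Big\}
  + \lambda\,\lVert L \rVert_{\mathrm{F}}^{2}.
```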
How to Update 𝑳?
1. Receive task 𝒕 with new trajectories 𝕋(t) and returns ℜ(t)
2. Compute the optimal policy parameters 𝜶(t) and the Hessian 𝑫(t)
3. Update the task weight vector 𝒔(t)
4. Update the latent matrix 𝑳:
• Remove task 𝑡's previous contribution (its old 𝜶(t), 𝑫(t), 𝒔(t))
• Add the contribution of the re-computed 𝜶(t), 𝑫(t), 𝒔(t)
PG-ELLA: Experiments
• Benchmarks: Simple Mass Damper, Cart-Pole, 3-Link Inverted Pendulum, Quadrotor
• Sample 30 tasks per domain by varying the system parameters
• The dimension 𝒌 of the latent matrix 𝑳 was chosen for each domain via cross-validation (k < 30)
• "M% tasks observed": only M% of the tasks are used to update 𝑳
• Standard PG: 30 independently learned policies
[Table: system parameter ranges for each domain, including the quadrotor. Figure: average performance on all 30 tasks (quadrotor domain shown) as the fraction of observed tasks varies; legend distinguishes the observed M% of tasks from the remaining (1 − M)%.]
• The more tasks contribute to updating the latent matrix 𝑳, the better the overall performance.
Limitations & Future Directions
• Limitations
• Linear model
• Simple environments, and the number of tasks is small (only 30)
• Future Directions
1. A learnable 𝑘 (number of bases) that accounts for the number of tasks and their complexity
• The dimension 𝑘 could grow or shrink across iterations.
2. A deep version of the latent matrix 𝑳
• Hierarchical latent model