Lifelong Learning for Multi-task RL
2020. 10. 12
Jeong-Gwan Lee
Contents
• Lifelong Learning
• Efficient Lifelong Learning Algorithm (ELLA[1]) [ICML’13]
• Online Multi-Task Learning for PG Methods (PG-ELLA[2]) [ICML’14]
• Limitation & Future Direction
[1] Ruvolo, Paul, and Eric Eaton. "ELLA: An efficient lifelong learning algorithm." ICML, 2013.
[2] Ammar, Haitham Bou, et al. "Online multi-task learning for policy gradient methods." ICML, 2014.
Motivation
• Transfer Learning
• Data in the source domain helps learning the target domain
• Less data is needed in the target domain
• Tasks must be similar
• Unidirectional: Source → Target
• Multi-Task Learning
• Given M tasks, train them jointly
• Increase overall performance across all tasks
• Lifelong Learning
• Learn multiple tasks sequentially over a lifetime (not in parallel)
• Learn new tasks using previously learned knowledge
• Can be evaluated at any time on any previously seen tasks
[Figure: a task provider, an agent, and a knowledge base, shown next to panels labeled "Transfer Learning" and "Multi-task Learning"; annotation: "No interest of Performance"]
ELLA[1]: An Efficient Lifelong Learning Algorithm
• Multi-task Supervised Learning Problem
• Linear Regression
• Logistic Regression
Task parameters
• Assume the task parameters factorize as $\theta^{(t)} = L s^{(t)}$
• $L \in \mathbb{R}^{d \times k}$ is a latent matrix shared across all tasks
• $d$ is the input dimension
• Each of the $k$ columns is a latent basis
• $s^{(t)} \in \mathbb{R}^{k}$ is a task weight vector
• Each weight selects how much to use each basis
[Figure: task parameters (d, 1) = latent matrix $L$ (d, k) × task weight vector $s^{(t)}$ (k, 1). Example tasks built from shared bases: "Run"; "Wag tail & scratch hind leg"; "Wag tail & Run & Bark".]
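The factorization above can be sketched in a few lines of plain Python. This is only an illustration: the numbers, the sizes d = 3 and k = 2, and the helper name `task_parameters` are assumptions, not values from the slides.

```python
# Hypothetical toy numbers: d = 3 input dimensions, k = 2 latent bases.
L = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]   # latent matrix, shape (d, k); each column is one basis
s = [2.0, 0.0]     # sparse task weight vector, shape (k,)


def task_parameters(L, s):
    """Return theta = L s, a weighted sum of the columns (bases) of L."""
    return [sum(row[j] * s[j] for j in range(len(s))) for row in L]


theta = task_parameters(L, s)  # shape (d,)
print(theta)
```

Because the second weight is zero, this task uses only the first basis, which is the behavior the sparsity assumption is meant to encourage.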
Learning Objective
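• Goal: learn the latent matrix $L$ and the task weight vectors $s^{(t)}$ efficiently, using the objective from ELLA [1]:
$e_T(L) = \frac{1}{T} \sum_{t=1}^{T} \min_{s^{(t)}} \left\{ \frac{1}{n_t} \sum_{i=1}^{n_t} \mathcal{L}\left(f(x_i^{(t)}; L s^{(t)}), y_i^{(t)}\right) + \mu \|s^{(t)}\|_1 \right\} + \lambda \|L\|_F^2$   (1)
• Each task weight vector $s^{(t)}$ is encouraged to be sparse, so that the latent matrix $L$ captures a maximally reusable basis.
• Since Eq. (1) is not jointly convex in $L$ and $s^{(t)}$, alternating convex optimization is needed:
(a) While holding $L$ fixed, update $s^{(t)}$
(b) While holding $s^{(t)}$ fixed, update $L$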
First Inefficiency
Iteration 0: Initialize $L_0$
Iteration 1: Sample a task
  Update $s^{(1)}$ while $L_0$ is fixed, through $\mathcal{L}(\theta^{(1)})$ + l1
  Update $L_1$ while $s^{(1)}$ is fixed, through $e_T(L)$
Iteration 2: Sample a task
  Update $s^{(2)}$ while $L_1$ is fixed, through $\mathcal{L}(\theta^{(2)})$ + l1
  Update $s^{(1)}$ while $L_1$ is fixed, through $\mathcal{L}(\theta^{(1)})$ + l1
  Update $L_2$ while $s^{(1)}, s^{(2)}$ are fixed, through $e_T(L)$
First inefficiency: when updating $L$ and $s^{(t)}$, the explicit dependence on all previous tasks through the inner summation is inefficient.
If we have the optimal parameter for task $t = 1$, that optimal value does not change until task $t = 1$ is revisited.
Let the task-specific objective for task $t$ be
$\frac{1}{n_t} \sum_{i=1}^{n_t} \mathcal{L}\left(f(x_i^{(t)}; \theta), y_i^{(t)}\right)$
Resolving First Inefficiency
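The optimal predictor for task $t$ is
$\theta_\star^{(t)} = \arg\min_{\theta} \frac{1}{n_t} \sum_{i=1}^{n_t} \mathcal{L}\left(f(x_i^{(t)}; \theta), y_i^{(t)}\right)$
Solution: use the second-order Taylor expansion of the task-specific objective around $\theta = \theta_\star^{(t)}$.
The linear term is dropped, since the gradient at the optimal point is 0.
Therefore, Eq. (1) is changed into
$g_T(L) = \frac{1}{T} \sum_{t=1}^{T} \min_{s^{(t)}} \left\{ \|\theta_\star^{(t)} - L s^{(t)}\|_{D^{(t)}}^2 + \mu \|s^{(t)}\|_1 \right\} + \lambda \|L\|_F^2$
where
$D^{(t)} = \frac{1}{2} \nabla^2_{\theta,\theta} \frac{1}{n_t} \sum_{i=1}^{n_t} \mathcal{L}\left(f(x_i^{(t)}; \theta), y_i^{(t)}\right) \Big|_{\theta = \theta_\star^{(t)}}$ and $\|v\|_A^2 = v^\top A v$.
Once $\theta_\star^{(t)}$ and $D^{(t)}$ are computed at that iteration, they can be reused for re-optimization until task $t$ is revisited.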
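As a sanity check on the Taylor-expansion step, the sketch below uses linear regression with squared loss, where the second-order expansion is exact, so the loss gap to the single-task optimum equals the quadratic form with D taken as half the Hessian. The data values and function names are hypothetical and chosen only for illustration.

```python
# Toy data for one task (hypothetical values): n = 3 samples, d = 2 features.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 2.0]
n = len(X)


def loss(theta):
    """Mean squared error of the linear predictor x -> x . theta."""
    return sum((x[0] * theta[0] + x[1] * theta[1] - t) ** 2
               for x, t in zip(X, y)) / n


# D = (1/2) * Hessian of the loss = (1/n) X^T X, a 2 x 2 matrix.
D = [[sum(x[i] * x[j] for x in X) / n for j in range(2)] for i in range(2)]
# c = (1/n) X^T y; the single-task optimum solves D theta = c.
c = [sum(x[i] * t for x, t in zip(X, y)) / n for i in range(2)]
det = D[0][0] * D[1][1] - D[0][1] * D[1][0]
theta_star = [(D[1][1] * c[0] - D[0][1] * c[1]) / det,
              (D[0][0] * c[1] - D[1][0] * c[0]) / det]


def quadratic_gap(theta):
    """(theta - theta_star)^T D (theta - theta_star)."""
    diff = [theta[0] - theta_star[0], theta[1] - theta_star[1]]
    return sum(diff[i] * D[i][j] * diff[j]
               for i in range(2) for j in range(2))


theta = [0.5, -1.0]                         # any candidate, e.g. theta = L s
exact = loss(theta) - loss(theta_star)      # true increase in loss
approx = quadratic_gap(theta)               # second-order surrogate
print(exact, approx)
```

For losses that are not quadratic, such as logistic regression, the two quantities would agree only approximately near the optimum.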
Second Inefficiency
• To evaluate a single candidate $L$, all of the $s^{(t)}$ must be recomputed, which becomes expensive as the number of learned tasks $T$ increases.
Solution: compute $s^{(t)}$ only when task $t$ is last encountered. ($L$ will converge as the iteration $m$ increases.)
[Figure: Original vs. Proposed. Original: along the iterations $L_1, L_2, L_3, \dots, L_m$, all previously seen $s^{(1)}, s^{(2)}, s^{(3)}, \dots$ are re-optimized. Proposed: at each iteration only the current task's $s^{(t)}$ is computed, by solving a LASSO problem.]
How to compute $L$?
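The LASSO step for a single task can be sketched with coordinate descent and soft-thresholding. This is one possible solver and not necessarily the one used in the papers; the function names, the fixed number of sweeps, and the toy inputs are assumptions. It minimizes mu * ||s||_1 + (theta - L s)^T D (theta - L s) for a symmetric positive semidefinite D.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (proximal step of the l1 norm)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def solve_task_weights(L, D, theta, mu, sweeps=100):
    """Coordinate descent for mu*||s||_1 + (theta - L s)^T D (theta - L s).

    L is (d, k), D is a symmetric (d, d) matrix, theta has length d.
    """
    d, k = len(L), len(L[0])
    # DL = D L, Q = L^T D L (k x k), c = L^T D theta (length k).
    DL = [[sum(D[i][m] * L[m][j] for m in range(d)) for j in range(k)]
          for i in range(d)]
    Q = [[sum(L[m][i] * DL[m][j] for m in range(d)) for j in range(k)]
         for i in range(k)]
    c = [sum(DL[m][j] * theta[m] for m in range(d)) for j in range(k)]
    s = [0.0] * k
    for _ in range(sweeps):
        for j in range(k):
            if Q[j][j] == 0.0:
                continue  # this basis has no effect on the fit; keep s[j] = 0
            # Residual correlation with basis j, excluding its own contribution.
            r = c[j] - sum(Q[j][i] * s[i] for i in range(k) if i != j)
            s[j] = soft_threshold(r, mu / 2.0) / Q[j][j]
    return s


# Hypothetical example: identity bases and identity D.
identity = [[1.0, 0.0], [0.0, 1.0]]
s = solve_task_weights(identity, identity, [3.0, 0.1], mu=1.0)
print(s)
```

In this example the small second coordinate is shrunk exactly to zero, which illustrates how the l1 penalty produces sparse task weight vectors.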
How to Update 𝑳?
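• This procedure yields the updated column-wise vectorization of $L$ as $A^{-1} b$, where
$A = \lambda I + \frac{1}{T} \sum_{t} \left(s^{(t)} s^{(t)\top}\right) \otimes D^{(t)}$   (d*k, d*k)
$b = \frac{1}{T} \sum_{t} \mathrm{vec}\left(D^{(t)} \theta_\star^{(t)} s^{(t)\top}\right)$   (d*k, 1)
• Matrix differentiation: setting the gradient with respect to $L$ to zero gives
$\lambda L + \frac{1}{T} \sum_{t} D^{(t)} L s^{(t)} s^{(t)\top} = \frac{1}{T} \sum_{t} D^{(t)} \theta_\star^{(t)} s^{(t)\top}$
with shapes (d, d) (d, k) (k, 1) (1, k) in the sum on the left and (d, d) (d, 1) (1, k) on the right. Vectorizing both sides column-wise gives $A\,\mathrm{vec}(L) = b$, with $\mathrm{vec}(L)$ of shape (d*k, 1).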
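The closed-form update of L can be sketched in plain Python as follows. The helpers `kron`, `solve`, and `update_latent_matrix` are hypothetical names, the linear solve is a basic Gaussian elimination, and the two-task example is made up for illustration; it assumes lam > 0 so that A is invertible.

```python
def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def update_latent_matrix(tasks, d, k, lam):
    """Return L (d x k) from vec(L) = A^{-1} b.

    tasks is a list of (s, theta, D) triples, one per task seen so far.
    """
    T = len(tasks)
    A = [[lam if i == j else 0.0 for j in range(d * k)] for i in range(d * k)]
    b = [0.0] * (d * k)
    for s, theta, D in tasks:
        ssT = [[s_i * s_j for s_j in s] for s_i in s]
        K = kron(ssT, D)                                   # (d*k, d*k)
        D_theta = [sum(D[i][m] * theta[m] for m in range(d)) for i in range(d)]
        for i in range(d * k):
            for j in range(d * k):
                A[i][j] += K[i][j] / T
        for j in range(k):                                 # column-wise vec
            for i in range(d):
                b[j * d + i] += s[j] * D_theta[i] / T
    vec_L = solve(A, b)
    return [[vec_L[j * d + i] for j in range(k)] for i in range(d)]


# Hypothetical example: two tasks, each using exactly one basis, with D = I.
I2 = [[1.0, 0.0], [0.0, 1.0]]
tasks = [([1.0, 0.0], [1.0, 2.0], I2),
         ([0.0, 1.0], [3.0, 4.0], I2)]
L_new = update_latent_matrix(tasks, d=2, k=2, lam=0.5)
print(L_new)
```

With these inputs each column of L is the corresponding task optimum shrunk by the ridge term, which can be verified by hand from the zero-gradient condition.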
How to Update 𝑳?
1. Get task $t$ and new data $X_{new}, y_{new}$
2. Get the optimal predictor $\theta_\star^{(t)}$ and Hessian $D^{(t)}$
3. Update the task weight vector $s^{(t)}$
4. Update the latent matrix $L$
   Remove the previous $\theta_\star^{(t)}, D^{(t)}, s^{(t)}$
   Update with the re-calculated $\theta_\star^{(t)}, D^{(t)}, s^{(t)}$
Policy Gradient RL
• Policy Gradient RL
Goal : Learn optimal policy that maximizes the expected return
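The expected return is usually estimated from sampled trajectories. The sketch below is a generic Monte Carlo estimate, not the specific return definition used in PG-ELLA; the function names, the discount factor default, and the reward values are assumptions.

```python
def trajectory_return(rewards, gamma=1.0):
    """Discounted sum of the rewards collected along one trajectory."""
    return sum((gamma ** h) * r for h, r in enumerate(rewards))


def estimate_expected_return(trajectories, gamma=1.0):
    """Average the returns of sampled trajectories (Monte Carlo estimate)."""
    returns = [trajectory_return(rewards, gamma) for rewards in trajectories]
    return sum(returns) / len(returns)


# Hypothetical rewards from two rollouts of the same policy.
sampled = [[1.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
estimate = estimate_expected_return(sampled)
print(estimate)
```

A policy gradient method adjusts the policy parameters in the direction that increases this quantity.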
PG-ELLA[2]
• Following ELLA,
1. Get the optimal policy parameters $\alpha^{(t)}$ for task $t$ and change the second-order term of the Taylor expansion accordingly
2. To evaluate $L$, compute $s^{(t)}$ only when task $t$ was last encountered
• Definition
How to Update 𝑳?
1. Get task $t$ and new trajectories $\mathbb{T}^{(t)}$ and returns $\mathfrak{R}^{(t)}$
2. Get the optimal policy $\alpha^{(t)}$ and Hessian $D^{(t)}$
3. Update the task weight vector $s^{(t)}$
4. Update the latent matrix $L$
   Remove the previous $\alpha^{(t)}, D^{(t)}, s^{(t)}$
   Update with the re-calculated $\alpha^{(t)}, D^{(t)}, s^{(t)}$
PG-ELLA : Experiment
• Benchmarks
Simple Mass Damper
Cart-Pole
3-Link Inverted Pendulum
Quadrotor
• Sample 30 tasks per domain by varying the system parameters
• The dimension $k$ of the latent matrix $L$ was chosen for each
domain via cross-validation (k < 30)
• M% tasks observed: only M% of the tasks are used to update $L$
• Standard PG: 30 independent policies
• System parameter ranges [table not recovered]
[Figure: average performance on all 30 tasks, Quadrotor]
• The more tasks contribute to update latent matrix L,
the better the overall performance.
Limitation & Future Direction
• Limitation
• Linear model
• Simple environments and a small number of tasks (only 30)
• Future Direction
1. Learnable $k$ (number of bases), taking into account the number of tasks and the task complexity
• The dimension $k$ can increase or decrease over the iterations.
2. Deep version of the latent matrix $L$
• Hierarchical latent model

Lifelong learning for multi-task learning

  • 1. Lifelong Learning for Multi-task RL 2020. 10. 12 Jeong-Gwan Lee 1 / 17
  • 2. Contents 2 / 17 • Lifelong Learning • Efficient Lifelong Learning Algorithm (ELLA[1]) [ICML’13] • Online Multi-Task Learning for PG Methods (PG-ELLA[2]) [ICML’14] • Limitation & Future Direction [2] Ammar, Haitham Bou, et al. "Online multi-task learning for policy gradient methods." ICML, 2014 [1] Ruvolo, Paul, and Eric Eaton. "ELLA: An efficient lifelong learning algorithm." ICML. 2013.
  • 3. Motivation 3 / 17 • Transfer Learning • Data in the source domain helps learning the target domain • Less data is needed in the target domain • Tasks must be similar • Unidirectional: Source à Target • Multi-Task Learning • Given M tasks, joint-train them simultaneously • Increase overall performance across all tasks • Lifelong Learning • Learn sequential multiple tasks in lifetime (Not parallel) • Learn new tasks using previous learned knowledge • Can be evaluated at any time on any previously seen tasks No interest of Performance Task Provider Agent Knowledge Base Transfer Learning Multi-task Learning
  • 4. ELLA[1]: An Efficient Lifelong Learning Algorithm 4 / 17 • Multi-task Supervised Learning Problem • Linear Regression • Logistic Regression [1] Ruvolo, Paul, and Eric Eaton. "ELLA: An efficient lifelong learning algorithm." ICML. 2013.
  • 5. Task parameters 5 / 17 • Assume task parameters • is latent matrix shared across all tasks • 𝑑 is the input dimension • Each of the 𝑘!" columns is a latent basis • is a task weight vector • Each weight selects how much to use each basis d k Run task Wag tail & scratch hind leg task Wag tail & Run & Bark taskLatent matrix 𝐿 Task weight vector 𝒔(") (d, 1) (d, k) (k, 1)
  • 6. Learning Objective
      • Goal: learn the latent matrix $L$ and the task weight vectors $s^{(t)}$ efficiently
      • Each task weight vector $s^{(t)}$ is encouraged to be sparse, so that the latent matrix $L$ captures a maximally reusable basis.
      • Since Eq. (1) is not jointly convex in $L$ and $s^{(t)}$, alternating convex optimization is needed:
          (a) While holding $L$ fixed, update $s^{(t)}$
          (b) While holding $s^{(t)}$ fixed, update $L$
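For reference, the objective that Eq. (1) points to can be written as below, in the form given in the ELLA paper [1]. The symbols $n_t$ (number of training examples for task $t$), $\mu$ (sparsity weight), and $\lambda$ (regularization weight on $L$) come from that paper rather than from these slides, so they should be checked against it.

```latex
e_T(L) = \frac{1}{T} \sum_{t=1}^{T} \min_{s^{(t)}}
  \left\{ \frac{1}{n_t} \sum_{i=1}^{n_t}
    \mathcal{L}\!\left( f\!\left(x_i^{(t)};\, L s^{(t)}\right),\, y_i^{(t)} \right)
    + \mu \left\lVert s^{(t)} \right\rVert_1 \right\}
  + \lambda \left\lVert L \right\rVert_F^2
```

The $\ell_1$ term is what encourages sparse $s^{(t)}$, and the inner sum over the $n_t$ examples is the "inner summation" discussed on the following slides.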
  • 7. First Inefficiency
      • Alternating updates, iteration by iteration:
          • Iteration 0: initialize $L_0$
          • Iteration 1: sample a task; update $s^{(1)}$ while $L_0$ is fixed, through the task loss on $\theta^{(1)}$ plus the $\ell_1$ penalty; then update $L_1$ while $s^{(1)}$ is fixed, through $e_T(L)$
          • Iteration 2: sample a task; update $s^{(2)}$ and $s^{(1)}$ while $L_1$ is fixed, each through its task loss plus the $\ell_1$ penalty; then update $L_2$ while $s^{(1)}$, $s^{(2)}$ are fixed, through $e_T(L)$
      • First inefficiency: while updating $L$ and $s^{(t)}$, the explicit dependence on all previous tasks through the inner summation is inefficient.
      • Once the optimal parameter for task $t = 1$ has been obtained, its optimal value does not change until task $t = 1$ is revisited.
      • Let the task-specific objective for task $t$ be the average loss over that task's training data, i.e. the inner summation.
  • 8. Resolving First Inefficiency
      • Solution: use a second-order Taylor expansion of the task-specific objective around the optimal predictor $\theta_\star^{(t)}$ for task $t$.
      • The linear term is ignored, since the gradient at the optimal point is 0.
      • Therefore, Eq. (1) is changed into a form that depends on the data of task $t$ only through $\theta_\star^{(t)}$ and the Hessian $D^{(t)}$.
      • Once $\theta_\star^{(t)}$ and $D^{(t)}$ are calculated at that iteration, they can be reused for re-optimization until task $t$ is revisited.
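A small numerical check of the Taylor-expansion step, as a sketch for the linear-regression case only. It assumes the convention from the ELLA paper [1] that $D^{(t)}$ is one half of the Hessian of the average loss at $\theta_\star^{(t)}$; the data and all variable names are made up for illustration. For squared loss the expansion is exact, so the gap should be at floating-point level.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 3
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)

def avg_loss(theta):
    """Average squared loss of a linear predictor on (X, y)."""
    r = X @ theta - y
    return float(r @ r) / n

# Optimal single-task predictor (least squares).
theta_star = np.linalg.lstsq(X, y, rcond=None)[0]

# Hessian of avg_loss is 2 X^T X / n; D is assumed to be half of it.
D = X.T @ X / n

def taylor_approx(theta):
    """Second-order expansion around theta_star; the linear term vanishes."""
    diff = theta - theta_star
    return avg_loss(theta_star) + float(diff @ D @ diff)

theta = rng.normal(size=d)  # an arbitrary candidate, e.g. L @ s
gap = abs(avg_loss(theta) - taylor_approx(theta))
```

After this step the task's raw data `X`, `y` is no longer needed: only `theta_star` and `D` have to be stored to evaluate the task's loss at a new candidate.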
  • 9. Second Inefficiency
      • To evaluate a single candidate $L$, the values of all $s^{(t)}$ must be recomputed, which becomes expensive as the number of tasks learned $T$ increases.
      • Solution: compute $s^{(t)}$ only when task $t$ is last encountered, by solving a LASSO problem ($L$ will converge as the iteration $m$ increases).
      [Figure: in the original scheme, every $s^{(t)}$ is re-optimized for each new $L_m$; in the proposed scheme, only the current task's $s^{(t)}$ is computed for each $L_m$]
      • How to compute $L$?
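The LASSO step for a single task can be sketched as follows. This assumes the objective $\mu \lVert s \rVert_1 + (\theta_\star - L s)^\top D (\theta_\star - L s)$ from the ELLA paper [1] and solves it with plain ISTA (proximal gradient), which is a swapped-in generic solver and not necessarily what the authors used. The function names, `mu`, and the test data are hypothetical.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def objective(s, L, theta_star, D, mu):
    """mu * ||s||_1 + (theta_star - L s)^T D (theta_star - L s)."""
    r = theta_star - L @ s
    return mu * np.abs(s).sum() + float(r @ D @ r)

def solve_task_weights(L, theta_star, D, mu, n_iters=500):
    """ISTA on the D-weighted LASSO objective, with L held fixed."""
    H = L.T @ D @ L
    # Lipschitz constant of the gradient of the smooth quadratic part.
    lipschitz = 2.0 * np.linalg.eigvalsh(H).max()
    step = 1.0 / max(lipschitz, 1e-12)
    s = np.zeros(L.shape[1])
    for _ in range(n_iters):
        grad = -2.0 * L.T @ D @ (theta_star - L @ s)
        s = soft_threshold(s - step * grad, step * mu)
    return s

rng = np.random.default_rng(0)
d, k = 5, 3
L = rng.normal(size=(d, k))
D = np.eye(d)                          # illustrative Hessian term
theta_star = L @ np.array([1.5, 0.0, 0.0])
mu = 0.1
s_hat = solve_task_weights(L, theta_star, D, mu)
```

Only the task that was just encountered goes through this solve; the weight vectors of all other tasks stay at their stored values.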
  • 10. How to Update $L$?
      • Matrix differentiation: this procedure yields the updated column-wise vectorization of $L$ as $A^{-1} b$.
      [Shape annotations: $D^{(t)}$ is (d, d), $L$ is (d, k), $s^{(t)}$ is (k, 1), $\theta_\star^{(t)}$ is (d, 1); $A$ is (d·k, d·k); $b$ and the vectorized $L$ are (d·k, 1)]
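The closed-form update of $L$ can be sketched with Kronecker products. The sketch assumes the objective $\lambda \lVert L \rVert_F^2 + \frac{1}{T} \sum_t (\theta_\star^{(t)} - L s^{(t)})^\top D^{(t)} (\theta_\star^{(t)} - L s^{(t)})$, following the ELLA paper [1]; setting its derivative with respect to $L$ to zero and vectorizing column-wise gives a linear system $A\,\mathrm{vec}(L) = b$. The function name, `lam`, and the random test data are hypothetical.

```python
import numpy as np

def update_latent_matrix(thetas, Ds, ss, lam):
    """Solve A vec(L) = b for the column-wise vectorization of L."""
    T = len(thetas)
    d, k = thetas[0].shape[0], ss[0].shape[0]
    A = lam * np.eye(d * k)
    b = np.zeros(d * k)
    for theta, D, s in zip(thetas, Ds, ss):
        # vec(D L s s^T) = kron(s s^T, D) vec(L) for column-major vec.
        A += np.kron(np.outer(s, s), D) / T
        # vec(D theta s^T) = kron(s, D theta).
        b += np.kron(s, D @ theta) / T
    vec_L = np.linalg.solve(A, b)
    return vec_L.reshape((d, k), order="F")  # undo column-wise vec

rng = np.random.default_rng(0)
d, k, T = 4, 2, 5
thetas = [rng.normal(size=d) for _ in range(T)]
Ds = []
for _ in range(T):
    M = rng.normal(size=(d, d))
    Ds.append(M @ M.T + np.eye(d))  # symmetric positive definite
ss = [rng.normal(size=k) for _ in range(T)]
lam = 0.1

L_new = update_latent_matrix(thetas, Ds, ss, lam)

# Stationarity check: the gradient (up to a factor 2) should vanish.
grad = lam * L_new + sum(
    D @ np.outer(L_new @ s - th, s) for th, D, s in zip(thetas, Ds, ss)
) / T
```

Because `A` and `b` are sums over tasks, one task's contribution can be subtracted and re-added when that task is revisited, without touching the others.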
  • 11. How to Update $L$?
      1. Get task $t$ and new data $X_{new}$, $y_{new}$
      2. Get the optimal predictor $\theta_\star^{(t)}$ and Hessian $D^{(t)}$
      3. Update the task weight vector $s^{(t)}$
      4. Update the latent matrix $L$
      • Remove the previous $\theta_\star^{(t)}$, $D^{(t)}$, $s^{(t)}$
      • Update with the re-calculated $\theta_\star^{(t)}$, $D^{(t)}$, $s^{(t)}$
  • 12. Policy Gradient RL
      • Goal: learn the optimal policy that maximizes the expected return
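One standard way to write the expected return that policy gradient methods maximize is given below. The notation follows the usual episodic formulation as used in the PG-ELLA paper [2]; the symbols $\tau$ (trajectory), $H$ (horizon), $P_0$ (initial-state distribution), and $\pi_\theta$ (policy) are assumptions here, since the slide's own formula is not reproduced in the text.

```latex
\mathcal{J}(\theta) = \int_{\mathbb{T}} p_\theta(\tau)\, \mathfrak{R}(\tau)\, d\tau,
\qquad
p_\theta(\tau) = P_0(x_1) \prod_{h=1}^{H} p(x_{h+1} \mid x_h, a_h)\, \pi_\theta(a_h \mid x_h)
```

Here $\mathfrak{R}(\tau)$ is the return of trajectory $\tau$ and $p_\theta(\tau)$ is the probability of that trajectory under the policy $\pi_\theta$.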
  • 13. PG-ELLA [2]
      • Following ELLA:
          1. Get the optimal policy $\alpha^{(t)}$ for task $t$ and change the second-order term of the Taylor expansion
          2. To evaluate $L$, use only the $s^{(t)}$ calculated when each task was last encountered
      • Definition
  • 14. How to Update $L$?
      1. Get task $t$, new trajectories $\mathbb{T}^{(t)}$, and returns $\mathfrak{R}^{(t)}$
      2. Get the optimal policy $\alpha^{(t)}$ and Hessian $D^{(t)}$
      3. Update the task weight vector $s^{(t)}$
      4. Update the latent matrix $L$
      • Remove the previous $\alpha^{(t)}$, $D^{(t)}$, $s^{(t)}$
      • Update with the re-calculated $\alpha^{(t)}$, $D^{(t)}$, $s^{(t)}$
  • 15. PG-ELLA: Experiment
      • Benchmarks: Simple Mass Damper, Cart-Pole, 3-Link Inverted Pendulum, Quadrotor
      • 30 tasks are sampled per domain by varying the system parameters
      • The dimension $k$ of the latent matrix $L$ was chosen for each domain via cross-validation ($k < 30$)
      • M% tasks observed: only M% of the tasks can update $L$
      • Standard PG: 30 independent policies
      [Table: system parameter ranges (Quadrotor)]
      [Figure: average performance on all 30 tasks (Quadrotor)]
      • The more tasks contribute to updating the latent matrix $L$, the better the overall performance.
  • 16. Limitation & Future Direction
      • Limitation
          • Linear model
          • Simple environments, and a small number of tasks (only 30)
      • Future Direction
          1. Learnable $k$ (number of bases), taking into account the number of tasks and task complexity
              • The dimension $k$ can increase or decrease over the iterations.
          2. Deep version of the latent matrix $L$
              • Hierarchical latent model