2. Contents
• Lifelong Learning
• Efficient Lifelong Learning Algorithm (ELLA[1]) [ICML’13]
• Online Multi-Task Learning for PG Methods (PG-ELLA[2]) [ICML’14]
• Limitation & Future Direction
[1] Ruvolo, Paul, and Eric Eaton. "ELLA: An efficient lifelong learning algorithm." ICML, 2013.
[2] Ammar, Haitham Bou, et al. "Online multi-task learning for policy gradient methods." ICML, 2014.
3. Motivation
• Transfer Learning
• Data from the source domain helps learn the target domain
• Less data is needed in the target domain
• Tasks must be similar
• Unidirectional: Source → Target
• Multi-Task Learning
• Given M tasks, train them jointly and simultaneously
• Increases overall performance across all tasks
• Lifelong Learning
• Learn multiple tasks sequentially over a lifetime (not in parallel)
• Learn new tasks using previously learned knowledge
• Can be evaluated at any time on any previously seen tasks
[Figure: the lifelong learning setting, in which a task provider supplies tasks sequentially to an agent that maintains a knowledge base, contrasted with transfer learning and multi-task learning.]
4. ELLA[1]: An Efficient Lifelong Learning Algorithm
• Multi-task Supervised Learning Problem
• Linear Regression
• Logistic Regression
[1] Ruvolo, Paul, and Eric Eaton. "ELLA: An efficient lifelong learning algorithm." ICML, 2013.
5. Task parameters
• Assume the task parameters factor as $\theta^{(t)} = L\,s^{(t)}$
• $L \in \mathbb{R}^{d \times k}$ is a latent matrix shared across all tasks
• $d$ is the input dimension
• Each of the $k$ columns is a latent basis
• $s^{(t)} \in \mathbb{R}^{k}$ is a task weight vector
• Each weight selects how much to use each basis
[Figure: $\theta^{(t)}$ (d, 1) $= L$ (d, k) $\cdot\ s^{(t)}$ (k, 1); example tasks such as "run", "wag tail & scratch hind leg", and "wag tail & run & bark" share columns of the latent matrix $L$.]
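To make the factorization concrete, here is a minimal NumPy sketch (the toy dimensions and all names are ours, not the paper's):

```python
import numpy as np

d, k = 8, 3                      # input dimension d, number of latent bases k
rng = np.random.default_rng(0)

L = rng.standard_normal((d, k))  # latent basis matrix, shared across all tasks
s_t = np.array([1.0, 0.0, 0.4])  # sparse task weight vector: mixes the bases

theta_t = L @ s_t                # task parameter vector theta^(t), shape (d,)
print(theta_t.shape)             # -> (8,)
```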
6. Learning Objective
• Goal: learn the latent matrix $L$ and the task weight vectors $s^{(t)}$ efficiently
• Each task weight vector $s^{(t)}$ is encouraged to be sparse so that the latent matrix $L$ captures a maximally reusable basis:
$$e_T(L) = \frac{1}{T}\sum_{t=1}^{T}\min_{s^{(t)}}\Big\{\frac{1}{n_t}\sum_{i=1}^{n_t}\mathcal{L}\big(f(x_i^{(t)};\,L s^{(t)}),\,y_i^{(t)}\big) + \mu\lVert s^{(t)}\rVert_1\Big\} + \lambda\lVert L\rVert_F^2 \tag{1}$$
• Since Eq. (1) is not jointly convex in $L$ and the $s^{(t)}$, alternating convex optimization is needed:
(a) While holding $L$ fixed, update $s^{(t)}$
(b) While holding $s^{(t)}$ fixed, update $L$
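A minimal sketch of one alternating round, assuming a squared loss and a plain gradient step on $L$ (the function names and solver choices are ours, not the paper's). Note the loop over all tasks inside `update_L`; the next slide calls out exactly this cost.

```python
import numpy as np
from sklearn.linear_model import Lasso

def update_s(L, X, y, mu):
    """(a) Hold L fixed, sparsely encode one task:
    min_s (1/n)||y - X @ L @ s||^2 + mu*||s||_1 (squared loss assumed).
    sklearn's Lasso minimizes (1/(2n))||y - A @ w||^2 + alpha*||w||_1,
    so alpha = mu/2 matches up to that convention."""
    lasso = Lasso(alpha=mu / 2, fit_intercept=False, max_iter=10000)
    lasso.fit(X @ L, y)
    return lasso.coef_

def update_L(L, tasks, S, lam, lr=1e-2):
    """(b) Hold every s^(t) fixed, take one gradient step on L.
    The inner summation runs over ALL tasks seen so far."""
    grad = 2 * lam * L                          # from lambda*||L||_F^2
    T = len(tasks)
    for (X, y), s in zip(tasks, S):
        n = len(y)
        r = X @ L @ s - y                       # residuals for task t
        grad += (2 / (n * T)) * np.outer(X.T @ r, s)
    return L - lr * grad
```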
7. First Inefficiency
• Let the task-specific objective for task $t$ be the inner term of Eq. (1):
$$\ell\big(L, s^{(t)}\big) = \frac{1}{n_t}\sum_{i=1}^{n_t}\mathcal{L}\big(f(x_i^{(t)};\,L s^{(t)}),\,y_i^{(t)}\big) + \mu\lVert s^{(t)}\rVert_1$$
• At each iteration $m$, a task $t$ is sampled; $s^{(t)}$ is updated with $L_m$ held fixed by minimizing $\ell(L_m, s^{(t)})$, then $L_m$ is updated with the $s^{(t)}$'s held fixed through $e_T(L)$.
• First inefficiency: while updating $L$ and $s^{(t)}$, the explicit dependence on all previous tasks through the inner summation of $e_T(L)$ is inefficient.
• If we obtain the optimal parameter for task $t = 1$, its optimal value will not change until task $t = 1$ is revisited.
[Diagram: Iteration 0 initializes $L_0$; Iteration 1 samples a task, updates $s^{(1)}$ with $L_0$ fixed, then updates $L_1$ with $s^{(1)}$ fixed; Iteration 2 repeats for $s^{(2)}$ and $L_2$, with each $L$-update re-evaluating the inner summation over all previously seen tasks.]
8. Resolving the First Inefficiency
Let the optimal predictor for task $t$ be
$$\theta^{(t)} = \arg\min_{\theta}\ \frac{1}{n_t}\sum_{i=1}^{n_t}\mathcal{L}\big(f(x_i^{(t)};\,\theta),\,y_i^{(t)}\big)$$
Solution: use the second-order Taylor expansion of the task loss around $\theta^{(t)}$:
$$\frac{1}{n_t}\sum_{i=1}^{n_t}\mathcal{L}\big(f(x_i^{(t)};\,L s^{(t)}),\,y_i^{(t)}\big)\ \approx\ \text{const} + \big\lVert \theta^{(t)} - L s^{(t)} \big\rVert_{D^{(t)}}^{2},\qquad D^{(t)} = \frac{1}{2}\,\nabla_{\theta}^{2}\left.\frac{1}{n_t}\sum_{i=1}^{n_t}\mathcal{L}\right|_{\theta=\theta^{(t)}}$$
The linear term is ignored, since the gradient at the optimal point is 0. Therefore, Eq. (1) becomes
$$\hat{e}_T(L) = \frac{1}{T}\sum_{t=1}^{T}\min_{s^{(t)}}\Big\{\big\lVert \theta^{(t)} - L s^{(t)} \big\rVert_{D^{(t)}}^{2} + \mu\lVert s^{(t)}\rVert_1\Big\} + \lambda\lVert L\rVert_F^2$$
where $\lVert v\rVert_A^2 = v^{\top} A\, v$. Once $\theta^{(t)}$ and $D^{(t)}$ are computed at that iteration, it is cheap to re-optimize $s^{(t)}$ until task $t$ is revisited.
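A sketch of the surrogate construction under a squared-loss assumption (for logistic regression, $\theta^{(t)}$ would come from an iterative solver and $D^{(t)}$ from the log-loss Hessian); the names are ours:

```python
import numpy as np

def fit_single_task(X, y, eps=1e-8):
    """Optimal predictor and (half-)Hessian for the loss (1/n)||X @ th - y||^2.
    The gradient vanishes at theta_star, so the Taylor expansion around it
    keeps only the constant and quadratic terms."""
    n, d = X.shape
    D = X.T @ X / n                           # D = (1/2) * Hessian = (1/n) X^T X
    theta_star = np.linalg.solve(D + eps * np.eye(d), X.T @ y / n)
    return theta_star, D

def surrogate_task_loss(theta_star, D, L, s):
    """Quadratic stand-in for the task loss: ||theta_star - L @ s||^2_D."""
    v = theta_star - L @ s
    return v @ D @ v
```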
9. Second Inefficiency
• To update a single candidate $L$, the values of all the $s^{(t)}$'s would have to be recomputed, which becomes expensive as the number of learned tasks $T$ increases.
• Solution: compute $s^{(t)}$ only when task $t$ is last encountered ($L$ converges as the iteration count $m$ increases).
[Diagram, original vs. proposed: originally, every update of $L_m$ re-optimizes all of $s^{(1)}, \dots, s^{(t)}$; in the proposed scheme each $s^{(t)}$ is computed once, by solving a LASSO problem when task $t$ is encountered, and only $L$ is refreshed afterwards.]
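Before turning to the $L$ update, here is a sketch of the one-shot encoding of $s^{(t)}$: factor $D^{(t)}$ so the weighted norm becomes an ordinary residual, then hand the problem to an off-the-shelf LASSO solver (scikit-learn here; the `alpha` rescaling bridges its objective convention and the one above):

```python
import numpy as np
from sklearn.linear_model import Lasso

def encode_task(theta_star, D, L, mu):
    """Solve min_s ||theta_star - L @ s||^2_D + mu*||s||_1 once, when task t
    is encountered; s^(t) is then frozen until the task is revisited."""
    d = len(theta_star)
    C = np.linalg.cholesky(D + 1e-8 * np.eye(d))   # D = C @ C.T (jitter for stability)
    A, b = C.T @ L, C.T @ theta_star               # ||C.T @ (theta - L @ s)||^2
    # sklearn's Lasso minimizes (1/(2d))||b - A @ s||^2 + alpha*||s||_1,
    # so alpha = mu/(2d) matches the objective above up to overall scaling.
    lasso = Lasso(alpha=mu / (2 * d), fit_intercept=False, max_iter=10000)
    lasso.fit(A, b)
    return lasso.coef_
```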
10. How to Update $L$?
• Setting the gradient of the objective with respect to $L$ to zero (matrix differentiation) yields the updated column-wise vectorization of $L$ as $\operatorname{vec}(L) = A^{-1}\,b$, where
$$A = \lambda\, I_{dk} + \frac{1}{T}\sum_{t=1}^{T}\big(s^{(t)} s^{(t)\top}\big) \otimes D^{(t)} \in \mathbb{R}^{dk \times dk},\qquad b = \frac{1}{T}\sum_{t=1}^{T} s^{(t)} \otimes \big(D^{(t)}\theta^{(t)}\big) \in \mathbb{R}^{dk}$$
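A sketch of this closed-form update with running accumulators, under the squared-loss surrogate of the previous slides (column-major `vec`; removing a task's stale contribution is just the subtraction of the same two terms):

```python
import numpy as np

def init_accumulators(d, k):
    """Running sums over tasks; the lambda*I term is added at solve time."""
    return np.zeros((d * k, d * k)), np.zeros(d * k)

def add_task(A, b, theta_star, D, s):
    """Add one task's contribution to A and b (use -= to remove a stale one)."""
    A = A + np.kron(np.outer(s, s), D)      # (s s^T) kron D
    b = b + np.kron(s, D @ theta_star)      # s kron (D theta)
    return A, b

def solve_L(A, b, T, lam, d, k):
    """vec(L) = (lam*I + A/T)^{-1} (b/T), un-vectorized column-wise to (d, k)."""
    vecL = np.linalg.solve(lam * np.eye(d * k) + A / T, b / T)
    return vecL.reshape((d, k), order="F")
```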
11. How to Update $L$?
1. Receive task $t$ and new data $X_{new}$, $y_{new}$
2. Compute the optimal predictor $\theta^{(t)}$ and Hessian term $D^{(t)}$
3. Update the task weight vector $s^{(t)}$
4. Update the latent matrix $L$: remove the previous contribution of $\theta^{(t)}, D^{(t)}, s^{(t)}$ from $A$ and $b$, then add the re-calculated $\theta^{(t)}, D^{(t)}, s^{(t)}$
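Putting steps 1-4 together, a compact end-to-end sketch under the same squared-loss assumption (all names are ours; the actual ELLA implementation adds further bookkeeping):

```python
import numpy as np
from sklearn.linear_model import Lasso

class ELLASketch:
    """Minimal sketch of the ELLA update loop (squared loss assumed)."""

    def __init__(self, d, k, mu=0.1, lam=0.1, seed=0):
        self.d, self.k, self.mu, self.lam = d, k, mu, lam
        self.L = np.random.default_rng(seed).standard_normal((d, k))
        self.A = np.zeros((d * k, d * k))   # running quadratic term
        self.b = np.zeros(d * k)            # running linear term
        self.seen = {}                      # task id -> (theta_star, D, s)

    def fit_task(self, X, y):
        """Steps 1-2: optimal predictor theta^(t) and half-Hessian D^(t)."""
        n = X.shape[0]
        D = X.T @ X / n
        theta = np.linalg.solve(D + 1e-8 * np.eye(self.d), X.T @ y / n)
        return theta, D

    def update(self, t, theta, D):
        """Steps 3-4: re-encode s^(t), swap its contribution in A, b, refresh L."""
        if t in self.seen:                  # remove the stale contribution
            th0, D0, s0 = self.seen[t]
            self.A -= np.kron(np.outer(s0, s0), D0)
            self.b -= np.kron(s0, D0 @ th0)
        C = np.linalg.cholesky(D + 1e-8 * np.eye(self.d))
        lasso = Lasso(alpha=self.mu / (2 * self.d), fit_intercept=False,
                      max_iter=10000)
        lasso.fit(C.T @ self.L, C.T @ theta)
        s = lasso.coef_
        self.A += np.kron(np.outer(s, s), D)
        self.b += np.kron(s, D @ theta)
        self.seen[t] = (theta, D, s)
        T = len(self.seen)
        vecL = np.linalg.solve(
            self.lam * np.eye(self.d * self.k) + self.A / T, self.b / T)
        self.L = vecL.reshape((self.d, self.k), order="F")

# Toy usage:
# model = ELLASketch(d=8, k=3)
# X, y = np.random.randn(50, 8), np.random.randn(50)
# theta, D = model.fit_task(X, y)
# model.update(0, theta, D)
```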
13. PG-ELLA[2]
• Following ELLA:
1. Obtain the optimal policy $\alpha^{(t)}$ for task $t$, and replace the Taylor expansion's second-order term with the Hessian of the policy-gradient objective around $\alpha^{(t)}$
2. To evaluate $L$, compute only the last encountered $s^{(t)}$
• Definitions: the policy parameters factor as $\theta^{(t)} = L s^{(t)}$; each task provides trajectories $\mathbb{T}^{(t)}$ and returns $\Re^{(t)}$
[2] Ammar, Haitham Bou, et al. "Online multi-task learning for policy gradient methods." ICML, 2014.
14. How to Update $L$?
1. Receive task $t$, new trajectories $\mathbb{T}^{(t)}$, and returns $\Re^{(t)}$
2. Compute the optimal policy $\alpha^{(t)}$ and Hessian $D^{(t)}$
3. Update the task weight vector $s^{(t)}$
4. Update the latent matrix $L$: remove the previous contribution of $\alpha^{(t)}, D^{(t)}, s^{(t)}$, then add the re-calculated $\alpha^{(t)}, D^{(t)}, s^{(t)}$
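The bookkeeping is unchanged from ELLA; only the single-task solution changes. A sketch reusing the `ELLASketch` class above, where `run_pg` and `pg_hessian` are hypothetical stand-ins for whatever policy-gradient method supplies the solution (they are not library functions):

```python
def pg_ella_step(model, t, trajectories, returns, run_pg, pg_hessian):
    """One PG-ELLA step: run_pg returns the (locally) optimal policy
    parameters alpha^(t); pg_hessian returns the Hessian of the PG lower
    bound evaluated at alpha^(t)."""
    alpha = run_pg(trajectories, returns)          # steps 1-2
    D = pg_hessian(alpha, trajectories, returns)
    model.update(t, alpha, D)                      # steps 3-4, as in ELLA
```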
15. PG-ELLA: Experiments
• Benchmarks: Simple Mass Damper, Cart-Pole, 3-Link Inverted Pendulum, Quadrotor
• 30 tasks are sampled per domain by varying the system parameters
• The dimension $k$ of the latent matrix $L$ was chosen for each domain via cross-validation ($k < 30$)
• M% tasks observed: only that M% of the tasks can update $L$
• Standard PG: 30 independently learned policies
[Table: system parameter ranges for each domain.]
[Figure: average performance on all 30 tasks, shown for the Quadrotor domain.]
• The more tasks contribute to updating the latent matrix $L$, the better the overall performance.
16. Limitation & Future Direction
• Limitations
• Linear model only
• Simple environments and a small number of tasks (only 30)
• Future Directions
1. A learnable $k$ (number of bases) that accounts for the number of tasks and task complexity
• The dimension $k$ could increase or decrease across iterations.
2. A deep version of the latent matrix $L$
• Hierarchical latent model