Skill-Based Meta-Reinforcement Learning
Taewook Nam, Shao-Hua Sun, Karl Pertsch, Sung Ju Hwang, Joseph J. Lim
Humans Leverage Prior Knowledge
Target task: “Cook a pancake”
Contrast: an SAC policy starts without the prior knowledge a human brings to the task.
Prior knowledge of skills (skill-based RL): how to hold a frying pan, how to turn on the stove
Prior knowledge of related tasks (meta-RL): “Make a sandwich”, “Fry an egg”
Skill-Based Reinforcement Learning [1, 2]
Task-agnostic dataset: source of skills such as how to turn on a stove and how to hold a frying pan
Target task (TT): learned from reward using the extracted skills
[Figure: target task TT shown among tasks T1 to T5]
+ Efficient exploration
- Does not learn how to learn quickly
[1] Accelerating Reinforcement Learning with Learned Skill Priors. Pertsch et al. CoRL 2020
[2] OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning. Ajay et al. ICLR 2021
Meta-Reinforcement Learning [1, 2]
Training tasks (T1 to T5), e.g. “Fry an egg”, “Make a sandwich”
Target task (TT): “Cook a pancake”
+ Learns how to learn quickly
- Limited to short-horizon tasks
[1] Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. Finn et al. ICML 2017
[2] Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables. Rakelly et al. ICML 2019
This Work: Meta-RL + Skill-Based RL
Task-agnostic dataset: useful skills
Training tasks (T1 to T5): how to learn quickly
Target task (TT): fast learning of a new long-horizon task
Skill-Based Meta-Reinforcement Learning (SiMPL)
Phase 1: Skill Extraction
Extract skills from task-agnostic offline data, following SPiRL [1].
[Figure: a skill extracted from a segment of the task-agnostic data, with states s0 s1 s2 s3 s4 … and actions a0 a1 a2 a3]
[1] Accelerating Reinforcement Learning with Learned Skill Priors. Pertsch et al. CoRL 2020
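The figure on this slide pairs a state sequence s0 … s4 with the actions a0 … a3 between them. A minimal sketch of how such windows might be cut from task-agnostic trajectories before a skill model is trained on them is given below. The function name `extract_skill_segments`, the `horizon` parameter, and the toy scalar states and actions are illustrative assumptions, not the authors' code; the skill model that SPiRL trains on these windows is omitted.

```python
from typing import List, Sequence, Tuple


def extract_skill_segments(
    states: Sequence[float],
    actions: Sequence[float],
    horizon: int = 4,  # assumed window length; matches a0..a3 in the figure
) -> List[Tuple[float, List[float]]]:
    """Cut one trajectory into (start_state, action_window) training pairs."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    if len(states) != len(actions) + 1:
        raise ValueError("expected one more state than actions")
    segments = []
    # Slide a window of `horizon` actions over the trajectory.
    for t in range(len(actions) - horizon + 1):
        segments.append((states[t], list(actions[t:t + horizon])))
    return segments


# Toy trajectory (made-up values): states s0..s6 with actions a0..a5.
states = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
actions = [1.0, 1.0, -1.0, 1.0, -1.0, 1.0]
segments = extract_skill_segments(states, actions, horizon=4)
```

Each returned pair holds the state a window starts from and the short action sequence that follows it, which is the raw material a skill model could compress into a reusable skill.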
Phase 2: Skill-Based Meta-Training
Meta-train on the extracted skills, following PEARL [1].
[Figure: meta-training tasks T1 to T5, transitions, task encoder, meta policy, skill]
[1] Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables. Rakelly et al. ICML 2019
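PEARL, which this phase follows, infers a task encoding from a set of transitions by treating each transition as an independent Gaussian factor and multiplying the factors, so the result does not depend on the order of the transitions. The sketch below shows only that aggregation step for a one-dimensional encoding. The per-transition means and variances would come from a learned network, which is replaced here by hard-coded toy numbers. The slide does not say whether SiMPL uses this exact encoder, so this is an illustration of a permutation-invariant task encoder rather than the authors' implementation.

```python
from typing import Sequence, Tuple


def product_of_gaussians(
    means: Sequence[float], variances: Sequence[float]
) -> Tuple[float, float]:
    """Combine 1-D Gaussian factors into a single Gaussian (mean, variance)."""
    if len(means) != len(variances) or not means:
        raise ValueError("need equally many means and variances, at least one")
    if any(v <= 0.0 for v in variances):
        raise ValueError("variances must be positive")
    # Precisions (inverse variances) add up when Gaussians are multiplied.
    precision = sum(1.0 / v for v in variances)
    variance = 1.0 / precision
    # The combined mean is the precision-weighted average of the factor means.
    mean = variance * sum(m / v for m, v in zip(means, variances))
    return mean, variance


# Toy factors, one per observed transition (values are made up).
posterior = product_of_gaussians([0.0, 2.0], [1.0, 1.0])
reordered = product_of_gaussians([2.0, 0.0], [1.0, 1.0])
```

Both calls give the same mean and variance, and the combined variance is smaller than that of either factor, which is the sense in which more transitions sharpen the task estimate.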
Phase 3: Target Task Learning
Warm-start target task learning with a task encoding.
Initial exploration on the target task (TT) gives the task encoder the data it needs to produce a task encoding
The policy uses the task encoding to select skills
The policy is then fine-tuned on the target task
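The order of operations in this phase can be sketched as a short driver function. Only the control flow is shown: `explore`, `encode`, `condition`, and `fine_tune` are hypothetical callables standing in for the environment rollout, the task encoder, the meta-trained policy, and the RL update, and the toy lambdas exist only so the sketch runs end to end.

```python
from typing import Callable, List, TypeVar

Episode = TypeVar("Episode")
Encoding = TypeVar("Encoding")
Policy = TypeVar("Policy")


def learn_target_task(
    explore: Callable[[], Episode],
    encode: Callable[[List[Episode]], Encoding],
    condition: Callable[[Encoding], Policy],
    fine_tune: Callable[[Policy], Policy],
    num_explore_episodes: int,
    num_finetune_iters: int,
) -> Policy:
    """Control flow only: explore, encode, warm-start, then fine-tune."""
    # 1. Initial exploration on the target task.
    episodes = [explore() for _ in range(num_explore_episodes)]
    # 2. The task encoder summarizes the collected experience.
    encoding = encode(episodes)
    # 3. Warm start: the policy is conditioned on that encoding.
    policy = condition(encoding)
    # 4. Fine-tune the warm-started policy on the target task.
    for _ in range(num_finetune_iters):
        policy = fine_tune(policy)
    return policy


# Toy stand-ins (all hypothetical) so the flow can be executed.
result = learn_target_task(
    explore=lambda: 1.0,
    encode=lambda eps: sum(eps) / len(eps),
    condition=lambda enc: {"encoding": enc, "updates": 0},
    fine_tune=lambda p: {**p, "updates": p["updates"] + 1},
    num_explore_episodes=4,
    num_finetune_iters=3,
)
```

Keeping the four stages as separate callables makes it easy to see that the task encoding is computed once from the exploration data and that fine-tuning starts from the conditioned policy rather than from scratch.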
Environment
Maze Navigation: 2000 steps, sparse reward for completion
Kitchen Manipulation: 280 steps, sparse reward for subtask completions
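One way to read the step budgets above is in terms of how many skill choices a policy has to make per episode. The sketch below assumes fixed-length skills and an illustrative skill length of 10 low-level steps; neither the helper `high_level_decisions` nor the value 10 comes from the slides.

```python
def high_level_decisions(episode_steps: int, skill_length: int) -> int:
    """Number of skill choices needed to cover an episode of the given length."""
    if episode_steps < 0 or skill_length < 1:
        raise ValueError("episode_steps must be >= 0 and skill_length >= 1")
    # Integer ceiling division: a final partial skill still counts as a choice.
    return (episode_steps + skill_length - 1) // skill_length


SKILL_LENGTH = 10  # assumed for illustration; not stated on the slide
maze_decisions = high_level_decisions(2000, SKILL_LENGTH)
kitchen_decisions = high_level_decisions(280, SKILL_LENGTH)
```

Under this assumption the policy makes an order of magnitude fewer decisions per episode than it would when acting at every low-level step, which is one reason skills can help with long horizons and sparse rewards.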
[Figure: meta-training tasks and target tasks for (a) Maze Navigation and (b) Kitchen Manipulation. Kitchen subtasks shown: top burner, bottom burner, light switch, slide cabinet, hinge cabinet, kettle, microwave]
SiMPL Learns Quickly
SiMPL solves this task within 100 episodes, while the other baselines cannot.
[Figure: agent trajectories on the target task at episodes 0, 20, and 100 for SiMPL (ours), SPiRL, PEARL-ft, and MTRL, with the start location, target location, and meta-training tasks marked]
SiMPL Learns Quickly
SiMPL converges faster than the multi-task RL (MTRL), skill-based RL, and meta-RL baselines.
[Figure: learning curves for SiMPL (ours), SPiRL, MTRL, PEARL-ft, PEARL, and SAC]
Summary
• SiMPL leverages both the offline dataset and the meta-training tasks by combining skill-based RL and meta-RL
• SiMPL learns new long-horizon, sparse-reward tasks faster than the baselines
Paper & Code : namsan96.github.io/SiMPL

Skill-Based Meta-Reinforcement Learning

  • 1. Skill-Based Meta-Reinforcement Learning Taewook Nam Shao-Hua Sun Karl Pertsch Sung Ju Hwang Joseph J. Lim
  • 2. Human Leverages Prior Knowledge “Cook a pancake”
  • 3. Human Leverages Prior Knowledge “Cook a pancake” SAC Policy
  • 4. Human Leverages Prior Knowledge “Cook a pancake” Prior knowledge Prior knowledge
  • 5. Human Leverages Prior Knowledge “Cook a pancake” How to hold frying pan How to turn on the stove
  • 6. Human Leverages Prior Knowledge “Cook a pancake” “Make a sandwich” “Fry an egg”
  • 7. Human Leverages Prior Knowledge “Cook a pancake” “Make a sandwich” “Fry an egg” How to hold frying pan How to turn on the stove Skill-based RL Meta-RL
  • 8. Skill-Based Reinforcement Learning[1, 2] Task-Agnostic Dataset How to turn on a stove How to hold a frying pan Skill [1] Accelerating Reinforcement Learning with learned Skill Prior. Pertsch et al. CoRL 2020
 [2] Opal: O ffl ine Primitive Discovery for Accelerating O ffl ine Reinforcement Learning. Ajay et al. ICLR 2021
  • 9. Skill-Based Reinforcement Learning[1, 2] Task-Agnostic Dataset How to turn on a stove How to hold a frying pan Skill [1] Accelerating Reinforcement Learning with learned Skill Prior. Pertsch et al. CoRL 2020
 [2] Opal: O ffl ine Primitive Discovery for Accelerating O ffl ine Reinforcement Learning. Ajay et al. ICLR 2021 Reward
  • 10. Skill-Based Reinforcement Learning[1, 2] T1 T2 T3 T4 TT T5 Target Task Task-Agnostic Dataset How to turn on a stove How to hold a frying pan Skill [1] Accelerating Reinforcement Learning with learned Skill Prior. Pertsch et al. CoRL 2020
 [2] Opal: O ffl ine Primitive Discovery for Accelerating O ffl ine Reinforcement Learning. Ajay et al. ICLR 2021 + E ffi cient exploration Reward
  • 11. T1 T2 T3 T4 TT T5 Target Task Task-Agnostic Dataset How to turn on a stove How to hold a frying pan Skill Skill-Based Reinforcement Learning[1, 2] [1] Accelerating Reinforcement Learning with learned Skill Prior. Pertsch et al. CoRL 2020
 [2] Opal: O ffl ine Primitive Discovery for Accelerating O ffl ine Reinforcement Learning. Ajay et al. ICLR 2021 + E ffi cient exploration - How to learn quickly Reward
  • 12. Meta Reinforcement Learning[1, 2] T1 T2 T3 T4 TT T5 Target Task Training Tasks T1 T2 T5 T3 T4 “Fry an egg” “Make a sandwich” [1] Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. Finn et al. ICML 2017
 [2] E ffi cient O ff -Policy Meta-reinforcement Learning via Probabilistic Context Variables. Rakelly et al. ICML 2019 “Cook a pancake” + How to learn quickly
  • 13. T1 T2 T3 T4 TT T5 Target Task Training Tasks T1 T2 T5 T3 T4 “Fry an egg” “Cook a pancake” “Make a sandwich” Meta Reinforcement Learning[1, 2] [1] Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. Finn et al. ICML 2017
 [2] E ffi cient O ff -Policy Meta-reinforcement Learning via Probabilistic Context Variables. Rakelly et al. ICML 2019 + How to learn quickly - Limited to short-horizon task
  • 14. This Work: Meta-RL + Skill-Based RL. Training Tasks T1 to T5. Task-Agnostic Dataset. Target Task TT.
  • 15. This Work: Meta-RL + Skill-Based RL. Useful skill. Training Tasks T1 to T5. Target Task TT.
  • 16. Meta-RL + Skill-Based RL. How to learn quickly. Training Tasks T1 to T5. Target Task TT.
  • 19. Phase 1: Skill Extraction. Extract skills from task-agnostic offline data, following SPiRL [1]. [Diagram: Task-Agnostic Data with states s0 s1 s2 s3 s4 … and actions a0 a1 a2 a3, mapped to a Skill.] [1] Accelerating Reinforcement Learning with Learned Skill Priors. Pertsch et al. CoRL 2020.
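The data preparation behind Phase 1 can be sketched as follows. This is a minimal illustration, assuming that skills are learned from fixed-length chunks of consecutive actions cut out of the offline trajectories, as the s0…s4 / a0…a3 diagram suggests. The function name `slice_skill_windows` and the `horizon` parameter are hypothetical, and the skill encoder, decoder, and prior networks that would be trained on these chunks are omitted.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def slice_skill_windows(
    states: Sequence[Vector],
    actions: Sequence[Vector],
    horizon: int,
) -> List[Tuple[Vector, List[Vector]]]:
    """Cut one trajectory into (start_state, action_chunk) training pairs.

    Each chunk holds `horizon` consecutive actions; a skill model would
    later embed each chunk into a latent skill (not shown here).
    """
    if horizon <= 0:
        raise ValueError("horizon must be positive")
    if len(states) != len(actions) + 1:
        raise ValueError("expected exactly one more state than actions")
    windows = []
    # Slide a window of `horizon` actions over the trajectory.
    for t in range(len(actions) - horizon + 1):
        windows.append((states[t], list(actions[t:t + horizon])))
    return windows


# Toy trajectory with the same shape as the slide's diagram: s0..s4, a0..a3.
states = [(float(i),) for i in range(5)]
actions = [(0.1 * i,) for i in range(4)]
windows = slice_skill_windows(states, actions, horizon=2)
```

With four actions and a hypothetical horizon of 2, the sketch yields three overlapping chunks. No reward or task label is needed at this stage, which is what makes the dataset task-agnostic.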
  • 20. Phase 2: Skill-Based Meta-Training. Meta-train a policy on top of the extracted skills, following PEARL [1]. [Diagram: Meta-Training Tasks T1 to T5, Meta Policy, Skill.] [1] Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables. Rakelly et al. ICML 2019.
  • 21. Phase 2: Skill-Based Meta-Training. Meta-train a policy on top of the extracted skills, following PEARL [1]. [Diagram: Meta-Training Tasks T1 to T5, Transitions, Task Encoder, Meta Policy, Skill.] [1] Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables. Rakelly et al. ICML 2019.
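The task encoder turns a set of transitions into a task representation. One permutation-invariant way to pool per-transition estimates is the product of Gaussian factors used in PEARL; the slides do not say whether SiMPL uses this exact pooling, so the following is only a one-dimensional sketch of that PEARL-style aggregation. The function name `product_of_gaussians` is hypothetical, and the per-transition means and variances would in practice come from a learned network.

```python
from typing import Sequence, Tuple


def product_of_gaussians(
    means: Sequence[float],
    variances: Sequence[float],
) -> Tuple[float, float]:
    """Combine independent 1-D Gaussian factors into one Gaussian.

    Precisions (1 / variance) add up, and the combined mean is the
    precision-weighted average of the individual means.
    """
    if len(means) == 0 or len(means) != len(variances):
        raise ValueError("need equally many means and variances, at least one")
    if any(v <= 0.0 for v in variances):
        raise ValueError("variances must be positive")
    precisions = [1.0 / v for v in variances]
    total_precision = sum(precisions)
    variance = 1.0 / total_precision
    mean = variance * sum(m * p for m, p in zip(means, precisions))
    return mean, variance


# Two transitions "voting" for task-embedding values 0.0 and 2.0
# with equal confidence.
task_mean, task_var = product_of_gaussians([0.0, 2.0], [1.0, 1.0])
```

Because the factors are simply multiplied, the result does not depend on the order of the transitions, and the variance shrinks as more transitions are observed.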
  • 22. Phase 3: Target Task Learning. Warm-start target task learning by task encoding. [Diagram: Target Task TT among tasks T1 to T5.]
  • 23. Phase 3: Target Task Learning. Warm-start target task learning by task encoding. [Diagram: Target Task TT, Initial Exploration, Task Encoder.]
  • 24. Phase 3: Target Task Learning. Warm-start target task learning by task encoding. [Diagram: Target Task TT, Initial Exploration, Task Encoder, Policy, Skill.]
  • 25. Phase 3: Target Task Learning. Warm-start target task learning by task encoding. [Diagram: Target Task TT, Initial Exploration, Task Encoder, Policy, Skill, Fine-tune.]
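The control flow of Phase 3, as the diagram labels suggest it (initial exploration, task encoding, then fine-tuning), can be sketched as below. All names here (`adapt_to_target_task`, `collect_episode`, `encode_task`, `fine_tune_step`) are hypothetical placeholders and not the authors' API. Passing `None` as the embedding during exploration stands for "no task information yet" and is an assumption of this sketch.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

Transition = TypeVar("Transition")
Embedding = TypeVar("Embedding")


def adapt_to_target_task(
    collect_episode: Callable[[Optional[Embedding]], Sequence[Transition]],
    encode_task: Callable[[Sequence[Transition]], Embedding],
    fine_tune_step: Callable[[Embedding, Sequence[Transition]], None],
    n_explore_episodes: int,
    n_finetune_episodes: int,
) -> Embedding:
    """Warm-start learning on a new task, then fine-tune.

    1. Initial exploration: gather transitions on the target task.
    2. Task encoding: summarize them into a task embedding.
    3. Fine-tune: keep acting with the embedding-conditioned policy
       and update it on the newly collected episodes.
    """
    context: List[Transition] = []
    for _ in range(n_explore_episodes):
        # No task embedding is available yet (assumption of this sketch).
        context.extend(collect_episode(None))

    task_embedding = encode_task(context)

    for _ in range(n_finetune_episodes):
        episode = collect_episode(task_embedding)
        fine_tune_step(task_embedding, episode)
    return task_embedding
```

Keeping the three callables abstract makes the ordering explicit: the task encoder is run once on the exploration data, and every later update starts from a policy that is already conditioned on the inferred task.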
  • 26. Environment. Maze Navigation: 2000 steps, sparse reward for completion. Kitchen Manipulation: 280 steps, sparse reward for subtask completions.
  • 27. Environment. [Figure: Meta-Training Tasks and Target Tasks for Maze Navigation (showing the agent) and (b) Kitchen Manipulation. Kitchen tasks are listed as subtask sequences numbered 1 to 4, drawn from: microwave, kettle, top burner, bottom burner, light switch, slide cabinet, hinge cabinet.]
  • 28. SiMPL Learns Quickly. SiMPL can solve this task within 100 episodes, but the other baselines cannot. [Figure: agent trajectories from the start location toward the target location on a target task at Episode 0, Episode 20, and Episode 100, for SiMPL (Ours), SPiRL, PEARL-ft, and MTRL, shown alongside the meta-training tasks.]
  • 34. SiMPL Learns Quickly. SiMPL converges faster than the MTRL, skill-based RL, and meta-RL baselines. [Figure: two plots comparing SiMPL (Ours), SPiRL, MTRL, PEARL-ft, SAC, and PEARL.]
  • 35. Summary. • SiMPL can leverage both an offline dataset and training tasks by combining
 skill-based RL and meta-RL. • SiMPL can learn new long-horizon, sparse-reward tasks faster than the compared baselines.
  • 37. Skill-Based Meta-Reinforcement Learning. Taewook Nam, Shao-Hua Sun, Karl Pertsch, Sung Ju Hwang, Joseph J. Lim. Paper & Code: namsan96.github.io/SiMPL