FeUdal Networks for
Hierarchical Reinforcement
Learning
Illia Polosukhin
Paper by Vezhnevets et al.
Motivation
● Deep Reinforcement Learning works really well when
rewards occur often
● Environments with long-term credit assignment and
sparse rewards are still a challenge
● Non-Markovian environments that require memory are
particularly challenging
● Non-hierarchical models often overfit to a specific
input-output mapping
Feudal Reinforcement Learning
● A managerial hierarchy in which
each level observes the world at
a different resolution
[Information Hiding]
● Managers communicate goals
to their “workers” and reward
them for meeting those goals
[Reward Hiding]
Dayan & Hinton, 1993
Contributions
● FuNs: End-to-end differentiable model that implements
principles of Feudal RL [Dayan & Hinton, 1993]
● A novel, approximate transition policy gradient update for
training the Manager
● Use of goals that are directional rather than absolute in
nature
● A novel dilated LSTM that extends the longevity of the
Manager’s memory
Model
Goal embedding
● The Worker produces an embedding for each action,
collected in a matrix U.
● The last c goals from the Manager are summed and
projected into a vector w ∈ R^k.
● The Manager’s goal w modulates the Worker’s policy via a
multiplicative interaction in a low-dimensional (k-dim) space.
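The goal-embedding step above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the names `worker_policy` and `phi`, the use of a softmax over the product U·w, and the toy numbers are assumptions made for the example. The projection `phi` deliberately has no bias term.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def worker_policy(U, recent_goals, phi):
    """Combine action embeddings with the pooled Manager goals.

    U            : |A| x k list of lists, one embedding row per action
    recent_goals : the last c goal vectors from the Manager, each of length d
    phi          : k x d linear projection without bias (hypothetical name)
    """
    d = len(recent_goals[0])
    # Sum-pool the last c goals.
    pooled = [sum(g[j] for g in recent_goals) for j in range(d)]
    # Project the pooled goal into the k-dimensional embedding space: w = phi * pooled.
    w = [sum(row[j] * pooled[j] for j in range(d)) for row in phi]
    # Multiplicative interaction: one logit per action, logits = U * w.
    logits = [sum(u_i * w_i for u_i, w_i in zip(u, w)) for u in U]
    return softmax(logits)

# Toy example: 3 actions, k = 2, goal dimension d = 3, c = 2 pooled goals.
U = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
phi = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
goals = [[0.5, 0.0, 0.2], [0.5, 0.0, -0.2]]
probs = worker_policy(U, goals, phi)
uniform = worker_policy(U, [[0.0, 0.0, 0.0]], phi)
```

In this toy setup the pooled goal points along the first embedding axis, so the first action becomes the most likely one, while an all-zero goal leaves the policy uniform.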
Training
● The Manager is trained to set goals in advantageous
directions in state space.
● The Worker is trained with an intrinsic reward for following
the Manager’s goals.
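The equations for this slide did not survive extraction. As a hedged sketch based on the paper by Vezhnevets et al., the Worker's intrinsic reward at step t averages the cosine similarity between the latent-state change over the last i steps and the goal issued i steps ago: r^I_t = (1/c) Σ_{i=1..c} d_cos(s_t − s_{t−i}, g_{t−i}). The function and variable names below are assumptions for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity d_cos(a, b); defined as 0.0 if either vector is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)

def intrinsic_reward(states, goals, t, c):
    """Average directional agreement over the last c steps (requires t >= c).

    states[i] is the latent state s_i and goals[i] the Manager goal g_i.
    """
    total = 0.0
    for i in range(1, c + 1):
        delta = [a - b for a, b in zip(states[t], states[t - i])]
        total += cosine(delta, goals[t - i])
    return total / c

# Toy trajectory moving along the first axis.
states = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
aligned = intrinsic_reward(states, [[1.0, 0.0]] * 3, t=2, c=2)
opposed = intrinsic_reward(states, [[-1.0, 0.0]] * 3, t=2, c=2)
orthogonal = intrinsic_reward(states, [[0.0, 1.0]] * 3, t=2, c=2)
```

The reward depends only on the direction of movement in latent space, not on how far the Worker moved, which matches the directional nature of the goals.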
Training
● The Worker is trained in an actor-critic setup whose
advantage function uses a weighted sum of the intrinsic
reward and the environment reward.
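One way to read the weighted sum is A_t = R_t + α·R^I_t − V_t, where R_t and R^I_t are discounted returns of the environment and intrinsic rewards and V_t is the Worker's value estimate. The sketch below assumes that form; the names `alpha` and `gamma`, and the use of a single discount factor for both returns, are simplifying assumptions.

```python
def discounted_returns(rewards, gamma):
    """Return-to-go for each step: R_t = r_t + gamma * R_{t+1}."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def worker_advantages(env_rewards, intrinsic_rewards, values, alpha, gamma):
    """Advantage per step from a weighted sum of both reward streams.

    alpha weights the intrinsic return (hypothetical hyperparameter name);
    values holds the critic's estimate for each step.
    """
    env_ret = discounted_returns(env_rewards, gamma)
    int_ret = discounted_returns(intrinsic_rewards, gamma)
    return [r + alpha * ri - v for r, ri, v in zip(env_ret, int_ret, values)]

# Sparse environment reward at the end, dense intrinsic reward throughout.
advantages = worker_advantages(
    env_rewards=[0.0, 0.0, 1.0],
    intrinsic_rewards=[1.0, 1.0, 1.0],
    values=[0.0, 0.0, 0.0],
    alpha=0.5,
    gamma=1.0,
)
```

With a sparse environment reward, the intrinsic term supplies a learning signal at every step; setting `alpha` to 0 reduces this to an ordinary actor-critic advantage.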
Transition Policy Gradients
● The Manager can be trained as if it had a high-level policy
that selects sub-policies o_t
● The high-level policy can be composed with the transition
distribution to give a “transition policy”, to which policy
gradient can be applied.
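The equation for this slide is missing from the transcript. A sketch of the derivation, following the notation of Vezhnevets et al.: the symbols μ (the high-level policy, with o_t = μ(s_t, θ)), V (the value baseline), and d_cos (cosine similarity) do not appear on the slide and are introduced here as assumptions.

```latex
% Transition policy: compose the high-level policy with the transition distribution.
\pi^{TP}(s_{t+c} \mid s_t) = p\big(s_{t+c} \mid s_t, \mu(s_t, \theta)\big)

% Policy gradient applied directly to the transition policy.
\nabla_\theta \pi^{TP}_t
  = \mathbb{E}\Big[\big(R_t - V(s_t)\big)\,
    \nabla_\theta \log p\big(s_{t+c} \mid s_t, \mu(s_t, \theta)\big)\Big]

% Assumed form of the transition model: directions in latent space
% follow a von Mises-Fisher-like distribution around the goal g_t.
p(s_{t+c} \mid s_t, g_t) \propto e^{\,d_{\cos}(s_{t+c} - s_t,\; g_t)}

% Substituting gives the Manager update, with A^M_t = R_t - V^M_t(x_t, \theta).
\nabla g_t = A^M_t \,\nabla_\theta\, d_{\cos}\big(s_{t+c} - s_t,\; g_t(\theta)\big)
```

Under the assumed transition model, the log-probability reduces to the cosine term up to a constant, which is why the Manager's update only needs the observed state change and its own goal.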
Dilated LSTM
● Given a dilation radius r, the full network state h is a
combination of r sub-states or “cores”: h = {h^i}, i = 1..r
● At time t, the LSTM uses and updates only core t % r,
i.e. h^(t % r) from step t-1, while sharing parameters
across cores
● The output is pooled across the previous c outputs.
● This preserves memories for long periods while still
processing every input and updating the output at
every step.
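The core-selection and pooling logic can be sketched independently of the LSTM itself. In the minimal example below, `toy_cell` is a stand-in leaky accumulator, not an LSTM, and the class name, sum-pooling, and zero initial state are assumptions made for illustration.

```python
from collections import deque

class DilatedRecurrent:
    """r cores sharing one cell; only core t % r is read and updated at step t."""

    def __init__(self, cell, r, c, state_size):
        self.cell = cell                      # shared parameters: one cell for all cores
        self.r = r
        self.cores = [[0.0] * state_size for _ in range(r)]
        self.recent = deque(maxlen=c)         # the last c outputs, for pooling
        self.t = 0

    def step(self, x):
        i = self.t % self.r                   # index of the core active at this step
        self.cores[i], out = self.cell(self.cores[i], x)
        self.recent.append(out)
        self.t += 1
        # Sum-pool the last c outputs so the output changes at every step.
        return [sum(vals) for vals in zip(*self.recent)]

def toy_cell(h, x):
    """Stand-in for an LSTM cell: leaky accumulation of the input."""
    new_h = [0.5 * hj + xj for hj, xj in zip(h, x)]
    return new_h, new_h

net = DilatedRecurrent(toy_cell, r=2, c=2, state_size=1)
outputs = [net.step([float(v)]) for v in (1, 2, 3, 4)]
```

Each core sees only every r-th input, so its state decays r times more slowly in wall-clock steps, while pooling ensures every input still influences the output.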
Experiments
Ablative analysis
Action repeat transfer
Join Slack:
https://xixslack.herokuapp.com/
Quick survey about tools for Machine Learning:
http://bit.ly/ml-tools
Really, just a minute!

Editor's Notes

  1. It is symptomatic that the standard approach on the ATARI benchmark suite (Bellemare et al., 2012) is to use an action-repeat heuristic, where each action translates into several (usually 4)
  2. Using no biases in the projection ensures that it can never produce a constant non-zero vector. Because of pooling, the conditioning from the Manager varies smoothly.
  3. We use directions because it is more feasible for the Worker to be able to reliably cause directional shifts in the latent state than it is to assume that the Worker can take us to (potentially) arbitrary new absolute locations.
  4. Learning curve on Montezuma’s Revenge
  5. This is a visualisation of sub-goals learnt by FuN in the first room. Tall bars show the number of states s_t for which the given state s maximized cos(s - s_t, g_t).
  6. Visualisation of sub-policies learnt on the game Seaquest.
  7. Ablative analysis, four variants: (1) Non-feudal FuN: trained with policy gradient, with the gradient flowing from the Worker via g and no intrinsic reward. (2) Manager’s g trained via standard policy gradient. (3) g is an absolute goal instead of a direction. (4) Pure feudal: the Worker receives only the intrinsic reward.
  8. Testing the separation between Worker and Manager: on an environment without action repeat, initialize from an agent that was trained with action repeat = 4. Increase the dilation and the Manager’s horizon c by a factor of 4 each. Train for 200 episodes.