SlideShare a Scribd company logo
1 of 59
Hierarchical Reinforcement Learning Mausam [A Survey and Comparison of HRL techniques]
The Outline of the Talk ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Decision Making Slide courtesy Dan Weld Environment Percept Action What action next?
Personal Printerbot ,[object Object],[object Object],[object Object],[object Object],[object Object]
Episodic Markov Decision Process ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],* Markovian assumption. ** bounds  R  for  infinite horizon. Episodic MDP  ´  MDP with absorbing goals
Goal of an Episodic MDP ,[object Object],[object Object],[object Object],[object Object],* Non-noisy complete information perceptors
Solution of an Episodic MDP ,[object Object],[object Object]
Complexity of Value Iteration ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],* Bellman’s curse of dimensionality
The Outline of the Talk ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Learning Environment Data ,[object Object],[object Object],[object Object],[object Object]
Decision Making while Learning* Environment Percepts Datum Action * Known as  Reinforcement Learning What action next?   ,[object Object],[object Object],[object Object],[object Object]
Reinforcement Learning ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Planning vs. MDP vs. RL ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Exploration vs. Exploitation ,[object Object],[object Object],[object Object],[object Object]
Model Based Learning ,[object Object],[object Object],[object Object],[object Object],[object Object]
Model Free Learning ,[object Object],[object Object],[object Object],[object Object]
Learning ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Q-Learning ,[object Object],[object Object],[object Object],Optimal policy is the action with maximum  Q * value.
Q-Learning ,[object Object],[object Object],New estimate of  Q  value Old estimate of  Q  value
Semi-MDP: When actions take time. ,[object Object],[object Object],[object Object],[object Object]
Printerbot ,[object Object],[object Object],[object Object],[object Object],[object Object]
The Outline of the Talk ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
1. The Mathematical Perspective A Structure Paradigm ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
2. Modular Decision Making
2. Modular Decision Making ,[object Object],[object Object],[object Object]
2. Modular Decision Making ,[object Object],[object Object],[object Object]
3. Background Knowledge ,[object Object],[object Object],[object Object],[object Object],[object Object]
A mechanism that exploits all three avenues : Hierarchies ,[object Object],[object Object],[object Object]
The Outline of the Talk ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Hierarchy ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Hierarchical Algos  ´  Gating Mechanism ,[object Object],[object Object],[object Object],[object Object],* *Can be a multi-  level hierarchy. g is a gate b i  is a behaviour
Option : Move e  until end of hallway ,[object Object],[object Object],[object Object]
Options  [Sutton, Precup, Singh’99] ,[object Object],[object Object],[object Object],[object Object],[object Object],*Can be a policy over lower level options.
Learning ,[object Object],[object Object],[object Object],[object Object]
Machine: Move e  + Collision Avoidance Move e Choose Return End of hallway :  End of hallway Obstacle Call M1 Call M2 M1 M2 Move w Move n Move n Return Move w Move s Move s Return
Hierarchies of Abstract Machines [Parr, Russell’97] ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Hierarchies of Abstract Machines ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Learning ,[object Object],[object Object],[object Object],[object Object]
Task Hierarchy: MAXQ Decomposition [Dietterich’00] Root Take Give Navigate(loc) Deliver Fetch Extend-arm Extend-arm Grab Release Move e Move w Move s Move n Children of a task are unordered
MAXQ Decomposition ,[object Object],[object Object],[object Object],[object Object],[object Object],*Observe the context-free nature of Q -value Reward received while navigating Reward received after navigation
The Outline of the Talk ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
1. State Abstraction ,[object Object],[object Object],[object Object]
State Abstraction in MAXQ ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
State Abstraction in Options, HAM ,[object Object],[object Object],[object Object],[object Object],[object Object],*[Andre,Russell’02]
2. Optimality Hierarchical Optimality vs. Recursive Optimality
Optimality ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],* Can define eqns for both optimalities **Adv. of using macro-actions maybe lost.
3. Language Expressiveness ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
4. Knowledge Requirements ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
5. Models advanced ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
6. Structure Paradigm ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
The Outline of the Talk ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Directions for Future Research ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Directions for Future Research ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Applications ,[object Object],[object Object],[object Object],[object Object],Images courtesy various sources Parts Assemblies Ware-house P2 P1 P3 P4 D2 D3 D4 D1                    
Thinking Big… ,[object Object],[object Object]
The Outline of the Talk ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
How to choose appropriate hierarchy ,[object Object],[object Object],[object Object],[object Object],[object Object]
The Structure Paradigm ,[object Object],[object Object]
Main ideas in HRL community ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]

More Related Content

What's hot

An introduction to reinforcement learning
An introduction to reinforcement learningAn introduction to reinforcement learning
An introduction to reinforcement learningSubrat Panda, PhD
 
Recurrent Neural Networks (RNN) | RNN LSTM | Deep Learning Tutorial | Tensorf...
Recurrent Neural Networks (RNN) | RNN LSTM | Deep Learning Tutorial | Tensorf...Recurrent Neural Networks (RNN) | RNN LSTM | Deep Learning Tutorial | Tensorf...
Recurrent Neural Networks (RNN) | RNN LSTM | Deep Learning Tutorial | Tensorf...Edureka!
 
RNN & LSTM: Neural Network for Sequential Data
RNN & LSTM: Neural Network for Sequential DataRNN & LSTM: Neural Network for Sequential Data
RNN & LSTM: Neural Network for Sequential DataYao-Chieh Hu
 
Continuous control with deep reinforcement learning (DDPG)
Continuous control with deep reinforcement learning (DDPG)Continuous control with deep reinforcement learning (DDPG)
Continuous control with deep reinforcement learning (DDPG)Taehoon Kim
 
Deep deterministic policy gradient
Deep deterministic policy gradientDeep deterministic policy gradient
Deep deterministic policy gradientSlobodan Blazeski
 
Lecture 9 Markov decision process
Lecture 9 Markov decision processLecture 9 Markov decision process
Lecture 9 Markov decision processVARUN KUMAR
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement LearningDongHyun Kwak
 
Introduction of Deep Reinforcement Learning
Introduction of Deep Reinforcement LearningIntroduction of Deep Reinforcement Learning
Introduction of Deep Reinforcement LearningNAVER Engineering
 
Reinforcement Learning Q-Learning
Reinforcement Learning   Q-Learning Reinforcement Learning   Q-Learning
Reinforcement Learning Q-Learning Melaku Eneayehu
 
2-Agents- Artificial Intelligence
2-Agents- Artificial Intelligence2-Agents- Artificial Intelligence
2-Agents- Artificial IntelligenceMhd Sb
 
RLCode와 A3C 쉽고 깊게 이해하기
RLCode와 A3C 쉽고 깊게 이해하기RLCode와 A3C 쉽고 깊게 이해하기
RLCode와 A3C 쉽고 깊게 이해하기Woong won Lee
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement LearningSalem-Kabbani
 
Reinforcement Learning 4. Dynamic Programming
Reinforcement Learning 4. Dynamic ProgrammingReinforcement Learning 4. Dynamic Programming
Reinforcement Learning 4. Dynamic ProgrammingSeung Jae Lee
 
Introduction to Recurrent Neural Network
Introduction to Recurrent Neural NetworkIntroduction to Recurrent Neural Network
Introduction to Recurrent Neural NetworkYan Xu
 
Reinforcement Learning, Application and Q-Learning
Reinforcement Learning, Application and Q-LearningReinforcement Learning, Application and Q-Learning
Reinforcement Learning, Application and Q-LearningAbdullah al Mamun
 
DQN (Deep Q-Network)
DQN (Deep Q-Network)DQN (Deep Q-Network)
DQN (Deep Q-Network)Dong Guo
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement LearningCloudxLab
 

What's hot (20)

An introduction to reinforcement learning
An introduction to reinforcement learningAn introduction to reinforcement learning
An introduction to reinforcement learning
 
Recurrent Neural Networks (RNN) | RNN LSTM | Deep Learning Tutorial | Tensorf...
Recurrent Neural Networks (RNN) | RNN LSTM | Deep Learning Tutorial | Tensorf...Recurrent Neural Networks (RNN) | RNN LSTM | Deep Learning Tutorial | Tensorf...
Recurrent Neural Networks (RNN) | RNN LSTM | Deep Learning Tutorial | Tensorf...
 
RNN & LSTM: Neural Network for Sequential Data
RNN & LSTM: Neural Network for Sequential DataRNN & LSTM: Neural Network for Sequential Data
RNN & LSTM: Neural Network for Sequential Data
 
Continuous control with deep reinforcement learning (DDPG)
Continuous control with deep reinforcement learning (DDPG)Continuous control with deep reinforcement learning (DDPG)
Continuous control with deep reinforcement learning (DDPG)
 
Deep deterministic policy gradient
Deep deterministic policy gradientDeep deterministic policy gradient
Deep deterministic policy gradient
 
Lecture 9 Markov decision process
Lecture 9 Markov decision processLecture 9 Markov decision process
Lecture 9 Markov decision process
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
 
Introduction of Deep Reinforcement Learning
Introduction of Deep Reinforcement LearningIntroduction of Deep Reinforcement Learning
Introduction of Deep Reinforcement Learning
 
Reinforcement Learning Q-Learning
Reinforcement Learning   Q-Learning Reinforcement Learning   Q-Learning
Reinforcement Learning Q-Learning
 
2-Agents- Artificial Intelligence
2-Agents- Artificial Intelligence2-Agents- Artificial Intelligence
2-Agents- Artificial Intelligence
 
RLCode와 A3C 쉽고 깊게 이해하기
RLCode와 A3C 쉽고 깊게 이해하기RLCode와 A3C 쉽고 깊게 이해하기
RLCode와 A3C 쉽고 깊게 이해하기
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
 
Reinforcement learning
Reinforcement learningReinforcement learning
Reinforcement learning
 
Reinforcement Learning 4. Dynamic Programming
Reinforcement Learning 4. Dynamic ProgrammingReinforcement Learning 4. Dynamic Programming
Reinforcement Learning 4. Dynamic Programming
 
Deep Reinforcement Learning
Deep Reinforcement LearningDeep Reinforcement Learning
Deep Reinforcement Learning
 
Introduction to Recurrent Neural Network
Introduction to Recurrent Neural NetworkIntroduction to Recurrent Neural Network
Introduction to Recurrent Neural Network
 
Reinforcement Learning, Application and Q-Learning
Reinforcement Learning, Application and Q-LearningReinforcement Learning, Application and Q-Learning
Reinforcement Learning, Application and Q-Learning
 
DQN (Deep Q-Network)
DQN (Deep Q-Network)DQN (Deep Q-Network)
DQN (Deep Q-Network)
 
5 csp
5 csp5 csp
5 csp
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
 

Viewers also liked

Aggregate planning
Aggregate planningAggregate planning
Aggregate planningAtif Ghayas
 
Machine Learning presentation.
Machine Learning presentation.Machine Learning presentation.
Machine Learning presentation.butest
 

Viewers also liked (6)

Htn in videogames
Htn in videogamesHtn in videogames
Htn in videogames
 
強化学習勉強会・論文紹介(Kulkarni et al., 2016)
強化学習勉強会・論文紹介(Kulkarni et al., 2016)強化学習勉強会・論文紹介(Kulkarni et al., 2016)
強化学習勉強会・論文紹介(Kulkarni et al., 2016)
 
Hierarchical Object Detection with Deep Reinforcement Learning
Hierarchical Object Detection with Deep Reinforcement LearningHierarchical Object Detection with Deep Reinforcement Learning
Hierarchical Object Detection with Deep Reinforcement Learning
 
Aggregate Planning
Aggregate  PlanningAggregate  Planning
Aggregate Planning
 
Aggregate planning
Aggregate planningAggregate planning
Aggregate planning
 
Machine Learning presentation.
Machine Learning presentation.Machine Learning presentation.
Machine Learning presentation.
 

Similar to Hierarchical Reinforcement Learning

reiniforcement learning.ppt
reiniforcement learning.pptreiniforcement learning.ppt
reiniforcement learning.pptcharusharma165
 
How to formulate reinforcement learning in illustrative ways
How to formulate reinforcement learning in illustrative waysHow to formulate reinforcement learning in illustrative ways
How to formulate reinforcement learning in illustrative waysYasutoTamura1
 
RL_online _presentation_1.ppt
RL_online _presentation_1.pptRL_online _presentation_1.ppt
RL_online _presentation_1.pptssuser43a599
 
Reinforcement Learning.ppt
Reinforcement Learning.pptReinforcement Learning.ppt
Reinforcement Learning.pptPOOJASHREEC1
 
Matineh Shaker, Artificial Intelligence Scientist, Bonsai at MLconf SF 2017
Matineh Shaker, Artificial Intelligence Scientist, Bonsai at MLconf SF 2017Matineh Shaker, Artificial Intelligence Scientist, Bonsai at MLconf SF 2017
Matineh Shaker, Artificial Intelligence Scientist, Bonsai at MLconf SF 2017MLconf
 
Reinfrocement Learning
Reinfrocement LearningReinfrocement Learning
Reinfrocement LearningNatan Katz
 
lecture_21.pptx - PowerPoint Presentation
lecture_21.pptx - PowerPoint Presentationlecture_21.pptx - PowerPoint Presentation
lecture_21.pptx - PowerPoint Presentationbutest
 
Lecture notes
Lecture notesLecture notes
Lecture notesbutest
 
Reinforcement learning
Reinforcement learningReinforcement learning
Reinforcement learningDongHyun Kwak
 
anintroductiontoreinforcementlearning-180912151720.pdf
anintroductiontoreinforcementlearning-180912151720.pdfanintroductiontoreinforcementlearning-180912151720.pdf
anintroductiontoreinforcementlearning-180912151720.pdfssuseradaf5f
 
REINFORCEMENT LEARNING
REINFORCEMENT LEARNINGREINFORCEMENT LEARNING
REINFORCEMENT LEARNINGpradiprahul
 
AI_03_Solving Problems by Searching.pptx
AI_03_Solving Problems by Searching.pptxAI_03_Solving Problems by Searching.pptx
AI_03_Solving Problems by Searching.pptxYousef Aburawi
 
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAIDeep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAIJack Clark
 

Similar to Hierarchical Reinforcement Learning (20)

reiniforcement learning.ppt
reiniforcement learning.pptreiniforcement learning.ppt
reiniforcement learning.ppt
 
How to formulate reinforcement learning in illustrative ways
How to formulate reinforcement learning in illustrative waysHow to formulate reinforcement learning in illustrative ways
How to formulate reinforcement learning in illustrative ways
 
RL intro
RL introRL intro
RL intro
 
YijueRL.ppt
YijueRL.pptYijueRL.ppt
YijueRL.ppt
 
RL_online _presentation_1.ppt
RL_online _presentation_1.pptRL_online _presentation_1.ppt
RL_online _presentation_1.ppt
 
Reinforcement Learning.ppt
Reinforcement Learning.pptReinforcement Learning.ppt
Reinforcement Learning.ppt
 
Matineh Shaker, Artificial Intelligence Scientist, Bonsai at MLconf SF 2017
Matineh Shaker, Artificial Intelligence Scientist, Bonsai at MLconf SF 2017Matineh Shaker, Artificial Intelligence Scientist, Bonsai at MLconf SF 2017
Matineh Shaker, Artificial Intelligence Scientist, Bonsai at MLconf SF 2017
 
Introduction to Deep Reinforcement Learning
Introduction to Deep Reinforcement LearningIntroduction to Deep Reinforcement Learning
Introduction to Deep Reinforcement Learning
 
Reinfrocement Learning
Reinfrocement LearningReinfrocement Learning
Reinfrocement Learning
 
lecture_21.pptx - PowerPoint Presentation
lecture_21.pptx - PowerPoint Presentationlecture_21.pptx - PowerPoint Presentation
lecture_21.pptx - PowerPoint Presentation
 
Lecture notes
Lecture notesLecture notes
Lecture notes
 
Reinforcement learning
Reinforcement learningReinforcement learning
Reinforcement learning
 
AI_Planning.pdf
AI_Planning.pdfAI_Planning.pdf
AI_Planning.pdf
 
Cs221 rl
Cs221 rlCs221 rl
Cs221 rl
 
Reinforcement Learning - DQN
Reinforcement Learning - DQNReinforcement Learning - DQN
Reinforcement Learning - DQN
 
Goprez sg
Goprez  sgGoprez  sg
Goprez sg
 
anintroductiontoreinforcementlearning-180912151720.pdf
anintroductiontoreinforcementlearning-180912151720.pdfanintroductiontoreinforcementlearning-180912151720.pdf
anintroductiontoreinforcementlearning-180912151720.pdf
 
REINFORCEMENT LEARNING
REINFORCEMENT LEARNINGREINFORCEMENT LEARNING
REINFORCEMENT LEARNING
 
AI_03_Solving Problems by Searching.pptx
AI_03_Solving Problems by Searching.pptxAI_03_Solving Problems by Searching.pptx
AI_03_Solving Problems by Searching.pptx
 
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAIDeep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
 

More from ahmad bassiouny (20)

Work Study & Productivity
Work Study & ProductivityWork Study & Productivity
Work Study & Productivity
 
Work Study
Work StudyWork Study
Work Study
 
Motion And Time Study
Motion And Time StudyMotion And Time Study
Motion And Time Study
 
Motion Study
Motion StudyMotion Study
Motion Study
 
The Christmas Story
The Christmas StoryThe Christmas Story
The Christmas Story
 
Turkey Photos
Turkey PhotosTurkey Photos
Turkey Photos
 
Mission Bo Kv3
Mission Bo Kv3Mission Bo Kv3
Mission Bo Kv3
 
Miramar
MiramarMiramar
Miramar
 
Mom
MomMom
Mom
 
Linearization
LinearizationLinearization
Linearization
 
Kblmt B000 Intro Kaizen Based Lean Manufacturing
Kblmt B000 Intro Kaizen Based Lean ManufacturingKblmt B000 Intro Kaizen Based Lean Manufacturing
Kblmt B000 Intro Kaizen Based Lean Manufacturing
 
How To Survive
How To SurviveHow To Survive
How To Survive
 
Dad
DadDad
Dad
 
Ancient Hieroglyphics
Ancient HieroglyphicsAncient Hieroglyphics
Ancient Hieroglyphics
 
Dubai In 2009
Dubai In 2009Dubai In 2009
Dubai In 2009
 
DesignPeopleSystem
DesignPeopleSystemDesignPeopleSystem
DesignPeopleSystem
 
Organizational Behavior
Organizational BehaviorOrganizational Behavior
Organizational Behavior
 
Work Study Workshop
Work Study WorkshopWork Study Workshop
Work Study Workshop
 
Workstudy
WorkstudyWorkstudy
Workstudy
 
Time And Motion Study
Time And  Motion  StudyTime And  Motion  Study
Time And Motion Study
 

Recently uploaded

HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxEsquimalt MFRC
 
Rich Dad Poor Dad ( PDFDrive.com )--.pdf
Rich Dad Poor Dad ( PDFDrive.com )--.pdfRich Dad Poor Dad ( PDFDrive.com )--.pdf
Rich Dad Poor Dad ( PDFDrive.com )--.pdfJerry Chew
 
Play hard learn harder: The Serious Business of Play
Play hard learn harder:  The Serious Business of PlayPlay hard learn harder:  The Serious Business of Play
Play hard learn harder: The Serious Business of PlayPooky Knightsmith
 
Model Attribute _rec_name in the Odoo 17
Model Attribute _rec_name in the Odoo 17Model Attribute _rec_name in the Odoo 17
Model Attribute _rec_name in the Odoo 17Celine George
 
Contoh Aksi Nyata Refleksi Diri ( NUR ).pdf
Contoh Aksi Nyata Refleksi Diri ( NUR ).pdfContoh Aksi Nyata Refleksi Diri ( NUR ).pdf
Contoh Aksi Nyata Refleksi Diri ( NUR ).pdfcupulin
 
Details on CBSE Compartment Exam.pptx1111
Details on CBSE Compartment Exam.pptx1111Details on CBSE Compartment Exam.pptx1111
Details on CBSE Compartment Exam.pptx1111GangaMaiya1
 
Orientation Canvas Course Presentation.pdf
Orientation Canvas Course Presentation.pdfOrientation Canvas Course Presentation.pdf
Orientation Canvas Course Presentation.pdfElizabeth Walsh
 
Graduate Outcomes Presentation Slides - English (v3).pptx
Graduate Outcomes Presentation Slides - English (v3).pptxGraduate Outcomes Presentation Slides - English (v3).pptx
Graduate Outcomes Presentation Slides - English (v3).pptxneillewis46
 
Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...
Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...
Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...EADTU
 
dusjagr & nano talk on open tools for agriculture research and learning
dusjagr & nano talk on open tools for agriculture research and learningdusjagr & nano talk on open tools for agriculture research and learning
dusjagr & nano talk on open tools for agriculture research and learningMarc Dusseiller Dusjagr
 
21st_Century_Skills_Framework_Final_Presentation_2.pptx
21st_Century_Skills_Framework_Final_Presentation_2.pptx21st_Century_Skills_Framework_Final_Presentation_2.pptx
21st_Century_Skills_Framework_Final_Presentation_2.pptxJoelynRubio1
 
PUBLIC FINANCE AND TAXATION COURSE-1-4.pdf
PUBLIC FINANCE AND TAXATION COURSE-1-4.pdfPUBLIC FINANCE AND TAXATION COURSE-1-4.pdf
PUBLIC FINANCE AND TAXATION COURSE-1-4.pdfMinawBelay
 
Diuretic, Hypoglycemic and Limit test of Heavy metals and Arsenic.-1.pdf
Diuretic, Hypoglycemic and Limit test of Heavy metals and Arsenic.-1.pdfDiuretic, Hypoglycemic and Limit test of Heavy metals and Arsenic.-1.pdf
Diuretic, Hypoglycemic and Limit test of Heavy metals and Arsenic.-1.pdfKartik Tiwari
 
Andreas Schleicher presents at the launch of What does child empowerment mean...
Andreas Schleicher presents at the launch of What does child empowerment mean...Andreas Schleicher presents at the launch of What does child empowerment mean...
Andreas Schleicher presents at the launch of What does child empowerment mean...EduSkills OECD
 
UChicago CMSC 23320 - The Best Commit Messages of 2024
UChicago CMSC 23320 - The Best Commit Messages of 2024UChicago CMSC 23320 - The Best Commit Messages of 2024
UChicago CMSC 23320 - The Best Commit Messages of 2024Borja Sotomayor
 
MuleSoft Integration with AWS Textract | Calling AWS Textract API |AWS - Clou...
MuleSoft Integration with AWS Textract | Calling AWS Textract API |AWS - Clou...MuleSoft Integration with AWS Textract | Calling AWS Textract API |AWS - Clou...
MuleSoft Integration with AWS Textract | Calling AWS Textract API |AWS - Clou...MysoreMuleSoftMeetup
 
diagnosting testing bsc 2nd sem.pptx....
diagnosting testing bsc 2nd sem.pptx....diagnosting testing bsc 2nd sem.pptx....
diagnosting testing bsc 2nd sem.pptx....Ritu480198
 
e-Sealing at EADTU by Kamakshi Rajagopal
e-Sealing at EADTU by Kamakshi Rajagopale-Sealing at EADTU by Kamakshi Rajagopal
e-Sealing at EADTU by Kamakshi RajagopalEADTU
 
OSCM Unit 2_Operations Processes & Systems
OSCM Unit 2_Operations Processes & SystemsOSCM Unit 2_Operations Processes & Systems
OSCM Unit 2_Operations Processes & SystemsSandeep D Chaudhary
 

Recently uploaded (20)

HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
 
Rich Dad Poor Dad ( PDFDrive.com )--.pdf
Rich Dad Poor Dad ( PDFDrive.com )--.pdfRich Dad Poor Dad ( PDFDrive.com )--.pdf
Rich Dad Poor Dad ( PDFDrive.com )--.pdf
 
Play hard learn harder: The Serious Business of Play
Play hard learn harder:  The Serious Business of PlayPlay hard learn harder:  The Serious Business of Play
Play hard learn harder: The Serious Business of Play
 
Model Attribute _rec_name in the Odoo 17
Model Attribute _rec_name in the Odoo 17Model Attribute _rec_name in the Odoo 17
Model Attribute _rec_name in the Odoo 17
 
Contoh Aksi Nyata Refleksi Diri ( NUR ).pdf
Contoh Aksi Nyata Refleksi Diri ( NUR ).pdfContoh Aksi Nyata Refleksi Diri ( NUR ).pdf
Contoh Aksi Nyata Refleksi Diri ( NUR ).pdf
 
Details on CBSE Compartment Exam.pptx1111
Details on CBSE Compartment Exam.pptx1111Details on CBSE Compartment Exam.pptx1111
Details on CBSE Compartment Exam.pptx1111
 
Orientation Canvas Course Presentation.pdf
Orientation Canvas Course Presentation.pdfOrientation Canvas Course Presentation.pdf
Orientation Canvas Course Presentation.pdf
 
Graduate Outcomes Presentation Slides - English (v3).pptx
Graduate Outcomes Presentation Slides - English (v3).pptxGraduate Outcomes Presentation Slides - English (v3).pptx
Graduate Outcomes Presentation Slides - English (v3).pptx
 
Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...
Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...
Transparency, Recognition and the role of eSealing - Ildiko Mazar and Koen No...
 
Including Mental Health Support in Project Delivery, 14 May.pdf
Including Mental Health Support in Project Delivery, 14 May.pdfIncluding Mental Health Support in Project Delivery, 14 May.pdf
Including Mental Health Support in Project Delivery, 14 May.pdf
 
dusjagr & nano talk on open tools for agriculture research and learning
dusjagr & nano talk on open tools for agriculture research and learningdusjagr & nano talk on open tools for agriculture research and learning
dusjagr & nano talk on open tools for agriculture research and learning
 
21st_Century_Skills_Framework_Final_Presentation_2.pptx
21st_Century_Skills_Framework_Final_Presentation_2.pptx21st_Century_Skills_Framework_Final_Presentation_2.pptx
21st_Century_Skills_Framework_Final_Presentation_2.pptx
 
PUBLIC FINANCE AND TAXATION COURSE-1-4.pdf
PUBLIC FINANCE AND TAXATION COURSE-1-4.pdfPUBLIC FINANCE AND TAXATION COURSE-1-4.pdf
PUBLIC FINANCE AND TAXATION COURSE-1-4.pdf
 
Diuretic, Hypoglycemic and Limit test of Heavy metals and Arsenic.-1.pdf
Diuretic, Hypoglycemic and Limit test of Heavy metals and Arsenic.-1.pdfDiuretic, Hypoglycemic and Limit test of Heavy metals and Arsenic.-1.pdf
Diuretic, Hypoglycemic and Limit test of Heavy metals and Arsenic.-1.pdf
 
Andreas Schleicher presents at the launch of What does child empowerment mean...
Andreas Schleicher presents at the launch of What does child empowerment mean...Andreas Schleicher presents at the launch of What does child empowerment mean...
Andreas Schleicher presents at the launch of What does child empowerment mean...
 
UChicago CMSC 23320 - The Best Commit Messages of 2024
UChicago CMSC 23320 - The Best Commit Messages of 2024UChicago CMSC 23320 - The Best Commit Messages of 2024
UChicago CMSC 23320 - The Best Commit Messages of 2024
 
MuleSoft Integration with AWS Textract | Calling AWS Textract API |AWS - Clou...
MuleSoft Integration with AWS Textract | Calling AWS Textract API |AWS - Clou...MuleSoft Integration with AWS Textract | Calling AWS Textract API |AWS - Clou...
MuleSoft Integration with AWS Textract | Calling AWS Textract API |AWS - Clou...
 
diagnosting testing bsc 2nd sem.pptx....
diagnosting testing bsc 2nd sem.pptx....diagnosting testing bsc 2nd sem.pptx....
diagnosting testing bsc 2nd sem.pptx....
 
e-Sealing at EADTU by Kamakshi Rajagopal
e-Sealing at EADTU by Kamakshi Rajagopale-Sealing at EADTU by Kamakshi Rajagopal
e-Sealing at EADTU by Kamakshi Rajagopal
 
OSCM Unit 2_Operations Processes & Systems
OSCM Unit 2_Operations Processes & SystemsOSCM Unit 2_Operations Processes & Systems
OSCM Unit 2_Operations Processes & Systems
 

Hierarchical Reinforcement Learning

  • 1. Hierarchical Reinforcement Learning Mausam [A Survey and Comparison of HRL techniques]
  • 2.
  • 3. Decision Making Slide courtesy Dan Weld Environment Percept Action What action next?
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13.
  • 14.
  • 15.
  • 16.
  • 17.
  • 18.
  • 19.
  • 20.
  • 21.
  • 22.
  • 23.
  • 25.
  • 26.
  • 27.
  • 28.
  • 29.
  • 30.
  • 31.
  • 32.
  • 33.
  • 34.
  • 35. Machine: Move e + Collision Avoidance Move e Choose Return End of hallway : End of hallway Obstacle Call M1 Call M2 M1 M2 Move w Move n Move n Return Move w Move s Move s Return
  • 36.
  • 37.
  • 38.
  • 39. Task Hierarchy: MAXQ Decomposition [Dietterich’00] Root Take Give Navigate(loc) Deliver Fetch Extend-arm Extend-arm Grab Release Move e Move w Move s Move n Children of a task are unordered
  • 40.
  • 41.
  • 42.
  • 43.
  • 44.
  • 45. 2. Optimality Hierarchical Optimality vs. Recursive Optimality
  • 46.
  • 47.
  • 48.
  • 49.
  • 50.
  • 51.
  • 52.
  • 53.
  • 54.
  • 55.
  • 56.
  • 57.
  • 58.
  • 59.