Hierarchical Reinforcement Learning
[A Survey and Comparison of HRL techniques]
Mausam
The Outline of the Talk
Decision Making (slide courtesy Dan Weld)
[Diagram: Environment, Percept, Action. What action next?]
Personal Printerbot
Episodic Markov Decision Process
* Markovian assumption. ** Bounds R for infinite horizon. Episodic MDP ≡ MDP with absorbing goals.
Goal of an Episodic MDP
* Non-noisy, complete-information perceptors.
Solution of an Episodic MDP
Complexity of Value Iteration
* Bellman's curse of dimensionality.
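The bullet text of the value-iteration slides did not survive extraction, so the following is only a minimal sketch of standard value iteration on an episodic MDP with an absorbing goal. The corridor environment, the `slip` probability, and the function names `make_corridor` and `value_iteration` are illustrative assumptions, not taken from the talk. Each sweep touches every state, action, and successor, which is where the cost that grows with the size of the state space comes from.

```python
from typing import Dict, List, Tuple

# transitions[s][a] = list of (probability, next_state, reward)
Transitions = Dict[int, Dict[str, List[Tuple[float, int, float]]]]


def make_corridor(n: int = 4, slip: float = 0.1) -> Transitions:
    """Hypothetical corridor: states 0..n-1, state n-1 is the absorbing goal.
    Every move costs 1; with probability `slip` the agent stays in place."""
    t: Transitions = {}
    for s in range(n - 1):  # the goal state has no outgoing actions
        left, right = max(s - 1, 0), s + 1
        t[s] = {
            "left": [(1 - slip, left, -1.0), (slip, s, -1.0)],
            "right": [(1 - slip, right, -1.0), (slip, s, -1.0)],
        }
    return t


def backup(outcomes, v, gamma):
    """Expected one-step return of an action under the value estimate v."""
    return sum(p * (r + gamma * v[s2]) for p, s2, r in outcomes)


def value_iteration(t: Transitions, goal: int, gamma: float = 1.0, tol: float = 1e-9):
    v = {s: 0.0 for s in t}
    v[goal] = 0.0  # absorbing goal: no further reward
    sweeps = 0
    while True:
        sweeps += 1
        delta = 0.0
        for s, actions in t.items():  # one sweep costs O(|S| * |A| * successors)
            best = max(backup(outs, v, gamma) for outs in actions.values())
            delta = max(delta, abs(best - v[s]))
            v[s] = best
        if delta < tol:
            break
    policy = {
        s: max(actions, key=lambda a: backup(actions[a], v, gamma))
        for s, actions in t.items()
    }
    return v, policy, sweeps
```

Calling `value_iteration(make_corridor(), goal=3)` returns the value table, the greedy policy, and the number of sweeps needed to converge on this toy problem.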
Learning
[Diagram: Environment, Data]
Decision Making while Learning*
[Diagram: Environment, Percepts, Datum, Action. What action next?]
* Known as Reinforcement Learning.
Reinforcement Learning
Planning vs. MDP vs. RL
Exploration vs. Exploitation
Model Based Learning
Model Free Learning
Learning
Q-Learning
The optimal policy chooses, in each state, the action with the maximum Q* value.
[Update-rule annotations: new estimate of Q value; old estimate of Q value.]
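The update equation itself was lost in extraction; only its "new estimate" and "old estimate" labels remain. As a hedged illustration, the sketch below uses the standard tabular Q-learning rule, in which the new estimate blends the old estimate with the one-step target `r + gamma * max Q(s', a')`. The corridor environment, the hyperparameters, and the names `step`, `q_update`, `train`, and `greedy` are assumptions made for this example.

```python
import random
from collections import defaultdict

ACTIONS = ("left", "right")
GOAL = 3


def step(state, action):
    """Hypothetical deterministic corridor: -1 per move, state 3 is the goal."""
    nxt = min(state + 1, GOAL) if action == "right" else max(state - 1, 0)
    return nxt, -1.0, nxt == GOAL


def q_update(old, reward, next_max, alpha, gamma):
    """New estimate = (1 - alpha) * old estimate + alpha * one-step target."""
    return (1 - alpha) * old + alpha * (reward + gamma * next_max)


def greedy(q, state):
    """Action with the maximum Q value in `state`."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # Q values start at 0 for every (state, action)
    for _ in range(episodes):
        s = 0
        for _ in range(100):  # step cap keeps every episode finite
            # epsilon-greedy: explore with probability epsilon, else exploit
            a = rng.choice(ACTIONS) if rng.random() < epsilon else greedy(q, s)
            s2, r, done = step(s, a)
            next_max = 0.0 if done else max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] = q_update(q[(s, a)], r, next_max, alpha, gamma)
            s = s2
            if done:
                break
    return q
```

After `q = train()`, calling `greedy(q, s)` for each non-goal state reads off the learned policy.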
Semi-MDP: When actions take time.
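The slide's formulas are missing from the extracted text. As a sketch only, the code below shows the commonly used SMDP form of the Q-learning update, where an action that lasts several time steps is credited with its discounted accumulated reward and the bootstrap term is discounted by gamma raised to the action's duration. The function names `discounted_return` and `smdp_q_update` are hypothetical.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Sum of gamma**k * r_k over the steps the temporally extended action ran."""
    total, discount = 0.0, 1.0
    for r in rewards:
        total += discount * r
        discount *= gamma
    return total


def smdp_q_update(old: float, rewards: Sequence[float], next_max: float,
                  alpha: float, gamma: float) -> float:
    """One SMDP Q-learning step for an action that took len(rewards) time steps.

    target = accumulated discounted reward + gamma**tau * max_a' Q(s', a')
    """
    tau = len(rewards)
    target = discounted_return(rewards, gamma) + (gamma ** tau) * next_max
    return old + alpha * (target - old)
```

With a one-step action (`len(rewards) == 1`) this reduces to the ordinary Q-learning update.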
Printerbot
1. The Mathematical Perspective: A Structure Paradigm
2. Modular Decision Making
3. Background Knowledge
A mechanism that exploits all three avenues: Hierarchies
Hierarchy
Hierarchical Algorithms ≡ Gating Mechanism
* Can be a multi-level hierarchy. g is a gate; b_i is a behaviour.
Option: Move e until end of hallway
Options [Sutton, Precup, Singh '99]
* Can be a policy over lower-level options.
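The definition bullets on the Options slides were lost. The sketch below follows the usual formulation from the cited options framework, in which an option bundles an initiation set, an internal policy, and a termination condition, and applies it to the "Move e until end of hallway" example. The five-cell hallway, the reward of -1 per step, and the names `Option`, `hallway_step`, `move_east`, and `run_option` are assumptions for illustration.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

HALL_END = 4  # hypothetical hallway with cells 0..4


@dataclass
class Option:
    name: str
    can_start: Callable[[int], bool]    # initiation set: where the option may begin
    policy: Callable[[int], str]        # internal policy: state -> primitive action
    stop_prob: Callable[[int], float]   # termination condition: P(stop | state)


def hallway_step(state: int, action: str) -> Tuple[int, float]:
    """Assumed dynamics: deterministic moves, -1 reward per step."""
    nxt = min(state + 1, HALL_END) if action == "east" else max(state - 1, 0)
    return nxt, -1.0


move_east = Option(
    name="move-east-until-end-of-hallway",
    can_start=lambda s: s < HALL_END,
    policy=lambda s: "east",
    stop_prob=lambda s: 1.0 if s == HALL_END else 0.0,
)


def run_option(option: Option, state: int, gamma: float = 0.9,
               rng: Optional[random.Random] = None) -> Tuple[int, float, int]:
    """Run the option to termination.

    Returns (final state, discounted reward accumulated, number of steps)."""
    rng = rng or random.Random(0)
    if not option.can_start(state):
        raise ValueError(f"{option.name} cannot start in state {state}")
    total, discount, steps = 0.0, 1.0, 0
    while True:
        state, reward = hallway_step(state, option.policy(state))
        total += discount * reward
        discount *= gamma
        steps += 1
        if rng.random() < option.stop_prob(state):
            return state, total, steps
```

From the agent's point of view, `run_option(move_east, 1)` behaves like a single action whose duration and accumulated reward are exactly the quantities a Semi-MDP update needs.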
Learning
Machine: Move e + Collision Avoidance
[Diagram: machine with nodes Move e, Choose, Return, Call M1, Call M2; transition labels End of hallway and Obstacle; sub-machine M1 (Move w, Move n, Move n, Return) and sub-machine M2 (Move w, Move s, Move s, Return).]
Hierarchies of Abstract Machines [Parr, Russell '97]
Learning
Task Hierarchy: MAXQ Decomposition [Dietterich '00]
[Diagram: task hierarchy with nodes Root, Take, Give, Navigate(loc), Deliver, Fetch, Extend-arm, Extend-arm, Grab, Release, Move e, Move w, Move s, Move n.]
Children of a task are unordered.
MAXQ Decomposition
* Observe the context-free nature of the Q-value.
[Equation annotations: reward received while navigating; reward received after navigation.]
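The decomposition equation did not survive extraction; its two annotations match the standard MAXQ split of a Q-value into the value of the chosen child task (reward received while it runs) plus a completion term (reward received after it finishes). The sketch below evaluates that recursion on a fragment of the task hierarchy. The parent-child links are an assumed reading of the flattened diagram, `Take` is treated as primitive for brevity, and every number in the tables is invented for illustration.

```python
from typing import Dict, List, Tuple

# Assumed fragment of the hierarchy: Fetch -> {Navigate, Take}, Navigate -> moves.
CHILDREN: Dict[str, List[str]] = {
    "Fetch": ["Navigate", "Take"],
    "Navigate": ["Move e", "Move w"],
}

# Hypothetical expected one-step rewards of primitive actions: V(a, s).
V_PRIMITIVE: Dict[Tuple[str, str], float] = {
    ("Move e", "s0"): -1.0,
    ("Move w", "s0"): -1.0,
    ("Take", "s0"): -5.0,
}

# Hypothetical completion values C(parent, s, child): reward expected after
# the child finishes, until the parent task itself completes.
COMPLETION: Dict[Tuple[str, str, str], float] = {
    ("Navigate", "s0", "Move e"): -2.0,
    ("Navigate", "s0", "Move w"): -4.0,
    ("Fetch", "s0", "Navigate"): -1.0,
    ("Fetch", "s0", "Take"): -6.0,
}


def value(task: str, state: str) -> float:
    """V(task, s): stored reward for a primitive, else max over children of Q."""
    if task not in CHILDREN:
        return V_PRIMITIVE[(task, state)]
    return max(q_value(task, state, child) for child in CHILDREN[task])


def q_value(parent: str, state: str, child: str) -> float:
    """Q(parent, s, child) = V(child, s) + C(parent, s, child)."""
    return value(child, state) + COMPLETION[(parent, state, child)]


def best_child(task: str, state: str) -> str:
    return max(CHILDREN[task], key=lambda c: q_value(task, state, c))
```

Note that `value("Navigate", "s0")` is computed without reference to which parent invoked `Navigate`, which is the context-free property the slide's footnote points to.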
1. State Abstraction
State Abstraction in MAXQ
State Abstraction in Options, HAM
* [Andre, Russell '02]
2. Optimality: Hierarchical Optimality vs. Recursive Optimality
Optimality
* Equations can be defined for both optimalities. ** The advantage of using macro-actions may be lost.
3. Language Expressiveness
4. Knowledge Requirements
5. Models advanced
6. Structure Paradigm
Directions for Future Research
Applications
[Images courtesy various sources. Figure labels: Parts, Assemblies, Warehouse; P1 to P4, D1 to D4.]
Thinking Big…
How to choose appropriate hierarchy
The Structure Paradigm
Main ideas in HRL community
