SlideShare a Scribd company logo
1 of 19
Download to read offline
Chapter 4: Dynamic Programming
Seungjae Ryan Lee
Dynamic Programming
● Algorithms to compute optimal policies with a perfect model of environment
● Use value functions to structure searching for good policies
● Foundation of all methods hereafter
Dynamic
Programming
Monte Carlo
Temporal
Difference
Policy Evaluation (Prediction)
● Compute state-value for some policy
● Use the Bellman Equation:
Iterative Policy Evaluation
● Solving linear systems is tedious → Use iterative methods
● Define sequence of approximate value functions
● Expected update using the Bellman equation:
○ Update based on expectation of all possible next states
Iterative Policy Evaluation in Practice
● In-place methods usually converge faster than keeping two arrays
● Terminate policy evaluation when is sufficiently small
Gridworld Example
● Deterministic state transition
● Off-the-grid actions leave the state unchanged
● Undiscounted, episodic task
Policy Evaluation in Gridworld
● Random policy
Policy Improvement - One state
● Suppose we know for some policy
● For a state , see if there is a better action
● Check if
○ If true, greedily selecting is better than
○ Special case of Policy Improvement Theorem
0
-1
-1
-1
0
-1
-1
-1
Policy Improvement Theorem
For policies , if for all state ,
Then, is at least as good a policy as .
(Strict inequality if )
Policy Improvement
● Find better policies with the computed value function
● Use a new greedy policy
● Satisfies the conditions of Policy Improvement Theorem
Guarantees of Policy Improvement
● If , then the Bellman Optimality Equation holds.
→ Policy Improvement always returns a better policy unless already optimal
Policy Iteration
● Repeat Policy Evaluation and Policy Improvement
● Guaranteed improvement for each policy
● Guaranteed convergence in finite number of steps for finite MDPs
Policy Iteration in Practice
● Initialize with for quicker policy evaluation
● Often converges in surprisingly few iterations
Value Iteration
● “Truncate” policy evaluation
○ Don’t wait until is sufficiently small
○ Update state values once for each state
● Evaluation and improvement can be simplified to one update operation
○ Bellman optimality equation turned into an update rule
Value Iteration in Practice
● Terminate when is sufficiently small
Asynchronous Dynamic Programming
● Don’t sweep over the entire state set systematically
○ Some states are updated multiple times before other state is updated once
○ Order/skip states to propagate information efficiently
● Can intermix with real-time interaction
○ Update states according to the agent’s experience
○ Allow focusing updates to relevant states
● To converge, all states must be continuously updated
Generalized Policy Iteration
● Idea of interaction between policy evaluation and policy improvement
○ Policy improved w.r.t. value function
○ Value function updated for new policy
● Describes most RL methods
● Stabilized process guarantees optimal policy
Efficiency of Dynamic Programming
● Polynomial in and
○ Exponentially faster than direct search in policy space
● More practical than linear programming methods in larger problems
○ Asynchronous DP preferred for large state spaces
● Typically converge faster than their worst-case guarantee
○ Initial values can help faster convergence
Thank you!
Original content from
● Reinforcement Learning: An Introduction by Sutton and Barto
You can find more content in
● github.com/seungjaeryanlee
● www.endtoend.ai

More Related Content

What's hot

An introduction to deep reinforcement learning
An introduction to deep reinforcement learningAn introduction to deep reinforcement learning
An introduction to deep reinforcement learningBig Data Colombia
 
Markov decision process
Markov decision processMarkov decision process
Markov decision processJie-Han Chen
 
DQN (Deep Q-Network)
DQN (Deep Q-Network)DQN (Deep Q-Network)
DQN (Deep Q-Network)Dong Guo
 
An introduction to reinforcement learning
An introduction to  reinforcement learningAn introduction to  reinforcement learning
An introduction to reinforcement learningJie-Han Chen
 
Multi-armed Bandits
Multi-armed BanditsMulti-armed Bandits
Multi-armed BanditsDongmin Lee
 
Markov decision process
Markov decision processMarkov decision process
Markov decision processHamed Abdi
 
Multi armed bandit
Multi armed banditMulti armed bandit
Multi armed banditJie-Han Chen
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement LearningJungyeol
 
Intro to Deep Reinforcement Learning
Intro to Deep Reinforcement LearningIntro to Deep Reinforcement Learning
Intro to Deep Reinforcement LearningKhaled Saleh
 
Deep Reinforcement Learning and Its Applications
Deep Reinforcement Learning and Its ApplicationsDeep Reinforcement Learning and Its Applications
Deep Reinforcement Learning and Its ApplicationsBill Liu
 
Reinforcement learning
Reinforcement learningReinforcement learning
Reinforcement learningDongHyun Kwak
 
Multi-Armed Bandit and Applications
Multi-Armed Bandit and ApplicationsMulti-Armed Bandit and Applications
Multi-Armed Bandit and ApplicationsSangwoo Mo
 
Temporal difference learning
Temporal difference learningTemporal difference learning
Temporal difference learningJie-Han Chen
 
Reinforcement learning, Q-Learning
Reinforcement learning, Q-LearningReinforcement learning, Q-Learning
Reinforcement learning, Q-LearningKuppusamy P
 
25 introduction reinforcement_learning
25 introduction reinforcement_learning25 introduction reinforcement_learning
25 introduction reinforcement_learningAndres Mendez-Vazquez
 
An introduction to reinforcement learning
An introduction to reinforcement learningAn introduction to reinforcement learning
An introduction to reinforcement learningSubrat Panda, PhD
 
Reinforcement Learning 7. n-step Bootstrapping
Reinforcement Learning 7. n-step BootstrappingReinforcement Learning 7. n-step Bootstrapping
Reinforcement Learning 7. n-step BootstrappingSeung Jae Lee
 

What's hot (20)

An introduction to deep reinforcement learning
An introduction to deep reinforcement learningAn introduction to deep reinforcement learning
An introduction to deep reinforcement learning
 
Markov decision process
Markov decision processMarkov decision process
Markov decision process
 
DQN (Deep Q-Network)
DQN (Deep Q-Network)DQN (Deep Q-Network)
DQN (Deep Q-Network)
 
An introduction to reinforcement learning
An introduction to  reinforcement learningAn introduction to  reinforcement learning
An introduction to reinforcement learning
 
Multi-armed Bandits
Multi-armed BanditsMulti-armed Bandits
Multi-armed Bandits
 
Markov decision process
Markov decision processMarkov decision process
Markov decision process
 
Multi armed bandit
Multi armed banditMulti armed bandit
Multi armed bandit
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
 
Intro to Deep Reinforcement Learning
Intro to Deep Reinforcement LearningIntro to Deep Reinforcement Learning
Intro to Deep Reinforcement Learning
 
Deep Reinforcement Learning
Deep Reinforcement LearningDeep Reinforcement Learning
Deep Reinforcement Learning
 
Deep Reinforcement Learning and Its Applications
Deep Reinforcement Learning and Its ApplicationsDeep Reinforcement Learning and Its Applications
Deep Reinforcement Learning and Its Applications
 
Reinforcement learning
Reinforcement learningReinforcement learning
Reinforcement learning
 
Multi-Armed Bandit and Applications
Multi-Armed Bandit and ApplicationsMulti-Armed Bandit and Applications
Multi-Armed Bandit and Applications
 
Temporal difference learning
Temporal difference learningTemporal difference learning
Temporal difference learning
 
Reinforcement learning, Q-Learning
Reinforcement learning, Q-LearningReinforcement learning, Q-Learning
Reinforcement learning, Q-Learning
 
Deep Q-Learning
Deep Q-LearningDeep Q-Learning
Deep Q-Learning
 
25 introduction reinforcement_learning
25 introduction reinforcement_learning25 introduction reinforcement_learning
25 introduction reinforcement_learning
 
An introduction to reinforcement learning
An introduction to reinforcement learningAn introduction to reinforcement learning
An introduction to reinforcement learning
 
Policy gradient
Policy gradientPolicy gradient
Policy gradient
 
Reinforcement Learning 7. n-step Bootstrapping
Reinforcement Learning 7. n-step BootstrappingReinforcement Learning 7. n-step Bootstrapping
Reinforcement Learning 7. n-step Bootstrapping
 

Similar to Reinforcement Learning 4. Dynamic Programming

Reinforcement learning:policy gradient (part 1)
Reinforcement learning:policy gradient (part 1)Reinforcement learning:policy gradient (part 1)
Reinforcement learning:policy gradient (part 1)Bean Yen
 
Structured prediction with reinforcement learning
Structured prediction with reinforcement learningStructured prediction with reinforcement learning
Structured prediction with reinforcement learningguruprasad110
 
Proximal Policy Optimization
Proximal Policy OptimizationProximal Policy Optimization
Proximal Policy OptimizationShubhaManikarnike
 
Rl chapter 1 introduction
Rl chapter 1 introductionRl chapter 1 introduction
Rl chapter 1 introductionConnorShorten2
 
Deep Reinforcement learning
Deep Reinforcement learningDeep Reinforcement learning
Deep Reinforcement learningCairo University
 
Andrii Prysiazhnyk: Why the amazon sellers are buiyng the RTX 3080: Dynamic p...
Andrii Prysiazhnyk: Why the amazon sellers are buiyng the RTX 3080: Dynamic p...Andrii Prysiazhnyk: Why the amazon sellers are buiyng the RTX 3080: Dynamic p...
Andrii Prysiazhnyk: Why the amazon sellers are buiyng the RTX 3080: Dynamic p...Lviv Startup Club
 
Intro to Reinforcement learning - part II
Intro to Reinforcement learning - part IIIntro to Reinforcement learning - part II
Intro to Reinforcement learning - part IIMikko Mäkipää
 
A brief introduction to Searn Algorithm
A brief introduction to Searn AlgorithmA brief introduction to Searn Algorithm
A brief introduction to Searn AlgorithmSupun Abeysinghe
 
Trust Region Policy Optimization, Schulman et al, 2015
Trust Region Policy Optimization, Schulman et al, 2015Trust Region Policy Optimization, Schulman et al, 2015
Trust Region Policy Optimization, Schulman et al, 2015Chris Ohk
 
Intro to Reinforcement learning - part I
Intro to Reinforcement learning - part IIntro to Reinforcement learning - part I
Intro to Reinforcement learning - part IMikko Mäkipää
 
Power Laws: Optimizing Demand-side Strategies. Second Prize Solution
Power Laws: Optimizing Demand-side Strategies. Second Prize SolutionPower Laws: Optimizing Demand-side Strategies. Second Prize Solution
Power Laws: Optimizing Demand-side Strategies. Second Prize SolutionGuillermo Barbadillo Villanueva
 
FeUdal Networks for Hierarchical Reinforcement Learning
FeUdal Networks for Hierarchical Reinforcement LearningFeUdal Networks for Hierarchical Reinforcement Learning
FeUdal Networks for Hierarchical Reinforcement LearningIllia Polosukhin
 
Reinforcement Learning Guide For Beginners
Reinforcement Learning Guide For BeginnersReinforcement Learning Guide For Beginners
Reinforcement Learning Guide For Beginnersgokulprasath06
 
Dexterous In-hand Manipulation by OpenAI
Dexterous In-hand Manipulation by OpenAIDexterous In-hand Manipulation by OpenAI
Dexterous In-hand Manipulation by OpenAIAnand Joshi
 
Reinforcement Learning for Algorithmic Trading
Reinforcement Learning for Algorithmic TradingReinforcement Learning for Algorithmic Trading
Reinforcement Learning for Algorithmic TradingSynaptonIncorporated
 
[DSC Croatia 22] Modeling the Dynamics of User Engagement - Enes Deumic
[DSC Croatia 22] Modeling the Dynamics of User Engagement - Enes Deumic[DSC Croatia 22] Modeling the Dynamics of User Engagement - Enes Deumic
[DSC Croatia 22] Modeling the Dynamics of User Engagement - Enes DeumicDataScienceConferenc1
 
Aaa ped-24- Reinforcement Learning
Aaa ped-24- Reinforcement LearningAaa ped-24- Reinforcement Learning
Aaa ped-24- Reinforcement LearningAminaRepo
 

Similar to Reinforcement Learning 4. Dynamic Programming (20)

Reinforcement learning:policy gradient (part 1)
Reinforcement learning:policy gradient (part 1)Reinforcement learning:policy gradient (part 1)
Reinforcement learning:policy gradient (part 1)
 
Structured prediction with reinforcement learning
Structured prediction with reinforcement learningStructured prediction with reinforcement learning
Structured prediction with reinforcement learning
 
Proximal Policy Optimization
Proximal Policy OptimizationProximal Policy Optimization
Proximal Policy Optimization
 
Rl chapter 1 introduction
Rl chapter 1 introductionRl chapter 1 introduction
Rl chapter 1 introduction
 
Deep Reinforcement learning
Deep Reinforcement learningDeep Reinforcement learning
Deep Reinforcement learning
 
Andrii Prysiazhnyk: Why the amazon sellers are buiyng the RTX 3080: Dynamic p...
Andrii Prysiazhnyk: Why the amazon sellers are buiyng the RTX 3080: Dynamic p...Andrii Prysiazhnyk: Why the amazon sellers are buiyng the RTX 3080: Dynamic p...
Andrii Prysiazhnyk: Why the amazon sellers are buiyng the RTX 3080: Dynamic p...
 
Intro to Reinforcement learning - part II
Intro to Reinforcement learning - part IIIntro to Reinforcement learning - part II
Intro to Reinforcement learning - part II
 
A brief introduction to Searn Algorithm
A brief introduction to Searn AlgorithmA brief introduction to Searn Algorithm
A brief introduction to Searn Algorithm
 
Trust Region Policy Optimization, Schulman et al, 2015
Trust Region Policy Optimization, Schulman et al, 2015Trust Region Policy Optimization, Schulman et al, 2015
Trust Region Policy Optimization, Schulman et al, 2015
 
Intro to Reinforcement learning - part I
Intro to Reinforcement learning - part IIntro to Reinforcement learning - part I
Intro to Reinforcement learning - part I
 
Rules engine.pptx
Rules engine.pptxRules engine.pptx
Rules engine.pptx
 
Power Laws: Optimizing Demand-side Strategies. Second Prize Solution
Power Laws: Optimizing Demand-side Strategies. Second Prize SolutionPower Laws: Optimizing Demand-side Strategies. Second Prize Solution
Power Laws: Optimizing Demand-side Strategies. Second Prize Solution
 
Grad presentation
Grad presentationGrad presentation
Grad presentation
 
FeUdal Networks for Hierarchical Reinforcement Learning
FeUdal Networks for Hierarchical Reinforcement LearningFeUdal Networks for Hierarchical Reinforcement Learning
FeUdal Networks for Hierarchical Reinforcement Learning
 
Reinforcement Learning Guide For Beginners
Reinforcement Learning Guide For BeginnersReinforcement Learning Guide For Beginners
Reinforcement Learning Guide For Beginners
 
Dexterous In-hand Manipulation by OpenAI
Dexterous In-hand Manipulation by OpenAIDexterous In-hand Manipulation by OpenAI
Dexterous In-hand Manipulation by OpenAI
 
Reinforcement Learning for Algorithmic Trading
Reinforcement Learning for Algorithmic TradingReinforcement Learning for Algorithmic Trading
Reinforcement Learning for Algorithmic Trading
 
[DSC Croatia 22] Modeling the Dynamics of User Engagement - Enes Deumic
[DSC Croatia 22] Modeling the Dynamics of User Engagement - Enes Deumic[DSC Croatia 22] Modeling the Dynamics of User Engagement - Enes Deumic
[DSC Croatia 22] Modeling the Dynamics of User Engagement - Enes Deumic
 
Aaa ped-24- Reinforcement Learning
Aaa ped-24- Reinforcement LearningAaa ped-24- Reinforcement Learning
Aaa ped-24- Reinforcement Learning
 
Recommender systems
Recommender systems Recommender systems
Recommender systems
 

Recently uploaded

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 

Recently uploaded (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 

Reinforcement Learning 4. Dynamic Programming

  • 1. Chapter 4: Dynamic Programming Seungjae Ryan Lee
  • 2. Dynamic Programming ● Algorithms to compute optimal policies with a perfect model of environment ● Use value functions to structure searching for good policies ● Foundation of all methods hereafter Dynamic Programming Monte Carlo Temporal Difference
  • 3. Policy Evaluation (Prediction) ● Compute state-value for some policy ● Use the Bellman Equation:
  • 4. Iterative Policy Evaluation ● Solving linear systems is tedious → Use iterative methods ● Define sequence of approximate value functions ● Expected update using the Bellman equation: ○ Update based on expectation of all possible next states
  • 5. Iterative Policy Evaluation in Practice ● In-place methods usually converge faster than keeping two arrays ● Terminate policy evaluation when is sufficiently small
  • 6. Gridworld Example ● Deterministic state transition ● Off-the-grid actions leave the state unchanged ● Undiscounted, episodic task
  • 7. Policy Evaluation in Gridworld ● Random policy
  • 8. Policy Improvement - One state ● Suppose we know for some policy ● For a state , see if there is a better action ● Check if ○ If true, greedily selecting is better than ○ Special case of Policy Improvement Theorem 0 -1 -1 -1 0 -1 -1 -1
  • 9. Policy Improvement Theorem For policies , if for all state , Then, is at least as good a policy as . (Strict inequality if )
  • 10. Policy Improvement ● Find better policies with the computed value function ● Use a new greedy policy ● Satisfies the conditions of Policy Improvement Theorem
  • 11. Guarantees of Policy Improvement ● If , then the Bellman Optimality Equation holds. → Policy Improvement always returns a better policy unless already optimal
  • 12. Policy Iteration ● Repeat Policy Evaluation and Policy Improvement ● Guaranteed improvement for each policy ● Guaranteed convergence in finite number of steps for finite MDPs
  • 13. Policy Iteration in Practice ● Initialize with for quicker policy evaluation ● Often converges in surprisingly few iterations
  • 14. Value Iteration ● “Truncate” policy evaluation ○ Don’t wait until is sufficiently small ○ Update state values once for each state ● Evaluation and improvement can be simplified to one update operation ○ Bellman optimality equation turned into an update rule
  • 15. Value Iteration in Practice ● Terminate when is sufficiently small
  • 16. Asynchronous Dynamic Programming ● Don’t sweep over the entire state set systematically ○ Some states are updated multiple times before other state is updated once ○ Order/skip states to propagate information efficiently ● Can intermix with real-time interaction ○ Update states according to the agent’s experience ○ Allow focusing updates to relevant states ● To converge, all states must be continuously updated
  • 17. Generalized Policy Iteration ● Idea of interaction between policy evaluation and policy improvement ○ Policy improved w.r.t. value function ○ Value function updated for new policy ● Describes most RL methods ● Stabilized process guarantees optimal policy
  • 18. Efficiency of Dynamic Programming ● Polynomial in and ○ Exponentially faster than direct search in policy space ● More practical than linear programming methods in larger problems ○ Asynchronous DP preferred for large state spaces ● Typically converge faster than their worst-case guarantee ○ Initial values can help faster convergence
  • 19. Thank you! Original content from ● Reinforcement Learning: An Introduction by Sutton and Barto You can find more content in ● github.com/seungjaeryanlee ● www.endtoend.ai