Tech Talk: Reinforcement Learning
Tamura Yasuto
Table of Contents
• Theme of This Tech Talk: Stop Saying “Trial and Error”
• Rough Definition of RL (*basic settings)
• Planning in Markov Decision Process (MDP)
• Interactive Optimization of Policies and Values
• Wrapping Up
Theme of This Tech Talk: Stop Saying “Trial and Error”
With charts like these,
you miss the point from the very beginning
From “Trial and Error” to Interactive Value-Policy Updates
[Diagram: Agent, Environment, Action, Reward, Value, Policy. Annotation: “This part should be emphasized more”]
Role of Reinforcement Learning (RL) in AI
[Diagram: machine learning within AI. Models: classical models, neural networks. How to train: supervised learning, unsupervised learning, reinforcement learning.]
Rough Definition of RL: Planning Problem
• Sequential decision making: optimizing a sequence of actions
• Optimizing a “policy”: a “policy” specifies how to act in a given “state”
• Assuming a Markov decision process (MDP): the next action depends only on the current state, not on the earlier history
[Diagram: Policy, Action, State]
Example of planning: navigating a robot
Markov Decision Process (MDP) in Some Expressions
• Typical RL diagram (Agent, Env, Action, Reward)
• State transition diagram
• Backup diagram (closed)
• Graphical model
MDP: with an Example of Balancing a Bike
[Figure: a bike in five states, State 0 to State 4. Labels: leaning left, no move, leaning right]
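As a rough sketch, the bike example can be written as a tiny MDP in Python. The encoding below is an assumption rather than something taken from the slides: the three labels are treated as actions that shift the state by one, states 0 and 4 are treated as "fallen over" terminal states with a reward of -1, and all names (`step`, `ACTIONS`, `TERMINAL`) are hypothetical.

```python
# Hypothetical toy model of the bike-balancing MDP (illustrative names only).
N_STATES = 5                       # states 0..4; state 2 is assumed upright
ACTIONS = {"lean_left": -1, "no_move": 0, "lean_right": +1}
TERMINAL = {0, 4}                  # assumed: the bike has fallen over


def step(state, action):
    """Return (next_state, reward, done) for one deterministic transition."""
    # Move one state left or right, staying inside 0..N_STATES-1.
    next_state = min(max(state + ACTIONS[action], 0), N_STATES - 1)
    # Assumed reward: -1 for falling over, 0 otherwise.
    reward = -1.0 if next_state in TERMINAL else 0.0
    return next_state, reward, next_state in TERMINAL


# The next state depends only on the current state and the chosen action.
print(step(2, "lean_left"))   # upright -> leaning slightly left
print(step(1, "lean_left"))   # leaning slightly left -> fallen over
```

Because `step` needs nothing but the current state and action, it satisfies the Markov assumption stated earlier.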
Planning in MDP: Some Expressions
• Learning how to move optimally in each state
No move
Lean left
Lean right
Values and Policies: with an Example of Balancing a Bike
• Value: how good it is to be in a state
• Policy: the probability of taking each action in a state
State 0: negative reward
State 1: low value
State 2: high value
State 3: low value
State 4: negative reward
Action 0: low probability
Action 1
Action 2: high probability
Policy updates
• Assign higher probability to actions that move toward high-value states
State 0: negative reward
State 1: low value
State 2: high value
Action 0: leaning left
Action 1: leaning right
Then how can a value be learned?
Giving higher probability
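One possible way to express this policy update in code is a softmax over the values of the states each action leads to. This is only a sketch: the slides do not name a specific update rule, and the value numbers, the `temperature` parameter, and the function name `softmax_policy` are all assumptions made for illustration.

```python
import math


def softmax_policy(values, state, moves, temperature=1.0):
    """Return {action: probability}, favouring moves toward high-value states."""
    last = len(values) - 1
    prefs = {}
    for action, delta in moves.items():
        next_state = min(max(state + delta, 0), last)
        prefs[action] = values[next_state] / temperature
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(prefs.values())
    exps = {a: math.exp(p - top) for a, p in prefs.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}


# Assumed values: the edge states are bad, state 2 (upright) is good.
values = [-1.0, 0.2, 1.0, 0.2, -1.0]
moves = {"lean_left": -1, "lean_right": +1}

# In state 1, leaning right leads to the high-value state 2.
policy = softmax_policy(values, state=1, moves=moves)
print(policy)
```

In state 1 the policy puts more probability on leaning right, because that action points toward the higher value.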
Value update: Temporal Difference (TD) Learning
• TD learning: updating values by closing the gap between expected values and actual rewards
If you lean left, the value is low. As expected! (TD loss is low)
Leaning right would not be good because the value is low. I was wrong: there is no bad reward. Let’s update the value. (TD loss is high)
Learning can happen even without explicit rewards
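The standard one-step TD update, V(s) ← V(s) + α (r + γ V(s') − V(s)), can be sketched as follows. The learning rate `alpha`, the discount `gamma`, and the starting values are assumptions chosen for illustration; the two transitions only loosely mirror the two cases on the slide (an estimate that already matches, and one that was too pessimistic).

```python
def td_update(values, state, reward, next_state, alpha=0.5, gamma=0.9):
    """Apply one TD(0) update in place and return the TD error."""
    # Gap between the new estimate (reward plus discounted next value)
    # and the old estimate of the current state.
    td_error = reward + gamma * values[next_state] - values[state]
    values[state] += alpha * td_error
    return td_error


# Assumed initial estimates: state 3 is believed to be bad (-1.0).
values = [0.0, 0.9, 1.0, -1.0, 0.0]

# Case 1: the estimate for state 1 already agrees with what follows it.
err_expected = td_update(values, state=1, reward=0.0, next_state=2)

# Case 2: state 3 was thought to be bad, but moving on from it brings
# no bad reward and leads to a good state, so the error is large.
err_surprise = td_update(values, state=3, reward=0.0, next_state=2)

print(err_expected, err_surprise, values)
```

Note that the reward is zero in both transitions: the value of state 3 still changes, because the update bootstraps from the value of the next state.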
Interactive Updates of Value and Policy
Value updates (TD learning)
Policy updates
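The interplay of the two updates can be sketched as a single loop. This is one possible reading, not the slides' stated algorithm: the policy is recomputed from the current values by a one-step lookahead softmax, so every TD value update immediately shifts the policy. The environment (reward -1 on reaching state 0 or 4), the hyperparameters, and all function names are assumptions.

```python
import math
import random

N_STATES, TERMINAL = 5, {0, 4}
MOVES = {"lean_left": -1, "no_move": 0, "lean_right": +1}


def step(state, action):
    """Assumed dynamics: shift by one state, reward -1 on falling over."""
    nxt = min(max(state + MOVES[action], 0), N_STATES - 1)
    return nxt, (-1.0 if nxt in TERMINAL else 0.0)


def policy_probs(values, state, gamma, temperature):
    """Policy update: softmax over one-step lookahead estimates."""
    prefs = {}
    for action in MOVES:
        nxt, reward = step(state, action)
        prefs[action] = (reward + gamma * values[nxt]) / temperature
    top = max(prefs.values())
    exps = {a: math.exp(p - top) for a, p in prefs.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}


def train(episodes=500, alpha=0.1, gamma=0.9, temperature=0.2, seed=0):
    rng = random.Random(seed)
    values = [0.0] * N_STATES
    for _ in range(episodes):
        state = rng.choice([1, 2, 3])
        for _ in range(20):  # cap the episode length
            probs = policy_probs(values, state, gamma, temperature)
            action = rng.choices(list(probs), weights=list(probs.values()))[0]
            nxt, reward = step(state, action)
            # Value update (TD learning) for the state just left.
            values[state] += alpha * (reward + gamma * values[nxt] - values[state])
            if nxt in TERMINAL:
                break
            state = nxt
    return values


values = train()
print(values)
print(policy_probs(values, 1, gamma=0.9, temperature=0.2))
```

Separating the policy into its own learned table or network (as in actor-critic methods) is a common alternative; the version above keeps the policy as a function of the values to stay short.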
Wrapping Up
• RL formulation: a planning problem solved by optimizing a policy
• Simple assumption of MDP: an action depends only on the current state
• Importance of a value: updating a policy by evaluating how good it is to be in a state
• TD learning: updating values by closing the gap between value estimates and actual rewards

