
# Reinforcement Learning 10. On-policy Control with Approximation

A summary of Chapter 10: On-policy Control with Approximation of the book 'Reinforcement Learning: An Introduction' by Sutton and Barto. You can find the full book on Professor Sutton's website: http://incompleteideas.net/book/the-book-2nd.html

1. Chapter 10: On-policy Control with Approximation (Seungjae Ryan Lee)
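2. Episodic 1-step semi-gradient Sarsa ● Approximate action values $\hat q(s, a, \mathbf{w}) \approx q_\pi(s, a)$ instead of state values ● Use the Sarsa target $R_{t+1} + \gamma \hat q(S_{t+1}, A_{t+1}, \mathbf{w}_t)$ ● For a fixed policy, converges in the same way as TD(0), with the same kind of error bound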
3. Control with Episodic 1-step semi-gradient Sarsa ● Select actions and improve the policy by acting ε-greedily with respect to the current estimate $\hat q(S_t, \cdot, \mathbf{w}_t)$
4. Mountain Car Example ● Task: Drive an underpowered car up a steep mountain road ○ Gravity is stronger than the car’s engine ○ The car must swing back and forth to build enough momentum ● State: position $x_t$, velocity $\dot{x}_t$ ● Actions: Forward (+1), Reverse (-1), No-op (0) ● Reward: -1 per time step until the goal is reached
5. Approximation for Mountain Car ● Tile coding with 8 tilings selects the binary features $\mathbf{x}(s, a)$, and $\hat q(s, a, \mathbf{w}) = \mathbf{w}^\top \mathbf{x}(s, a)$
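6. Results of Mountain Car ● Plot the cost-to-go function: $-\max_a \hat q(s, a, \mathbf{w})$ ● Initial action values set to 0 ○ Very optimistic, since all true values are negative, so the agent explores extensively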
7. Results of Mountain Car
8. Episodic n-step Semi-gradient Sarsa ● Use n-step return as the update target
9. Episodic n-step Semi-gradient Sarsa in Practice
10. Episodic n-step Semi-gradient Sarsa Results ● Compared with 1-step Sarsa: ○ Faster learning ○ Better asymptotic performance
11. Episodic n-step Semi-gradient Sarsa Results ● Best performance at intermediate values of n
12. Average Reward Setting ● The quality of a policy is defined as the average reward per time step while following it, $r(\pi)$ ● Applies to continuing tasks, without discounting
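13. Differential Return and Value Functions ● Differential return: the sum of differences between rewards and the average reward, $G_t = R_{t+1} - r(\pi) + R_{t+2} - r(\pi) + R_{t+3} - r(\pi) + \cdots$ ● Differential value functions: expected differential returns, e.g. $v_\pi(s) = \mathbb{E}_\pi[G_t \mid S_t = s]$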
14. Bellman Equations ● Remove all discounting ($\gamma$) ● Replace each reward $r$ with its difference from the average reward, $r - r(\pi)$
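15. Differential semi-gradient Sarsa ● Same update rule, but with the differential TD error ● Original TD error: $\delta_t = R_{t+1} + \gamma \hat q(S_{t+1}, A_{t+1}, \mathbf{w}_t) - \hat q(S_t, A_t, \mathbf{w}_t)$ ● Differential TD error: $\delta_t = R_{t+1} - \bar R_t + \hat q(S_{t+1}, A_{t+1}, \mathbf{w}_t) - \hat q(S_t, A_t, \mathbf{w}_t)$, where $\bar R_t$ is the estimate of the average reward $r(\pi)$ at time $t$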
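
Slides 2 and 3 describe episodic 1-step semi-gradient Sarsa with ε-greedy action selection. The Python sketch below assumes a linear $\hat q$ over binary features (as with tile coding), so $\nabla \hat q(S_t, A_t, \mathbf{w})$ is just the feature vector and the update $\mathbf{w} \leftarrow \mathbf{w} + \alpha\,[R_{t+1} + \gamma \hat q(S_{t+1}, A_{t+1}, \mathbf{w}) - \hat q(S_t, A_t, \mathbf{w})]\,\nabla \hat q(S_t, A_t, \mathbf{w})$ only touches the active weights. The names `epsilon_greedy` and `sarsa_episode`, and the `reset` / `step` / `features` callables, are hypothetical interfaces chosen for this example, not something given on the slides.

```python
import random


def epsilon_greedy(weights, features, state, n_actions, epsilon, rng):
    """With probability epsilon act randomly, otherwise greedily w.r.t. q_hat."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    values = [sum(weights[i] for i in features(state, a)) for a in range(n_actions)]
    best = max(values)
    # Break ties randomly so all-zero initial weights do not favour one action.
    return rng.choice([a for a, v in enumerate(values) if v == best])


def sarsa_episode(weights, features, reset, step, n_actions,
                  alpha=0.1, gamma=1.0, epsilon=0.1, rng=random, max_steps=10_000):
    """Run one episode of 1-step semi-gradient Sarsa; return its length.

    features(state, action) -> indices of the active binary features x(s, a).
    With q_hat(s, a, w) = w . x(s, a), the gradient w.r.t. w is x(s, a) itself.
    """
    def q_hat(s, a):
        return sum(weights[i] for i in features(s, a))

    state = reset()
    action = epsilon_greedy(weights, features, state, n_actions, epsilon, rng)
    for t in range(max_steps):
        next_state, reward, done = step(state, action)
        if done:
            target = reward  # the value of a terminal state is 0
        else:
            next_action = epsilon_greedy(weights, features, next_state,
                                         n_actions, epsilon, rng)
            target = reward + gamma * q_hat(next_state, next_action)
        delta = target - q_hat(state, action)
        for i in features(state, action):
            weights[i] += alpha * delta  # w += alpha * delta * grad q_hat
        if done:
            return t + 1
        state, action = next_state, next_action
    return max_steps
```

With one-hot features such as `lambda s, a: [2 * s + a]` this reduces to tabular Sarsa; with tile coding it is common to scale α by one over the number of tilings, since that many weights move on every update.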
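
To experiment with the Mountain Car task of slide 4, its dynamics can be simulated as below. The slide gives no numbers, so the bounds (position in [-1.2, 0.5], velocity in [-0.07, 0.07]), the constants 0.001 and 0.0025, and the start distribution are taken from the task description in the book and should be treated as assumptions to check against the text; `reset` and `step` are hypothetical helper names.

```python
import math
import random

# Task constants assumed from the book's Mountain Car description.
X_MIN, X_MAX = -1.2, 0.5      # position bounds; reaching X_MAX is the goal
V_MIN, V_MAX = -0.07, 0.07    # velocity bounds
ACTIONS = (-1, 0, 1)          # reverse, no-op, forward throttle


def clip(value, low, high):
    """Bound value to the interval [low, high]."""
    return max(low, min(high, value))


def reset(rng=random):
    """Start at a random position between -0.6 and -0.4, at rest."""
    return (rng.uniform(-0.6, -0.4), 0.0)


def step(state, action):
    """Advance one time step; return (next_state, reward, done)."""
    x, v = state
    # Throttle adds 0.001 * action; gravity contributes -0.0025 * cos(3x).
    v = clip(v + 0.001 * action - 0.0025 * math.cos(3 * x), V_MIN, V_MAX)
    x = clip(x + v, X_MIN, X_MAX)
    if x == X_MIN:
        v = 0.0  # hitting the left bound stops the car
    done = x >= X_MAX
    return (x, v), -1.0, done
```

Because the throttle term (0.001) is smaller than the largest gravity term (0.0025), full throttle forward from the valley cannot reach the goal directly, which is why the agent has to rock back and forth.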
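
Slide 5 uses tile coding with 8 tilings, and slide 6 plots the cost-to-go $-\max_a \hat q(s, a, \mathbf{w})$. The following is a simple hand-rolled grid tile coder, an assumption for illustration rather than a library API or the exact coder behind the slides: each of the 8 tilings is an 8×8 grid over (position, velocity) with one extra row and column to absorb its offset, and the tilings are shifted by asymmetric offsets proportional to (1, 3). The names `active_tiles`, `q_hat`, and `cost_to_go` are hypothetical.

```python
import math

# Hypothetical grid tile coder: 8 tilings, each an 8x8 grid, separately per action.
X_MIN, X_MAX = -1.2, 0.5
V_MIN, V_MAX = -0.07, 0.07
N_TILINGS = 8
TILES_PER_DIM = 8
N_ACTIONS = 3
# One extra row and column per tiling so that shifted grids still cover the space.
TILES_PER_TILING = (TILES_PER_DIM + 1) ** 2
N_FEATURES = N_ACTIONS * N_TILINGS * TILES_PER_TILING


def active_tiles(x, v, action_index):
    """Return the indices of the N_TILINGS features equal to 1 for (x, v, action)."""
    # Rescale so that one tile is one unit wide in each dimension.
    sx = (x - X_MIN) / (X_MAX - X_MIN) * TILES_PER_DIM
    sv = (v - V_MIN) / (V_MAX - V_MIN) * TILES_PER_DIM
    indices = []
    for t in range(N_TILINGS):
        # Asymmetric offsets (1, 3) * t / N_TILINGS, wrapped into [0, 1) tile widths.
        col = math.floor(sx + (1 * t / N_TILINGS) % 1.0)
        row = math.floor(sv + (3 * t / N_TILINGS) % 1.0)
        tile = row * (TILES_PER_DIM + 1) + col
        indices.append((action_index * N_TILINGS + t) * TILES_PER_TILING + tile)
    return indices


def q_hat(weights, x, v, action_index):
    """Linear action value w . x(s, a) with binary features: sum of active weights."""
    return sum(weights[i] for i in active_tiles(x, v, action_index))


def cost_to_go(weights, x, v):
    """Cost-to-go: -max_a q_hat(s, a, w)."""
    return -max(q_hat(weights, x, v, a) for a in range(N_ACTIONS))
```

Exactly one tile per tiling is active, so every state-action pair has 8 active features, and neighbouring states share some of them, which is what lets learning generalize.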
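
Slides 8 and 9 replace the 1-step target with the n-step return $G_{t:t+n} = R_{t+1} + \gamma R_{t+2} + \cdots + \gamma^{n-1} R_{t+n} + \gamma^n \hat q(S_{t+n}, A_{t+n}, \mathbf{w}_{t+n-1})$, which equals the ordinary return $G_t$ when $t + n \ge T$. A minimal Python sketch of one episode follows, assuming a linear $\hat q$ over binary features; the function names and the `reset` / `step` / `features` callables are hypothetical. The estimate for time τ is updated only once the n rewards after it have been observed.

```python
import math
import random


def epsilon_greedy(weights, features, state, n_actions, epsilon, rng):
    """With probability epsilon act randomly, otherwise greedily w.r.t. q_hat."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    values = [sum(weights[i] for i in features(state, a)) for a in range(n_actions)]
    best = max(values)
    return rng.choice([a for a, v in enumerate(values) if v == best])


def n_step_sarsa_episode(weights, features, reset, step, n_actions, n,
                         alpha=0.1, gamma=1.0, epsilon=0.1, rng=random):
    """Run one episode of episodic n-step semi-gradient Sarsa; return its length T."""
    def q_hat(s, a):
        return sum(weights[i] for i in features(s, a))

    states = [reset()]
    actions = [epsilon_greedy(weights, features, states[0], n_actions, epsilon, rng)]
    rewards = [0.0]          # dummy entry so that rewards[i] holds R_i
    T = math.inf             # episode length, unknown until termination
    t = 0
    while True:
        if t < T:
            next_state, reward, done = step(states[t], actions[t])
            states.append(next_state)
            rewards.append(reward)
            if done:
                T = t + 1
            else:
                actions.append(epsilon_greedy(weights, features, next_state,
                                              n_actions, epsilon, rng))
        tau = t - n + 1      # the time step whose estimate is updated now
        if tau >= 0:
            # Discounted rewards R_{tau+1} .. R_{min(tau+n, T)}
            G = sum(gamma ** (i - tau - 1) * rewards[i]
                    for i in range(tau + 1, min(tau + n, T) + 1))
            if tau + n < T:  # bootstrap only if the episode has not ended by then
                G += gamma ** n * q_hat(states[tau + n], actions[tau + n])
            delta = G - q_hat(states[tau], actions[tau])
            for i in features(states[tau], actions[tau]):
                weights[i] += alpha * delta
        if tau == T - 1:
            return T
        t += 1
```

Setting `n=1` recovers the 1-step update; larger n propagates reward information back further per update, which is the mechanism behind the faster learning reported on slide 10.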
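
Slides 12 to 14 state the average-reward quantities in words. Written out in the book's notation, the average reward of a policy and the differential Bellman equations are as follows; no $\gamma$ appears, and each reward $r$ is replaced by $r - r(\pi)$.

$$
r(\pi) = \lim_{h \to \infty} \frac{1}{h} \sum_{t=1}^{h} \mathbb{E}\left[R_t \mid S_0, A_{0:t-1} \sim \pi\right]
$$

$$
v_\pi(s) = \sum_a \pi(a \mid s) \sum_{r, s'} p(s', r \mid s, a) \left[r - r(\pi) + v_\pi(s')\right]
$$

$$
q_\pi(s, a) = \sum_{r, s'} p(s', r \mid s, a) \left[r - r(\pi) + \sum_{a'} \pi(a' \mid s') \, q_\pi(s', a')\right]
$$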
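
Slide 15's differential semi-gradient Sarsa also needs a running estimate $\bar R$ of the average reward. The Python sketch below follows the structure of the book's boxed algorithm, in which $\bar R$ is updated with the same differential TD error using a second step size β. The continuing-task interface (`step(state, action)` returning `(next_state, reward)` with no terminal flag), the linear binary-feature $\hat q$, and the names `differential_sarsa` and `epsilon_greedy` are assumptions for illustration.

```python
import random


def epsilon_greedy(weights, features, state, n_actions, epsilon, rng):
    """With probability epsilon act randomly, otherwise greedily w.r.t. q_hat."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    values = [sum(weights[i] for i in features(state, a)) for a in range(n_actions)]
    best = max(values)
    return rng.choice([a for a, v in enumerate(values) if v == best])


def differential_sarsa(weights, features, start_state, step, n_actions, n_steps,
                       alpha=0.1, beta=0.01, epsilon=0.1, rng=random):
    """Differential semi-gradient Sarsa on a continuing task; return the final R-bar."""
    def q_hat(s, a):
        return sum(weights[i] for i in features(s, a))

    avg_reward = 0.0  # R-bar, the running estimate of r(pi)
    state = start_state
    action = epsilon_greedy(weights, features, state, n_actions, epsilon, rng)
    for _ in range(n_steps):
        next_state, reward = step(state, action)
        next_action = epsilon_greedy(weights, features, next_state,
                                     n_actions, epsilon, rng)
        # Differential TD error: no discount, reward measured relative to R-bar.
        delta = reward - avg_reward + q_hat(next_state, next_action) - q_hat(state, action)
        avg_reward += beta * delta
        for i in features(state, action):
            weights[i] += alpha * delta  # same semi-gradient step as before
        state, action = next_state, next_action
    return avg_reward
```

As a sanity check, on a hypothetical single-state task where action 0 pays 1 and action 1 pays 2, an ε-greedy policy with ε = 0.1 that prefers action 1 earns on average 0.95 · 2 + 0.05 · 1 = 1.95 per step, which is where $\bar R$ should settle.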