Sep. 8, 2017
Reinforcement Learning Concepts, presented at RISECamp 2017, UC Berkeley

Alexey Tumanov, Postdoctoral Researcher at UC Berkeley


- Reinforcement Learning Concepts, Roy Fox, RISECamp, 7 Sep 2017
- Sequential Decision Making: nonsequential vs. sequential
- Example: Maze Navigation • State: agent location (s_t) • Action: where to move (a_t) • Reward: prize for reaching the target, cost for hitting a wall (r_t)
- Markov Decision Process (MDP) • Trajectory: sequence of states, actions, and rewards: s_0, a_0, r_0, s_1, a_1, r_1, s_2, … • State dynamics: p(s_{t+1} | s_t, a_t) • Policy: deterministic a_t = π(s_t); stochastic π(a_t | s_t) • Reward: r_t = r(s_t, a_t)
- "Rolling Out" • Environment: Reset() → get initial state s_0; Step(a_t) → get reward r(s_t, a_t), draw next state from p(s_{t+1} | s_t, a_t) • Agent policy: Action(s_t) → draw next action from π(a_t | s_t)
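The Reset/Step/Action interface above can be sketched as a minimal rollout loop. The corridor environment and random policy below are illustrative stand-ins (not from the slides), using the maze-navigation reward idea: a prize for reaching the target and a cost for hitting a wall.

```python
import random

class CorridorEnv:
    """Toy 1-D corridor: the agent starts at cell 0 and the goal is the last cell."""
    def __init__(self, length=5):
        self.length = length

    def reset(self):
        self.s = 0                       # initial state s_0
        return self.s

    def step(self, a):
        """a is -1 (left) or +1 (right); returns (next state, reward, done)."""
        nxt = self.s + a
        if nxt < 0:
            return self.s, -1.0, False   # hit the wall: stay put, pay a cost
        self.s = nxt
        done = self.s == self.length - 1
        return self.s, (1.0 if done else 0.0), done

def random_policy(s):
    """Stochastic policy pi(a|s): pick a direction uniformly at random."""
    return random.choice([-1, +1])

def rollout(env, policy, horizon=50):
    """Collect a trajectory s_0, a_0, r_0, s_1, ... by alternating Action and Step."""
    trajectory = []
    s = env.reset()
    for _ in range(horizon):
        a = policy(s)
        s_next, r, done = env.step(a)
        trajectory.append((s, a, r))
        s = s_next
        if done:
            break
    return trajectory

traj = rollout(CorridorEnv(), random_policy)
```

Each `(s, a, r)` triple in `traj` is one step of the trajectory defined on the MDP slide; swapping in a learned policy only changes the `policy` callable.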
- Example: Game of Go • State: board • Action: place stone • Reward: captures • Environment: can be simulated
- Example: Autonomous Vehicle • State: cameras, GPS • Action: steer, accelerate • Reward: speed, constraint satisfaction • Environment: physical
- Example: Surgical Robot • State: endoscope image, joint angles • Action: change in joint angles • Reward: task success • Environment: physical
- Policy Evaluation • Return: R = r_0 + γ r_1 + γ² r_2 + ⋯ • Discount 0 ≤ γ ≤ 1: prefer early rewards, late costs • Policy value is its expected return: V_π = E[R]
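The discounted return defined above can be computed by folding the reward sequence from the back; this small helper is a sketch, not from the slides.

```python
def discounted_return(rewards, gamma):
    """R = r_0 + gamma*r_1 + gamma^2*r_2 + ..., with 0 <= gamma <= 1."""
    R = 0.0
    for r in reversed(rewards):   # backward fold: R_t = r_t + gamma * R_{t+1}
        R = r + gamma * R
    return R

print(discounted_return([1.0, 1.0, 1.0], 0.5))  # 1 + 0.5 + 0.25 = 1.75
```

With γ close to 1 the agent is far-sighted; with γ = 0 only the immediate reward r_0 counts, which is the "prefer early rewards" effect from the slide.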
- Value Function • Policy value function is its expected return given the current state and action: Q_π(s, a) = E[R | s, a] • Optimal policy satisfies π(s) = argmax_a Q_π(s, a)
- Value Iteration • Bellman equation: Q(s_t, a_t) = r(s_t, a_t) + γ E[max_{a_{t+1}} Q(s_{t+1}, a_{t+1})] • Iterate to convergence • Final policy is π(s) = argmax_a Q(s, a)
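The Bellman update above can be run to convergence on a tabular Q. The 4-state deterministic chain below is an illustrative toy MDP of my own, not one from the slides: moving right from the next-to-last state enters the terminal goal state for reward 1.

```python
# Tabular value iteration on a toy deterministic chain MDP.
GAMMA = 0.9
N_STATES, ACTIONS = 4, (-1, +1)   # states 0..3; state 3 is terminal

def step(s, a):
    """Deterministic dynamics: clamp to the chain; reward 1 for entering the goal."""
    s_next = min(max(s + a, 0), N_STATES - 1)
    return s_next, (1.0 if s_next == N_STATES - 1 else 0.0)

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
for _ in range(100):                       # iterate the Bellman update to convergence
    for s in range(N_STATES - 1):          # no update needed at the terminal state
        for a in ACTIONS:
            s_next, r = step(s, a)
            future = 0.0 if s_next == N_STATES - 1 else max(Q[(s_next, b)] for b in ACTIONS)
            Q[(s, a)] = r + GAMMA * future  # Q(s,a) = r(s,a) + gamma * max_a' Q(s',a')

# Final policy: act greedily with respect to the converged Q.
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)}
```

Since the dynamics are deterministic, the expectation in the Bellman equation drops out; the greedy policy recovered at the end moves right everywhere, and the optimal values decay by γ per step away from the goal (1.0, 0.9, 0.81).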
- Representing Value • How to represent Q(s, a) for a large state space? • Approximate Q with deep representations • A Deep Q Network generalizes to unseen states
- Policy-Based RL • Represent the policy π(a|s) with a deep network • Iteratively evaluate and improve the policy
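The evaluate/improve loop from this slide can be illustrated in tabular form as policy iteration; a lookup table stands in for the deep network here, and the 4-state chain MDP is my own toy example rather than one from the slides.

```python
# Tabular policy iteration: alternate policy evaluation and greedy improvement.
GAMMA = 0.9
N_STATES, ACTIONS = 4, (-1, +1)   # states 0..3; state 3 is terminal

def step(s, a):
    s_next = min(max(s + a, 0), N_STATES - 1)
    return s_next, (1.0 if s_next == N_STATES - 1 else 0.0)

pi = {s: -1 for s in range(N_STATES - 1)}   # start from a bad policy: always go left
for _ in range(10):                          # outer evaluate/improve loop
    # Evaluate: compute V_pi by iterating the Bellman expectation equation.
    V = {s: 0.0 for s in range(N_STATES)}
    for _ in range(100):
        for s in range(N_STATES - 1):
            s_next, r = step(s, pi[s])
            V[s] = r + GAMMA * (0.0 if s_next == N_STATES - 1 else V[s_next])
    # Improve: act greedily with respect to the evaluated value function.
    for s in range(N_STATES - 1):
        def q(a, s=s):
            s_next, r = step(s, a)
            return r + GAMMA * (0.0 if s_next == N_STATES - 1 else V[s_next])
        pi[s] = max(ACTIONS, key=q)
```

After a few evaluate/improve rounds the policy flips to "always go right"; in policy-based deep RL the table lookup becomes a network forward pass and the greedy improvement becomes a gradient step, but the loop structure is the same.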
- Questions? (RISECamp, 7 Sep 2017)
