Reinforcement learning


A quick introduction to reinforcement learning (RL), using the Exploding Kittens card game as an example.

  1. Reinforcement Learning: The Exploding Kittens Edition. Tarek Amr
  2. Why Reinforcement Learning? I learned, after playing many times, that I’m more likely to win if I play this move after that one. No one kept telling me to make this or that move!
  3. States, Actions and Rewards. (Diagram: a chain of states S_t, S_{t+1}, S_{t+2} connected by actions A_t, A_{t+1}, leading to a goal state with reward R.)
  4. What’s a good reward? If getting an Exploding Kitten card gives me a reward of -1, what reward do I get if I get a Defuse card? And for a Nope card?
  5. From rewards, states get values. And from values come policies!
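  6. A state has a value (V). (Diagram: a chain of states S_t, S_{t+1}, S_{t+2} connected by actions A_t, A_{t+1}, leading to a goal state with reward R; values V_t and V_{t+1} are attached to the states S_t and S_{t+1}.)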
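  7. Or a state/action pair has a value (Q). (Diagram: a chain of states S_t, S_{t+1}, S_{t+2} connected by actions A_t, A_{t+1}, leading to a goal state with reward R; values Q_t and Q_{t+1} are attached to the pairs (S_t, A_t) and (S_{t+1}, A_{t+1}).)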
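  8. Temporal Difference: S-A-R-S-A. (Diagram: a chain of states S_t, S_{t+1}, S_{t+2} connected by actions A_t, A_{t+1}, leading to a goal state with reward R.) Update rule: $Q_t := Q_t + \alpha (R_{t+1} + \gamma Q_{t+1} - Q_t)$, where $Q_t$ is the value of the pair (S_t, A_t), $\alpha$ is the learning rate and $\gamma$ is the discount factor.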
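  9. Epsilon Greedy: exploration vs. exploitation. With probability ε, pick a random action (explore); otherwise pick the action with the highest Q (exploit). (Diagram: a chain of states S_t, S_{t+1}, S_{t+2} connected by actions A_t, A_{t+1}, leading to a goal state with reward R.) $Q_t := Q_t + \alpha (R_{t+1} + \gamma Q_{t+1} - Q_t)$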
  10. Deep Q Learning. Warning: oversimplification ahead. This is a Q-table:

      | State Feature 1 | State Feature 2 | Action | Value |
      | --- | --- | --- | --- |
      | 10 | 20 | JUMP | 0.5 |
      | 20 | 15 | DUCK | 0.6 |
      | 15 | 25 | JUMP | 0.8 |

      What if there are too many states and actions?
  11. MDP, MC and TD
      Markov Decision Process:
      ● You need to know the states and the transitions between them.
      Monte Carlo (variance ↑):
      ● You wait till the episode’s end, then re-assign values to states.
      ● No need to even know the states; we sample from the environment.
      Temporal Difference (bias ↑):
      ● Update on the go. No need to even have goal states.
  12. Let’s play the RL vs SL game
      for (i=0; i<3; i++) {
        ● Pick a Catawiki problem
        ● Should it be solved via
          ○ Reinforcement learning?
          ○ Supervised learning?
      }
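
The update rule from slide 8 and the epsilon-greedy choice from slide 9 fit together into one small learning loop. Below is a minimal sketch of that loop, not the deck's own code. It assumes a made-up toy environment (a five-cell corridor with a reward of 1 at the right end) instead of Exploding Kittens. The names `step`, `epsilon_greedy`, `sarsa`, `ACTIONS` and `GOAL`, and the values chosen for `alpha`, `gamma` and `epsilon`, are all illustrative assumptions.

```python
import random
from collections import defaultdict

ACTIONS = ("LEFT", "RIGHT")  # hypothetical action set
GOAL = 4                     # hypothetical goal state: rightmost cell of a 5-cell corridor


def step(state, action):
    """Toy environment (an assumption): walk along the corridor, reward 1 at the goal."""
    next_state = min(state + 1, GOAL) if action == "RIGHT" else max(state - 1, 0)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def epsilon_greedy(q, state, epsilon, rng):
    """Explore with probability epsilon, otherwise exploit the best known action."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[(state, a)] for a in ACTIONS)
    # Break ties randomly so the untrained agent does not get stuck on one action.
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])


def sarsa(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # Q-table: (state, action) -> value, 0.0 by default
    for _ in range(episodes):
        state = 0
        action = epsilon_greedy(q, state, epsilon, rng)
        done = False
        while not done:
            next_state, reward, done = step(state, action)
            next_action = epsilon_greedy(q, next_state, epsilon, rng)
            # Q_t := Q_t + alpha * (R_{t+1} + gamma * Q_{t+1} - Q_t)
            # The goal state is terminal, so its Q_{t+1} counts as 0.
            target = reward + (0.0 if done else gamma * q[(next_state, next_action)])
            q[(state, action)] += alpha * (target - q[(state, action)])
            state, action = next_state, next_action
    return q


q_table = sarsa()
# Read a greedy policy off the learned Q-table (from values come policies).
greedy_policy = {s: max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in range(GOAL)}
print(greedy_policy)
```

The update uses the action actually picked next (A_{t+1}), which is where the name S-A-R-S-A comes from: state, action, reward, state, action. Replacing the `defaultdict` with a function approximator is the step that the Q-table on slide 10 hints at.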