Reinforcement learning

Quick introduction to RL, using the Exploding Kittens card game as an example.

Published in: Data & Analytics

  1. Reinforcement Learning: The Exploding Kittens Edition. Tarek Amr
  2. Why Reinforcement Learning? I learned, after playing many times, that I’m more likely to win if I play this move after that one. No one kept telling me to make this or that move!
  3. States, Actions and Rewards. [Diagram: taking action At in state St leads to state St+1, then At+1 leads to St+2, and so on to a Goal State that yields reward R.] (step-loop sketch below)
  4. What’s a good reward? If getting an Exploding Kitten card gives me a reward of -1, what reward do I get if I get a Defuse card? And for a Nope card? (reward-mapping sketch below)
  5. From Rewards, States get Values. And from values come policies!
  6. A state has a value (V). [Diagram: the same state/action chain, now with a value Vt attached to each state St.]
  7. Or a state/action pair has a value (Q). [Diagram: the same chain, now with a value Qt attached to each (St, At) pair.] (V/Q sketch below)
  8. Temporal Difference: SARSA (State-Action-Reward-State-Action). [Diagram: the same chain.] Update rule: Qt := Qt + α (Rt+1 + γ Qt+1 - Qt). (SARSA sketch below)
  9. Epsilon Greedy: exploration vs. exploitation. [Diagram: the same chain.] Qt := Qt + α (Rt+1 + γ Qt+1 - Qt). (epsilon-greedy sketch below)
  10. Deep Q Learning (warning: oversimplification ahead). The table below is a Q-table; what if there are too many states and actions? (function-approximation sketch below)
      State Feature1 | State Feature2 | Action | Value
      10             | 20             | JUMP   | 0.5
      20             | 15             | DUCK   | 0.6
      15             | 25             | JUMP   | 0.8
  11. MDP, MC and TD (MC-vs-TD sketch below)
      Markov Decision Process:
      ● You need to know the states and the transitions between them.
      Monte Carlo (variance ↑):
      ● You wait till the episode’s end, then re-assign values to states.
      ● No need to even know the states; we sample from the environment.
      Temporal Difference (bias ↑):
      ● Update on the go. No need to even have goal states.
  12. Let’s play the RL vs SL game:
      for (i = 0; i < 3; i++) {
          ● Pick a Catawiki problem
          ● Should it be solved via
              ○ Reinforcement learning?
              ○ Supervised learning?
      }
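
Step-loop sketch (slide 3): a minimal sketch of the interaction loop, where taking action At in state St yields reward Rt+1 and next state St+1. The three-state chain environment below is invented purely for illustration; it is not the Exploding Kittens game.

    import random

    # Toy 3-state chain, purely illustrative: state 0 -> 1 -> 2, where state 2 is the goal.
    def step(state, action):
        """Return (next_state, reward, done) for taking `action` in `state`."""
        next_state = state + 1 if action == "forward" else state
        done = next_state == 2                     # reached the goal state
        reward = 1.0 if done else 0.0              # reward R arrives at the goal
        return next_state, reward, done

    def random_policy(state):
        return random.choice(["forward", "stay"])  # pick the action At at random

    state, total_reward, done = 0, 0.0, False      # start in S0
    while not done:
        action = random_policy(state)              # At
        state, reward, done = step(state, action)  # St+1 and Rt+1
        total_reward += reward

    print("episode return:", total_reward)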
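
Reward-mapping sketch (slide 4): the slide deliberately leaves its question open. The mapping below is one made-up answer, only to show that a reward signal is nothing more than a number attached to each outcome; the Defuse and Nope values are assumptions, not the deck’s.

    # One made-up assignment, only to show that "reward" is just a number per outcome.
    CARD_REWARDS = {
        "Exploding Kitten": -1.0,  # stated on the slide: drawing it is bad
        "Defuse": 0.5,             # assumption: clearly good, but winning the game matters more
        "Nope": 0.2,               # assumption: mildly useful
    }

    def reward_for(card):
        return CARD_REWARDS.get(card, 0.0)  # any other card: neutral

    print(reward_for("Defuse"), reward_for("Taco Cat"))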
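
V/Q sketch (slides 6-7): state values V and state/action values Q as plain lookup tables, plus slide 5’s point that a policy follows from the values by acting greedily. States and numbers are invented for illustration.

    # V: one value per state; Q: one value per (state, action) pair. Numbers are made up.
    V = {
        "deck_thick": 0.3,
        "deck_thin": -0.2,   # the Exploding Kitten is more likely to be near the top
    }

    Q = {
        ("deck_thin", "draw"): -0.4,
        ("deck_thin", "play_skip"): 0.1,
    }

    # Slide 5's point: a policy falls out of the values -- act greedily with respect to Q.
    def greedy_action(state, actions):
        return max(actions, key=lambda a: Q.get((state, a), 0.0))

    print(greedy_action("deck_thin", ["draw", "play_skip"]))   # -> "play_skip"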
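
SARSA sketch (slide 8): the temporal-difference update Qt := Qt + α (Rt+1 + γ Qt+1 - Qt) applied to a dictionary Q-table. A minimal sketch; state and action names are made up.

    from collections import defaultdict

    Q = defaultdict(float)        # Q[(state, action)], missing entries default to 0.0
    alpha, gamma = 0.1, 0.9       # learning rate α and discount γ

    def sarsa_update(s, a, r, s_next, a_next):
        """Qt := Qt + alpha * (Rt+1 + gamma * Qt+1 - Qt), with Qt+1 taken at the actually chosen next action."""
        td_target = r + gamma * Q[(s_next, a_next)]
        Q[(s, a)] += alpha * (td_target - Q[(s, a)])

    # One made-up transition (St, At, Rt+1, St+1, At+1):
    sarsa_update("deck_thin", "draw", -1.0, "exploded", "none")
    print(Q[("deck_thin", "draw")])   # -0.1 after a single update from 0.0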
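
Epsilon-greedy sketch (slide 9): the standard way to balance exploration against exploitation around that same update rule. A minimal sketch, not the deck’s code.

    import random

    def epsilon_greedy(Q, state, actions, epsilon=0.1):
        """With probability epsilon explore (random action); otherwise exploit the best-known one."""
        if random.random() < epsilon:
            return random.choice(actions)                          # exploration
        return max(actions, key=lambda a: Q.get((state, a), 0.0))  # exploitation

    Q = {("deck_thin", "draw"): -0.4, ("deck_thin", "play_skip"): 0.1}
    print(epsilon_greedy(Q, "deck_thin", ["draw", "play_skip"]))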
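
Function-approximation sketch (slide 10): a Q-table stores one value per row, which breaks down when there are too many states and actions; Deep Q-Learning replaces the table with a neural network that predicts the value from the state features. To keep the sketch dependency-free, a linear model with made-up weights stands in for the network.

    # Tabular view, as in the slide's table: one stored value per (features, action) row.
    q_table = {
        ((10, 20), "JUMP"): 0.5,
        ((20, 15), "DUCK"): 0.6,
        ((15, 25), "JUMP"): 0.8,
    }

    # Function-approximation view: predict the value from the features instead of storing
    # every row. A linear model (made-up weights) stands in for the neural network here.
    weights = {"JUMP": (0.01, 0.02, 0.1), "DUCK": (0.02, 0.01, 0.1)}

    def q_value(features, action):
        w1, w2, bias = weights[action]
        return w1 * features[0] + w2 * features[1] + bias

    print(q_value((15, 25), "JUMP"))    # also works for states the table never listed
    print(q_value((12, 18), "DUCK"))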
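
MC-vs-TD sketch (slide 11): the Monte Carlo update waits for the episode to end and uses the full observed return (higher variance), while the temporal-difference update adjusts immediately from the next state’s estimate (higher bias). A minimal sketch over a dictionary of state values.

    V = {}                   # state-value estimates, missing entries treated as 0.0
    alpha, gamma = 0.1, 0.9

    def mc_update(episode):
        """Monte Carlo: wait until the episode ends, then push each visited state toward its full return."""
        G = 0.0
        for state, reward in reversed(episode):   # reward = reward received after leaving `state`
            G = reward + gamma * G                # return observed from this state onward
            V[state] = V.get(state, 0.0) + alpha * (G - V.get(state, 0.0))

    def td_update(state, reward, next_state):
        """Temporal difference: update on the go, bootstrapping from the next state's current estimate."""
        target = reward + gamma * V.get(next_state, 0.0)
        V[state] = V.get(state, 0.0) + alpha * (target - V.get(state, 0.0))

    mc_update([("s0", 0.0), ("s1", 0.0), ("s2", 1.0)])  # whole episode needed before updating
    td_update("s0", 0.0, "s1")                          # a single step is enough
    print(V)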
