Reinforcement Learning: The Exploding Kittens Edition
Tarek Amr
Why Reinforcement Learning?
I learned, after playing many times,
that I'm more likely to win if I play this move after that one.
No one kept telling me to make this or that move!
States, Actions and Rewards
[Diagram: S_t --A_t--> S_{t+1} --A_{t+1}--> S_{t+2} --> ... --> Goal State (reward R)]
What's a good reward?
If getting an Exploding Kitten card gives me a reward of -1,
what reward do I get if I get a Defuse card?
And for a Nope card?
From Rewards, States get Values
And from values come policies!
A State has a value (V)
[Diagram: S_t --A_t--> S_{t+1} --A_{t+1}--> S_{t+2} --> ... --> Goal State (reward R); values V_t and V_{t+1} attached to S_t and S_{t+1}]
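Or a State/Action pair has a value (Q)
[Diagram: S_t --A_t--> S_{t+1} --A_{t+1}--> S_{t+2} --> ... --> Goal State (reward R); values Q_t and Q_{t+1} attached to the pairs (S_t, A_t) and (S_{t+1}, A_{t+1})]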
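The slides do not spell out what V and Q measure, so the following is the usual textbook reading rather than something taken from the deck: both are expected discounted sums of future rewards under whatever policy the player follows. The discount factor γ is the same symbol that appears in the update rule on the Temporal Difference slide; the expectation notation is an assumption added here.

```latex
\begin{align}
V(S_t)      &= \mathbb{E}\left[ R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots \mid S_t \right] \\
Q(S_t, A_t) &= \mathbb{E}\left[ R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots \mid S_t, A_t \right]
\end{align}
```

V scores a state on its own, while Q scores a state together with the move chosen in it, which is why Q can be turned into a policy directly by picking the highest-valued move.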
Temporal Difference: S-A-R-S-A
[Diagram: S_t --A_t--> S_{t+1} --A_{t+1}--> S_{t+2} --> ... --> Goal State (reward R)]
Q_t := Q_t + α (R_{t+1} + γ Q_{t+1} - Q_t)
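The update rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's code: the function name `sarsa_update`, the state labels, and the values of α and γ are all made up for the example, and the "terminal state has value 0" convention is an assumption.

```python
from collections import defaultdict


def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=0.9, terminal=False):
    """One step of Q_t := Q_t + alpha * (R_{t+1} + gamma * Q_{t+1} - Q_t)."""
    # Assumption: nothing can be earned after the episode ends, so Q_{t+1} = 0 there.
    q_next = 0.0 if terminal else q[(next_state, next_action)]
    td_error = reward + gamma * q_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return td_error


# Hypothetical states: drawing from a risky deck ends the game with reward -1.
q = defaultdict(float)
first_error = sarsa_update(q, "risky_deck", "draw", -1.0, None, None, terminal=True)

# One step earlier, drawing from a safe deck leads to the risky deck with reward 0.
second_error = sarsa_update(q, "safe_deck", "draw", 0.0, "risky_deck", "draw")

print(q[("risky_deck", "draw")], q[("safe_deck", "draw")])
```

The second call shows the "temporal difference" idea: the bad outcome learned for the risky deck leaks backwards into the move that leads there, without waiting for another full game.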
Epsilon Greedy
Exploration vs Exploitation
[Diagram: S_t --A_t--> S_{t+1} --A_{t+1}--> S_{t+2} --> ... --> Goal State (reward R)]
Q_t := Q_t + α (R_{t+1} + γ Q_{t+1} - Q_t)
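As a rough sketch of what epsilon-greedy means in practice: with probability ε the player tries a random move (exploration), otherwise it plays the move with the highest Q value seen so far (exploitation). The function name, the card-game state label, and the Q values below are hypothetical and only serve the illustration.

```python
import random


def epsilon_greedy(q, state, actions, epsilon=0.1, rng=random):
    """Pick a random action with probability epsilon, else the best-known one."""
    if rng.random() < epsilon:
        return rng.choice(actions)  # explore
    # Exploit: unseen (state, action) pairs are assumed to be worth 0.
    return max(actions, key=lambda a: q.get((state, a), 0.0))


# Hypothetical Q values for one state of the card game.
q = {("my_turn", "draw"): -0.2, ("my_turn", "play_nope"): 0.3}
actions = ["draw", "play_nope"]

rng = random.Random(0)
greedy_choice = epsilon_greedy(q, "my_turn", actions, epsilon=0.0, rng=rng)
random_choice = epsilon_greedy(q, "my_turn", actions, epsilon=1.0, rng=rng)
print(greedy_choice, random_choice)
```

Setting ε to 0 never explores and can lock in an early mistake; setting it to 1 never uses what was learned. Small values in between are the usual compromise.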
Deep Q Learning
Warning: Oversimplification Ahead

State Feature 1   State Feature 2   Action   Value
10                20                JUMP     0.5
20                15                DUCK     0.6
15                25                JUMP     0.8

This is a Q-Table.
What if there are too many States & Actions?
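To make the slide's question concrete, here is a small sketch that stores the three rows above as a lookup table and then fits a function to them instead. A plain linear model stands in for the deep network purely to keep the example short; the feature scaling constants, learning rate, and function names are assumptions, not something the slides specify.

```python
# The Q-table from the slide: (state features, action) -> value.
q_table = {
    ((10, 20), "JUMP"): 0.5,
    ((20, 15), "DUCK"): 0.6,
    ((15, 25), "JUMP"): 0.8,
}


def table_q(features, action):
    """Table lookup: returns None for any state that was never stored."""
    return q_table.get((features, action))


# Function approximation: one small weight vector per action.
weights = {"JUMP": [0.0, 0.0, 0.0], "DUCK": [0.0, 0.0, 0.0]}


def encode(features):
    # Bias term plus roughly centred and scaled features (arbitrary constants).
    return (1.0, (features[0] - 15) / 10.0, (features[1] - 20) / 10.0)


def approx_q(features, action):
    return sum(w * x for w, x in zip(weights[action], encode(features)))


def fit_step(features, action, target, lr=0.5):
    """Nudge the weights for this action toward the target value."""
    error = target - approx_q(features, action)
    x = encode(features)
    weights[action] = [w + lr * error * xi for w, xi in zip(weights[action], x)]


def total_squared_error():
    return sum((v - approx_q(f, a)) ** 2 for (f, a), v in q_table.items())


error_before = total_squared_error()
for _ in range(200):
    for (features, action), value in q_table.items():
        fit_step(features, action, value)
error_after = total_squared_error()

unseen = (12, 22)
print(table_q(unseen, "JUMP"), approx_q(unseen, "JUMP"))
```

The table has no answer for the unseen state `(12, 22)`, while the fitted function returns an estimate for it. That ability to generalise is the reason to swap the table for a learned function when states and actions become too numerous to enumerate.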
MDP, MC and TD
Markov Decision Process:
● You need to know the states and the transitions between them.
Monte Carlo (variance ↑):
● You wait until the episode ends, then re-assign values to the states visited.
● No need to know the states in advance; we sample from the environment.
Temporal Difference (bias ↑):
● Update on the go. No need to even have goal states.
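The difference between the last two approaches can be sketched on a single three-step episode. This is an illustrative toy, assuming every-visit Monte Carlo and one-step TD(0); the state names, α, and γ are made up for the example.

```python
def monte_carlo_update(values, episode, alpha=0.1, gamma=0.9):
    """Wait for the full episode, then update each state toward its actual return.

    episode is a list of (state, reward received after leaving that state).
    """
    g = 0.0
    for state, reward in reversed(episode):
        g = reward + gamma * g  # discounted return from this state onward
        old = values.get(state, 0.0)
        values[state] = old + alpha * (g - old)


def td0_update(values, state, reward, next_state,
               alpha=0.1, gamma=0.9, terminal=False):
    """Update immediately, using the current estimate of the next state."""
    v_next = 0.0 if terminal else values.get(next_state, 0.0)
    old = values.get(state, 0.0)
    values[state] = old + alpha * (reward + gamma * v_next - old)


# Hypothetical episode: two uneventful turns, then an explosion (reward -1).
episode = [("s0", 0.0), ("s1", 0.0), ("s2", -1.0)]

mc_values = {}
monte_carlo_update(mc_values, episode)

td_values = {}
td0_update(td_values, "s0", 0.0, "s1")
td0_update(td_values, "s1", 0.0, "s2")
td0_update(td_values, "s2", -1.0, None, terminal=True)

print(mc_values)
print(td_values)
```

After one episode Monte Carlo has already pushed the bad outcome back to `s0`, because it used the real return. TD has only changed `s2`, because its targets for `s0` and `s1` relied on estimates that were still zero; that reliance on its own estimates is where the bias noted on the slide comes from, while Monte Carlo's dependence on whole sampled episodes is the source of its variance.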
Let’s play the RL vs SL game
for (i=0; i<3; i++) {
● Pick a Catawiki problem
● Should it be solved via
○ Reinforcement learning?
○ Supervised learning?
}


Editor's Notes

  1. We expect, in general, that the environment will be nondeterministic; that is, that taking the same action in the same state on two different occasions may result in different next states and/or different reinforcement values. However, we assume the environment is stationary; that is, that the probabilities of making state transitions or receiving specific reinforcement signals do not change over time.
  2. Reinforcement learning differs from the more widely studied problem of supervised learning in several ways. The most important difference is that there is no presentation of input/output pairs. Instead, after choosing an action the agent is told the immediate reward and the subsequent state, but is not told which action would have been in its best long-term interests. To act optimally, the agent must actively gather useful experience about the possible system states, actions, transitions and rewards. Another difference from supervised learning is that on-line performance is important: the evaluation of the system is often concurrent with learning. Use cases for RL: if there is path dependence (i.e. the order of your moves matters, as in chess), if you have a budget (e.g. a maximum number of emails to send, or money), or if your decisions select your future training examples (e.g. greedily not bidding on new websites in programmatic advertising will never allow you to acquire data about them). (via Peter Tegelaar)