SlideShare a Scribd company logo
1 of 15
Download to read offline
Chapter 1: Introduction
Seungjae Ryan Lee
● Learning by interacting with the environment
● Goal: maximize a numerical reward signal by choosing correct actions
○ Trial and error: learner is not told the best action
○ Delayed rewards: actions can affect all future rewards
Reinforcement Learning
● No external supervisor / teacher
○ No training set with labeled examples (answers)
○ Need to interact with environment in uncharted territories
● Different goals
○ Supervised Learning: Generalize existing data to minimize test set error
○ Reinforcement Learning: Maximize reward through interactions
○ Unsupervised Learning: Find hidden structure
→ Reinforcement Learning is a new paradigm of Machine Learning
vs. Supervised and Unsupervised Learning
● Interactions between agent and environment
● Uncertainty about the environment
○ Effects of actions cannot be fully predicted
○ Monitor environment and react appropriately
● Defined goal
○ Judge progress through rewards
● Present affects the future
○ Effect can be delayed
● Experience improves performance
Characteristics of Reinforcement Learning
● Complex sequence of interactions to achieve goal
● Need to observe and react to the uncertainty of the environment
○ Grab different bowl if current bowl is dirty
○ Stop pouring if the bowl is about to overflow
● Actions have delayed consequences
○ Failing to get spoon does not matter until you start eating
● Experience improves performance
Example: Preparing Breakfast
Get
cereal
Get
bowl
Get
spoon
Get
milk
Pour
cereal
Pour
milk
Grab
spoon
● Exploration: Try different actions
● Exploitation: Choose best known action
● Need both to obtain high reward
Exploration vs Exploitation
Try a new cereal? Eat the usual cereal?Agent
● Policy defines the agent’s behavior
● Reward Signal defines the goal of the problem
● Value Function indicates the long-term desirability of state
● Model of the environment mimics behavior of environment
Elements of Reinforcement Learning
● Mapping from observation to action
● Defines the agent’s behavior
● Can be stochastic
Policy
s1
s2
s3
s4
a1
a2
S A
● Reward
○ Immediate reward of action
○ Defines good/bad events for the agent
○ Given by the environment
● Value Function
○ Sum of future rewards from a state
○ Long-term desirability of states
○ Difficult to estimate
○ Primary basis of choosing action
Reward Signal vs. Value Function
● Mimics the behavior of environment
● Allow planning a future course of actions
● Not necessary for all RL methods
○ Model-based methods use the model for planning
○ Model-free methods only use trial-and-error
Model
https://worldmodels.github.io/
● Assume imperfect opponent
● Agent needs to find and exploit imperfections
Example: Tic Tac Toe
Tic Tac Toe with Reinforcement Learning
● Initialize value functions to 0.5 (except terminal states)
● Learn by playing games
○ Move greedily most times, but explore sometimes
● Incrementally update value functions by playing games
● Decrease learning rate over time to converge
● Minimax algorithm
○ Assumes best play for opponent → Cannot exploit opponent
● Classic optimization
○ Require complete specification of opponent → Impractical
○ ex. Dynamic Programming
● Evolutionary methods
○ Finds optimal algorithm
○ Ignores useful structure of RL problems
○ Works best when good policy can be found easily
Tic Tac Toe with other algorithms
● Can be applied to:
○ more complex games (ex. Backgammon)
○ problems without enemies (“games against nature”)
○ problems with partially observable environments
○ non-episodic problems
○ continuous-time problems
Reinforcement Learning beyond Tic-Tac-Toe
https://blog.openai.com/evolution-strategies/
Thank you!
Original content from
● Reinforcement Learning: An Introduction by Sutton and Barto
You can find more content in
● github.com/seungjaeryanlee
● www.endtoend.ai

More Related Content

What's hot

Reinforcement Learning 7. n-step Bootstrapping
Reinforcement Learning 7. n-step BootstrappingReinforcement Learning 7. n-step Bootstrapping
Reinforcement Learning 7. n-step BootstrappingSeung Jae Lee
 
Multi armed bandit
Multi armed banditMulti armed bandit
Multi armed banditJie-Han Chen
 
Reinforcement Learning 10. On-policy Control with Approximation
Reinforcement Learning 10. On-policy Control with ApproximationReinforcement Learning 10. On-policy Control with Approximation
Reinforcement Learning 10. On-policy Control with ApproximationSeung Jae Lee
 
A brief overview of Reinforcement Learning applied to games
A brief overview of Reinforcement Learning applied to gamesA brief overview of Reinforcement Learning applied to games
A brief overview of Reinforcement Learning applied to gamesThomas da Silva Paula
 
Rl chapter 1 introduction
Rl chapter 1 introductionRl chapter 1 introduction
Rl chapter 1 introductionConnorShorten2
 
Reinforcement learning, Q-Learning
Reinforcement learning, Q-LearningReinforcement learning, Q-Learning
Reinforcement learning, Q-LearningKuppusamy P
 
Deep Reinforcement Learning
Deep Reinforcement LearningDeep Reinforcement Learning
Deep Reinforcement LearningUsman Qayyum
 
Intro to Deep Reinforcement Learning
Intro to Deep Reinforcement LearningIntro to Deep Reinforcement Learning
Intro to Deep Reinforcement LearningKhaled Saleh
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learningbutest
 
25 introduction reinforcement_learning
25 introduction reinforcement_learning25 introduction reinforcement_learning
25 introduction reinforcement_learningAndres Mendez-Vazquez
 
Reinforcement Learning : A Beginners Tutorial
Reinforcement Learning : A Beginners TutorialReinforcement Learning : A Beginners Tutorial
Reinforcement Learning : A Beginners TutorialOmar Enayet
 
An introduction to reinforcement learning
An introduction to reinforcement learningAn introduction to reinforcement learning
An introduction to reinforcement learningSubrat Panda, PhD
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement LearningSalem-Kabbani
 
DQN (Deep Q-Network)
DQN (Deep Q-Network)DQN (Deep Q-Network)
DQN (Deep Q-Network)Dong Guo
 
Multi-Agent Reinforcement Learning
Multi-Agent Reinforcement LearningMulti-Agent Reinforcement Learning
Multi-Agent Reinforcement LearningSeolhokim
 
Reinforcement learning
Reinforcement learningReinforcement learning
Reinforcement learningDing Li
 
Multi-armed Bandits
Multi-armed BanditsMulti-armed Bandits
Multi-armed BanditsDongmin Lee
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement LearningCloudxLab
 
Multi-agent systems
Multi-agent systemsMulti-agent systems
Multi-agent systemsR A Akerkar
 

What's hot (20)

Reinforcement Learning 7. n-step Bootstrapping
Reinforcement Learning 7. n-step BootstrappingReinforcement Learning 7. n-step Bootstrapping
Reinforcement Learning 7. n-step Bootstrapping
 
Multi armed bandit
Multi armed banditMulti armed bandit
Multi armed bandit
 
Reinforcement Learning 10. On-policy Control with Approximation
Reinforcement Learning 10. On-policy Control with ApproximationReinforcement Learning 10. On-policy Control with Approximation
Reinforcement Learning 10. On-policy Control with Approximation
 
A brief overview of Reinforcement Learning applied to games
A brief overview of Reinforcement Learning applied to gamesA brief overview of Reinforcement Learning applied to games
A brief overview of Reinforcement Learning applied to games
 
Rl chapter 1 introduction
Rl chapter 1 introductionRl chapter 1 introduction
Rl chapter 1 introduction
 
Reinforcement learning, Q-Learning
Reinforcement learning, Q-LearningReinforcement learning, Q-Learning
Reinforcement learning, Q-Learning
 
Deep Reinforcement Learning
Deep Reinforcement LearningDeep Reinforcement Learning
Deep Reinforcement Learning
 
Intro to Deep Reinforcement Learning
Intro to Deep Reinforcement LearningIntro to Deep Reinforcement Learning
Intro to Deep Reinforcement Learning
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
 
25 introduction reinforcement_learning
25 introduction reinforcement_learning25 introduction reinforcement_learning
25 introduction reinforcement_learning
 
Reinforcement Learning : A Beginners Tutorial
Reinforcement Learning : A Beginners TutorialReinforcement Learning : A Beginners Tutorial
Reinforcement Learning : A Beginners Tutorial
 
An introduction to reinforcement learning
An introduction to reinforcement learningAn introduction to reinforcement learning
An introduction to reinforcement learning
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
 
DQN (Deep Q-Network)
DQN (Deep Q-Network)DQN (Deep Q-Network)
DQN (Deep Q-Network)
 
Deep Reinforcement Learning
Deep Reinforcement LearningDeep Reinforcement Learning
Deep Reinforcement Learning
 
Multi-Agent Reinforcement Learning
Multi-Agent Reinforcement LearningMulti-Agent Reinforcement Learning
Multi-Agent Reinforcement Learning
 
Reinforcement learning
Reinforcement learningReinforcement learning
Reinforcement learning
 
Multi-armed Bandits
Multi-armed BanditsMulti-armed Bandits
Multi-armed Bandits
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
 
Multi-agent systems
Multi-agent systemsMulti-agent systems
Multi-agent systems
 

Similar to Reinforcement Learning 1. Introduction

Structured prediction with reinforcement learning
Structured prediction with reinforcement learningStructured prediction with reinforcement learning
Structured prediction with reinforcement learningguruprasad110
 
reinforcement-learning-141009013546-conversion-gate02.pptx
reinforcement-learning-141009013546-conversion-gate02.pptxreinforcement-learning-141009013546-conversion-gate02.pptx
reinforcement-learning-141009013546-conversion-gate02.pptxMohibKhan79
 
Big Data and algorithms
Big Data and algorithmsBig Data and algorithms
Big Data and algorithmsmichele minno
 
Aleksej Šipulia - Retrospective – heart of scrum
Aleksej Šipulia - Retrospective – heart of scrumAleksej Šipulia - Retrospective – heart of scrum
Aleksej Šipulia - Retrospective – heart of scrumAgile Lietuva
 
reinforcement-learning-141009013546-conversion-gate02.pdf
reinforcement-learning-141009013546-conversion-gate02.pdfreinforcement-learning-141009013546-conversion-gate02.pdf
reinforcement-learning-141009013546-conversion-gate02.pdfVaishnavGhadge1
 
Discover What Your Users Really Think (and what they really do)
Discover What Your Users Really Think (and what they really do)Discover What Your Users Really Think (and what they really do)
Discover What Your Users Really Think (and what they really do)Kristin Valentine
 
Simulation To Reality: Reinforcement Learning For Autonomous Driving
Simulation To Reality: Reinforcement Learning For Autonomous DrivingSimulation To Reality: Reinforcement Learning For Autonomous Driving
Simulation To Reality: Reinforcement Learning For Autonomous DrivingDonal Byrne
 
Reinforcement learning
Reinforcement learning Reinforcement learning
Reinforcement learning Chandra Meena
 
Evaluating the training programs PPT: The Four levels ( Kirkpatrick's Four-Le...
Evaluating the training programs PPT: The Four levels ( Kirkpatrick's Four-Le...Evaluating the training programs PPT: The Four levels ( Kirkpatrick's Four-Le...
Evaluating the training programs PPT: The Four levels ( Kirkpatrick's Four-Le...mpavi257
 
An introduction to reinforcement learning
An introduction to  reinforcement learningAn introduction to  reinforcement learning
An introduction to reinforcement learningJie-Han Chen
 
Language Understanding for Text-based Games using Deep Reinforcement Learning
Language Understanding for Text-based Games using Deep Reinforcement LearningLanguage Understanding for Text-based Games using Deep Reinforcement Learning
Language Understanding for Text-based Games using Deep Reinforcement LearningMark Chang
 
Reinforcement learning in Machine learning
 Reinforcement learning in Machine learning Reinforcement learning in Machine learning
Reinforcement learning in Machine learningMegha Sharma
 
Formative Assessment: Tasks and Feedback
Formative Assessment: Tasks and Feedback Formative Assessment: Tasks and Feedback
Formative Assessment: Tasks and Feedback Cassandra Gustafson
 
Deep Q-learning from Demonstrations DQfD
Deep Q-learning from Demonstrations DQfDDeep Q-learning from Demonstrations DQfD
Deep Q-learning from Demonstrations DQfDAmmar Rashed
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement LearningYigit UNALLAR
 
An efficient use of temporal difference technique in Computer Game Learning
An efficient use of temporal difference technique in Computer Game LearningAn efficient use of temporal difference technique in Computer Game Learning
An efficient use of temporal difference technique in Computer Game LearningPrabhu Kumar
 
Reinforcement learning
Reinforcement  learningReinforcement  learning
Reinforcement learningSKS
 
Expert Secrets For (Im)proving Learning Impact: Assessment to Analytics
Expert Secrets For (Im)proving Learning Impact: Assessment to AnalyticsExpert Secrets For (Im)proving Learning Impact: Assessment to Analytics
Expert Secrets For (Im)proving Learning Impact: Assessment to AnalyticsLambda Solutions
 
Intelligent AGent class.pptx
Intelligent AGent class.pptxIntelligent AGent class.pptx
Intelligent AGent class.pptxAdJamesJohn
 

Similar to Reinforcement Learning 1. Introduction (20)

Structured prediction with reinforcement learning
Structured prediction with reinforcement learningStructured prediction with reinforcement learning
Structured prediction with reinforcement learning
 
reinforcement-learning-141009013546-conversion-gate02.pptx
reinforcement-learning-141009013546-conversion-gate02.pptxreinforcement-learning-141009013546-conversion-gate02.pptx
reinforcement-learning-141009013546-conversion-gate02.pptx
 
Big Data and algorithms
Big Data and algorithmsBig Data and algorithms
Big Data and algorithms
 
Aleksej Šipulia - Retrospective – heart of scrum
Aleksej Šipulia - Retrospective – heart of scrumAleksej Šipulia - Retrospective – heart of scrum
Aleksej Šipulia - Retrospective – heart of scrum
 
reinforcement-learning-141009013546-conversion-gate02.pdf
reinforcement-learning-141009013546-conversion-gate02.pdfreinforcement-learning-141009013546-conversion-gate02.pdf
reinforcement-learning-141009013546-conversion-gate02.pdf
 
Discover What Your Users Really Think (and what they really do)
Discover What Your Users Really Think (and what they really do)Discover What Your Users Really Think (and what they really do)
Discover What Your Users Really Think (and what they really do)
 
Simulation To Reality: Reinforcement Learning For Autonomous Driving
Simulation To Reality: Reinforcement Learning For Autonomous DrivingSimulation To Reality: Reinforcement Learning For Autonomous Driving
Simulation To Reality: Reinforcement Learning For Autonomous Driving
 
Reinforcement learning
Reinforcement learning Reinforcement learning
Reinforcement learning
 
Evaluating the training programs PPT: The Four levels ( Kirkpatrick's Four-Le...
Evaluating the training programs PPT: The Four levels ( Kirkpatrick's Four-Le...Evaluating the training programs PPT: The Four levels ( Kirkpatrick's Four-Le...
Evaluating the training programs PPT: The Four levels ( Kirkpatrick's Four-Le...
 
An introduction to reinforcement learning
An introduction to  reinforcement learningAn introduction to  reinforcement learning
An introduction to reinforcement learning
 
Language Understanding for Text-based Games using Deep Reinforcement Learning
Language Understanding for Text-based Games using Deep Reinforcement LearningLanguage Understanding for Text-based Games using Deep Reinforcement Learning
Language Understanding for Text-based Games using Deep Reinforcement Learning
 
Reinforcement learning in Machine learning
 Reinforcement learning in Machine learning Reinforcement learning in Machine learning
Reinforcement learning in Machine learning
 
Formative Assessment: Tasks and Feedback
Formative Assessment: Tasks and Feedback Formative Assessment: Tasks and Feedback
Formative Assessment: Tasks and Feedback
 
Deep Q-learning from Demonstrations DQfD
Deep Q-learning from Demonstrations DQfDDeep Q-learning from Demonstrations DQfD
Deep Q-learning from Demonstrations DQfD
 
CS3013 -MACHINE LEARNING.pptx
CS3013 -MACHINE LEARNING.pptxCS3013 -MACHINE LEARNING.pptx
CS3013 -MACHINE LEARNING.pptx
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
 
An efficient use of temporal difference technique in Computer Game Learning
An efficient use of temporal difference technique in Computer Game LearningAn efficient use of temporal difference technique in Computer Game Learning
An efficient use of temporal difference technique in Computer Game Learning
 
Reinforcement learning
Reinforcement  learningReinforcement  learning
Reinforcement learning
 
Expert Secrets For (Im)proving Learning Impact: Assessment to Analytics
Expert Secrets For (Im)proving Learning Impact: Assessment to AnalyticsExpert Secrets For (Im)proving Learning Impact: Assessment to Analytics
Expert Secrets For (Im)proving Learning Impact: Assessment to Analytics
 
Intelligent AGent class.pptx
Intelligent AGent class.pptxIntelligent AGent class.pptx
Intelligent AGent class.pptx
 

Recently uploaded

2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 

Recently uploaded (20)

2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 

Reinforcement Learning 1. Introduction

  • 2. ● Learning by interacting with the environment ● Goal: maximize a numerical reward signal by choosing correct actions ○ Trial and error: learner is not told the best action ○ Delayed rewards: actions can affect all future rewards Reinforcement Learning
  • 3. ● No external supervisor / teacher ○ No training set with labeled examples (answers) ○ Need to interact with environment in uncharted territories ● Different goals ○ Supervised Learning: Generalize existing data to minimize test set error ○ Reinforcement Learning: Maximize reward through interactions ○ Unsupervised Learning: Find hidden structure → Reinforcement Learning is a new paradigm of Machine Learning vs. Supervised and Unsupervised Learning
  • 4. ● Interactions between agent and environment ● Uncertainty about the environment ○ Effects of actions cannot be fully predicted ○ Monitor environment and react appropriately ● Defined goal ○ Judge progress through rewards ● Present affects the future ○ Effect can be delayed ● Experience improves performance Characteristics of Reinforcement Learning
  • 5. ● Complex sequence of interactions to achieve goal ● Need to observe and react to the uncertainty of the environment ○ Grab different bowl if current bowl is dirty ○ Stop pouring if the bowl is about to overflow ● Actions have delayed consequences ○ Failing to get spoon does not matter until you start eating ● Experience improves performance Example: Preparing Breakfast Get cereal Get bowl Get spoon Get milk Pour cereal Pour milk Grab spoon
  • 6. ● Exploration: Try different actions ● Exploitation: Choose best known action ● Need both to obtain high reward Exploration vs Exploitation Try a new cereal? Eat the usual cereal?Agent
  • 7. ● Policy defines the agent’s behavior ● Reward Signal defines the goal of the problem ● Value Function indicates the long-term desirability of state ● Model of the environment mimics behavior of environment Elements of Reinforcement Learning
  • 8. ● Mapping from observation to action ● Defines the agent’s behavior ● Can be stochastic Policy s1 s2 s3 s4 a1 a2 S A
  • 9. ● Reward ○ Immediate reward of action ○ Defines good/bad events for the agent ○ Given by the environment ● Value Function ○ Sum of future rewards from a state ○ Long-term desirability of states ○ Difficult to estimate ○ Primary basis of choosing action Reward Signal vs. Value Function
  • 10. ● Mimics the behavior of environment ● Allow planning a future course of actions ● Not necessary for all RL methods ○ Model-based methods use the model for planning ○ Model-free methods only use trial-and-error Model https://worldmodels.github.io/
  • 11. ● Assume imperfect opponent ● Agent needs to find and exploit imperfections Example: Tic Tac Toe
  • 12. Tic Tac Toe with Reinforcement Learning ● Initialize value functions to 0.5 (except terminal states) ● Learn by playing games ○ Move greedily most times, but explore sometimes ● Incrementally update value functions by playing games ● Decrease learning rate over time to converge
  • 13. ● Minimax algorithm ○ Assumes best play for opponent → Cannot exploit opponent ● Classic optimization ○ Require complete specification of opponent → Impractical ○ ex. Dynamic Programming ● Evolutionary methods ○ Finds optimal algorithm ○ Ignores useful structure of RL problems ○ Works best when good policy can be found easily Tic Tac Toe with other algorithms
  • 14. ● Can be applied to: ○ more complex games (ex. Backgammon) ○ problems without enemies (“games against nature”) ○ problems with partially observable environments ○ non-episodic problems ○ continuous-time problems Reinforcement Learning beyond Tic-Tac-Toe https://blog.openai.com/evolution-strategies/
  • 15. Thank you! Original content from ● Reinforcement Learning: An Introduction by Sutton and Barto You can find more content in ● github.com/seungjaeryanlee ● www.endtoend.ai