What can RL do?
백승언
03 Apr, 2023
Contents
■ Introduction
  ▪ What is Reinforcement Learning (RL)?
■ Problems that RL focuses on
  ▪ Control problem
  ▪ Multi-armed bandit
  ▪ Combinatorial optimization
  ▪ Cooperative behavior learning
  ▪ Competitive behavior learning
  ▪ Mixed behavior learning
  ▪ Learning from human experts
  ▪ Learning from human feedback
Introduction
What is reinforcement learning (RL)?
■ Definition and objective of RL
  ▪ A type of machine learning in which an agent learns in an interactive environment by trial and error, using feedback from its own actions and experience
  ▪ The agent aims to maximize the expected return (the sum of rewards)
    • $\pi^{*} = \arg\max_{\pi} \mathbb{E}_{\tau \sim \rho_{\pi}(\tau)}\left[\sum_{t=0}^{\infty} r_t(s_t, a_t) \,\middle|\, \pi\right]$
■ Components in RL
  ▪ Agent: the learner and decision-maker in RL
  ▪ Environment: everything outside the agent that the agent interacts with
  ▪ Step: a single atomic interaction with the environment
  ▪ Episode: a sequence of steps that ends when the system reaches a terminal state
[Figure: data flow of reinforcement learning]
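The objective above can be made concrete with a small sketch. It estimates the expected return of a fixed policy by averaging the summed rewards of sampled episodes. The coin-flip environment, the `sample_episode` helper, and the horizon of 10 steps are illustrative assumptions and are not taken from the slides.

```python
import random

def episode_return(rewards):
    """Return of one episode: the plain (undiscounted) sum of its rewards."""
    return sum(rewards)

def sample_episode(rng, horizon=10, p_reward=0.5):
    """Hypothetical environment: each step pays reward 1 with probability p_reward."""
    return [1.0 if rng.random() < p_reward else 0.0 for _ in range(horizon)]

def estimate_expected_return(num_episodes=2000, seed=0):
    """Monte Carlo estimate of the expected return under a fixed policy."""
    rng = random.Random(seed)
    returns = [episode_return(sample_episode(rng)) for _ in range(num_episodes)]
    return sum(returns) / len(returns)

if __name__ == "__main__":
    # The estimate should land near horizon * p_reward = 5.
    print(estimate_expected_return())
```

Comparing such estimates across candidate policies is the simplest way to read the arg max in the objective: the optimal policy is the one whose average return is highest.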
■ Components in RL (continued)
  ▪ Action $a_t$: the move the agent selects at step $t$ from the set of all possible moves
  ▪ State $s_t$: the current situation returned by the environment
  ▪ Reward $r_t$: an immediate signal sent back from the environment to evaluate the last action
  ▪ Policy $\pi_\theta$: the strategy the agent uses to determine the next action from the current state
    • The policy $\pi_\theta$, parameterized by $\theta$, is a mapping from the state space $\mathbb{S}$ to the action space $\mathbb{A}$: $\pi_\theta : \mathbb{S} \to \mathbb{A}$
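The components listed above fit together in a single interaction loop. The sketch below is a minimal illustration, assuming a hypothetical five-cell corridor environment (`CorridorEnv`) and a `reset`/`step` interface chosen for this example; none of these names come from the slides.

```python
import random

class CorridorEnv:
    """Hypothetical environment: cells 0..4 in a row; reaching cell 4 ends the episode."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (left) or +1 (right); the position is clipped to the corridor
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done

def random_policy(state, rng):
    """A policy maps the current state to an action; this one ignores the state."""
    return rng.choice([-1, 1])

def run_episode(env, policy, rng, max_steps=1000):
    """Run one episode and record (state, action, reward) for every step."""
    state = env.reset()
    trajectory = []
    for _ in range(max_steps):
        action = policy(state, rng)                   # agent acts
        next_state, reward, done = env.step(action)   # environment responds
        trajectory.append((state, action, reward))
        state = next_state
        if done:
            break
    return trajectory

if __name__ == "__main__":
    episode = run_episode(CorridorEnv(), random_policy, random.Random(0))
    print(len(episode), "steps, return =", sum(r for _, _, r in episode))
```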
Problems that RL focuses on
Control problem
■ Description
  ▪ Control of an object in a specific environment
  ▪ RL can handle this problem at any level of the control pipeline
    • Perception → decision-making → control
      – End-to-end control
    • Decision-making → control
      – Decision and control
    • Control only
■ Example problems
  ▪ In the robot-arm domain, end-to-end control problems have been studied with RL
  ▪ In the autonomous vehicle domain, decision and control problems have been studied with RL
[Figure: example control problems]
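As a toy instance of the "control only" level, the sketch below learns to drive an object to a goal position with tabular Q-learning. The five-position track, the reward of 1 at the goal, and all hyperparameters are assumptions made for this illustration; the systems cited on the slide use far richer state and action spaces.

```python
import random

N_STATES = 5          # positions 0..4; position 4 is the goal (terminal)
ACTIONS = (-1, +1)    # move left or right

def step(state, action):
    """Hypothetical dynamics: move one cell, clipped to the track."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def q_learning(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(100):                      # cap on episode length
            action = rng.choice(ACTIONS)          # uniformly random exploration
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            target = reward + gamma * best_next
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
            if done:
                break
    return q

def greedy_policy(q):
    """Pick the highest-valued action in every non-terminal state."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}

if __name__ == "__main__":
    print(greedy_policy(q_learning()))
```

Because Q-learning is off-policy, the agent can explore at random here and still recover the greedy "always move toward the goal" controller.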
Multi-armed bandit
■ Description
  ▪ Selection of one action from a given set
  ▪ RL can handle this problem over any horizon
    • Finite-horizon problem
    • Infinite-horizon problem
■ Example problems
  ▪ In the board game domain, the RL agent selects an empty cell at every time step
  ▪ In the recommender system domain, the RL agent suggests an item to the user at every trigger time
  ▪ In the computer science domain, the RL agent assigns a job to a machine at every time step
[Figure: example multi-armed bandit problems]
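A minimal sketch of the bandit setting, using the epsilon-greedy rule as one common way to balance exploration and exploitation. The three Bernoulli arms and their payout probabilities are made-up values for illustration, for example three items a recommender could show.

```python
import random

def epsilon_greedy_bandit(arm_probs, steps=5000, epsilon=0.1, seed=0):
    """Pull arms for a finite horizon and track the mean reward of each arm."""
    rng = random.Random(seed)
    counts = [0] * len(arm_probs)
    values = [0.0] * len(arm_probs)   # running mean reward per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(arm_probs))                        # explore
        else:
            arm = max(range(len(arm_probs)), key=lambda i: values[i])  # exploit
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]            # incremental mean
    return counts, values

if __name__ == "__main__":
    # Hypothetical click probabilities for three items.
    counts, values = epsilon_greedy_bandit([0.2, 0.5, 0.8])
    print(counts, [round(v, 2) for v in values])
```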
Combinatorial optimization
■ Description
  ▪ Multiple selections of actions from a given set
  ▪ RL can handle this problem in one step
■ Example problems
  ▪ In the chip placement domain, the RL agent places semiconductor components on an empty wafer in just one step
  ▪ In the routing domain, the RL agent computes the visiting order of a driving route in just one step
  ▪ In the math domain, the RL agent optimizes symbolic components or the order of operations
  ▪ In the chemistry domain, the RL agent optimizes the reaction process
[Figure: example combinatorial optimization problems]
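The routing example can be framed as a sequence of selections: the state is the partial route, each action appends one unvisited city, and the reward is the negative route length. The sketch below shows only this framing. The nearest-city rule is a hand-written stand-in for a learned policy, and the function names and coordinates are assumptions for illustration.

```python
import math
import random

def tour_length(coords, tour):
    """Length of the closed route that visits the cities in the given order."""
    return sum(
        math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )

def nearest_city_policy(coords, current, unvisited, rng):
    """Stand-in policy: pick the closest unvisited city (a learned policy would go here)."""
    return min(unvisited, key=lambda c: math.dist(coords[current], coords[c]))

def random_policy(coords, current, unvisited, rng):
    """Baseline policy: pick any unvisited city uniformly at random."""
    return rng.choice(sorted(unvisited))

def construct_tour(coords, policy, seed=0):
    """One episode: start at city 0 and append cities until none are left."""
    rng = random.Random(seed)
    tour, unvisited = [0], set(range(1, len(coords)))
    while unvisited:
        city = policy(coords, tour[-1], unvisited, rng)
        tour.append(city)
        unvisited.remove(city)
    reward = -tour_length(coords, tour)   # shorter route, higher reward
    return tour, reward

if __name__ == "__main__":
    cities = [(0, 0), (0, 1), (1, 1), (1, 0)]   # hypothetical coordinates
    print(construct_tour(cities, nearest_city_policy))
```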
Cooperative behavior learning
■ Description
  ▪ Control of multiple objects in a specific environment
  ▪ RL can handle this problem under either reward setting
    • Individual reward problem
    • Team reward problem
■ Example problems
  ▪ In the communication domain, the RL agent distributes resources to achieve the team goal
  ▪ In the game domain, a commander RL agent controls multiple units to win
[Figure: example cooperative behavior learning problems]
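The two reward settings differ only in which signal each agent is trained on. A minimal sketch, assuming each agent's contribution to the task is already measured as a number; the `assign_rewards` helper and its `scheme` argument are hypothetical names for this example.

```python
def assign_rewards(contributions, scheme):
    """Map each agent's own contribution to the reward it is trained on.

    scheme="individual": each agent receives only its own contribution.
    scheme="team": every agent receives the same shared team total.
    """
    if scheme == "individual":
        return list(contributions)
    if scheme == "team":
        total = sum(contributions)
        return [total] * len(contributions)
    raise ValueError(f"unknown scheme: {scheme}")

if __name__ == "__main__":
    contributions = [1.0, 0.0, 2.0]   # hypothetical per-agent contributions
    print(assign_rewards(contributions, "individual"))
    print(assign_rewards(contributions, "team"))
```

With a team reward, the second agent is rewarded even though it contributed nothing, which is why credit assignment is harder in the team setting.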
Competitive behavior learning
■ Description
  ▪ Control of multiple objects in a specific environment
  ▪ RL can handle this problem in the zero-sum game setting
■ Example problems
  ▪ In the game domain, RL agents learn competitive behavior in games such as chess, Go, and StarCraft II
[Figure: example competitive behavior learning problems]
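In a zero-sum game one agent's reward is exactly the other's loss. The sketch below uses rock-paper-scissors, chosen here only because it is small, to show the reward structure and a best response against a known opponent strategy. It illustrates the setting and is not the training method behind the game-playing systems named on the slide.

```python
# Payoff to player 1 in rock-paper-scissors; player 2 receives the negative.
ACTIONS = ("rock", "paper", "scissors")
PAYOFF = [
    [0, -1, 1],   # rock     vs rock, paper, scissors
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]

def rewards(a1, a2):
    """Rewards of both players for one joint action (indices into ACTIONS)."""
    r1 = PAYOFF[a1][a2]
    return r1, -r1          # zero-sum: the two rewards always cancel

def best_response(opponent_probs):
    """Action index maximizing player 1's expected payoff against a mixed strategy."""
    expected = [
        sum(p * PAYOFF[a1][a2] for a2, p in enumerate(opponent_probs))
        for a1 in range(len(ACTIONS))
    ]
    return max(range(len(ACTIONS)), key=lambda a: expected[a])

if __name__ == "__main__":
    # Against an opponent who favors rock, the best response is paper.
    print(ACTIONS[best_response([0.6, 0.2, 0.2])])
```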
Mixed behavior learning
■ Description
  ▪ Control of multiple objects in a specific environment
  ▪ RL can handle this problem in the general-sum game setting
    • Cooperative behavior learning within the same group
    • Competitive behavior learning between different groups
■ Example problems
  ▪ In the game domain, group battles have been studied with RL
  ▪ In the autonomous vehicle domain, the RL agent controls multiple autonomous vehicles in mixed-autonomy traffic
[Figure: example mixed behavior learning problems]
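The cooperative, competitive, and mixed settings can be told apart from the reward table alone. The sketch below classifies a small game given as a mapping from joint actions to per-agent rewards. The three example tables are textbook-style toy games chosen for illustration, not taken from the slides.

```python
def classify_game(payoffs):
    """payoffs maps a joint action to a tuple of per-agent rewards."""
    outcomes = list(payoffs.values())
    if all(len(set(r)) == 1 for r in outcomes):
        return "cooperative"      # every agent always receives the same reward
    if all(sum(r) == 0 for r in outcomes):
        return "zero-sum"         # one agent's gain is the others' loss
    return "general-sum"          # mixed incentives

# Hypothetical two-agent games with integer rewards.
TEAM_GAME = {("a", "a"): (1, 1), ("a", "b"): (0, 0),
             ("b", "a"): (0, 0), ("b", "b"): (1, 1)}
MATCHING_PENNIES = {("h", "h"): (1, -1), ("h", "t"): (-1, 1),
                    ("t", "h"): (-1, 1), ("t", "t"): (1, -1)}
PRISONERS_DILEMMA = {("c", "c"): (-1, -1), ("c", "d"): (-3, 0),
                     ("d", "c"): (0, -3), ("d", "d"): (-2, -2)}

if __name__ == "__main__":
    for game in (TEAM_GAME, MATCHING_PENNIES, PRISONERS_DILEMMA):
        print(classify_game(game))
```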
Learning from human experts
■ Description
  ▪ Training the agent on demonstration trajectories
  ▪ With human experts, RL can handle complex problems
    • Problems with complex rules, such as Go
    • Problems with complex scenarios, such as autonomous driving
■ Example problems
  ▪ In the autonomous vehicle domain, the RL agent controls the vehicle in complex scenarios
  ▪ In the finance domain, the RL agent decides when to buy and sell stocks in complex scenarios
  ▪ In the game domain, an RL agent built on a robust neural network such as the transformer can handle multiple games (DeepMind Gato)
[Figure: example problems for learning from human experts]
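One simple way to learn from demonstration trajectories is behavior cloning, which treats the expert's (state, action) pairs as supervised training data. The slides do not name a specific method, so the tabular version below is only a sketch under that assumption, with made-up demonstrations.

```python
from collections import Counter, defaultdict

def behavior_cloning(demonstrations):
    """Fit a tabular policy: for each state, copy the expert's most frequent action."""
    actions_by_state = defaultdict(Counter)
    for trajectory in demonstrations:
        for state, action in trajectory:
            actions_by_state[state][action] += 1
    return {s: counts.most_common(1)[0][0] for s, counts in actions_by_state.items()}

if __name__ == "__main__":
    # Hypothetical expert trajectories as lists of (state, action) pairs.
    demos = [
        [(0, "right"), (1, "right"), (2, "up")],
        [(0, "right"), (1, "up"), (2, "up")],
        [(0, "right"), (1, "right")],
    ]
    print(behavior_cloning(demos))
```

A cloned policy has no answer for states the expert never visited, which is one reason demonstrations are often combined with further RL training.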
Learning from human feedback (preference)
■ Description
  ▪ Learn a reward model from human feedback (positive/negative), then learn the agent's policy using the learned reward model
  ▪ With human feedback, RL can handle problems that depend on human judgment
    • NLP problems that require human judgment
    • Problems with complex scenarios, such as solving a Rubik's Cube or autonomous driving
■ Example problems
  ▪ In the robotics domain, an RL agent could be trained with human feedback to solve a Rubik's Cube (the task demonstrated by OpenAI's Dactyl robot hand)
  ▪ In the NLP domain, an RL agent could be trained with human feedback to reflect human values or preferences (OpenAI ChatGPT)
    • It learns, for example, to refrain from hate speech and to keep responses contextually natural
[Figure: example problems for learning from human feedback]
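The two stages in the description can be sketched on a toy scale: first fit a reward model to pairwise human preferences, then let the policy choose what the reward model scores highest. The Bradley-Terry preference model used here is a common choice but an assumption of this sketch, as are the three candidate responses and the preference pairs. Real systems use neural networks for both stages.

```python
import math

def fit_reward_model(num_items, preferences, lr=0.1, epochs=200):
    """Fit one scalar reward per item from (preferred, rejected) index pairs.

    Uses the Bradley-Terry model: P(i preferred over j) = sigmoid(r_i - r_j).
    """
    rewards = [0.0] * num_items
    for _ in range(epochs):
        for winner, loser in preferences:
            p_win = 1.0 / (1.0 + math.exp(-(rewards[winner] - rewards[loser])))
            grad = 1.0 - p_win            # gradient of the log-likelihood w.r.t. r_winner
            rewards[winner] += lr * grad
            rewards[loser] -= lr * grad
    return rewards

def greedy_policy(rewards):
    """Second stage, reduced to its simplest form: choose the highest-reward item."""
    return max(range(len(rewards)), key=lambda i: rewards[i])

if __name__ == "__main__":
    # Hypothetical feedback on three candidate responses: 0 beats 1 and 2, 1 beats 2.
    prefs = [(0, 1), (0, 2), (1, 2)]
    model = fit_reward_model(3, prefs)
    print([round(r, 2) for r in model], "-> choose response", greedy_policy(model))
```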
Thank you!
Q&A

Editor's Notes

  1. I will now begin presenting the algorithm from the paper I selected.
  2. That concludes my presentation on gSDE. Thank you for listening!
  3. If you have any questions, please feel free to ask.