What Can RL do.pptx

1
What can RL do?
백승언
03 Apr, 2023

2
Contents
■ Introduction
  ▪ What is Reinforcement Learning (RL)?
■ Problems that RL focuses on
  ▪ Control problem
  ▪ Multi-armed bandit
  ▪ Combinatorial optimization
  ▪ Cooperative behavior learning
  ▪ Competitive behavior learning
  ▪ Mixed behavior learning
  ▪ Learning from human experts
  ▪ Learning from human feedback

4
What is reinforcement learning (RL)
■ Definition and objective of RL
  ▪ A type of machine learning technique that enables an agent to learn in an interactive environment by trial and error, using feedback from its own actions and experience
  ▪ The agent aims to maximize the expected return (the sum of rewards)
    • $\pi^* = \arg\max_{\pi} \mathbb{E}_{\tau \sim \rho_\pi(\tau)}\left[\sum_{t=0}^{\infty} r_t(s_t, a_t)\right]$
    • where $\tau = (s_0, a_0, s_1, a_1, \dots)$ is a trajectory and $\rho_\pi(\tau)$ is the distribution of trajectories induced by $\pi$
■ Components in RL
  ▪ Agent: The learner and decision-maker in RL
  ▪ Environment: The thing the agent interacts with, comprising everything outside the agent
  ▪ Step: A single atomic interaction between the agent and the environment
  ▪ Episode: One complete run of the simulation, which ends when the system reaches a terminal state
[Figure: Data flow of reinforcement learning]
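To make the objective concrete, here is a minimal sketch of the agent-environment loop (steps grouped into episodes) and a Monte Carlo estimate of the expected return. The corridor environment, its reward values (-0.1 per step, +1.0 at the goal), and the names `CorridorEnv`, `run_episode`, and `estimate_expected_return` are hypothetical illustrations, not taken from the slides; episodes are capped at `max_steps` so the undiscounted sum stays finite.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment (not from the slides): the agent starts at
    position 0 and the episode ends when it reaches position `length`."""

    def __init__(self, length=5):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos  # initial state s_0

    def step(self, action):
        # action 1 = move right, anything else = move left (clipped at 0)
        self.pos = max(0, self.pos + (1 if action == 1 else -1))
        done = self.pos == self.length      # terminal state reached
        reward = 1.0 if done else -0.1      # assumed reward: small cost per step
        return self.pos, reward, done


def run_episode(env, policy, max_steps=100):
    """One episode: repeat steps s_t -> a_t -> (r_t, s_{t+1}) until termination."""
    state, episode_return = env.reset(), 0.0
    for _ in range(max_steps):
        action = policy(state)
        state, reward, done = env.step(action)
        episode_return += reward            # undiscounted sum of rewards
        if done:
            break
    return episode_return


def estimate_expected_return(env, policy, n_episodes=1000):
    """Monte Carlo estimate of E_{tau ~ rho_pi}[sum_t r_t]."""
    return sum(run_episode(env, policy) for _ in range(n_episodes)) / n_episodes


rng = random.Random(0)
always_right = lambda state: 1
random_policy = lambda state: rng.randint(0, 1)
env = CorridorEnv()
print(round(estimate_expected_return(env, always_right), 3))  # 0.6
print(estimate_expected_return(env, random_policy) < 0.6)     # True
```

Averaging returns over sampled episodes approximates the expectation over trajectories; the policy that reaches the goal fastest gets the higher estimate, which is exactly what the arg max prefers.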

5
What is reinforcement learning (RL)
■ Components in RL
  ▪ Action $a_t$: The move the agent exerts at step $t$, chosen from all possible moves
  ▪ State $s_t$: The current situation returned by the environment
  ▪ Reward $r_t$: An immediate return sent back from the environment to evaluate the last action
  ▪ Policy $\pi_\theta$: The strategy that the agent employs to determine the next action based on the current state
    • The policy $\pi_\theta$, parameterized by $\theta$, is a mapping from the state space $\mathbb{S}$ to the action space $\mathbb{A}$, i.e., $\pi_\theta: \mathbb{S} \to \mathbb{A}$
[Figure: Data flow of reinforcement learning]
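One common way to realize a parameterized policy $\pi_\theta$ for small, discrete spaces is a table of logits passed through a softmax. The sketch below assumes that parameterization (the class and method names are hypothetical). Sampling gives a stochastic policy; taking the argmax gives a deterministic mapping from states to actions, as written on the slide.

```python
import numpy as np


class SoftmaxPolicy:
    """Tabular stochastic policy pi_theta(a | s) = softmax(theta[s])_a over
    discrete state and action spaces (an illustrative parameterization)."""

    def __init__(self, n_states, n_actions, seed=0):
        self.theta = np.zeros((n_states, n_actions))  # the parameters theta
        self.rng = np.random.default_rng(seed)

    def probs(self, state):
        logits = self.theta[state] - self.theta[state].max()  # numerical stability
        e = np.exp(logits)
        return e / e.sum()

    def act(self, state):
        """Sample a_t ~ pi_theta(. | s_t)."""
        return int(self.rng.choice(self.theta.shape[1], p=self.probs(state)))

    def greedy(self, state):
        """Deterministic mapping S -> A: the most probable action."""
        return int(np.argmax(self.theta[state]))


policy = SoftmaxPolicy(n_states=3, n_actions=2)
policy.theta[1] = [0.0, 2.0]        # make action 1 more likely in state 1
print(policy.probs(0))              # [0.5 0.5]
print(policy.probs(1).round(3))     # [0.119 0.881]
print(policy.greedy(1), policy.act(1))
```

Learning then means adjusting `theta` so that actions leading to higher return become more probable.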

7
Control problem
■ Description
  ▪ Controlling an object in a specific environment
  ▪ RL can handle this problem at any level of the pipeline
    • Perception → decision-making → control
      – End-to-end control
    • Decision-making → control
      – Decision and control
    • Control only
■ Example problems
  ▪ In the robot-arm domain, end-to-end control problems have been studied with RL
  ▪ In the autonomous vehicle domain, decision and control problems have been studied with RL
[Figures: Control problem | Example problems]
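For the "control only" level, a minimal sketch is tabular Q-learning on a hypothetical one-dimensional task: move an object from position 0 to a target position. The task, the reward (-1 per step), and the hyperparameters are assumptions chosen for illustration; end-to-end control would replace the table with a neural network that reads raw observations.

```python
import random


def q_learning_position_control(n_positions=6, episodes=500, alpha=0.5,
                                gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D task: move an object from
    position 0 to the target at n_positions - 1. Actions: 0 = left, 1 = right."""
    rng = random.Random(seed)
    goal = n_positions - 1
    q = [[0.0, 0.0] for _ in range(n_positions)]  # Q(s, a) table
    for _ in range(episodes):
        s = 0
        while s != goal:
            # epsilon-greedy selection; ties are broken at random
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randint(0, 1)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s_next = min(goal, max(0, s + (1 if a == 1 else -1)))
            r = -1.0  # assumed cost for every step until the target is reached
            # no bootstrapping from the terminal (target) state
            target = r if s_next == goal else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q


q = q_learning_position_control()
greedy = ["right" if row[1] > row[0] else "left" for row in q[:-1]]
print(greedy)  # the learned controller should push toward the target
```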

8
Multi-armed bandit
■ Description
  ▪ Selecting an action from a specific set
  ▪ RL can handle this problem over any horizon
    • Finite-horizon problem
    • Infinite-horizon problem
■ Example problems
  ▪ In the board game domain, the RL agent selects an empty cell at every time step
  ▪ In the recommender system domain, the RL agent suggests an item to the user at every trigger time
  ▪ In the computer science domain, the RL agent assigns a job to a machine at every time step
[Figures: Multi-armed bandit problem | Example problems]
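A minimal sketch of the classic multi-armed bandit with epsilon-greedy action selection: most of the time the agent picks the arm with the highest running mean reward, and with probability `epsilon` it explores a random arm. The Bernoulli arm probabilities and all names here are hypothetical.

```python
import random


def epsilon_greedy_bandit(arm_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy on a Bernoulli multi-armed bandit (arm_probs are hypothetical)."""
    rng = random.Random(seed)
    n_arms = len(arm_probs)
    counts = [0] * n_arms
    values = [0.0] * n_arms  # running mean reward of each arm
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                          # explore
        else:
            arm = max(range(n_arms), key=lambda i: values[i])    # exploit
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]      # incremental mean
        total_reward += reward
    return values, counts, total_reward


values, counts, total = epsilon_greedy_bandit([0.2, 0.5, 0.8])
best = max(range(3), key=lambda i: values[i])
print(best, counts)
```

The same select-observe-update loop applies whether the "arm" is a board cell, a recommended item, or a machine for a job; only the action set and the reward signal change.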

9
Combinatorial optimization
■ Description
  ▪ Selecting multiple actions from a specific set
  ▪ RL can handle this problem in one step
■ Example problems
  ▪ In the chip placement domain, the RL agent places semiconductor components on an empty wafer in just one step
  ▪ In the routing problem domain, the RL agent computes the order of the driving route in just one step
  ▪ In the math problem domain, the RL agent optimizes the symbolic components or the order of operations
  ▪ In the chemistry domain, the RL agent optimizes the reaction process
[Figures: Combinatorial optimization problem | Example problems]
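One way to read "in one step" is to treat the whole solution as a single action that receives a single reward. The sketch below assumes a tiny traveling-salesman-style routing problem: the policy builds a complete tour city by city from a hypothetical logit table `W`, the reward is the negative tour length, and REINFORCE with a moving-average baseline updates `W`. It is an illustrative stand-in, not the method behind the chip placement or routing work the slide refers to.

```python
import numpy as np


def tour_length(coords, tour):
    # closed tour: return to the start city at the end
    return sum(np.linalg.norm(coords[tour[i]] - coords[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))


def sample_tour(W, rng, greedy=False):
    """Build a whole tour (the single 'action') from logits W[i, j] = preference
    for going from city i to city j; also return d log pi(tour) / dW."""
    n = W.shape[0]
    tour, grad = [0], np.zeros_like(W)
    visited = np.zeros(n, dtype=bool)
    visited[0] = True
    for _ in range(n - 1):
        cur = tour[-1]
        logits = np.where(visited, -np.inf, W[cur])   # mask visited cities
        p = np.exp(logits - logits.max())
        p /= p.sum()
        nxt = int(p.argmax()) if greedy else int(rng.choice(n, p=p))
        grad[cur] -= p                                # d log softmax = onehot - p
        grad[cur, nxt] += 1.0
        tour.append(nxt)
        visited[nxt] = True
    return tour, grad


def reinforce_tsp(coords, iters=2000, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n = len(coords)
    W = np.zeros((n, n))
    baseline = None
    for _ in range(iters):
        tour, grad = sample_tour(W, rng)
        reward = -tour_length(coords, tour)           # one-step episode, one reward
        baseline = reward if baseline is None else 0.9 * baseline + 0.1 * reward
        W += lr * (reward - baseline) * grad          # REINFORCE update
    return W


coords = np.random.default_rng(1).random((6, 2))     # 6 random cities in the unit square
W = reinforce_tsp(coords)
tour, _ = sample_tour(W, np.random.default_rng(0), greedy=True)
print(tour, round(tour_length(coords, tour), 3))
```

Masking visited cities guarantees every sampled action is a valid permutation, so the reward never has to penalize infeasible solutions.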

10
Cooperative behavior learning
■ Description
  ▪ Controlling multiple objects in a specific environment
  ▪ RL can handle this problem in either reward setting
    • Individual reward problem
    • Team reward problem
■ Example problems
  ▪ In the communication domain, the RL agent allocates resources to achieve the team goal
  ▪ In the game domain, the commander RL agent controls multiple units to achieve victory
[Figures: Cooperative behavior learning problem | Example problems]
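The difference between the individual-reward and team-reward settings lies in what each agent is trained on. The hypothetical helper below shows one simple convention (team reward = sum of individual contributions, shared by every agent); real systems may define the team reward differently.

```python
from typing import List


def assign_rewards(individual: List[float], mode: str) -> List[float]:
    """Per-agent rewards for one step of a cooperative multi-agent task.
    'individual': each agent is credited only with its own contribution.
    'team': every agent receives the same shared team reward (here: the sum)."""
    if mode == "individual":
        return list(individual)
    if mode == "team":
        team_reward = sum(individual)
        return [team_reward] * len(individual)
    raise ValueError(f"unknown mode: {mode}")


contrib = [1.0, 0.0, 2.0]                     # hypothetical contributions of three agents
print(assign_rewards(contrib, "individual"))  # [1.0, 0.0, 2.0]
print(assign_rewards(contrib, "team"))        # [3.0, 3.0, 3.0]
```

With a shared team reward, every agent sees the same signal regardless of its own contribution, which makes credit assignment harder than in the individual-reward setting.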

11
Competitive behavior learning
■ Description
  ▪ … environment
  ▪ RL can handle this problem in the zero-sum game setting
■ Example problems
  ▪ In the game domain, the RL agent learns competitive behavior in various games such as Chess, Go, and StarCraft II
[Figures: Competitive behavior learning problem | Example problems]
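In a two-player zero-sum game, one player's gain is exactly the other's loss. As a minimal sketch of learning competitive behavior through self-play, the code below runs fictitious play on rock-paper-scissors: each player repeatedly best-responds to the opponent's empirical action frequencies. The game and the method are illustrative choices; agents for Chess, Go, or StarCraft II use far more elaborate self-play with neural networks.

```python
import numpy as np

# Row player's payoff in rock-paper-scissors; the column player receives the
# negative (zero-sum). Order: 0 = rock, 1 = paper, 2 = scissors.
A = np.array([[0, -1, 1],
              [1, 0, -1],
              [-1, 1, 0]])


def fictitious_play(A, iters=20000):
    """Self-play in which each player best-responds to the opponent's
    empirical action counts so far (fictitious play)."""
    n_row, n_col = A.shape
    row_counts, col_counts = np.zeros(n_row), np.zeros(n_col)
    for _ in range(iters):
        row_action = int(np.argmax(A @ col_counts))       # row maximizes its payoff
        col_action = int(np.argmax(-(row_counts @ A)))    # column minimizes row's payoff
        row_counts[row_action] += 1
        col_counts[col_action] += 1
    return row_counts / iters, col_counts / iters


row_freq, col_freq = fictitious_play(A)
print(row_freq.round(2), col_freq.round(2))  # both near the uniform mix
```

In two-player zero-sum games, fictitious play's empirical frequencies are known to converge to a Nash equilibrium; for rock-paper-scissors that is the uniform mix, where neither player can be exploited.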

12
Mixed behavior learning
■ Description
  ▪ … environment
  ▪ RL can handle this problem in the general-sum game setting
    • Cooperative behavior learning within the same group
    • Competitive behavior learning between different groups
■ Example problems
  ▪ In the game domain, group battles have been studied with RL
  ▪ In the autonomous vehicle domain, the RL agent controls multiple autonomous vehicles in mixed autonomy
[Figures: Mixed behavior learning problem | Example problems]
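In the general-sum setting, rewards are neither fully shared nor exactly opposed. The hypothetical reward function below combines the two kinds of behavior listed on the slide: teammates share their team's score (cooperation), the opposing team's score is subtracted (competition), and an extra cost paid by everyone (for example, a collision penalty) means the rewards no longer sum to zero. All names and values are assumptions for illustration.

```python
from typing import Dict


def mixed_rewards(team_of: Dict[str, str], team_score: Dict[str, float],
                  shared_cost: float) -> Dict[str, float]:
    """Hypothetical per-agent rewards in a two-team, general-sum setting."""
    rewards = {}
    for agent, team in team_of.items():
        opponents = sum(s for t, s in team_score.items() if t != team)
        # shared within the team, opposed across teams, plus a common cost
        rewards[agent] = team_score[team] - opponents - shared_cost
    return rewards


team_of = {"a1": "blue", "a2": "blue", "b1": "red", "b2": "red"}
r = mixed_rewards(team_of, {"blue": 3.0, "red": 1.0}, shared_cost=0.5)
print(r)                # {'a1': 1.5, 'a2': 1.5, 'b1': -2.5, 'b2': -2.5}
print(sum(r.values()))  # -2.0, so the game is not zero-sum
```

Setting `shared_cost=0.0` turns this into a zero-sum contest between the two teams, which is the competitive case of the previous slide.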

13
Learning from human experts
■ Description
  ▪ Training the agent from demonstration trajectories
  ▪ RL can handle complex problems by learning from human experts
    • Problems that have complex rules, such as Go
    • Problems that involve complex scenarios, such as autonomous driving
■ Example problems
  ▪ In the autonomous vehicle domain, the RL agent controls the autonomous vehicle in complex scenarios
  ▪ In the finance domain, the RL agent decides whether to buy or sell stocks in complex scenarios
  ▪ In the game domain, an RL agent built on a robust neural network such as the Transformer can handle multiple games (DeepMind Gato)
[Figures: Learning from human experts problem | Example problems]
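The simplest form of learning from demonstration trajectories is behavior cloning, which treats the expert's (state, action) pairs as supervised training data. The tabular sketch below assumes discrete, hypothetical driving states and imitates the expert's most frequent action in each state; systems such as Gato instead train a large neural network on sequences.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Tuple


def behavior_cloning(demos: List[List[Tuple[Hashable, Hashable]]]) -> Dict[Hashable, Hashable]:
    """Tabular behavior cloning: for each state seen in the expert trajectories,
    imitate the action the expert chose most often in that state."""
    counts = defaultdict(Counter)
    for trajectory in demos:
        for state, action in trajectory:
            counts[state][action] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}


# hypothetical expert demonstrations: lists of (state, action) pairs
demos = [
    [("clear_road", "keep_lane"), ("slow_car_ahead", "change_lane"), ("clear_road", "keep_lane")],
    [("slow_car_ahead", "change_lane"), ("red_light", "stop")],
    [("slow_car_ahead", "brake"), ("red_light", "stop")],
]
policy = behavior_cloning(demos)
print(policy)  # {'clear_road': 'keep_lane', 'slow_car_ahead': 'change_lane', 'red_light': 'stop'}
```

A table like this has no answer for states the expert never visited, which is one reason practical systems use function approximators or combine demonstrations with further RL.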

14
Learning from human feedback (preference)
■ Description
  ▪ Learning a reward model from human feedback (positive/negative), and then learning the agent's policy through the learned reward model
  ▪ RL can handle human-centric problems through human feedback
    • NLP problems that require human judgment as feedback
    • Problems that involve complex scenarios, such as solving the cube or autonomous driving
■ Example problems
  ▪ In the robotics domain, the RL agent can be trained with human feedback to solve the cube (OpenAI, DAgger)
  ▪ In the NLP domain, the RL agent can be trained with human feedback to incorporate human values or preferences (OpenAI ChatGPT)
    • e.g., learning to refrain from hate speech and to keep responses natural in context
[Figures: Learning from human feedback problem | Example problems]
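The first stage on this slide, learning a reward model from human feedback, can be sketched with the Bradley-Terry model on pairwise preferences, a common formulation in RLHF pipelines (the slide mentions positive/negative feedback; pairwise comparisons are swapped in here). The linear reward, the feature vectors, and the preference pairs are hypothetical. The second stage, optimizing the policy against the learned reward with an RL algorithm, is not shown.

```python
import numpy as np


def train_reward_model(features, preferences, lr=0.5, epochs=500):
    """Fit a linear reward r(x) = w . phi(x) from pairwise human preferences with
    the Bradley-Terry model: P(a preferred over b) = sigmoid(r(a) - r(b))."""
    w = np.zeros(features.shape[1])
    for _ in range(epochs):
        grad = np.zeros_like(w)
        for win, lose in preferences:
            diff = features[win] - features[lose]
            p = 1.0 / (1.0 + np.exp(-w @ diff))  # predicted P(win is preferred)
            grad += (1.0 - p) * diff             # gradient of the log-likelihood
        w += lr * grad / len(preferences)        # gradient ascent
    return w


# hypothetical features of 4 candidate responses: [politeness, relevance]
features = np.array([[0.9, 0.8], [0.2, 0.9], [0.8, 0.1], [0.1, 0.2]])
# hypothetical human labels as (preferred index, rejected index) pairs
preferences = [(0, 1), (0, 2), (1, 3), (2, 3), (0, 3)]
w = train_reward_model(features, preferences)
rewards = features @ w
print(np.argsort(-rewards))  # responses ranked by learned reward
```

Once trained, `features @ w` can stand in for the environment reward when the policy is optimized, which is how human preferences end up shaping the agent's behavior.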
