Agenda
- What is reinforcement learning?
- Where is RL used?
- What are the advantages of RL?
- What algorithms are used in RL?
- How to get started?
What is reinforcement learning?
[Figure: action-reward feedback loop of a generic RL model (labels: Reward, Action)]
Reinforcement learning is a branch
of machine learning that relies on
learning through the mechanism of
rewards and punishments.
Policy
How does the agent decide which action to take?
The policy gives the probability that the agent takes action A_t when in state S_t
Policy: π(a|s)
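To make π(a|s) concrete, here is a minimal sketch that stores a policy as a table of probabilities and samples an action from it. The state and action names are hypothetical and chosen only for illustration.

```python
import random

# Hypothetical policy table: policy[state][action] = pi(action | state).
policy = {
    "low_battery": {"recharge": 0.9, "explore": 0.1},
    "full_battery": {"recharge": 0.1, "explore": 0.9},
}

def sample_action(policy, state, rng=random):
    """Draw an action a with probability pi(a | state)."""
    actions = list(policy[state])
    weights = [policy[state][a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]

rng = random.Random(0)  # seeded generator so the run is repeatable
print(sample_action(policy, "low_battery", rng))
```

Learning then amounts to adjusting these probabilities so that actions leading to more reward become more likely.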
Goal == maximize total reward
γ == discount factor
Determines how much less important a reward in the distant future is than a reward in the near future
G_t (return) == total reward in the future
Learning is done in discrete steps
R_k == reward in step k
The number of steps can be fixed (T) or infinite (∞)
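As a rough illustration, the sketch below computes the return for a finite list of rewards. It assumes the standard discounted-return definition G_t = R_{t+1} + γ·R_{t+2} + γ²·R_{t+3} + …; the function name is hypothetical.

```python
def discounted_return(rewards, gamma):
    """Return G_t for rewards [R_{t+1}, R_{t+2}, ...] and discount factor gamma."""
    g = 0.0
    # Work backwards so each step is G = R + gamma * G_next.
    for r in reversed(rewards):
        g = r + gamma * g
    return g

rewards = [1.0, 1.0, 1.0]
print(discounted_return(rewards, gamma=0.5))  # 1 + 0.5 + 0.25 = 1.75
```

With γ close to 0 the agent cares mostly about the next reward; with γ close to 1 distant rewards count almost as much as immediate ones.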
Reinforcement learning in the world of AI
Artificial Intelligence
Machine Learning
… …
Supervised learning
Unsupervised learning
Reinforcement learning
Reinforcement learning in the world of ML
Supervised learning vs reinforcement learning
- Supervised learning relies on a labeled data set
Unsupervised learning vs reinforcement learning
- Unsupervised learning == training based on unlabeled data == finding patterns in data
- Reinforcement learning == learning through the mechanism of rewards and punishments
Where is RL used?
Robotics
RL is used for building robust robots
Industrial robots for more complex applications
Sophisticated grasping strategies, object manipulation techniques, and enhanced hand-eye coordination
RL can be used to teach a robot to walk on two or four legs:
https://www.freethink.com/hard-tech/robot-legs
https://bostondynamics.com/blog/starting-on-the-right-foot-with-reinforcement-learning
https://youtu.be/goxCjGPQH7U
Gaming
RL can be used for testing games
RL can perform many iterations
without human input
Reinforcement learning and Atari games
Deep Q-Learning was used to teach an AI to play Atari 2600 games
The AI system was given no domain knowledge (rules) of how to play the games
The system saw only the pixels and was instructed to maximize the score
Implemented for many Atari 2600 games: Pong, Breakout …
In 2013, DeepMind published “Playing Atari with Deep Reinforcement Learning” (Mnih et al.): https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf
Game: Breakout
After 240 minutes of training, the RL system learned the best strategy:
Create a tunnel and send the ball above the blocks
-> The ball bounces between the roof and the blocks
„The implications go far beyond my
beloved chessboard... Not only do these
self-taught expert machines perform
incredibly well, but we can actually learn
from the new knowledge they produce.”
Garry Kasparov
former world chess champion
AlphaGo
Presented in 2015 by Google DeepMind (https://deepmind.google)
The first program to win a match against a Go world champion (Lee Sedol, March 2016)
- Go is a strategy board game that originated in China
- A bigger challenge for computers than chess
AlphaZero
2017 AlphaZero == a single AI system that is an expert in:
Go
Chess
Shogi (Japanese chess)
https://deepmind.google/discover/blog/alphazero-shedding-new-light-on-chess-shogi-and-go
Healthcare
Reinforcement learning is applied to:
- Development of new drugs
- Diagnostics
- Dynamic treatment regimes (DTRs)
- Surgery
- …
Trading and Finance
Reinforcement learning can achieve better results than supervised learning when applied to trading and finance
IBM has developed a sophisticated RL-based platform that is able to make financial trades
Autonomous driving
RL can be used for:
Trajectory optimization
Collision avoidance
Lane changing
Automatic parking
…
More info: https://wayve.ai | https://youtu.be/eRwTbRtnT1I
And other areas …
Data center cooling (Google reduced the energy used for cooling by up to 40%)
News recommendation
Marketing
…
What are the advantages of RL?
Advantages of Reinforcement Learning
✅ RL can solve complex problems that are difficult or impossible to solve with other methods
✅ It functions in dynamic environments
✅ RL does not need a separate data-preparation step (a difference between RL and supervised learning)
✅ It can be used when the only way to collect data from an environment is for an agent to interact with that environment
…
Disadvantages of Reinforcement Learning
⚠ Sparse-reward environment - an agent receives a reward only when the goal is reached
Harder to know which steps were actually useful
Popular solution == reward shaping -> adding hand-crafted rewards to help RL
Hand-crafted rewards require a human expert to design them correctly, and humans can be biased
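The difference between a sparse and a shaped reward can be sketched on a toy problem. The corridor, the goal position, and the size of the bonus below are all assumptions made for illustration; choosing such a bonus well is exactly the hand-crafting step described above.

```python
GOAL = 5  # hypothetical goal position in a 1-D corridor with cells 0..5

def sparse_reward(next_pos):
    """Reward only when the goal is reached."""
    return 1.0 if next_pos == GOAL else 0.0

def shaped_reward(pos, next_pos, bonus=0.1):
    """Sparse reward plus a small hand-crafted bonus for moving closer to the goal."""
    progress = abs(GOAL - pos) - abs(GOAL - next_pos)  # +1 closer, -1 farther
    return sparse_reward(next_pos) + bonus * progress

print(sparse_reward(3), shaped_reward(2, 3))  # a step toward the goal
print(sparse_reward(1), shaped_reward(2, 1))  # a step away from the goal
```

With the sparse reward both steps look identical to the agent; the shaped reward tells them apart immediately, at the cost of baking the designer's assumptions into the signal.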
Disadvantages of Reinforcement Learning
⚠ RL needs to collect a lot of data from the environment, and it needs a lot of computation (data hungry)
Not a problem when RL is applied to gaming, because the agent can play the same game many times and collect a lot of data.
⚠ It can be expensive to learn by trying (and failing)
For example: in robotics, where robots are expensive and can get damaged while learning
Solution to the disadvantages - general advice
Combine RL with other techniques
For example:
RL + Deep Learning
What algorithms are used in RL?
RL Algorithms
Source: https://spinningup.openai.com/en/latest/spinningup/rl_intro2.html
Q-Learning Algorithm
Most famous RL algorithm
“Q” in “Q-Learning” stands for quality
Example (Python):
https://www.datacamp.com/tutorial/introduction-q-learning-beginner-tutorial
Q-Table
Source: www.analyticsvidhya.com/blog/2019/04/introduction-deep-q-learning-python
Q-Learning Algorithm
Source: https://www.cse.unsw.edu.au/~cs9417ml/RL1/algorithms.html
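A minimal sketch of tabular Q-learning follows. It uses the standard update Q(s,a) ← Q(s,a) + α·[r + γ·max Q(s',·) − Q(s,a)] with epsilon-greedy exploration. The corridor environment, the hyperparameter values, and all names are assumptions for illustration and are not taken from the linked sources.

```python
import random

N_STATES, GOAL = 6, 5          # hypothetical corridor: cells 0..5, goal at cell 5
ACTIONS = (-1, +1)             # move left or right

def step(state, action):
    """Toy environment: returns (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done

def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # Q-table: one row per state, one value per action, initialised to 0.
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit the table, sometimes explore.
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                best = max(q[state])
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning update; a terminal state has no future value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q

q_table = train()
# Greedy action for every non-goal state, read straight from the Q-table.
greedy = [ACTIONS[row.index(max(row))] for row in q_table[:GOAL]]
print(greedy)
```

The Q-table holds one "quality" value per state-action pair; once training has settled, acting greedily with respect to the table is the learned policy.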
Deep Q-Learning Algorithm
Deep neural network instead of a „simple” Q-Table
Used when the environment is too large for a Q-Table
Example (Python):
https://www.analyticsvidhya.com/blog/2019/04/introduction-deep-q-learning-python
Deep Q-Learning Algorithm
Source: www.analyticsvidhya.com/blog/2019/04/introduction-deep-q-learning-python
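The idea of replacing the Q-table with a network can be sketched in plain Python. This is only an illustration under simplifying assumptions: a tiny hand-written network with one hidden layer stands in for a deep one, there are no bias terms, and experience replay is omitted. Real implementations typically rely on a deep learning library. All function names are hypothetical.

```python
import math
import random

def init_net(n_in, n_hidden, n_out, rng):
    """Weights for a one-hidden-layer network (a stand-in for a deep one)."""
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
    return w1, w2

def forward(net, x):
    """Return (hidden activations, Q-values for every action)."""
    w1, w2 = net
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w1]
    q = [sum(w * hi for w, hi in zip(row, h)) for row in w2]
    return h, q

def td_update(net, x, action, reward, next_x, done, gamma=0.9, lr=0.05):
    """One gradient step on 0.5 * (target - Q(x, action))**2; returns the TD error."""
    w1, w2 = net
    h, q = forward(net, x)
    # The target is treated as a constant (no gradient flows through it).
    target = reward if done else reward + gamma * max(forward(net, next_x)[1])
    error = target - q[action]
    for j, hj in enumerate(h):
        # Back-propagate through the chosen output unit and the tanh layer.
        grad_h = error * w2[action][j] * (1.0 - hj * hj)
        w2[action][j] += lr * error * hj
        for i, xi in enumerate(x):
            w1[j][i] += lr * grad_h * xi
    return error

rng = random.Random(0)
net = init_net(n_in=4, n_hidden=8, n_out=2, rng=rng)
state, next_state = [1, 0, 0, 0], [0, 1, 0, 0]   # one-hot encoded states
before = abs(td_update(net, state, action=1, reward=1.0, next_x=next_state, done=True))
after = abs(1.0 - forward(net, state)[1][1])
print(before, after)
```

The network maps a state to one Q-value per action, so its size depends on the number of weights rather than on the number of states. Each update nudges the predicted Q-value toward the same target that the tabular algorithm uses.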
How to get started?
API for reinforcement learning
Gymnasium: a Python library with a standard API for single-agent reinforcement learning
Provides a collection of different environments
https://gymnasium.farama.org
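The interaction loop that Gymnasium standardizes can be sketched without installing anything. The `CorridorEnv` class below is a hypothetical toy environment written only to mirror Gymnasium's convention: `reset` returns `(observation, info)` and `step` returns `(observation, reward, terminated, truncated, info)`.

```python
import random

class CorridorEnv:
    """Hypothetical toy environment that follows Gymnasium's reset/step convention."""

    def __init__(self, length=5):
        self.length = length
        self.pos = 0

    def reset(self, seed=None):
        self.pos = 0
        return self.pos, {}                      # (observation, info)

    def step(self, action):
        # Action 1 moves right, any other action moves left (not below cell 0).
        self.pos = max(0, self.pos + (1 if action == 1 else -1))
        terminated = self.pos == self.length     # goal reached
        reward = 1.0 if terminated else 0.0
        # observation, reward, terminated, truncated, info
        return self.pos, reward, terminated, False, {}

def run_episode(env, choose_action, max_steps=1000):
    """Run one episode and return the total reward collected."""
    obs, info = env.reset(seed=0)
    total = 0.0
    for _ in range(max_steps):
        obs, reward, terminated, truncated, info = env.step(choose_action(obs))
        total += reward
        if terminated or truncated:
            break
    return total

rng = random.Random(0)
print(run_episode(CorridorEnv(), lambda obs: rng.choice([0, 1])))
```

With Gymnasium installed, an environment created with `gymnasium.make("CartPole-v1")` exposes the same `reset` and `step` methods, so it could be passed to `run_episode` in place of the toy class.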
Key points
Reinforcement learning is a branch of machine learning where an
agent learns about its environment using the mechanism of rewards and
punishments.
RL doesn’t rely on a labeled data set.
RL learns by trial and error through interacting with its environment, so it
can reach conclusions / knowledge that humans haven’t reached.
@MarkoLohert

Reinforcement Learning – a Rewards Based Approach to Machine Learning - Marko Lohert - DORS CLUC 2024


Editor's Notes

  • #31 RL achieves excellent results when applied to complex problems
  • #35 https://youtu.be/Lu56xVlZ40M?si=DtUTUBi8-hpdFzhQ