Agenda
- What is reinforcement learning?
- Where is RL used?
- What are the advantages of RL?
- What algorithms are used in RL?
- How to get started?
What is reinforcement learning?
[Figure: action-reward feedback loop of a generic RL model; labels: Reward, Action]
Reinforcement learning is a branch of machine learning that relies on learning through the mechanism of rewards and punishments.
Policy
How does the agent decide which action to take?
The policy gives the probability that the agent takes action At when it is in state St
Policy: π(a|s)
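A policy π(a|s) can be pictured as a table of action probabilities for each state. The sketch below is only an illustration: the two states, the two actions, and the probabilities are hypothetical and do not come from the talk.

```python
import random

# Hypothetical stochastic policy: for each state, a probability for each action.
# The state names, action names, and numbers are invented for illustration.
policy = {
    "low_battery": {"recharge": 0.9, "search": 0.1},
    "high_battery": {"recharge": 0.1, "search": 0.9},
}


def sample_action(policy, state, rng):
    """Draw one action a with probability pi(a|s) for the given state s."""
    actions = list(policy[state])
    weights = [policy[state][a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so the run is repeatable
action = sample_action(policy, "low_battery", rng)
```

Each row of the table sums to 1, so every state has a proper probability distribution over actions.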
Goal == maximize total reward
Learning is done in discrete steps
Rk == reward in step k
Gt (return) == total reward in the future
𝜸 == discount factor
- Determines how much less important a reward in the distant future is than a reward in the near future
The number of steps can be fixed (T) or infinite (∞)
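The formula shown on this slide is not recoverable from the text. A common textbook form of the return, assumed here, is Gt = R(t+1) + 𝜸·R(t+2) + 𝜸²·R(t+3) + …, with 0 ≤ 𝜸 ≤ 1. A minimal sketch of that sum for a finite list of future rewards:

```python
def discounted_return(rewards, gamma):
    """Sum future rewards, weighting the reward k steps ahead by gamma**k.

    `rewards` lists the rewards received after step t, nearest first.
    This follows the common textbook definition of the return, which is an
    assumption here rather than the formula from the slide.
    """
    g = 0.0
    # Work backwards so each earlier reward adds gamma times everything after it.
    for reward in reversed(rewards):
        g = reward + gamma * g
    return g


# Three rewards of 1.0 with gamma = 0.5: 1 + 0.5 + 0.25
example = discounted_return([1.0, 1.0, 1.0], 0.5)  # 1.75
```

With 𝜸 close to 0 the agent cares almost only about the next reward; with 𝜸 close to 1 distant rewards count nearly as much as immediate ones.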
Reinforcement learning in the world of AI
Artificial Intelligence
- Machine Learning
  - Supervised learning
  - Unsupervised learning
  - Reinforcement learning
- …
- …
Reinforcement learning in the world of ML
Supervised learning vs reinforcement learning
- Supervised learning relies on a labeled data set
Unsupervised learning vs reinforcement learning
- Unsupervised learning == training based on unlabeled data == finding patterns in data
- Reinforcement learning == learning through the mechanism of rewards and punishments
Where is RL used?
Robotics
RL is used for building robust robots
- Industrial robots for more complex applications
- Sophisticated grasping strategies, object manipulation techniques, and enhanced hand-eye coordination
RL can be used to teach a robot to walk on two or four legs
- https://www.freethink.com/hard-tech/robot-legs
- https://bostondynamics.com/blog/starting-on-the-right-foot-with-reinforcement-learning
- https://youtu.be/goxCjGPQH7U
Gaming
- RL can be used for testing games
- RL can perform many iterations without human input
Reinforcement learning and Atari games
Deep Q-learning was used to teach an AI system to play Atari 2600 games
- The AI system was given no domain knowledge about how to play the games (the rules)
- The system sees only the pixels and was instructed to maximize the score
- Implemented for many Atari 2600 games: Pong, Breakout …
In 2013, DeepMind published "Playing Atari with Deep Reinforcement Learning" (Mnih et al.): https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf
Reinforcement learning and Atari games (continued)
Game: Breakout
After 240 minutes of training, the RL system had learned the best strategy:
create a tunnel and send the ball above the blocks
-> The ball bounces between the roof and the blocks
"The implications go far beyond my beloved chessboard... Not only do these self-taught expert machines perform incredibly well, but we can actually learn from the new knowledge they produce."
Garry Kasparov, former world chess champion
AlphaGo
Presented in 2015 by Google DeepMind (https://deepmind.google)
The first program to win a match against a world champion in Go
- Go is a Chinese strategy board game
- A bigger challenge for computers than chess
AlphaZero
2017: AlphaZero == a single AI system that is an expert in:
- Go
- Chess
- Shogi (Japanese chess)
https://deepmind.google/discover/blog/alphazero-shedding-new-light-on-chess-shogi-and-go
Healthcare
Reinforcement learning is applied to:
- Development of new drugs
- Diagnostics
- Dynamic treatment regimes (DTRs)
- Surgery
- …
Trading and Finance
Reinforcement learning can achieve better results than supervised learning when applied to trading and finance
IBM has developed a sophisticated RL-based platform that is able to make financial trades
Autonomous driving
RL can be used for:
Trajectory optimization
Avoiding collision
Lane changing
Automatic parking
…
More info: https://wayve.ai | https://youtu.be/eRwTbRtnT1I
And other areas …
Data center cooling (Google reported reducing the energy used for cooling by up to 40%)
News recommendation
Marketing
…
Advantages of Reinforcement Learning
✅ RL can solve complex problems that cannot be solved using other methods
✅ It functions in dynamic environments
✅ RL does not need a separate data-preparation step (a difference between RL and supervised learning)
✅ It can be used when the only way to collect data from an environment is for an agent to interact with that environment
…
Disadvantages of Reinforcement Learning
⚠ Sparse-reward environments: the agent receives a reward only when the goal is reached
- It is harder to know which steps were actually useful
- Popular solution == reward shaping -> adding hand-crafted rewards to help RL
- Hand-crafted rewards require a human expert to design them correctly, and humans can be biased
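Reward shaping can be sketched as an extra term added to the sparse reward. The example below uses potential-based shaping, one well-known variant that is not necessarily the one the talk had in mind. The 1-D world, the goal position, and the distance-based potential are all hypothetical.

```python
GOAL = 10  # hypothetical goal position on a 1-D line


def sparse_reward(next_state):
    """Original sparse signal: reward only when the goal is reached."""
    return 1.0 if next_state == GOAL else 0.0


def potential(state):
    """Hand-crafted guess of how promising a state is (closer to goal = higher)."""
    return -abs(GOAL - state)


def shaped_reward(state, next_state, gamma=0.99):
    """Sparse reward plus the potential-based bonus gamma*phi(s') - phi(s)."""
    return sparse_reward(next_state) + gamma * potential(next_state) - potential(state)


toward = shaped_reward(3, 4)  # step toward the goal: positive bonus
away = shaped_reward(3, 2)    # step away from the goal: negative bonus
```

The sparse reward alone says nothing about either step, while the shaped reward already distinguishes a step toward the goal from a step away from it. The quality of that hint depends entirely on the hand-written `potential` function, which is where the human bias mentioned above can enter.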
Disadvantages of Reinforcement Learning (continued)
⚠ RL is data hungry: it needs to collect a lot of data from the environment, and it needs a lot of computation
- Not a problem when RL is applied to gaming, because the agent can play the same game many times and collect a lot of data
⚠ It can be expensive to learn by trying (and failing)
- For example, in robotics, where robots are expensive and can get damaged while learning
Solution to the disadvantages - general advice
Combine RL with other techniques
For example:
RL + Deep Learning
RL Algorithms
Source: https://spinningup.openai.com/en/latest/spinningup/rl_intro2.html
Q-Learning Algorithm
Most famous RL algorithm
"Q" in "Q-Learning" stands for quality
Example (Python): https://www.datacamp.com/tutorial/introduction-q-learning-beginner-tutorial
Q-Table
Source: www.analyticsvidhya.com/blog/2019/04/introduction-deep-q-learning-python
Q-Learning Algorithm
Source: https://www.cse.unsw.edu.au/~cs9417ml/RL1/algorithms.html
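The Q-table and the Q-learning update can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code from the linked tutorials: the five-cell corridor environment, the hyperparameters, and the function names are hypothetical. The update rule is the standard one, Q(s,a) ← Q(s,a) + α·(r + 𝜸·max Q(s',·) − Q(s,a)).

```python
import random

N_STATES = 5      # hypothetical corridor: cells 0..4, cell 4 is the goal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Move one cell; the reward is 1.0 only when the goal cell is reached."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # the Q-table: one row per state
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: explore sometimes, otherwise take a best-known action.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Target: immediate reward plus discounted best value of the next state.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train()
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
```

After training, the greedy policy read from the table moves right in every non-goal cell, which is the shortest path to the reward.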
Deep Q-Learning Algorithm
Deep neural network instead of a "simple" Q-table
Used in case of large environments
Example (Python): https://www.analyticsvidhya.com/blog/2019/04/introduction-deep-q-learning-python
Source: www.analyticsvidhya.com/blog/2019/04/introduction-deep-q-learning-python
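The idea of replacing the Q-table with a network can be sketched without any deep learning library. The toy below is an assumption-laden illustration: a hypothetical one-hidden-layer network in pure Python maps a state vector to one Q-value per action and is nudged toward the same kind of target used in tabular Q-learning. Practical deep Q-learning implementations typically use a deep network in a framework and add pieces such as experience replay, which are omitted here.

```python
import math
import random


def td_target(reward, next_q_values, done, gamma=0.99):
    """Same target as tabular Q-learning: r + gamma * max_a' Q(s', a')."""
    return reward if done else reward + gamma * max(next_q_values)


class TinyQNetwork:
    """Hypothetical stand-in for a deep network: state vector -> Q-value per action."""

    def __init__(self, n_inputs, n_hidden, n_actions, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
                   for _ in range(n_actions)]

    def forward(self, state):
        hidden = [math.tanh(sum(w * x for w, x in zip(row, state))) for row in self.w1]
        q_values = [sum(w * h for w, h in zip(row, hidden)) for row in self.w2]
        return hidden, q_values

    def update(self, state, action, target, lr=0.05):
        """One gradient step moving Q(state, action) toward the target."""
        hidden, q_values = self.forward(state)
        error = target - q_values[action]
        for j, h in enumerate(hidden):
            # Backpropagate through tanh before changing the output weight.
            grad_hidden = self.w2[action][j] * (1.0 - h * h)
            for i, x in enumerate(state):
                self.w1[j][i] += lr * error * grad_hidden * x
            self.w2[action][j] += lr * error * h
        return error


net = TinyQNetwork(n_inputs=2, n_hidden=8, n_actions=2)
target = td_target(reward=1.0, next_q_values=[0.0, 0.0], done=True)
first_error = abs(net.update([1.0, 0.0], action=0, target=target))
for _ in range(200):
    last_error = abs(net.update([1.0, 0.0], action=0, target=target))
```

Unlike a table, the network shares weights across states, which is what makes the approach usable when the number of states is too large to enumerate.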
How to get started?
Gymnasium: an API for reinforcement learning
- Python library
- Single-agent
- Many different environments
https://gymnasium.farama.org
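Gymnasium environments expose `reset()`, which returns `(observation, info)`, and `step(action)`, which returns `(observation, reward, terminated, truncated, info)`. So that the sketch runs without installing anything, it uses a hypothetical stand-in class, `CoinFlipEnv`, that only mimics the shape of those two methods. It is not part of Gymnasium.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment with Gymnasium-shaped reset() and step().

    The observation is the current coin face (0 or 1); matching it earns 1.0.
    """

    def __init__(self, max_steps=10):
        self.max_steps = max_steps

    def reset(self, seed=None):
        self._rng = random.Random(seed)
        self._steps = 0
        self._coin = self._rng.randint(0, 1)
        return self._coin, {}

    def step(self, action):
        reward = 1.0 if action == self._coin else 0.0
        self._steps += 1
        self._coin = self._rng.randint(0, 1)
        terminated = False                         # this toy task has no goal state
        truncated = self._steps >= self.max_steps  # stop after a fixed step budget
        return self._coin, reward, terminated, truncated, {}


def run_episode(env, choose_action, seed=0):
    """Generic agent-environment loop: observe, act, collect reward, repeat."""
    observation, info = env.reset(seed=seed)
    total_reward, done = 0.0, False
    while not done:
        action = choose_action(observation)
        observation, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        done = terminated or truncated
    return total_reward


# An agent that simply copies the observed coin face collects every reward.
score = run_episode(CoinFlipEnv(), lambda observation: observation)
```

With Gymnasium installed, an environment created with `gymnasium.make("CartPole-v1")` should fit the same loop, given a `choose_action` that returns a valid action for that environment.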
Key points
- Reinforcement learning is a branch of machine learning where an agent learns about its environment using the mechanism of rewards and punishments.
- RL doesn’t rely on a labeled data set.
- RL learns by trial and error through interacting with its environment, so it can reach conclusions / knowledge that humans didn’t reach.
@MarkoLohert

Reinforcement Learning – a Rewards Based Approach to Machine Learning - Marko Lohert - DORS CLUC 2024


Editor's Notes

  1. RL achieves excellent results when applied to complex problems
  2. https://youtu.be/Lu56xVlZ40M?si=DtUTUBi8-hpdFzhQ