Reinforcement Learning
Introduction
• The art of the optimal decision-making process
• The training of machine learning models to make a
sequence of decisions
• An RL agent is able to perceive and interpret its
environment, take actions, and learn through trial and
error.
• Human involvement is limited to changing
the environment
[Diagram: an Actor interacting with a System through Actions / Instructions]
Main points in Reinforcement learning
• Input: The input is an initial state from which
the model will start.
• Output: There are many possible outputs, as there is a
variety of solutions to a particular problem.
• Training: The training is based on the input; the
model returns a state, and the user decides to
reward or punish the model based on its output.
• The model continues to learn.
• The best solution is decided based on the maximum
reward.
Example: We have an agent and a reward, with many
hurdles in between. The agent is supposed to find the
best possible path to reach the reward.
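The trial-and-error loop described above can be sketched in a few lines of Python. The 1-D world, the hurdle position, and the reward values (+10 for the goal, -5 for a hurdle, -1 per move) are all invented for illustration:

```python
import random

GOAL, HURDLE = 4, 2                 # positions on a line 0..4 (illustrative)

def step(state, action):
    """Environment: apply a move and return (next_state, reward)."""
    nxt = max(0, min(GOAL, state + action))
    if nxt == GOAL:
        return nxt, +10             # reached the reward
    if nxt == HURDLE:
        return nxt, -5              # punished for hitting a hurdle
    return nxt, -1                  # small cost per move

random.seed(1)
state, total = 0, 0
for _ in range(20):                 # one episode of trial and error
    action = random.choice([-1, +1])
    state, reward = step(state, action)
    total += reward
    if state == GOAL:
        break
print(state, total)
```

A real agent would also update its behaviour from the rewards; here the agent acts randomly just to show the perceive/act/reward cycle.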
Difference between Reinforcement learning and Supervised learning:
Reinforcement learning
• Reinforcement learning is all
about making decisions
sequentially.
• In reinforcement learning,
decisions are dependent, so labels
are given to sequences of
dependent decisions.
• Example: Chess game
Supervised learning
• In supervised learning, the
decision is made on the initial
input, or the input given at the start.
• In supervised learning, decisions
are independent of each other, so
labels are given to each decision.
• Example: Object recognition
Applications
• Self-Driving Cars
• Industry Automation
• Trading and Finance
• Natural Language Processing
• Healthcare
• Engineering
• News Recommendation
Reinforcement Learning Algorithms
Value-Based:
• In a value-based reinforcement learning method, you try to
maximize a value function V(s). In this method, the agent expects a
long-term return from the current state under policy π.
Policy-based:
• You try to come up with a policy such that the action performed in every
state helps you gain maximum reward in the future.
Two types of policy-based methods are:
Deterministic: For any state, the same action is produced by the policy π.
Stochastic: Every action has a certain probability, given by the
stochastic policy π(a|s) = P[A_t = a | S_t = s].
Model-Based:
• In this reinforcement learning method, you create a virtual model
of each environment, and the agent learns to perform in that specific
environment.
Reinforcement Learning Algorithms
Q-Learning, SARSA, DQN and A3C
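The stochastic policy π(a|s) = P[A_t = a | S_t = s] above can be illustrated by turning action preferences into probabilities with a softmax; the state, actions, and preference numbers here are made up:

```python
import math
import random

# Action preferences for one state s (illustrative values).
prefs = {"left": 1.0, "right": 2.0, "stay": 0.5}

def softmax_policy(prefs):
    """Convert preferences into a probability distribution π(·|s)."""
    z = sum(math.exp(v) for v in prefs.values())
    return {a: math.exp(v) / z for a, v in prefs.items()}

pi = softmax_policy(prefs)
assert abs(sum(pi.values()) - 1.0) < 1e-9      # probabilities sum to 1

# Sample A_t ~ π(·|s): the most preferred action is chosen most often.
action = random.choices(list(pi), weights=list(pi.values()))[0]
```

Higher-preference actions get higher probability, but every action keeps a nonzero chance of being sampled, which is the defining trait of a stochastic policy.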
Types of Reinforcement Learning
1. Positive
Positive reinforcement occurs when an event, triggered by a particular
behaviour, increases the strength and the frequency of that behaviour. In other
words, it has a positive effect on behaviour.
Advantages:
– Maximizes performance
– Sustains change for a long period of time
Disadvantages:
– Too much reinforcement can lead to an overload of states, which can diminish the
results
2. Negative
Negative reinforcement is the strengthening of a behaviour because a
negative condition is stopped or avoided.
Advantages:
– Increases behaviour
– Encourages meeting a minimum standard of performance
Disadvantages:
– It only provides enough to meet the minimum behaviour
Learning Models of Reinforcement
Markov Decision Process
The following parameters are used to get a solution:
• Set of actions: A
• Set of states: S
• Reward: R
• Policy: π
• Value: V
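These parameters can be sketched with a toy two-state MDP (the states, rewards, and transitions below are invented) by iterating the fixed-policy value update V(s) = R(s, π(s)) + γ·V(s'):

```python
S = ["cold", "warm"]                                       # set of states S
A = ["heat", "wait"]                                       # set of actions A
policy = {"cold": "heat", "warm": "wait"}                  # policy π
R = {("cold", "heat"): -1, ("warm", "wait"): +1}           # reward R
T = {("cold", "heat"): "warm", ("warm", "wait"): "warm"}   # deterministic transitions
gamma = 0.9                                                # discount factor

V = {s: 0.0 for s in S}                                    # value V, initialised to 0
for _ in range(100):                                       # iterate to convergence
    V = {s: R[(s, policy[s])] + gamma * V[T[(s, policy[s])]] for s in S}

print(V)   # V["warm"] ≈ 1 / (1 - 0.9) = 10, V["cold"] ≈ -1 + 0.9 * 10 = 8
```

"warm" pays +1 forever under this policy, so its value is the geometric sum 1/(1-γ); "cold" pays -1 once and then inherits the discounted value of "warm".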
Q-Learning
Q-learning is a value-based method of supplying information to inform which action
an agent should take.
Let’s understand this method with the following example:
• There are five rooms in a building, connected by doors.
• Each room is numbered 0 to 4.
• The outside of the building is one big area, numbered 5.
• Doors in rooms 1 and 4 lead into the building from area 5.
• Next, you need to associate a reward value with each door:
• Doors that lead directly to the goal have a reward of 100.
• Doors that are not directly connected to the target room give zero reward.
• Because doors are two-way, two arrows are assigned for each room.
• Every arrow in the diagram carries an instant reward value.
Q-Learning
• Explanation:
• In the diagram, each room represents a state.
• The agent’s movement from one room to another represents an action.
• A state is drawn as a node, while the arrows show the actions.
For example, an agent traverses from room number 2 to 5:
• Initial state = state 2
• State 2 -> state 3
• State 3 -> state (2, 1, 4)
• State 4 -> state (0, 5, 3)
• State 1 -> state (5, 3)
• State 0 -> state 4
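The five-room example can be run end-to-end with the standard Q-learning update Q(s, a) = R(s, a) + γ·max Q(s', ·). The reward matrix encodes the doors listed above (-1 marks "no door"); the episode count and γ = 0.8 are conventional choices for this tutorial example:

```python
import random

# Reward matrix for the 5-room example (state 5 = outside, the goal).
# R[s][a] = -1 means "no door", 0 = a door, 100 = a door into the goal.
R = [
    [-1, -1, -1, -1,  0, -1],    # room 0 connects to 4
    [-1, -1, -1,  0, -1, 100],   # room 1 connects to 3 and outside (5)
    [-1, -1, -1,  0, -1, -1],    # room 2 connects to 3
    [-1,  0,  0, -1,  0, -1],    # room 3 connects to 1, 2, 4
    [ 0, -1, -1,  0, -1, 100],   # room 4 connects to 0, 3 and outside (5)
    [-1,  0, -1, -1,  0, 100],   # outside connects to 1, 4 and itself
]

GAMMA, GOAL = 0.8, 5
Q = [[0.0] * 6 for _ in range(6)]

random.seed(0)
for _ in range(500):                      # training episodes
    s = random.randrange(6)               # random start room
    while s != GOAL:
        a = random.choice([x for x in range(6) if R[s][x] >= 0])  # any open door
        Q[s][a] = R[s][a] + GAMMA * max(Q[a])                     # Q-learning update
        s = a                             # the chosen door leads to room a

def best_path(start):
    """Follow the greedy policy (highest Q through an open door) to the goal."""
    path, s = [start], start
    while s != GOAL:
        s = max(range(6), key=lambda a: Q[s][a] if R[s][a] >= 0 else float("-inf"))
        path.append(s)
    return path

print(best_path(2))   # a shortest route out, e.g. [2, 3, 1, 5]
```

After training, Q[s][5] is 100 for the rooms with a door to the outside, and discounted copies of that value propagate back through the other rooms, so the greedy walk from room 2 reaches the goal in three moves.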
