SlideShare a Scribd company logo
1 of 10
Download to read offline
Introduction
Features
Types of Reinforcement learning
Example
Reinforcement Learning
OMega TechEd
Introduction
2
Reinforcement Learning is a feedback-based Machine learning technique in
which an agent learns to behave in an environment by performing the
actions and seeing the results of actions. For each good action, the agent
gets positive feedback, and for each bad action, the agent gets negative
feedback or penalty.
Definition: "Reinforcement learning is a type of machine learning
method where an intelligent agent (computer program) interacts with the
environment and learns to act within that."
OMega TechEd
Features
 In RL, the agent is not instructed about the environment and what actions
need to be taken.
 It is based on the hit and trial process.
 The agent takes the next action and changes states according to the
feedback of the previous action.
 The agent may get a delayed reward.
 The environment is stochastic, and the agent needs to explore it to reach to
get the maximum positive rewards.
3
OMega TechEd
Types of Reinforcement learning
4
Positive Reinforcement
• The positive reinforcement learning means adding something to increase
the tendency that expected behavior would occur again. It impacts
positively on the behavior of the agent and increases the strength of the
behavior.
Negative Reinforcement
• The negative reinforcement learning is opposite to the positive
reinforcement as it increases the tendency that the specific behavior will
occur again by avoiding the negative condition.
OMega TechEd
Reinforcement learning vs
Supervised learning
Reinforcement Learning Supervised Learning
RL works by interacting with the
environment.
Supervised learning works on the existing
dataset.
The RL algorithm works like the human
brain works when making some decisions.
Supervised Learning works as when a
human learns things in the supervision of a
guide.
There is no labeled dataset is present The labeled dataset is present.
No previous training is provided to the
learning agent.
Training is provided to the algorithm so
that it can predict the output.
RL helps to take decisions sequentially. In Supervised learning, decisions are made
when input is given.
5
OMega TechEd
Example
6
 Suppose there is an AI agent present within a maze environment, and his
goal is to find the diamond. The agent interacts with the environment by
performing some actions, and based on those actions, the state of the
agent gets changed, and it also receives a reward or penalty as feedback.
 The agent continues doing these three things (take action, change
state/remain in the same state, and get feedback), and by doing these
actions, he learns and explores the environment.
 The agent learns that what actions lead to positive feedback or rewards
and what actions lead to negative feedback penalty. As a positive reward,
the agent gets a positive point, and as a penalty, it gets a negative point.
OMega TechEd
Example
7
• The agent is at the very first block of the maze. The
maze is consisting of an S6 block, which is a wall,
S8 a fire pit, and S4 a diamond block.
• The agent cannot cross the S6 block, as it is a solid
wall. If the agent reaches the S4 block, then get
the +1 reward; if it reaches the fire pit, then gets -1
reward point.
• It can take four actions: move up, move down,
move left, and move right.
• The agent can take any path to reach to the final
point, but he needs to make it in possible fewer
steps. Suppose the agent considers the path S9-S5-
S1-S2-S3, so he will get the +1-reward point.
OMega TechEd
Continue…
The agent will try to remember the preceding steps that it has taken to reach
the final step. To memorize the steps, it assigns 1 value to each previous step.
8
OMega TechEd
Continue…
9
The agent has successfully stored the previous steps
assigning the 1 value to each previous block. But
what will the agent do if he starts moving from the
block, which has 1 value block on both sides?
It will be a difficult condition for the agent whether
he should go up or down as each block has the same
value. So, the above approach is not suitable for the
agent to reach the destination. Hence to solve the
problem, we will use the Bellman equation, which
is the main concept behind reinforcement learning.
OMega TechEd
Thank you
Reference:
Artificial Intelligence: A Modern Approach, 3rd ed.
Stuart Russell and Peter Norvig
https://www.javatpoint.com/reinforcement-learning
Join Telegram channel for AI notes. Link is in the description.
OMega TechEd

More Related Content

Similar to Reinforcement Learning Guide

reiniforcement learning.ppt
reiniforcement learning.pptreiniforcement learning.ppt
reiniforcement learning.pptcharusharma165
 
Rl chapter 1 introduction
Rl chapter 1 introductionRl chapter 1 introduction
Rl chapter 1 introductionConnorShorten2
 
RL_online _presentation_1.ppt
RL_online _presentation_1.pptRL_online _presentation_1.ppt
RL_online _presentation_1.pptssuser43a599
 
leewayhertz.com-Reinforcement Learning from Human Feedback RLHF.pdf
leewayhertz.com-Reinforcement Learning from Human Feedback RLHF.pdfleewayhertz.com-Reinforcement Learning from Human Feedback RLHF.pdf
leewayhertz.com-Reinforcement Learning from Human Feedback RLHF.pdfKristiLBurns
 
Reinforcement learning
Reinforcement learningReinforcement learning
Reinforcement learningshivani saluja
 
An efficient use of temporal difference technique in Computer Game Learning
An efficient use of temporal difference technique in Computer Game LearningAn efficient use of temporal difference technique in Computer Game Learning
An efficient use of temporal difference technique in Computer Game LearningPrabhu Kumar
 
23 forms of learning-N.doc
23 forms of learning-N.doc23 forms of learning-N.doc
23 forms of learning-N.docjdinfo444
 
Reinforcement Learning.ppt
Reinforcement Learning.pptReinforcement Learning.ppt
Reinforcement Learning.pptPOOJASHREEC1
 
Reinforcement learning
Reinforcement  learningReinforcement  learning
Reinforcement learningSKS
 
Reinforcement learning
Reinforcement learning Reinforcement learning
Reinforcement learning Chandra Meena
 
reinforcement-learning-141009013546-conversion-gate02.pdf
reinforcement-learning-141009013546-conversion-gate02.pdfreinforcement-learning-141009013546-conversion-gate02.pdf
reinforcement-learning-141009013546-conversion-gate02.pdfVaishnavGhadge1
 
5 learning edited 2012.ppt
5 learning edited 2012.ppt5 learning edited 2012.ppt
5 learning edited 2012.pptHenokGetachew15
 
Goal based agent
Goal based agentGoal based agent
Goal based agentLamisaFaria
 
Introduction to Reinforcement Learning.pptx
Introduction to Reinforcement Learning.pptxIntroduction to Reinforcement Learning.pptx
Introduction to Reinforcement Learning.pptxHarsha Patel
 
Reinforcement Learning Guide For Beginners
Reinforcement Learning Guide For BeginnersReinforcement Learning Guide For Beginners
Reinforcement Learning Guide For Beginnersgokulprasath06
 

Similar to Reinforcement Learning Guide (20)

RL.ppt
RL.pptRL.ppt
RL.ppt
 
reiniforcement learning.ppt
reiniforcement learning.pptreiniforcement learning.ppt
reiniforcement learning.ppt
 
Organizational behaviour
Organizational behaviourOrganizational behaviour
Organizational behaviour
 
Rl chapter 1 introduction
Rl chapter 1 introductionRl chapter 1 introduction
Rl chapter 1 introduction
 
YijueRL.ppt
YijueRL.pptYijueRL.ppt
YijueRL.ppt
 
RL_online _presentation_1.ppt
RL_online _presentation_1.pptRL_online _presentation_1.ppt
RL_online _presentation_1.ppt
 
Q_Learning.ppt
Q_Learning.pptQ_Learning.ppt
Q_Learning.ppt
 
leewayhertz.com-Reinforcement Learning from Human Feedback RLHF.pdf
leewayhertz.com-Reinforcement Learning from Human Feedback RLHF.pdfleewayhertz.com-Reinforcement Learning from Human Feedback RLHF.pdf
leewayhertz.com-Reinforcement Learning from Human Feedback RLHF.pdf
 
Reinforcement learning
Reinforcement learningReinforcement learning
Reinforcement learning
 
Intelligent agents part ii
Intelligent agents part iiIntelligent agents part ii
Intelligent agents part ii
 
An efficient use of temporal difference technique in Computer Game Learning
An efficient use of temporal difference technique in Computer Game LearningAn efficient use of temporal difference technique in Computer Game Learning
An efficient use of temporal difference technique in Computer Game Learning
 
23 forms of learning-N.doc
23 forms of learning-N.doc23 forms of learning-N.doc
23 forms of learning-N.doc
 
Reinforcement Learning.ppt
Reinforcement Learning.pptReinforcement Learning.ppt
Reinforcement Learning.ppt
 
Reinforcement learning
Reinforcement  learningReinforcement  learning
Reinforcement learning
 
Reinforcement learning
Reinforcement learning Reinforcement learning
Reinforcement learning
 
reinforcement-learning-141009013546-conversion-gate02.pdf
reinforcement-learning-141009013546-conversion-gate02.pdfreinforcement-learning-141009013546-conversion-gate02.pdf
reinforcement-learning-141009013546-conversion-gate02.pdf
 
5 learning edited 2012.ppt
5 learning edited 2012.ppt5 learning edited 2012.ppt
5 learning edited 2012.ppt
 
Goal based agent
Goal based agentGoal based agent
Goal based agent
 
Introduction to Reinforcement Learning.pptx
Introduction to Reinforcement Learning.pptxIntroduction to Reinforcement Learning.pptx
Introduction to Reinforcement Learning.pptx
 
Reinforcement Learning Guide For Beginners
Reinforcement Learning Guide For BeginnersReinforcement Learning Guide For Beginners
Reinforcement Learning Guide For Beginners
 

More from Megha Sharma

Association Rule mining
Association Rule miningAssociation Rule mining
Association Rule miningMegha Sharma
 
Bellman's equation Reinforcement learning - II
Bellman's equation Reinforcement learning - IIBellman's equation Reinforcement learning - II
Bellman's equation Reinforcement learning - IIMegha Sharma
 
Entropy and information gain in decision tree.
Entropy and information gain in decision tree.Entropy and information gain in decision tree.
Entropy and information gain in decision tree.Megha Sharma
 
Types of Machine Learning. & Decision Tree.
Types of Machine Learning. & Decision Tree.Types of Machine Learning. & Decision Tree.
Types of Machine Learning. & Decision Tree.Megha Sharma
 
If statements in C
If statements in CIf statements in C
If statements in CMegha Sharma
 
Conditional and special operators
Conditional and special operatorsConditional and special operators
Conditional and special operatorsMegha Sharma
 
Assignment operators
Assignment operatorsAssignment operators
Assignment operatorsMegha Sharma
 
Relational and logical operators
Relational and logical operatorsRelational and logical operators
Relational and logical operatorsMegha Sharma
 
Arithmetic and increment decrement Operator
Arithmetic and increment decrement OperatorArithmetic and increment decrement Operator
Arithmetic and increment decrement OperatorMegha Sharma
 
Structure of C program
Structure of C programStructure of C program
Structure of C programMegha Sharma
 
Algorithm & Flowchart
Algorithm & FlowchartAlgorithm & Flowchart
Algorithm & FlowchartMegha Sharma
 
C Programming introduction
C Programming introductionC Programming introduction
C Programming introductionMegha Sharma
 
Enhanced ER Models
Enhanced ER ModelsEnhanced ER Models
Enhanced ER ModelsMegha Sharma
 
Entity Relationship design issues
Entity Relationship design issuesEntity Relationship design issues
Entity Relationship design issuesMegha Sharma
 
Participation Constraints in ER diagram
Participation Constraints in ER diagramParticipation Constraints in ER diagram
Participation Constraints in ER diagramMegha Sharma
 
Mapping Cardinalities
Mapping CardinalitiesMapping Cardinalities
Mapping CardinalitiesMegha Sharma
 

More from Megha Sharma (20)

Ensemble learning
Ensemble learningEnsemble learning
Ensemble learning
 
Association Rule mining
Association Rule miningAssociation Rule mining
Association Rule mining
 
Bellman's equation Reinforcement learning - II
Bellman's equation Reinforcement learning - IIBellman's equation Reinforcement learning - II
Bellman's equation Reinforcement learning - II
 
E-M Algorithm
E-M AlgorithmE-M Algorithm
E-M Algorithm
 
Entropy and information gain in decision tree.
Entropy and information gain in decision tree.Entropy and information gain in decision tree.
Entropy and information gain in decision tree.
 
Types of Machine Learning. & Decision Tree.
Types of Machine Learning. & Decision Tree.Types of Machine Learning. & Decision Tree.
Types of Machine Learning. & Decision Tree.
 
If statements in C
If statements in CIf statements in C
If statements in C
 
Conditional and special operators
Conditional and special operatorsConditional and special operators
Conditional and special operators
 
Assignment operators
Assignment operatorsAssignment operators
Assignment operators
 
Bitwise operators
Bitwise operatorsBitwise operators
Bitwise operators
 
Relational and logical operators
Relational and logical operatorsRelational and logical operators
Relational and logical operators
 
Arithmetic and increment decrement Operator
Arithmetic and increment decrement OperatorArithmetic and increment decrement Operator
Arithmetic and increment decrement Operator
 
Structure of C program
Structure of C programStructure of C program
Structure of C program
 
C tokens
C tokensC tokens
C tokens
 
Algorithm & Flowchart
Algorithm & FlowchartAlgorithm & Flowchart
Algorithm & Flowchart
 
C Programming introduction
C Programming introductionC Programming introduction
C Programming introduction
 
Enhanced ER Models
Enhanced ER ModelsEnhanced ER Models
Enhanced ER Models
 
Entity Relationship design issues
Entity Relationship design issuesEntity Relationship design issues
Entity Relationship design issues
 
Participation Constraints in ER diagram
Participation Constraints in ER diagramParticipation Constraints in ER diagram
Participation Constraints in ER diagram
 
Mapping Cardinalities
Mapping CardinalitiesMapping Cardinalities
Mapping Cardinalities
 

Recently uploaded

AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 

Recently uploaded (20)

AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 

Reinforcement Learning Guide

  • 1. Introduction Features Types of Reinforcement learning Example Reinforcement Learning OMega TechEd
  • 2. Introduction 2 Reinforcement Learning is a feedback-based Machine learning technique in which an agent learns to behave in an environment by performing the actions and seeing the results of actions. For each good action, the agent gets positive feedback, and for each bad action, the agent gets negative feedback or penalty. Definition: "Reinforcement learning is a type of machine learning method where an intelligent agent (computer program) interacts with the environment and learns to act within that." OMega TechEd
  • 3. Features  In RL, the agent is not instructed about the environment and what actions need to be taken.  It is based on the hit and trial process.  The agent takes the next action and changes states according to the feedback of the previous action.  The agent may get a delayed reward.  The environment is stochastic, and the agent needs to explore it to reach to get the maximum positive rewards. 3 OMega TechEd
  • 4. Types of Reinforcement learning 4 Positive Reinforcement • The positive reinforcement learning means adding something to increase the tendency that expected behavior would occur again. It impacts positively on the behavior of the agent and increases the strength of the behavior. Negative Reinforcement • The negative reinforcement learning is opposite to the positive reinforcement as it increases the tendency that the specific behavior will occur again by avoiding the negative condition. OMega TechEd
  • 5. Reinforcement learning vs Supervised learning Reinforcement Learning Supervised Learning RL works by interacting with the environment. Supervised learning works on the existing dataset. The RL algorithm works like the human brain works when making some decisions. Supervised Learning works as when a human learns things in the supervision of a guide. There is no labeled dataset is present The labeled dataset is present. No previous training is provided to the learning agent. Training is provided to the algorithm so that it can predict the output. RL helps to take decisions sequentially. In Supervised learning, decisions are made when input is given. 5 OMega TechEd
  • 6. Example 6  Suppose there is an AI agent present within a maze environment, and his goal is to find the diamond. The agent interacts with the environment by performing some actions, and based on those actions, the state of the agent gets changed, and it also receives a reward or penalty as feedback.  The agent continues doing these three things (take action, change state/remain in the same state, and get feedback), and by doing these actions, he learns and explores the environment.  The agent learns that what actions lead to positive feedback or rewards and what actions lead to negative feedback penalty. As a positive reward, the agent gets a positive point, and as a penalty, it gets a negative point. OMega TechEd
  • 7. Example 7 • The agent is at the very first block of the maze. The maze is consisting of an S6 block, which is a wall, S8 a fire pit, and S4 a diamond block. • The agent cannot cross the S6 block, as it is a solid wall. If the agent reaches the S4 block, then get the +1 reward; if it reaches the fire pit, then gets -1 reward point. • It can take four actions: move up, move down, move left, and move right. • The agent can take any path to reach to the final point, but he needs to make it in possible fewer steps. Suppose the agent considers the path S9-S5- S1-S2-S3, so he will get the +1-reward point. OMega TechEd
  • 8. Continue… The agent will try to remember the preceding steps that it has taken to reach the final step. To memorize the steps, it assigns 1 value to each previous step. 8 OMega TechEd
  • 9. Continue… 9 The agent has successfully stored the previous steps assigning the 1 value to each previous block. But what will the agent do if he starts moving from the block, which has 1 value block on both sides? It will be a difficult condition for the agent whether he should go up or down as each block has the same value. So, the above approach is not suitable for the agent to reach the destination. Hence to solve the problem, we will use the Bellman equation, which is the main concept behind reinforcement learning. OMega TechEd
  • 10. Thank you Reference: Artificial Intelligence: A Modern Approach, 3rd ed. Stuart Russell and Peter Norvig https://www.javatpoint.com/reinforcement-learning Join Telegram channel for AI notes. Link is in the description. OMega TechEd