REINFORCEMENT LEARNING (RL)
SEMINAR REPORT
Submitted by
KALAISRIRAM.S
Reg.No:17TH1222
In partial fulfillment of the requirement for the degree of
BACHELOR OF TECHNOLOGY
in
INFORMATION TECHNOLOGY
of
PONDICHERRY UNIVERSITY
DEPARTMENT OF INFORMATION TECHNOLOGY
MANAKULA VINAYAGAR INSTITUTE OF TECHNOLOGY
KALITHEERTHALKUPPAM, PUDUCHERRY- 605 107.
MARCH 2021.
MANAKULA VINAYAGAR INSTITUTE OF
TECHNOLOGY
KALITHEERTHAL KUPPAM, PUDUCHERRY- 605 107
DEPARTMENT OF INFORMATION TECHNOLOGY
BONAFIDE CERTIFICATE
This is to certify that the seminar report titled ā€œREINFORCEMENT LEARNINGā€ is a bonafide record of work done by S.KALAISRIRAM (Reg. No: 17TH1222) of Seventh Semester B.TECH in INFORMATION TECHNOLOGY for the SEMINAR REPORT during the academic year 2020-21.
Staff in charge Head of the Department
Table of Contents
INTRODUCTION
PROCESS OF REINFORCEMENT LEARNING
THE BELLMAN EQUATION
TYPES OF REINFORCEMENT LEARNING
MARKOV DECISION PROCESS
   How to represent the agent state?
   1. Markov Decision Process
   2. Markov Property
   3. Markov Process
VARIOUS REINFORCEMENT LEARNING ALGORITHMS
COMPARISON OF REINFORCEMENT LEARNING ALGORITHMS
   1. Associative reinforcement learning
   2. Deep reinforcement learning
   3. Inverse reinforcement learning
   4. Safe reinforcement learning
LEARNING METHODS
   Difference between Reinforcement learning and Supervised learning
APPLICATIONS
   Reinforcement Learning Applications
PROS AND CONS
   Pros of Reinforcement Learning
   Cons of Reinforcement Learning
CONCLUSION
REFERENCES
CHAPTER 1
INTRODUCTION
1. What is Reinforcement learning?
ā€¢ Reinforcement Learning is a feedback-based machine learning technique in which an agent learns to behave in an environment by performing actions and seeing the results of those actions. For each good action the agent gets positive feedback, and for each bad action it gets negative feedback or a penalty.
ā€¢ In Reinforcement Learning, the agent learns automatically from feedback, without any labeled data, unlike supervised learning. Since there is no labeled data, the agent is bound to learn from its experience alone.
ā€¢ RL solves a specific type of problem where decision making is sequential and the goal is long-term, such as game playing, robotics, etc. The agent interacts with the environment and explores it by itself.
ā€¢ The primary goal of an agent in reinforcement learning is to improve its performance by collecting the maximum positive reward.
ā€¢ The agent learns through trial and error and, based on this experience, learns to perform the task in a better way.
ā€¢ Hence, we can say that "Reinforcement learning is a type of machine learning method where an intelligent agent (computer program) interacts with the environment and learns to act within it."
FIG 1
FIG 2
2. Terms used in Reinforcement Learning
ā€¢ Agent: An entity that can perceive/explore the environment and act upon it.
ā€¢ Environment: The situation the agent is present in or surrounded by. In RL, we assume a stochastic environment, which means it is random in nature.
ā€¢ Action: Actions are the moves taken by an agent within the environment.
ā€¢ State: The situation returned by the environment after each action taken by the agent.
ā€¢ Reward: Feedback returned to the agent from the environment, used to evaluate the agent's action.
ā€¢ Policy: A strategy applied by the agent to choose the next action based on the current state.
ā€¢ Value: The expected long-term return under the discount factor, as opposed to the short-term reward.
ā€¢ Q-value: Mostly similar to the value, but it takes the current action (a) as an additional parameter.
3. Key Features of Reinforcement Learning
o In RL, the agent is not instructed about the environment or which actions need to be taken.
o It is based on a trial-and-error process.
o The agent takes the next action and changes state according to the feedback from the previous action.
o The agent may get a delayed reward.
o The environment is stochastic, and the agent needs to explore it to obtain the maximum positive reward.
CHAPTER 2
PROCESS OF REINFORCEMENT LEARNING
1. Elements of Reinforcement Learning
There are four main elements of Reinforcement Learning, which are
given below:
1) Policy: A policy can be defined as the way an agent behaves at a given time. It maps the perceived states of the environment to the actions taken in those states. A policy is the core element of RL, as it alone can define the behavior of the agent. In some cases, it may be a simple function or a lookup table, whereas in other cases it may involve general computation such as a search process. It can be a deterministic or a stochastic policy:
For a deterministic policy: a = Ļ€(s)
For a stochastic policy: Ļ€(a | s) = P[At = a | St = s]
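As a concrete sketch, a deterministic policy can be stored as a plain state-to-action lookup table, while a stochastic policy stores a probability distribution over actions. The Python snippet below illustrates both; the states, actions, and probabilities are invented purely for illustration:

import random

# Toy state and action names, invented for illustration only.
# Deterministic policy: a = pi(s), a plain lookup table.
deterministic_pi = {"s1": "up", "s2": "down"}

# Stochastic policy: pi(a | s) = P[At = a | St = s].
stochastic_pi = {
    "s1": {"up": 0.9, "down": 0.1},
    "s2": {"up": 0.3, "down": 0.7},
}

def act_deterministic(s):
    return deterministic_pi[s]          # always the same action in s

def act_stochastic(s):
    dist = stochastic_pi[s]             # sample an action from pi(. | s)
    return random.choices(list(dist), weights=dist.values())[0]

print(act_deterministic("s1"))  # always "up"
print(act_stochastic("s1"))     # "up" about 90% of the time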
2) Reward Signal: The goal of reinforcement learning is defined by the reward signal. At each state, the environment sends an immediate signal to the learning agent, and this signal is known as the reward signal. These rewards are given according to the good and bad actions taken by the agent. The agent's main objective is to maximize the total reward for good actions. The reward signal can change the policy: for example, if an action selected by the agent leads to a low reward, the policy may change to select other actions in the future.
3) Value Function: The value function gives information about how good a situation and action are and how much reward an agent can expect. A reward indicates the immediate signal for each good and bad action, whereas a value function specifies how good a state and action are for the future. The value function depends on the reward since, without reward, there could be no value. The goal of estimating values is to achieve more rewards.
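The "expected long-term reward" that a value function estimates is the discounted return G = r1 + Ī³Ā·r2 + γ²·r3 + .... A minimal Python sketch, with a made-up reward sequence and discount factor:

def discounted_return(rewards, gamma=0.9):
    # Accumulate G = r1 + gamma*r2 + gamma^2*r3 + ... from the last
    # reward backwards, so each step needs only one multiply-add.
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

print(discounted_return([0.0, 0.0, 1.0]))  # 0 + 0.9*0 + 0.81*1 = 0.81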
4) Model: The last element of reinforcement learning is the model, which mimics the behavior of the environment. With the help of the model, one can make inferences about how the environment will behave. For example, given a state and an action, a model can predict the next state and reward.
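As a minimal sketch, such a model can be just a mapping from a (state, action) pair to a predicted (next state, reward) pair; the table entries below are invented for illustration:

# Toy environment model: (state, action) -> (next_state, reward).
model = {
    ("s1", "right"): ("s2", 0.0),
    ("s2", "right"): ("goal", 1.0),
}

def predict(state, action):
    # One-step inference about how the environment will behave.
    return model[(state, action)]

print(predict("s1", "right"))  # ('s2', 0.0)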
2. Approaches to implementation
There are mainly three ways to implement reinforcement learning in ML:
1. Value-based:
The value-based approach tries to find the optimal value function, which gives the maximum value achievable at a state under any policy. The agent therefore expects the long-term return at any state s under policy Ļ€.
2. Policy-based:
The policy-based approach finds the optimal policy for the maximum future reward without using the value function. In this approach, the agent tries to apply a policy such that the action performed at each step helps to maximize the future reward.
The policy-based approach has mainly two types of policy:
o Deterministic: The same action is produced by the policy (Ļ€) at any given state.
o Stochastic: In this policy, probability determines the produced action.
3. Model-based: In the model-based approach, a virtual model is created for the environment, and the agent explores that environment to learn it. There is no particular solution or algorithm for this approach, because the model representation is different for each environment.
3. How Reinforcement Learning Works
To understand the working process of RL, we need to consider two main things:
1. Environment: It can be anything, such as a room, a maze, a football ground, etc.
2. Agent: An intelligent agent, such as an AI robot.
Let's take an example of a maze environment that the agent needs to
explore. Consider the below image:
FIG 3
In the above image, the agent is at the very first block of the maze. The maze consists of an S6 block, which is a wall, an S8 block, which is a fire pit, and an S4 block, which is a diamond.
The agent cannot cross the S6 block, as it is a solid wall. If the agent reaches the S4 block, it gets a +1 reward; if it reaches the fire pit, it gets a -1 reward. It can take four actions: move up, move down, move left, and move right.
The agent can take any path to reach the final point, but it needs to do so in as few steps as possible. Suppose the agent follows the path S9-S5-S1-S2-S3; it will then get the +1 reward.
The agent will try to remember the preceding steps it has taken to reach the final step. To memorize the steps, it assigns a value of 1 to each previous step. Consider the figure below:
FIG 4
Now, the agent has successfully stored the previous steps by assigning the value 1 to each previous block. But what will the agent do if it starts moving from a block that has blocks of value 1 on both sides? Consider the below diagram:
FIG 5
It is difficult for the agent to decide whether to go up or down, as each block has the same value. So the above approach is not suitable for the agent to reach the destination. Hence, to solve this problem, we use the Bellman equation, which is the main concept behind reinforcement learning.
CHAPTER 3
THE BELLMAN EQUATION
The Bellman equation was introduced by the mathematician Richard Ernest Bellman in the year 1953, and hence it is called the Bellman equation. It is associated with dynamic programming and is used to calculate the value of a decision problem at a certain point by including the values of successor states.
It is a way of calculating value functions in dynamic programming that leads to modern reinforcement learning.
The key elements used in the Bellman equation are:
o The action performed by the agent, referred to as "a"
o The state reached by performing the action, "s"
o The reward/feedback obtained for each good and bad action, "R"
o The discount factor, Gamma "Ī³"
The Bellman equation can be written as:
V(s) = max_a [R(s,a) + Ī³V(s')]
Where,
V(s) = the value of the current state s.
R(s,a) = the reward obtained at state s by performing action a.
Ī³ = the discount factor.
V(s') = the value of the next state s'.
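As a hedged sketch, value iteration simply applies this equation to every state over and over until the values stop changing. The three-state chain below, with its transitions and rewards, is invented purely for illustration:

# Minimal value-iteration sketch on a toy deterministic chain.
gamma = 0.9
states = ["s1", "s2", "s3", "goal"]
actions = {"s1": ["right"], "s2": ["right"], "s3": ["right"], "goal": []}
next_state = {("s1", "right"): "s2", ("s2", "right"): "s3",
              ("s3", "right"): "goal"}
reward = {("s1", "right"): 0.0, ("s2", "right"): 0.0, ("s3", "right"): 1.0}

V = {s: 0.0 for s in states}
for _ in range(50):  # sweep until the values settle
    for s in states:
        if actions[s]:
            # V(s) = max_a [R(s,a) + gamma * V(s')]
            V[s] = max(reward[(s, a)] + gamma * V[next_state[(s, a)]]
                       for a in actions[s])

print(V)  # s3 -> 1.0, s2 -> 0.9, s1 -> 0.81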
CHAPTER 4
TYPES OF REINFORCEMENT LEARNING
There are mainly two types of reinforcement learning, which are:
o Positive Reinforcement
o Negative Reinforcement
Positive Reinforcement:
Positive reinforcement learning means adding something to increase the tendency that the expected behavior will occur again. It impacts the behavior of the agent positively and increases the strength of the behavior.
This type of reinforcement can sustain changes for a long time, but too much positive reinforcement may lead to an overload of states, which can diminish the results.
Negative Reinforcement:
Negative reinforcement learning is the opposite of positive reinforcement: it increases the tendency that a specific behavior will occur again by avoiding or removing a negative condition.
It can be more effective than positive reinforcement, depending on the situation and behavior, but it provides reinforcement only to meet the minimum required behavior.
CHAPTER 5
MARKOV DECISION PROCESS
How to represent the agent state?
We can represent the agent state using a Markov state, which contains all the required information from the history. A state St is a Markov state if it satisfies the condition:
P[St+1 | St] = P[St+1 | S1, ..., St]
The Markov state follows the Markov property, which says that the future is independent of the past given the present. RL works in fully observable environments, where the agent can observe the environment and act in the new state. The complete process is known as the Markov Decision Process.
Markov Decision Process
A Markov Decision Process, or MDP, is used to formalize reinforcement learning problems. If the environment is completely observable, its dynamics can be modeled as a Markov process. In an MDP, the agent constantly interacts with the environment and performs actions; at each action, the environment responds and generates a new state.
FIG 6
An MDP is used to describe the environment for RL, and almost all RL problems can be formalized using an MDP.
An MDP is a tuple of four elements (S, A, Pa, Ra):
o A set of finite states S
o A set of finite actions A
o A reward Ra received after transitioning from state S to state S' due to action a
o A transition probability Pa
An MDP uses the Markov property, and to better understand MDPs, we need to learn about it.
Markov Property
It says that "if the agent is present in the current state s1, performs an action a1, and moves to the state s2, then the state transition from s1 to s2 depends only on the current state; the future actions and states do not depend on past actions, rewards, or states."
In other words, as per the Markov property, the current state transition does not depend on any past action or state. Hence, an MDP is an RL problem that satisfies the Markov property. For example, in a chess game, the players only focus on the current state and do not need to remember past actions or states.
Finite MDP:
A finite MDP has finite states, finite rewards, and finite actions. In RL, we consider only finite MDPs.
Markov Process:
A Markov process is a memoryless process with a sequence of random states S1, S2, ..., St that satisfies the Markov property. A Markov process is also known as a Markov chain, which is a tuple (S, P) of a state set S and a transition function P. These two components (S and P) define the dynamics of the system.
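A minimal Python sketch of sampling a trajectory from a Markov chain (S, P); the two states and their transition probabilities are invented for illustration:

import random

# Toy Markov chain: S = {"sunny", "rainy"}, P = transition function.
P = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

def sample_chain(start, steps):
    state, path = start, [start]
    for _ in range(steps):
        dist = P[state]  # the next state depends only on the current one
        state = random.choices(list(dist), weights=dist.values())[0]
        path.append(state)
    return path

print(sample_chain("sunny", 5))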
CHAPTER 6
VARIOUS REINFORCEMENT LEARNING ALGORITHMS
Reinforcement learning algorithms are mainly used in AI applications and gaming applications. The most commonly used algorithms are:
1. Q-Learning:
a. Q-learning is an off-policy RL algorithm used for temporal-difference learning. Temporal-difference learning methods compare temporally successive predictions.
b. It learns the action-value function Q(s, a), which tells how good it is to take action "a" in a particular state "s."
c. The flowchart below explains the working of Q-learning:
FIG 7
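At its core, Q-learning repeats the single update rule Q(s, a) ← Q(s, a) + α[r + γ·max_a' Q(s', a') āˆ’ Q(s, a)]. A minimal Python sketch of one such update, with toy state/action names assumed for illustration:

from collections import defaultdict

Q = defaultdict(float)   # the Q-table, keyed by (state, action)
alpha, gamma = 0.1, 0.9  # learning rate and discount factor

def q_update(s, a, r, s_next, action_space):
    # Off-policy: bootstrap from the best next action, regardless of
    # which action the behavior policy will actually take.
    best_next = max(Q[(s_next, a2)] for a2 in action_space)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

q_update("s1", "right", 0.0, "s2", ["left", "right"])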
2. State-Action-Reward-State-Action (SARSA)
o SARSA stands for State-Action-Reward-State-Action and is an on-policy temporal-difference learning method. The on-policy control method selects the action for each state while learning, using a specific policy.
o The goal of SARSA is to estimate QĻ€(s, a) for the currently selected policy Ļ€ and all state-action pairs (s, a).
o The main difference between the Q-learning and SARSA algorithms is that, unlike Q-learning, SARSA does not use the maximum Q-value of the next state when updating the Q-value in the table.
o In SARSA, the new action and its reward are selected using the same policy that determined the original action.
o SARSA is so named because it uses the quintuple Q(s, a, r, s', a'), where:
s: original state
a: original action
r: reward observed while following the states
s' and a': new state-action pair.
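For comparison with the Q-learning sketch above, here is the corresponding on-policy SARSA update over the quintuple (s, a, r, s', a'); the state/action names are again placeholders:

from collections import defaultdict

Q = defaultdict(float)
alpha, gamma = 0.1, 0.9

def sarsa_update(s, a, r, s_next, a_next):
    # On-policy: bootstrap from the action a' that the current policy
    # actually chose in s', not from the maximum over actions.
    Q[(s, a)] += alpha * (r + gamma * Q[(s_next, a_next)] - Q[(s, a)])

sarsa_update("s1", "right", 0.0, "s2", "right")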
3. Deep Q-Network (DQN):
ā€¢ As the name suggests, DQN is Q-learning using neural networks.
ā€¢ For an environment with a big state space, defining and updating a Q-table is a challenging and complex task.
ā€¢ To solve this issue, we can use the DQN algorithm, where, instead of defining a Q-table, a neural network approximates the Q-values for each action and state.
ā€¢ Two further techniques are also essential for training a DQN (see the sketch below):
1. Experience replay
2. A separate target network
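The sketch below shows how these two techniques fit into one DQN training step. It assumes PyTorch, a 4-dimensional state, and 2 discrete actions purely for illustration; none of these specifics come from the report:

import random
from collections import deque
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 4, 2  # assumed sizes, e.g. a CartPole-like task

def make_net():
    return nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                         nn.Linear(64, N_ACTIONS))

q_net, target_net = make_net(), make_net()
target_net.load_state_dict(q_net.state_dict())  # separate target network
replay = deque(maxlen=10000)                    # experience replay buffer
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
gamma = 0.99

def train_step(batch_size=32):
    if len(replay) < batch_size:
        return
    # Uniform sampling breaks the correlation between consecutive steps.
    batch = random.sample(replay, batch_size)
    s, a, r, s2, done = (torch.tensor(x, dtype=torch.float32)
                         for x in zip(*batch))
    q = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():  # targets come from the frozen target network
        target = r + gamma * target_net(s2).max(1).values * (1 - done)
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Every so often: target_net.load_state_dict(q_net.state_dict())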
4. Deep Deterministic Policy Gradient (DDPG)
DDPG relies on the actor-critic architecture with two eponymous elements, the actor and the critic. The actor is used to tune the parameters Īø of the policy function, i.e. to decide the best action for a specific state.
The critic is used to evaluate the policy function estimated by the actor according to the temporal-difference (TD) error; its update takes the same form as the Q-learning update equation.
TD learning is a way to learn how to predict a value depending on future values of a given state. Q-learning is a specific type of TD learning for learning the Q-value.
FIG 8
DDPG also borrows the ideas of experience replay and a separate target network from DQN. Another issue with DDPG is that it seldom performs exploration of actions on its own. A solution is to add noise to the parameter space or the action space, as sketched below.
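A minimal sketch of action-space exploration noise in Python: Gaussian noise is added to the deterministic actor's output, and the result is clipped back into the valid action range. The actor, noise scale, and bounds here are placeholders:

import numpy as np

def noisy_action(actor, state, sigma=0.1, low=-1.0, high=1.0):
    a = actor(state)  # the actor's deterministic action
    a = a + np.random.normal(0.0, sigma, size=np.shape(a))  # explore
    return np.clip(a, low, high)  # keep the action within valid bounds

# Placeholder actor that always outputs the zero action.
print(noisy_action(lambda s: np.zeros(2), state=None))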
Classification of RL Algorithms
FIG 9
CHAPTER 7
COMPARISON OF REINFORCEMENT LEARNING ALGORITHMS
1. Associative reinforcement learning
Associative reinforcement learning tasks combine facets of stochastic learning automata tasks and supervised learning pattern classification tasks. In associative reinforcement learning tasks, the learning system interacts in a closed loop with its environment.
2. Deep reinforcement learning
This approach extends reinforcement learning by using a deep neural network, without explicitly designing the state space. The work on learning ATARI games by Google DeepMind increased attention to deep reinforcement learning, or end-to-end reinforcement learning.
3. Inverse reinforcement learning
In inverse reinforcement learning (IRL), no reward function is given.
Instead, the reward function is inferred given an observed behavior from
an expert. The idea is to mimic observed behavior, which is often
optimal or close to optimal.
4. Safe Reinforcement Learning
Safe Reinforcement Learning (SRL) can be defined as the process of
learning policies that maximize the expectation of the return in problems
in which it is important to ensure reasonable system performance and/or
respect safety constraints during the learning and/or deployment
processes.
FIG 10
CHAPTER 8
LEARNING METHODS
These methods differ from the previously studied methods and are also only rarely used. In this kind of learning algorithm, there is an agent that we want to train over a period of time so that it can interact with a specific environment. The agent follows a set of strategies for interacting with the environment and, after observing the environment, takes actions regarding the environment's current state. The main steps of reinforcement learning methods are as follows (a generic sketch of this loop is given after the list):
1. First, we prepare an agent with some initial set of strategies.
2. Then the agent observes the environment and its current state.
3. Next, it selects the optimal policy with regard to the current state of the environment and performs the appropriate action.
4. Now the agent gets a corresponding reward or penalty in accordance with the action taken in the previous step.
5. Now we can update the strategies, if required.
6. At last, repeat steps 2-5 until the agent learns and adopts the optimal policy.
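A generic Python sketch of steps 2-5 as a single loop. The env object is a placeholder following the common Gym-style reset()/step() interface, and select_action/update stand in for whatever policy and learning rule are used:

def run_episode(env, select_action, update):
    state = env.reset()  # step 2: observe the current state
    done, total = False, 0.0
    while not done:
        action = select_action(state)  # step 3: act per the current policy
        next_state, reward, done, info = env.step(action)  # step 4: reward or penalty
        update(state, action, reward, next_state)  # step 5: update strategies
        state = next_state
        total += reward
    return total  # repeat episodes (step 6) until the policy is learned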
Difference between Reinforcement learning and Supervised learning:

Reinforcement learning: Reinforcement learning is all about making decisions sequentially. In simple words, the output depends on the state of the current input, and the next input depends on the output of the previous input.
Supervised learning: In supervised learning, the decision is made on the initial input, i.e. the input given at the start.

Reinforcement learning: In reinforcement learning, decisions are dependent, so we give labels to sequences of dependent decisions.
Supervised learning: In supervised learning, the decisions are independent of each other, so labels are given to each decision.

Reinforcement learning example: a chess game.
Supervised learning example: object recognition.

TABLE 1
TABLE 2
CHAPTER 9
APPLICATIONS
Reinforcement Learning Applications
FIG 11
1. Robotics: RL is used in robot navigation, Robo-soccer, walking, juggling, etc.
2. Control: RL can be used for adaptive control, such as factory processes and admission control in telecommunication; helicopter piloting is another example of reinforcement learning.
3. Game playing: RL can be used in game playing, such as tic-tac-toe, chess, etc.
4. Chemistry: RL can be used for optimizing chemical reactions.
5. Business: RL is now used for business strategy planning.
6. Manufacturing: In various automobile manufacturing companies, robots use deep reinforcement learning to pick goods and put them in containers.
7. Finance: RL is currently used in the finance sector for evaluating trading strategies.
CHAPTER 10
PROS AND CONS
Pros of Reinforcement Learning
ā€¢ Reinforcement learning can be used to solve very complex problems that cannot be solved by conventional techniques.
ā€¢ This technique is preferred for achieving long-term results, which are otherwise very difficult to achieve.
ā€¢ This learning model is very similar to how human beings learn. Hence, it is close to achieving perfection.
ā€¢ The model can correct the errors that occurred during the training process.
ā€¢ Once an error is corrected by the model, the chances of the same error occurring again are very low.
ā€¢ It can create the ideal model to solve a particular problem.
ā€¢ Robots can implement reinforcement learning algorithms to learn how to walk.
ā€¢ In the absence of a training dataset, it is bound to learn from its own experience.
ā€¢ Reinforcement learning models can outperform humans in many tasks. DeepMind's AlphaGo program, a reinforcement learning model, beat the world champion Lee Sedol at the game of Go in March 2016.
ā€¢ Reinforcement learning is intended to achieve the ideal behavior of a model within a specific context, maximizing its performance.
ā€¢ It can be useful when the only way to collect information about the environment is to interact with it.
ā€¢ Reinforcement learning algorithms maintain a balance between exploration and exploitation. Exploration is the process of trying different things to see if they are better than what has been tried before. Exploitation is the process of trying the things that have worked best in the past. Other learning algorithms do not maintain this balance.
Cons of Reinforcement Learning
ā€¢ Reinforcement learning as a framework is wrong in many different
ways, but it is precisely this quality that makes it useful.
ā€¢ Too much reinforcement learning can lead to an overload of states,
which can diminish the results.
ā€¢ Reinforcement learning is not preferable to use for solving simple
problems.
ā€¢ Reinforcement learning needs a lot of data and a lot of computation; it is data-hungry. That is why it works really well in video games, where one can play the game again and again, so getting lots of data is feasible.
ā€¢ Reinforcement learning assumes the world is Markovian, which it is
not. The Markovian model describes a sequence of possible events in
which the probability of each event depends only on the state attained
in the previous event.
ā€¢ The curse of dimensionality limits reinforcement learning heavily for
real physical systems. According to Wikipedia, the curse of
dimensionality refers to various phenomena that arise when analyzing
and organizing data in high-dimensional spaces that do not occur in
low-dimensional settings such as the three-dimensional physical space
of everyday experience.
ā€¢ Another disadvantage is the curse of real-world samples. For example, consider the case of learning by robots. Robot hardware is usually very expensive, suffers from wear and tear, and requires careful maintenance. Repairing a robot system costs a lot.
ā€¢ To solve many problems of reinforcement learning, we can use a combination of reinforcement learning with other techniques rather than abandoning it altogether. One popular combination is reinforcement learning with deep learning.
CHAPTER 11
CONCLUSION
We can say that reinforcement learning is one of the most interesting and useful parts of machine learning. In RL, the agent explores the environment by itself, without any human intervention. It is one of the main learning approaches used in artificial intelligence. But there are some cases where it should not be used: if you have enough data to solve the problem, other ML algorithms can be used more efficiently. The main issue with RL algorithms is that some parameters, such as delayed feedback, may affect the speed of learning.
REFERENCES:
1. https://www.javatpoint.com/reinforcement-learning
2. https://towardsdatascience.com/introduction-to-various-reinforcement-learning-algorithms-i-q-learning-sarsa-dqn-ddpg-72a5e0cb6287
3. https://en.wikipedia.org/wiki/Reinforcement_learning#Comparison_of_reinforcement_learning_algorithms
More Related Content

What's hot

Reinforcement Learning : A Beginners Tutorial
Reinforcement Learning : A Beginners TutorialReinforcement Learning : A Beginners Tutorial
Reinforcement Learning : A Beginners TutorialOmar Enayet
Ā 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learningbutest
Ā 
Reinforcement learning
Reinforcement learning Reinforcement learning
Reinforcement learning Chandra Meena
Ā 
I.BEST FIRST SEARCH IN AI
I.BEST FIRST SEARCH IN AII.BEST FIRST SEARCH IN AI
I.BEST FIRST SEARCH IN AIvikas dhakane
Ā 
Markov decision process
Markov decision processMarkov decision process
Markov decision processHamed Abdi
Ā 
Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Gaurav Mittal
Ā 
Problem Formulation in Artificial Inteligence Projects
Problem Formulation in Artificial Inteligence ProjectsProblem Formulation in Artificial Inteligence Projects
Problem Formulation in Artificial Inteligence ProjectsDr. C.V. Suresh Babu
Ā 
Intro to Deep Reinforcement Learning
Intro to Deep Reinforcement LearningIntro to Deep Reinforcement Learning
Intro to Deep Reinforcement LearningKhaled Saleh
Ā 
Goal based and utility based agents
Goal based and utility based agentsGoal based and utility based agents
Goal based and utility based agentsMegha Sharma
Ā 
Multilayer perceptron
Multilayer perceptronMultilayer perceptron
Multilayer perceptronomaraldabash
Ā 
Forms of learning in ai
Forms of learning in aiForms of learning in ai
Forms of learning in aiRobert Antony
Ā 
Lecture 16 KL Transform in Image Processing
Lecture 16 KL Transform in Image ProcessingLecture 16 KL Transform in Image Processing
Lecture 16 KL Transform in Image ProcessingVARUN KUMAR
Ā 
Reinforcement learning, Q-Learning
Reinforcement learning, Q-LearningReinforcement learning, Q-Learning
Reinforcement learning, Q-LearningKuppusamy P
Ā 
Presentation on Breadth First Search (BFS)
Presentation on Breadth First Search (BFS)Presentation on Breadth First Search (BFS)
Presentation on Breadth First Search (BFS)Shuvongkor Barman
Ā 
Logics for non monotonic reasoning-ai
Logics for non monotonic reasoning-aiLogics for non monotonic reasoning-ai
Logics for non monotonic reasoning-aiShaishavShah8
Ā 
Structure of agents
Structure of agentsStructure of agents
Structure of agentsMANJULA_AP
Ā 
Learning in AI
Learning in AILearning in AI
Learning in AIMinakshi Atre
Ā 
Machine Learning: Introduction to Neural Networks
Machine Learning: Introduction to Neural NetworksMachine Learning: Introduction to Neural Networks
Machine Learning: Introduction to Neural NetworksFrancesco Collova'
Ā 

What's hot (20)

Reinforcement Learning : A Beginners Tutorial
Reinforcement Learning : A Beginners TutorialReinforcement Learning : A Beginners Tutorial
Reinforcement Learning : A Beginners Tutorial
Ā 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
Ā 
Reinforcement learning
Reinforcement learning Reinforcement learning
Reinforcement learning
Ā 
I.BEST FIRST SEARCH IN AI
I.BEST FIRST SEARCH IN AII.BEST FIRST SEARCH IN AI
I.BEST FIRST SEARCH IN AI
Ā 
Markov decision process
Markov decision processMarkov decision process
Markov decision process
Ā 
Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)
Ā 
Problem Formulation in Artificial Inteligence Projects
Problem Formulation in Artificial Inteligence ProjectsProblem Formulation in Artificial Inteligence Projects
Problem Formulation in Artificial Inteligence Projects
Ā 
Intro to Deep Reinforcement Learning
Intro to Deep Reinforcement LearningIntro to Deep Reinforcement Learning
Intro to Deep Reinforcement Learning
Ā 
Goal based and utility based agents
Goal based and utility based agentsGoal based and utility based agents
Goal based and utility based agents
Ā 
Multilayer perceptron
Multilayer perceptronMultilayer perceptron
Multilayer perceptron
Ā 
Ch 6 final
Ch 6 finalCh 6 final
Ch 6 final
Ā 
Forms of learning in ai
Forms of learning in aiForms of learning in ai
Forms of learning in ai
Ā 
Lecture 16 KL Transform in Image Processing
Lecture 16 KL Transform in Image ProcessingLecture 16 KL Transform in Image Processing
Lecture 16 KL Transform in Image Processing
Ā 
Knowledge based agents
Knowledge based agentsKnowledge based agents
Knowledge based agents
Ā 
Reinforcement learning, Q-Learning
Reinforcement learning, Q-LearningReinforcement learning, Q-Learning
Reinforcement learning, Q-Learning
Ā 
Presentation on Breadth First Search (BFS)
Presentation on Breadth First Search (BFS)Presentation on Breadth First Search (BFS)
Presentation on Breadth First Search (BFS)
Ā 
Logics for non monotonic reasoning-ai
Logics for non monotonic reasoning-aiLogics for non monotonic reasoning-ai
Logics for non monotonic reasoning-ai
Ā 
Structure of agents
Structure of agentsStructure of agents
Structure of agents
Ā 
Learning in AI
Learning in AILearning in AI
Learning in AI
Ā 
Machine Learning: Introduction to Neural Networks
Machine Learning: Introduction to Neural NetworksMachine Learning: Introduction to Neural Networks
Machine Learning: Introduction to Neural Networks
Ā 

Similar to Reinforcement learning

An efficient use of temporal difference technique in Computer Game Learning
An efficient use of temporal difference technique in Computer Game LearningAn efficient use of temporal difference technique in Computer Game Learning
An efficient use of temporal difference technique in Computer Game LearningPrabhu Kumar
Ā 
reinforcement-learning-141009013546-conversion-gate02.pdf
reinforcement-learning-141009013546-conversion-gate02.pdfreinforcement-learning-141009013546-conversion-gate02.pdf
reinforcement-learning-141009013546-conversion-gate02.pdfVaishnavGhadge1
Ā 
CS3013 -MACHINE LEARNING.pptx
CS3013 -MACHINE LEARNING.pptxCS3013 -MACHINE LEARNING.pptx
CS3013 -MACHINE LEARNING.pptxlogesswarisrinivasan
Ā 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement LearningSVijaylakshmi
Ā 
leewayhertz.com-Reinforcement Learning from Human Feedback RLHF.pdf
leewayhertz.com-Reinforcement Learning from Human Feedback RLHF.pdfleewayhertz.com-Reinforcement Learning from Human Feedback RLHF.pdf
leewayhertz.com-Reinforcement Learning from Human Feedback RLHF.pdfKristiLBurns
Ā 
What is Reinforcement Learning.pdf
What is Reinforcement Learning.pdfWhat is Reinforcement Learning.pdf
What is Reinforcement Learning.pdfAiblogtech
Ā 
Hibridization of Reinforcement Learning Agents
Hibridization of Reinforcement Learning AgentsHibridization of Reinforcement Learning Agents
Hibridization of Reinforcement Learning Agentsbutest
Ā 
Head First Reinforcement Learning
Head First Reinforcement LearningHead First Reinforcement Learning
Head First Reinforcement Learningazzeddine chenine
Ā 
A learning agent.doc
A learning agent.docA learning agent.doc
A learning agent.docjdinfo444
Ā 
REINFORCEMENT LEARNING (reinforced through trial and error).pptx
REINFORCEMENT LEARNING (reinforced through trial and error).pptxREINFORCEMENT LEARNING (reinforced through trial and error).pptx
REINFORCEMENT LEARNING (reinforced through trial and error).pptxarchayacb21
Ā 
Reinforcement Learning Guide For Beginners
Reinforcement Learning Guide For BeginnersReinforcement Learning Guide For Beginners
Reinforcement Learning Guide For Beginnersgokulprasath06
Ā 
Lecture 04 intelligent agents
Lecture 04 intelligent agentsLecture 04 intelligent agents
Lecture 04 intelligent agentsHema Kashyap
Ā 
Reinforcement Learning.pdf
Reinforcement Learning.pdfReinforcement Learning.pdf
Reinforcement Learning.pdfhemayadav41
Ā 
reinforcement-learning-141009013546-conversion-gate02.pptx
reinforcement-learning-141009013546-conversion-gate02.pptxreinforcement-learning-141009013546-conversion-gate02.pptx
reinforcement-learning-141009013546-conversion-gate02.pptxMohibKhan79
Ā 
A Review on Introduction to Reinforcement Learning
A Review on Introduction to Reinforcement LearningA Review on Introduction to Reinforcement Learning
A Review on Introduction to Reinforcement Learningijtsrd
Ā 
acai01-updated.ppt
acai01-updated.pptacai01-updated.ppt
acai01-updated.pptbutest
Ā 
poster_final_v7
poster_final_v7poster_final_v7
poster_final_v7Tie Zheng
Ā 
Andrii Prysiazhnyk: Why the amazon sellers are buiyng the RTX 3080: Dynamic p...
Andrii Prysiazhnyk: Why the amazon sellers are buiyng the RTX 3080: Dynamic p...Andrii Prysiazhnyk: Why the amazon sellers are buiyng the RTX 3080: Dynamic p...
Andrii Prysiazhnyk: Why the amazon sellers are buiyng the RTX 3080: Dynamic p...Lviv Startup Club
Ā 
Agents-and-Problem-Solving-20022024-094442am.pdf
Agents-and-Problem-Solving-20022024-094442am.pdfAgents-and-Problem-Solving-20022024-094442am.pdf
Agents-and-Problem-Solving-20022024-094442am.pdfsyedhasanali293
Ā 
RL_Dr.SNR Final ppt for Presentation 28.05.2021.pptx
RL_Dr.SNR Final ppt for Presentation 28.05.2021.pptxRL_Dr.SNR Final ppt for Presentation 28.05.2021.pptx
RL_Dr.SNR Final ppt for Presentation 28.05.2021.pptxdeeplearning6
Ā 

Similar to Reinforcement learning (20)

An efficient use of temporal difference technique in Computer Game Learning
An efficient use of temporal difference technique in Computer Game LearningAn efficient use of temporal difference technique in Computer Game Learning
An efficient use of temporal difference technique in Computer Game Learning
Ā 
reinforcement-learning-141009013546-conversion-gate02.pdf
reinforcement-learning-141009013546-conversion-gate02.pdfreinforcement-learning-141009013546-conversion-gate02.pdf
reinforcement-learning-141009013546-conversion-gate02.pdf
Ā 
CS3013 -MACHINE LEARNING.pptx
CS3013 -MACHINE LEARNING.pptxCS3013 -MACHINE LEARNING.pptx
CS3013 -MACHINE LEARNING.pptx
Ā 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
Ā 
leewayhertz.com-Reinforcement Learning from Human Feedback RLHF.pdf
leewayhertz.com-Reinforcement Learning from Human Feedback RLHF.pdfleewayhertz.com-Reinforcement Learning from Human Feedback RLHF.pdf
leewayhertz.com-Reinforcement Learning from Human Feedback RLHF.pdf
Ā 
What is Reinforcement Learning.pdf
What is Reinforcement Learning.pdfWhat is Reinforcement Learning.pdf
What is Reinforcement Learning.pdf
Ā 
Hibridization of Reinforcement Learning Agents
Hibridization of Reinforcement Learning AgentsHibridization of Reinforcement Learning Agents
Hibridization of Reinforcement Learning Agents
Ā 
Head First Reinforcement Learning
Head First Reinforcement LearningHead First Reinforcement Learning
Head First Reinforcement Learning
Ā 
A learning agent.doc
A learning agent.docA learning agent.doc
A learning agent.doc
Ā 
REINFORCEMENT LEARNING (reinforced through trial and error).pptx
REINFORCEMENT LEARNING (reinforced through trial and error).pptxREINFORCEMENT LEARNING (reinforced through trial and error).pptx
REINFORCEMENT LEARNING (reinforced through trial and error).pptx
Ā 
Reinforcement Learning Guide For Beginners
Reinforcement Learning Guide For BeginnersReinforcement Learning Guide For Beginners
Reinforcement Learning Guide For Beginners
Ā 
Lecture 04 intelligent agents
Lecture 04 intelligent agentsLecture 04 intelligent agents
Lecture 04 intelligent agents
Ā 
Reinforcement Learning.pdf
Reinforcement Learning.pdfReinforcement Learning.pdf
Reinforcement Learning.pdf
Ā 
reinforcement-learning-141009013546-conversion-gate02.pptx
reinforcement-learning-141009013546-conversion-gate02.pptxreinforcement-learning-141009013546-conversion-gate02.pptx
reinforcement-learning-141009013546-conversion-gate02.pptx
Ā 
A Review on Introduction to Reinforcement Learning
A Review on Introduction to Reinforcement LearningA Review on Introduction to Reinforcement Learning
A Review on Introduction to Reinforcement Learning
Ā 
acai01-updated.ppt
acai01-updated.pptacai01-updated.ppt
acai01-updated.ppt
Ā 
poster_final_v7
poster_final_v7poster_final_v7
poster_final_v7
Ā 
Andrii Prysiazhnyk: Why the amazon sellers are buiyng the RTX 3080: Dynamic p...
Andrii Prysiazhnyk: Why the amazon sellers are buiyng the RTX 3080: Dynamic p...Andrii Prysiazhnyk: Why the amazon sellers are buiyng the RTX 3080: Dynamic p...
Andrii Prysiazhnyk: Why the amazon sellers are buiyng the RTX 3080: Dynamic p...
Ā 
Agents-and-Problem-Solving-20022024-094442am.pdf
Agents-and-Problem-Solving-20022024-094442am.pdfAgents-and-Problem-Solving-20022024-094442am.pdf
Agents-and-Problem-Solving-20022024-094442am.pdf
Ā 
RL_Dr.SNR Final ppt for Presentation 28.05.2021.pptx
RL_Dr.SNR Final ppt for Presentation 28.05.2021.pptxRL_Dr.SNR Final ppt for Presentation 28.05.2021.pptx
RL_Dr.SNR Final ppt for Presentation 28.05.2021.pptx
Ā 

More from SKS

Cloud computing in iot seminar report
Cloud computing in iot seminar reportCloud computing in iot seminar report
Cloud computing in iot seminar reportSKS
Ā 
Uses of ethical theories in professional ethics
Uses of ethical theories in professional ethicsUses of ethical theories in professional ethics
Uses of ethical theories in professional ethicsSKS
Ā 
Deep learning seminar report
Deep learning seminar reportDeep learning seminar report
Deep learning seminar reportSKS
Ā 
Network virtualization seminar report
Network virtualization seminar reportNetwork virtualization seminar report
Network virtualization seminar reportSKS
Ā 
Security in IoT
Security in IoTSecurity in IoT
Security in IoTSKS
Ā 
Variety of moral issues
Variety of moral issuesVariety of moral issues
Variety of moral issuesSKS
Ā 
Research ethics
Research ethicsResearch ethics
Research ethicsSKS
Ā 
Industrial standards
Industrial standardsIndustrial standards
Industrial standardsSKS
Ā 
Engineers are responsible experimenters
Engineers are responsible experimentersEngineers are responsible experimenters
Engineers are responsible experimentersSKS
Ā 
Engineering as experimentation
Engineering as experimentationEngineering as experimentation
Engineering as experimentationSKS
Ā 
Controversy and consensus
Controversy and consensusControversy and consensus
Controversy and consensusSKS
Ā 
Codes of ethics
Codes of ethicsCodes of ethics
Codes of ethicsSKS
Ā 
Codes of ethics
Codes of ethicsCodes of ethics
Codes of ethicsSKS
Ā 
A balanced outlook on the law
A balanced outlook on  the lawA balanced outlook on  the law
A balanced outlook on the lawSKS
Ā 
Theories about the right decision
Theories about the right decision Theories about the right decision
Theories about the right decision SKS
Ā 
Safety and risk
Safety and riskSafety and risk
Safety and riskSKS
Ā 
Risk-benefit analysis
Risk-benefit analysisRisk-benefit analysis
Risk-benefit analysisSKS
Ā 
Reducing risk
Reducing riskReducing risk
Reducing riskSKS
Ā 
Chernobyl case study
Chernobyl case studyChernobyl case study
Chernobyl case studySKS
Ā 
Assessment of safety and risk
Assessment of safety and riskAssessment of safety and risk
Assessment of safety and riskSKS
Ā 

More from SKS (20)

Cloud computing in iot seminar report
Cloud computing in iot seminar reportCloud computing in iot seminar report
Cloud computing in iot seminar report
Ā 
Uses of ethical theories in professional ethics
Uses of ethical theories in professional ethicsUses of ethical theories in professional ethics
Uses of ethical theories in professional ethics
Ā 
Deep learning seminar report
Deep learning seminar reportDeep learning seminar report
Deep learning seminar report
Ā 
Network virtualization seminar report
Network virtualization seminar reportNetwork virtualization seminar report
Network virtualization seminar report
Ā 
Security in IoT
Security in IoTSecurity in IoT
Security in IoT
Ā 
Variety of moral issues
Variety of moral issuesVariety of moral issues
Variety of moral issues
Ā 
Research ethics
Research ethicsResearch ethics
Research ethics
Ā 
Industrial standards
Industrial standardsIndustrial standards
Industrial standards
Ā 
Engineers are responsible experimenters
Engineers are responsible experimentersEngineers are responsible experimenters
Engineers are responsible experimenters
Ā 
Engineering as experimentation
Engineering as experimentationEngineering as experimentation
Engineering as experimentation
Ā 
Controversy and consensus
Controversy and consensusControversy and consensus
Controversy and consensus
Ā 
Codes of ethics
Codes of ethicsCodes of ethics
Codes of ethics
Ā 
Codes of ethics
Codes of ethicsCodes of ethics
Codes of ethics
Ā 
A balanced outlook on the law
A balanced outlook on  the lawA balanced outlook on  the law
A balanced outlook on the law
Ā 
Theories about the right decision
Theories about the right decision Theories about the right decision
Theories about the right decision
Ā 
Safety and risk
Safety and riskSafety and risk
Safety and risk
Ā 
Risk-benefit analysis
Risk-benefit analysisRisk-benefit analysis
Risk-benefit analysis
Ā 
Reducing risk
Reducing riskReducing risk
Reducing risk
Ā 
Chernobyl case study
Chernobyl case studyChernobyl case study
Chernobyl case study
Ā 
Assessment of safety and risk
Assessment of safety and riskAssessment of safety and risk
Assessment of safety and risk
Ā 

Recently uploaded

SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
Ā 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
Ā 
Class 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfClass 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfakmcokerachita
Ā 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfSumit Tiwari
Ā 
18-04-UA_REPORT_MEDIALITERAŠ”Y_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAŠ”Y_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAŠ”Y_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAŠ”Y_INDEX-DM_23-1-final-eng.pdfssuser54595a
Ā 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docxPoojaSen20
Ā 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
Ā 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
Ā 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
Ā 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsKarinaGenton
Ā 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
Ā 
MENTAL STATUS EXAMINATION format.docx
MENTAL     STATUS EXAMINATION format.docxMENTAL     STATUS EXAMINATION format.docx
MENTAL STATUS EXAMINATION format.docxPoojaSen20
Ā 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
Ā 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
Ā 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
Ā 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
Ā 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
Ā 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George
Ā 

Recently uploaded (20)

SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
Ā 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Ā 
Model Call Girl in Bikash Puri Delhi reach out to us at šŸ”9953056974šŸ”
Model Call Girl in Bikash Puri  Delhi reach out to us at šŸ”9953056974šŸ”Model Call Girl in Bikash Puri  Delhi reach out to us at šŸ”9953056974šŸ”
Model Call Girl in Bikash Puri Delhi reach out to us at šŸ”9953056974šŸ”
Ā 
Class 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfClass 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdf
Ā 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Ā 
18-04-UA_REPORT_MEDIALITERAŠ”Y_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAŠ”Y_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAŠ”Y_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAŠ”Y_INDEX-DM_23-1-final-eng.pdf
Ā 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docx
Ā 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
Ā 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
Ā 
CĆ³digo Creativo y Arte de Software | Unidad 1
CĆ³digo Creativo y Arte de Software | Unidad 1CĆ³digo Creativo y Arte de Software | Unidad 1
CĆ³digo Creativo y Arte de Software | Unidad 1
Ā 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
Ā 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its Characteristics
Ā 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
Ā 
MENTAL STATUS EXAMINATION format.docx
MENTAL     STATUS EXAMINATION format.docxMENTAL     STATUS EXAMINATION format.docx
MENTAL STATUS EXAMINATION format.docx
Ā 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
Ā 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
Ā 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
Ā 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
Ā 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
Ā 
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Ā 

Reinforcement learning

  • 1. REINFORCEMENT LEARNING (RL) SEMINAR REPORT Submitted by KALAISRIRAM.S Reg.No:17TH1222 In partial fulfillment of the requirement for the degree of BACHELOR OF TECHNOLOGY in INFORMATION TECHNOLOGY of PONDICHERRY UNIVERSITY DEPARTMENT OF INFORMATION TECHNOLOGY MANAKULA VINAYAGAR INSTITUTE OF TECHNOLOGY KALITHEERTHALKUPPAM, PUDUCHERRY- 605 107. MARCH-21.
  • 2. MANAKULA VINAYAGAR INSTITUTE OF TECHNOLOGY KALITHEERTHAL KUPPAM, PUDUCHERRY- 605 107 DEPARTMENT OF INFORMATION TECHNOLOGY BONAFIDE CERTIFICATE This is to Certify that the Seminar Report Presentation Titled ā€œREENFORCEMENT LEARNINGā€ is a bonafide record work done by S.KALAISRIRAM (Reg. No:17TH1222) of Seventh Semester B.TECH in INFORMATION TECHNOLOGY for the SEMINAR REPORT during the academic year 2020-21. Staff in charge Head of the Department
  • 3. Table of Contents INTRODUCTION.......................................................................................................................................1 PROCESS OF REINFORCEMENT LEARNING ..................................................................................4 THE BELLMAN EQUATION ..................................................................................................................9 TYPES OF REINFORCEMENT LEARNING......................................................................................10 MARKOV DECISION PROCESS.............................................................................................................11 How to represent the agent state? .......................................................................................................11 1. Markov Decision.............................................................................................................................11 2. Markov Property.........................................................................................................................12 3. Markov Process:..........................................................................................................................12 VARIOUS REINFORCEMENT LEARNING ALGORITHMS..........................................................13 COMPARISON OF REINFORCEMENT LEARNING ALGORITHMS ................................................18 1. Associative reinforcement learning ............................................................................................18 2. Deep reinforcement learning.......................................................................................................18 3. Inverse reinforcement learning ...................................................................................................18 4. Safe Reinforcement Learning .....................................................................................................18 LEARNING MEATHODS.......................................................................................................................20 Difference between Reinforcement learning and Supervised learning:...........................................21 APPLICATIONS ......................................................................................................................................23 Reinforcement Learning Applications....................................................................................................23 PROS AND CONS....................................................................................................................................25 Pros of Reinforcement Learning.............................................................................................................25 Cons of Reinforcement Learning............................................................................................................26 CONCLUSION .........................................................................................................................................28 REFERENCES: ....................................................................................................................................28
  • 4. 1 CHAPTER 1 INTRODUCTION 1. What is Reinforcement learning? ā€¢ Reinforcement Learning is a feedback-based Machine learning technique in which an agent learns to behave in an environment by performing the actions and seeing the results of actions. For each good action, the agent gets positive feedback, and for each bad action, the agent gets negative feedback or penalty. ā€¢ In Reinforcement Learning, the agent learns automatically using feedbacks without any labeled data, unlike supervised learning. Since there is no labeled data, so the agent is bound to learn by its experience only. ā€¢ RL solves a specific type of problem where decision making is sequential, and the goal is long-term, such as game-playing, robotics, etc.The agent interacts with the environment and explores it by itself. ā€¢ The primary goal of an agent in reinforcement learning is to improve the performance by getting the maximum positive rewards. ā€¢ The agent learns with the process of hit and trial, and based on the experience, it learns to perform the task in a better way. ā€¢ Hence, we can say that "Reinforcement learning is a type of machine learning method where an intelligent agent (computer program) interacts with the environment and learns to act within that." .
2. Terms used in Reinforcement Learning
• Agent: An entity that can perceive/explore the environment and act upon it.
• Environment: The situation in which the agent is present or by which it is surrounded. In RL, we assume a stochastic environment, which means it is random in nature.
• Action: Actions are the moves taken by an agent within the environment.
• State: A situation returned by the environment after each action taken by the agent.
• Reward: Feedback returned to the agent from the environment to evaluate the agent's action.
• Policy: A strategy applied by the agent to decide the next action based on the current state.
• Value: The expected long-term return with the discount factor, as opposed to the short-term reward.
• Q-value: Mostly similar to the value, but it takes one additional parameter, the current action (a).

3. Key Features of Reinforcement Learning
o In RL, the agent is not instructed about the environment or which actions to take.
o It is based on trial and error.
o The agent takes the next action and changes state according to the feedback from the previous action.
o The agent may get a delayed reward.
o The environment is stochastic, and the agent needs to explore it to collect the maximum positive reward.
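These terms map directly onto code. Below is a minimal sketch of how agent, environment, state, action, and reward fit together; the GridEnvironment class is a hypothetical toy world invented for illustration, not part of any library:

```python
import random

class GridEnvironment:
    """A hypothetical 1-D grid world used only to illustrate the RL terms above."""
    def __init__(self, size=5):
        self.size = size
        self.state = 0  # State: the agent's current position

    def step(self, action):
        """Action: -1 (left) or +1 (right). Returns (next_state, reward, done)."""
        self.state = max(0, min(self.size - 1, self.state + action))
        reached_goal = self.state == self.size - 1
        reward = 1.0 if reached_goal else -0.1  # Reward: feedback from the environment
        return self.state, reward, reached_goal

env = GridEnvironment()
state, done = env.state, False
while not done:
    action = random.choice([-1, 1])          # Agent: here, the simplest random policy
    state, reward, done = env.step(action)   # Environment responds with state and reward
```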
CHAPTER 2
PROCESS OF REINFORCEMENT LEARNING

1. Elements of Reinforcement Learning
There are four main elements of Reinforcement Learning, which are given below:

1) Policy: A policy defines the way an agent behaves at a given time. It maps the perceived states of the environment to the actions to be taken in those states. The policy is the core element of RL, as it alone can define the behavior of the agent. In some cases it may be a simple function or a lookup table, whereas in other cases it may involve general computation such as a search process. It can be deterministic or stochastic:
For a deterministic policy: a = π(s)
For a stochastic policy: π(a | s) = P[At = a | St = s]

2) Reward Signal: The goal of reinforcement learning is defined by the reward signal. At each step, the environment sends an immediate signal to the learning agent, known as the reward signal. Rewards are given according to the good and bad actions taken by the agent, and the agent's main objective is to maximize the total reward for good actions. The reward signal can change the policy: if an action selected by the agent leads to a low reward, the policy may change to select other actions in the future.

3) Value Function: The value function gives information about how good a situation or action is and how much reward the agent can expect. Whereas the reward indicates the immediate desirability of each action, the value function specifies what is good in the long run. The value function depends on the reward, as without reward there could be no value. The goal of estimating values is to collect more reward.
4) Model: The last element of reinforcement learning is the model, which mimics the behavior of the environment. With the help of the model, one can make inferences about how the environment will behave: given a state and an action, the model can predict the next state and reward.

2. Approach for implementation
There are mainly three ways to implement reinforcement learning in ML:

1. Value-based: The value-based approach aims to find the optimal value function, which is the maximum value achievable at a state under any policy. The agent expects the long-term return at any state s under policy π.

2. Policy-based: The policy-based approach aims to find the optimal policy for the maximum future reward without using the value function. In this approach, the agent tries to apply a policy such that the action performed at each step helps to maximize the future reward. The policy-based approach has two types of policy (a small sketch of both follows this section):
o Deterministic: The policy (π) produces the same action at any given state.
o Stochastic: The action produced is determined by a probability distribution.

3. Model-based: In the model-based approach, a virtual model of the environment is created, and the agent explores that environment to learn it. There is no single solution or algorithm for this approach, because the model representation differs for each environment.
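The distinction between a deterministic policy a = π(s) and a stochastic policy π(a | s) is easy to see in code. This is a minimal sketch with made-up states and action probabilities, purely for illustration:

```python
import random

actions = ["up", "down", "left", "right"]

def deterministic_policy(state):
    # a = pi(s): the same state always yields the same action
    lookup = {0: "right", 1: "right", 2: "up"}
    return lookup.get(state, "up")

def stochastic_policy(state):
    # pi(a|s) = P[At = a | St = s]: the action is sampled from a
    # state-dependent probability distribution over the action set
    probs = {0: [0.7, 0.1, 0.1, 0.1], 1: [0.25, 0.25, 0.25, 0.25]}
    weights = probs.get(state, [0.25, 0.25, 0.25, 0.25])
    return random.choices(actions, weights=weights)[0]

print(deterministic_policy(0))  # always "right"
print(stochastic_policy(0))     # usually "up", occasionally another action
```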
3. How Reinforcement Learning Works
To understand the working process of RL, we need to consider two main things:
1. Environment: It can be anything, such as a room, a maze, a football ground, etc.
2. Agent: An intelligent agent, such as an AI robot.

Let's take the example of a maze environment that the agent needs to explore. Consider the image below:

FIG 3

In the above image, the agent is at the very first block of the maze. The maze consists of an S6 block, which is a solid wall, S8, a fire pit, and S4, a diamond block. The agent cannot cross the S6 block, as it is a wall. If the agent reaches the S4 block it gets a +1 reward; if it reaches the fire pit,
it gets a -1 reward. The agent can take four actions: move up, move down, move left, and move right. It can take any path to reach the final point, but it needs to do so in as few steps as possible. Suppose the agent follows the path S9-S5-S1-S2-S3; it will then receive the +1 reward. The agent will try to remember the preceding steps it took to reach the final step. To memorize the steps, it assigns the value 1 to each previous step. Consider the step below:

FIG 4

Now the agent has successfully stored the previous steps by assigning the value 1 to each previous block. But what will the agent do if it starts moving from a block that has blocks of value 1 on both sides? Consider the diagram below:
FIG 5

It will be difficult for the agent to decide whether to go up or down, as each block has the same value. So the above approach is not suitable for the agent to reach the destination. To solve this problem, we use the Bellman equation, which is the main concept behind reinforcement learning.
CHAPTER 3
THE BELLMAN EQUATION

The Bellman equation was introduced by the mathematician Richard Ernest Bellman in 1953, and hence it is called the Bellman equation. It is associated with dynamic programming and is used to calculate the value of a decision problem at a certain point by including the values of successor states. It is this way of calculating value functions in dynamic programming that leads to modern reinforcement learning.

The key elements used in the Bellman equation are:
o The action performed by the agent, referred to as "a"
o The state reached by performing the action, "s"
o The reward/feedback obtained for each good and bad action, "R"
o The discount factor, gamma "γ"

The Bellman equation can be written as:

V(s) = max_a [R(s, a) + γ V(s')]

Where,
V(s) = the value calculated at a particular state.
R(s, a) = the reward obtained at state s by performing action a.
γ = the discount factor.
V(s') = the value of the next state.
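Applying this equation repeatedly to every state is exactly the value-iteration procedure that resolves the tie in the maze above. The sketch below assumes a simplified four-state corridor with made-up rewards rather than the full maze from the figure, so it illustrates the update rule, not the figure itself:

```python
# Value iteration on a toy 4-state corridor: s0 -> s1 -> s2 -> s3 (goal).
GAMMA = 0.9
states = [0, 1, 2, 3]          # s3 is the terminal goal state

def next_state(s, action):
    return max(0, min(3, s + (1 if action == "right" else -1)))

def reward(s, action):
    # Entering the goal from a non-terminal state pays +1, like the diamond block
    return 1.0 if s != 3 and next_state(s, action) == 3 else 0.0

V = {s: 0.0 for s in states}
for _ in range(50):            # sweep V(s) = max_a [R(s,a) + gamma * V(s')] until stable
    V = {s: (0.0 if s == 3 else          # terminal state keeps value 0
             max(reward(s, a) + GAMMA * V[next_state(s, a)]
                 for a in ("left", "right")))
         for s in states}

print(V)  # {0: 0.81, 1: 0.9, 2: 1.0, 3: 0.0}: values shrink by gamma per step
```

Because each state's value is discounted by γ per step of distance from the goal, neighboring blocks no longer carry identical values, and the agent can always move toward the higher-valued side.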
CHAPTER 4
TYPES OF REINFORCEMENT LEARNING

There are mainly two types of reinforcement learning:
o Positive Reinforcement
o Negative Reinforcement

Positive Reinforcement: Positive reinforcement means adding something to increase the tendency that the expected behavior will occur again. It impacts the behavior of the agent positively and increases the strength of that behavior. This type of reinforcement can sustain changes for a long time, but too much positive reinforcement may lead to an overload of states, which can diminish the results.

Negative Reinforcement: Negative reinforcement is the opposite of positive reinforcement: it increases the tendency that a specific behavior will occur again by avoiding a negative condition. It can be more effective than positive reinforcement, depending on the situation and behavior, but it provides reinforcement only up to the minimum required behavior.
CHAPTER 5
MARKOV DECISION PROCESS

How do we represent the agent's state?
We can represent the agent's state using the Markov state, which contains all the required information from the history. A state St is a Markov state if it satisfies the following condition:

P[St+1 | St] = P[St+1 | S1, ..., St]

The Markov state follows the Markov property, which says that the future is independent of the past and can be defined from the present alone. RL works in fully observable environments, where the agent can observe the environment and act in the new state. The complete process is known as a Markov Decision Process.

1. Markov Decision Process
A Markov Decision Process, or MDP, is used to formalize reinforcement learning problems. If the environment is completely observable, then its dynamics can be modeled as a Markov process. In an MDP, the agent constantly interacts with the environment and performs actions; after each action, the environment responds and generates a new state.

FIG 6
An MDP is used to describe the environment for RL, and almost all RL problems can be formalized using an MDP. An MDP is a tuple of four elements (S, A, Pa, Ra):
o A finite set of states S
o A finite set of actions A
o A transition probability Pa of moving from state S to state S' due to action a
o A reward Ra received after transitioning from state S to state S' due to action a

An MDP uses the Markov property, so to better understand the MDP we need to learn about it.

2. Markov Property
It says: "If the agent is in the current state s1, performs an action a1, and moves to the state s2, then the transition from s1 to s2 depends only on the current state and action; it does not depend on past actions, rewards, or states." In other words, as per the Markov property, the current state transition does not depend on any past action or state. Hence, an MDP is an RL problem that satisfies the Markov property. For example, in a game of chess, the players need only focus on the current board position; they do not need to remember past actions or states.

Finite MDP: A finite MDP has finite states, finite rewards, and finite actions. In RL, we consider only finite MDPs.

3. Markov Process
A Markov process is a memoryless process with a sequence of random states S1, S2, ..., St that satisfies the Markov property. A Markov process is also known as a Markov chain, which is a tuple (S, P) of a state set S and a transition function P. These two components (S and P) define the dynamics of the system.
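The tuple (S, A, Pa, Ra) translates directly into a data structure. Below is a minimal sketch of a finite MDP as plain Python dictionaries; the two weather states, actions, and numbers are made up for illustration:

```python
# A toy finite MDP (S, A, P, R) with two states and two actions.
S = ["sunny", "rainy"]
A = ["walk", "drive"]

# P[(s, a)] maps each next state s' to its transition probability Pa(s' | s, a)
P = {
    ("sunny", "walk"):  {"sunny": 0.8, "rainy": 0.2},
    ("sunny", "drive"): {"sunny": 0.9, "rainy": 0.1},
    ("rainy", "walk"):  {"sunny": 0.3, "rainy": 0.7},
    ("rainy", "drive"): {"sunny": 0.5, "rainy": 0.5},
}

# R[(s, a)] is the reward Ra for taking action a in state s
R = {
    ("sunny", "walk"): 2.0, ("sunny", "drive"): 1.0,
    ("rainy", "walk"): -1.0, ("rainy", "drive"): 0.5,
}

# The Markov property in action: the distribution over the next state
# depends only on the current (state, action) pair, never on the history.
print(P[("sunny", "walk")])  # {'sunny': 0.8, 'rainy': 0.2}
```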
CHAPTER 6
VARIOUS REINFORCEMENT LEARNING ALGORITHMS

Reinforcement learning algorithms are used mainly in AI and gaming applications. The most widely used algorithms are:

1. Q-Learning
a. Q-learning is an off-policy RL algorithm used for temporal difference learning. Temporal difference learning methods compare temporally successive predictions.
b. It learns the action-value function Q(s, a), which measures how good it is to take action "a" in a particular state "s".
c. The flowchart below explains the working of Q-learning, and a minimal sketch of its update rule follows the figure:

FIG 7
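The core of Q-learning is the update rule Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)]. Here is a minimal tabular sketch, with a hypothetical maze transition invented just to show the bookkeeping:

```python
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.9          # learning rate and discount factor
actions = ["up", "down", "left", "right"]
Q = defaultdict(float)           # Q-table: (state, action) -> value, default 0

def q_update(s, a, r, s_next):
    # Off-policy: the target uses the greedy max over next actions,
    # regardless of which action the agent will actually take next.
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

# One hypothetical transition: moving up from "S3" reaches the diamond
# block "S4" with reward +1; the table entry shifts toward that reward.
q_update("S3", "up", 1.0, "S4")
print(Q[("S3", "up")])  # 0.1 after a single update
```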
2. State-Action-Reward-State-Action (SARSA)
o SARSA stands for State-Action-Reward-State-Action; it is an on-policy temporal difference learning method. An on-policy control method selects the action for each state while learning, using a specific policy.
o The goal of SARSA is to calculate Q^π(s, a) for the currently selected policy π and all pairs (s, a).
o The main difference between Q-learning and SARSA is that, unlike Q-learning, SARSA does not use the maximum Q-value of the next state when updating the Q-value in the table.
o In SARSA, the new action and reward are selected using the same policy that determined the original action.
o SARSA is so named because it uses the quintuple Q(s, a, r, s', a'), where:
s: original state
a: original action
r: reward observed on the transition
s', a': new state-action pair
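Side by side with the Q-learning update above, the on-policy nature of SARSA shows up in a single line: the target uses the action a' actually chosen by the current policy rather than the greedy max. A minimal sketch, reusing the same hypothetical Q-table layout as before:

```python
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.9
Q = defaultdict(float)

def sarsa_update(s, a, r, s_next, a_next):
    # On-policy: the target uses Q(s', a') for the action a' that the
    # current policy actually selected, not the max over all actions.
    Q[(s, a)] += ALPHA * (r + GAMMA * Q[(s_next, a_next)] - Q[(s, a)])

# A hypothetical quintuple (s, a, r, s', a'): stepping down from "S9"
# into the fire pit "S8" earns -1, and the policy then chooses "down" again.
sarsa_update("S9", "down", -1.0, "S8", "down")
print(Q[("S9", "down")])  # -0.1 after a single update
```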
3. Deep Q-Network (DQN)
• As the name suggests, DQN is Q-learning using a neural network.
• For an environment with a large state space, defining and updating a Q-table becomes a challenging and complex task.
• To solve this issue, we can use a DQN algorithm: instead of maintaining a Q-table, a neural network approximates the Q-value for each action and state.
• Two further techniques are essential for training a DQN:
1. Experience replay
2. A separate target network

4. Deep Deterministic Policy Gradient (DDPG)
DDPG relies on the actor-critic architecture with two eponymous elements, the actor and the critic. The actor tunes the parameters θ of the policy function, i.e. it decides the best action for a specific state. The critic evaluates the policy function estimated by the actor according to the temporal difference (TD) error:

δ = r + γ V_v(s') − V_v(s)

Here, the lower-case v denotes the policy that the actor has decided on. Does it look familiar? Yes, it looks just like the Q-learning update equation. TD learning is a way of learning to predict the value of a state from the values of future states, and Q-learning is a specific type of TD learning for learning Q-values.
FIG 8

DDPG also borrows the ideas of experience replay and a separate target network from DQN. Another issue for DDPG is that it rarely explores actions on its own, because the actor is deterministic. A solution is to add noise to the parameter space or the action space.
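Experience replay and exploration noise are both small pieces of machinery. Below is a minimal sketch of each, using only the Python standard library; the buffer capacity, batch size, and noise scale are arbitrary illustrative values:

```python
import random
from collections import deque

# Experience replay: store transitions and train on random mini-batches,
# which breaks the correlation between consecutive samples.
buffer = deque(maxlen=10_000)

def store(s, a, r, s_next, done):
    buffer.append((s, a, r, s_next, done))

def sample_batch(batch_size=32):
    return random.sample(list(buffer), min(batch_size, len(buffer)))

# Exploration noise for DDPG: perturb the deterministic actor output so
# the agent occasionally tries an action other than its current best guess.
def noisy_action(actor_output, scale=0.1):
    return actor_output + random.gauss(0.0, scale)

store("s0", 0.4, 1.0, "s1", False)
print(sample_batch(), noisy_action(0.4))
```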
Classification of RL Algorithms

FIG 9
CHAPTER 7
COMPARISON OF REINFORCEMENT LEARNING ALGORITHMS

1. Associative reinforcement learning
Associative reinforcement learning tasks combine facets of stochastic learning automata tasks and supervised learning pattern classification tasks. In associative reinforcement learning tasks, the learning system interacts in a closed loop with its environment.

2. Deep reinforcement learning
This approach extends reinforcement learning by using a deep neural network, without explicitly designing the state space. The work on learning ATARI games by Google DeepMind increased attention to deep reinforcement learning, also called end-to-end reinforcement learning.

3. Inverse reinforcement learning
In inverse reinforcement learning (IRL), no reward function is given. Instead, the reward function is inferred from behavior observed from an expert. The idea is to mimic the observed behavior, which is often optimal or close to optimal.

4. Safe reinforcement learning
Safe reinforcement learning (SRL) is the process of learning policies that maximize the expected return in problems where it is important to ensure reasonable system performance and/or respect safety constraints during the learning and/or deployment processes.
CHAPTER 8
LEARNING METHODS

These methods differ from the supervised learning methods studied earlier. In this kind of learning algorithm, there is an agent that we want to train over a period of time so that it can interact with a specific environment. The agent follows a set of strategies for interacting with the environment; after observing the environment, it takes actions according to the environment's current state. The main steps of a reinforcement learning method are as follows (a minimal sketch of this loop appears after the list):

1. First, prepare the agent with some initial set of strategies.
2. Observe the environment and its current state.
3. Select the optimal policy for the current state of the environment and perform the appropriate action.
4. The agent receives a corresponding reward or penalty according to the action taken in the previous step.
5. Update the strategies if required.
6. Repeat steps 2-5 until the agent has learned and adopted the optimal policy.
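Steps 1-6 above form the canonical agent-environment loop. The sketch below assumes a hypothetical `env` object with `reset()` and `step(action)` methods in the style of the Gym interface, with an epsilon-greedy rule standing in for "select the optimal policy":

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
actions = [0, 1, 2, 3]
Q = defaultdict(float)                     # Step 1: initial strategy (all zeros)

def choose_action(state):
    if random.random() < EPSILON:          # explore occasionally
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])  # Step 3: best known action

def train(env, episodes=100):
    for _ in range(episodes):              # Step 6: repeat until learned
        state, done = env.reset(), False   # Step 2: observe the current state
        while not done:
            action = choose_action(state)
            next_state, reward, done = env.step(action)  # Step 4: reward or penalty
            best_next = max(Q[(next_state, a)] for a in actions)
            Q[(state, action)] += ALPHA * (reward + GAMMA * best_next
                                           - Q[(state, action)])  # Step 5: update
            state = next_state
```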
Difference between Reinforcement learning and Supervised learning:

| Reinforcement learning | Supervised learning |
| --- | --- |
| Reinforcement learning is all about making decisions sequentially. In simple words, the output depends on the state of the current input, and the next input depends on the output of the previous input. | In supervised learning, the decision is made on the initial input, or the input given at the start. |
| In reinforcement learning, decisions are dependent, so we give labels to sequences of dependent decisions. | In supervised learning, decisions are independent of each other, so a label is given to each decision. |
| Example: a chess game | Example: object recognition |

TABLE 1
CHAPTER 9
APPLICATIONS

Reinforcement Learning Applications

FIG 11

1. Robotics: RL is used in robot navigation, robo-soccer, walking, juggling, etc.
2. Control: RL can be used for adaptive control, such as factory processes and admission control in telecommunications; helicopter piloting is another example of reinforcement learning.
3. Game Playing: RL can be used in game playing, such as tic-tac-toe, chess, etc.
4. Chemistry: RL can be used to optimize chemical reactions.
5. Business: RL is now used for business strategy planning.
6. Manufacturing: In various automobile manufacturing companies, robots use deep reinforcement learning to pick up goods and put them in containers.
7. Finance: RL is currently used in the finance sector for evaluating trading strategies.
CHAPTER 10
PROS AND CONS

Pros of Reinforcement Learning
• Reinforcement learning can be used to solve very complex problems that cannot be solved by conventional techniques.
• This technique is preferred for achieving long-term results, which are otherwise very difficult to achieve.
• This learning model is very similar to how human beings learn, so it comes close to achieving human-like proficiency.
• The model can correct errors that occur during the training process.
• Once an error has been corrected by the model, the chances of the same error occurring again are much lower.
• It can create a well-suited model for solving a particular problem.
• Robots can use reinforcement learning algorithms to learn how to walk.
• In the absence of a training dataset, it is bound to learn from its own experience.
• Reinforcement learning models can outperform humans in many tasks. DeepMind's AlphaGo program, a reinforcement learning model, beat the world champion Lee Sedol at the game of Go in March 2016.
• Reinforcement learning is intended to achieve the ideal behavior of a model within a specific context, in order to maximize its performance.
• It is useful when the only way to collect information about the environment is to interact with it.
  • 29. 26 ā€¢ Reinforcement learning algorithms maintain a balance between exploration and exploitation. Exploration is the process of trying different things to see if they are better than what has been tried before. Exploitation is the process of trying the things that have worked best in the past. Other learning algorithms do not perform this balance Cons of Reinforcement Learning ā€¢ Reinforcement learning as a framework is wrong in many different ways, but it is precisely this quality that makes it useful. ā€¢ Too much reinforcement learning can lead to an overload of states, which can diminish the results. ā€¢ Reinforcement learning is not preferable to use for solving simple problems. ā€¢ Reinforcement learning needs a lot of data and a lot of computation. It is data-hungry. That is why it works really well in video games because one can play the game again and again and again, so getting lots of data seems feasible. ā€¢ Reinforcement learning assumes the world is Markovian, which it is not. The Markovian model describes a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. ā€¢ The curse of dimensionality limits reinforcement learning heavily for real physical systems. According to Wikipedia, the curse of dimensionality refers to various phenomena that arise when analyzing and organizing data in high-dimensional spaces that do not occur in low-dimensional settings such as the three-dimensional physical space of everyday experience. ā€¢ Another disadvantage is the curse of real-world samples. For example, consider the case of learning by robots. The robot hardware is usually
very expensive, suffers from wear and tear, and requires careful maintenance. Repairing a robot system costs a lot.
• To overcome many of the problems of reinforcement learning, we can use it in combination with other techniques rather than abandoning it altogether. One popular combination is reinforcement learning with deep learning.
CHAPTER 11
CONCLUSION

We can say that reinforcement learning is one of the most interesting and useful parts of machine learning. In RL, the agent learns by exploring the environment without any human intervention. It is a core learning paradigm used in Artificial Intelligence. However, there are cases where it should not be used: if you have enough data to solve the problem, other ML algorithms can be used more efficiently. The main issue with RL algorithms is that some parameters, such as delayed feedback, may affect the speed of learning.

REFERENCES:
1. https://www.javatpoint.com/reinforcement-learning
2. https://towardsdatascience.com/introduction-to-various-reinforcement-learning-algorithms-i-q-learning-sarsa-dqn-ddpg-72a5e0cb6287
3. https://en.wikipedia.org/wiki/Reinforcement_learning#Comparison_of_reinforcement_learning_algorithms