Reinforcement Learning
Introduction
• Is the art of making optimal decisions
• Is the training of machine learning models to make a
sequence of decisions
• An RL agent is able to perceive and interpret its
environment, take actions, and learn through trial and
error.
• Human involvement is limited to changing
the environment
(Diagram: an actor interacts with the system by sending actions/instructions and observing the result.)
Main points in Reinforcement learning
• Input: The input is an initial state from which
the model starts
• Output: There are many possible outputs, as there is a
variety of solutions to a particular problem
• Training: The training is based upon the input. The
model returns a state, and the user decides whether to
reward or punish the model based on its output.
• The model continues to learn.
• The best solution is decided based on the maximum
reward.
Example: We have an agent and a reward, with many
hurdles in between. The agent is supposed to find the
best possible path to reach the reward (a minimal
interaction-loop sketch follows below).
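To make this loop concrete, here is a minimal Python sketch of the agent–environment interaction just described: a hypothetical 1-D corridor where one cell is a hurdle and the last cell holds the reward. The corridor layout, cell numbers, and reward values are illustrative assumptions, not from the slides.

```python
import random

# Hypothetical 1-D corridor: agent starts at cell 0, the reward sits at
# cell 4, and cell 2 is a hurdle that punishes the agent (assumed values).
GOAL, HURDLE, N_CELLS = 4, 2, 5

def step(state, action):
    """Apply an action (-1 = left, +1 = right); return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_CELLS - 1)
    if next_state == GOAL:
        return next_state, +100, True   # reached the reward
    if next_state == HURDLE:
        return next_state, -10, False   # punished for hitting the hurdle
    return next_state, 0, False         # neutral move

state, total_reward, done = 0, 0, False
while not done:
    action = random.choice([-1, +1])    # no learning yet: pure trial and error
    state, reward, done = step(state, action)
    total_reward += reward
print("episode finished, total reward:", total_reward)
```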
Difference between Reinforcement learning and Supervised learning:
Reinforcement learning
• Reinforcement learning is all
about making decisions
sequentially.
• In reinforcement learning each
decision depends on the previous
ones, so labels are given to
sequences of dependent decisions.
• Example: Chess game
Supervised learning
• In supervised learning the
decision is made on the initial
input, i.e. the input given at the start.
• In supervised learning the decisions
are independent of each other, so a
label is given to each decision.
• Example: Object recognition
Applications
• In Self-Driving Cars
• In Industry Automation
• In Trading and Finance
• In Natural Language Processing
• In Healthcare
• In Engineering
• In News Recommendation
Reinforcement Learning Algorithms
Value-Based:
• In a value-based Reinforcement Learning method, you try to
maximize a value function V(s). In this method, the agent expects a
long-term return from the current states under policy π.
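For reference, the long-term return that V(s) measures is the expected discounted sum of future rewards (standard notation, not spelled out on the slide):
V_π(s) = E_π[ R_{t+1} + γ·R_{t+2} + γ²·R_{t+3} + … | S_t = s ],
where γ ∈ [0, 1) is the discount factor.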
Policy-based:
• In a policy-based method, you try to come up with a policy such that the
action performed in every state helps you to gain maximum reward in the future.
Two types of policy-based methods are:
Deterministic: For any state, the same action is produced by the policy π.
Stochastic: Every action has a certain probability, which is determined by
the following equation: π(a|s) = P[A_t = a | S_t = s]
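A minimal sketch of the two policy types, assuming a toy set of states and actions (all names and probabilities here are illustrative, not from the slides):

```python
import numpy as np

STATES = ["s0", "s1"]
ACTIONS = ["left", "right"]

# Deterministic policy: a fixed mapping state -> action.
deterministic_pi = {"s0": "right", "s1": "left"}

# Stochastic policy: pi(a|s) = P[A_t = a | S_t = s], one probability row per state.
stochastic_pi = {
    "s0": [0.9, 0.1],   # P[left|s0] = 0.9, P[right|s0] = 0.1
    "s1": [0.2, 0.8],
}

def act(state, stochastic=False):
    if stochastic:
        # Sample an action according to its probability under pi(.|state).
        return np.random.choice(ACTIONS, p=stochastic_pi[state])
    # Always produce the same action for a given state.
    return deterministic_pi[state]

print(act("s0"), act("s0", stochastic=True))
```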
Model-Based:
• In this Reinforcement Learning method, you need to create a virtual model
for each environment. The agent learns to perform in that specific
environment.
Reinforcement Learning Algorithms
Q-Learning, SARSA, DQN, and A3C
Types of Reinforcement Learning
1.Positive
Positive reinforcement occurs when an event, brought about by a particular
behaviour, increases the strength and the frequency of that behaviour. In other
words, it has a positive effect on behaviour.
Advantages of positive reinforcement:
– Maximizes performance
– Sustains change for a long period of time
Disadvantages of positive reinforcement:
– Too much reinforcement can lead to an overload of states, which can
diminish the results
2.Negative
Negative reinforcement is defined as the strengthening of a behaviour because
a negative condition is stopped or avoided.
Advantages of negative reinforcement:
– Increases behaviour
– Encourages behaviour to meet a minimum standard of performance
Disadvantages of negative reinforcement:
– It only provides enough to meet the minimum behaviour
Learning Models of Reinforcement
Markov Decision Process
The following parameters are used to get a solution:
• Set of actions: A
• Set of states: S
• Reward: R
• Policy: π
• Value: V
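A minimal sketch of these five ingredients as Python data, for a toy two-state problem (the states, actions, and reward numbers are illustrative assumptions, not from the slides):

```python
# Toy MDP with the five ingredients listed above (illustrative values).
S = ["s0", "s1"]                     # set of states
A = ["stay", "move"]                 # set of actions
R = {("s0", "move"): 1.0,            # reward for each (state, action) pair
     ("s0", "stay"): 0.0,
     ("s1", "move"): 0.0,
     ("s1", "stay"): 2.0}
pi = {"s0": "move", "s1": "stay"}    # policy: state -> action
V = {s: 0.0 for s in S}              # value of each state, to be learned
```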
Q-Learning
Q-learning is a value-based method that supplies information about which
action an agent should take.
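For reference, the standard Q-learning update rule (not written out on the slide) is:
Q(s, a) ← Q(s, a) + α · [ R + γ · max_a' Q(s', a') − Q(s, a) ],
where α is the learning rate, γ the discount factor, and s' the state reached after taking action a.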
Let’s understand this method by the following example:
• There are five rooms in a building, connected by doors.
• The rooms are numbered 0 to 4.
• The outside of the building is treated as one big area, numbered 5.
• Doors from rooms 1 and 4 lead out of the building into area 5.
• Next, you need to associate a reward value with each door:
• Doors which lead directly to the goal (area 5) have a reward of 100.
• Doors which do not lead directly to the target room give zero reward.
• As doors are two-way, two arrows are assigned for each pair of connected rooms.
• Every arrow in the state diagram carries an instant reward value.
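Using the connectivity listed in the explanation below (0↔4, 1↔3, 1↔5, 2↔3, 3↔4, 4↔5, goal = 5), these door rewards can be sketched as a matrix; marking room pairs with no door as -1 is a common encoding for this example and is assumed here:

```python
import numpy as np

# Rows = current room, columns = next room; -1 = no door, 0 = door,
# 100 = door leading directly to the goal (area 5).
R = np.array([
    #  0    1    2    3    4    5
    [ -1,  -1,  -1,  -1,   0,  -1],   # room 0 connects to 4
    [ -1,  -1,  -1,   0,  -1, 100],   # room 1 connects to 3 and goal 5
    [ -1,  -1,  -1,   0,  -1,  -1],   # room 2 connects to 3
    [ -1,   0,   0,  -1,   0,  -1],   # room 3 connects to 1, 2, 4
    [  0,  -1,  -1,   0,  -1, 100],   # room 4 connects to 0, 3, goal 5
    [ -1,   0,  -1,  -1,   0, 100],   # area 5 connects to 1, 4, and itself
])
```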
Q-Learning
• Explanation:
• In the state diagram, each room represents a state.
• The agent's movement from one room to another represents an action.
• A state is drawn as a node, while the arrows show the actions.
For example, an agent traverses from room 2 to area 5. The states
reachable from each state are:
• Initial state = state 2
• State 2 -> state 3
• State 3 -> states (2, 1, 4)
• State 4 -> states (0, 5, 3)
• State 1 -> states (5, 3)
• State 0 -> state 4
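Putting the pieces together, here is a minimal Q-learning sketch for this rooms example, reusing the reward matrix R from the sketch above and the update rule quoted earlier. The episode count, α = 1.0, and γ = 0.8 are illustrative assumptions:

```python
import numpy as np

# Reward matrix from the previous sketch (rows = current room, cols = next room).
R = np.array([
    [ -1,  -1,  -1,  -1,   0,  -1],
    [ -1,  -1,  -1,   0,  -1, 100],
    [ -1,  -1,  -1,   0,  -1,  -1],
    [ -1,   0,   0,  -1,   0,  -1],
    [  0,  -1,  -1,   0,  -1, 100],
    [ -1,   0,  -1,  -1,   0, 100],
])

GOAL, GAMMA, ALPHA = 5, 0.8, 1.0      # goal state, discount, learning rate
Q = np.zeros((6, 6))                  # Q[s, a]: value of moving from room s to room a

for episode in range(500):
    s = np.random.randint(6)          # start each episode in a random room
    while s != GOAL:
        valid = np.where(R[s] >= 0)[0]   # rooms reachable through a door
        a = np.random.choice(valid)      # explore: pick a random valid door
        # Q(s,a) <- Q(s,a) + alpha * (R + gamma * max_a' Q(s',a') - Q(s,a))
        Q[s, a] += ALPHA * (R[s, a] + GAMMA * Q[a].max() - Q[s, a])
        s = a                            # the chosen room becomes the new state

# Read the greedy path from room 2 to the goal off the learned Q-table.
s, path = 2, [2]
while s != GOAL:
    s = int(np.argmax(Q[s]))
    path.append(s)
print("best path from room 2:", path)
```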
