REINFORCEMENT LEARNING
3/31/2020 Shivani Saluja 1
INTRODUCTION
• Reinforcement learning (RL) is an area of machine learning.
• It is about taking suitable actions to maximize reward in a particular situation.
• It is employed by various software systems and machines to find the best possible behavior or path
to take in a specific situation.
• Reinforcement learning differs from supervised learning. In supervised learning, the training data
comes with an answer key, so the model is trained on the correct answers. In reinforcement learning
there is no answer key: the agent decides what to do to perform the given task and, in the absence
of a labeled training dataset, must learn from its own experience.
PROBLEM
• We have an agent and a reward, with many hurdles in between.
• The agent is supposed to find the best possible path to reach the reward.
• In this example the agent is a robot: its goal is to get the reward (the diamond) and avoid the
hurdles (the fire).
• The robot learns by trying the possible paths and then choosing the path that reaches the reward
with the fewest hurdles.
• Each right step gives the robot a reward, and each wrong step subtracts from its reward.
• The total reward is calculated when the robot reaches the final reward, the diamond.
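The reward bookkeeping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the grid layout, the reward values (-1 per ordinary step, -10 for fire, +10 for the diamond) and the function name `total_reward` are hypothetical and do not come from the slides.

```python
# Hypothetical 3x4 grid: 'S' = start, 'D' = diamond, 'F' = fire, '.' = free cell.
GRID = [
    "...D",
    ".F.F",
    "S...",
]

# Assumed reward values, chosen only for illustration.
STEP_REWARD = -1      # small cost per ordinary step, so shorter paths score higher
FIRE_REWARD = -10     # penalty for stepping into fire
DIAMOND_REWARD = 10   # reward for reaching the diamond

MOVES = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}


def total_reward(path, start=(2, 0)):
    """Sum the rewards collected while following `path` from `start`."""
    row, col = start
    total = 0
    for move in path:
        d_row, d_col = MOVES[move]
        new_row, new_col = row + d_row, col + d_col
        # A move that would leave the grid keeps the robot where it is.
        if 0 <= new_row < len(GRID) and 0 <= new_col < len(GRID[0]):
            row, col = new_row, new_col
        cell = GRID[row][col]
        if cell == "F":
            total += FIRE_REWARD
        elif cell == "D":
            total += DIAMOND_REWARD
            break  # the episode ends at the diamond
        else:
            total += STEP_REWARD
    return total


safe_path = total_reward("RRUUR")   # avoids both fire cells
risky_path = total_reward("RUURR")  # walks through one fire cell
```

With these assumed values, the path that avoids the fire collects a higher total than the path of the same length that passes through it, which is the comparison the robot uses to prefer one path over another.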
KEY POINTS
• Input: The input is an initial state from which the model starts.
• Output: There are many possible outputs, because there are a variety of solutions to a particular
problem.
• Training: Training is based on the input. The model returns a state, and the user decides whether
to reward or punish the model based on its output.
• The model continues to learn.
• The best solution is decided based on the maximum reward.
• Reinforcement learning can be thought of as a trial-and-error method of learning.
• The machine gets a reward or a penalty point for each action it performs.
• If the action is correct, the machine gains a reward point; if it is wrong, it gets a penalty
point.
• A reinforcement learning algorithm is all about the interaction between the environment and the
learning agent.
• The learning agent balances exploration and exploitation.
• Exploration is when the learning agent acts by trial and error; exploitation is when it performs an
action based on the knowledge it has gained from the environment. The environment rewards the agent
for every correct action; this reward is the reinforcement signal. To collect more reward, the agent
improves its knowledge of the environment and uses it to choose the next action.
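One common way to combine exploration and exploitation is an epsilon-greedy rule on top of tabular Q-learning. The slides do not name a specific algorithm, so the sketch below is a swapped-in illustration: the five-cell corridor environment, the hyperparameters, and all function names are assumptions made for the example.

```python
import random

N_STATES = 5            # corridor cells 0..4; cell 4 holds the reward (assumed layout)
ACTIONS = (-1, +1)      # move left or move right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2   # illustrative hyperparameters


def step(state, action):
    """Environment: return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def choose_action(q, state, rng):
    """Epsilon-greedy: explore with probability EPSILON, otherwise exploit."""
    if rng.random() < EPSILON:
        return rng.randrange(len(ACTIONS))          # exploration: random action
    best = max(q[state])
    # Exploitation: best known action, breaking ties at random so that an
    # untrained agent (all values equal) still moves around.
    return rng.choice([a for a, v in enumerate(q[state]) if v == best])


def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            a = choose_action(q, state, rng)
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning update: move the estimate toward reward + discounted best next value.
            target = reward + (0.0 if done else GAMMA * max(q[next_state]))
            q[state][a] += ALPHA * (target - q[state][a])
            state = next_state
    return q


q_table = train()
# Greedy action in each non-terminal cell after training.
greedy_policy = [ACTIONS[row.index(max(row))] for row in q_table[:-1]]
```

The reward from the environment is the only feedback the agent receives. Exploration is what lets it find the rewarding cell in the first place, and exploitation is what makes it head there directly once its estimates have improved.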
EXAMPLE
Let us see how Pavlov trained his dog. (Strictly speaking, his experiment is an example of classical
conditioning, but it shows how a repeated association with a reward shapes behavior.)
• Pavlov divided the training of his dog into four stages.
• In the first stage, Pavlov gave meat to the dog, and in response to the meat the dog started
salivating.
• In the next stage, he made a sound with a bell, but this time the dog did not respond.
• In the third stage, he tried to train the dog by ringing the bell and then giving it food. Seeing
the food, the dog started salivating.
• Eventually, the dog started salivating just after hearing the bell, even when no food was given,
because it had learned that whenever the master rings the bell, it gets food.
TYPES OF REINFORCEMENT
• Positive –
Positive reinforcement occurs when an event that follows a particular behavior increases the strength
and frequency of that behavior. In other words, it has a positive effect on the behavior.
• Advantages of positive reinforcement:
– Maximizes performance
– Sustains change for a long period of time
• Disadvantages of positive reinforcement:
– Too much reinforcement can lead to an overload of states, which can diminish the results
• Negative –
Negative reinforcement is the strengthening of a behavior because a negative condition is stopped or
avoided.
• Advantages of negative reinforcement:
– Increases behavior
– Helps enforce a minimum standard of performance
• Disadvantages of negative reinforcement:
– Provides only enough to meet the minimum behavior
REINFORCEMENT LEARNING
• Value-based:
• In a value-based reinforcement learning method, you try to maximize a value function V(s). The
value function is the long-term return the agent expects from the current state under policy π.
• Policy-based:
• In a policy-based RL method, you try to come up with a policy such that the action performed in
every state helps you gain the maximum reward in the future.
• Model-based:
• In a model-based RL method, you create a virtual model of each environment, and the agent learns
to perform in that specific environment.
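The value function V(s) from the value-based method can be written out explicitly. A common textbook form is shown below; it assumes a discount factor γ in [0, 1) and per-step rewards r_t, neither of which appears in the slides.

```latex
V^{\pi}(s) = \mathbb{E}_{\pi}\!\left[\, \sum_{t=0}^{\infty} \gamma^{t}\, r_{t+1} \;\middle|\; s_0 = s \right]
```

The expectation is taken over the trajectories produced by following policy π from state s. Under this reading, a value-based method looks for the policy whose value is largest in every state.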
APPLICATIONS
• RL can be used in robotics for industrial automation.
• RL can be used in machine learning and data processing.
• RL can be used to create training systems that provide custom instruction and
materials tailored to the requirements of students.
RL can be used in large environments in the following situations:
• A model of the environment is known, but an analytic solution is not available;
• Only a simulation model of the environment is given (the subject of simulation-based
optimization);[6]
• The only way to collect information about the environment is to interact with it.