2. Introduction
• Reinforcement learning is the art of optimal decision making
• It is the training of machine learning models to make a
sequence of decisions
• An RL agent is able to perceive and interpret its
environment, take actions, and learn through trial and
error.
• Human involvement is limited to changing
the environment
4. Main points in Reinforcement learning
• Input: The input is an initial state from which
the model will start
• Output: There are many possible outputs, as there is a
variety of solutions to a particular problem
• Training: The training is based upon the input. The
model will return a state, and the user will decide to
reward or punish the model based on its output.
• The model continues to learn.
• The best solution is decided based on the maximum
reward.
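The input/output/training points above can be sketched as a minimal loop in Python. The environment (`step`) and the goal state are illustrative placeholders, not from the original slides:

```python
import random

def step(state, action):
    """Hypothetical environment: states 0-4, reward 1 for reaching goal state 4."""
    next_state = max(0, min(4, state + action))
    reward = 1 if next_state == 4 else 0   # reward (1) or punish (0) the output
    done = next_state == 4
    return next_state, reward, done

state = 0               # Input: the initial state the model starts from
total_reward = 0
while True:
    action = random.choice([-1, 1])        # the model proposes an action
    state, reward, done = step(state, action)
    total_reward += reward                 # feedback on the output
    if done:                               # best solution = maximum reward
        break
```

The loop keeps running, accumulating feedback, until the maximum-reward state is reached.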
5. Example : We have an agent and a reward, with many
hurdles in between. The agent is supposed to find the
best possible path to reach the reward.
6. Difference between Reinforcement learning and Supervised learning:
Reinforcement learning
• Reinforcement learning is all
about making decisions
sequentially.
• In Reinforcement learning each
decision depends on the previous ones,
so labels are given to sequences of
dependent decisions.
• Example: Chess game
Supervised learning
• In Supervised learning the
decision is made on the initial
input, i.e. the input given at the start.
• In Supervised learning the decisions
are independent of each other, so
labels are given to each decision.
• Example: Object recognition
7. Applications
• In Self -Driving Cars
• In Industry Automation
• In Trading and Finance
• In Natural Language Processing
• In Healthcare
• In Engineering
• In News Recommendation
8. Reinforcement Learning Algorithms
Value-Based:
• In a value-based Reinforcement Learning method, you try to
maximize a value function V(s). In this method, the agent expects a
long-term return of the current state under policy π.
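The value V(s) above can be sketched as the expected discounted return from state s. The discount factor and the reward sequence below are illustrative placeholders:

```python
# Sketch: V(s) as the discounted long-term return from state s.
gamma = 0.9                 # assumed discount factor
rewards = [1, 0, 0, 2]      # hypothetical rewards collected from state s onward

# V(s) = r_0 + gamma*r_1 + gamma^2*r_2 + ...
V_s = sum(gamma**t * r for t, r in enumerate(rewards))
print(round(V_s, 3))        # 1 + 0.9**3 * 2 = 2.458
```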
Policy-based:
• You try to come up with a policy such that the action performed in every
state helps you to gain maximum reward in the future.
Two types of policy-based methods are:
Deterministic: For any state, the same action is produced by the policy π.
Stochastic: Every action has a certain probability, given by the
Stochastic Policy: π(a|s) = P[A_t = a | S_t = s]
Model-Based:
• In this Reinforcement Learning method, you need to create a virtual model
for each environment. The agent learns to perform in that specific
environment.
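The deterministic/stochastic distinction above can be sketched as follows. The states, actions, and probability tables are illustrative placeholders:

```python
import random

def deterministic_policy(state):
    # For any state, the same action is always produced by the policy.
    return "right" if state >= 0 else "left"

# Stochastic policy: pi(a | s) = P[A_t = a | S_t = s]
stochastic_pi = {
    0: {"left": 0.3, "right": 0.7},
    1: {"left": 0.5, "right": 0.5},
}

def stochastic_policy(state):
    # Sample an action according to its conditional probability.
    probs = stochastic_pi[state]
    actions = list(probs)
    weights = [probs[a] for a in actions]
    return random.choices(actions, weights=weights)[0]

print(deterministic_policy(0))   # always "right"
```

Note that the probabilities for each state sum to 1, as required of a conditional distribution over actions.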
10. Types of Reinforcement Learning
1.Positive
Positive Reinforcement occurs when an event, caused by a particular
behaviour, increases the strength and frequency of that behaviour. In other
words, it has a positive effect on behaviour.
Advantages of positive reinforcement:
– Maximizes performance
– Sustains change for a long period of time
Disadvantages of positive reinforcement:
– Too much reinforcement can lead to an overload of states, which can diminish
the results
2.Negative
Negative Reinforcement is defined as the strengthening of a behaviour because a
negative condition is stopped or avoided.
Advantages of negative reinforcement:
– Increases behaviour
– Helps maintain a minimum standard of performance
Disadvantages of negative reinforcement:
– It only provides enough to meet the minimum standard of behaviour
11. Learning Models of Reinforcement
Markov Decision Process
The following parameters are used to get a solution:
• Set of actions- A
• Set of states -S
• Reward- R
• Policy- π
• Value- V
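The MDP parameters listed above can be sketched as a small container. The concrete states, actions, rewards, and values below are illustrative placeholders:

```python
from dataclasses import dataclass

@dataclass
class MDP:
    states: set     # S: set of states
    actions: set    # A: set of actions
    reward: dict    # R: (state, action) -> reward
    policy: dict    # pi: state -> action
    value: dict     # V: state -> estimated long-term return

mdp = MDP(
    states={"s0", "s1"},
    actions={"stay", "move"},
    reward={("s0", "move"): 1, ("s0", "stay"): 0,
            ("s1", "move"): 0, ("s1", "stay"): 0},
    policy={"s0": "move", "s1": "stay"},
    value={"s0": 1.0, "s1": 0.0},
)
```

Solving the MDP means finding the policy π that maximizes the value V in every state.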
12. Q-Learning
Q-learning is a value-based method of supplying information to inform which action
an agent should take.
Let’s understand this method by the following example:
• There are five rooms in a building which are connected by doors.
• Each room is numbered 0 to 4
• The outside of the building can be thought of as one big area (5)
• Doors number 1 and 4 lead into the building from area 5
• Next, you need to associate a reward value with each door:
• Doors which lead directly to the goal have a reward of 100
• Doors which are not directly connected to the target room give zero reward
• As doors are two-way, two arrows are assigned for each room
• Every arrow carries an instant reward value
13. Q-Learning
• Explanation:
• Each room represents a state
• The agent's movement from one room to another represents an action
• A state can be described as a node, while the arrows show
the actions.
For example, suppose an agent traverses from room number 2 to 5.
The possible moves are:
• Initial state = state 2
• State 2 -> state 3
• State 3 -> states (2, 1, 4)
• State 4 -> states (0, 5, 3)
• State 1 -> states (5, 3)
• State 0 -> state 4
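The rooms example can be sketched as a small Q-learning loop. The reward matrix follows the door layout and rewards described above; the discount factor (0.8) and episode count are assumed, not from the original slides:

```python
import numpy as np

# Reward matrix for the rooms example (states 0-4 are rooms, 5 is outside/goal).
# R[s, a] = -1 means no door from s to a; doors into state 5 reward 100, others 0.
R = np.array([
    [-1, -1, -1, -1,  0, -1],   # room 0 connects to room 4
    [-1, -1, -1,  0, -1, 100],  # room 1 connects to rooms 3 and 5
    [-1, -1, -1,  0, -1, -1],   # room 2 connects to room 3
    [-1,  0,  0, -1,  0, -1],   # room 3 connects to rooms 1, 2, 4
    [ 0, -1, -1,  0, -1, 100],  # room 4 connects to rooms 0, 3, 5
    [-1,  0, -1, -1,  0, 100],  # state 5 (goal) connects to 1, 4, itself
])

gamma = 0.8                    # assumed discount factor
Q = np.zeros((6, 6))           # action-value table
rng = np.random.default_rng(0)

for _ in range(1000):
    s = int(rng.integers(0, 6))            # random start state for the episode
    while s != 5:
        actions = np.where(R[s] >= 0)[0]   # doors available from state s
        a = int(rng.choice(actions))       # explore a random door
        # Q-learning update: Q(s, a) = R(s, a) + gamma * max_a' Q(s', a')
        Q[s, a] = R[s, a] + gamma * Q[a].max()
        s = a                              # taking door a lands in room a

# Greedy path from room 2 to the goal, following the learned Q-values:
s, path = 2, [2]
while s != 5:
    s = int(np.argmax(Q[s]))
    path.append(s)
print(path)
```

After training, the greedy path follows the maximum Q-value at each room until it reaches the goal state 5, matching the transitions listed above.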