2. Types of Machine Learning
Supervised Learning
Learns from labelled data to predict the
correct label for new inputs. Eg: Classifying
fraudulent transactions, estimating the
probability that a customer purchases a
product after seeing an online ad, etc.
Unsupervised Learning
No labelled data - Instead, it relies on the
underlying patterns in the data to find
relationships between data elements. Eg:
Segmenting customers for marketing based
on their demographic attributes, finding
product associations, etc.
3. What is Reinforcement Learning
Modelled on how humans learn: we take an action, receive a reward
for that action, and use it to decide which action to take next. Eg: A baby
learning to walk
There is no labelled data, and we do not look for relationships between the
data points - The agent simply observes the reward at every step and chooses
its next action based on the rewards received
The data is ordered in time, following this paradigm:
State→Action→Reward→State→Action
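The State→Action→Reward loop above can be sketched in a few lines. This is only an illustration under assumptions that are not in the slides: the corridor environment, `GOAL`, `step`, `random_policy` and `run_episode` are all hypothetical names, and the agent here acts at random rather than learning.

```python
import random

# Hypothetical toy environment: a corridor with positions 0..4 and a goal at 4.
GOAL = 4

def step(state, action):
    """Apply an action (-1 = left, +1 = right); return (next_state, reward)."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

def random_policy(state, rng):
    """A deliberately naive policy: ignore the state and pick a random action."""
    return rng.choice([-1, +1])

def run_episode(policy, rng, max_steps=50):
    """Run State -> Action -> Reward -> State ... and record each step."""
    state = 0
    trajectory = []
    for _ in range(max_steps):
        action = policy(state, rng)
        next_state, reward = step(state, action)
        trajectory.append((state, action, reward))
        state = next_state
        if state == GOAL:
            break
    return trajectory

rng = random.Random(0)
trajectory = run_episode(random_policy, rng)
for state, action, reward in trajectory[:5]:
    print(f"state={state} action={action:+d} reward={reward}")
```

A learning agent would replace `random_policy` with a policy that is updated from the recorded rewards.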
4. Applications of RL
Self Driving Cars
Online Ad Recommendations
Robotics
Chatbot
Medication dosing for patients
Stock Trading
Online Education
5. Components of Reinforcement Learning - Markov Decision Process
● State S_t: Condition of the environment at time t
● Agent: The model/robot that learns about
the environment and decides which action to take
● Action A_t: What the agent does at time t,
given the current state
● Policy π: Mapping from State → Action
● Reward R_t: Feedback received for the action
The central idea of reinforcement learning is to
maximize the expected cumulative reward
6. Markov Property
For a sequence {q_1, q_2, q_3, ..., q_n}:
P(q_n | q_{n-1}, q_{n-2}, ..., q_1) = P(q_n | q_{n-1})
Example: Under this assumption, India’s chance of winning tomorrow’s match depends only on
the last match that India played
“The future is independent of the past given the present”
- Markov
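As a rough illustration of the Markov property, the sketch below samples each outcome using only the previous one. The win/loss probabilities in `TRANSITIONS` are made up for the example, and `next_outcome` and `simulate` are hypothetical helper names.

```python
import random

# Hypothetical transition probabilities: P(next outcome | last outcome).
TRANSITIONS = {
    "win": {"win": 0.7, "loss": 0.3},
    "loss": {"win": 0.4, "loss": 0.6},
}

def next_outcome(current, rng):
    """Sample the next outcome from the current one only (Markov property)."""
    outcomes = list(TRANSITIONS[current])
    weights = [TRANSITIONS[current][o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]

def simulate(start, n_steps, rng):
    """Generate a sequence q_1, q_2, ... where each q_n depends only on q_{n-1}."""
    sequence = [start]
    for _ in range(n_steps):
        sequence.append(next_outcome(sequence[-1], rng))
    return sequence

rng = random.Random(42)
print(simulate("win", 10, rng))
```

Note that `next_outcome` never looks at anything earlier than `sequence[-1]`, which is exactly what the equation states.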
7. Basic working of Reinforcement Learning
● Action Space: Left, Right, Jump
● State: Position of Mario, position of the
enemy, places where the reward is, etc.
● Reward: Coins
● Discounted cumulative expected reward: E[G_t], where
G_t = R_{t+1} + γ R_{t+2} + γ^2 R_{t+3} + ...
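A minimal sketch of how a discounted cumulative reward could be computed for one run of the game. The coin sequence and the discount factor 0.9 are assumptions chosen for illustration, and `discounted_return` is a hypothetical helper; the expected value would be the average of this quantity over many runs.

```python
def discounted_return(rewards, gamma):
    """Return rewards[0] + gamma * rewards[1] + gamma**2 * rewards[2] + ..."""
    total = 0.0
    # Work backwards so each step needs only one multiply and one add.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total

# Hypothetical run: Mario collects a coin on steps 1, 4 and 5.
coins = [1, 0, 0, 1, 1]
print(discounted_return(coins, gamma=0.9))
```

With a discount factor below 1, coins collected sooner count for more than coins collected later.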
8. Types of Reinforcement Learning Algorithms
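Multi-Armed Bandits:
● Used in A/B testing of marketing
ads, actual drug vs placebo comparisons
in clinical trials, etc.
● Explore-Exploit Dilemma: try new options to
learn more, or stick with the best option found so far
● Epsilon Greedy: with probability ε pick a random
option (explore), otherwise pick the best-known option (exploit)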
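The epsilon-greedy idea can be sketched as follows. This is a simplified illustration, not a production A/B testing setup: the click-through rates in the usage line are invented, rewards are assumed to be 0 or 1, and `epsilon_greedy_bandit` is a hypothetical function name.

```python
import random

def epsilon_greedy_bandit(true_probs, epsilon, n_rounds, rng):
    """Play a Bernoulli bandit; return pull counts and estimated reward rates."""
    n_arms = len(true_probs)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: pick any arm at random
        else:
            # exploit: pick the arm with the highest estimate so far
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = 1.0 if rng.random() < true_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental mean: move the estimate towards the observed reward.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return counts, estimates

# Hypothetical click-through rates for three ad variants.
rng = random.Random(0)
counts, estimates = epsilon_greedy_bandit([0.05, 0.10, 0.20], 0.1, 5000, rng)
print(counts, [round(e, 3) for e in estimates])
```

A larger `epsilon` spends more rounds exploring; a smaller one commits to the current best arm sooner, at the risk of committing to the wrong one.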
9. Types of Reinforcement Learning Algorithms
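Temporal Difference (TD) Learning
Value of a state V(S_t): Tells us how
good it is to be in state S_t at time t
Cumulative Discounted Reward:
G_t = R_{t+1} + γ R_{t+2} + γ^2 R_{t+3} + γ^3 R_{t+4} + ...
where γ (0 ≤ γ ≤ 1) is the discount factor
TD(1):
V(S_t) ← V(S_t) + α (G_t - V(S_t))
where α is the learning rate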
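The TD(1) update above can be sketched for a single finished episode. The three-state episode, α = 0.5 and γ = 0.9 are assumptions for illustration, and `td1_update` is a hypothetical helper; every visit to a state is updated with the full return that followed it.

```python
def td1_update(values, episode, alpha, gamma):
    """Apply V(S_t) <- V(S_t) + alpha * (G_t - V(S_t)) for each step.

    episode is a list of (state, reward) pairs, where reward is R_{t+1},
    the reward received after acting in state S_t.
    """
    g = 0.0
    # Walk backwards so G_t = R_{t+1} + gamma * G_{t+1} is built up incrementally.
    for state, reward in reversed(episode):
        g = reward + gamma * g
        values[state] += alpha * (g - values[state])
    return values

# Hypothetical episode: A -> B -> C, with a reward of 1 at the end.
values = {"A": 0.0, "B": 0.0, "C": 0.0}
episode = [("A", 0.0), ("B", 0.0), ("C", 1.0)]
td1_update(values, episode, alpha=0.5, gamma=0.9)
print(values)
```

States closer to the reward end up with higher values, since their return G_t is discounted less.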
10. Learning Resources
● David Silver Reinforcement Learning Videos
● Sutton and Barto - Reinforcement Learning: An Introduction
● Prof. Balaraman Ravindran Videos on Reinforcement Learning
● GitHub repo: Awesome RL
● Deep RL Bootcamp Lectures