We will begin at 4:30 PM IST
Mihir Thakkar
Founder and Instructor
hello@codeheroku.com
Introduction to Reinforcement Learning
SESSION OBJECTIVES
• Introduction to RL
• Use Cases
• Formalization
• Multi Arm Bandit in Python
www.codeheroku.com Introduction to Machine Learning – Reinforcement Learning
Quiz
Which is the next best move to make for our robot?
Why Reinforcement Learning?
• Closer to Real World
• Deals with Stochastic Nature of Environment
• Understand Intelligence
• Replicate Intelligence
https://drive.google.com/file/d/1gOrJu2svliyEnlIIeItHeBEd1rT3Vhob/view?usp=sharing
RL Model
Quiz:
Define States, Actions and Rewards
Markov Decision Process (MDP)
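As a sketch of the usual textbook formalization, an MDP can be written as a tuple of states, actions, transition probabilities, a reward function, and a discount factor. The symbols below are assumptions chosen for illustration and are not taken from the slides.

```latex
\mathcal{M} = (\mathcal{S}, \mathcal{A}, P, R, \gamma), \qquad
P(s' \mid s, a) = \Pr\left(S_{t+1} = s' \mid S_t = s,\, A_t = a\right), \qquad
\gamma \in [0, 1]
```

The "Markov" part is the assumption that the next state depends only on the current state and action, not on the earlier history.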
Our Goal
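One common way to state the goal, given here as an assumption rather than as the slide's own formula, is to find a policy that maximizes the expected discounted sum of future rewards. Here R denotes the reward received at each step, gamma the discount factor, and pi a policy (a rule for choosing actions).

```latex
G_t = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}, \qquad
\pi^{*} = \arg\max_{\pi} \; \mathbb{E}_{\pi}\left[ G_t \right], \qquad
0 \le \gamma < 1
```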
Multi Arm Bandit
• Unknown Reward Distribution
• Deterministic Actions
• Objective: Find the sequence of actions that maximizes total reward
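The bullets above can be sketched as a tiny environment. This is a minimal illustration, not the session's own code: the class name `GaussianBandit`, the Gaussian reward noise, and the example arm means are all assumptions. The action is deterministic in the sense that pulling arm `i` always pulls arm `i`; only the reward is random, and its distribution is hidden from the agent.

```python
import random


class GaussianBandit:
    """A k-armed bandit (hypothetical sketch).

    Each arm pays a reward drawn from a Gaussian whose mean the agent
    does not know. Gaussian noise is an assumption for illustration.
    """

    def __init__(self, true_means, noise_std=1.0, seed=None):
        self.true_means = list(true_means)  # hidden from the agent
        self.noise_std = noise_std
        self.rng = random.Random(seed)

    @property
    def n_arms(self):
        return len(self.true_means)

    def pull(self, arm):
        """Pull the chosen arm and return one noisy reward."""
        return self.rng.gauss(self.true_means[arm], self.noise_std)


# Example: three arms; the agent only ever sees sampled rewards.
bandit = GaussianBandit([0.2, 0.5, 0.9], seed=0)
rewards = [bandit.pull(2) for _ in range(5)]
```

The agent's task is then to decide which arm to pull at each step, using only the rewards it has observed so far.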
Iterative Averaging
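A minimal sketch of iterative (incremental) averaging, assuming the usual running-mean update: the estimate moves toward each new reward by 1/n of the difference, so no list of past rewards has to be stored. The function name `update_mean` and the sample rewards are made up for illustration.

```python
def update_mean(old_mean, count, new_reward):
    """Fold one more reward into a running mean.

    `count` is the number of rewards seen so far, including `new_reward`.
    Update rule: new_mean = old_mean + (new_reward - old_mean) / count
    """
    return old_mean + (new_reward - old_mean) / count


# Illustrative rewards observed from a single arm.
rewards = [1.0, 0.0, 1.0, 1.0]

estimate = 0.0
for n, reward in enumerate(rewards, start=1):
    estimate = update_mean(estimate, n, reward)

# The running estimate agrees with the ordinary batch mean (up to rounding).
batch_mean = sum(rewards) / len(rewards)
```

The same update can be kept per action, which gives the agent a value estimate for every arm of the bandit.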
Exploration Vs Exploitation
To approximate the values of actions, the agent must at first choose actions that may be non-optimal.
Once the agent has approximated the values, it can greedily pick the highest-value action.
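One common way to balance the two, shown here as a hedged sketch, is an epsilon-greedy rule: with a small probability the agent picks a random arm (exploration), otherwise it picks the arm with the highest current estimate (exploitation). The slides do not name this rule; the function name, the value of `epsilon`, the Gaussian rewards, and the arm means below are assumptions for illustration.

```python
import random


def epsilon_greedy_bandit(true_means, steps=2000, epsilon=0.1, seed=0):
    """Run an epsilon-greedy agent on a Gaussian bandit (hypothetical setup).

    Returns the per-arm value estimates, per-arm pull counts,
    and the total reward collected.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms  # running mean reward per arm
    counts = [0] * n_arms       # number of pulls per arm
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random arm, even if it currently looks worse.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest estimated value.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Reward is noisy; the agent never sees true_means directly.
        reward = rng.gauss(true_means[arm], 1.0)

        # Incremental mean update for the chosen arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return estimates, counts, total_reward


estimates, counts, total = epsilon_greedy_bandit([0.2, 0.5, 0.9])
```

With `epsilon=0` the agent never explores and can lock onto a poor arm; with `epsilon=1` it never uses what it has learned. Values in between trade some short-term reward for better estimates.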
Reinforcement Learning Challenges
• Access to the environment
• Delayed Reward (Temporal Credit Assignment)
• High Cost Actions
• Distribution of data changes with the choice of actions you take
• Efficient state representations? What constitutes a good state?
• Good reward functions?
Thanks
Multi Arm Bandit
https://drive.google.com/file/d/1gql13NuNpRyEnpJJOUnIAspm2k4krQ9b/view?usp=sharing
https://github.com/codeheroku/Introduction-to-Machine-Learning/tree/master/Reinforcement%20Learning/RL1%20Multiarm%20Bandit

Introduction to Reinforcement Learning - Code Heroku