This slide is a part of Introduction to Machine Learning course by Code Heroku.
Here is the recorded version of our Reinforcement Learning with OpenAI Gym tutorial: https://www.youtube.com/watch?v=3begG_s9lzg
Here is the link to Introduction to Machine Learning Course: http://www.codeheroku.com/course?course_id=1
You can watch all our upcoming and past workshops here: http://www.codeheroku.com
Subscribe to our YouTube channel: https://www.youtube.com/channel/UCL-_0RrZ3084Ea8Yavtcd9g
Follow our publication on Medium: https://medium.com/code-heroku
Visit our Facebook page: https://www.facebook.com/codeheroku
Reinforcement Learning with OpenAI Gym - Value Iteration Frozen Lake - Code Heroku
1. Please turn off your webcam
If you are joining from a mobile phone, be sure to click on "Join via Device Audio"
We are waiting for other participants to join
We will begin at 4:30 PM IST
14. Reinforcement Learning
Challenges
• Access to the Environment
• Delayed Reward (Temporal Credit Assignment)
• High Cost Actions
• Distribution of data changes with the choice of actions you take
• Efficient state representations?
• Good reward functions?
18. www.codeheroku.com Introduction to Machine Learning – Reinforcement Learning
Multi-Armed Bandit
• Unknown Reward Distribution
• Deterministic Actions
• Objective: Find a sequence of actions which will maximize total reward
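The bandit setup above can be sketched in a few lines of Python. The three payout probabilities below are illustrative assumptions (fixed but unknown to the agent); the agent estimates each arm's value with an incremental sample average:

```python
import random

# Hypothetical 3-armed bandit. The payout probabilities are fixed but
# unknown to the agent -- it only observes sampled rewards.
TRUE_PROBS = [0.2, 0.5, 0.8]

def pull(arm):
    """Sample a 0/1 reward from the arm's unknown distribution."""
    return 1.0 if random.random() < TRUE_PROBS[arm] else 0.0

def run(steps=10_000):
    counts = [0] * len(TRUE_PROBS)    # times each arm was pulled
    values = [0.0] * len(TRUE_PROBS)  # sample-average reward estimates
    total = 0.0
    for _ in range(steps):
        arm = random.randrange(len(TRUE_PROBS))  # pure uniform exploration
        reward = pull(arm)
        counts[arm] += 1
        # Incremental sample-average update: Q <- Q + (r - Q) / n
        values[arm] += (reward - values[arm]) / counts[arm]
        total += reward
    return values, total

random.seed(0)
estimates, total_reward = run()
# estimates approach TRUE_PROBS as each arm accumulates samples
```

With purely random pulls the estimates converge, but total reward stays near the average arm's payout, which motivates trading off exploration against exploitation.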
19. www.codeheroku.com Introduction to Machine Learning – Reinforcement Learning
Exploration vs. Exploitation
To approximate the values of actions, the agent must choose actions that are non-optimal to start with.
Once the agent has approximated the values, it can greedily pick the highest-value action.
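This trade-off is commonly handled with epsilon-greedy action selection: with probability epsilon the agent explores a random arm, otherwise it exploits its current best estimate. The arm probabilities and `epsilon=0.1` below are illustrative assumptions, not values from the session:

```python
import random

# Epsilon-greedy on a hypothetical 3-armed bandit. The arm payout
# probabilities and epsilon are illustrative assumptions.
TRUE_PROBS = [0.2, 0.5, 0.8]

def epsilon_greedy(steps=10_000, epsilon=0.1):
    counts = [0] * len(TRUE_PROBS)
    values = [0.0] * len(TRUE_PROBS)  # current value estimates Q(a)
    total = 0.0
    for _ in range(steps):
        if random.random() < epsilon:
            arm = random.randrange(len(TRUE_PROBS))                     # explore
        else:
            arm = max(range(len(TRUE_PROBS)), key=lambda a: values[a])  # exploit
        reward = 1.0 if random.random() < TRUE_PROBS[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
        total += reward
    return total / steps

random.seed(0)
avg_reward = epsilon_greedy()
# avg_reward typically ends up near the best arm's payout (0.8),
# well above the ~0.5 achieved by pulling arms uniformly at random
```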
In general, we saw that RL deals with making decisions under uncertainty, which is core to understanding intelligence and simulating it.
RL also deals with sequences of actions.
We often see a huge gap between the theoretical approach taught in universities and practical implementations. Throughout this course, as you may have noticed, we are trying to bridge that gap.
Y = F(X)
What happens when we do not know the consequences of our immediate actions?
Contrast with Supervised ML
Delayed Rewards / Sparse Signal
RL deals with uncertainty in environments, actions, and observations
Good reward functions are hard to design – e.g., for a conversational agent, or a treatment pathway for patients
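Since the session covers value iteration on FrozenLake, the core Bellman backup can be sketched on a tiny hand-coded MDP. The four-state chain below is an illustrative stand-in for the real FrozenLake grid; Gym exposes the real environment's transitions in the same `(prob, next_state, reward, done)` format via `env.P`:

```python
# Toy FrozenLake-style MDP: states 0-3 in a chain, actions 0 (left)
# and 1 (right). Stepping right from state 2 reaches the terminal
# state 3 and pays reward 1. P mimics Gym's env.P layout:
#   P[s][a] = [(prob, next_state, reward, done), ...]
P = {
    s: {
        0: [(1.0, max(s - 1, 0), 0.0, False)],                      # left
        1: [(1.0, min(s + 1, 3), 1.0 if s == 2 else 0.0, s == 2)],  # right
    }
    for s in range(3)
}
P[3] = {a: [(1.0, 3, 0.0, True)] for a in (0, 1)}  # terminal: absorbing

def value_iteration(P, gamma=0.9, theta=1e-8):
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            # Bellman optimality backup:
            #   V(s) = max_a sum_s' p * (r + gamma * V(s'))
            best = max(
                sum(p * (r + gamma * (0.0 if done else V[s2]))
                    for p, s2, r, done in P[s][a])
                for a in P[s]
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:   # stop when no state value changes meaningfully
            return V

V = value_iteration(P)
# V = {0: 0.81, 1: 0.9, 2: 1.0, 3: 0.0} for gamma = 0.9
```

Each state's value is the discounted value of the best path to the goal; on the real FrozenLake map the same loop works unchanged over Gym's `env.P`, with the `is_slippery` transitions supplying multiple `(prob, next_state, ...)` entries per action.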