Please turn off your webcam
If you are joining from a mobile phone
be sure to click on
Join via Device Audio
We are waiting for other participants to join
We will begin at 4:30 PM IST
Mihir Thakkar
Founder and Instructor
hello@codeheroku.com
Reinforcement Learning with
OpenAI Gym
SESSION OBJECTIVES
• Quick Recap
• Bellman’s Equations
• Value Iteration in OpenAI Gym
www.codeheroku.com Introduction to Machine Learning – Reinforcement Learning
RL Problem
Q Function
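For reference, the Q function is usually written in its Bellman optimality form. The notation below is an assumption (transition probability P, reward R, discount factor γ), since the slide's own symbols are not reproduced here:

```latex
Q^{*}(s, a) = \sum_{s'} P(s' \mid s, a)\,\bigl[ R(s, a, s') + \gamma \max_{a'} Q^{*}(s', a') \bigr],
\qquad
V^{*}(s) = \max_{a} Q^{*}(s, a)
```

In a deterministic grid world the sum has a single term, so Q(s, a) reduces to the immediate reward plus γ times the value of the state the action leads to.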
Quiz
Given the following Reward Table, estimate the value of Q(A3, East)
Quiz
Given the following Reward Table, estimate the value of Q(B3, North)
Given a Value Function
Extract Policy
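Given a value function, a greedy policy picks, in each state, the action with the highest expected one-step return. The following is a minimal sketch, not the course's own code. It assumes a transition table laid out as `P[state][action] -> list of (probability, next_state, reward, done)`, which is meant to mirror the table Gym's toy-text environments such as FrozenLake expose. The three-state MDP, the values in `V`, and the function names are hypothetical.

```python
# Hypothetical 3-state MDP: state 2 is terminal, action 1 moves "right".
P = {
    0: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 1, 0.0, False)]},
    1: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 2, 1.0, True)]},
    2: {0: [(1.0, 2, 0.0, True)], 1: [(1.0, 2, 0.0, True)]},
}

# Assumed (already converged) state values for gamma = 0.9.
V = {0: 0.9, 1: 1.0, 2: 0.0}


def q_value(P, V, s, a, gamma):
    """Expected return of taking action a in state s, then following V."""
    return sum(
        prob * (reward + gamma * V[next_s] * (not done))
        for prob, next_s, reward, done in P[s][a]
    )


def extract_policy(P, V, gamma=0.9):
    """Return the greedy policy: the best action in every state under V."""
    return {s: max(P[s], key=lambda a: q_value(P, V, s, a, gamma)) for s in P}


policy = extract_policy(P, V)
print(policy)
```

Ties are broken in favour of the first action listed, which only matters in states where every action has the same value (here, the terminal state).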
Value Iteration
OpenAI Gym
https://drive.google.com/file/d/16xMyG7bKrtT_6SId1kqLpR2vL1Km_us8/view?usp=sharing
https://github.com/codeheroku/Introduction-to-Machine-Learning/tree/master/Reinforcement%20Learning/RL2%20Value%20Iteration
Value Iteration Algorithm
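Value iteration repeatedly replaces each state's value with the best one-step lookahead until the values stop changing. The sketch below is an illustration under stated assumptions, not the notebook linked above. It uses a hand-written three-state MDP in the `P[state][action] -> list of (probability, next_state, reward, done)` layout, which is assumed to match the table Gym's FrozenLake exposes. The names `value_iteration`, `gamma`, and `tol` are hypothetical.

```python
# Hypothetical 3-state MDP: state 2 is terminal, reaching it pays reward 1.
P = {
    0: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 1, 0.0, False)]},
    1: {0: [(1.0, 0, 0.0, False)], 1: [(1.0, 2, 1.0, True)]},
    2: {0: [(1.0, 2, 0.0, True)], 1: [(1.0, 2, 0.0, True)]},
}


def q_value(P, V, s, a, gamma):
    """Expected return of taking action a in state s, then following V."""
    return sum(
        prob * (reward + gamma * V[next_s] * (not done))
        for prob, next_s, reward, done in P[s][a]
    )


def value_iteration(P, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Apply the Bellman optimality backup until the largest change is below tol."""
    V = {s: 0.0 for s in P}
    for _ in range(max_iters):
        delta = 0.0
        for s in P:
            best = max(q_value(P, V, s, a, gamma) for a in P[s])
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place update: later states see the new value
        if delta < tol:
            break
    return V


V = value_iteration(P)
print(V)
```

With a real Gym environment, the same function could be called on the environment's transition table in place of the hand-written `P`, provided that table follows the layout assumed here.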
Reinforcement Learning
Challenges
• Access to the Environment
• Delayed Reward (Temporal Credit Assignment)
• High Cost Actions
• The distribution of the data changes with the actions you choose
• Efficient state representations?
• Good reward functions?
Thanks
Markov Decision Process (MDP)
Multi-Armed Bandit
• Unknown Reward Distribution
• Deterministic Actions
• Objective: Find the sequence of actions
which will maximize total reward
Exploration Vs Exploitation
To approximate the values of actions, an agent must start by choosing
actions that are non-optimal.
Once an agent has approximated the values, it can greedily pick the
highest-value action.
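One common way to balance the two is an epsilon-greedy rule: with a small probability the agent explores a random action, otherwise it exploits its current estimates. The sketch below is an assumption about how this could look for a bandit, not code from the session. The Gaussian rewards, the arm means, and the name `run_bandit` are all hypothetical.

```python
import random


def run_bandit(true_means, steps=5000, epsilon=0.1, noise=0.1, seed=0):
    """Play an epsilon-greedy agent against arms with hidden mean rewards."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # how often each arm was pulled
    estimates = [0.0] * n_arms   # running average reward per arm

    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: pick any arm at random
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        reward = rng.gauss(true_means[arm], noise)
        counts[arm] += 1
        # Incremental average: move the estimate toward the new reward.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return estimates, counts


estimates, counts = run_bandit([0.2, 0.5, 0.9])
print(estimates, counts)
```

Setting `epsilon=0` makes the agent purely greedy, so it can lock onto whichever arm it happened to try first. A larger `epsilon` spends more pulls on arms already known to be worse.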
Iterative Averaging
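Assuming this slide refers to the incremental form of the sample mean, an estimate can be updated one reward at a time without storing past rewards: new_mean = old_mean + (reward - old_mean) / n. A minimal sketch with hypothetical names and made-up reward values:

```python
def update_mean(old_mean, n, new_value):
    """Mean of n values, given the mean of the first n - 1 and the n-th value."""
    return old_mean + (new_value - old_mean) / n


rewards = [1.0, 0.0, 0.0, 1.0, 1.0]  # made-up rewards for illustration
mean = 0.0
for n, reward in enumerate(rewards, start=1):
    mean = update_mean(mean, n, reward)

print(mean, sum(rewards) / len(rewards))
```

Replacing the `1 / n` step with a constant step size would weight recent rewards more heavily, which is a separate design choice from the plain average shown here.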

Reinforcement Learning with OpenAI Gym - Value Iteration Frozen Lake - Code Heroku

Editor's Notes

  1. In general, we saw that RL deals with making decisions under uncertainty, which is core to understanding intelligence and simulating it. RL also deals with sequences of actions.
  2. We often see a huge gap between the theoretical approach taught in universities and practical implementations. As you may have noticed, throughout this course we are trying to bridge that gap.
  3. Y = F(X). What happens when we do not know the consequences of our immediate actions? Contrast with supervised ML. Delayed rewards / sparse signal. RL deals with uncertainty in environments / actions / observations.
  4. Good Rewards – Conversational agent, Treatment pathway for patients