This slide is a part of Introduction to Machine Learning course by Code Heroku.
Here is the recorded version of our Reinforcement Learning with OpenAI Gym tutorial: https://www.youtube.com/watch?v=3begG_s9lzg
Here is the link to Introduction to Machine Learning Course: http://www.codeheroku.com/course?course_id=1
You can watch all our upcoming and past workshops here: http://www.codeheroku.com
Subscribe to our YouTube channel: https://www.youtube.com/channel/UCL-_0RrZ3084Ea8Yavtcd9g
Follow our publication on Medium: https://medium.com/code-heroku
Visit our Facebook page: https://www.facebook.com/codeheroku
Reinforcement Learning with OpenAI Gym - Value Iteration Frozen Lake - Code Heroku
1. Please turn off your webcam
If you are joining from a mobile phone, be sure to click on "Join via Device Audio"
We are waiting for other participants to join
We will begin at 4:30 PM IST
14. Reinforcement Learning
Challenges
• Access to the Environment
• Delayed Reward (Temporal Credit Assignment)
• High Cost Actions
• Distribution of data changes with the choice of actions you take
• Efficient state representations?
• Good reward functions?
18. www.codeheroku.com Introduction to Machine Learning – Reinforcement Learning
Multi-Armed Bandit
• Unknown Reward Distribution
• Deterministic Actions
• Objective: Find a sequence of actions which will maximize total reward
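The bandit setup above can be sketched in a few lines of Python. The three payout probabilities below are illustrative assumptions (fixed but unknown to the agent); the agent estimates each arm's value with an incremental sample average:

```python
import random

# Hypothetical 3-armed bandit. The payout probabilities are fixed but
# unknown to the agent -- it only observes sampled rewards.
TRUE_PROBS = [0.2, 0.5, 0.8]

def pull(arm):
    """Sample a 0/1 reward from the arm's unknown distribution."""
    return 1.0 if random.random() < TRUE_PROBS[arm] else 0.0

def run(steps=10_000):
    counts = [0] * len(TRUE_PROBS)    # times each arm was pulled
    values = [0.0] * len(TRUE_PROBS)  # sample-average reward estimates
    total = 0.0
    for _ in range(steps):
        arm = random.randrange(len(TRUE_PROBS))  # pure uniform exploration
        reward = pull(arm)
        counts[arm] += 1
        # Incremental sample-average update: Q <- Q + (r - Q) / n
        values[arm] += (reward - values[arm]) / counts[arm]
        total += reward
    return values, total

random.seed(0)
estimates, total_reward = run()
# estimates approach TRUE_PROBS as each arm accumulates samples
```

With purely random pulls the estimates converge, but total reward stays near the average arm's payout, which motivates trading off exploration against exploitation.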
19. www.codeheroku.com Introduction to Machine Learning – Reinforcement Learning
Exploration vs. Exploitation
To approximate the values of actions, the agent must choose actions that are non-optimal to start with.
Once the agent has approximated the values, it can greedily pick the highest-value action.
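This trade-off is commonly handled with epsilon-greedy action selection: with probability epsilon the agent explores a random arm, otherwise it exploits its current best estimate. The arm probabilities and `epsilon=0.1` below are illustrative assumptions, not values from the session:

```python
import random

# Epsilon-greedy on a hypothetical 3-armed bandit. The arm payout
# probabilities and epsilon are illustrative assumptions.
TRUE_PROBS = [0.2, 0.5, 0.8]

def epsilon_greedy(steps=10_000, epsilon=0.1):
    counts = [0] * len(TRUE_PROBS)
    values = [0.0] * len(TRUE_PROBS)  # current value estimates Q(a)
    total = 0.0
    for _ in range(steps):
        if random.random() < epsilon:
            arm = random.randrange(len(TRUE_PROBS))                     # explore
        else:
            arm = max(range(len(TRUE_PROBS)), key=lambda a: values[a])  # exploit
        reward = 1.0 if random.random() < TRUE_PROBS[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
        total += reward
    return total / steps

random.seed(0)
avg_reward = epsilon_greedy()
# avg_reward typically ends up near the best arm's payout (0.8),
# well above the ~0.5 achieved by pulling arms uniformly at random
```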
In general, we saw that RL deals with making decisions under uncertainty, which is core to understanding intelligence and simulating it.
RL also deals with sequences of actions.
We often see a huge gap between the theoretical approach taught in universities and practical implementations. Throughout this course, as you may have noticed, we are trying to bridge that gap.
Y = F(X)
What happens when we do not know the consequences of our immediate actions?
Contrast with Supervised ML
Delayed Rewards / Sparse Signal
RL deals with uncertainty in environments, actions, and observations
Good reward functions are hard to design – e.g., for a conversational agent, or a treatment pathway for patients
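Since the session covers value iteration on FrozenLake, the core Bellman backup can be sketched on a tiny hand-coded MDP. The four-state chain below is an illustrative stand-in for the real FrozenLake grid; Gym exposes the real environment's transitions in the same `(prob, next_state, reward, done)` format via `env.P`:

```python
# Toy FrozenLake-style MDP: states 0-3 in a chain, actions 0 (left)
# and 1 (right). Stepping right from state 2 reaches the terminal
# state 3 and pays reward 1. P mimics Gym's env.P layout:
#   P[s][a] = [(prob, next_state, reward, done), ...]
P = {
    s: {
        0: [(1.0, max(s - 1, 0), 0.0, False)],                      # left
        1: [(1.0, min(s + 1, 3), 1.0 if s == 2 else 0.0, s == 2)],  # right
    }
    for s in range(3)
}
P[3] = {a: [(1.0, 3, 0.0, True)] for a in (0, 1)}  # terminal: absorbing

def value_iteration(P, gamma=0.9, theta=1e-8):
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            # Bellman optimality backup:
            #   V(s) = max_a sum_s' p * (r + gamma * V(s'))
            best = max(
                sum(p * (r + gamma * (0.0 if done else V[s2]))
                    for p, s2, r, done in P[s][a])
                for a in P[s]
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:   # stop when no state value changes meaningfully
            return V

V = value_iteration(P)
# V = {0: 0.81, 1: 0.9, 2: 1.0, 3: 0.0} for gamma = 0.9
```

Each state's value is the discounted value of the best path to the goal; on the real FrozenLake map the same loop works unchanged over Gym's `env.P`, with the `is_slippery` transitions supplying multiple `(prob, next_state, ...)` entries per action.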