This document summarizes a lecture on reinforcement learning and the Q-learning algorithm. Q-learning is a temporal difference learning method that allows an agent to learn an optimal policy without a model of the environment's dynamics: it learns an action-value function (Q-function) whose repeated Q-backups directly approximate the optimal Q-function Q*. Pseudocode is provided for the basic Q-learning algorithm, and examples show how Q-learning learns an optimal policy for navigating a maze.
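
The Q-backup mentioned above is the standard one-step Q-learning update; written out with a learning rate and discount factor (denoted \alpha and \gamma below, since the summary itself does not name them), it reads

    Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]

Because the bracketed temporal difference error uses only the observed reward r_{t+1} and the observed next state s_{t+1}, the update needs no transition or reward model, which is the model-free property the lecture emphasizes.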

Overview of the lecture series; introduction to reinforcement learning and recap of previous concepts, including the state-value and action-value functions.
Definition of a policy as a mapping from states to probability distributions over actions.
Outline of the main topics to be covered, including the Bellman equations, the optimal policy, and Q-learning.
Introduction to estimating expected returns from states using the state-value and action-value functions.
Introduction and manipulation of the Bellman equation used for policy evaluation.
Discussion of the Q-value Bellman equation for evaluating action-value functions (both evaluation equations are written out after this outline).
Methods for calculating value functions, either by solving the Bellman equations as a linear system or by iterative methods (a small iterative sketch follows this outline).
Example environment (Gridworld) to illustrate concepts of rewards in reinforcement learning.
Introduction to the approach of identifying the optimal policy in reinforcement learning.
Criteria for optimal policies based on their expected returns compared to other policies (stated formally after this outline).
Exploration of techniques to discover optimal policies within reinforcement learning.
Introduction to Q-learning, emphasizing learning through experience without needing a model.
Benefits of Q-learning, including adaptability and applicability without environment models.
Brief discussion on dynamic programming and its need for an accurate environment model.
Introduction to Temporal Difference learning, emphasizing its incremental updates without models.
Description of Q-backups in Q-learning; the learned action-value function directly approximates Q*.
Presentation of pseudocode for implementing Q-learning in reinforcement learning frameworks (a runnable sketch follows this outline).
Demonstration of Q-learning application in a 15x15 maze world with defined rewards.
Display of the initial policy used in the Q-learning implementation for the maze.
Results of Q-learning after 20 episodes showing changes in agent's performance.
Observations and results from the Q-learning application after 30 episodes.
Analysis and output of Q-learning's performance after 100 training episodes.
Performance evaluation of the Q-learning agent following 150 episodes of learning.
Assessment of Q-learning outcomes after 200 episodes, gauging improvements in policy.
Performance metrics from Q-learning after 250 episodes showcasing learned strategies.
Analysis of the agent's performance and learned behavior after training for 300 episodes.
Ongoing performance assessment of the Q-learning agent after 350 episodes.
Final performance insights from Q-learning after 400 episodes of training.
Discussion of balancing exploration and exploitation when learning optimal policies and value functions (see the epsilon-greedy choice in the sketch after this outline).
Preview of the next lecture focusing on reinforcement learning with Learning Classifier Systems.
Summary of the findings and reinforcement of the concepts discussed in the course on reinforcement learning.
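
For reference, the policy evaluation equations the outline refers to are the standard Bellman equations for the state-value and action-value functions of a fixed policy \pi (the transition probabilities P, rewards R, and discount \gamma are written in a generic notation here, since the outline does not fix one):

    V^\pi(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a) \left[ R(s, a, s') + \gamma V^\pi(s') \right]

    Q^\pi(s, a) = \sum_{s'} P(s' \mid s, a) \left[ R(s, a, s') + \gamma \sum_{a'} \pi(a' \mid s') Q^\pi(s', a') \right]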
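
The optimality criterion from the outline can be stated as: a policy \pi^* is optimal if its expected return is at least that of every other policy in every state, i.e. V^{\pi^*}(s) \ge V^{\pi}(s) for all s and all \pi. The corresponding Bellman optimality equation for Q^*, which Q-learning approximates, and the greedy policy it induces are (standard forms, assumed here):

    Q^*(s, a) = \sum_{s'} P(s' \mid s, a) \left[ R(s, a, s') + \gamma \max_{a'} Q^*(s', a') \right]

    \pi^*(s) = \arg\max_{a} Q^*(s, a)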
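
The outline mentions two ways of computing value functions: solving the Bellman equations as a linear system, or iterating the Bellman backup until it converges. Below is a minimal sketch of the iterative variant, assuming tabular NumPy arrays P (transition probabilities), R (transition rewards), and pi (a stochastic policy); these names and shapes are chosen for illustration and are not taken from the slides.

    import numpy as np

    def iterative_policy_evaluation(P, R, pi, gamma=0.9, tol=1e-8):
        """Iterate V <- sum_a pi(a|s) sum_s' P[s,a,s'] * (R[s,a,s'] + gamma * V[s']).

        P  : (S, A, S) transition probabilities
        R  : (S, A, S) rewards for each transition
        pi : (S, A) probability of taking action a in state s
        """
        S = P.shape[0]
        V = np.zeros(S)
        while True:
            # Expected reward plus discounted successor value, per state-action pair.
            Q = np.einsum("sat,sat->sa", P, R + gamma * V[None, None, :])
            # Average over the policy's action probabilities to get the new V.
            V_new = np.sum(pi * Q, axis=1)
            if np.max(np.abs(V_new - V)) < tol:
                return V_new
            V = V_new

The direct alternative solves the linear system (I - \gamma P^\pi) V = R^\pi in one step, e.g. with np.linalg.solve; the iterative form is cheaper per sweep when the state space is large.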
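
Finally, a compact, runnable version of the Q-learning pseudocode from the slides, applied to a small deterministic gridworld with an epsilon-greedy behaviour policy. The grid size, rewards, and hyperparameters below are illustrative placeholders, not the values used in the lecture's 15x15 maze.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy 5x5 gridworld: start in the top-left corner, +1 for reaching the
    # bottom-right corner, -0.01 per step, deterministic moves that stop at walls.
    N = 5
    ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
    GOAL = (N - 1, N - 1)

    def step(state, action):
        r, c = state
        dr, dc = ACTIONS[action]
        next_state = (min(max(r + dr, 0), N - 1), min(max(c + dc, 0), N - 1))
        if next_state == GOAL:
            return next_state, 1.0, True
        return next_state, -0.01, False

    def q_learning(episodes=400, alpha=0.1, gamma=0.95, epsilon=0.1):
        Q = np.zeros((N, N, len(ACTIONS)))
        for _ in range(episodes):
            state, done = (0, 0), False
            while not done:
                # Epsilon-greedy: take a random action with probability epsilon.
                if rng.random() < epsilon:
                    action = int(rng.integers(len(ACTIONS)))
                else:
                    action = int(np.argmax(Q[state]))
                next_state, reward, done = step(state, action)
                # One-step Q-backup toward the bootstrapped target.
                target = reward + (0.0 if done else gamma * np.max(Q[next_state]))
                Q[state][action] += alpha * (target - Q[state][action])
                state = next_state
        return Q

    Q = q_learning()
    greedy_policy = np.argmax(Q, axis=2)  # greedy action index for every cell

Extracting the greedy policy from Q at intermediate points mirrors the episode-by-episode snapshots (20 through 400 episodes) reported in the slides, and the epsilon parameter is one simple way to realize the exploration-exploitation balance discussed near the end of the lecture.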