LECTURE 21: REINFORCEMENT LEARNING
Objectives:
Definitions and Terminology
Agent-Environment Interaction
Markov Decision Processes
Value Functions
Bellman Equations
Resources:
RSAB/TO: The RL Problem
AM: RL Tutorial
RKVM: RL SIM
TM: Intro to RL
GT: Active Speech
Attributions
These slides were originally developed by R.S. Sutton and A.G. Barto, Reinforcement Learning: An Introduction. (They have been reformatted and slightly annotated to better integrate them into this course.)
The original slides have been incorporated into many machine learning courses, including Tim Oates’ Introduction to Machine Learning, which contains links to several good lectures on various topics in machine learning (and is where I first found these slides).
A slightly more advanced version of the same material is available as part of Andrew Moore’s excellent set of statistical data mining tutorials.
The objectives of this lecture are:
describe the RL problem;
present idealized form of the RL problem for which we have precise theoretical results;
introduce key components of the mathematics: value functions and Bellman equations;
describe trade-offs between applicability and mathematical tractability.
The Agent-Environment Interface
[Figure: the agent-environment interaction loop, showing the trajectory of states, actions, and rewards s_t, a_t, r_{t+1}, s_{t+1}, a_{t+1}, r_{t+2}, s_{t+2}, a_{t+2}, r_{t+3}, s_{t+3}, ...]
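As a rough companion to the figure, here is a minimal Python sketch of that interaction loop. The `env` and `policy` interfaces (`env.reset()`, `env.step()`, and a policy callable) are assumptions made for illustration only; they are not part of the original slides.

```python
def run_episode(env, policy, max_steps=100):
    """Step through the s_t, a_t, r_{t+1}, s_{t+1}, ... loop shown in the figure.

    Assumed (hypothetical) interfaces for this sketch:
      env.reset()   -> initial state
      env.step(a)   -> (next_state, reward, done)
      policy(s)     -> action to take in state s
    """
    state = env.reset()
    total_reward = 0.0
    for t in range(max_steps):
        action = policy(state)                   # agent selects a_t based on s_t
        state, reward, done = env.step(action)   # environment returns r_{t+1} and s_{t+1}
        total_reward += reward
        if done:                                 # a terminal state ends the episode
            break
    return total_reward
```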
The Agent Learns a Policy
Definition of a policy:
The policy at step t, π_t, is a mapping from states to action probabilities.
π_t(s, a) = probability that a_t = a when s_t = s (a minimal tabular sketch of such a policy appears below).
Reinforcement learning methods specify how the agent changes its policy as a result of experience.
Roughly, the agent’s goal is to get as much reward as it can over the long run.
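To make the policy definition above concrete, here is a minimal Python sketch of a stochastic policy stored as a table of action probabilities. The specific states and actions are hypothetical placeholders, not taken from the slides.

```python
import random

# A stochastic policy pi(s, a): for each state, a distribution over actions.
# The states "s0", "s1" and actions "left", "right" are illustrative only.
policy = {
    "s0": {"left": 0.8, "right": 0.2},
    "s1": {"left": 0.1, "right": 0.9},
}

def pi(state, action):
    """pi(s, a) = probability of taking `action` in `state`."""
    return policy[state][action]

def sample_action(state):
    """Draw a_t ~ pi(s_t, .) for the given state."""
    actions, probs = zip(*policy[state].items())
    return random.choices(actions, weights=probs)[0]

print(pi("s0", "left"))     # 0.8
print(sample_action("s1"))  # usually "right"
```

Reinforcement learning methods would then adjust the probabilities in such a table as the agent accumulates experience.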
Learning can occur in several ways:
Adaptation of classifier parameters based on prior and current data (e.g., many help systems now ask you “was this answer helpful to you”).
Selection of the most appropriate next training pattern during classifier training (e.g., active learning).
Common algorithm design issues include rate of convergence, bias vs. variance, adaptation speed, and batch vs. incremental adaptation.
Getting the Degree of Abstraction Right
Time steps need not refer to fixed intervals of real time (e.g., each new training pattern can be considered a time step).
Actions can be low level (e.g., voltages to motors), high level (e.g., accept a job offer), or “mental” (e.g., a shift in focus of attention). Actions can be rule-based (e.g., a user expresses a preference) or mathematics-based (e.g., assignment of a class or update of a probability).
States can be low-level “sensations”, or they can be abstract, symbolic, based on memory, or subjective (e.g., the state of being “surprised” or “lost”). States can be hidden or observable.
A reinforcement learning (RL) agent is not like a whole animal or robot, which consists of many RL agents as well as other components.
The environment is not necessarily unknown to the agent, only incompletely controllable.
Reward computation is in the agent’s environment because the agent cannot change it arbitrarily.
