Reinforcement Learning
Overview
Introduction to Reinforcement Learning
Chapter 1 – Reinforcement Learning: An Introduction
Imitation Learning Lecture Slides from CMU Deep Reinforcement Learning Course
What is Reinforcement Learning?
Exploration versus Exploitation
Reinforcement Learning Systems
Policy
Reward Signal
Value Function (1)
Value Function (2)
Model-free versus Model-based
On-policy versus Off-policy
Credit Assignment Problem
Reward Design
What is Deep Reinforcement Learning?
Finite Markov Decision Processes
Chapter 3 – Reinforcement Learning: An Introduction
Markov Decision Process (MDP)
Time Discounting
Agent-Environment Interaction (1)
Agent-Environment Interaction (2)
Action Selection
MDP Dynamics
State Transition Probabilities
Expected Rewards
State-Value Function (1)
State-Value Function (2)
Action-Value Function
Bellman Equation (1)
Bellman Equation (2)
Optimality
Temporal-Difference Learning
Chapter 6 – Reinforcement Learning: An Introduction
Playing Atari with Deep Reinforcement Learning
Asynchronous Methods for Deep Reinforcement Learning
David Silver’s Tutorial on Deep Reinforcement Learning
What is TD learning?
Value-based Reinforcement Learning
Update Rule for TD(0)
Update Rule Intuition
Tabular TD(0) Algorithm
SARSA – On-policy TD Control
SARSA Update Rule
SARSA Algorithm
Q-learning – Off-policy TD Control
One-step Q-learning Algorithm
Epsilon-greedy Policy
Deep Q-Networks (DQN)
Q-Networks
Experience Replay
State Representation
Q-Network Training
Loss Function Gradient Derivation
DQN Algorithm
Comments
Policy Gradient Methods
Chapter 13 – Reinforcement Learning: An Introduction
Policy Gradient Lecture Slides from David Silver’s Reinforcement Learning Course
David Silver’s Tutorial on Deep Reinforcement Learning
What are Policy Gradient Methods?
Policy-based Reinforcement Learning
Notation
Policy Approximation
Types of Policy Gradient Method
Finite Difference Policy Gradient
REINFORCE: Monte Carlo Policy Gradient
REINFORCE Properties
REINFORCE Algorithm
Actor-Critic Methods
One-step Actor-Critic Update Rules
One-step Actor-Critic Algorithm
Asynchronous Reinforcement Learning
Asynchronous Methods for Deep Reinforcement Learning
What is Asynchronous Reinforcement Learning?
Parallelism (1)
Parallelism (2)
No Experience Replay
Asynchronous Algorithms
Asynchronous one-step Q-learning
Exploration
Asynchronous one-step Q-learning Algorithm
Asynchronous one-step SARSA
n-step Q-learning
n-step Returns
Asynchronous n-step Q-learning Algorithm
A3C
Advantage Definition
A3C Algorithm
Summary

Reinforcement Learning and Deep Reinforcement Learning