1
(Module-2)
Markov Decision Process (MDP)
2
Markov Decision Process (MDP)
• An MDP is a mathematical framework for modeling
sequential decision-making problems under uncertainty.
• It assumes that the environment has the Markov
property, meaning the future depends only on the
current state and action, not on the past history.
• MDPs are used to model such problems across many domains, including robotics, game playing, and resource allocation (a toy example is sketched below).
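Concretely, an MDP is specified by the tuple (S, A, P, R, γ): a set of states, a set of actions, transition probabilities P(s' | s, a), a reward function R(s, a), and a discount factor γ. The Python sketch below writes down a toy two-state, two-action MDP of this form; the state names, probabilities, and rewards are invented purely for illustration.

# Toy MDP specified as (S, A, P, R, gamma); all numbers are illustrative.
states = ["high_battery", "low_battery"]
actions = ["search", "recharge"]

# P[(s, a)][s'] = P(s' | s, a). The Markov property means these
# probabilities depend only on the current state and action.
P = {
    ("high_battery", "search"):   {"high_battery": 0.7, "low_battery": 0.3},
    ("high_battery", "recharge"): {"high_battery": 1.0},
    ("low_battery",  "search"):   {"high_battery": 0.1, "low_battery": 0.9},
    ("low_battery",  "recharge"): {"high_battery": 1.0},
}

# R[(s, a)]: expected immediate reward for taking action a in state s.
R = {
    ("high_battery", "search"): 2.0, ("high_battery", "recharge"): 0.0,
    ("low_battery",  "search"): 1.0, ("low_battery",  "recharge"): 0.0,
}

gamma = 0.9  # discount factor, 0 <= gamma < 1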
3
Key Equations:
Note: The goal in an MDP is to find an optimal policy (π*) that maximizes the expected
cumulative discounted reward over time.
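Written out, that objective is (in the standard formulation):
π* = argmax_π E_π [ Σ_{t=0}^{∞} γ^t * R(s_t, a_t) ],   where 0 ≤ γ < 1 is the discount factor.
The discount factor trades off immediate against future rewards and keeps the infinite sum finite.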
4
Bellman Equation in MDP
The Bellman equation is a cornerstone of Markov Decision
Processes (MDPs), providing a powerful tool for understanding
and optimizing sequential decision-making under uncertainty.
The standard Bellman equation expresses the value of a state, V(s), as the expected return of taking the best action from that state, combining the immediate reward with the discounted value of the next state reached:
V(s) = max_a [ R(s, a) + γ * Σ_{s'} P(s' | s, a) * V(s') ]
5
V(s'): Expected value of the next state s'
The Bellman equation uses the value of the next state, V(s'), to estimate the value of the current state, V(s), together with the immediate reward R(s, a) obtained by taking action a in state s.
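A quick worked instance of this backup (the numbers are made up purely for illustration): suppose taking action a in state s yields R(s, a) = 1 and leads to s'_1 with probability 0.8, where V(s'_1) = 10, and to s'_2 with probability 0.2, where V(s'_2) = 0. With γ = 0.9, the bracketed term for that action is
1 + 0.9 * (0.8 * 10 + 0.2 * 0) = 1 + 7.2 = 8.2,
and V(s) is the maximum of such terms over all available actions.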
6
The optimal Bellman equation characterizes the optimal value function (V*): the function that assigns to each state the highest expected total reward achievable under the optimal policy.
Optimal Bellman Equation
V*(s) = max_a [ R(s, a) + γ * Σ_{s'} P(s' | s, a) * V*(s') ]
This equation says that the optimal value of a state equals the immediate reward from taking the best possible action in that state, plus the discounted expected value of the next state reached when continuing to act optimally.
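Value iteration, mentioned later in these slides, turns this equation directly into an algorithm: it applies the optimal Bellman backup to every state repeatedly until the values stop changing. Below is a minimal, self-contained sketch in Python; the toy MDP and the stopping tolerance are illustrative choices, not taken from the slides.

# Value iteration: repeatedly apply the optimal Bellman backup
#   V(s) <- max_a [ R(s, a) + gamma * sum_{s'} P(s' | s, a) * V(s') ]
def value_iteration(states, actions, P, R, gamma, tol=1e-8):
    V = {s: 0.0 for s in states}  # start from all-zero value estimates
    while True:
        V_new = {
            s: max(
                R[(s, a)] + gamma * sum(p * V[s2] for s2, p in P[(s, a)].items())
                for a in actions
            )
            for s in states
        }
        # Stop once the largest change across states falls below the tolerance.
        if max(abs(V_new[s] - V[s]) for s in states) < tol:
            return V_new
        V = V_new

# Toy two-state MDP (the same illustrative example as before).
states = ["high_battery", "low_battery"]
actions = ["search", "recharge"]
P = {
    ("high_battery", "search"):   {"high_battery": 0.7, "low_battery": 0.3},
    ("high_battery", "recharge"): {"high_battery": 1.0},
    ("low_battery",  "search"):   {"high_battery": 0.1, "low_battery": 0.9},
    ("low_battery",  "recharge"): {"high_battery": 1.0},
}
R = {
    ("high_battery", "search"): 2.0, ("high_battery", "recharge"): 0.0,
    ("low_battery",  "search"): 1.0, ("low_battery",  "recharge"): 0.0,
}
print(value_iteration(states, actions, P, R, gamma=0.9))  # prints V*(s) per state

Each sweep brings the estimates closer to V*; the Banach fixed point theorem discussed later in this module is what guarantees this convergence.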
7
Q-value Function: Mathematical Notation
8
Mathematical Notation:
• max_a' Q(s', a'): The maximum expected future reward achievable from the next state s', considering all possible actions a'.
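This term is the lookahead piece of the Bellman equation for the Q-value function, which in the same style as the state-value equations above reads (standard form):
Q(s, a) = R(s, a) + γ * Σ_{s'} P(s' | s, a) * max_a' Q(s', a')
So Q(s, a) scores committing to a particular action a in state s and then acting optimally from the next state onward.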
12
Key Differences:
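In short, V(s) rates a state assuming the best action will be taken from it, while Q(s, a) rates a specific state-action pair. Under the optimal policy the two are linked by the standard identities:
V*(s) = max_a Q*(s, a)
Q*(s, a) = R(s, a) + γ * Σ_{s'} P(s' | s, a) * V*(s')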
13
Cauchy Sequences
14
Cauchy Sequences
• Cauchy Sequence: A sequence of elements in a metric space
where the elements become arbitrarily close to each other as
the sequence progresses.
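In symbols, the standard definition reads: a sequence (x_n) in a metric space (X, d) is Cauchy if for every ε > 0 there exists an N such that d(x_m, x_n) < ε for all m, n > N. For example, x_n = 1/n is a Cauchy sequence of real numbers.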
22
BANACH'S FIXED POINT THEOREM
25
Banach's Fixed Point Theorem
• Banach's fixed point theorem states that if a contraction mapping T acts on a complete metric space, then there is guaranteed to be a unique point x in that space such that T(x) = x. This point x is called the fixed point of T.
• The theorem also suggests a way to find this fixed point: pick any starting point x0 and repeatedly apply T; the sequence x0, T(x0), T(T(x0)), … converges to the unique fixed point (sketched in code below).
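The sketch below illustrates that construction in Python with a simple contraction on the real line; the particular mapping and tolerance are illustrative choices, not taken from the slides.

def fixed_point_iteration(T, x0, tol=1e-10, max_iters=1000):
    # Repeatedly apply the contraction T; by Banach's theorem the
    # iterates converge to the unique fixed point.
    x = x0
    for _ in range(max_iters):
        x_next = T(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x

# Example: T(x) = 0.5 * x + 1 is a contraction on R with factor 0.5;
# its unique fixed point is x = 2, and the iteration finds it.
print(fixed_point_iteration(lambda x: 0.5 * x + 1, x0=0.0))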
27
Banach's Fixed Point Theorem
• Banach's fixed point theorem plays a crucial role
in proving the convergence of some important
algorithms in reinforcement learning.
• Dynamic programming algorithms, such as value iteration and policy iteration, apply the Bellman equation repeatedly to improve the agent's value estimates and policy; the contraction argument below explains why these updates converge.
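The connection is a standard result: the optimal Bellman backup T, defined by (T V)(s) = max_a [ R(s, a) + γ * Σ_{s'} P(s' | s, a) * V(s') ], is a contraction mapping with factor γ in the max norm,
|| T V1 - T V2 ||_∞ ≤ γ * || V1 - V2 ||_∞,   for 0 ≤ γ < 1.
Because the space of bounded value functions with this norm is complete, Banach's theorem guarantees that T has a unique fixed point, namely V*, and that repeated backups (value iteration) converge to it from any starting estimate.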