3. Reinforcement Learning
This concept builds on supervised learning. In
supervised learning the correct output is known, but in
reinforcement learning the output is predicted from past
outputs.
Behavior Based
Critic Information: It does not tell what is going to happen in the
future; it tells, with respect to the past, what the current state is.
E.g.
S1 0.3 S2 0.7
5. Q Learning
Episodes/Trials: A sequence of actions from the start state to a terminal state.
Policy (behaviour map): A mapping from states to actions (state-action pairs).
Represented as π (pi).
Finite Horizon
Infinite Horizon
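
The episode and policy definitions above can be made concrete with a minimal sketch. Everything in it is an assumption chosen for illustration and is not taken from these notes: the four-state line world, the action names "left" and "right", the reward of 1.0 on reaching the terminal state, and the helper names step and run_episode.

```python
# Hypothetical toy problem: states 0..3 on a line; state 3 is terminal.
TERMINAL = 3

# Policy (pi): a behaviour map from each non-terminal state to an action.
pi = {0: "right", 1: "right", 2: "right"}


def step(state, action):
    """Move one cell left or right; reward 1.0 only when the terminal state is reached."""
    if action == "right":
        next_state = min(state + 1, TERMINAL)
    else:
        next_state = max(state - 1, 0)
    reward = 1.0 if next_state == TERMINAL else 0.0
    return next_state, reward


def run_episode(policy, start=0, max_steps=100):
    """Follow the policy from the start state until the terminal state (or max_steps)."""
    state = start
    trajectory = []  # list of (state, action, reward) triples
    for _ in range(max_steps):
        if state == TERMINAL:
            break
        action = policy[state]
        next_state, reward = step(state, action)
        trajectory.append((state, action, reward))
        state = next_state
    return trajectory


episode = run_episode(pi)
for state, action, reward in episode:
    print(state, action, reward)
```

Here the episode is the list of (state, action, reward) triples collected from the start state to the terminal state, and the dictionary pi plays the role of the policy.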
6. Q Learning
Finite Horizon:
The agent tries to maximize reward within an episode that has a specified time limit.
Infinite Horizon:
The agent tries to maximize reward, but there is no specified time limit
(infinite).
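
The difference between the two horizons can be sketched as two ways of adding up rewards. This is only an illustration under stated assumptions: the function names are hypothetical, and the discount factor gamma is not mentioned in these notes; it is a commonly used device for keeping the total finite when there is no time limit. A finite list can only stand in for the first few steps of an unbounded reward sequence.

```python
def finite_horizon_return(rewards, horizon):
    """Total reward over a fixed number of steps (the specified time limit)."""
    return sum(rewards[:horizon])


def discounted_return(rewards, gamma=0.9):
    """Discounted total reward; gamma (assumed, 0 <= gamma < 1) down-weights later steps.

    With no time limit the true sum runs forever, so `rewards` here is
    only a truncated prefix of that sequence.
    """
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


rewards = [1.0, 1.0, 1.0, 1.0]  # hypothetical reward sequence

print(finite_horizon_return(rewards, horizon=2))
print(discounted_return(rewards, gamma=0.5))
```

In the finite case only the rewards inside the time limit count. In the discounted case every step counts, but with a weight that shrinks over time.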