How do AI games work? How did Deep Blue, the supercomputer, beat chess champion Garry Kasparov? Deep Blue relied mainly on brute-force search, but many of today's game-playing systems learn through reinforcement learning.
This presentation develops your basic understanding of Reinforcement Learning, a key sub-discipline of Artificial Intelligence, with a number of examples from AI gaming and other exciting fields.
2. Spotle.ai Study Material
Spotle.ai/Learn
Let’s play chess!
I don’t just make any possible move without thinking about how my opponent might counter it.
I try to consider all possible moves that are safe, and then choose the one that I feel is the best among them.
Machines can learn this way, and this kind of learning is called reinforcement learning.
3. What is reinforcement learning?
Reinforcement learning applies to a particular situation: you start at a point and go through several steps to reach a goal.
Along the way you earn a reward point for every correct step and lose a reward point for every wrong step.
Finally, you choose the path with the highest total reward in that situation.
[Figure: the agent sends an action to the environment; the environment returns a state and a reward to the agent.]
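The idea of scoring each step and then picking the best path can be sketched in a few lines of Python. The path names and step outcomes below are made up for illustration; each correct step earns one reward point and each wrong step loses one.

```python
# Hypothetical paths: True marks a correct step, False marks a wrong step.
paths = {
    "path_a": [True, False, True],
    "path_b": [True, True, True],
    "path_c": [False, False, True],
}


def total_reward(steps):
    """Add 1 for every correct step and subtract 1 for every wrong step."""
    return sum(1 if correct else -1 for correct in steps)


# Score every path, then choose the one with the highest total reward.
rewards = {name: total_reward(steps) for name, steps in paths.items()}
best_path = max(rewards, key=rewards.get)

print(rewards)
print("Best path:", best_path)
```

Here the best path is simply looked up after scoring all of them. In a real problem the agent usually cannot list every path in advance and has to discover the rewards by trying steps.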
4. Terminology
Agent: The learner and the decision maker.
Environment: Where the agent learns and decides what actions to perform.
Action: One of the moves the agent can perform, chosen from a set of possible actions.
State: The agent's current situation in the environment.
Reward: The feedback the environment provides for each action the agent selects, usually a scalar value.
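The five terms fit together in a simple loop: the agent picks an action, and the environment answers with a new state and a reward. The sketch below is only an illustration under assumed names: `CorridorEnvironment`, `RandomAgent`, and `run_episode` are hypothetical, and the agent here acts at random without learning anything yet.

```python
import random


class CorridorEnvironment:
    """Hypothetical environment: positions 0..4, with the goal at position 4."""

    def __init__(self):
        self.state = 0  # the agent's starting position

    def step(self, action):
        """Apply an action (-1 = left, +1 = right); return (state, reward, done)."""
        self.state = max(0, min(4, self.state + action))
        done = self.state == 4
        reward = 1.0 if done else -0.1  # the reward is a single scalar value
        return self.state, reward, done


class RandomAgent:
    """Hypothetical agent that chooses among its actions at random."""

    actions = (-1, +1)

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def choose_action(self, state):
        return self.rng.choice(self.actions)


def run_episode(env, agent, max_steps=1000):
    """Run the agent-environment loop until the goal is reached or steps run out."""
    state, total_reward, done, steps = env.state, 0.0, False, 0
    while not done and steps < max_steps:
        action = agent.choose_action(state)     # agent -> environment
        state, reward, done = env.step(action)  # environment -> agent
        total_reward += reward
        steps += 1
    return state, total_reward, done, steps


state, total_reward, done, steps = run_episode(CorridorEnvironment(), RandomAgent())
print("final state:", state, "steps:", steps, "total reward:", round(total_reward, 1))
```

A learning agent would replace `choose_action` with a rule that uses the rewards seen so far.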
5. In supervised learning the training data includes the output, that is, the correct answer, and the model is trained on it. In reinforcement learning no answer is given. The reinforcement agent decides which action to perform based on the rewards it receives, aiming for the maximum reward. There is no labeled training data in reinforcement learning; the machine learns from its own experience.
Supervised learning? No
Training data: not available
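To make the contrast concrete, here is a minimal sketch of learning without labeled answers. The action names and reward values are invented for illustration. The agent is never told which action is correct; it only observes the reward that follows each action it tries.

```python
# Hidden from the agent: the reward each action yields (made-up values).
HIDDEN_REWARDS = {"left": 0.2, "forward": 1.0, "right": 0.5}


def environment_reward(action):
    """The environment returns only a scalar reward, never the 'correct' action."""
    return HIDDEN_REWARDS[action]


def learn_from_experience(actions, trials_per_action=3):
    """Try each action a few times and average the rewards that were observed."""
    estimates = {}
    for action in actions:
        observed = [environment_reward(action) for _ in range(trials_per_action)]
        estimates[action] = sum(observed) / len(observed)
    return estimates


estimates = learn_from_experience(["left", "forward", "right"])
best_action = max(estimates, key=estimates.get)
print("Learned estimates:", estimates)
print("Chosen action:", best_action)
```

A supervised learner would instead be handed pairs of situations and correct actions up front. Here the only signal is the reward gathered by acting.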
8. Pavlov Experiment
TRIAL 2
In the second trial Pavlov does not give meat to his dog but rings a bell.
Without seeing the meat the dog does not start salivating.
10. Pavlov Experiment
TRIAL 4
In trial 4 Pavlov rings the bell, and his dog starts salivating, expecting that meat will follow.
This is learning by reinforcement: the dog had been rewarded with meat after the ringing of the bell.
11. Summarizing
❖ The input is an initial state from which the machine starts learning.
❖ There is more than one possible output for a particular problem.
❖ Each output is given a reward or a punishment.
❖ The output with the maximum reward is selected to be performed.
❖ The reinforcement learning process is continuous.
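The points above can be sketched with the Q-learning update rule, a common reinforcement learning technique that these slides do not name. Everything here is an assumption made for illustration: the five-position corridor, the reward values, and the learning rate and discount factor. To keep the result reproducible, the sketch applies the update to every state and action in turn instead of exploring at random.

```python
# Hypothetical corridor: states 0..4, start at 0, goal at 4.
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)        # possible outputs: move left or move right
ALPHA, GAMMA = 0.5, 0.9   # learning rate and discount factor (assumed values)


def step(state, action):
    """Return (next_state, reward, done) for one move in the corridor."""
    next_state = max(0, min(GOAL, state + action))
    done = next_state == GOAL
    reward = 1.0 if done else -0.1  # reward at the goal, small punishment otherwise
    return next_state, reward, done


# One learned value per (state, action) pair, all starting at zero.
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for _ in range(200):           # learning is repeated many times
    for s in range(GOAL):      # every non-goal state
        for a in ACTIONS:      # every possible output in that state
            s2, reward, done = step(s, a)
            best_next = 0.0 if done else max(q[(s2, b)] for b in ACTIONS)
            target = reward + GAMMA * best_next
            q[(s, a)] += ALPHA * (target - q[(s, a)])

# In each state, select the output with the maximum learned value.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}
print(policy)  # {0: 1, 1: 1, 2: 1, 3: 1}: move right in every state
```

The learned policy moves right from every position, which is the shortest way to the goal in this toy corridor.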