SlideShare a Scribd company logo
Reinforcement Learning
BY:
DR. SUKHNANDAN KAUR
ASSISTANT PROFESSOR ,
CSED, TIET
Reinforcement Learning
How can an agent learn behaviors when it doesn’t have a
supervisor to tell it how to perform?
This problem is called reinforcement learning.
Rewards:
Positive / Negative
Reinforcement Learning (cont.)
The goal is to get the agent to act in the world so as
to maximize its rewards.
The agent has to figure out what it did that made it
get the reward/punishment
Components of Reinforcement Learning
1. Agent
2. Environment
3.Rewards
4. State
5. Action(s)
Q Learning
AI can directly drive an optimal policy from its environment
without needing to create a model beforehand.
Q learning is model free learning technique that can be
used to find the optimal action selection policy using Q
function.
Q function gives the largest expected return achievable by
any policy π for each possible state action pair.
Reinforcement Learning
In reinforcement learning we want to obtain function Q(S, A) that predicts
best action A in a state S to get maximum cumulative reward.
cumulative reward 1 = Q(s1, a1) + Q(s2, a1) = -1 + 0 = -1
cumulative reward 2 = Q(s1, a2) + Q(s2, a2) = 1 + 0.5 = 1.5
cumulative reward 3 = Q(s1, a2) + Q(s2, a1) = 1 + 0 = 1
cumulative reward 4 = Q(s1, a1) + Q(s2, a2) = -1 + 0.5 = -0.5
maximum cumulative reward = max(cumulative reward 1, cumulative reward 2, cumulative
reward 3, cumulative reward 4) = 1.5
a1 a2
s1 -1 1
s2 0 0.5
How Q learning Works?
Initialize Q
Choose Action from Q
Calculate Reward
Take action
Update Q
Example
USER
Go to house
Retrieve
Flower
State =1
Q(State = 1, Action = “Retrieve Flower”) = 0.5
Q(State = 1, Action = “Go to house”) = 3.0
Accessible or
observable state
Repeat:
 s  sensed state
 If s is terminal then exit
 a  choose action (given s)
 Perform a
Reactive Agent Algorithm
Repeat:
 s  sensed state
 If s is terminal then exit
 a  P(s) /* Choose action using policy
 Perform a
Reactive Agent Algorithm using
Reinforcement Learning
Approaches
 Learn policy directly– function mapping from states to actions
Q(S, A)
Where, Q = {s1,s2,s3,s4} and A = {a1,a2,a3}
 Learn utility values for states (i.e., the value function)
If the outcome of performing an action at a state is deterministic, then the
agent can update the utility value U() of states:
◦ U(new state) = reward + U(old state)
Exploration / Exploitation policy
Wacky approach (exploration): act randomly in
hopes of eventually exploring entire environment
Greedy approach (exploitation): act to maximize
utility using current estimate
Reasonable balance: act more wacky (exploratory)
when agent has little idea of environment; more
greedy when the model is close to correct path.
Summary
Active area of research.
Reinforcement learning is applicable to game-playing, robot controllers,
others

More Related Content

Similar to Q_Learning.ppt

YijueRL.ppt
YijueRL.pptYijueRL.ppt
YijueRL.ppt
Shoaib Iqbal
 
RL_online _presentation_1.ppt
RL_online _presentation_1.pptRL_online _presentation_1.ppt
RL_online _presentation_1.ppt
ssuser43a599
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
SVijaylakshmi
 
Head First Reinforcement Learning
Head First Reinforcement LearningHead First Reinforcement Learning
Head First Reinforcement Learning
azzeddine chenine
 
Reinforcement Learning.ppt
Reinforcement Learning.pptReinforcement Learning.ppt
Reinforcement Learning.ppt
POOJASHREEC1
 
Lecture notes
Lecture notesLecture notes
Lecture notes
butest
 
CS799_FinalReport
CS799_FinalReportCS799_FinalReport
CS799_FinalReport
Abhanshu Gupta
 
Reinforcement learning, Q-Learning
Reinforcement learning, Q-LearningReinforcement learning, Q-Learning
Reinforcement learning, Q-Learning
Kuppusamy P
 
reinforcement-learning-141009013546-conversion-gate02.pdf
reinforcement-learning-141009013546-conversion-gate02.pdfreinforcement-learning-141009013546-conversion-gate02.pdf
reinforcement-learning-141009013546-conversion-gate02.pdf
VaishnavGhadge1
 
semi supervised Learning and Reinforcement learning (1).pptx
 semi supervised Learning and Reinforcement learning (1).pptx semi supervised Learning and Reinforcement learning (1).pptx
semi supervised Learning and Reinforcement learning (1).pptx
Dr.Shweta
 
Aaa ped-24- Reinforcement Learning
Aaa ped-24- Reinforcement LearningAaa ped-24- Reinforcement Learning
Aaa ped-24- Reinforcement Learning
AminaRepo
 
Ben Lau, Quantitative Researcher, Hobbyist, at MLconf NYC 2017
Ben Lau, Quantitative Researcher, Hobbyist, at MLconf NYC 2017Ben Lau, Quantitative Researcher, Hobbyist, at MLconf NYC 2017
Ben Lau, Quantitative Researcher, Hobbyist, at MLconf NYC 2017
MLconf
 
14_ReinforcementLearning.pptx
14_ReinforcementLearning.pptx14_ReinforcementLearning.pptx
14_ReinforcementLearning.pptx
RithikRaj25
 
Reinforcement learning
Reinforcement learningReinforcement learning
Reinforcement learning
Ding Li
 
Reinforcement Learning using OpenAI Gym
Reinforcement Learning using OpenAI GymReinforcement Learning using OpenAI Gym
Reinforcement Learning using OpenAI Gym
Muhammad Aleem Siddiqui
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
Salem-Kabbani
 
24.09.2021 Reinforcement Learning Algorithms.pptx
24.09.2021 Reinforcement Learning Algorithms.pptx24.09.2021 Reinforcement Learning Algorithms.pptx
24.09.2021 Reinforcement Learning Algorithms.pptx
ManiMaran230751
 
Intro rl
Intro rlIntro rl
Intro rl
Ronald Teo
 
RL_Dr.SNR Final ppt for Presentation 28.05.2021.pptx
RL_Dr.SNR Final ppt for Presentation 28.05.2021.pptxRL_Dr.SNR Final ppt for Presentation 28.05.2021.pptx
RL_Dr.SNR Final ppt for Presentation 28.05.2021.pptx
deeplearning6
 
NUS-ISS Learning Day 2019-Introduction to reinforcement learning
NUS-ISS Learning Day 2019-Introduction to reinforcement learningNUS-ISS Learning Day 2019-Introduction to reinforcement learning
NUS-ISS Learning Day 2019-Introduction to reinforcement learning
NUS-ISS
 

Similar to Q_Learning.ppt (20)

YijueRL.ppt
YijueRL.pptYijueRL.ppt
YijueRL.ppt
 
RL_online _presentation_1.ppt
RL_online _presentation_1.pptRL_online _presentation_1.ppt
RL_online _presentation_1.ppt
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
 
Head First Reinforcement Learning
Head First Reinforcement LearningHead First Reinforcement Learning
Head First Reinforcement Learning
 
Reinforcement Learning.ppt
Reinforcement Learning.pptReinforcement Learning.ppt
Reinforcement Learning.ppt
 
Lecture notes
Lecture notesLecture notes
Lecture notes
 
CS799_FinalReport
CS799_FinalReportCS799_FinalReport
CS799_FinalReport
 
Reinforcement learning, Q-Learning
Reinforcement learning, Q-LearningReinforcement learning, Q-Learning
Reinforcement learning, Q-Learning
 
reinforcement-learning-141009013546-conversion-gate02.pdf
reinforcement-learning-141009013546-conversion-gate02.pdfreinforcement-learning-141009013546-conversion-gate02.pdf
reinforcement-learning-141009013546-conversion-gate02.pdf
 
semi supervised Learning and Reinforcement learning (1).pptx
 semi supervised Learning and Reinforcement learning (1).pptx semi supervised Learning and Reinforcement learning (1).pptx
semi supervised Learning and Reinforcement learning (1).pptx
 
Aaa ped-24- Reinforcement Learning
Aaa ped-24- Reinforcement LearningAaa ped-24- Reinforcement Learning
Aaa ped-24- Reinforcement Learning
 
Ben Lau, Quantitative Researcher, Hobbyist, at MLconf NYC 2017
Ben Lau, Quantitative Researcher, Hobbyist, at MLconf NYC 2017Ben Lau, Quantitative Researcher, Hobbyist, at MLconf NYC 2017
Ben Lau, Quantitative Researcher, Hobbyist, at MLconf NYC 2017
 
14_ReinforcementLearning.pptx
14_ReinforcementLearning.pptx14_ReinforcementLearning.pptx
14_ReinforcementLearning.pptx
 
Reinforcement learning
Reinforcement learningReinforcement learning
Reinforcement learning
 
Reinforcement Learning using OpenAI Gym
Reinforcement Learning using OpenAI GymReinforcement Learning using OpenAI Gym
Reinforcement Learning using OpenAI Gym
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
 
24.09.2021 Reinforcement Learning Algorithms.pptx
24.09.2021 Reinforcement Learning Algorithms.pptx24.09.2021 Reinforcement Learning Algorithms.pptx
24.09.2021 Reinforcement Learning Algorithms.pptx
 
Intro rl
Intro rlIntro rl
Intro rl
 
RL_Dr.SNR Final ppt for Presentation 28.05.2021.pptx
RL_Dr.SNR Final ppt for Presentation 28.05.2021.pptxRL_Dr.SNR Final ppt for Presentation 28.05.2021.pptx
RL_Dr.SNR Final ppt for Presentation 28.05.2021.pptx
 
NUS-ISS Learning Day 2019-Introduction to reinforcement learning
NUS-ISS Learning Day 2019-Introduction to reinforcement learningNUS-ISS Learning Day 2019-Introduction to reinforcement learning
NUS-ISS Learning Day 2019-Introduction to reinforcement learning
 

Recently uploaded

Innovation Management Frameworks: Your Guide to Creativity & Innovation
Innovation Management Frameworks: Your Guide to Creativity & InnovationInnovation Management Frameworks: Your Guide to Creativity & Innovation
Innovation Management Frameworks: Your Guide to Creativity & Innovation
Operational Excellence Consulting
 
Income Tax exemption for Start up : Section 80 IAC
Income Tax  exemption for Start up : Section 80 IACIncome Tax  exemption for Start up : Section 80 IAC
Income Tax exemption for Start up : Section 80 IAC
CA Dr. Prithvi Ranjan Parhi
 
Dpboss Matka Guessing Satta Matta Matka Kalyan panel Chart Indian Matka Dpbos...
Dpboss Matka Guessing Satta Matta Matka Kalyan panel Chart Indian Matka Dpbos...Dpboss Matka Guessing Satta Matta Matka Kalyan panel Chart Indian Matka Dpbos...
Dpboss Matka Guessing Satta Matta Matka Kalyan panel Chart Indian Matka Dpbos...
➒➌➎➏➑➐➋➑➐➐Dpboss Matka Guessing Satta Matka Kalyan Chart Indian Matka
 
TIMES BPO: Business Plan For Startup Industry
TIMES BPO: Business Plan For Startup IndustryTIMES BPO: Business Plan For Startup Industry
TIMES BPO: Business Plan For Startup Industry
timesbpobusiness
 
Pitch Deck Teardown: Kinnect's $250k Angel deck
Pitch Deck Teardown: Kinnect's $250k Angel deckPitch Deck Teardown: Kinnect's $250k Angel deck
Pitch Deck Teardown: Kinnect's $250k Angel deck
HajeJanKamps
 
Registered-Establishment-List-in-Uttarakhand-pdf.pdf
Registered-Establishment-List-in-Uttarakhand-pdf.pdfRegistered-Establishment-List-in-Uttarakhand-pdf.pdf
Registered-Establishment-List-in-Uttarakhand-pdf.pdf
dazzjoker
 
Cover Story - China's Investment Leader - Dr. Alyce SU
Cover Story - China's Investment Leader - Dr. Alyce SUCover Story - China's Investment Leader - Dr. Alyce SU
Cover Story - China's Investment Leader - Dr. Alyce SU
msthrill
 
Innovative Uses of Revit in Urban Planning and Design
Innovative Uses of Revit in Urban Planning and DesignInnovative Uses of Revit in Urban Planning and Design
Innovative Uses of Revit in Urban Planning and Design
Chandresh Chudasama
 
Call8328958814 satta matka Kalyan result satta guessing
Call8328958814 satta matka Kalyan result satta guessingCall8328958814 satta matka Kalyan result satta guessing
Call8328958814 satta matka Kalyan result satta guessing
➑➌➋➑➒➎➑➑➊➍
 
Sustainable Logistics for Cost Reduction_ IPLTech Electric's Eco-Friendly Tra...
Sustainable Logistics for Cost Reduction_ IPLTech Electric's Eco-Friendly Tra...Sustainable Logistics for Cost Reduction_ IPLTech Electric's Eco-Friendly Tra...
Sustainable Logistics for Cost Reduction_ IPLTech Electric's Eco-Friendly Tra...
IPLTech Electric
 
Chapter 7 Final business management sciences .ppt
Chapter 7 Final business management sciences .pptChapter 7 Final business management sciences .ppt
Chapter 7 Final business management sciences .ppt
ssuser567e2d
 
The APCO Geopolitical Radar - Q3 2024 The Global Operating Environment for Bu...
The APCO Geopolitical Radar - Q3 2024 The Global Operating Environment for Bu...The APCO Geopolitical Radar - Q3 2024 The Global Operating Environment for Bu...
The APCO Geopolitical Radar - Q3 2024 The Global Operating Environment for Bu...
APCO
 
Part 2 Deep Dive: Navigating the 2024 Slowdown
Part 2 Deep Dive: Navigating the 2024 SlowdownPart 2 Deep Dive: Navigating the 2024 Slowdown
Part 2 Deep Dive: Navigating the 2024 Slowdown
jeffkluth1
 
How HR Search Helps in Company Success.pdf
How HR Search Helps in Company Success.pdfHow HR Search Helps in Company Success.pdf
How HR Search Helps in Company Success.pdf
HumanResourceDimensi1
 
Unveiling the Dynamic Personalities, Key Dates, and Horoscope Insights: Gemin...
Unveiling the Dynamic Personalities, Key Dates, and Horoscope Insights: Gemin...Unveiling the Dynamic Personalities, Key Dates, and Horoscope Insights: Gemin...
Unveiling the Dynamic Personalities, Key Dates, and Horoscope Insights: Gemin...
my Pandit
 
Ellen Burstyn: From Detroit Dreamer to Hollywood Legend | CIO Women Magazine
Ellen Burstyn: From Detroit Dreamer to Hollywood Legend | CIO Women MagazineEllen Burstyn: From Detroit Dreamer to Hollywood Legend | CIO Women Magazine
Ellen Burstyn: From Detroit Dreamer to Hollywood Legend | CIO Women Magazine
CIOWomenMagazine
 
Garments ERP Software in Bangladesh _ Pridesys IT Ltd.pdf
Garments ERP Software in Bangladesh _ Pridesys IT Ltd.pdfGarments ERP Software in Bangladesh _ Pridesys IT Ltd.pdf
Garments ERP Software in Bangladesh _ Pridesys IT Ltd.pdf
Pridesys IT Ltd.
 
Profiles of Iconic Fashion Personalities.pdf
Profiles of Iconic Fashion Personalities.pdfProfiles of Iconic Fashion Personalities.pdf
Profiles of Iconic Fashion Personalities.pdf
TTop Threads
 
GKohler - Retail Scavenger Hunt Presentation
GKohler - Retail Scavenger Hunt PresentationGKohler - Retail Scavenger Hunt Presentation
GKohler - Retail Scavenger Hunt Presentation
GraceKohler1
 
How are Lilac French Bulldogs Beauty Charming the World and Capturing Hearts....
How are Lilac French Bulldogs Beauty Charming the World and Capturing Hearts....How are Lilac French Bulldogs Beauty Charming the World and Capturing Hearts....
How are Lilac French Bulldogs Beauty Charming the World and Capturing Hearts....
Lacey Max
 

Recently uploaded (20)

Innovation Management Frameworks: Your Guide to Creativity & Innovation
Innovation Management Frameworks: Your Guide to Creativity & InnovationInnovation Management Frameworks: Your Guide to Creativity & Innovation
Innovation Management Frameworks: Your Guide to Creativity & Innovation
 
Income Tax exemption for Start up : Section 80 IAC
Income Tax  exemption for Start up : Section 80 IACIncome Tax  exemption for Start up : Section 80 IAC
Income Tax exemption for Start up : Section 80 IAC
 
Dpboss Matka Guessing Satta Matta Matka Kalyan panel Chart Indian Matka Dpbos...
Dpboss Matka Guessing Satta Matta Matka Kalyan panel Chart Indian Matka Dpbos...Dpboss Matka Guessing Satta Matta Matka Kalyan panel Chart Indian Matka Dpbos...
Dpboss Matka Guessing Satta Matta Matka Kalyan panel Chart Indian Matka Dpbos...
 
TIMES BPO: Business Plan For Startup Industry
TIMES BPO: Business Plan For Startup IndustryTIMES BPO: Business Plan For Startup Industry
TIMES BPO: Business Plan For Startup Industry
 
Pitch Deck Teardown: Kinnect's $250k Angel deck
Pitch Deck Teardown: Kinnect's $250k Angel deckPitch Deck Teardown: Kinnect's $250k Angel deck
Pitch Deck Teardown: Kinnect's $250k Angel deck
 
Registered-Establishment-List-in-Uttarakhand-pdf.pdf
Registered-Establishment-List-in-Uttarakhand-pdf.pdfRegistered-Establishment-List-in-Uttarakhand-pdf.pdf
Registered-Establishment-List-in-Uttarakhand-pdf.pdf
 
Cover Story - China's Investment Leader - Dr. Alyce SU
Cover Story - China's Investment Leader - Dr. Alyce SUCover Story - China's Investment Leader - Dr. Alyce SU
Cover Story - China's Investment Leader - Dr. Alyce SU
 
Innovative Uses of Revit in Urban Planning and Design
Innovative Uses of Revit in Urban Planning and DesignInnovative Uses of Revit in Urban Planning and Design
Innovative Uses of Revit in Urban Planning and Design
 
Call8328958814 satta matka Kalyan result satta guessing
Call8328958814 satta matka Kalyan result satta guessingCall8328958814 satta matka Kalyan result satta guessing
Call8328958814 satta matka Kalyan result satta guessing
 
Sustainable Logistics for Cost Reduction_ IPLTech Electric's Eco-Friendly Tra...
Sustainable Logistics for Cost Reduction_ IPLTech Electric's Eco-Friendly Tra...Sustainable Logistics for Cost Reduction_ IPLTech Electric's Eco-Friendly Tra...
Sustainable Logistics for Cost Reduction_ IPLTech Electric's Eco-Friendly Tra...
 
Chapter 7 Final business management sciences .ppt
Chapter 7 Final business management sciences .pptChapter 7 Final business management sciences .ppt
Chapter 7 Final business management sciences .ppt
 
The APCO Geopolitical Radar - Q3 2024 The Global Operating Environment for Bu...
The APCO Geopolitical Radar - Q3 2024 The Global Operating Environment for Bu...The APCO Geopolitical Radar - Q3 2024 The Global Operating Environment for Bu...
The APCO Geopolitical Radar - Q3 2024 The Global Operating Environment for Bu...
 
Part 2 Deep Dive: Navigating the 2024 Slowdown
Part 2 Deep Dive: Navigating the 2024 SlowdownPart 2 Deep Dive: Navigating the 2024 Slowdown
Part 2 Deep Dive: Navigating the 2024 Slowdown
 
How HR Search Helps in Company Success.pdf
How HR Search Helps in Company Success.pdfHow HR Search Helps in Company Success.pdf
How HR Search Helps in Company Success.pdf
 
Unveiling the Dynamic Personalities, Key Dates, and Horoscope Insights: Gemin...
Unveiling the Dynamic Personalities, Key Dates, and Horoscope Insights: Gemin...Unveiling the Dynamic Personalities, Key Dates, and Horoscope Insights: Gemin...
Unveiling the Dynamic Personalities, Key Dates, and Horoscope Insights: Gemin...
 
Ellen Burstyn: From Detroit Dreamer to Hollywood Legend | CIO Women Magazine
Ellen Burstyn: From Detroit Dreamer to Hollywood Legend | CIO Women MagazineEllen Burstyn: From Detroit Dreamer to Hollywood Legend | CIO Women Magazine
Ellen Burstyn: From Detroit Dreamer to Hollywood Legend | CIO Women Magazine
 
Garments ERP Software in Bangladesh _ Pridesys IT Ltd.pdf
Garments ERP Software in Bangladesh _ Pridesys IT Ltd.pdfGarments ERP Software in Bangladesh _ Pridesys IT Ltd.pdf
Garments ERP Software in Bangladesh _ Pridesys IT Ltd.pdf
 
Profiles of Iconic Fashion Personalities.pdf
Profiles of Iconic Fashion Personalities.pdfProfiles of Iconic Fashion Personalities.pdf
Profiles of Iconic Fashion Personalities.pdf
 
GKohler - Retail Scavenger Hunt Presentation
GKohler - Retail Scavenger Hunt PresentationGKohler - Retail Scavenger Hunt Presentation
GKohler - Retail Scavenger Hunt Presentation
 
How are Lilac French Bulldogs Beauty Charming the World and Capturing Hearts....
How are Lilac French Bulldogs Beauty Charming the World and Capturing Hearts....How are Lilac French Bulldogs Beauty Charming the World and Capturing Hearts....
How are Lilac French Bulldogs Beauty Charming the World and Capturing Hearts....
 

Q_Learning.ppt

  • 1. Reinforcement Learning BY: DR. SUKHNANDAN KAUR ASSISTANT PROFESSOR , CSED, TIET
  • 2. Reinforcement Learning How can an agent learn behaviors when it doesn’t have a supervisor to tell it how to perform? This problem is called reinforcement learning. Rewards: Positive / Negative
  • 3. Reinforcement Learning (cont.) The goal is to get the agent to act in the world so as to maximize its rewards. The agent has to figure out what it did that made it get the reward/punishment
  • 4. Components of Reinforcement Learning 1. Agent 2. Environment 3.Rewards 4. State 5. Action(s)
  • 5. Q Learning AI can directly drive an optimal policy from its environment without needing to create a model beforehand. Q learning is model free learning technique that can be used to find the optimal action selection policy using Q function. Q function gives the largest expected return achievable by any policy π for each possible state action pair.
  • 6. Reinforcement Learning In reinforcement learning we want to obtain function Q(S, A) that predicts best action A in a state S to get maximum cumulative reward. cumulative reward 1 = Q(s1, a1) + Q(s2, a1) = -1 + 0 = -1 cumulative reward 2 = Q(s1, a2) + Q(s2, a2) = 1 + 0.5 = 1.5 cumulative reward 3 = Q(s1, a2) + Q(s2, a1) = 1 + 0 = 1 cumulative reward 4 = Q(s1, a1) + Q(s2, a2) = -1 + 0.5 = -0.5 maximum cumulative reward = max(cumulative reward 1, cumulative reward 2, cumulative reward 3, cumulative reward 4) = 1.5 a1 a2 s1 -1 1 s2 0 0.5
  • 7. How Q learning Works? Initialize Q Choose Action from Q Calculate Reward Take action Update Q
  • 8. Example USER Go to house Retrieve Flower State =1 Q(State = 1, Action = “Retrieve Flower”) = 0.5 Q(State = 1, Action = “Go to house”) = 3.0
  • 9. Accessible or observable state Repeat:  s  sensed state  If s is terminal then exit  a  choose action (given s)  Perform a Reactive Agent Algorithm
  • 10. Repeat:  s  sensed state  If s is terminal then exit  a  P(s) /* Choose action using policy  Perform a Reactive Agent Algorithm using Reinforcement Learning
  • 11. Approaches  Learn policy directly– function mapping from states to actions Q(S, A) Where, Q = {s1,s2,s3,s4} and A = {a1,a2,a3}  Learn utility values for states (i.e., the value function) If the outcome of performing an action at a state is deterministic, then the agent can update the utility value U() of states: ◦ U(new state) = reward + U(old state)
  • 12. Exploration / Exploitation policy Wacky approach (exploration): act randomly in hopes of eventually exploring entire environment Greedy approach (exploitation): act to maximize utility using current estimate Reasonable balance: act more wacky (exploratory) when agent has little idea of environment; more greedy when the model is close to correct path.
  • 13. Summary Active area of research. Reinforcement learning is applicable to game-playing, robot controllers, others