SlideShare a Scribd company logo
1 of 13
Reinforcement Learning
BY:
DR. SUKHNANDAN KAUR
ASSISTANT PROFESSOR ,
CSED, TIET
Reinforcement Learning
How can an agent learn behaviors when it doesn’t have a
supervisor to tell it how to perform?
This problem is called reinforcement learning.
Rewards:
Positive / Negative
Reinforcement Learning (cont.)
The goal is to get the agent to act in the world so as
to maximize its rewards.
The agent has to figure out what it did that made it
get the reward/punishment
Components of Reinforcement Learning
1. Agent
2. Environment
3.Rewards
4. State
5. Action(s)
Q Learning
AI can directly drive an optimal policy from its environment
without needing to create a model beforehand.
Q learning is model free learning technique that can be
used to find the optimal action selection policy using Q
function.
Q function gives the largest expected return achievable by
any policy π for each possible state action pair.
Reinforcement Learning
In reinforcement learning we want to obtain function Q(S, A) that predicts
best action A in a state S to get maximum cumulative reward.
cumulative reward 1 = Q(s1, a1) + Q(s2, a1) = -1 + 0 = -1
cumulative reward 2 = Q(s1, a2) + Q(s2, a2) = 1 + 0.5 = 1.5
cumulative reward 3 = Q(s1, a2) + Q(s2, a1) = 1 + 0 = 1
cumulative reward 4 = Q(s1, a1) + Q(s2, a2) = -1 + 0.5 = -0.5
maximum cumulative reward = max(cumulative reward 1, cumulative reward 2, cumulative
reward 3, cumulative reward 4) = 1.5
a1 a2
s1 -1 1
s2 0 0.5
How Q learning Works?
Initialize Q
Choose Action from Q
Calculate Reward
Take action
Update Q
Example
USER
Go to house
Retrieve
Flower
State =1
Q(State = 1, Action = “Retrieve Flower”) = 0.5
Q(State = 1, Action = “Go to house”) = 3.0
Accessible or
observable state
Repeat:
 s  sensed state
 If s is terminal then exit
 a  choose action (given s)
 Perform a
Reactive Agent Algorithm
Repeat:
 s  sensed state
 If s is terminal then exit
 a  P(s) /* Choose action using policy
 Perform a
Reactive Agent Algorithm using
Reinforcement Learning
Approaches
 Learn policy directly– function mapping from states to actions
Q(S, A)
Where, Q = {s1,s2,s3,s4} and A = {a1,a2,a3}
 Learn utility values for states (i.e., the value function)
If the outcome of performing an action at a state is deterministic, then the
agent can update the utility value U() of states:
◦ U(new state) = reward + U(old state)
Exploration / Exploitation policy
Wacky approach (exploration): act randomly in
hopes of eventually exploring entire environment
Greedy approach (exploitation): act to maximize
utility using current estimate
Reasonable balance: act more wacky (exploratory)
when agent has little idea of environment; more
greedy when the model is close to correct path.
Summary
Active area of research.
Reinforcement learning is applicable to game-playing, robot controllers,
others

More Related Content

Similar to Q_Learning.ppt

Lecture notes
Lecture notesLecture notes
Lecture notes
butest
 
reinforcement-learning-141009013546-conversion-gate02.pdf
reinforcement-learning-141009013546-conversion-gate02.pdfreinforcement-learning-141009013546-conversion-gate02.pdf
reinforcement-learning-141009013546-conversion-gate02.pdf
VaishnavGhadge1
 
24.09.2021 Reinforcement Learning Algorithms.pptx
24.09.2021 Reinforcement Learning Algorithms.pptx24.09.2021 Reinforcement Learning Algorithms.pptx
24.09.2021 Reinforcement Learning Algorithms.pptx
ManiMaran230751
 

Similar to Q_Learning.ppt (20)

YijueRL.ppt
YijueRL.pptYijueRL.ppt
YijueRL.ppt
 
RL_online _presentation_1.ppt
RL_online _presentation_1.pptRL_online _presentation_1.ppt
RL_online _presentation_1.ppt
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
 
Head First Reinforcement Learning
Head First Reinforcement LearningHead First Reinforcement Learning
Head First Reinforcement Learning
 
Reinforcement Learning.ppt
Reinforcement Learning.pptReinforcement Learning.ppt
Reinforcement Learning.ppt
 
Lecture notes
Lecture notesLecture notes
Lecture notes
 
CS799_FinalReport
CS799_FinalReportCS799_FinalReport
CS799_FinalReport
 
Reinforcement learning, Q-Learning
Reinforcement learning, Q-LearningReinforcement learning, Q-Learning
Reinforcement learning, Q-Learning
 
reinforcement-learning-141009013546-conversion-gate02.pdf
reinforcement-learning-141009013546-conversion-gate02.pdfreinforcement-learning-141009013546-conversion-gate02.pdf
reinforcement-learning-141009013546-conversion-gate02.pdf
 
semi supervised Learning and Reinforcement learning (1).pptx
 semi supervised Learning and Reinforcement learning (1).pptx semi supervised Learning and Reinforcement learning (1).pptx
semi supervised Learning and Reinforcement learning (1).pptx
 
Aaa ped-24- Reinforcement Learning
Aaa ped-24- Reinforcement LearningAaa ped-24- Reinforcement Learning
Aaa ped-24- Reinforcement Learning
 
Ben Lau, Quantitative Researcher, Hobbyist, at MLconf NYC 2017
Ben Lau, Quantitative Researcher, Hobbyist, at MLconf NYC 2017Ben Lau, Quantitative Researcher, Hobbyist, at MLconf NYC 2017
Ben Lau, Quantitative Researcher, Hobbyist, at MLconf NYC 2017
 
14_ReinforcementLearning.pptx
14_ReinforcementLearning.pptx14_ReinforcementLearning.pptx
14_ReinforcementLearning.pptx
 
Reinforcement learning
Reinforcement learningReinforcement learning
Reinforcement learning
 
Reinforcement Learning using OpenAI Gym
Reinforcement Learning using OpenAI GymReinforcement Learning using OpenAI Gym
Reinforcement Learning using OpenAI Gym
 
Reinforcement Learning
Reinforcement LearningReinforcement Learning
Reinforcement Learning
 
24.09.2021 Reinforcement Learning Algorithms.pptx
24.09.2021 Reinforcement Learning Algorithms.pptx24.09.2021 Reinforcement Learning Algorithms.pptx
24.09.2021 Reinforcement Learning Algorithms.pptx
 
Intro rl
Intro rlIntro rl
Intro rl
 
RL_Dr.SNR Final ppt for Presentation 28.05.2021.pptx
RL_Dr.SNR Final ppt for Presentation 28.05.2021.pptxRL_Dr.SNR Final ppt for Presentation 28.05.2021.pptx
RL_Dr.SNR Final ppt for Presentation 28.05.2021.pptx
 
NUS-ISS Learning Day 2019-Introduction to reinforcement learning
NUS-ISS Learning Day 2019-Introduction to reinforcement learningNUS-ISS Learning Day 2019-Introduction to reinforcement learning
NUS-ISS Learning Day 2019-Introduction to reinforcement learning
 

Recently uploaded

Obat Aborsi Depok 0851\7696\3835 Jual Obat Cytotec Di Depok
Obat Aborsi Depok 0851\7696\3835 Jual Obat Cytotec Di DepokObat Aborsi Depok 0851\7696\3835 Jual Obat Cytotec Di Depok
Obat Aborsi Depok 0851\7696\3835 Jual Obat Cytotec Di Depok
Obat Aborsi Jakarta Wa 085176963835 Apotek Jual Obat Cytotec Di Jakarta
 
如何办理(SUT毕业证书)斯威本科技大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(SUT毕业证书)斯威本科技大学毕业证成绩单本科硕士学位证留信学历认证如何办理(SUT毕业证书)斯威本科技大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(SUT毕业证书)斯威本科技大学毕业证成绩单本科硕士学位证留信学历认证
ogawka
 
Abortion pills in Muscut<Oman(+27737758557) Cytotec available.inn Kuwait City.
Abortion pills in Muscut<Oman(+27737758557) Cytotec available.inn Kuwait City.Abortion pills in Muscut<Oman(+27737758557) Cytotec available.inn Kuwait City.
Abortion pills in Muscut<Oman(+27737758557) Cytotec available.inn Kuwait City.
daisycvs
 
Shots fired Budget Presentation.pdf12312
Shots fired Budget Presentation.pdf12312Shots fired Budget Presentation.pdf12312
Shots fired Budget Presentation.pdf12312
LR1709MUSIC
 
Obat Aborsi Surabaya 0851\7696\3835 Jual Obat Cytotec Di Surabaya
Obat Aborsi Surabaya 0851\7696\3835 Jual Obat Cytotec Di SurabayaObat Aborsi Surabaya 0851\7696\3835 Jual Obat Cytotec Di Surabaya
Obat Aborsi Surabaya 0851\7696\3835 Jual Obat Cytotec Di Surabaya
Obat Aborsi Jakarta Wa 085176963835 Apotek Jual Obat Cytotec Di Jakarta
 
Obat Aborsi Malang 0851\7696\3835 Jual Obat Cytotec Di Malang
Obat Aborsi Malang 0851\7696\3835 Jual Obat Cytotec Di MalangObat Aborsi Malang 0851\7696\3835 Jual Obat Cytotec Di Malang
Obat Aborsi Malang 0851\7696\3835 Jual Obat Cytotec Di Malang
Obat Aborsi Jakarta Wa 085176963835 Apotek Jual Obat Cytotec Di Jakarta
 
NewBase 17 May 2024 Energy News issue - 1725 by Khaled Al Awadi_compresse...
NewBase   17 May  2024  Energy News issue - 1725 by Khaled Al Awadi_compresse...NewBase   17 May  2024  Energy News issue - 1725 by Khaled Al Awadi_compresse...
NewBase 17 May 2024 Energy News issue - 1725 by Khaled Al Awadi_compresse...
Khaled Al Awadi
 
obat aborsi jakarta wa 081336238223 jual obat aborsi cytotec asli di jakarta9...
obat aborsi jakarta wa 081336238223 jual obat aborsi cytotec asli di jakarta9...obat aborsi jakarta wa 081336238223 jual obat aborsi cytotec asli di jakarta9...
obat aborsi jakarta wa 081336238223 jual obat aborsi cytotec asli di jakarta9...
yulianti213969
 
Presentation4 (2) survey responses clearly labelled
Presentation4 (2) survey responses clearly labelledPresentation4 (2) survey responses clearly labelled
Presentation4 (2) survey responses clearly labelled
CaitlinCummins3
 

Recently uploaded (20)

Obat Aborsi Depok 0851\7696\3835 Jual Obat Cytotec Di Depok
Obat Aborsi Depok 0851\7696\3835 Jual Obat Cytotec Di DepokObat Aborsi Depok 0851\7696\3835 Jual Obat Cytotec Di Depok
Obat Aborsi Depok 0851\7696\3835 Jual Obat Cytotec Di Depok
 
MichaelStarkes_UncutGemsProjectSummary.pdf
MichaelStarkes_UncutGemsProjectSummary.pdfMichaelStarkes_UncutGemsProjectSummary.pdf
MichaelStarkes_UncutGemsProjectSummary.pdf
 
Understanding Financial Accounting 3rd Canadian Edition by Christopher D. Bur...
Understanding Financial Accounting 3rd Canadian Edition by Christopher D. Bur...Understanding Financial Accounting 3rd Canadian Edition by Christopher D. Bur...
Understanding Financial Accounting 3rd Canadian Edition by Christopher D. Bur...
 
SCI9-Q4-MOD8.1.pdfjttstwjwetw55k5wwtwrjw
SCI9-Q4-MOD8.1.pdfjttstwjwetw55k5wwtwrjwSCI9-Q4-MOD8.1.pdfjttstwjwetw55k5wwtwrjw
SCI9-Q4-MOD8.1.pdfjttstwjwetw55k5wwtwrjw
 
Moradia Isolada com Logradouro; Detached house with patio in Penacova
Moradia Isolada com Logradouro; Detached house with patio in PenacovaMoradia Isolada com Logradouro; Detached house with patio in Penacova
Moradia Isolada com Logradouro; Detached house with patio in Penacova
 
如何办理(SUT毕业证书)斯威本科技大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(SUT毕业证书)斯威本科技大学毕业证成绩单本科硕士学位证留信学历认证如何办理(SUT毕业证书)斯威本科技大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(SUT毕业证书)斯威本科技大学毕业证成绩单本科硕士学位证留信学历认证
 
Abortion pills in Muscut<Oman(+27737758557) Cytotec available.inn Kuwait City.
Abortion pills in Muscut<Oman(+27737758557) Cytotec available.inn Kuwait City.Abortion pills in Muscut<Oman(+27737758557) Cytotec available.inn Kuwait City.
Abortion pills in Muscut<Oman(+27737758557) Cytotec available.inn Kuwait City.
 
Space Tech Expo Exhibitor List 2024 - Exhibitors Data
Space Tech Expo Exhibitor List 2024 - Exhibitors DataSpace Tech Expo Exhibitor List 2024 - Exhibitors Data
Space Tech Expo Exhibitor List 2024 - Exhibitors Data
 
Top^Clinic ^%[+27785538335__Safe*Women's clinic//Abortion Pills In Harare
Top^Clinic ^%[+27785538335__Safe*Women's clinic//Abortion Pills In HarareTop^Clinic ^%[+27785538335__Safe*Women's clinic//Abortion Pills In Harare
Top^Clinic ^%[+27785538335__Safe*Women's clinic//Abortion Pills In Harare
 
Pitch Deck Teardown: Goodcarbon's $5.5m Seed deck
Pitch Deck Teardown: Goodcarbon's $5.5m Seed deckPitch Deck Teardown: Goodcarbon's $5.5m Seed deck
Pitch Deck Teardown: Goodcarbon's $5.5m Seed deck
 
Shots fired Budget Presentation.pdf12312
Shots fired Budget Presentation.pdf12312Shots fired Budget Presentation.pdf12312
Shots fired Budget Presentation.pdf12312
 
Obat Aborsi Surabaya 0851\7696\3835 Jual Obat Cytotec Di Surabaya
Obat Aborsi Surabaya 0851\7696\3835 Jual Obat Cytotec Di SurabayaObat Aborsi Surabaya 0851\7696\3835 Jual Obat Cytotec Di Surabaya
Obat Aborsi Surabaya 0851\7696\3835 Jual Obat Cytotec Di Surabaya
 
Obat Aborsi Malang 0851\7696\3835 Jual Obat Cytotec Di Malang
Obat Aborsi Malang 0851\7696\3835 Jual Obat Cytotec Di MalangObat Aborsi Malang 0851\7696\3835 Jual Obat Cytotec Di Malang
Obat Aborsi Malang 0851\7696\3835 Jual Obat Cytotec Di Malang
 
Pay after result spell caster (,$+27834335081)@ bring back lost lover same da...
Pay after result spell caster (,$+27834335081)@ bring back lost lover same da...Pay after result spell caster (,$+27834335081)@ bring back lost lover same da...
Pay after result spell caster (,$+27834335081)@ bring back lost lover same da...
 
Mastering The Art Of 'Closing The Sale'.
Mastering The Art Of 'Closing The Sale'.Mastering The Art Of 'Closing The Sale'.
Mastering The Art Of 'Closing The Sale'.
 
Innomantra Viewpoint - Building Moonshots : May-Jun 2024.pdf
Innomantra Viewpoint - Building Moonshots : May-Jun 2024.pdfInnomantra Viewpoint - Building Moonshots : May-Jun 2024.pdf
Innomantra Viewpoint - Building Moonshots : May-Jun 2024.pdf
 
NewBase 17 May 2024 Energy News issue - 1725 by Khaled Al Awadi_compresse...
NewBase   17 May  2024  Energy News issue - 1725 by Khaled Al Awadi_compresse...NewBase   17 May  2024  Energy News issue - 1725 by Khaled Al Awadi_compresse...
NewBase 17 May 2024 Energy News issue - 1725 by Khaled Al Awadi_compresse...
 
obat aborsi jakarta wa 081336238223 jual obat aborsi cytotec asli di jakarta9...
obat aborsi jakarta wa 081336238223 jual obat aborsi cytotec asli di jakarta9...obat aborsi jakarta wa 081336238223 jual obat aborsi cytotec asli di jakarta9...
obat aborsi jakarta wa 081336238223 jual obat aborsi cytotec asli di jakarta9...
 
Presentation4 (2) survey responses clearly labelled
Presentation4 (2) survey responses clearly labelledPresentation4 (2) survey responses clearly labelled
Presentation4 (2) survey responses clearly labelled
 
Toyota Kata Coaching for Agile Teams & Transformations
Toyota Kata Coaching for Agile Teams & TransformationsToyota Kata Coaching for Agile Teams & Transformations
Toyota Kata Coaching for Agile Teams & Transformations
 

Q_Learning.ppt

  • 1. Reinforcement Learning BY: DR. SUKHNANDAN KAUR ASSISTANT PROFESSOR , CSED, TIET
  • 2. Reinforcement Learning How can an agent learn behaviors when it doesn’t have a supervisor to tell it how to perform? This problem is called reinforcement learning. Rewards: Positive / Negative
  • 3. Reinforcement Learning (cont.) The goal is to get the agent to act in the world so as to maximize its rewards. The agent has to figure out what it did that made it get the reward/punishment
  • 4. Components of Reinforcement Learning 1. Agent 2. Environment 3.Rewards 4. State 5. Action(s)
  • 5. Q Learning AI can directly drive an optimal policy from its environment without needing to create a model beforehand. Q learning is model free learning technique that can be used to find the optimal action selection policy using Q function. Q function gives the largest expected return achievable by any policy π for each possible state action pair.
  • 6. Reinforcement Learning In reinforcement learning we want to obtain function Q(S, A) that predicts best action A in a state S to get maximum cumulative reward. cumulative reward 1 = Q(s1, a1) + Q(s2, a1) = -1 + 0 = -1 cumulative reward 2 = Q(s1, a2) + Q(s2, a2) = 1 + 0.5 = 1.5 cumulative reward 3 = Q(s1, a2) + Q(s2, a1) = 1 + 0 = 1 cumulative reward 4 = Q(s1, a1) + Q(s2, a2) = -1 + 0.5 = -0.5 maximum cumulative reward = max(cumulative reward 1, cumulative reward 2, cumulative reward 3, cumulative reward 4) = 1.5 a1 a2 s1 -1 1 s2 0 0.5
  • 7. How Q learning Works? Initialize Q Choose Action from Q Calculate Reward Take action Update Q
  • 8. Example USER Go to house Retrieve Flower State =1 Q(State = 1, Action = “Retrieve Flower”) = 0.5 Q(State = 1, Action = “Go to house”) = 3.0
  • 9. Accessible or observable state Repeat:  s  sensed state  If s is terminal then exit  a  choose action (given s)  Perform a Reactive Agent Algorithm
  • 10. Repeat:  s  sensed state  If s is terminal then exit  a  P(s) /* Choose action using policy  Perform a Reactive Agent Algorithm using Reinforcement Learning
  • 11. Approaches  Learn policy directly– function mapping from states to actions Q(S, A) Where, Q = {s1,s2,s3,s4} and A = {a1,a2,a3}  Learn utility values for states (i.e., the value function) If the outcome of performing an action at a state is deterministic, then the agent can update the utility value U() of states: ◦ U(new state) = reward + U(old state)
  • 12. Exploration / Exploitation policy Wacky approach (exploration): act randomly in hopes of eventually exploring entire environment Greedy approach (exploitation): act to maximize utility using current estimate Reasonable balance: act more wacky (exploratory) when agent has little idea of environment; more greedy when the model is close to correct path.
  • 13. Summary Active area of research. Reinforcement learning is applicable to game-playing, robot controllers, others