Reinforcement Learning
Reinforcement learning
• Reinforcement learning, in a simplified definition, is learning the best
actions based on reward or punishment.
• There are three basic concepts in reinforcement learning:
  • State
  • Action
  • Reward
• In this picture, a lady wants to train her dog.
• She orders her dog to perform a certain action, and for every proper
execution she gives the dog an orange as a reward.
• The dog remembers: "If I perform a certain action, I get an orange."
STATE:
• The state describes the current situation. For a robot that is learning
to walk, the state is the position of its two legs.
ACTION:
• An action is what an agent can do in each state.
• Given the state, or positions of its two legs, a robot can take steps
within a certain distance.
REWARD:
• When a robot takes an action in a state, it receives a reward.
• Here the term “reward” is an abstract concept that describes
feedback from the environment.
• When the reward is positive, it corresponds to our normal
meaning of reward.
• When the reward is negative, it corresponds to what we usually
call "punishment."
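The three concepts above can be pictured as one record per step of experience. The following is a minimal sketch only: the `Transition` type, the leg-position tuple, and the action name are hypothetical names chosen for illustration and do not come from the slides.

```python
from typing import NamedTuple, Tuple


class Transition(NamedTuple):
    """One step of experience: what the agent saw, did, and received."""
    state: Tuple[float, float]       # assumed encoding: positions of the two legs
    action: str                      # what the agent chose to do in that state
    reward: float                    # feedback from the environment
    next_state: Tuple[float, float]  # situation after the action


def describe(reward: float) -> str:
    """Map the sign of the reward to its everyday meaning."""
    if reward > 0:
        return "reward"
    if reward < 0:
        return "punishment"
    return "neutral"


# Hypothetical example: the robot moves its left leg forward and is rewarded.
step = Transition(state=(0.0, 0.0), action="left_leg_forward",
                  reward=1.0, next_state=(0.3, 0.0))
print(describe(step.reward))
```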
RL Cont...
• A robot learns to go through a maze.
• When the robot takes one step to the right, it reaches an open
location; if it keeps going right for three steps, it hits a wall.
• The robot that is running through the maze remembers every wall it
hits.
• In the end, it remembers the previous actions that lead to dead ends.
• It also remembers the path (that is, a sequence of actions) that leads
it successfully through the maze.
RL Cont...
• The essential goal of reinforcement learning is to learn a sequence
of actions that maximizes long-term reward.
• An agent learns that sequence by interacting with the environment
and observing the reward it receives in each state.
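The interaction between agent and environment can be sketched as a simple loop. This is an illustrative sketch under stated assumptions: `CorridorEnv` is a hypothetical one-dimensional corridor, not the maze from the slides, and the reward values (-1 for a wall, 0 for an open location, 100 for the exit) are assumed for the example.

```python
class CorridorEnv:
    """Hypothetical corridor with states 0..length-1; the exit is the last state."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        """action is -1 (left) or +1 (right); returns (next_state, reward, done)."""
        target = self.state + action
        if target < 0 or target >= self.length:
            return self.state, -1, False      # hit a wall: stay put, get punished
        self.state = target
        if self.state == self.length - 1:
            return self.state, 100, True      # reached the exit
        return self.state, 0, False           # open location


def run_episode(env, policy, max_steps=100):
    """Let the agent act until the exit is reached or max_steps is used up."""
    state = env.reset()
    history = []
    for _ in range(max_steps):
        action = policy(state)
        next_state, reward, done = env.step(action)
        history.append((state, action, reward))
        state = next_state
        if done:
            break
    return history


# A fixed policy that always moves right reaches the exit in four steps.
history = run_episode(CorridorEnv(), lambda state: 1)
print(history)
```

The `history` list is the sequence of (state, action, reward) observations that a learning method would use to improve the policy.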
Q-learning: A commonly used reinforcement
learning method
• Q-learning is one of the most commonly used reinforcement learning
methods. The Q-value represents the long-term value of taking an action
in a given state.
• Q-learning is about learning Q-values through observations.
• The procedure for Q-learning is:
• Q(state, action) = (1 - learning_rate) × Q(state, action) +
learning_rate × (r + discount_rate × max_a' Q(state', a'))
• In the beginning, the agent initializes the Q-value to 0 for every
state-action pair. More precisely, Q(s, a) = 0 for all states s and
actions a.
• After the agent starts learning, it takes an action a in state s and
receives reward r.
RL Cont..
• It also observes that the state has changed to a
new state s'. The agent then updates Q(state, action) using the formula
above.
• The learning rate is a number between 0 and 1. It is the weight given
to new information relative to old information.
• The new long-term reward is the current reward, r, plus all
future rewards in the next state, s', and later states, assuming the
agent always takes its best actions in the future.
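A single Q-learning update can be sketched as follows. This is a minimal illustration, not a full training loop: the function name `q_update`, the dictionary-based Q-table, and the default values 0.2 and 0.9 are assumptions made for the example.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, next_actions,
             learning_rate=0.2, discount_rate=0.9):
    """Blend the old Q-value with the newly observed long-term reward."""
    # Best value reachable from the next state (0.0 if no actions are available).
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    new_estimate = reward + discount_rate * best_next
    q[(state, action)] = ((1 - learning_rate) * q[(state, action)]
                          + learning_rate * new_estimate)
    return q[(state, action)]


# Every state-action pair starts at 0, as described above.
q = defaultdict(float)

# Hypothetical step: action "a" in state "s" earns reward 100 and ends in "t".
first = q_update(q, "s", "a", 100, "t", [])
# A later step that leads into "s" now inherits part of that value.
second = q_update(q, "p", "b", 0, "s", ["a"])
print(first, second)
```

With a learning rate of 0.2, only one fifth of each new observation is mixed into the stored value, so a Q-value approaches its target gradually over repeated visits.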
RL Cont..
• Future rewards are discounted by a discount rate between 0
and 1, meaning that future rewards are not as valuable as the reward
received now.
• As the agent repeatedly visits all the states and tries different
actions, it eventually learns the optimal Q-values for all possible
state-action pairs. It can then derive the action in every state that is
optimal for the long term.
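Both ideas, discounting and deriving the best action per state, can be sketched in a few lines. The helper names `discounted_return` and `greedy_policy` and the sample Q-values are hypothetical and serve only as an illustration.

```python
def discounted_return(rewards, discount_rate):
    """Sum of rewards, where the reward t steps ahead is weighted by discount_rate**t."""
    return sum((discount_rate ** t) * r for t, r in enumerate(rewards))


def greedy_policy(q):
    """For each state, pick the action with the highest Q-value."""
    best = {}
    for (state, action), value in q.items():
        if state not in best or value > best[state][1]:
            best[state] = (action, value)
    return {state: action for state, (action, _) in best.items()}


# A reward of 100 that arrives two steps later is worth less than 100 now.
later = discounted_return([0, 0, 100], 0.9)

# Hypothetical learned Q-values for two states.
q_table = {("A", "left"): 0.5, ("A", "right"): 2.0,
           ("B", "up"): 1.5, ("B", "down"): -1.0}
policy = greedy_policy(q_table)
print(later, policy)
```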
Maze robot example:
RL Cont..
• The robot starts from the lower left corner of the maze.
• Each location (state) is indicated by a number.
• There are four action choices (left, right, up, down), but in certain
states the action choices are limited.
• For example, in state 1 (the initial state), the robot has only two
action choices: up or right.
• In state 4, it has three action choices: left, right, or up.
• When the robot hits a wall, it receives reward -1.
• When it reaches an open location, it receives reward 0.
• When it reaches the exit, it receives reward 100.
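The rules above can be written down as plain data. This sketch covers only what the slide states: the full maze layout is not given in the text, so only states 1 and 4 appear in the action table, and the names `Outcome`, `REWARDS`, and `ALLOWED_ACTIONS` are hypothetical.

```python
from enum import Enum


class Outcome(Enum):
    """What can happen when the robot takes a step."""
    WALL = "wall"
    OPEN = "open"
    EXIT = "exit"


# Reward for each outcome, as listed on the slide.
REWARDS = {Outcome.WALL: -1, Outcome.OPEN: 0, Outcome.EXIT: 100}

# Action choices for the two states the slide describes; other states are
# omitted because their choices are not stated in the text.
ALLOWED_ACTIONS = {
    1: ("up", "right"),
    4: ("left", "right", "up"),
}


def reward_for(outcome):
    """Look up the reward the robot receives for a given outcome."""
    return REWARDS[outcome]


print(reward_for(Outcome.EXIT), ALLOWED_ACTIONS[4])
```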
RL Cont..
• Q(state, action) = (1 - learning_rate) × Q(state, action)
+ learning_rate × (r + discount_rate × max_a' Q(state', a'))
• Here the learning rate is 0.2 and the discount rate is 0.9.
• Q(4, left) = 0.8 × 0 + 0.2 × (0 + 0.9 × Q(1, right))
• Q(4, right) = 0.8 × 0 + 0.2 × (0 + 0.9 × Q(5, up))
• In this maze, Q(5, up) has a higher value than Q(1, right).
• For this reason, Q(4, right) has a higher value than Q(4, left).
• Thus, the best action in state 4 is going right.
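The comparison can be checked numerically. The slide does not give values for Q(1, right) and Q(5, up), so the numbers 1.0 and 2.0 below are hypothetical and chosen only so that Q(5, up) is the larger one; the helper name `updated_q` is likewise an assumption.

```python
LEARNING_RATE = 0.2
DISCOUNT_RATE = 0.9


def updated_q(old_q, reward, best_next_q):
    """Apply the update formula for one state-action pair."""
    return ((1 - LEARNING_RATE) * old_q
            + LEARNING_RATE * (reward + DISCOUNT_RATE * best_next_q))


# Hypothetical current estimates; only their ordering matters here.
q_1_right = 1.0
q_5_up = 2.0

# Both moves from state 4 land on an open location, so the reward is 0,
# and both old Q-values are still at their initial value of 0.
q_4_left = updated_q(0.0, 0, q_1_right)
q_4_right = updated_q(0.0, 0, q_5_up)

best_action = "right" if q_4_right > q_4_left else "left"
print(q_4_left, q_4_right, best_action)
```

Because both updates reduce to 0.2 × 0.9 × (next Q-value), whichever neighbouring state has the higher Q-value determines the better action in state 4.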
Advantages of Reinforcement Learning
• It can solve complex, higher-order problems, and the solutions it
obtains can be very accurate.
• Part of its strength is that it closely resembles the way humans
learn, through trial and error.
• Because of its learning ability, it can be combined with neural
networks. This combination is known as deep reinforcement learning.
• The best part is that even when there is no training data, it
learns from the experience it gains by interacting with the environment.
Disadvantages of Reinforcement Learning
• Reinforcement learning consumes a lot of time and computational power.
THANK YOU
