GROUP 3
CSA3003
GROUP MEMBERS
 20BAI10254 ARCHIT SRIVASTAVA
 20BAI10321 KARTHIK BISHT
 20BAI10154 HARSH SEN
 20BAI10129 DIVYANSHU CHETAN
INTRODUCTION
 Reinforcement learning is an important sub-category of machine learning.
 Reinforcement learning (RL) is a machine learning method that does not
require the raw data to be labeled, as supervised learning typically does.
Instead, the algorithm receives a reward signal indicating whether a decision
was a good one.
 RL is based on interactions between an AI system and its environment. An
algorithm receives a numerical score based on its outcome and then the
positive behaviors are “reinforced” to refine the algorithm over time. In recent
years, RL has been behind super-human performance on Go, Atari games and
many other applications.
WHAT IS NAÏVE REINFORCE ALGORITHM AND
HOW DOES IT WORK
 REINFORCE belongs to the family of Policy Gradient algorithms used in
Reinforcement Learning.
 A straightforward way to implement this approach is to build a Policy: a
model that receives a state as input and outputs the probability of taking
each action.
 A policy is simply a manual or cheat sheet that instructs the agent on what to
do in each state.
 The policy is then improved upon iteratively, with minor changes made at each
stage, until we have a policy that solves the environment.
 The policy is usually a Neural Network that takes the state as input and
generates a probability distribution over the action space as output. Its
parameters are trained to maximize the “Expected reward”.
 Each policy determines the likelihood that a particular action will be taken at
each state in the environment.
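As a minimal sketch of such a policy, the Python snippet below maps a state vector to action probabilities with a single linear layer followed by a softmax, then samples an action. The class name LinearSoftmaxPolicy, the 4-dimensional state and the two actions are assumptions made for illustration; the slides do not specify a network architecture.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)


class LinearSoftmaxPolicy:
    """Hypothetical one-layer policy: state vector -> action probabilities."""

    def __init__(self, state_dim, n_actions, seed=0):
        self.rng = np.random.default_rng(seed)
        # Small random weights give a near-uniform initial policy.
        self.W = 0.01 * self.rng.standard_normal((state_dim, n_actions))

    def action_probs(self, state):
        # One probability per action; the probabilities sum to 1.
        return softmax(np.asarray(state, dtype=float) @ self.W)

    def sample_action(self, state):
        # Sample an action index according to the policy's probabilities.
        probs = self.action_probs(state)
        action = int(self.rng.choice(len(probs), p=probs))
        return action, probs


policy = LinearSoftmaxPolicy(state_dim=4, n_actions=2)
action, probs = policy.sample_action([0.1, -0.2, 0.0, 0.3])
```

A deeper network could replace the single weight matrix without changing the interface: a state goes in, a probability distribution over actions comes out.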
 The agent samples from these probabilities and selects an action to perform in
the environment. At the end of an episode, we know the total reward the
agent collected by following that policy. We propagate the rewards backward
along the path the agent took to estimate the “Expected reward” at each state
for the given policy.
 The expected reward is the sum, over the actions available in state s, of
the probability of each action multiplied by its discounted reward.
 Here the discounted reward is the sum of all the rewards the agent receives
from that step onward, with each later reward discounted by a further factor
of gamma (γ).
 As per the original implementation of the REINFORCE algorithm, the
Expected reward is the sum of the products of the log-probabilities of the
actions taken and their discounted rewards.
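The discounted reward and the log-probability objective described above can be sketched in a few lines of Python. The function names discounted_returns and reinforce_objective, and the example rewards and probabilities, are hypothetical and chosen only for illustration.

```python
import numpy as np


def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ... for each step t."""
    returns = np.zeros(len(rewards))
    running = 0.0
    # Walk backward so each step reuses the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def reinforce_objective(log_probs, returns):
    """Sum of log pi(a_t | s_t) * G_t over one episode."""
    return float(np.sum(np.asarray(log_probs) * np.asarray(returns)))


rewards = [1.0, 1.0, 1.0]
G = discounted_returns(rewards, gamma=0.5)   # G is [1.75, 1.5, 1.0]

# Assumed probabilities the policy gave to the actions it actually took.
log_probs = np.log([0.5, 0.5, 0.5])
objective = reinforce_objective(log_probs, G)
```

Working backward from the last step means each return is computed in constant time from the one after it, which is the sense in which the reward is propagated back along the path.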
 Using the policy gradient theorem, we can devise a naive algorithm that uses
gradient ascent to update our policy parameters.
 The theorem gives a sum over all states and actions, but we only use the
sample gradient when updating the parameters because we simply cannot
compute the gradient over all possible states and actions.
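In symbols, and assuming the usual episodic setting, the theorem and the sampled update can be written as below. The symbols μ (state distribution under the policy), q_π (action value), α (step size) and G_t (discounted reward from step t onward) follow common textbook notation and are not taken from the slides.

```latex
\begin{align*}
\nabla_\theta J(\theta)
  &\propto \sum_{s} \mu(s) \sum_{a} q_\pi(s, a)\, \nabla_\theta \pi_\theta(a \mid s) \\
  &= \mathbb{E}_\pi\!\left[ G_t \, \nabla_\theta \log \pi_\theta(A_t \mid S_t) \right] \\[4pt]
\theta &\leftarrow \theta + \alpha \, G_t \, \nabla_\theta \log \pi_\theta(A_t \mid S_t)
\end{align*}
```

The first line sums over every state and action. The second rewrites that sum as an expectation over steps visited under the policy, so a single sampled step gives an unbiased estimate, which the last line uses as a gradient ascent update. Some presentations include an extra factor of γ^t in the update; it is omitted in this sketch.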
STEPS INVOLVED
The steps involved in the implementation of REINFORCE would be as follows:
 Initialize a Random Policy (a NN that takes the state as input and returns the
probability of actions)
 Use the policy to play N steps of the game, recording the action
probabilities (from the policy), the reward (from the environment) and the
action (sampled by the agent)
 Calculate the discounted reward for each step by working backward from the
end of the episode
 Calculate the expected reward G
 Adjust weights of Policy (back-propagate error in NN) to increase G
 Repeat from the second step
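The steps above can be sketched end to end in Python with NumPy. Everything specific here is an assumption for illustration: the one-state toy game (toy_env_step), the linear softmax policy standing in for a neural network, and the values of the hyperparameters. The gradient of the log-probability is written out by hand instead of using back-propagation.

```python
import numpy as np


def softmax(logits):
    exp = np.exp(logits - np.max(logits))
    return exp / exp.sum()


def discounted_returns(rewards, gamma):
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def toy_env_step(action):
    """Hypothetical one-state game: action 1 pays reward 1, action 0 pays 0."""
    return 1.0 if action == 1 else 0.0


def train_reinforce(n_episodes=500, n_steps=5, gamma=0.9, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    state = np.array([1.0])                      # the toy game has one constant state
    W = 0.01 * rng.standard_normal((1, 2))       # initialize a random policy
    for _ in range(n_episodes):
        actions, probs_seen, rewards = [], [], []
        for _ in range(n_steps):                 # play N steps and record them
            probs = softmax(state @ W)
            action = int(rng.choice(2, p=probs))
            actions.append(action)
            probs_seen.append(probs)
            rewards.append(toy_env_step(action))
        returns = discounted_returns(rewards, gamma)   # discounted reward per step
        grad = np.zeros_like(W)
        for a, p, G in zip(actions, probs_seen, returns):
            one_hot = np.eye(2)[a]
            # G_t * gradient of log pi(a_t | s_t) for a linear softmax policy.
            grad += G * np.outer(state, one_hot - p)
        W += lr * grad                           # gradient ascent on the objective
    return softmax(state @ W)


final_probs = train_reinforce()
```

After training, final_probs holds the policy's probabilities for the two actions; the probability of the rewarded action should have grown well beyond its initial value of about 0.5.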
Naïve REINFORCE Characteristics
 Naïve REINFORCE is a policy gradient algorithm. Policy-Gradient methods are
a subclass of Policy-Based methods that estimate an optimal policy’s weights
through gradient ascent.
 This algorithm is the fundamental policy gradient algorithm on which nearly
all the advanced policy gradient algorithms are based.
 REINFORCE is a family of reinforcement learning methods that directly update
the policy weights.
 Policy gradient algorithms attempt to determine the best policy by learning
the policy directly rather than by estimating action values, as Q-value
approaches do.
 Unlike Q-Learning, these methods return a probability distribution over the
actions rather than a vector of action values.
 The REINFORCE algorithm finds an unbiased estimate of the gradient, but
without the assistance of a learned value function. As a result, REINFORCE
typically learns much more slowly than RL methods that use value functions.
