An efficient use of temporal difference technique in Computer Game Learning

Prabhu Kumar, Data scientist intern at Cyrrup

A computer game that uses the temporal difference algorithm of machine learning to improve the computer's ability to learn and to explore the best next move, using greedy move selection and exploration techniques over the future states of the game.

An Efficient Use of Temporal Difference Technique in Computer Game Learning
Indian Institute of Technology (Indian School of Mines), Dhanbad
Project guide:
Dr. Rajendra Pamula
Department of Computer Science and Engineering
Indian Institute of Technology (Indian School of Mines), Dhanbad
Presented by:
Prabhu Kumar
15MT000624
Computer Science and Engineering
Indian Institute of Technology (Indian School of Mines), Dhanbad
Outline
1. Introduction of reinforcement learning
2. Agent-Environment interface
3. Types of reinforcement learning
4. Elements of the reinforcement learning
5. Types of selection of state
6. Algorithms of reinforcement learning
References
Introduction of reinforcement learning
• Reinforcement learning is a part of machine learning, the field of computer science that gives computers the ability to learn without being explicitly programmed.
• Reinforcement learning is a framework in which computational learning agents use experience from their interaction with an environment to improve performance over time.
• In a reinforcement learning task, the agent observes the state of the environment and always tries to maximize the long-term return, which is based on real-valued rewards.
• It is learning what to do, that is, how to map situations to actions, so as to maximize the total numerical reward and minimize the penalty.
Introduction of reinforcement learning contd.
• If there is no explicit teacher to guide the learning agent, the agent must learn the behavior through trial-and-error interaction with an unknown environment.
• The learning agent senses the environment, takes actions on it, and receives a numeric reward or punishment from some reward function.
• When we say the agent learns, we mean that "sometimes it modifies the code itself or modifies the database", where the database implies the experiences, information, events, etc.
• The agent is responsible for making decisions.
• The main goal of reinforcement learning is to "build up a good model, such as an algorithm, which generates a sequence of decisions and leads to the highest long-term reward."
Agent-environment interface
o At each time step t, the reinforcement learning agent receives some representation of the environment's current state s(t) ∈ S, where S is the set of possible states, and then chooses some action a(t) ∈ A(s(t)), where A(s(t)) is the set of actions that can be executed in state s(t).
o The agent receives the reward r(t+1) and moves to the next state s(t+1).
o The reward function can be used to specify a wide range of planning goals; that is, the designer can tell the agent what it has to achieve.
o The reward function must be unalterable by the agent.
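The interaction loop on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's code: the `Corridor` environment, its `reset`/`actions`/`step` methods, and both policies are hypothetical names invented for the example.

```python
import random


class Corridor:
    """Hypothetical toy environment: cells 0..4, the goal is the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def actions(self, state):
        # A(s): the set of actions that can be executed in state s.
        return ["left", "right"]

    def step(self, action):
        # The reward is computed inside the environment, so the agent cannot alter it.
        move = 1 if action == "right" else -1
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def run_episode(env, policy, max_steps=100):
    """Run the sense-act-reward loop once and return the rewards received."""
    state = env.reset()
    rewards = []
    for _ in range(max_steps):
        action = policy(state, env.actions(state))  # choose a(t) from A(s(t))
        state, reward, done = env.step(action)      # observe r(t+1) and s(t+1)
        rewards.append(reward)
        if done:
            break
    return rewards


_rng = random.Random(0)


def random_policy(state, actions):
    return _rng.choice(actions)


def always_right(state, actions):
    return "right"


print(run_episode(Corridor(), always_right))  # [0.0, 0.0, 0.0, 1.0]
```

Keeping the reward computation inside `step` mirrors the last bullet: the agent only ever sees the number it is handed.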
Types of reinforcement learning
There are two types of reinforcement learning tasks:
1. Episodic: The interaction with the environment is divided into independent episodes. "Independent" means that performance in each episode depends only on the actions taken in that episode.
In an episodic task, the return is the sum of all rewards received from the beginning of the episode until it ends:
$R = \sum_{k=0}^{T} r(k)$
where T is the terminal state, i.e. the end of the episode,
s0 is the starting state of the episode (k = 0),
R denotes the total return,
r(k) denotes the reward at the kth state.
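As a small illustration of the episodic return, the sketch below simply adds up the rewards of one finished episode. The reward values are made up for the example, and `episodic_return` is a hypothetical helper name.

```python
def episodic_return(rewards):
    """Total return of one episode: the plain sum of its rewards."""
    return sum(rewards)


# Made-up rewards r(0), ..., r(T) collected during a single episode.
episode_rewards = [0.0, -1.0, 0.0, 5.0]
print(episodic_return(episode_rewards))  # 4.0
```

Because episodes are independent, the return of one episode can be computed without looking at any other episode.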
Types of reinforcement learning contd.
2. Continuing task: It consists of an infinite sequence of states, actions and rewards. In this task, the interaction between agent and environment does not break down into separate episodes. The performance depends upon the current action.
In the case of a continuing task, the return depends upon the discount factor:
$R = \sum_{k=0}^{\infty} \gamma^{k} r(k)$
where γ denotes the discount factor, which adjusts the relative importance of long-term versus short-term consequences.
• The discount factor is between 0 and 1.
• The discount factor reflects the strategy of how fast learning takes place.
• If γ = 0, the agent is only concerned with maximizing the immediate reward.
• If γ approaches 1, the agent takes future rewards more strongly into account.
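The effect of the discount factor can be checked numerically. The sketch below assumes the usual discounted sum, in which the reward k steps ahead is weighted by γ^k, and truncates the infinite sequence to a short made-up list of rewards; `discounted_return` is a hypothetical helper name.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**k * rewards[k], computed backwards to avoid explicit powers."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie between 0 and 1")
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


rewards = [1.0, 1.0, 1.0, 10.0]  # made-up rewards; the large one arrives late

print(discounted_return(rewards, 0.0))  # 1.0  (only the immediate reward counts)
print(discounted_return(rewards, 0.5))  # 3.0  (1 + 0.5 + 0.25 + 1.25)
print(discounted_return(rewards, 1.0))  # 13.0 (all rewards count equally)
```

With γ = 0 the late reward of 10 is invisible to the agent, while with γ near 1 it dominates the return.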
Elements of reinforcement learning
1. Policy: It defines the learning agent's way of behaving at a given time. It might be a function or a simple lookup table. Its only use in reinforcement learning is to determine the behavior.
2. Reward function: It is the function that defines which events are good and which are bad for the agent. It maps each state-action pair of the environment to a single real number. It must necessarily be unalterable by the agent.
Elements of reinforcement learning contd.
3. Value function: It specifies what is good in the long run. The value of a state is the total amount of reward an agent can expect to accumulate over the future, starting from that state. Whereas a reward is the immediate desirability of an environmental state, values indicate the long-term desirability of states.
4. Model: It is used for planning. It mimics the behavior of the environment, e.g. given a state and an action, the model might predict the next state and the next reward.
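The four elements can be made concrete with a tiny, entirely hypothetical thermostat example. The states, actions, reward numbers, and function names below are assumptions chosen for illustration; the value of a state is estimated by rolling the model forward under the policy for a fixed horizon with an assumed discount factor.

```python
# Policy: a simple lookup table from state to action.
policy = {"cold": "heat", "warm": "idle"}


def reward_function(state, action):
    """Maps a state-action pair to a single real number."""
    return 1.0 if (state, action) == ("warm", "idle") else -1.0


def model(state, action):
    """Predicts the next state and the next reward for a state-action pair."""
    next_state = "warm" if action == "heat" else state
    return next_state, reward_function(state, action)


def value(state, gamma=0.9, horizon=50):
    """Estimate long-term desirability by following the policy through the model."""
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        action = policy[state]
        state, reward = model(state, action)
        total += discount * reward
        discount *= gamma
    return total


print(model("cold", "heat"))            # ('warm', -1.0)
print(value("warm") > value("cold"))    # True
```

The immediate reward in "cold" is negative, yet its value is still fairly high because the policy leads straight to "warm"; this is the reward-versus-value distinction from item 3.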
  • 10. Algorithms of reinforcement learning 1. Markov decision processes: • It is the standard, general formalism for sequential decision problems. • It consists of a tuple <S, A, P, R>, where S is the set of states; A is the set of actions available to the agent; P is the state transition function, $P^{a}_{ss'} = \Pr\{s_{t+1} = s' \mid s_t = s, a_t = a\}$, which gives the probability of transitioning to state s′ at time t + 1 after action a is taken when the agent is in state s at time t; and R is the reward function, which determines the reward received after choosing action a in state s and moving to the next state s′.
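One way to hold the tuple <S, A, P, R> in code is sketched below. The class name `MDP`, its field layout, and the two-state example are all assumptions made for illustration; they are not taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MDP:
    states: tuple
    actions: tuple
    # transitions[(s, a)] maps each next state s' to P(s' | s, a)
    transitions: dict
    # rewards[(s, a, s')] is the reward for that transition (0.0 if absent)
    rewards: dict

    def is_valid(self, tol=1e-9):
        """Check that every defined (s, a) row is a probability distribution."""
        for probs in self.transitions.values():
            if any(p < 0 for p in probs.values()):
                return False
            if abs(sum(probs.values()) - 1.0) > tol:
                return False
        return True


# Hypothetical two-state example.
toy = MDP(
    states=("A", "B"),
    actions=("stay", "move"),
    transitions={
        ("A", "stay"): {"A": 1.0},
        ("A", "move"): {"A": 0.2, "B": 0.8},
        ("B", "stay"): {"B": 1.0},
        ("B", "move"): {"A": 0.8, "B": 0.2},
    },
    rewards={("A", "move", "B"): 1.0},
)
```

Storing P as a dictionary keyed by (state, action) keeps the lookup close to the notation on the slide.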
  • 11. Algorithms in reinforcement learning 2. Dynamic programming (DP) • It is a method for solving a Markov decision process, i.e. finding an optimal policy, when full knowledge of the model is available. • For dynamic programming, all the transition probabilities and expected rewards must be known. • The algorithm updates the estimated value of each state based on the estimated values of its successor states. • There are two basic DP methods for computing an optimal policy: 1. Policy iteration 2. Value iteration
  • 12. Policy Iteration: • It forms a sequence of policies Ω0, Ω1, Ω2, …, Ωk, Ωk+1, where Ωk+1 is an improvement on Ωk. • The policy evaluation task is concerned with computing the state-value function for any policy Ω, which is done with an iterative algorithm. • Estimating value functions is particularly useful for finding a better policy. • The policy improvement algorithm uses the action-value function to improve the current policy: if the action value of a in state s exceeds the state value of s under Ω, then it is better to select action a in s than to follow Ω. • If Ω and Ω′ are two policies and, in every state, the action chosen by Ω′ has an action value under Ω at least as large as the state value under Ω, then Ω′ is as good as or better than Ω.
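The alternation between evaluation and improvement can be sketched as follows. This is a minimal sketch under stated assumptions, not the presenter's implementation: the two-state MDP, the discount factor `GAMMA = 0.9`, the stopping tolerance `theta`, and all function names are hypothetical, and the policy is taken to be deterministic.

```python
GAMMA = 0.9  # assumed discount factor
STATES = ("A", "B")
ACTIONS = ("stay", "move")
# P[(s, a)] -> {s': probability}; R[(s, a, s')] -> reward (0.0 if missing)
P = {
    ("A", "stay"): {"A": 1.0},
    ("A", "move"): {"B": 1.0},
    ("B", "stay"): {"B": 1.0},
    ("B", "move"): {"A": 1.0},
}
R = {("A", "move", "B"): 1.0, ("B", "stay", "B"): 2.0}


def q_value(V, s, a):
    """Expected one-step reward of taking a in s plus discounted value of s'."""
    return sum(p * (R.get((s, a, s2), 0.0) + GAMMA * V[s2])
               for s2, p in P[(s, a)].items())


def evaluate(policy, theta=1e-10):
    """Iterative policy evaluation: sweep until values change by less than theta."""
    V = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            new_v = q_value(V, s, policy[s])
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < theta:
            return V


def policy_iteration():
    """Alternate evaluation and greedy improvement until the policy is stable."""
    policy = {s: ACTIONS[0] for s in STATES}
    while True:
        V = evaluate(policy)
        improved = {s: max(ACTIONS, key=lambda a: q_value(V, s, a))
                    for s in STATES}
        if improved == policy:
            return policy, V
        policy = improved


best_policy, best_values = policy_iteration()
```

In this toy problem the loop settles on moving from A to B and then staying in B, where the larger repeated reward is collected.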
  • 13. Value iteration • In value iteration, the optimal policy is not computed directly. • Instead, the optimal value function is computed, and a policy that is greedy with respect to that function is an optimal policy. • It stops and returns the policy once the changes introduced by backups/updates become sufficiently small. • A threshold value is initialized, and the change in the value function is compared with it. • If the change is smaller than the threshold, the resulting policy is taken as the optimal policy.
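The stopping rule based on a threshold can be sketched as below. The two-state MDP, the discount factor `GAMMA`, the threshold `THETA`, and the function names are hypothetical choices for illustration.

```python
GAMMA = 0.9    # assumed discount factor
THETA = 1e-10  # assumed threshold on the largest value change per sweep
STATES = ("A", "B")
ACTIONS = ("stay", "move")
# P[(s, a)] -> {s': probability}; R[(s, a, s')] -> reward (0.0 if missing)
P = {
    ("A", "stay"): {"A": 1.0},
    ("A", "move"): {"B": 1.0},
    ("B", "stay"): {"B": 1.0},
    ("B", "move"): {"A": 1.0},
}
R = {("A", "move", "B"): 1.0, ("B", "stay", "B"): 2.0}


def backup(V, s, a):
    """One-step lookahead value of taking action a in state s."""
    return sum(p * (R.get((s, a, s2), 0.0) + GAMMA * V[s2])
               for s2, p in P[(s, a)].items())


def value_iteration():
    V = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            best = max(backup(V, s, a) for a in ACTIONS)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        # Stop once the largest change in a sweep falls below the threshold.
        if delta < THETA:
            break
    # Read off the policy that is greedy with respect to the final values.
    policy = {s: max(ACTIONS, key=lambda a: backup(V, s, a)) for s in STATES}
    return policy, V


vi_policy, vi_values = value_iteration()
```

The policy is extracted only once, after the values have stopped changing, which is what distinguishes this from evaluating and improving a policy in turn.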
  • 14. 3. Temporal difference • The idea of temporal differences is taken from dynamic programming. • Temporal difference and dynamic programming are both used for estimating value functions. • In this method, learning takes place after every time step, which makes learning efficient. • The agent can revise its policy after every action it takes and state it experiences. • TD algorithms update the estimated value of a policy based on each state transition and on the immediate reward received from the environment on that transition. • The basic temporal difference algorithm is called TD(0), which keeps tabular estimates of the value function of Ω. It updates as follows: V(s) ← V(s) + α (V(s′) − V(s)), where α is the positive step-size parameter, V(s) is the value function for the current state, V(s′) is the value function for the next state, and α (V(s′) − V(s)) is the update term, whose difference V(s′) − V(s) is the temporal difference error, which the update is designed to move toward 0.
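The update rule V(s) ← V(s) + α (V(s′) − V(s)) can be sketched with a value table stored as a dictionary. The state names, the initial values, and the fixed value of 1.0 for the winning terminal state are assumptions for illustration, as is the function name `td_update`.

```python
def td_update(V, s, s_next, alpha=0.1):
    """Apply V(s) <- V(s) + alpha * (V(s') - V(s)) and return the TD error."""
    td_error = V[s_next] - V[s]
    V[s] += alpha * td_error
    return td_error


# Hypothetical three-state game: "start" -> "mid" -> "win".
# The terminal state "win" keeps a fixed value of 1.0 (an assumption).
V = {"start": 0.5, "mid": 0.5, "win": 1.0}
for _ in range(200):
    td_update(V, "mid", "win")
    td_update(V, "start", "mid")
```

Repeating the same transitions pulls the value of each earlier state toward the value of its successor, so the temporal difference error shrinks toward 0 as the slide describes.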
  • 15. Types of selection of states 1. Greedy 2. Exploration process a) Providing initial knowledge b) Deriving a policy from demonstration c) Asking for help d) Teacher providing advice
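Greedy selection picks the candidate next state with the highest estimated value. The sketch below pairs it with ε-greedy random exploration, which is a common stand-in and not one of the guided exploration methods listed on the slide. The candidate names and their values are hypothetical.

```python
import random


def greedy(values):
    """Pick the candidate next state with the highest estimated value."""
    return max(values, key=values.get)


def epsilon_greedy(values, epsilon, rng=random):
    """With probability epsilon explore at random, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.choice(list(values))
    return greedy(values)


# Hypothetical value estimates for three candidate moves.
candidates = {"corner": 0.6, "center": 0.8, "edge": 0.3}
greedy_choice = greedy(candidates)
```

Setting `epsilon` to 0 reduces `epsilon_greedy` to pure greedy selection, while larger values trade some immediate value for the chance to discover better moves.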
  • 16. Applications of reinforcement learning 1. Benchmark problems a) Mountain car b) Cart-pole balancing c) Pendulum swing-up etc. 2. Games a) Tic-Tac-Toe b) Chess etc. 3. Real-world applications a) Robotics b) Control of helicopters c) Prediction of stock prices
  • 17. References • R. S. Sutton. Reinforcement learning: past, present and future [online]. 1999. Available from http://www-anw.cs.umass.edu/~rich/Talks/SEAL98/SEAL98.html [accessed December 2005]. • R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. The MIT Press, Cambridge, MA, 1998. • M. L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley and Sons, Inc., New York, NY, 1994.