Introduction to the HAHRL-RTS Platform
Omar Enayet, Amr Saqr, AbdelRahman Al-Ogail, Ahmed Atta

Agenda
- Complexity of RTS Games
- Analysis of the Strategy Game
- The HAHRL-RTS Platform
- The Hierarchy
- Heuristic Algorithms
- Function Approximation
- References

Complexity of RTS Games
There is no doubt that strategy games are complex domains:
- Gigantic set of allowed actions (almost infinite)
- Gigantic set of game states (almost infinite)
- Imperfect information
- Nondeterministic behavior
However, real-time planning and reactions are required!

Complexity of RTS Games
- No model of the game, i.e., we do not know exactly how to get from one state to another.
- Infinite number of states and actions.
Result: learning with raw reinforcement learning is infeasible.

Solution
- Approximation of the state space, action space, and value functions
- Hierarchical reinforcement learning
- Applying heuristics
- Others

Analysis of the Strategy Game
Primitive Actions
Primary primitive actions:
- Move a unit
- Train/upgrade a unit
- Gather a resource
- Make a unit attack
- Make a unit defend
- Build a building
- Repair a building
NB: Upgrading units or buildings is not available in BosWars but is found in most RTS games.

Winning a Game
A player wins by performing two types of actions side by side: actions that strengthen him and actions that weaken his enemy (Fig. 1).

Winning a Game [Fig. 1]

6 Main Sub-Strategies
When a human plays a strategy game, he does not learn everything at once. He learns each of the following 6 independent sub-strategies separately:

1- Train What Units?
Train/build/upgrade attacking units: which units does he need to train?
- Will he depend on fast, cheap units to perform successive quick attacks, or on powerful, expensive, slow units to perform one or two brutal attacks that finish his enemy? Or on a combination of the two, which is often the better choice?
- Does his enemy have weak points against a certain unit? Or does his enemy have units that can infiltrate his defenses, so that he must train their anti-units?
- Does he prefer to spend his money on expensive upgrades or on a larger number of non-upgraded units?
NB: I treat attacking buildings as static attacking units.

2- How to Defend?
Defend: how will he use his current units to defend?
- Will he concentrate all his units in one tightly packed force, or will he spread them along his borders? Or a mix of the two approaches?
- Will he keep the defending units (which may include attacking buildings) around his buildings, or will he make them guard far from the base to stop the enemy early? Or a mix of the two approaches?
- If he detects an attack on his radar, will he order his units to attack at once, or will he wait for the opponent to reach his base and be crushed there? Or a mix of the two approaches?
- How will he defend unarmed units? Will he place armed units near them for protection, or will he prefer to use the armed units elsewhere?
- If an unarmed unit is under attack, how will he react?
- What are his reactions to different events while defending?

3- How to Attack?
Attack: how will he use his current units to attack?
- Will he attack the important buildings first, or will he prefer to crush all the defensive buildings and units first? Or a mix of the two approaches?
- Will he divide his attacking force into separate small forces that attack from different places, or will he attack with one big, solid force? Or a mix of the two approaches?
- What are his reactions to different events while attacking?

4- How to Gather Resources?
Gather resources: how will he gather the resources?
- Will he train many gatherers to achieve a high gathering rate? Or will he train only a few, because more would be a waste of money that he needs in order to rush (attack early) at the beginning of the game? Or a mix of the two approaches?
- Will he start with the far resources because the near resources are more secure? Or will he be greedy and take the nearest resources first? Or a mix of the two approaches?

5- How to Construct Buildings?
Construct buildings: how does he place his buildings?
- Will he place them close together so that they are easy to defend? Or will he leave large spaces between them to make it harder for the opponent to destroy them? Or a mix of the two approaches?

6- How to Repair?
Repair: how will he do the repairing? Although this is a minor matter, different approaches are used.
- Will he place a repairing unit near every building in case of an attack, or will he just order the nearest one to repair the building being attacked? Or a mix of the two approaches?

Heuristically accelerated Hierarchical RL in RTS Games
The Hierarchy
Since the 6 sub-strategies are nearly independent of each other, I will divide the AI system into a hierarchy, as shown in figure 1. Each child node is itself a semi-Markov decision process (SMDP) to which heuristically accelerated reinforcement learning techniques will be applied. Each child node will later be divided into further sub-nodes, which are also SMDPs.

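A minimal Python sketch of such a hierarchy, assuming one learner per sub-strategy and the standard SMDP Q-learning update (of the kind surveyed by Barto and Mahadevan in the references). The class name, the option names, and the values of alpha and gamma are illustrative assumptions, not part of the platform.

```python
from collections import defaultdict

# The six sub-strategies named in the slides.
SUB_STRATEGIES = ["train_units", "defend", "attack",
                  "gather_resources", "construct_buildings", "repair"]


class SMDPNode:
    """One child node of the hierarchy; learns Q-values over its own options."""

    def __init__(self, name, options, alpha=0.1, gamma=0.9):
        self.name = name
        self.options = list(options)   # temporally extended actions (assumed names)
        self.alpha = alpha             # learning rate (assumed value)
        self.gamma = gamma             # discount factor (assumed value)
        self.q = defaultdict(float)    # (state, option) -> estimated value

    def update(self, state, option, reward, next_state, duration):
        """SMDP Q-learning step.

        `reward` is the discounted reward accumulated while the option ran,
        and `duration` is the number of time steps it took, so the bootstrap
        term is discounted by gamma ** duration instead of gamma.
        """
        best_next = max(self.q[(next_state, o)] for o in self.options)
        target = reward + (self.gamma ** duration) * best_next
        self.q[(state, option)] += self.alpha * (target - self.q[(state, option)])
        return self.q[(state, option)]


def build_hierarchy():
    """Create one independent learner per sub-strategy (placeholder options)."""
    return {name: SMDPNode(name, options=["option_a", "option_b"])
            for name in SUB_STRATEGIES}
```

Each node keeps its own Q-table, which mirrors the assumption that the sub-strategies can be learned separately.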
Heuristic Algorithms
A heuristic is an algorithm that is able to produce an acceptable solution to a problem in many practical scenarios, in the fashion of a general heuristic, but for which there is no formal proof of its correctness. Alternatively, it may be correct but not proven to produce an optimal solution or to use reasonable resources.

Heuristic Algorithms (Cont’d)
- Firstly: splitting the learning into the six sub-strategies is a heuristic.
- Secondly: using case-based reasoning when choosing actions is a heuristic.
Why heuristics?
- They will accelerate the learning dramatically.
- They will reduce the non-determinism of the AI, so testing is easier.
Why not heuristics?
- They increase the programming effort.

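One way a heuristic can accelerate learning is by biasing action selection: the learner picks the action that maximizes the learned value plus a weighted heuristic bonus, in the spirit of the case-based heuristic approach of Bianchi et al. listed in the references. The Python sketch below assumes a dict-based interface; the function name, `xi`, and `epsilon` are hypothetical, and how a case base would produce the bonus is not shown.

```python
import random


def choose_action(q_values, heuristic, xi=1.0, epsilon=0.1, rng=random):
    """Pick an action for the current state.

    q_values:  dict mapping action -> learned Q-value
    heuristic: dict mapping action -> bonus suggested by the heuristic
               (e.g. from a retrieved case); missing actions get 0
    xi:        weight of the heuristic (assumed parameter)
    epsilon:   probability of a random exploratory action
    """
    actions = list(q_values)
    if rng.random() < epsilon:
        return rng.choice(actions)  # explore
    # Exploit: the heuristic only shifts the ranking, it does not replace Q.
    return max(actions, key=lambda a: q_values[a] + xi * heuristic.get(a, 0.0))
```

With `xi=0` this reduces to ordinary epsilon-greedy selection, so the heuristic can be switched off without changing the learner.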
Feature-Based Function Approximation
The problem: the state-action space is infinite.
The goal: approximate the state-action space in a way that still lets reinforcement learning work efficiently.

The Approach
If the actions are infinite, discretize them in any appropriate way. For example, in the Resource Gathering Problem an action is sending N more gatherers to gather a resource, where N could be any number. We convert N to a small set of discrete intervals: [0,1], [2,4], [5,8], [9,15], [16,22], [23,35]. Note that it is rare to need more than 35 additional gatherers on a resource that is already being gathered.

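The discretization can be sketched in Python as a lookup over interval upper bounds. This assumes the intervals are meant to be disjoint, so that 22 belongs to the fifth interval and the last one starts at 23, and that the rare requests above 35 are clamped into the last interval; the function name is hypothetical.

```python
import bisect

# Upper bounds of the intervals [0,1] [2,4] [5,8] [9,15] [16,22] [23,35].
UPPER_BOUNDS = [1, 4, 8, 15, 22, 35]


def action_bin(n_gatherers):
    """Map a requested number of extra gatherers to a discrete action index 0..5."""
    if n_gatherers < 0:
        raise ValueError("number of gatherers must be non-negative")
    # Clamp the rare values above 35 into the last interval (assumption).
    n = min(n_gatherers, UPPER_BOUNDS[-1])
    # Index of the first interval whose upper bound is >= n.
    return bisect.bisect_left(UPPER_BOUNDS, n)
```

The learner then chooses among six actions instead of an unbounded integer N.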
The Approach (Cont’d)
The states are not represented explicitly but through their features. For example, in the Resource Gathering Problem the states are infinite, since they depend on the combinations of the following features: number of gatherers, relative distance between each gatherer and the resource, available resources, wanted resources, etc. Because the number of combinations is huge, we use the features themselves instead.

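A small Python sketch of such a feature representation for the Resource Gathering Problem. The `GatherState` fields are hypothetical, and summarizing the per-gatherer distances by their mean is an assumption made here to obtain a fixed-length vector; the slides do not specify how the distances are combined.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class GatherState:
    """Hypothetical raw game state for one resource site."""
    gatherer_positions: List[Tuple[float, float]]
    resource_position: Tuple[float, float]
    available: float  # resources currently in stock
    wanted: float     # resources the player wants to have


def features(state: GatherState) -> List[float]:
    """Turn a raw state into a fixed-length feature vector."""
    n = len(state.gatherer_positions)
    if n:
        mean_dist = sum(math.dist(p, state.resource_position)
                        for p in state.gatherer_positions) / n
    else:
        mean_dist = 0.0
    return [float(n), mean_dist, state.available, state.wanted]
```

Two raw states with the same feature vector are treated as the same situation by the learner, which is what collapses the infinite state space.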
Result of Approximation
The complexity no longer depends on (number of states × number of actions) but on (number of features × number of actions). In the Resource Gathering Problem, if we have 6 distinct actions and discretize the infinite state space into at least 100 states, we must learn at least 100 × 6 = 600 Q-values. With this approach, 5 features and 6 distinct actions require only 5 × 6 = 30 parameters (thetas).
We approximated only the state space, not the action space: infinitely many states are reduced to a finite number of features. A problem still remains if the action space is large.

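The parameter count can be checked with a short Python sketch of linear function approximation, in which each action has one weight (theta) per feature and Q(s, a) is the weighted sum of the state's features. The slides do not show the update rule, so the gradient-style Q-learning step below, the function names, and the values of alpha and gamma are assumptions.

```python
N_FEATURES = 5
N_ACTIONS = 6

# One weight (theta) per feature-action pair: 5 * 6 = 30 parameters.
theta = [[0.0] * N_FEATURES for _ in range(N_ACTIONS)]


def q_value(theta, feats, action):
    """Linear approximation: Q(s, a) = sum_i theta[a][i] * f_i(s)."""
    return sum(t * f for t, f in zip(theta[action], feats))


def td_update(theta, feats, action, reward, next_feats, alpha=0.01, gamma=0.9):
    """One Q-learning step on the weights of the chosen action."""
    best_next = max(q_value(theta, next_feats, a) for a in range(len(theta)))
    td_error = reward + gamma * best_next - q_value(theta, feats, action)
    # For a linear model the gradient w.r.t. theta[action][i] is feats[i].
    for i, f in enumerate(feats):
        theta[action][i] += alpha * td_error * f
    return td_error


n_parameters = sum(len(row) for row in theta)  # 30, versus at least 600 table entries
```

Because the weights are shared across all states with similar features, an update from one state also changes the estimates for states that were never visited.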
References
- Andrew G. Barto and Sridhar Mahadevan, 2003, Recent Advances in Hierarchical Reinforcement Learning
- Marina Irodova and Robert H. Sloan, 2005, Reinforcement Learning and Function Approximation
- Reinaldo A. C. Bianchi, Raquel Ros and Ramón López de Mántaras, 2009, Improving Reinforcement Learning by using Case Based Heuristics
- Richard S. Sutton and Andrew G. Barto, 1998, Reinforcement Learning: An Introduction
- Wikipedia
