Parametric Action Pre-Selection for MCTS in Real-Time Strategy Games
Abdessamed Ouessai, Mohammed Salem, and Antonio M. Mora
University of Mascara, Algeria
University of Granada, Spain
VI CoSECiVi-2020
Overview
→ Introduction
→ RTS Games & AI
→ Monte Carlo Tree Search
→ Parametric Action Pre-Selection
→ Experiments & Results
→ Conclusion & Future Work
Introduction
→ First game AI research domain: Classic board games
→ Evolution of board games is constrained by physics
→ Video games represent an unconstrained medium
→ Real-Time Strategy sub-genre concretized abstract board games (Warfare)
→ RTS Games are an evolution of abstract board games
→ ++ Concrete | ++ Challenging for humans | ++ Complex for AI
RTS Games & AI
→ Multiplayer, zero-sum, non-deterministic game with imperfect information.
→ Top-down perspective. Recognizable mouse and keyboard-based UI.
→ General strategy: Gather, then Build & Train, then Confront
→ Game elements: Units, Structures, Resources
→ Victory condition: Destruction of opponent's forces
RTS Games & AI
→ What does an RTS game-playing AI have to deal with?
→ Real-Time Aspect: short decision cycles (~50/s), simultaneous moves for different units, durative actions (> one decision cycle)
→ Uncertainty: non-determinism, partial observability (opponent & environment)
→ Complexity: exponential growth of the decision/state spaces, large topographic environments

Approximate estimates:
                   Chess    Go        StarCraft
Branching Factor   36       180       10^50
State Space        10^47    10^171    10^1685
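To make the "exponential growth" point concrete, here is a minimal sketch of how the number of joint player-actions grows when every unit acts simultaneously. The unit counts and the five actions per unit are hypothetical numbers chosen for illustration, not figures from the slides.

```python
from math import prod

def joint_branching_factor(actions_per_unit):
    """Count the player-actions available when every unit picks one unit-action."""
    return prod(actions_per_unit)

# Hypothetical numbers: every unit has 5 legal unit-actions.
ten_units = joint_branching_factor([5] * 10)     # 5**10
twenty_units = joint_branching_factor([5] * 20)  # 5**20
print(ten_units, twenty_units)
```

Doubling the number of units squares the branching factor here, which is why search over raw player-actions scales poorly.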
RTS Games & AI
→ Notable developments:
→ Scripts: Portfolio Greedy Search (Churchill et al, 2013), Puppet Search (Barriga et al, 2015)
→ Learning: Bayesian Models (Synnaeve et al, 2011), AlphaStar (Vinyals et al, 2019)
→ Planning: NaïveMCTS (Ontañón, 2013), AHTN (Ontañón and Buro, 2015), CCG (Kantharaju et al, 2018)
→ Evaluation: CNN (Stanescu et al, 2016), (Barriga et al, 2019)
→ Competitions:
→ IEEE CoG (StarCraft & µRTS), AAAI AIIDE (StarCraft), SSCAIT
→ RTS AI Testbeds:
→ ORTS – Wargus – BWAPI(SC) – SparCraft – SC2LE – ELF – DeepRTS - µRTS.
Monte Carlo Tree Search
→ An iterative, anytime, sampling-based search framework
→ Main components:
→ Tree Policy
→ Default Policy
→ Popular variant:
→ UCT (UCB1 as Tree Policy)
→ Popular application:
→ Go (AlphaGo)
→ Downside:
→ Scalability issues
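→ Iteration phases: (1) Selection, (2) Expansion, (3) Simulation, (4) Backpropagation
→ Selection and Expansion follow the Tree Policy; Simulation follows the Default Policy and yields the Reward that is backpropagated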
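The four phases can be sketched as a small UCT loop. This is an illustrative sketch only: the toy game (take 1 to 3 stones, whoever takes the last stone wins), the exploration constant sqrt(2), and all names are assumptions made for the example and do not come from the slides.

```python
import math
import random

class Node:
    def __init__(self, stones, player, parent=None, move=None):
        self.stones = stones    # stones left on the table
        self.player = player    # player to move at this node (0 or 1)
        self.parent = parent
        self.move = move        # move that led from the parent to this node
        self.children = []
        self.untried = [m for m in (1, 2, 3) if m <= stones]
        self.visits = 0
        self.wins = 0.0         # wins for the player who moved INTO this node

def ucb1(child, parent_visits, c=math.sqrt(2)):
    """UCB1: mean reward plus an exploration bonus."""
    return child.wins / child.visits + c * math.sqrt(math.log(parent_visits) / child.visits)

def rollout(stones, player, rng):
    """Default policy: uniformly random moves; returns the player who takes the last stone."""
    while True:
        stones -= rng.choice([m for m in (1, 2, 3) if m <= stones])
        if stones == 0:
            return player
        player = 1 - player

def mcts(root_stones, iterations=2000, seed=0):
    rng = random.Random(seed)
    root = Node(root_stones, player=0)
    for _ in range(iterations):
        node = root
        # (1) Selection: descend with the tree policy while nodes are fully expanded.
        while not node.untried and node.children:
            node = max(node.children, key=lambda ch: ucb1(ch, node.visits))
        # (2) Expansion: add one untried child.
        if node.untried:
            move = node.untried.pop()
            child = Node(node.stones - move, 1 - node.player, parent=node, move=move)
            node.children.append(child)
            node = child
        # (3) Simulation: play out with the default policy (or score a terminal node).
        if node.stones == 0:
            winner = 1 - node.player  # the player who just moved took the last stone
        else:
            winner = rollout(node.stones, node.player, rng)
        # (4) Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            if node.parent is not None and winner == node.parent.player:
                node.wins += 1
            node = node.parent
    # Anytime property: the most-visited root move is the current recommendation.
    return max(root.children, key=lambda ch: ch.visits).move

print(mcts(5))
```

The loop can be stopped after any number of iterations, which is what makes the framework usable under a fixed per-cycle time budget.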
Monte Carlo Tree Search
→ Proposed solutions to enhance MCTS scalability:
Abstraction
→ Search the decision space induced by expert-authored scripts instead of the original decision space
→ Downsides: (1) Sacrifices tactical performance. (2) Performance depends on scripts
CMAB
→ Selection phase framed as a Combinatorial Multi-Armed Bandit problem
→ NaïveMCTS is based on a CMAB formulation and a naïve assumption
→ Units $u_1, \dots, u_n$ each choose a unit-action $a_1, \dots, a_n$ with value $v_1, \dots, v_n$; together the unit-actions form the player action $\alpha_t$
→ The naïve assumption: $V(\alpha_t) = \sum_{i=1}^{n} v_i$
→ Successfully adapts MCTS to combinatorial decision spaces (ex. RTS Games)
→ Downside: The algorithm is still affected by the dimensionality of the decision space.
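The CMAB idea behind NaïveMCTS can be illustrated with a simplified sampler: one local bandit per unit, plus a global bandit over the player-actions seen so far. This is a rough sketch under stated assumptions, not the published algorithm: the epsilon values, the class and method names, and the additive toy reward are all hypothetical.

```python
import random

class NaiveSampler:
    def __init__(self, unit_actions, eps0=0.4, eps_local=0.3, eps_global=0.0, seed=0):
        self.unit_actions = unit_actions
        self.eps0, self.eps_local, self.eps_global = eps0, eps_local, eps_global
        self.rng = random.Random(seed)
        self.local = [dict() for _ in unit_actions]  # per unit: action -> [visits, reward sum]
        self.combos = {}                             # player-action -> [visits, reward sum]

    def _pick(self, candidates, stats, eps):
        """Epsilon-greedy over candidates; unseen candidates count as mean 0."""
        if not stats or self.rng.random() < eps:
            return self.rng.choice(candidates)

        def mean(c):
            n, total = stats.get(c, (0, 0.0))
            return total / n if n else 0.0

        return max(candidates, key=mean)

    def sample(self):
        if not self.combos or self.rng.random() < self.eps0:
            # Explore: build a player-action unit by unit from the local bandits.
            return tuple(self._pick(acts, self.local[i], self.eps_local)
                         for i, acts in enumerate(self.unit_actions))
        # Exploit: choose among player-actions that were already evaluated.
        return self._pick(list(self.combos), self.combos, self.eps_global)

    def update(self, player_action, reward):
        # Naive assumption: the same reward is credited to every unit's local bandit.
        for i, action in enumerate(player_action):
            stat = self.local[i].setdefault(action, [0, 0.0])
            stat[0] += 1
            stat[1] += reward
        stat = self.combos.setdefault(player_action, [0, 0.0])
        stat[0] += 1
        stat[1] += reward

# Hypothetical reward that really is additive over units: one point per attacking unit.
sampler = NaiveSampler([["move", "attack", "wait"]] * 3)
for _ in range(1000):
    pa = sampler.sample()
    sampler.update(pa, reward=sum(a == "attack" for a in pa))
best = max(sampler.combos, key=lambda pa: sampler.combos[pa][1] / sampler.combos[pa][0])
print(best)
```

Each local bandit still ranges over every legal unit-action, so the cost of exploration keeps growing with the size of the decision space.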
Monte Carlo Tree Search
→ Our proposition:
→ A multi-stage parametric action pre-selection scheme to control the decision space
and its granularity
→ Combine abstraction with CMAB (NaïveMCTS) using small-scale parametric scripts
(heuristics)
→ Define a strategy as a collection of heuristics and parameters
Parametric Action Pre-Selection
→ Expert-authored scripts usually encode a deterministic strategy using a limited portion of
the decision space
→ How to generate novel strategies that can better exploit the available actions?
→ How to preserve low-level tactical performance?
→ A strategy is a combination of heuristics
→ Example: the Worker Rush strategy combines the Harvest, Train, and Direct offense heuristics
→ Heuristic: A parametric single-goal procedure for controlling a sub-group of units
→ Single unit: $h \in H : S \times U \times A^l \times R_h \to A^k$, with $k \le l$
→ $S$: States, $U$: Units, $A$: Unit-Actions, $R_h$: Parameters
→ Group of units: applied to each member
→ In expert-authored scripts, $k = 1$ and $R_h = 1$
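A heuristic with this signature can be sketched as a function that takes a state, a unit, the unit's l candidate actions and a parameter set, and returns k ≤ l of them. The example below is hypothetical: the `UnitAction` type, the Manhattan-distance rule, and the meaning given to `maxTargets` are assumptions for illustration, and the state argument is left unused.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UnitAction:
    kind: str      # e.g. "move" or "attack"
    target: tuple  # (x, y) cell the action is aimed at

def attack_heuristic(state, unit_pos, actions, params):
    """h(s, u, A^l, R_h) -> A^k: keep the attacks on the nearest targets."""
    def distance(action):
        return abs(action.target[0] - unit_pos[0]) + abs(action.target[1] - unit_pos[1])

    attacks = sorted((a for a in actions if a.kind == "attack"), key=distance)
    kept = attacks[: params["maxTargets"]]
    # Never leave the unit without options: fall back to the full input set.
    return kept if kept else list(actions)

actions = [
    UnitAction("move", (0, 1)),
    UnitAction("attack", (5, 5)),
    UnitAction("attack", (1, 1)),
    UnitAction("attack", (3, 0)),
]
selected = attack_heuristic(state=None, unit_pos=(0, 0), actions=actions,
                            params={"maxTargets": 2})
print(selected)
```

With `maxTargets` set to 1 the function behaves like a script that commits to a single action; larger values leave the final choice to a search procedure.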
Parametric Action Pre-Selection
→ Action Pre-Selection: Downsizing the decision space by selecting a subset of actions satisfying a certain
criterion (strategy), prior to planning
→ When 𝑘 > 1 the final decision will be made by a search approach (ex. MCTS)
→ A unit partitioning 𝑑 ∈ D determines unit groups (manually or automatically)
→ Each unit group is associated with a heuristic. Heuristics’ output defines the search space
→ Pipeline: Original Actions, then Action Pre-Selection (Partitioning, Heuristics, Parameters), then Pre-Selected Actions, then Planning (MCTS)
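As a rough illustration of a single pre-selection pass, the sketch below partitions units into groups, applies each group's heuristic to every member, and compares the size of the joint decision space before and after. The unit names, action names, and the `keep_listed` heuristic are hypothetical.

```python
from math import prod

def keep_listed(state, unit, actions, params):
    """Hypothetical heuristic: keep only the actions named in params['keep']."""
    return [a for a in actions if a in params["keep"]]

def pre_select(state, unit_actions, partition, heuristics, params):
    """Apply each group's heuristic to every unit in that group."""
    selected = {}
    for group, members in partition.items():
        for unit in members:
            selected[unit] = heuristics[group](state, unit, unit_actions[unit], params[group])
    return selected

def space_size(unit_actions):
    """Number of joint player-actions: one unit-action per unit."""
    return prod(len(actions) for actions in unit_actions.values())

ALL = ["move_n", "move_s", "move_e", "move_w", "harvest", "attack"]
original = {u: list(ALL) for u in ("w1", "w2", "s1", "s2")}
partition = {"harvesters": ["w1", "w2"], "offense": ["s1", "s2"]}
heuristics = {"harvesters": keep_listed, "offense": keep_listed}
params = {
    "harvesters": {"keep": {"harvest", "move_n"}},
    "offense": {"keep": {"attack", "move_e", "move_w"}},
}
reduced = pre_select(None, original, partition, heuristics, params)
print(space_size(original), space_size(reduced))
```

The search then operates only on `reduced`, so a fixed computation budget covers a much larger fraction of the remaining player-actions.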
Parametric Action Pre-Selection
→ The general algorithm:
→ Pre-selected actions are refined over successive phases
→ Parametric Action Pre-Selection: $T(s, U, A_0, x_1, \dots, x_n)$ with $x_i(A_{i-1}, d_i, H_i, \theta_i)$
→ A strategy can be expressed as: $\sigma = (d_1, \dots, d_n, H_1, \dots, H_n, \theta_1, \dots, \theta_n)$
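[Figure: the pre-selection pipeline $T$. The game state $s$, units $U$, and initial actions $A_0$ enter phase $x_1$, where partitioning $d_1$ forms groups $g_1, \dots, g_{m_1}$ and heuristics $H_1 = (h_1, \dots, h_{m_1})$ are applied with parameters $\theta_1$, producing $A_1$. Phases $x_2, \dots, x_n$ repeat this on $A_1, \dots, A_{n-1}$; the final set $A_n$ is passed to Search and then Execution.]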
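The multi-phase scheme can be read as a fold over phases, where each phase takes the previous action sets and returns a refined subset. The sketch below assumes each phase is a triple (partitioning function, heuristics by group, parameters by group) and that every partitioning covers all units; the two example phases and every name in them are hypothetical.

```python
def run_phases(state, units, initial_actions, phases):
    """T(s, U, A_0, x_1..x_n): each phase maps A_{i-1} to A_i."""
    actions = dict(initial_actions)
    for partition_fn, heuristics, thetas in phases:
        groups = partition_fn(state, units, actions)  # d_i: group name -> units
        refined = {}
        for name, members in groups.items():
            for unit in members:
                refined[unit] = heuristics[name](state, unit, actions[unit], thetas[name])
        actions = refined
    return actions

def by_role(state, units, actions):
    """Hypothetical partitioning: workers start with 'w', everything else attacks."""
    return {
        "harvesters": [u for u in units if u.startswith("w")],
        "offense": [u for u in units if not u.startswith("w")],
    }

def single_group(state, units, actions):
    return {"all": list(units)}

def keep_prefixes(state, unit, actions, theta):
    return [a for a in actions if a.startswith(theta["prefixes"])]

def first_k(state, unit, actions, theta):
    return actions[: theta["k"]]

units = ["w1", "s1"]
a0 = {
    "w1": ["harvest", "return", "move_n", "attack_a"],
    "s1": ["attack_a", "attack_b", "move_n", "move_s"],
}
phases = [
    (by_role,
     {"harvesters": keep_prefixes, "offense": keep_prefixes},
     {"harvesters": {"prefixes": ("harvest", "return")}, "offense": {"prefixes": ("attack",)}}),
    (single_group, {"all": first_k}, {"all": {"k": 1}}),
]
final = run_phases(None, units, a0, phases)
print(final)
```

Because every phase only filters its input, the final action set of each unit is always a subset of its initial one.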
Parametric Action Pre-Selection
→ Proposed implementation: ParaMCTS
→ A 2-phase action pre-selection process using NaïveMCTS for search
→ Inspired by the macro- and micro-management task decomposition
→ 47 parameters govern the behaviour of ParaMCTS, tuned manually
→ NaïveMCTS enhancement: Inactive player-action pruning (previous study)
Phase-1 ($x_1$):
Groups       Heuristics   Parameters
Harvesters   <Harvest>    maxU, buildMode, pf, …
Offense      <Attack>     maxU, offMode, maxTargets, pf, …
Defense      <Defend>     maxU, defMode, defPerimeter, pf, …
Structures   <Train>      maxU, trainMode, …

Phase-2 ($x_2$):
Groups       Heuristics             Parameters
Front-Line   <Front-Line Tactics>   maxU, waitDuration, …
Back         <Back Tactics>         waitDuration, …

→ The actions remaining after Phase-2 are searched by NaïveMCTS
Experiments & Results
→ How can MCTS benefit from the downsized decision space?
→ Should we increase the playout duration, the maximum search depth, or both? By how much?
→ How does the performance of ParaMCTS compare to state-of-the-art agents?
→ Experiments setting:
→ Computation budget: 100𝑚𝑠 per game cycle, Maps: basesWorkers 8 × 8, 16 × 16, 32 × 32
→ Tested maximum search depths: {10, 15, 20, 30, 50}. Tested playout durations: {100, 150, 200, 300, 500}
Testbed: µRTS (or microRTS)
→ A lightweight, AI research-focused RTS simulator
→ Open source, written in Java by Santiago Ontañón
→ Includes a forward model and many baseline agents
→ Subject of a yearly AI competition as part of IEEE CoG
Experiments & Results
→ Experiment 1: Two 120-iteration round-robin tournaments
1) Between ParaMCTS variants with a fixed playout duration (100 cycles) and different max search depths
2) Between ParaMCTS variants with a fixed max search depth (10) and different playout durations
→ Total matches: 4800 in each map. Score = Wins + (Draws / 2), normalized.
→ Results:
Experiments & Results
→ Experiment 2: Maximum search depth and playout duration combinations
→ 100 matches between each ParaMCTS(search depth, playout duration) variant and MixedBot
→ Sides switched after 50 matches. ParaMCTS implements a similar strategy to MixedBot
→ Total matches: 2500 in each map
→ Results:
Experiments & Results
→ Experiment 3: Vs. state-of-the-art.
→ 100-iteration round-robin tournament
→ Participants:
→ ParaMCTS
→ MixedBot, Izanagi, Droplet (top-ranking agents from 2019's µRTS competition)
→ NaïveMCTS* and NaïveMCTS (slide callouts: "Same hyperparameters as ParaMCTS" and "Using best hyperparameters")
→ Total matches: 3000 in each map
→ 11.9 to 19.1 overall margin
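The scoring rule and the match totals quoted for the three experiments can be checked with a few lines of arithmetic. Two assumptions are made here: "normalized" is taken to mean division by the number of matches played, and each round-robin iteration is taken to play every ordered pair of agents so that both starting sides are covered. The function names are hypothetical.

```python
def normalized_score(wins, draws, matches):
    """Score = Wins + Draws / 2, divided by the number of matches (assumed normalization)."""
    return (wins + draws / 2) / matches

def round_robin_matches(n_agents, iterations):
    """Every ordered pair of distinct agents plays once per iteration (assumption)."""
    return n_agents * (n_agents - 1) * iterations

experiment_1 = 2 * round_robin_matches(5, 120)  # two tournaments, five variants each
experiment_2 = 5 * 5 * 100                      # 25 (depth, duration) pairs vs. MixedBot
experiment_3 = round_robin_matches(6, 100)      # six participants
print(experiment_1, experiment_2, experiment_3)
```

Under these assumptions the totals agree with the per-map figures on the slides: 4800, 2500, and 3000 matches.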
Conclusion & Future Work
→ Parametric action pre-selection describes a general action/state abstraction framework,
applicable to any game with similar characteristics to RTS games
→ Using heuristics instead of scripts grants greater flexibility
→ A proposed implementation, ParaMCTS, significantly outperformed state-of-the-art
agents, using manually tuned parameters
→ Recovered computation budget is better used for deeper search
Future Work
→ ParaMCTS parameter optimization for different objectives (maps, opponents, …)
→ Dynamic parameter adaptation through RL
→ Heuristic/partitioning discovery
→ Difficulty adjustment given adequate heuristics and parameters
Thank You
abdessamed.ouessai@univ-mascara.dz
salem@univ-mascara.dz
amorag@ugr.es

CoSECiVi 2020 - Parametric Action Pre-Selection for MCTS in Real-Time Strategy Games

  • 1. Parametric Action Pre-Selection for MCTS in Real-Time Strategy Games Abdessamed Ouessai, Mohammed Salem, and Antonio M. Mora University of Mascara, Algeria University of Granada, Spain VI CoSECiVi-2020
  • 2. → Introduction → RTS Games & AI → Monte Carlo Tree Search → Parametric Action Pre-Selection → Experiments & Results → Conclusion & Future Work Overview
  • 3. → Introduction → RTS Games & AI → Monte Carlo Tree Search → Parametric Action Pre-Selection → Experiments & Results → Conclusion & Future Work Overview
  • 4. Introduction → First game AI research domain: Classic board games → Evolution of board games is constrained by physics → Video games represent an unconstrained medium → Real-Time Strategy sub-genre concretized abstract board games (Warfare) → RTS Games are an evolution of abstract board games → ++ Concrete | ++ Challenging for humans | ++ Complex for AI 1
  • 5. → Introduction → RTS Games & AI → Monte Carlo Tree Search → Parametric Action Pre-Selection → Experiments & Results → Conclusion & Future Work Overview
  • 6. RTS Games & AI → Multiplayer, zero-sum, non-deterministic game with imperfect information. → Top-down perspective. Recognizable mouse and keyboard-based UI. General Strategy Gather Build & Train Confront Destruction of Opponent’s Forces Units Structures Resources Victory Condition 2
  • 7. RTS Games & AI → What does an RTS game-playing AI have to deal with? 3 Short decision cycles (~50/s) Simultaneous moves for different units Durative actions (> one decision cycle) Non-determinismPartial observability (opponent & environment) Exponential growth of the decision/state spaces Chess Go StarCraft Branching Factor 36 180 1050 State Space 1047 10171 101685 Real-Time Aspect Uncertainty Complexity Large topographic environments Approximate Estimates
  • 8. RTS Games & AI → Notable developments: → Scripts: Portfolio Greedy Search (Churchill et al., 2013), Puppet Search (Barriga et al., 2015) → Learning: Bayesian Models (Synnaeve et al., 2011), AlphaStar (Vinyals et al., 2019) → Planning: NaïveMCTS (Ontañón, 2013), AHTN (Ontañón and Buro, 2015), CCG (Kantharaju et al., 2018) → Evaluation: CNN (Stanescu et al., 2016), (Barriga et al., 2019) → Competitions: IEEE CoG (StarCraft & µRTS), AAAI AIIDE (StarCraft), SSCAIT → RTS AI testbeds: ORTS, Wargus, BWAPI (SC), SparCraft, SC2LE, ELF, DeepRTS, µRTS
  • 10. Monte Carlo Tree Search → An iterative, anytime, sampling-based search framework → Main components: Tree Policy, Default Policy → Popular variant: UCT (UCB1 as Tree Policy) → Popular application: Go (AlphaGo) → Downside: scalability issues → Phases of one iteration: (1) Selection, (2) Expansion, (3) Simulation, (4) Backpropagation; the Tree Policy guides selection and expansion, the Default Policy guides simulation, and the resulting reward is backpropagated
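The slide names UCB1 as the tree policy of UCT. As a minimal sketch of that selection rule (the function name, the `(visits, total_reward)` data layout, and the default exploration constant are assumptions made for illustration, not taken from the slides):

```python
import math

def ucb1_select(children, c=math.sqrt(2)):
    """Return the index of the child with the highest UCB1 score.

    children: list of (visits, total_reward) pairs for one tree node.
    c: exploration constant (sqrt(2) is a common default, assumed here).
    """
    # UCB1 tries every arm once before comparing scores.
    for i, (visits, _) in enumerate(children):
        if visits == 0:
            return i

    parent_visits = sum(visits for visits, _ in children)

    def score(child):
        visits, total_reward = child
        mean = total_reward / visits                       # exploitation term
        bonus = c * math.sqrt(math.log(parent_visits) / visits)  # exploration term
        return mean + bonus

    return max(range(len(children)), key=lambda i: score(children[i]))
```

A rarely visited child receives a large exploration bonus, so it can be selected even when its average reward is lower than that of a heavily visited sibling.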
  • 11. Monte Carlo Tree Search → Proposed solutions to enhance MCTS scalability: CMAB and abstraction → CMAB: the selection phase is framed as a Combinatorial Multi-Armed Bandit problem → NaïveMCTS is based on a CMAB formulation and a naïve assumption: for units $u_1, \dots, u_n$, a player-action $\alpha_t = (a_1, \dots, a_n)$ with unit-action values $v_1, \dots, v_n$ satisfies $V(\alpha_t) = \sum_{i=1}^{n} v_i$ → Successfully adapts MCTS to combinatorial decision spaces (e.g. RTS games) → Downside: the algorithm is still affected by the dimensionality of the decision space → Abstraction: search the decision space induced by expert-authored scripts instead of the original decision space → Downsides: (1) sacrifices tactical performance; (2) performance depends on the scripts
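The naïve assumption lets a CMAB be sampled through one small bandit per unit plus one bandit over joint player-actions. The sketch below illustrates that idea with epsilon-greedy bandits; the class name, the two epsilon parameters, and the toy reward used in the usage note are assumptions for illustration and are not the NaïveMCTS implementation.

```python
import random

class NaiveSampler:
    """Naive sampling over joint actions: one local bandit per unit."""

    def __init__(self, unit_actions, eps_explore=0.3, eps_local=0.2, seed=0):
        self.unit_actions = unit_actions      # one list of legal actions per unit
        self.eps_explore = eps_explore        # chance of composing a new joint action
        self.eps_local = eps_local            # chance of a random unit-action
        self.rng = random.Random(seed)
        # Per-unit statistics: action -> [count, reward sum]
        self.local = [{a: [0, 0.0] for a in acts} for acts in unit_actions]
        self.joint = {}                       # joint action -> [count, reward sum]

    @staticmethod
    def _mean(stats):
        count, total = stats
        return total / count if count else 0.0

    def _pick_unit_action(self, i):
        actions = self.unit_actions[i]
        if self.rng.random() < self.eps_local:
            return self.rng.choice(actions)
        return max(actions, key=lambda a: self._mean(self.local[i][a]))

    def sample(self):
        # Explore: build a joint action from the local bandits.
        if not self.joint or self.rng.random() < self.eps_explore:
            return tuple(self._pick_unit_action(i)
                         for i in range(len(self.unit_actions)))
        # Exploit: replay the best joint action seen so far.
        return self.best()

    def update(self, alpha, reward):
        stats = self.joint.setdefault(alpha, [0, 0.0])
        stats[0] += 1
        stats[1] += reward
        # Naive assumption: credit the joint reward to each unit-action.
        for i, action in enumerate(alpha):
            self.local[i][action][0] += 1
            self.local[i][action][1] += reward

    def best(self):
        return max(self.joint, key=lambda alpha: self._mean(self.joint[alpha]))
```

In use, a caller would alternate `sample()` and `update(alpha, reward)`, where the reward comes from a playout. The number of joint actions still grows with the number of units and unit-actions, which is the dimensionality issue the slide points out.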
  • 12. Monte Carlo Tree Search → Our proposition: a multi-stage parametric action pre-selection scheme to control the decision space and its granularity → Combine abstraction with CMAB (NaïveMCTS) using small-scale parametric scripts (heuristics) → Define a strategy as a collection of heuristics and parameters
  • 14. Parametric Action Pre-Selection → Expert-authored scripts usually encode a deterministic strategy using a limited portion of the decision space → How to generate novel strategies that can better exploit the available actions? → How to preserve low-level tactical performance? → A strategy is a combination of heuristics; e.g. the Worker Rush strategy combines a direct offense heuristic, a harvest heuristic, and a train heuristic → Heuristic: a parametric single-goal procedure for controlling a sub-group of units → Single unit: $h \in H : S \times U \times A^{l} \times R_h \to A^{k}$, $k \le l$ → $S$: states, $U$: units, $A$: unit-actions, $R_h$: parameters → Group of units: applied to each member → In expert-authored scripts, $k = 1$ and $R_h = 1$
  • 15. Parametric Action Pre-Selection → Action pre-selection: downsizing the decision space by selecting a subset of actions satisfying a certain criterion (strategy), prior to planning → When $k > 1$, the final decision is made by a search approach (e.g. MCTS) → A unit partitioning $d \in D$ determines the unit groups (manually or automatically) → Each unit group is associated with a heuristic; the heuristics’ output defines the search space → Pipeline: Original Actions → Action Pre-Selection (partitioning, heuristics, parameters) → Pre-Selected Actions → Planning (MCTS)
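  • 16. Parametric Action Pre-Selection → The general algorithm: pre-selected actions are refined over successive phases → Parametric Action Pre-Selection: $T(s, U, A_0, x_1, \dots, x_n)$ with $x_i(A_{i-1}, d_i, H_i, \theta_i)$ → In phase $x_i$, the partitioning $d_i$ splits the units into groups $g_1, \dots, g_{m_i}$, the heuristics $H_i = (h_1, \dots, h_{m_i})$ with parameters $\theta_i$ are applied to those groups, and the actions $A_{i-1}$ are reduced to $A_i$; given the game state $s$ and units $U$, the final $A_n$ is passed to search, then execution → A strategy can be expressed as: $\sigma = (d_1, \dots, d_n, H_1, \dots, H_n, \theta_1, \dots, \theta_n)$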
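The phase structure of $T$ can be sketched as a loop in which each phase partitions the units, applies one heuristic per group, and keeps a subset of each unit's current actions. Everything named below (`pre_select`, `by_role`, `harvest`, `attack`, the `max_actions` parameter, and the string-based units and actions) is hypothetical and only illustrates the data flow; it is not the ParaMCTS code.

```python
def pre_select(state, units, actions, phases):
    """Apply phases x_1..x_n in order; each reduces A_{i-1} to a subset A_i.

    actions: dict mapping each unit to its list of candidate unit-actions (A_0).
    phases: list of (partition, heuristics, params) triples, i.e. (d_i, H_i, theta_i).
    """
    current = {u: list(actions[u]) for u in units}
    for partition, heuristics, params in phases:
        for group, members in partition(state, units).items():
            heuristic = heuristics[group]
            theta = params.get(group, {})
            for u in members:
                kept = heuristic(state, u, current[u], theta)
                # Enforce k <= l: a heuristic may only keep actions it was offered.
                current[u] = [a for a in kept if a in current[u]]
    return current

# Hypothetical partitioning and heuristics for a single phase.
def by_role(state, units):
    return {
        "harvesters": [u for u in units if u.startswith("worker")],
        "offense": [u for u in units if not u.startswith("worker")],
    }

def harvest(state, unit, acts, theta):
    keep = [a for a in acts if a in ("harvest", "return", "wait")]
    return keep[: theta.get("max_actions", len(keep))]

def attack(state, unit, acts, theta):
    return [a for a in acts if a.startswith(("attack", "move"))]

original = {
    "worker1": ["harvest", "return", "move_n", "attack_e", "wait"],
    "light1": ["move_n", "move_s", "attack_e", "wait"],
}
phases = [(by_role,
           {"harvesters": harvest, "offense": attack},
           {"harvesters": {"max_actions": 2}})]
selected = pre_select({}, list(original), original, phases)
# selected == {"worker1": ["harvest", "return"],
#              "light1": ["move_n", "move_s", "attack_e"]}
```

A search algorithm such as MCTS would then plan only over `selected`, so changing the partitioning, the heuristics, or their parameters changes the size and granularity of the search space.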
  • 17. Parametric Action Pre-Selection → Proposed implementation: ParaMCTS → A 2-phase action pre-selection process using NaïveMCTS for search → Inspired by the macro- and micro-management task decomposition → 47 parameters govern the behaviour of ParaMCTS, tuned manually → NaïveMCTS enhancement: inactive player-action pruning (previous study) → Phase-1 ($x_1$), listed as group, heuristic, (parameters): Harvesters, <Harvest>, (maxU, buildMode, pf, …); Offense, <Attack>, (maxU, offMode, maxTargets, pf, …); Defense, <Defend>, (maxU, defMode, defPerimeter, pf, …); Structures, <Train>, (maxU, trainMode, …) → Phase-2 ($x_2$), listed as group, heuristic, (parameters): Front-Line, <Front-Line Tactics>, (maxU, waitDuration, …); Back, <Back Tactics>, (waitDuration, …) → Search: NaïveMCTS
  • 19. Experiments & Results → How can MCTS benefit from the downsized decision space? → Should we increase the playout duration, the maximum search depth, or both? By how much? → How does the performance of ParaMCTS compare to state-of-the-art agents? → Testbed: µRTS (or microRTS) → A lightweight, AI research-focused RTS simulator → Open source, written in Java by Santiago Ontañón → Includes a forward model and many baseline agents → Subject of a yearly AI competition as part of IEEE CoG → Experimental settings: → Computation budget: 100 ms per game cycle. Maps: basesWorkers 8 × 8, 16 × 16, 32 × 32 → Tested maximum search depths: {10, 15, 20, 30, 50}. Tested playout durations: {100, 150, 200, 300, 500}
  • 20. Experiments & Results → Experiment 1: two 120-iteration round-robin tournaments: (1) between ParaMCTS variants with a fixed playout duration (100 cycles) and different maximum search depths; (2) between ParaMCTS variants with a fixed maximum search depth (10) and different playout durations → Total matches: 4800 in each map. Score = Wins + Draws / 2, normalized → Results: [figure]
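The score rule above can be written out directly. The slide does not say how the score is normalized; the sketch below assumes division by the number of matches each agent played, and the function and argument names are illustrative.

```python
def normalized_scores(results, matches_per_agent):
    """Compute (wins + draws / 2) / matches for each agent.

    results: dict mapping an agent name to a (wins, draws) pair.
    matches_per_agent: matches played by each agent (assumed equal for all).
    """
    return {
        agent: (wins + draws / 2) / matches_per_agent
        for agent, (wins, draws) in results.items()
    }
```

Under this assumption, an agent with 60 wins and 20 draws out of 100 matches scores (60 + 10) / 100 = 0.7.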
  • 21. Experiments & Results → Experiment 2: maximum search depth and playout duration combinations → 100 matches between each ParaMCTS(search depth, playout duration) variant and MixedBot → Sides switched after 50 matches. ParaMCTS implements a strategy similar to MixedBot’s → Total matches: 2500 in each map → Results: [figure]
  • 22. Experiments & Results → Experiment 3: vs. state-of-the-art → 100-iteration round-robin tournament → Participants: ParaMCTS; MixedBot, Izanagi, Droplet (top-ranking agents from 2019’s µRTS competition); NaïveMCTS* (same hyperparameters as ParaMCTS); NaïveMCTS (using best hyperparameters) → Total matches: 3000 in each map → 11.9 to 19.1 overall margin
  • 24. Conclusion & Future Work → Parametric action pre-selection describes a general action/state abstraction framework, applicable to any game with characteristics similar to RTS games → Using heuristics instead of scripts grants greater flexibility → A proposed implementation, ParaMCTS, significantly outperformed state-of-the-art agents, using manually tuned parameters → Recovered computation budget is better used for deeper search → Future work: → ParaMCTS parameter optimization for different objectives (maps, opponents, …) → Dynamic parameter adaptation through RL → Heuristic/partitioning discovery → Difficulty adjustment given adequate heuristics and parameters