The document describes several case-based planning platforms for real-time strategy (RTS) games:
2. The CAT system uses case-based reasoning to learn to win the RTS game Wargus, defeating both opponents it was trained against and novel opponents.
2. The Darmok system retrieves relevant past cases based on the current game state's shallow and deep features to determine a plan to satisfy the goal of winning.
3. These systems represent cases as goals, states, plans and performances to reuse past experiences to determine the best tactics in new game situations.
4. Case Representation
   Example of a case:
   Goal: ResourceGoal(Gold, MaxInfluence)
   State:
     - Number of gold mines
     - Distance between gold mines and nearest gold mine storer
     - Number of free or useless peons
   Plan:
     - Train(3, peon)
     - Assign(3, peons, GoldMiner)
     - Build(Farm)
     - Build(GoldMineCollector, 200)
   Performance: Number of gold collected in 1 minute
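As a minimal sketch, the four slots of the example case (goal, state, plan, performance) could be held in a small Python record. The class name `Case`, the field types, the feature keys and the feature values are all assumptions made for illustration; none of them come from the systems described here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Case:
    """Hypothetical container for one case: goal, state, plan, performance."""
    goal: str
    state: Dict[str, float]
    plan: List[str]
    performance: Optional[float] = None  # filled in once the plan has been executed


# The state values below are made up; only the slot names follow the slide.
example = Case(
    goal="ResourceGoal(Gold, MaxInfluence)",
    state={"gold_mines": 2, "mine_to_storer_distance": 7, "free_peons": 1},
    plan=[
        "Train(3, peon)",
        "Assign(3, peons, GoldMiner)",
        "Build(Farm)",
        "Build(GoldMineCollector, 200)",
    ],
)
```

Leaving `performance` empty until after execution mirrors the idea that performance is measured (here, gold collected in one minute) rather than known in advance.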
5. CBP – CAT Abstract The Case-Based Tactician (CAT) system, created by Aha, Molineaux, and Ponsen (2005), uses case-based reasoning to learn to win the real-time strategy game Wargus. Previous work has shown CAT’s ability to defeat a randomly selected opponent from a set against which it has trained. We now focus on the task of defeating a selected opponent while training on others. We describe CAT’s algorithm and report its cross-validation performance against a set of Wargus opponents.
10. Game State. Winning (i.e., by destroying all the enemy units and buildings) requires managing three key resources: buildings, the workforce, and an army. The decision space is the set of possible actions that can be executed at a particular moment.
11. CBP - CAT We estimate this (action space) as O(2^W (A*P) + 2^T (D+S) + B(R+C)), where:
   W is the current number of workers.
   A is the number of assignments workers can perform (e.g., create a building, gather gold).
   P is the average number of workplaces.
   T is the number of troops (fighters plus workers).
   D is the average number of directions that a unit can move.
   S is the choice of troop's stance (i.e., stand, patrol, attack).
   B is the number of buildings.
   R is the average choice of research objectives at a building.
   C is the average choice of units to create at a building.
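To get a feel for how quickly this estimate grows, here is a small sketch that evaluates it. It takes the first two terms as powers of two (2^W and 2^T), which is an assumed reading of the flattened formula, and the function name `decision_space` and all input values are hypothetical.

```python
def decision_space(W, A, P, T, D, S, B, R, C):
    """Evaluate 2^W * (A*P) + 2^T * (D+S) + B * (R+C).

    Reading the leading factors as exponents is an assumption of this sketch.
    """
    return 2 ** W * (A * P) + 2 ** T * (D + S) + B * (R + C)


# Hypothetical early-game numbers: 4 workers, 2 fighters (so T = 6),
# 5 assignments, 3 workplaces, 8 directions, 3 stances, 2 buildings.
early = decision_space(W=4, A=5, P=3, T=6, D=8, S=3, B=2, R=1, C=1)

# Adding a single worker raises both W and T by one.
one_more_worker = decision_space(W=5, A=5, P=3, T=7, D=8, S=3, B=2, R=1, C=1)
```

Under these made-up inputs, one extra worker roughly doubles the estimate, which is the point of the exponential terms.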
14. CBP - CAT The idea is to break the game into periods according to the buildings currently available. A building state is the period from the construction of one such building until the next is built. The building state defines the set of actions available to the player at any one time. In contrast, CAT performs no adaptation during reuse, but does perform case acquisition. Also, CAT focuses on winning a game rather than on performing a subtask.
15. CBP - CAT CAT retrieves cases when a new state in the lattice is entered. The similarity between a stored case C and the current game state S is defined as:
   Sim(C, S) = (C_Performance / dist(C_Description, S)) - dist(C_Description, S)
where dist() is the (unweighted, unnormalized) Euclidean distance between two cases for the eight features. However, to gain experience with all tactics in a state, case retrieval is not performed until each available tactic at that state is selected e times, where e is CAT’s exploration parameter. During exploration, CAT randomly retrieves one of the least frequently used tactics for reuse. Exploration also takes place whenever the highest Performance among the k-nearest neighbors is below 0.5.
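The retrieval rule above can be sketched as follows. This is an illustrative reading, not CAT's actual code: the names `StoredCase` and `select_tactic`, the `eps` guard against a zero distance, ranking neighbours by Sim, and returning the tactic of the best-performing neighbour are all assumptions.

```python
import math
import random
from dataclasses import dataclass
from typing import Sequence


@dataclass
class StoredCase:
    description: Sequence[float]  # feature values (eight in CAT)
    tactic: str
    performance: float


def dist(a, b):
    """Unweighted, unnormalized Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similarity(case, state, eps=1e-9):
    # eps avoids division by zero on an exact match; it is an assumption
    # of this sketch and is not specified in the slides.
    d = dist(case.description, state)
    return case.performance / (d + eps) - d


def select_tactic(cases, state, tactics, use_counts, e=3, k=3, rng=random):
    least_used = min(use_counts.get(t, 0) for t in tactics)
    rarely_used = [t for t in tactics if use_counts.get(t, 0) == least_used]
    # Exploration: no retrieval until every tactic has been selected e times.
    if least_used < e:
        return rng.choice(rarely_used)
    # Retrieval: keep the k most similar stored cases.
    nearest = sorted(cases, key=lambda c: similarity(c, state), reverse=True)[:k]
    best = max(nearest, key=lambda c: c.performance, default=None)
    # Exploration also applies when the best neighbour performs below 0.5.
    if best is None or best.performance < 0.5:
        return rng.choice(rarely_used)
    return best.tactic
```

Because Sim divides Performance by the distance and then subtracts the distance, a distant case is penalized twice, so close, well-performing cases dominate the ranking.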
16. CBP - CAT Then after applying the case we evaluate by:
17. CBP - CAT Evaluation yields the Performance of a case’s Tactic, which is measured at both a local and global level. That is, CAT records the WARGUS game score for both the player and opponent at the start of each BuildingState and at the game’s end, which occurs when one player eliminates all of the other’s units and buildings. To retain a new case C’, CAT looks for a stored case C with the same <Description, Tactic>. If one is found, C is updated; otherwise a new case is created.
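The update-or-create step of retention might look like the sketch below. The slides do not say how an existing case is updated, so the running average used here is purely an assumption, as are the function name `retain` and the dictionary layout.

```python
def retain(case_base, description, tactic, performance):
    """Update the case with the same <Description, Tactic>, or add a new one.

    case_base maps (description, tactic) -> [mean performance, use count].
    Averaging performance over uses is an assumption of this sketch.
    """
    key = (tuple(description), tactic)
    if key in case_base:
        mean, n = case_base[key]
        case_base[key] = [(mean * n + performance) / (n + 1), n + 1]
    else:
        case_base[key] = [performance, 1]
    return case_base[key]


case_base = {}
first = retain(case_base, [1, 2], "rush", 0.5)   # no match yet: new case
second = retain(case_base, [1, 2], "rush", 1.0)  # same key: existing case updated
```

Keying on the pair rather than on the description alone lets several tactics coexist for the same game state, each with its own performance record.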
18. CBP - Darmok Darmok starts execution with the initial goal of “WinWargus”. The system's Retriever tries to return a plan that satisfies this goal by going through the following four steps:
19. CBP - Darmok 1- Define the current situation.
   (a) Game state -> best-first greedy hill-climbing algorithm -> shallow features. Example of extracted shallow features: lumber (number of trees in the map), food (amount of food), gold (amount of gold of the player), peasants (number of peasants) and units (number of units the player has).
   (b) Game state -> shallow feature values -> situation. In this example, according to the values of these features, we predict that the current situation is BEGINNING.
20. CBP - Darmok 2- Return a set of cases related to the current situation (situation -> case base -> set of cases). This returns all cases which have Situation = BEGINNING and Goal = WINWARGUS.
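Steps 1 and 2 can be sketched together as a situation check followed by a filter over the case base. Everything named here is hypothetical: `PlanCase`, `assess_situation` and `candidate_cases` are illustrative names, the threshold rule and the `OTHER` label are placeholders, and the slides do not say how Darmok actually predicts the situation from the shallow feature values.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PlanCase:
    situation: str
    goal: str
    plan: List[str]


def assess_situation(shallow: Dict[str, int]) -> str:
    # Placeholder rule with made-up thresholds, standing in for whatever
    # predictor maps shallow feature values to a situation.
    if shallow["units"] < 5 and shallow["gold"] < 2000:
        return "BEGINNING"
    return "OTHER"


def candidate_cases(case_base: List[PlanCase], situation: str, goal: str) -> List[PlanCase]:
    """Step 2: keep the cases whose situation and goal both match."""
    return [c for c in case_base if c.situation == situation and c.goal == goal]


case_base = [
    PlanCase("BEGINNING", "WINWARGUS", ["Train(3, peon)"]),
    PlanCase("OTHER", "WINWARGUS", ["Build(Farm)"]),
    PlanCase("BEGINNING", "ResourceGoal", ["Assign(3, peons, GoldMiner)"]),
]
shallow = {"lumber": 120, "food": 4, "gold": 1000, "peasants": 1, "units": 1}
situation = assess_situation(shallow)
matches = candidate_cases(case_base, situation, "WINWARGUS")
```

Filtering on the cheap shallow features first keeps the candidate set small before any more expensive comparison is made.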
24. CBP - Darmok Shallow Features vs. Deep Features
   Shallow features: features used to define the current game situation.
   Deep features: features used to discriminate between cases.
25. References
   - Defeating Novel Opponents in a Real-Time Strategy Game (2005), David W. Aha
   - Learning to Win: Case-Based Plan Selection in a RTS Game (2005), David W. Aha
   - Case-Based Planning and Execution for RTS Games (2007), Santiago Ontañón
   - Situation Assessment for Plan Retrieval in RTS Games (2009), Santiago Ontañón