COMMAND AGENTS THAT MAKE HUMAN-LIKE DECISIONS FOR NEW TACTICAL SITUATIONS Masood Raza and Dr Venkat V S S Sastry
Human-like decision making using neural networks, RPD and Soar
  • Motivation: why human-like decisions, and what are tactical situations? The use of simulation in the military is increasing for training, systems analysis, systems acquisition, evaluation of doctrines, and command decision making. Because human-in-the-loop simulations are time and personnel intensive, the need to represent human behaviour in military simulations is becoming imperative. Realistic human behaviour representation can contribute to the development of more realistic, and thus more useful, military simulations. Human-like decision making based on correct situational awareness can add realism to human behaviour representation for military simulations. The military decision making process (MDMP) is based on multi-attribute utility analysis (MAUA), a classical decision-making approach that forms the basis of the decision-making model in most present-day military simulations. MDMP is a highly procedural and often cumbersome process. Under stress due to time pressure and in dynamic situations, military commanders use their experience to abbreviate this process and adopt a naturalistic decision-making model described by Klein and associates as recognition-primed decision making (RPD) [2]. The RPD model is suitable for experienced commanders [3]. The basic concept of RPD is the recognition of a situation in a changing context, recalling a suitable course of action for this situation, and then either implementing the course of action straight away or after mentally simulating it to find out whether it will work properly [4].
  • The simplest, and probably most common, case within the RPD model is Level 1 (Figure 1), where a decision-maker sizes up a situation, forms expectancies about what is going to happen next, determines the cues that are most relevant, recognizes the reasonable goals to pursue in the situation, recognizes a typical course of action (CoA) that is likely to succeed, and carries it out. Level 2 is a more difficult case, in which the decision-maker is not certain about the nature of the situation. Perhaps some anomaly arises that violates expectancies and forces the decision-maker to question whether the situation is different from what it seems, or perhaps uncertainty is present from the beginning. Here, decision-makers do deliberate about what is happening. Level 3 of the RPD model is the case in which the decision-maker arrives at an understanding of a situation, recognizes a typical CoA, and then evaluates it by mentally simulating what will happen when it is carried out. In this way, if he spots weaknesses in the plan, he can repair and improve it, or throw it away and evaluate the next plausible action. The model has been tested in a variety of applications including fireground command, battle planning, critical care nursing, corporate information management, and chess tournament play.
  • Soar is a symbolic cognitive architecture for general intelligence. It has been used for creating intelligent forces for large and small scale military simulations. Soar is a forward-chaining rule-based system. Both declarative and procedural knowledge are represented as production rules; each production rule is a condition-action pair. Long-term memory (LTM) is composed of production rules, while short-term memory (STM) contains only declarative knowledge. STM in Soar is also the working memory (WM) that holds all the dynamic data structures in the form of identifier, attribute, and value triplets called working memory elements (WMEs). The value in a WME may itself be an identifier connected to further attributes, thereby forming a connected graph. Unlike some rule-based systems, Soar is a parallel rule-matching and parallel rule-firing system. An impasse in Soar is the architecturally detected lack of available knowledge. Soar has a single architecturally supported goal stack; each sub-goal creates a sub-state to bring knowledge to bear on achieving the goal. Soar uses this hierarchical sub-goaling for task decomposition. This capability may also be used to contextualize the situation.
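The triplet structure described above can be illustrated with a minimal sketch. This is not Soar's actual implementation (Soar is written in C/C++ and accessed through SML); the class and identifiers below are purely illustrative of how (identifier, attribute, value) triplets form a connected graph when a value is itself an identifier.

```python
# Illustrative sketch of working memory as (identifier, attribute, value)
# triplets. A value that is itself an identifier links to further
# attributes, so the triplets form a connected graph.

class WorkingMemory:
    def __init__(self):
        self.wmes = []  # list of (identifier, attribute, value) triplets

    def add(self, identifier, attribute, value):
        self.wmes.append((identifier, attribute, value))

    def values(self, identifier, attribute):
        """All values stored under a given identifier/attribute pair."""
        return [v for (i, a, v) in self.wmes
                if i == identifier and a == attribute]

wm = WorkingMemory()
wm.add("S1", "tank", "T1")          # value "T1" is itself an identifier...
wm.add("T1", "position", (0, 0))    # ...linking to further attributes
wm.add("T1", "status", "operational")
```

Following the link from the state `S1` through the value `T1` to the tank's own attributes is what makes the working memory a graph rather than a flat list of facts.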
  • Soar provides a convenient framework for modelling most aspects of the RPD model. The elaboration phase of the Soar decision cycle is used for situation awareness, and the problem-space-based architecture, automatic sub-goaling, and the creation of sub-states due to impasses are used for mental simulation.
  • The environment is developed in the Java object-oriented programming language, the RPD model is implemented in the Soar cognitive architecture, and the agent and the environment are interfaced with the Soar Markup Language (SML). A trained artificial neural network is also integrated with the agent architecture to enhance the ability of the agent to handle new situations. The experiences of the command agent are stored in LTM in the form of production rules. The success values for the courses of action for specific situations are represented numerically. All atomic actions, such as move, turn, and fire, expected to be performed by the agent in a simulation are coded by the modeller. The selection of an action for a specific situation, in pursuit of single or multiple goals and based on the corresponding success values, is the task of the RPD-Soar agent and forms the behaviour of the agent. This behaviour emerges at simulation run time.
  • In rule-based systems the antecedents of a production rule have to match exactly for the production to fire; if the current situation deviates from the conditions in the rule, the appropriate rule does not fire. Thanks to rule matching through an efficient algorithm like RETE, and to advances in computer technology, it is possible in Soar to add a large number of production rules to handle generalization. The RETE algorithm efficiently solves the many-to-many matching problem encountered when rules are matched to facts (Forgy, 1982). Writing a large number of rules is possible, but it is not an efficient way of solving this problem. Alternative approaches such as similarity-based generalization, fuzzy logic, and artificial neural networks can solve it more efficiently. In this implementation, an artificial neural network is used for situation recognition, for two reasons: first, it has already been used for a similar task with promising results (Liang et al., 2001); second, it can automatically prioritize the situations according to their level of similarity. The situations are fed to the trained artificial neural network, which matches the new situation to one of the known situations and gives the agent a recognized situation. The recognized situation has the complete set, or a subset, of its four constituents: goals, courses of action, cues, and expectations. The agent selects the course of action for the situation and implements it with the help of lower-level actions selected through mental simulation if required. The neural network is trained for each agent based on the range of situations it is likely to face. Motivated by the work of Liang et al. (2001), the neural net is a multi-layered feed-forward network. It consists of an input layer of four nodes, three hidden layers of twelve nodes each, and an output layer with one node per known situation; the size of the output layer therefore varies with the scenario. For the experiments conducted in this research the configuration of the input and hidden layers is not changed, but these layers may also be reconfigured if required.
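The topology just described (4 inputs, three hidden layers of 12 nodes, one output node per known situation) can be sketched as follows. The weights here are random placeholders, so this shows only the shape of the forward pass, not the trained network; the sigmoid activation is an assumption, as the original does not state the activation function.

```python
import numpy as np

# Sketch of the network topology described above: 4 input nodes, three
# hidden layers of 12 nodes each, and one output node per known situation.
# Weights are random placeholders; the real network would be trained on
# the agent's known situations.

def build_network(n_known_situations, rng):
    sizes = [4, 12, 12, 12, n_known_situations]
    # one (weights, biases) pair per layer
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(layers, x):
    """Feed-forward pass; sigmoid activation is assumed for illustration."""
    a = np.asarray(x, dtype=float)
    for w, b in layers:
        a = 1.0 / (1.0 + np.exp(-(a @ w + b)))
    return a  # one recognition value per known situation

rng = np.random.default_rng(0)
net = build_network(12, rng)              # e.g. 12 known situations
out = forward(net, [0.2, 0.8, 0.0, 1.0])  # vector of 12 recognition values
```

Because only the output layer depends on the number of known situations, retargeting the agent to a new scenario means rebuilding and retraining just that final layer's dimensions, as the note above suggests.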
  • The vignette for this experiment is motivated by the work of Liang et al. (2001). The domain is a military ground-based operation. A military commander of a troop of three tanks selects a strategy to attack the enemy tank in the north. The terrain is simple and can have 0, 1, or 2 passable hills. The terrain with two hills is shown in Figure 2; the enemy is represented by a red square in the north at location (0, 1) and the own position by a blue circle in the south at the origin. The agent's own starting position and the enemy position remain the same throughout the experiment. The commander selects a strategy by deciding whether to divide the troop of tanks into an assault group (AG) and a fire support (FS) group or to use it as one group only. The commander also selects the intermediate and final locations of the group(s), which also dictate the route(s) to be adopted. The neural net is used for recognizing the situation rather than predicting the plan: it performs pattern recognition and not plan generation because, as Liang et al. also realized, generating the plan directly from the trained neural network did not prove successful. In this experiment the target for NN training in each case is the numeric value 1 for the output node corresponding to the recognized situation and 0 for the rest of the output nodes. It is assumed that such a clear difference between the two target values will produce better results than the mixed target values for the output nodes corresponding to different strategies in the work of Liang et al. (2001). Moreover, there is a potential advantage in this representation for an RPD model: for a given situation, the output node with the highest value is considered the recognized situation, and if the evaluation of the corresponding course of action through mental simulation is not promising, the output node with the second highest value may be considered. The basic situations and corresponding strategies from the work of Liang et al. (2001) are used, but some strategies are modified and more strategies are added to the training set for this experiment based on the author's knowledge of the subject. The reason for adding examples to the training set is to improve the performance of the net in recognizing new situations, which depends on the number of training examples, and to cover the problem space sufficiently. Each case, giving the locations of hills, is associated with a situation, and there is a corresponding strategy for each situation. The neural net only recognizes the situation; the agent then retrieves the corresponding strategy. The neural net part of the agent is required to recognize new situations produced in the environment. In order to test the ability of the trained neural net to recognize new situations, we developed a set of situations that exhaustively covers the problem space. In the battlefield, the enemy and own positions are fixed; the situational variables are the locations of the hills in the terrain. As the locations of the hills are changed, new situations are generated: these are the situations that are not included in the training examples and for which the agent has not been trained.
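The target encoding and fallback ranking described above can be sketched directly; the function names and the example recognition values below are illustrative, not taken from the actual implementation.

```python
# Sketch of the one-hot target encoding and the fallback ranking described
# above. Targets are 1 for the output node of the matching situation and
# 0 elsewhere; at run time the output nodes are ranked by recognition
# value so the agent can fall back to the next-best situation when a
# course of action is rejected in mental simulation.

def one_hot_target(situation_index, n_situations):
    """Training target: 1 for the matching situation's node, 0 elsewhere."""
    return [1.0 if i == situation_index else 0.0
            for i in range(n_situations)]

def ranked_situations(outputs):
    """Indices of output nodes, highest recognition value first."""
    return sorted(range(len(outputs)), key=lambda i: outputs[i],
                  reverse=True)

# Example with hypothetical recognition values for 5 known situations:
outputs = [0.10, 0.85, 0.05, 0.60, 0.20]
order = ranked_situations(outputs)  # situation 1 first, then 3, 4, 0, 2
```

The ranking is what makes the representation RPD-friendly: the agent tries `order[0]` first and, if mental simulation rejects it, moves to `order[1]`, and so on.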
  • The neural net in this experiment recognizes the situation and gives the plan to the RPD-Soar agent, which in turn produces a strategy. The RPD-Soar agent implements the attack plan by selecting lower-level actions, such as finding the way to the final locations of the fire support and assault groups, to achieve the goal using mental simulation. In order to explore the problem space we fixed one hill and moved the other hill on the given terrain. As the hill moves, new situations are generated, and the neural net generates suitable strategies for the new situations by recognizing known situations and the related strategies. One such experiment is shown in the next slide.
  • The fire support group in this strategy takes advantage of the presence of a hill. The agent considers a plan acceptable when the blue force makes use of one of the two hills available in the area. This decision is based on the theory of satisficing, not optimizing; satisficing is the hallmark of recognition-primed decision making. Sometimes, if an acceptable plan is discarded and the strategy with the next highest recognition value is tried instead, the new strategy provides the force with even less advantage than the previous one. The situation under discussion clearly demonstrates this case: the agent accepts the previous plan because it made use of one hill, but had it not accepted it, the next strategy in line is shown in the next slide. Human behaviour representation in military simulations is not sufficiently realistic, especially the decision making by synthetic military commanders. The decision-making process lacks realistic representation of the variability, flexibility, and adaptability exhibited by a single entity across various episodes. It is hypothesized that a widely accepted naturalistic decision model, suitable for military or other domains with high stakes, time stress, and dynamic and uncertain environments, based on a well-established cognitive architecture, can address some of these deficiencies. We have therefore developed a computer implementation of the recognition-primed decision making (RPD) model using the Soar cognitive architecture, referred to in this paper as the RPD-Soar agent. The results of the experiments on the RPD-Soar agent clearly demonstrate the ability of the model to improve realism in representing human decision-making behaviour: it recognizes a situation in a changing context, handles new situations, and is flexible and adaptive in the decision-making process. The experiments account for variability within and across individuals. Because the RPD-Soar agent can mentally simulate applicable courses of action, it can handle new situations by mentally simulating all available courses of action and selecting a suitable one. In order to make quick decisions, the number of courses of action to be mentally simulated needs to be kept to a minimum. This is achieved by integrating a trained neural network with the RPD-Soar agent. The trained neural network prioritizes the available courses of action in order of their suitability to the present situation, based on its training examples. The RPD-Soar agent evaluates a course of action; if it satisfices, the agent implements it, otherwise the agent discards it and selects the next course of action in the order. The proposed implementation is evaluated using prototypical scenarios arising in command decision making in tactical situations of military land operations.
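The evaluate-then-accept-or-discard loop just described is the core of satisficing. Below is a minimal sketch of it; the function names and the acceptance test are illustrative stand-ins (the real agent evaluates courses of action through Soar's mental simulation, not a Python predicate).

```python
# Minimal sketch of the satisficing loop: courses of action are tried in
# the order given by the neural net's recognition values, and the FIRST
# one whose mental simulation is acceptable is implemented -- there is no
# search for an optimum.

def satisfice(courses_of_action, mentally_simulate):
    """Return the first acceptable course of action, or None."""
    for coa in courses_of_action:    # already ordered by recognition value
        if mentally_simulate(coa):   # acceptable, not necessarily optimal
            return coa
        # otherwise discard it and try the next in the order
    return None

# Example: accept any plan whose blue force uses at least one hill.
plans = [{"name": "S9", "hills_used": 0},   # highest recognition value
         {"name": "S4", "hills_used": 1}]   # next highest
chosen = satisfice(plans, lambda p: p["hills_used"] >= 1)
```

In the example the top-ranked plan `S9` uses no hill, so it is discarded and the next-ranked `S4` is implemented, mirroring the behaviour described in the analysis.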
  • In most cases, i.e. 10 out of 12, the neural net recognizes the situation correctly, which means it identifies the closest known situation. But sometimes the hills in the new situation are located such that the assault and/or fire support group cannot take advantage of them for cover from observation and fire from the enemy. This happens when the hills in the new situation are close to those of a known situation for which a strategy exists and is recognized, but the offset is such that a hill lies to the south, east, or west of the group's position rather than between the group and the enemy. The RPD-Soar agent then evaluates the recognized situation and, if it is not taking advantage of any of the hills in the vicinity of the enemy (cover from enemy observation and fire is an overwhelming factor in selecting a strategy to implement), discards this strategy and tries the strategy with the next highest recognition value. Here that was Strategy 9, but the blue force was not taking any advantage of the hills, so the agent discarded it and tried the strategy with the next highest recognition value.
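The geometric test implied above ("the hill must lie between the group and the enemy, not offset to the south, east, or west") can be sketched as follows. This is a hypothetical reconstruction, not the agent's actual evaluation rule, and it assumes the vignette's fixed layout in which the enemy sits due north of the blue force; the `lateral_tolerance` parameter is invented for illustration.

```python
# Hypothetical sketch of the cover test described above. With the enemy
# fixed to the north, a hill only provides cover if it lies between the
# group and the enemy -- roughly on the group-to-enemy line -- rather
# than offset to the south, east, or west of the group.

def provides_cover(group, hill, enemy, lateral_tolerance=0.15):
    """True if the hill sits approximately between group and enemy."""
    gx, gy = group
    hx, hy = hill
    ex, ey = enemy
    between = min(gy, ey) < hy < max(gy, ey)     # north of group, south of enemy
    on_line = abs(hx - gx) <= lateral_tolerance  # not offset east or west
    return between and on_line

enemy = (0.0, 1.0)   # red force in the north
group = (0.0, 0.0)   # blue group at the origin
covered = provides_cover(group, (0.0, 0.5), enemy)   # hill in between
exposed = provides_cover(group, (0.5, 0.0), enemy)   # hill to the east
```

A recognized strategy that fails this test for every hill is the case the analysis describes: the closest known situation is matched, but its strategy is discarded in mental simulation.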
  • Instead of discarding the plan with the highest recognition value straight away, the plan should be modified and evaluated again; only if it still does not satisfice should it be discarded and the next in line evaluated.
  • Transcript of "Command Agents That make Human Like Decisions F"

    1. COMMAND AGENTS THAT MAKE HUMAN-LIKE DECISIONS FOR NEW TACTICAL SITUATIONS Masood Raza and Dr Venkat V S S Sastry
    2. RPD architecture
    3. Soar Cognitive architecture
    4. RPD and Soar Cognitive architecture
    5. RPD-Soar Agent: four inputs; three hidden layers of 12 neurons; # known situations
    6. Speaker notes (the full text appears in the speaker notes above)
    7. Experiment – Environment: use the NN for recognizing the situation, not to produce a plan directly
    8. Experiment 1
    9. Experiment 2
       • The neural net is used to prioritize the strategies according to the recognition value of the situation given to the agent.
       • The RPD-Soar agent is used to reason with the plan.
       • The RPD-Soar agent evaluates the recognized situation and, if it is not taking advantage of any of the hills in the vicinity of the enemy, discards this strategy and tries the strategy with the next highest recognition value.
       • A set of twelve new situations is presented to the agent that sufficiently explores the problem space.
    10. Analysis
    11. Analysis
       • 10 out of 12 situations are correctly recognized.
       • The situation on the right does not take advantage of the hills, and the RPD-Soar agent picks the next highest matching situation.
    12. Summary
       • The neural net successfully recognizes the closest known situation.
       • As the strategy is fixed for a known situation, the hills in a new situation are sometimes located such that, although the fire support or assault group is very close to a hill, it cannot take tactical advantage of it.
       • Satisficing seems a good strategy and reduces the search in the problem space.
       • Consider re-evaluation of plans, instead of discarding them.