
Intelligent Agents: Technology and Applications


  1. Intelligent Agents: Technology and Applications. Multi-agent Learning. IST 597B, Spring 2003. John Yen
  2. Learning Objectives
     - How to identify goals for agent projects?
     - How to design agents?
     - How to identify risks/obstacles early on?
  3. Multi-Agent Learning
  4. Multi-Agent Learning
     - The learned behavior can be used as a basis for more complex interactive behavior
     - Enables the agent to participate in higher-level collaborative or adversarial learning situations
     - Learning would not be possible if the agent were isolated
  5. Examples
     - Examples of single-agent learning in a multi-agent environment:
     - A reinforcement learning agent that incorporates information gathered by another agent (Tan, 93)
     - An agent learning the negotiating techniques of another agent using Bayesian learning (Zeng & Sycara, 96)
       - A class of multi-agent learning in which an agent attempts to model another agent
  6. Examples
     - A training scenario in which a novice agent learns from a knowledgeable agent (Clouse, 96)
     - Common to all these examples: the learning agent is interacting with other agents
  7. Predator/Prey (Pursuit) Domain
     - Introduced by Benda et al. (86)
     - Four predators and one prey
     - Goal: to capture (or surround) the prey
     - Not a complex real-world domain, but a toy domain that helps concretize concepts
  8. Predator/Prey (Pursuit) Domain
  9. Taxonomy of MAS
     - Taxonomy organized along
       - the degree of heterogeneity, and
       - the degree of communication
     - Homogeneous, Non-Communicating Agents
     - Heterogeneous, Non-Communicating Agents
     - Homogeneous, Communicating Agents
     - Heterogeneous, Communicating Agents
  10. Taxonomy of MAS
  11. Taxonomy of MAS
  12. 1. Homogeneous, Non-Communicating Agents
     - All agents have the same internal structure:
       - Goals
       - Domain knowledge
       - Actions
     - The only differences are their sensory inputs and the actions they take
       - They are situated differently in the world
     - Korf (1992) introduces a policy for each predator based on an attractive force toward the prey and a repulsive force from the other predators (sketched below)
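A minimal sketch of this attraction/repulsion idea on a grid, assuming Euclidean distances and a hand-picked repulsion weight (not Korf's exact formulation):

```python
# Force-based move choice for one predator in an assumed grid world.
# Attraction to the prey and repulsion from the other predators are scored
# per candidate cell; the predator takes the best-scoring neighboring cell.
from math import dist

MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0), (0, 0)]  # N, S, E, W, stay

def score(cell, prey, others, repulsion_weight=0.3):
    attraction = -dist(cell, prey)                  # closer to the prey is better
    repulsion = sum(dist(cell, o) for o in others)  # farther from peers is better
    return attraction + repulsion_weight * repulsion

def choose_move(predator, prey, others):
    candidates = [(predator[0] + dx, predator[1] + dy) for dx, dy in MOVES]
    return max(candidates, key=lambda c: score(c, prey, others))

# A predator at (2, 2) chasing prey at (5, 5), with two other predators nearby.
print(choose_move((2, 2), (5, 5), [(1, 2), (2, 1)]))  # -> (2, 3), a step toward the prey
```

Because each predator runs the same rule on its own local view, no communication is needed; this is the kind of greedy policy that Haynes & Sen later show can fail for particular prey behaviors.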
  13. 1. Homogeneous, Non-Communicating Agents
     - Korf concludes that explicit cooperation is not necessary
     - Haynes & Sen show that Korf's heuristic does not work for certain instantiations of the domain
  14. 1. Homogeneous, Non-Communicating Agents
     - Issues:
     - Reactive vs. deliberative agents
     - Local vs. global perspective
     - Modeling of other agents
     - How to affect others
     - Further learning opportunities
  15. 1: Reactive vs. Deliberative Agents
     - Reactive agents do not maintain an internal state; they simply retrieve pre-set behaviors
     - Deliberative agents maintain an internal state and behave by searching through a space of behaviors, predicting the actions of other agents and the effects of actions
  16. 2: Local vs. Global Perspective
     - How much sensory input should be available to agents? (observability)
     - Having a global view might lead to sub-optimal results
     - Better performance by agents with less knowledge: "Ignorance is Bliss"
  17. 3: Modeling of Other Agents
     - Since the agents are identical, they can predict each other's actions given the sensory input
     - Recursive Modeling Method (RMM): model the internal state of another agent in order to predict its actions
     - Each predator bases its move on the predicted moves of the other predators, and vice versa
     - Since this reasoning can recurse indefinitely, it should be limited in terms of time or recursion depth
  18. 3: Modeling of Other Agents
     - If agents know too much about each other, RMM could recurse indefinitely
     - For coordination to be possible, some potential knowledge must be ignored
     - Schmidhuber (1996) shows that agents can cooperate without modeling each other
     - They consider each other as part of the environment
  19. 4: How to Affect Others
     - Without communication, agents cannot affect each other directly
     - They can affect each other indirectly in several ways:
     - They can be sensed by other agents
     - They can change the state of another agent (e.g. by pushing it)
     - They can affect each other by stigmergy (Beckers, 94)
  20. 4: How to Affect Others
     - Active stigmergy:
       - an agent alters the environment so as to affect the sensory input of another agent, e.g. by leaving a marker for other agents to observe
     - Passive stigmergy:
       - an agent alters the environment so that the effect of another agent's actions changes. If an agent turns off the main water valve of a building, the effect of another agent turning on a faucet is altered
  21. 4: How to Affect Others
     - Example: a number of robots in an area with many pucks scattered around. The robots reactively move straight (turning at walls) until they are pushing 3 or more pucks; then they back up and turn away (see the sketch after this slide)
     - Although the robots do not communicate, they collect the pucks into a single pile over time
     - When a robot approaches an existing pile, it adds its pucks and turns away
     - A robot approaching an existing pile obliquely might take a puck away, but over time the desired result is accomplished
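A minimal sketch of such a reactive controller as a stateless rule from sensor readings to actions; the sensor names and thresholds are illustrative assumptions, not details from the original study:

```python
# Stateless reactive rule for one puck-collecting robot: no communication and
# no internal state, only the current sensor readings. Piles emerge because
# pucks dropped by one robot change what later robots sense (passive stigmergy).
from dataclasses import dataclass

@dataclass
class Sensors:
    pucks_pushed: int   # pucks currently caught against the front bumper
    wall_ahead: bool    # is the robot about to hit a wall?

def react(s: Sensors) -> str:
    if s.pucks_pushed >= 3:        # pushing a small cluster: leave it here
        return "back_up_and_turn"
    if s.wall_ahead:               # avoid walls
        return "turn"
    return "move_straight"         # default: keep going, gathering pucks

print(react(Sensors(pucks_pushed=3, wall_ahead=False)))  # back_up_and_turn
print(react(Sensors(pucks_pushed=1, wall_ahead=True)))   # turn
```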
  22. 5: Further Learning Opportunities
     - An agent might try to learn to take actions that will not help it directly in the current situation, but may allow other agents to be more effective in the future
     - In traditional RL, if an action leads to a reward received by another agent, the acting agent may have no way of reinforcing that action
  23. 2. Heterogeneous, Non-Communicating Agents
     - Agents can be heterogeneous in any of the following:
       - Goals
       - Actions
       - Domain knowledge
     - In the pursuit domain, the prey can be modeled as an agent
     - Haynes et al. have used genetic algorithms and case-based reasoning to make predators learn to cooperate in the absence of communication
  24. 2. Heterogeneous, Non-Communicating Agents
     - They also explore the possibility of evolving both the predators and the prey
       - Predators use Korf's greedy heuristic
     - Though one might expect repeated improvement of predator and prey with no convergence, a prey behavior emerges that always succeeds
       - The prey simply moves in a constant straight line
     - Haynes et al. conclude that Korf's greedy algorithm relies on random prey movement
  25. 2. Heterogeneous, Non-Communicating Agents
     - Issues:
     - Benevolence vs. competitiveness
     - Fixed vs. learning agents
     - Modeling of other agents
     - Resource management
     - Social conventions
  26. 1: Benevolence vs. Competitiveness
     - Agents can be benevolent even if they have different goals (if they are willing to help each other)
     - Selfish agents: more effective and biologically plausible
     - Agents cooperate because it is in their own best interest
  27. 1: Benevolence vs. Competitiveness
     - Prisoner's dilemma: two burglars are captured. Each has to choose whether or not to confess and implicate the other. If neither confesses, they both serve 1 year. If both confess, they both serve 10 years. If one confesses and the other does not, the one who confessed goes free and the other serves 20 years
  28. 1: Benevolence vs. Competitiveness
  29. 1: Benevolence vs. Competitiveness
     - Each agent will decide to confess to maximize its own interest (a worked payoff check follows this slide)
     - If both confess, they each get 10 years
     - If they had acted "irrationally" and kept quiet, they would each get only 1 year
     - Mor et al. (1995) show that in the repeated prisoner's dilemma, cooperative behavior can emerge
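A small check of the dominant-strategy argument, using the sentence lengths from slide 27 (years in prison, so lower is better):

```python
# Payoff matrix for the prisoner's dilemma on slide 27: YEARS[(mine, other)]
# gives (my sentence, other's sentence). Confessing is the best reply to either
# choice by the other agent, yet mutual confession (10 years each) is far worse
# for both than mutual silence (1 year each).
YEARS = {
    ("quiet",   "quiet"):   (1, 1),
    ("quiet",   "confess"): (20, 0),
    ("confess", "quiet"):   (0, 20),
    ("confess", "confess"): (10, 10),
}

def best_reply(other_action):
    # My action that minimizes my own sentence, given the other's action.
    return min(("quiet", "confess"), key=lambda me: YEARS[(me, other_action)][0])

for other in ("quiet", "confess"):
    print(f"if the other plays {other}: best reply = {best_reply(other)}")
# Both lines print 'confess', so two self-interested agents each serve 10 years,
# even though mutual silence would have cost them only 1 year each.
```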
  30. 1: Benevolence vs. Competitiveness
     - In zero-sum games cooperation is not sensible
     - If a third dimension were added to the taxonomy, besides the degree of heterogeneity and the degree of communication, it would be benevolence vs. competitiveness
  31. 2: Fixed vs. Learning Agents
     - Learning agents are desirable in dynamic environments
     - Competitive vs. cooperative learning
     - Possibility of an "arms race" in competitive learning: competing agents continually adapt to each other in more and more specialized ways, never stabilizing at a good behavior
  32. 2: Fixed vs. Learning Agents
     - Credit-assignment problem: when the performance of an agent improves, it is not clear whether the improvement is due to better behavior by the agent or worse behavior by the opponent. The same problem arises when the performance of an agent gets worse
     - One solution is to fix one agent while allowing the other to learn, and then to switch. This encourages the arms race even more!
  33. 3: Modeling of Other Agents
     - Goals, actions, and domain knowledge of other agents may be unknown and need modeling
     - Without communication, modeling is done strictly through observation
     - RMM is good for modeling the states of homogeneous agents
     - Tambe (1995) takes it one step further, studying how agents can learn models of teams of agents
  34. 4: Resource Management
     - Examples:
       - Network traffic problem: several agents send information through the same network (GA)
       - Load balancing: several users have a limited amount of computing power to share among them (RL)
     - Braess' Paradox (Glance et al., 1995): adding more resources to a network can yield worse performance (a small worked instance follows this slide)
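The classic textbook instance of Braess' paradox, worked out numerically; this is not necessarily the network studied by Glance et al.:

```python
# 4000 cars travel from S to E over two routes. Edge costs in minutes:
# S->A and B->E cost (cars on that edge)/100; A->E and S->B cost a flat 45.
N = 4000

def route_costs(cars_on_sa, cars_on_be):
    upper = cars_on_sa / 100 + 45   # S -> A -> E
    lower = 45 + cars_on_be / 100   # S -> B -> E
    return upper, lower

# Without a shortcut, traffic splits evenly at equilibrium: 2000 cars per route.
print(route_costs(2000, 2000))      # (65.0, 65.0) minutes for everyone

# Add a free A->B shortcut: S->A->B->E is now every driver's best choice,
# so all 4000 cars load both variable edges and everyone is worse off.
print(N / 100 + 0 + N / 100)        # 80.0 minutes
```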
  35. 5: Social Conventions
     - Imagine you are to meet a friend in Paris. You both arrive on the same day but were unable to get in touch to set a time and place. Where will you go, and when?
     - 75% of the audience at the AAAI-95 Symposium on Active Learning answered (without prior communication) that they would go to the Eiffel Tower at noon
     - Even without communication, agents are able to coordinate their actions
  36. 3. Homogeneous, Communicating Agents
     - Communication can be either broadcast or point-to-point
     - Issues:
       - Distributed sensing
         - Distributed vision project (Matsuyama, 1997)
         - Trafficopter system (Moukas et al., 1997)
       - Communication content
         - What should they communicate? States, or goals?
       - Further learning opportunities:
         - When to communicate?
  37. 4. Heterogeneous, Communicating Agents
     - Tradeoff between cost and freedom
     - Osawa suggests the predators should go through 4 phases:
       - Autonomy, communication, negotiation, and control
       - When they stop making progress using one strategy, they should move to the next, more expensive strategy
     - Increasing order of cost (decreasing order of freedom)
  38. 4. Heterogeneous, Communicating Agents
     - Important issues:
       - Understanding each other
       - Planning communication acts
       - Negotiation
       - Commitment/decommitment
       - Further learning opportunities
  39. 1: Understanding Each Other
     - Need some set protocol for communication
     - Aspects of the protocol:
     - Information content: KIF (Genesereth, 92)
     - Message format: KQML (Finin, 94)
     - Coordination: COOL (Barbuceanu, 95)
  40. 2: Planning Communication Acts
     - The theory of communication as action is called speech acts
     - Communication acts have preconditions and effects
     - An effect might be to alter an agent's beliefs about the state of another agent or agents
  41. 3: Negotiation
     - Design negotiating MAS based on the law of supply and demand
     - Contract nets (Smith, 1990):
       - Agents have their own goals, are self-interested, and have limited reasoning resources. They bid to accept tasks from other agents and can then either perform the task or subcontract it to another agent. Agents must pay to contract out their tasks (a minimal bid/award sketch follows this slide)
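A minimal sketch of one announce/bid/award round of a contract net; the agents, task name, and cost model are illustrative assumptions only:

```python
# One contract-net round: a manager announces a task, self-interested
# contractors bid their estimated cost, and the task goes to the cheapest
# bidder, who then carries it out (or could subcontract it in turn).
class Contractor:
    def __init__(self, name, load):
        self.name = name
        self.load = load            # how busy the agent already is

    def bid(self, task):
        # A self-interested agent bids higher when it is already loaded.
        return 10.0 + 5.0 * self.load

def award(task, contractors):
    winner = min(contractors, key=lambda c: c.bid(task))  # collect bids, pick cheapest
    winner.load += 1                                       # winner takes on the task
    return winner

agents = [Contractor("A", load=2), Contractor("B", load=0), Contractor("C", load=1)]
print(award("deliver-part", agents).name)                  # -> B (lowest bid)
```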
  42. 3: Negotiation
     - MAS controlling the air temperature in different rooms of a building:
       - An agent can set the thermostat to any temperature. Depending on the actual air temperature, the agent can 'buy' hot or cold air from another room that has an excess. At the same time the agent can sell its own excess air at the current temperature to other rooms. Modeling the loss of heat in transfer from one room to another, the agents try to buy and sell at the best possible prices
  43. 4: Commitment/Decommitment
     - An agent agrees to pursue a given goal regardless of how much it serves its own interest
     - Commitments can make the system run more smoothly by letting agents trust each other
     - It is unclear how to make self-interested agents commit to others
     - Belief/desire/intention (BDI) is a popular technique for modeling other agents
       - Used in OASIS: air traffic control
  44. 5: Further Learning Opportunities
     - Instead of predefining a protocol, allow the agents to learn for themselves what to communicate and how to interpret it
     - A possible result would be more efficient communication
  45. Q-Learning
     - Assess state-action pairs (s, a) using a Q value
     - Learn the Q value using rewards/feedback
     - A reward received at time t is discounted back to previous state-action pairs (using a discount factor)
     - The goal of learning is to find an optimal policy for selecting actions
  46. The Q Value
     - Q(x, a) = R(x, a) + γ Σ_y P_xy(a) V*(y)
     - R: reward
     - P_xy(a): the probability of reaching state y from x by taking action a
     - γ (gamma): discount factor (between 0 and 1)
     - V*(y): the expected total discounted return starting in y and following the optimal policy
     - Policy: a rule for selecting the sequence of actions
  47. The Expected Total Discounted Return
     - V*(x) = max_a Q(x, a)
     - V for a state is the maximal Q value over all actions that can be taken at the state (following the rest of the policy)
  49. Learning Rule for the Q Value
     - Q(x, a) ← (1 − α) Q(x, a) + α (r + γ V(y))
     - α (alpha): learning rate
  50. Q-Learning Algorithm
     1. Initialize Q(x, a) = 0 for all states x and actions a
     2. Do forever:
        (a) Observe the current state x
        (b) Choose an action a, e.g. the one that maximizes Q(x, a) over all a, or sample using the rule on the next slide
        (c) Carry out action a in the world
        (d) Let the short-term reward be r, and the new state be y
        (e) For the state-action pair (x, a), apply the learning rule above: Q(x, a) ← (1 − α) Q(x, a) + α (r + γ max_b Q(y, b))
  51. Probability for the agent to select action a_i based on the Q values
     - P(a_i) = e^(Q(x, a_i)/T) / Σ_j e^(Q(x, a_j)/T)
     - T: a "temperature" parameter that determines the randomness of decisions
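A compact sketch tying slides 46 through 51 together: tabular one-step Q-learning with Boltzmann (softmax) action selection on a toy 5-state chain. The environment, constants, and reward are illustrative assumptions:

```python
# Tabular Q-learning with Boltzmann exploration, following the update rule and
# selection probability on the preceding slides.
import math, random
from collections import defaultdict

ACTIONS = [-1, +1]                       # step left / right on a 5-state chain
GOAL, ALPHA, GAMMA, T = 4, 0.5, 0.9, 0.3

Q = defaultdict(float)                   # Q[(state, action)], defaults to 0

def boltzmann(state):
    # P(a_i) = exp(Q(x, a_i)/T) / sum_j exp(Q(x, a_j)/T)
    weights = [math.exp(Q[(state, a)] / T) for a in ACTIONS]
    return random.choices(ACTIONS, weights=weights)[0]

def step(state, action):
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

for episode in range(200):
    x = 0
    while x != GOAL:
        a = boltzmann(x)                                   # choose an action
        y, r = step(x, a)                                  # act in the world, observe r and y
        v_y = max(Q[(y, b)] for b in ACTIONS)              # V(y) = max_b Q(y, b)
        Q[(x, a)] = (1 - ALPHA) * Q[(x, a)] + ALPHA * (r + GAMMA * v_y)
        x = y

# The learned greedy policy moves right (+1) from every non-goal state.
print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(GOAL)})
```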
  52. Towards Collaborative and Adversarial Learning: A Case Study in Robotic Soccer (Peter Stone & Manuela Veloso)
  53. Introduction
     - Layered learning: develop complex multi-agent behaviors from simple ones
     - A simple multi-agent behavior in robotic soccer: shooting a moving ball
       - Passer
       - Shooter
     - Behavior to be learned: when the shooter should begin to move (the shooting policy)
  54. Simple Behavior
  55. Parameters
     - Ball speed (fixed vs. variable)
     - Ball trajectory (fixed vs. variable)
     - Goal location (fixed vs. variable)
     - Action quadrant (fixed vs. variable)
  56. Parameters
  57. Fixed Ball Motion
     - Simple shooting policy: begin accelerating when the ball's distance to its projected point of intersection with the agent's path reaches 110 units
       - 100% success rate if the shooter position is fixed
       - 61% success rate if the shooter position is variable
     - Use a neural network
     - Inputs to the NN (coordinate independent):
       - Ball distance
       - Agent distance
       - Heading offset
     - Output: 1 or 0 (shot successful or not)
     - Use a random shooting policy to generate training data (a minimal network sketch follows this slide)
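A minimal sketch of the kind of network described above: three coordinate-independent inputs and a single sigmoid output predicting shot success. The hidden-layer size, training data, and labels are placeholders, not the paper's actual architecture or data:

```python
# Tiny feedforward network: 3 inputs (ball distance, agent distance, heading
# offset) -> 4 hidden units -> 1 sigmoid output (probability the shot scores).
# In the paper, training examples come from running a random shooting policy
# and labeling each trial 1 (goal) or 0 (miss); here the data is synthetic.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(500, 3))                 # synthetic input rows
y = (X[:, 0] + 0.5 * X[:, 2] < 0.8).astype(float)    # synthetic success labels

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1 = rng.normal(0, 0.5, (3, 4)); b1 = np.zeros(4)
W2 = rng.normal(0, 0.5, (4, 1)); b2 = np.zeros(1)

lr = 0.5
for epoch in range(2000):
    h = sigmoid(X @ W1 + b1)                         # forward pass
    p = sigmoid(h @ W2 + b2).ravel()
    grad_out = (p - y)[:, None] / len(y)             # d(mean cross-entropy)/d(output logit)
    grad_h = grad_out @ W2.T * h * (1 - h)           # backprop into the hidden layer
    W2 -= lr * h.T @ grad_out;  b2 -= lr * grad_out.sum(0)
    W1 -= lr * X.T @ grad_h;    b1 -= lr * grad_h.sum(0)

print("training accuracy:", ((p > 0.5) == (y > 0.5)).mean())
```

At decision time, the agent would feed in the current ball distance, agent distance, and heading offset, and begin its approach once the predicted success probability crosses a threshold.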
  58. Neural Network
  59. Results
  60. Varying Ball Speed
     - Add a fourth input to the NN: ball speed
  61. Varying the Ball's Trajectory
     - Use the same shooting policy
     - Use another NN to determine the direction the shooter should steer (the shooter's aiming policy)
  62. Moving the Goal
     - Can be thought of as aiming for different parts of the goal
     - Change nothing but the shooter's knowledge of the goal location
  63. Cooperative Learning
     - Passing a moving ball
       - Passer: where to aim the pass
       - Shooter: where to position itself
  64. Cooperative Learning
  65. Adversarial Learning
  66. References
     - Peter Stone and Manuela Veloso, 2000, "Multi-Agent Systems: A Survey from a Machine Learning Perspective"
     - Ming Tan, 1993, "Multi-Agent Reinforcement Learning: Independent vs. Cooperative Agents"
     - Peter Stone and Manuela Veloso, 1998, "Towards Collaborative and Adversarial Learning: A Case Study in Robotic Soccer"
