Learning Agents Project
Mid-semester Report, October 22nd, 2002

Group participants: Huayan Gao (huayan.gao@uconn.edu), Thibaut Jahan (thj@ifrance.com), David Keil (dmkeil@att.net), Jian Lian (lianjian@yahoo.com)

Students in CSE 333 Distributed Component Systems
Professor Steven Demurjian
Department of Computer Science & Engineering, The University of Connecticut
CONTENTS

1. Objectives and goals
2. Topic summary
   2.1 Definition and classification of agents and intelligent agents
   2.2 Learning
   2.3 Platform
3. Topic breakdown
   3.1 Machine learning (David)
   3.2 A maze problem (David)
   3.3 Agent platform (Jian)
   3.4 Agent computing (Huayan)
   3.5 Distributed computing (Jian)
   3.6 Implementation using Together, UML, and Java (Thibaut)
   3.7 Extension to UML needed for multi-agent systems (Huayan)
4. Progress on project, changes in direction and focus
5. Planned activities
6. References
Appendix A: Risks
Appendix B: Categories of agent computing
Appendix C: Q-learning algorithm
1. Objectives and goals

Our ambition is to build a general-architecture model of components for learning agents. The project will investigate current research on software learning agents and will implement a simple system of such agents. We will demonstrate our work with a distributed learning-agent system that interactively finds a policy for navigating a maze. Our implementation will be component-based, using UML and Java.

We will begin with the notion of an intelligent agent and seek to implement it in a distributed agent environment on a pre-existing agent platform. We will refer to agents implemented as mobile or distributed agents as "deployed agents." We will implement the different "generic" components so that they can be assembled easily into an agent. The project may also include investigation of the scalability, robustness, and adaptability of the system. Four candidate components of a distributed learning agent are perception, action, communication, and learning.

Our design and implementation effort will focus narrowly on an artifact of realistic, limited scope that solves a well-defined, arbitrarily simplifiable maze problem using Q-learning. We will relate the features of our implementation to recent research in the same narrow area and to broader concepts encountered in the sources.

We have selected JADE (Java Agent Development Framework) as our software development framework; it is aimed at developing multi-agent systems and applications conforming to FIPA (Foundation for Intelligent Physical Agents) standards.
2. Topic summary

In this section we discuss the following questions in detail: What is an agent? What is learning? How are learning and agents combined? What agent platform will we use?

2.1 Definition and classification of agents and intelligent agents

Researchers in agent computing have offered a variety of definitions. Some general features that characterize agents are autonomy, goal-orientedness, collaboration, flexibility, the ability to be self-starting, temporal continuity, character, adaptiveness, mobility, and the capacity to learn.

According to a definition from IBM, "Intelligent agents are software entities that carry out some set of operations on behalf of a user or another program with some degree of independence or autonomy, and in so doing, employ some knowledge or representation of the user's goals or desires."

"An autonomous agent is a system situated within and a part of an environment that senses that environment and acts on it, over time, in pursuit of its own agenda and so as to effect what it senses in the future." [fra-gra96] The latter, broader definition is close to the notion of intelligent agent used in the artificial-intelligence field, replacing the logic-programming, knowledge-base-oriented paradigm.

2.2 Learning

Machine learning is a branch of artificial intelligence concerned with enabling intelligent agents to improve their behavior. Among the many categories of learning, we will focus on reinforcement learning and its special case, Q-learning.
Reinforcement learning is online rational policy search; it uses ideas associated with adaptive systems and related to optimal control and dynamic programming [sut-bar98]. It is distinguished from traditional machine-learning approaches that assumed offline learning, in which knowledge is acquired during a separate training phase and applied afterward.

In the broader definition of intelligent agents, the agent responds to its environment under a policy, which maps from a perceived state of the environment (determined by the agent's percepts) to actions. An agent's actions are a series of responses to previously unknown, dynamically generated percepts.

A rational agent is one that acts to maximize its expected future reward or performance measure. Because its actions may affect the environment, such an agent must incorporate thinking or planning ahead into its computations. Because it obtains information from its environment only through percepts, it may have incomplete knowledge of the environment. The agent must conduct a trial-and-error search for a policy that obtains a high performance measure. Reinforcement by means of rewards is part of that search.

For intelligent agents that use reinforcement learning, unlike systems that learn from training examples, the issue arises of exploitation of obtained knowledge versus exploration to obtain new information. Exploration gains no immediate reward and is useful only if it can improve utility by improving future expected reward. Failing to explore, however, means sacrificing any benefit of learning.

2.3 Platform

JADE (Java Agent Development Framework) is a software framework fully implemented in the Java language. It simplifies the implementation of multi-agent systems through a middleware platform that claims to comply with the FIPA specifications and through a set of tools that support the debugging and deployment phases. The agent platform can be distributed across machines, and the configuration can be controlled via a remote GUI.

According to the FIPA specification, agents communicate via asynchronous message passing, where objects of the ACLMessage class are the exchanged payloads. JADE creates and manages a queue of incoming ACL messages; agents can access their queue via a combination of several modes: blocking, polling, timeout-based, and pattern-matching-based. As for the transport mechanism, Java RMI, event notification, and IIOP are currently used. The standard model of an agent platform is represented in the following figure.

Fig. 2.3.1 The standard model of an agent platform

JADE is a FIPA-compliant agent platform, which includes the AMS (Agent Management System), the DF (Directory Facilitator), and the ACC (Agent Communication Channel). All three components are automatically activated at agent platform start-up. The AMS provides white-pages and life-cycle services, maintaining a directory of agent identifiers (AIDs) and agent states. Each agent must register with an AMS in order to get a valid AID. The Directory Facilitator (DF) is the agent that provides the default yellow-pages service in the platform. The Message Transport System, also called the Agent Communication Channel (ACC), is the software component controlling all exchange of messages within the platform, including messages to and from remote platforms.
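As an illustration of the message-queue modes listed above, the following is a minimal sketch of a JADE agent that polls its ACL message queue from a cyclic behaviour and echoes each message back to its sender. The agent class and its echoing behaviour are hypothetical examples of ours, not part of the project's code.

    import jade.core.Agent;
    import jade.core.behaviours.CyclicBehaviour;
    import jade.lang.acl.ACLMessage;

    // Hypothetical example agent; illustrates non-blocking access
    // to the ACL message queue from a cyclic behaviour.
    public class EchoAgent extends Agent {
        protected void setup() {
            addBehaviour(new CyclicBehaviour(this) {
                public void action() {
                    ACLMessage msg = myAgent.receive();  // non-blocking receive
                    if (msg != null) {
                        ACLMessage reply = msg.createReply();
                        reply.setPerformative(ACLMessage.INFORM);
                        reply.setContent("echo: " + msg.getContent());
                        myAgent.send(reply);
                    } else {
                        block();  // suspend this behaviour until a message arrives
                    }
                }
            });
        }
    }

A blocking variant would call blockingReceive() instead; the non-blocking style shown here is the common idiom, since it lets the JADE scheduler run the agent's other behaviours.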
3. Topic breakdown

Our project will focus on grid-based problems for learning agents. Our aim is similar to the one expounded in [rus-nor95], but we extend that simple example further. Our realization will be made more interesting by the use of multiple learning agents and possibly varying rewards and walls. We will use JADE (Java Agent Development Framework) as our main agent platform to develop and implement the maze.

3.1 Machine learning (David)

Part of this project will consist of investigating the literature on machine learning, particularly reinforcement learning. David will lead this work. The problem of learning in interaction with the agent's environment is that of reinforcement learning (RL). The learner executes a policy search, in some solutions using a critic to help interpret the reward inputs as guides to improving the policy (see figure below).

Fig. 3.1.1 Learning agent
Within reinforcement learning we will address Q-learning, a variant in which the agent incrementally computes, from its interaction with its environment, a table of expected aggregate future rewards, with values discounted as they extend into the future. As it proceeds, the agent modifies the values in the table to refine its estimates. The Q function maps a state-action pair to its expected aggregate future reward; the optimal action in a given state is the one that maximizes Q. The evolving table of estimated Q values is called Q̂.

3.2 A maze problem (David)

The concrete problem described below will help to define how the project breaks down into components. Both [mitchelt97] and [sut-bar98] present a simple example consisting of a maze for which the learner must find a policy, where the reward is determined by eventually reaching or not reaching a goal location in the maze.

Fig. 3.2.1 A maze problem

We propose to modify the original problem definition by permitting multiple distributed agents that communicate, either directly or via the environment. Either the multi-agent system as a whole, or each agent individually, will use Q-learning. The mazes can be made arbitrarily simple or complex to fit the speed, computational power, and effectiveness of the system we are able to develop in the time available.
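To make the Q̂ table and its update concrete, here is a minimal sketch of a tabular Q-learner in Java, assuming discrete state and action identifiers. The class name, the parameter values, and the ε-greedy exploration scheme are illustrative choices of ours, not settled project design; the update method implements the temporal-difference formula given in Appendix C, with an explicit discount factor.

    import java.util.Random;

    // Illustrative tabular Q-learner for a discrete state/action space.
    public class QLearner {
        private final double[][] q;          // q[state][action]: the evolving Q-hat table
        private final double alpha = 0.1;    // learning rate
        private final double gamma = 0.9;    // discount factor for future rewards
        private final double epsilon = 0.1;  // exploration rate
        private final Random rng = new Random();

        public QLearner(int numStates, int numActions) {
            q = new double[numStates][numActions];
        }

        // Epsilon-greedy selection: balance exploration against exploitation.
        public int chooseAction(int state) {
            if (rng.nextDouble() < epsilon) {
                return rng.nextInt(q[state].length);        // explore
            }
            int best = 0;
            for (int a = 1; a < q[state].length; a++) {
                if (q[state][a] > q[state][best]) best = a; // exploit
            }
            return best;
        }

        // Temporal-difference update after moving from state i to state j.
        public void update(int i, int action, double reward, int j) {
            double maxNext = q[j][0];
            for (int a = 1; a < q[j].length; a++) {
                maxNext = Math.max(maxNext, q[j][a]);
            }
            q[i][action] += alpha * (reward + gamma * maxNext - q[i][action]);
        }
    }

In the maze setting, a state id would identify a square and the actions would be the four compass moves.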
A further interesting variant of the problem would be to allow the maze to change dynamically, either autonomously or in response to the learning agents. Robust reinforcement learners will adapt successfully to such changes.

3.3 Agent platform (Jian)

There are many agent platforms that we might choose from (see the comparison at http://www.ece.arizona.edu/~rinda/compareagents.html). We have chosen JADE (Java Agent Development Framework) as our deployed-agent platform. As described in section 2.3, JADE simplifies the implementation of multi-agent systems through middleware and a set of tools that support the debugging and deployment phases. The agent platform can be distributed across machines (which need not even share the same OS), and the configuration can be controlled via a remote GUI. The configuration can even be changed at run time by moving agents from one machine to another as required.

3.4 Agent computing (Huayan)

We will survey the agent paradigm of computing, focusing on rational agents, as described in section 2 above. We will apply these concepts to the problem of machine learning, as is done in much reinforcement-learning research.

We have defined an intelligent agent as a software entity that can monitor its environment and act autonomously on behalf of a user or creator. To do this, an agent must perceive relevant aspects of its environment, plan and carry out proper actions, and communicate its knowledge to other agents and users. Learning agents will help us to achieve these goals.
Learning agents are adaptive: in difficult, changing environments they may change their behavior based on their previous experience. The real problem with any intelligent-agent system is the amount of trust placed in the agent's ability to cope with the information provided by its sensors in its environment. Sometimes the agent's learning capability is not good enough to achieve the anticipated goal; this will be an emphasis of our study. Advantages of learning agents are their ability to adapt to environmental change, their customizability, and their flexibility. Disadvantages are the time needed to learn and relearn, their ability only to automate preexisting patterns, and their consequent lack of common sense.

3.5 Distributed computing (Jian)

In multi-agent learning in the strong sense, a common learning goal is pursued; in the weaker sense, agents pursue separate goals but share information. Distributed agents may identify or execute distinct learning subtasks [weiss99]. We will survey the literature on distributed computing, looking for connections to learning agents, and will apply what we find in an attempt to build a distributed system of cooperating learning agents.

3.6 Implementation using Together, UML, and Java (Thibaut)

The maze described above could be represented as a bitmap or a two-dimensional array of squares, as sketched below. Starting with a simple example is useful in order to concentrate on good component design and successful implementation.
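A minimal sketch of the two-dimensional-array representation follows; the class name and the character encoding of squares are hypothetical illustrations, not decided design.

    // Hypothetical grid-maze representation as a 2-D array of squares.
    public class Maze {
        public static final char WALL = '#', FREE = '.', GOAL = 'G';
        private final char[][] grid;

        public Maze(char[][] grid) { this.grid = grid; }

        public boolean isWall(int row, int col) {
            return row < 0 || row >= grid.length
                || col < 0 || col >= grid[0].length
                || grid[row][col] == WALL;
        }

        public boolean isGoal(int row, int col) {
            return grid[row][col] == GOAL;
        }

        // Each square maps to a discrete state id for the Q-table.
        public int stateId(int row, int col) {
            return row * grid[0].length + col;
        }
    }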
We used the Together CC software to reverse-engineer existing code of learning-agent examples. We used two examples, the cat-and-mouse example and the dog-and-cat example, explained below, and we are using them to extract from the class diagrams a possible design for our agents. Because multi-agent systems comprise both actors and software, their design does not follow typical UML practice. Flake and Geiger [fla-gei01] suggest that UML does not offer the full expressiveness needed for designing such agents.

We plan to use the Together CC software to implement these agents, starting with their UML design. We have so far identified several distinct components that we think will be used in these learning agents; see the interface sketch below. These Java-implemented agents would then be executed through the JADE environment. The communication component will have to be specific to the Agent Communication Language (ACL) used in JADE; this should be the only environment-dependent component. We will try to make the other components (learning, perception, action) as "generic" as possible. Besides the design and implementation of the agents, we also have to design the environment (maze, …).
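To make the component decomposition concrete, one possible set of Java interfaces for the "generic" components is sketched below. The interface and method names are our illustration only; the actual decomposition is still under design.

    // Hypothetical interfaces for the generic agent components.
    public interface Perception {
        int perceiveState();    // map raw percepts to a discrete state id
    }

    public interface Action {
        void execute(int action);    // carry out an action in the environment
    }

    public interface Learning {
        int selectAction(int state);    // choose an action under the current policy
        void observe(int state, int action, double reward, int nextState);
    }

An agent would then be assembled from one implementation of each interface, with only the communication component tied to JADE's ACL.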
3.7 Extension to UML needed for multi-agent systems (Huayan)

The Unified Modeling Language (UML) is now widely used in software engineering, so it is natural to think of applying UML to the design of agent systems. Many UML applications, however, focus on macro aspects of agent systems, such as agent interaction and communication, while the design of micro aspects of the agents themselves, such as goals, complex strategies, and knowledge, has often been left out. Standard UML therefore cannot provide a complete solution for multi-agent systems. A detailed description of how to use extended UML to implement multi-agent systems can be found in [fla-gei01]. A Dog-Cat use-case diagram is given as follows:

Fig. 3.7.1 Dog-Cat Use-Case Diagram

In this diagram, agents are modeled as actors with square heads, and elements of the environment are modeled as clouds. A goal case serves as a means of capturing the high-level goals of an agent. Reaction cases are used to model how the environment directly influences agents. An arc between an actor and a reactive use case expresses that the actor is the source of events triggering that use case. Figure 3.7.1 illustrates the Dog-Cat use case: the dog triggers the reactive use case DogDetected in the cat agent, and in the environment, the tree triggers the TreeDetected use case in the cat.

In the following, we give similar use-case diagrams for Cat-Mouse and for the maze. The rules of the cat-and-mouse game are: the cat catches the mouse, the mouse escapes the cat, the mouse catches the cheese, and the game is over when the cat catches the mouse. The Cat-Mouse use-case diagram is as follows:
Fig. 3.7.2 Cat-Mouse Use-Case Diagram

For the well-known maze problem mentioned in section 3.2, we give the following use-case diagram:

Fig. 3.7.3 The Maze Problem Use-Case Diagram
4. Progress on project, changes in direction and focus

We meet at least every Tuesday after class. Our main change of focus has been the identification of an existing Q-learning package, "Cat and Mouse" (http://www.cse.unsw.edu.au/~aek/catmouse/followup.html), implemented in Java, and an existing agent platform, JADE. Thibaut generated a class diagram of the Cat and Mouse Java code using Together. Jian installed the Java code into the JADE platform to create a distributed environment for the learner. Our goal is to implement agents that learn to pursue moving or stationary goals (cat pursues mouse, mouse pursues cheese) or to avoid negative rewards (mouse flees cat).

Huayan found a similar example, "Dog and Cat," described with use cases, and located other sources related to agent-based systems. The source for "Dog and Cat" [fla-gei01] raised the issue of the limitations of standard UML use-case diagramming for the purpose of depicting multi-agent systems. Cat, for example, has the use case Escape, while Dog has Chase; these two use cases denote the same set of events, seen from opposite perspectives.

David coded a simple maze reinforcement learner based on [rus-nor95] in C++, writing classes for the maze, individual states in the maze, and the learning agent. At a later stage this code could easily be ported to Java. David also wrote C++ code for a system based on (Michie and Chambers, 1968) that uses reinforcement learning to solve the classic problem of pole balancing, in which a controller nudges a cart that sits on a track, with a pole balanced on it, trying to avoid letting the pole fall.
In this problem, physical states lie on a continuum in four dimensions, but they may be quantized into a tractable number of discrete states from the standpoint of the learner, leading to a solution; a quantization sketch appears at the end of this section.

The two directions taken so far by group members are somewhat complementary, though the group may have to choose between them. Use of the existing Cat-and-Mouse system will certainly allow us to address harder problems, in which the learner's environment changes in response to the agent (e.g., the cat flees the dog), and using JADE gives us the best chance of attaining our goal of implementing distributed learning agents that communicate. We may then seek to extend the existing solution by adding to its Java code. The approach of coding known solutions from scratch, on the other hand, guarantees that at least one member of the group will understand the code, and all members will understand it if all participate in the coding or if the coders explain the code to the others. We note that the Java code for Cat-and-Mouse is quite lengthy.
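For the pole-balancing experiment described above, the quantization step might look like the following Java sketch (the class name and the idea of per-dimension thresholds are our illustration; David's actual C++ code is not reproduced here):

    // Hypothetical quantizer mapping the four continuous pole-balancing
    // variables (cart position, cart velocity, pole angle, pole angular
    // velocity) to a single discrete state id.
    public class StateQuantizer {
        private final double[][] thresholds;  // per-dimension bin boundaries, ascending

        public StateQuantizer(double[][] thresholds) {
            this.thresholds = thresholds;
        }

        public int quantize(double[] observation) {
            int state = 0;
            for (int d = 0; d < observation.length; d++) {
                int bin = 0;  // find which bin this dimension's value falls into
                while (bin < thresholds[d].length && observation[d] > thresholds[d][bin]) {
                    bin++;
                }
                state = state * (thresholds[d].length + 1) + bin;  // mixed-radix encoding
            }
            return state;  // unique id over the product of per-dimension bins
        }
    }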
5. Planned activities

5.1 Oct. 23 – Oct. 29: Consultation with the instructor on the platform and problem choices to be made. Decision on the role in this project of the UML extension for multi-agent systems.

5.2 Oct. 30 – Nov. 5: Java implementation of the learning aspect of the agents and enhancement of communication efficiency. Each participant will code the components decided on and described in the design part. Once these components are tested, they will be integrated and the resulting system tested.

5.3 Nov. 6 – Nov. 12: Extensions to code. Circulation of draft report.

5.4 Nov. 13 – Nov. 19: Preliminary preparation of slides.

5.5 Nov. 20 – Nov. 26: Preparation of the final report and last adjustments to the learning agents.

5.6 Nov. 27 – Dec. 2: Polishing of report and slides.
6. References

[aga-bek97] Arvin Agah and George A. Bekey. Phylogenetic and ontogenetic learning in a colony of interacting robots. Autonomous Robots 4, pp. 85-100, 1997.

[anders02] Chuck Anderson. Robust reinforcement learning with static and dynamic stability. http://www.cs.colostate.edu/~anderson/res/rl/nsf2002.pdf, 2002.

[durfee99] Edmund H. Durfee. Distributed problem solving and planning. In Gerhard Weiss, ed., Multiagent systems: A modern approach to distributed artificial intelligence. MIT Press, 1999, pp. 121ff.

[d'Inverno01] Mark d'Inverno and Michael Luck. Understanding agent systems. [PUB?], 2001.

[fla-gei01] Stephan Flake, Christian Geiger, and Jochen M. Kuster. Towards UML-based analysis and design of multi-agent systems. International NAISO Symposium on Information Science Innovations in Engineering of Natural and Artificial Intelligent Systems (ENAIS'2001), Dubai, March 2001.

[fra-gra96] Stan Franklin and Art Graesser. Is it an agent, or just a program?: A taxonomy for autonomous agents. Proceedings of the Third International Workshop on Agent Theories, Architectures, and Languages, 1996. www.msci.memphis.edu/~franklin/AgentProg.html

[huh-ste99] Michael N. Huhns and Larry M. Stephens. Multiagent systems and societies of agents. In Gerhard Weiss, ed., Multiagent systems: A modern approach to distributed artificial intelligence. MIT Press, 1999, pp. 79-120.

[jac-byl] Ivar Jacobson and Stefan Bylund. A multi-agent system assisting software developers. Downloaded.

[Knapik98] Michael Knapik and Jay Johnson. Developing intelligent agents for distributed systems. 1998.

[lam-lyn90] Leslie Lamport and Nancy Lynch. Distributed computing: models and methods. In Jan van Leeuwen, ed., Handbook of Theoretical Computer Science, Vol. B. MIT Press, 1990, pp. 1158-1199.

[mitchelt97] Tom M. Mitchell. Machine learning. McGraw-Hill, 1997.

[mor-mii96] David E. Moriarty and Risto Miikkulainen. Efficient reinforcement learning through symbiotic evolution. Machine Learning 22, pp. 11-33, 1996.

[petrie96] Charles J. Petrie. Agent-based engineering, the web, and intelligence. IEEE Expert, December 1996.

[rus-nor95] Stuart Russell and Peter Norvig. Artificial intelligence: A modern approach. Prentice Hall, 1995.

[SAG97] Software Agents Group, MIT Media Laboratory. "CHI97 Software Agents Tutorial". http://pattie.www.media.mit.edu/people/pattie/CHI97/

[sandho99] Tuomas W. Sandholm. Distributed rational decision making. In Gerhard Weiss, ed., Multiagent systems: A modern approach to distributed artificial intelligence. MIT Press, 1999, pp. 201-258.

[sen-wei99] Sandip Sen and Gerhard Weiss. Learning in multiagent systems. In Gerhard Weiss, ed., Multiagent systems: A modern approach to distributed artificial intelligence. MIT Press, 1999, pp. 259-298.

[shen94] Wei-Min Shen. Autonomous learning from the environment. Computer Science Press, 1994.

[sut-bar98] Richard S. Sutton and Andrew G. Barto. Reinforcement learning: An introduction. MIT Press, 1998.

[syc-pan96] Katia Sycara, Anandeep Pannu, Mike Williamson, Dajun Zeng, and Keith Decker. Distributed intelligent agents. IEEE Expert, December 1996, pp. 36-45.

[venners97] Bill Venners. The architecture of aglets. JavaWorld, April 1997.

[wal-wya94] Jim Waldo, Geoff Wyant, Ann Wollrath, and Sam Kendall. A note on distributed computing. Sun Microsystems technical report SMLI TR-94-29, November 1994.

[weiss99] Gerhard Weiss, ed. Multiagent systems: A modern approach to distributed artificial intelligence. MIT Press, 1999.

[wooldr99] Michael Wooldridge. Intelligent agents. In Gerhard Weiss, ed., Multiagent systems: A modern approach to distributed artificial intelligence. MIT Press, 1999, pp. 27-77.

[xx99] Reference to get title, author: http://www.cs.helsinki.fi/research/hallinto/TOIMINTARAPORTIT/1999/report99/node2.html
Appendix A: Risks

Our objectives include avoiding several possible risks, including (1) the construction of "toy worlds," i.e., problem specifications tailored to the envisioned solution; (2) complexity of design without performance gain; (3) overfitting the generalizable components to the specific problem at hand, putting reusability at risk; and (4) premature commitment to a specific solution (Q-learning) as opposed to exploration of various alternatives.

Appendix B: Categories of agent computing

A wide range of agent types exists.

• Interface agents are computer programs that employ artificial-intelligence techniques to provide active assistance to a user with computer-based tasks.

• Mobile agents are software processes capable of moving around networks such as the World Wide Web, interacting with hosts, gathering information on behalf of their owner, and returning with requested information that is found.

• Co-operative agents can communicate with, and react to, other agents in a multi-agent system within a common environment. Such an agent's view of its environment might be very narrow due to its limited sensory capacity. Co-operation exists when the actions of an agent achieve not only the agent's own goals but also the goals of agents other than itself.

• Reactive agents do not possess internal symbolic models of their environment. Instead, a reactive agent "reacts" to a stimulus or input that is governed by some state or event in its environment; this environmental event triggers a reaction or response from the agent.
The application field of agent computing includes economics, business (commercial databases), management, telecommunications (network management), and e-societies (as in e-commerce). Techniques from databases, statistics, and machine learning are widely used in agent applications. In the telecommunications field, agent technology is used to support efficient (in terms of both cost and performance) service provision to fixed and mobile users in competitive telecommunications environments.

Appendix C: Q-learning algorithm

With a known model M of the learner's transition probabilities given a state and an action, the following constraint equation holds for the Q-values, where a is an action, i and j are states, and R is the reward function:

    Q(a, i) = R(i) + Σ_j M^a_ij max_a′ Q(a′, j)

Using the temporal-difference learning approach, which does not require a model, we have the following update formula, applied after the learner moves from state i to state j, with learning rate α:

    Q(a, i) ← Q(a, i) + α (R(i) + max_a′ Q(a′, j) − Q(a, i))

Within the scope of a simple implementation, we will aim to provide an analysis of the time complexity, adaptability to dynamic environments, and scalability of Q-learning agents as compared to more primitive reinforcement learners.
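As a one-step worked example of the update (all values assumed purely for illustration): with α = 0.5, a current estimate Q(a, i) = 2.0, reward R(i) = 1.0, and max_a′ Q(a′, j) = 3.0, the new estimate is Q(a, i) ← 2.0 + 0.5 × (1.0 + 3.0 − 2.0) = 3.0, moving the estimate halfway from 2.0 toward the one-step target R(i) + max_a′ Q(a′, j) = 4.0.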
