CSE333 project initial spec: Learning agents



Learning agents spec for CSE333, 9/19/02

Participants: Huayan Gao (huayan.gao@uconn.edu), Thibaut Jahan (thj@ifrance.com), David Keil (DavidKeil@aol.com), Jian Lian (lianjian@yahoo)

1. Objectives and goals

This project will investigate current research on software learning agents and will implement a simple system that demonstrates such agents. Our goal is to build a distributed learning-agent system that interactively finds a policy for navigating a maze. Our implementation will be component-based, using UML and Java. It may also include an investigation of the scalability, robustness, and adaptability of the system.

Four candidate components of a distributed learning agent are perception, action, communication, and learning. Our ambition is to build a general architectural model of components for learning agents. We will implement the different "generic" components so that they can be assembled easily into an agent.

Learning in interaction with the agent's environment is the problem of reinforcement learning. We will therefore address reinforcement learning, specifically Q-learning.

2. Topic summary

Reinforcement learning

Reinforcement learning is rational policy search; it revives ideas associated with adaptive systems and related to optimal control and dynamic programming [sut-bar98]. Traditional machine-learning research assumed that learning was offline, i.e., separated from the application of the knowledge learned.
A policy maps agent states (shaped by percepts) to actions, defining an agent's actions as a series of responses to previously unknown, dynamically generated percepts. A rational agent is one that acts to maximize its expected utility, future reward, or performance measure. Because their actions may affect the environment, such agents must incorporate planning ahead into their computations. Because they obtain information from their environments only through percepts, they have incomplete knowledge of the environment and must conduct a trial-and-error search for a policy that obtains a high performance measure.

Q-learning

Q-learning is a variant of reinforcement learning in which the agent incrementally computes, from its interaction with its environment, a table of expected aggregate future rewards, with values discounted as they extend into the future. As it proceeds, the agent modifies the values in the table to refine its estimates. The Q function maps a state-action pair to the expected discounted future reward of taking that action in that state; the optimal action in a state is the one that maximizes Q. The evolving table of estimated Q values is called Q̂.

Intelligent agents

The term "agent" is used in two senses: (a) programs that act on behalf of humans to gather information [syc-pan96]; (b) entities that interact with their environments by retrieving percepts and generating actions. We will use the common restriction of (b) to rational agents, which act in such a way as to maximize future expected reward. In this project we consider only software agents, as opposed to autonomous robots, expert assistants, etc.
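The incremental table update described above can be sketched as follows. This is a minimal illustration of the standard tabular Q-learning rule, Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)); the class and method names and the alpha/gamma parameters are our own illustrative choices, not part of the project design.

```java
// Sketch of the tabular Q-learning update described above.
// The names (QTable, update) and the learning-rate/discount parameters
// are illustrative assumptions, not part of the project spec.
class QTable {
    private final double[][] q;   // q[state][action]: estimated Q-hat values
    private final double alpha;   // learning rate
    private final double gamma;   // discount factor for future rewards

    QTable(int numStates, int numActions, double alpha, double gamma) {
        this.q = new double[numStates][numActions];
        this.alpha = alpha;
        this.gamma = gamma;
    }

    // Refine the estimate for (state, action) after observing the reward
    // and the successor state:
    //   Q(s,a) += alpha * (reward + gamma * max_a' Q(s',a') - Q(s,a))
    void update(int state, int action, double reward, int nextState) {
        double best = q[nextState][0];
        for (double v : q[nextState]) {
            if (v > best) best = v;
        }
        q[state][action] += alpha * (reward + gamma * best - q[state][action]);
    }

    double get(int state, int action) {
        return q[state][action];
    }
}
```

With a learning rate below 1, repeated updates move the stored estimate gradually toward the observed target rather than overwriting it, which is what lets the agent refine its estimates over many trials.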
A central problem with any intelligent agent system is the amount of trust placed in the agent's ability to cope with the information provided by its sensors in its environment. This will be our emphasis in studying agents.

Agent applications span economics, business (commercial databases), management, telecommunications (network management), and e-societies (such as e-commerce). These areas combine techniques from databases, statistics, and machine learning, and agent applications are widely used in them. In the telecommunications field, agent technology is used to support efficient (in terms of both cost and performance) service provision to fixed and mobile users in competitive telecommunications environments.

3. Topic breakdown

An example problem

The concrete problem described below will help to define how the project breaks down into components. Both [mitchelt97] and [sut-bar98] present a simple example consisting of a maze for which the learner must find a policy, where the reward is determined by eventually reaching or not reaching a goal location in the maze. We propose to modify the original problem definition by permitting multiple distributed agents that communicate, either directly or via the environment. Either the multi-agent system as a whole, or each agent individually, will use Q-learning. The mazes can be made arbitrarily simple or complex to fit the speed, computational power, and effectiveness of the system we are able to develop in the time available.
A further interesting variant of the problem would be to allow the maze to change dynamically, either autonomously or in response to the learning agents. Robust reinforcement learners will adapt successfully to such changes.

Topic breakdown

1. Machine learning. Part of this project will consist of investigating the literature on machine learning, particularly reinforcement learning, and defining an approach based on this literature that is realistically implementable by the team.

2. Agent computing. We will survey the agent paradigm of computing, focusing on rational agents, as described in part 2 above. We will apply these concepts to the problem of machine learning, as is done in much reinforcement-learning research.

3. Distributed computing. In multiagent learning in the strong sense, a common learning goal is pursued; in the weaker sense, agents pursue separate goals but share information. Distributed agents may identify or execute distinct learning subtasks [weiss99]. We will survey the literature on distributed computing, looking for connections to learning agents, and will apply what we find in an attempt to build a distributed system of cooperating learning agents.

4. Implementation using Together, UML, and Java. The maze described above could be represented as a bitmap or two-dimensional array of squares. Starting with a simple example is useful in order to concentrate on good component design and successful implementation.
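The two-dimensional-array representation mentioned in item 4 can be sketched as below: a grid of squares, some blocked, with the reward signal nonzero only on the goal square, matching the maze problem described above. The class name, grid layout, and reward value are illustrative assumptions, not design decisions from the spec.

```java
// Sketch of the maze representation discussed above: a two-dimensional
// array of squares, with reward given only on reaching the goal square.
// Class name, grid contents, and reward value are illustrative assumptions.
class Maze {
    private final boolean[][] wall;   // wall[row][col]: true if the square is blocked
    private final int goalRow, goalCol;

    Maze(boolean[][] wall, int goalRow, int goalCol) {
        this.wall = wall;
        this.goalRow = goalRow;
        this.goalCol = goalCol;
    }

    // A square is enterable if it lies inside the grid and is not a wall.
    boolean isOpen(int row, int col) {
        return row >= 0 && row < wall.length
            && col >= 0 && col < wall[0].length
            && !wall[row][col];
    }

    // Reward signal: 1.0 on the goal square, 0.0 everywhere else,
    // as in the goal-reaching formulation of the maze problem.
    double reward(int row, int col) {
        return (row == goalRow && col == goalCol) ? 1.0 : 0.0;
    }
}
```

Keeping the environment this simple makes it easy to scale the maze up or down, in line with the stated aim of fitting the problem to the system the team can build in the available time.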
Division of labor

The team members will work together on each main aspect of the project; however, it is envisioned that leadership of the work in the respective areas will be distributed as follows:

• Machine learning: David
• Agent computing: Jian
• Distributed computing: Huayan
• Tools and implementation: Thibaut

Scope

We will consult sources to gain a survey knowledge of the fields of agent computing, distributed computing, and reinforcement learning, especially Q-learning. Our design and implementation effort will focus narrowly on an artifact of realistic, limited scope that solves a well-defined, arbitrarily simplifiable maze problem using Q-learning. We will relate the features of our implementation to recent research in the same narrow area and to broader concepts encountered in the sources.

4. Planned activities

Sept 19th – Oct 22nd: We will acquire the knowledge needed to design learning agents. Concurrently, we will start designing the learning agents using UML; design issues are critical for the future implementation. Simple Java prototypes of standalone agents that navigate a maze will be built, beginning with classes generated by Together.
Oct 12th – Oct 21st: Further research in sources on distributed learning agents. Drafting of a summary of source research. Design and implementation of communicating distributed agents with simple learning features. Preparation of the mid-term report.

Oct 22nd – mid November: Java implementation of the learning aspect of the agents and enhancement of communication efficiency. Each participant will code the components decided on and described in the design. Once these components are tested, they will be integrated and the resulting system tested.

End of November: Preparation of the final report and last adjustments to the learning agents.
APPENDICES

Appendix A: References

The list of references below will be reduced to those actually used in writing the paper and cited, or used in the implementation.

[aga-bek97] Arvin Agah and George A. Bekey. Phylogenetic and ontogenetic learning in a colony of interacting robots. Autonomous Robots 4, pp. 85-100, 1997.

[anders02] Chuck Anderson. Robust reinforcement learning with static and dynamic stability. http://www.cs.colostate.edu/~anderson/res/rl/nsf2002.pdf, 2002.

[durfee99] Edmund H. Durfee. Distributed problem solving and planning. In Gerhard Weiss, ed., Multiagent systems: A modern approach to distributed artificial intelligence. MIT Press, 1999, pp. 121ff.

[fra-gra96] Stan Franklin and Art Graesser. Is it an agent, or just a program?: A taxonomy for autonomous agents. Proceedings of the Third International Workshop on Agent Theories, Architectures, and Languages, 1996. www.msci.memphis.edu/~franklin/AgentProg.html

[huh-ste99] Michael N. Huhns and Larry M. Stephens. Multiagent systems and societies of agents. In Gerhard Weiss, ed., Multiagent systems: A modern approach to distributed artificial intelligence. MIT Press, 1999, pp. 79-120.

[lam-lyn90] Leslie Lamport and Nancy Lynch. Distributed computing: models and methods. In Jan van Leeuwen, ed., Handbook of Theoretical Computer Science, Vol. B, MIT Press, 1990, pp. 1158-1199.

[mitchelt97] Tom M. Mitchell. Machine learning. McGraw-Hill, 1997.

[mor-mii96] David E. Moriarty and Risto Miikkulainen. Efficient reinforcement learning through symbiotic evolution. Machine Learning 22, pp. 11-33, 1996.

[petrie96] Charles J. Petrie. Agent-based engineering, the web, and intelligence. IEEE Expert, December 1996.

[rus-nor95] Stuart Russell and Peter Norvig. Artificial intelligence: A modern approach. Prentice Hall, 1995.

[SAG97] Software Agents Group, MIT Media Laboratory. "CHI97 Software Agents Tutorial", http://pattie.www.media.mit.edu/people/pattie/CHI97/.

[sandho99] Tuomas W. Sandholm. Distributed rational decision making. In Gerhard Weiss, ed., Multiagent systems: A modern approach to distributed artificial intelligence. MIT Press, 1999, pp. 201-258.

[sen-wei99] Sandip Sen and Gerhard Weiss. Learning in multiagent systems. In Gerhard Weiss, ed., Multiagent systems: A modern approach to distributed artificial intelligence. MIT Press, 1999, pp. 259-298.
[shen94] Wei-Min Shen. Autonomous learning from the environment. Computer Science Press, 1994.

[sut-bar98] Richard S. Sutton and Andrew G. Barto. Reinforcement learning: An introduction. MIT Press, 1998.

[syc-pan96] Katia Sycara, Anandeep Pannu, Mike Williamson, Dajun Zeng, and Keith Decker. Distributed intelligent agents. IEEE Expert, December 1996, pp. 36-45.

[venners97] Bill Venners. The architecture of aglets. JavaWorld, April 1997.

[wal-wya94] Jim Waldo, Geoff Wyant, Ann Wollrath, and Sam Kendall. A note on distributed computing. Sun Microsystems technical report SMLI TR-94-29, November 1994.

[weiss99] Gerhard Weiss, ed. Multiagent systems: A modern approach to distributed artificial intelligence. MIT Press, 1999.

[wooldr99] Michael Wooldridge. Intelligent agents. In Gerhard Weiss, ed., Multiagent systems: A modern approach to distributed artificial intelligence. MIT Press, 1999, pp. 27-77.

Appendix B: Definition and classification of agents

Definition of agents

Researchers in agent computing have offered a variety of definitions. Characterized by its general features, an agent should be autonomous, goal-oriented, collaborative, flexible, self-starting, temporally continuous, adaptive, mobile, and capable of learning, and may exhibit character. According to a definition from IBM, "Intelligent agents are software entities that carry out some set of operations on behalf of a user or another program with some degree of independence or autonomy, and in so doing, employ some knowledge or representation of the user's goals or desires." From Stan Franklin: "An autonomous agent is a system situated within and a part of an environment that senses that environment and acts on it, over time, in pursuit of its own agenda and so as to effect what it senses in the future." These features give rise to a wide set of agent types.
Interface agents: Computer programs that employ artificial intelligence techniques to provide active assistance to a user with computer-based tasks.

Mobile agents: Software processes capable of moving around networks such as the World Wide Web, interacting with other hosts, gathering information on behalf of their owners, and returning with any requested information they found.

Co-operative agents: A co-operative agent can communicate with, and react to, its environment. An agent's view of its environment may be very narrow because of its limited sensors. Co-operation exists when the actions of an agent achieve not only the agent's own goals but also the goals of other agents.

Reactive agents: Reactive agents are a special type of agent that do not possess internal symbolic models of their environment. Instead, a reactive agent "reacts" to a stimulus or input that is governed by some state or event in its environment; this environmental event triggers a reaction or response from the agent.

Appendix C: Agent development and implementation

JADE (Java Agent DEvelopment framework) is a software framework fully implemented in the Java language. It simplifies the implementation of multi-agent systems through a middleware layer and a set of tools that support the debugging and deployment phases. The agent platform can be distributed across machines (which need not even share the same OS), and the configuration can be controlled via a remote GUI. The configuration can even be changed at run-time by moving agents from one machine to another, as and when required. The minimal system requirement is version 1.2 of Java (the run-time environment or the JDK).

Appendix D: Pros and cons of smart/learning agents and applications

The pros of learning agents are:
1) An agent adapts to environment changes.
2) An agent can be customized.
3) An agent has manageable flexibility.

The cons are:
1) Agents need time to learn or relearn.
2) Agents can only automate preexisting patterns.
3) Agents have no common sense.

Appendix E: Exploitation and exploration in learning

For agents that use reinforcement learning, unlike systems that learn from training examples, the issue arises of exploitation of obtained knowledge versus exploration to obtain new information. Exploration gains no immediate reward and is only useful if it can improve future utility. An exploitation-only policy, on the other hand, would mean sacrificing any learning that might improve future expected reward in favor of immediate reward.
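One common way to balance the trade-off described above is an epsilon-greedy rule: with a small probability the agent explores by acting at random, and otherwise it exploits its current Q estimates. The sketch below is a minimal illustration of that rule; the class and method names and the epsilon parameter are our own assumptions, not part of the spec.

```java
import java.util.Random;

// Sketch of an epsilon-greedy action selector, one common way to balance
// the exploitation/exploration trade-off described above. The names and
// the epsilon parameter are illustrative assumptions.
class EpsilonGreedy {
    private final double epsilon;   // probability of exploring (random action)
    private final Random random;

    EpsilonGreedy(double epsilon, Random random) {
        this.epsilon = epsilon;
        this.random = random;
    }

    // With probability epsilon, pick a random action (exploration);
    // otherwise pick the action with the highest estimated Q value
    // (exploitation of obtained knowledge).
    int choose(double[] qValues) {
        if (random.nextDouble() < epsilon) {
            return random.nextInt(qValues.length);
        }
        int best = 0;
        for (int a = 1; a < qValues.length; a++) {
            if (qValues[a] > qValues[best]) best = a;
        }
        return best;
    }
}
```

Setting epsilon to 0 gives the exploitation-only policy criticized above, while epsilon of 1 never exploits; values in between trade immediate reward for information that may improve future utility.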
Appendix F: Risks

We will seek to avoid several possible obstacles, including:

• The construction of "toy worlds," i.e., problem specifications tailored to the envisioned solution;
• Complexity of design without performance gain;
• Overfitting the generalizable components to the specific problem at hand, putting reusability at risk;
• Premature commitment to a specific solution (Q-learning) as opposed to exploration of various alternatives.

Reference to get title, author: [xx99] http://www.cs.helsinki.fi/research/hallinto/TOIMINTARAPORTIT/1999/report99/node2.html.