A Retrospective Look at Classifier System Research
Lashon B. Booker
The MITRE Corporation
Lashon Booker looks back at the history of LCS research and how it connects to current and future efforts.

Transcript of "A Retrospective Look at Classifier System Research"

  1. A Retrospective Look at Classifier System Research
     Lashon B. Booker, The MITRE Corporation
     © 2006 The MITRE Corporation. All rights reserved.
  2. Early Motivations for Learning Classifier System (LCS) Research
     • Design symbolic problem solvers that avoid brittleness in realistic (uncertain and continually varying) domains involving
       – On-line, adaptive control of behaviors: representations and procedures must adjust without unnecessarily disrupting existing capabilities
       – Discovering relevant categories in a complex and unlabeled stream of input: inputs must be incrementally grouped together into plausible classes
     • This is especially difficult when behavior requires more knowledge representation and processing capability than is available with simple empirical associations between inputs and outputs
  3. Requirements for Non-Brittle Rule-Based Behavior
     • Need to identify and take advantage of the exploitable regularities in the environment
     • Generalizations must be selective, pragmatic, and subject to exceptions
     • Learning must be incremental and closely coupled with performance and with unfolding reality
     • Rules must be treated as tentative hypotheses (not logical assertions) subject to testing and confirmation
       – Hypothesis "strength" is derived from experience-based predictions of performance
       – Strength is used to determine rule fitness and infer plausibility (see the sketch after this slide)
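To make the strength notion concrete, here is a minimal sketch of an experience-based strength update, written in Python for illustration. The Rule fields, the initial strength of 10.0, and the learning rate beta are assumptions for this example, not values from the talk; the update simply moves a rule's strength toward the payoff it helped earn.

from dataclasses import dataclass

@dataclass
class Rule:
    condition: str          # ternary condition string, e.g. "1#0#"
    action: int
    strength: float = 10.0  # experience-based prediction of payoff

def update_strength(rule: Rule, payoff: float, beta: float = 0.1) -> None:
    # Move the rule's strength toward the payoff it just helped earn.
    # The resulting estimate can double as the rule's fitness for the GA.
    rule.strength += beta * (payoff - rule.strength)

r = Rule(condition="1#0#", action=1)
for payoff in (0.0, 100.0, 100.0, 0.0, 100.0):
    update_strength(r, payoff)
print(round(r.strength, 2))   # strength drifts toward the observed payoff level

Under an update of this kind, strength tracks a running prediction of performance, which is what lets it serve both as a plausibility measure and as a fitness signal.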
  4. Observations about early research
     • The Holland and Reitman collaboration placed a strong emphasis on cognition and characterized the problems of interest
     • Viewed classifier systems as symbolic problem solvers that avoid brittle behavior (an alternative to expert systems)
       – Treat the rule set as a model and rules as parts in a context
       – Evaluation of parts is context dependent (i.e., aspects are non-stationary)
     • Learning emphasized policy search and value estimation
       – Rules are policy elements along with performance estimators
       – Adjust policy via natural selection among rule types
       – The Pitt approach preserved this idea, using the GA for direct policy search
     • Included provisions for motivation, affect and introspection
     • These ideas provided the foundation for a comprehensive theory of induction (rule clusters, distributed representations, associations, spreading activation, etc.)
  5. Influence of reinforcement learning
     [Figure: agent-environment loop: the environment provides state input to the learning agent, which emits an action and receives feedback. Solution strategies: search the space of possible behaviors; estimate the utility of taking actions in world states.]
     • Reinforcement learning problems are faced by agents that must learn action sequences from trial-and-error
       – The framework provides attractive formalisms based on estimating value functions (with key contributions from Sutton and Barto)
       – Algorithms provide useful benchmarks for comparisons
     • The emphasis on value functions has had a strong influence on LCS research
       – The primary niche is learning compact scalar value function representations for off-policy temporal difference methods
       – But the RL community has good alternatives
     • It is not clear if we are learning the best generalizations, or giving sufficient emphasis to policy improvement
  6. Value-based generalizations aren't often intuitive
     [Figure: Grefenstette's 9x32 abstract state space, with rewards of 0 near the start rising through 50, 75, 125, 250, and 500 to 1000 in the final row.]
     • There are many obvious, intuitive solution strategies
       – E.g., move left or right to the column with the highest reward, then go straight
     • Classifier systems tend to learn piecemeal strategies rather than coherent ones
       – Many narrowly focused general rules are needed to get the overall solution
       – Generalizations correspond to symmetries in the reward distribution, e.g., (Row = 111) (Column = #011#) → RIGHT, not the key attribute-based concepts (see the matching sketch after this slide)
       – This distinction has been irrelevant in most classifier system test problems (e.g., multiplexor and Woods problems)
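To make the ternary condition notation in the example above concrete, here is a minimal matching sketch in Python. It assumes the usual LCS convention that '#' is a wildcard; the five-bit column codes are made-up encodings for illustration, not the actual state coding of Grefenstette's problem.

def matches(condition: str, state: str) -> bool:
    # Ternary match: '#' in the condition accepts either bit value.
    return len(condition) == len(state) and all(
        c == '#' or c == s for c, s in zip(condition, state)
    )

# A generalization that mirrors a symmetry in the reward layout:
# row condition "111", column condition "#011#", action RIGHT.
col_cond = "#011#"
for col in ("00110", "00111", "10110", "10111"):
    print(col, matches(col_cond, col))   # True: one rule covers four columns
print(matches(col_cond, "01100"))        # False: outside the symmetry

A single condition like #011# lumps together whatever columns share its fixed bits, which is why the learned generalizations tend to follow symmetries of the reward distribution rather than the attribute-based concepts a person would choose.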
  7. Off-policy Methods Learn Different Behaviors
     • Since Q-learning is an off-policy method (i.e., the behavior policy may differ from the estimation policy), its learned solution does not take the negative consequences of exploration into account
     • Sarsa (i.e., the bucket brigade) is an on-policy method, so its solution accounts for the consequences of exploration (see the update sketch after this slide)
     • In real problems where on-line errors are costly, this distinction is important
     • This also has architectural implications (e.g., how to approximate the value function)
     • Bottom line: we need to identify and build on the strengths of the LCS approach. The key may be in specifying a set of organizing principles that go beyond implementation diagrams
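For reference, the two update rules differ only in how the next action enters the target; everything else here (the tabular Q, alpha, gamma, and the toy states) is a generic placeholder for illustration, not something specified in the talk.

from collections import defaultdict

Q = defaultdict(float)        # Q[(state, action)] -> estimated return
alpha, gamma = 0.1, 0.95      # learning rate, discount factor

def sarsa_update(s, a, r, s_next, a_next):
    # On-policy: the target uses the action actually taken next, so the
    # cost of exploratory moves is reflected in the learned values.
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def q_learning_update(s, a, r, s_next, actions):
    # Off-policy: the target uses the greedy action in the next state,
    # regardless of what the (exploring) behavior policy actually does.
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

actions = ("LEFT", "RIGHT")
sarsa_update("s0", "RIGHT", 1.0, "s1", "LEFT")
q_learning_update("s0", "RIGHT", 1.0, "s1", actions)

Because Sarsa's target includes the exploratory action that was actually taken, its learned policy steers away from states where exploration is costly; Q-learning's does not, which is the distinction the slide highlights.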
  8. Soar Architecture of Intelligent Rule-based Behavior
     [Figure: layered architecture diagram: I/O at the base, with levels for reaction, deliberation, learning, and reflection, running from faster / lower intelligence to slower / higher intelligence.]
     • Derived by Newell and his students (~1980), also as a response to the expert system phenomenon
     • Based on a theory of problem solving (i.e., problem spaces), along with a companion view of learning (i.e., chunking)
     • The theory was operationalized as an architecture that has served that community well
  9. What kind of architecture makes sense for classifier systems?
     [Figure: actor-critic / generalized policy iteration diagram: the critic performs policy evaluation (value learning of V and Q), the actor performs policy improvement (greedification of the policy), and iterating the two drives the policy and value estimates toward their optimal values.]
     • The key role of policy improvement suggests that an actor-critic structure may be a good start
     • The idea is to intermix value iteration and policy improvement continually (state by state, action by action, sample by sample); see the sketch after this slide
     • Is there an organizing principle that extends this concept to cover many forms of induction at different scales (including perception, reasoning, and action)?
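As a rough sketch of the actor-critic idea, here is a minimal tabular version in Python: a one-step TD critic and a softmax actor over action preferences. All of the names, constants, and the preference-update form are illustrative assumptions, not part of the original talk.

import math
import random
from collections import defaultdict

V = defaultdict(float)        # critic: state-value estimates
prefs = defaultdict(float)    # actor: action preferences per (state, action)
alpha_v, alpha_p, gamma = 0.1, 0.05, 0.95

def choose_action(state, actions):
    # Actor's policy: softmax over its current action preferences.
    weights = [math.exp(prefs[(state, a)]) for a in actions]
    return random.choices(actions, weights=weights)[0]

def actor_critic_step(s, a, r, s_next):
    # Critic: one-step TD policy evaluation.
    td_error = r + gamma * V[s_next] - V[s]
    V[s] += alpha_v * td_error
    # Actor: incremental policy improvement, nudging the taken action's
    # preference in the direction the critic's TD error suggests.
    prefs[(s, a)] += alpha_p * td_error

actions = ("LEFT", "RIGHT")
a = choose_action("s0", actions)
actor_critic_step("s0", a, 1.0, "s1")

The evaluation and improvement steps are interleaved on every sample, which is the state-by-state, action-by-action intermixing the slide describes.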
  10. DARPA/IPTO Focus on Cognitive Systems
      • DARPA views a cognitive system as one that
        – can reason, using substantial amounts of appropriately represented knowledge
        – can learn from its experience so that it performs better tomorrow than it did today
        – can explain itself and be told what to do
        – can be aware of its own capabilities and reflect on its own behavior
        – can respond robustly to surprise
      • Learning is ubiquitous; different forms operate at different times and places
      • What niche is the LCS community best suited to fill?
  11. Some Open Problems for Reinforcement Learning (Sutton) - and Classifier Systems
      • Incomplete state information
      • Exploration
      • Structured states and actions
      • Incorporating prior knowledge
      • Using teachers
      • Theory of RL with function approximators
      • Modular and hierarchical architectures
      • Integration with other problem-solving and planning methods