1. Introduction to Machine Learning
Lecture 23
Learning Classifier Systems
Albert Orriols i Puig
http://www.albertorriols.net
aorriols@salle.url.edu
Artificial Intelligence – Machine Learning
Enginyeria i Arquitectura La Salle
Universitat Ramon Llull
2. Recap of Lectures 21-22
Value functions
Vπ(s): long-term reward estimation from state s following policy π
Qπ(s,a): long-term reward estimation from state s executing action a and then following policy π
The long-term reward is a recency-weighted average of the received rewards
Trajectory: st →(at) rt+1, st+1 →(at+1) rt+2, st+2 →(at+2) rt+3, st+3 …
3. Recap of Lectures 21-22
Q-learning
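The tabular Q-learning update recapped here can be sketched as follows; the learning rate `alpha` and discount factor `gamma` are illustrative choices, not values from the lecture.

```python
# Minimal tabular Q-learning update:
# Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
from collections import defaultdict

def q_update(Q, s, a, reward, s_next, actions, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update for the transition (s, a, r, s')."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)  # greedy bootstrap
    Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]

Q = defaultdict(float)                      # unseen entries default to 0
q_update(Q, s=0, a=1, reward=1.0, s_next=1, actions=[0, 1])
```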
4. Today’s Agenda
The Origins of LCSs
Michigan-style LCSs
Pittsburgh-style LCSs
Michigan-style LCSs
5. Original Idea of LCS
Holland’s vision: Cognitive Systems
Create true artificial intelligence itself
True intelligence requires adaptive behavior in the face of changing circumstances (Holland & Reitman, 1978)
Holland’s vision, going back to the late 50s and early 60s, of roving bands of computer programs
Holland’s notion of genetic search as program searching (1962):
“The free generation procedure… requires the generators (and combinations of generators) to ‘shift’ and ‘connect’ at random in the computer… two or more generators occupying adjacent modules (‘in contact’) may become connected. Such connected sets of generators are to shift as a unit.”
From stimulus-response to internal states and modifiable detectors and effectors
6. First LCS Implementation
CS-1 (Holland & Reitman, 1978)
Post production system
General memory containing classifiers
Process:
Code the situation and find in memory the actions that are appropriate to both CS-1’s goal and the situation
Store in memory the consequences of these actions (learning)
Generate new good productions (classifiers) to endure
Population of classifiers → current system knowledge
Performance component → short-term behavior of the system
Rule discovery component → get new promising rules
7. Meanwhile, at the University of Pittsburgh
Smith’s interpretation of Holland’s GA vision
Smith’s notion of learning as adaptive search (1980, 1983)
LS-1: “Learns a set of heuristics, represented as production system programs, to govern the application of a set of operators in performing a particular task”
Great success! LS-1 took Waterman’s poker player to the cleaners (not bluffing)
8. Two Models
And here, two ways started: Michigan vs. Pittsburgh LCSs
Pittsburgh-style LCSs:
Straight GA
Individual = set of rules
Solution: best individual
Usually offline systems
Michigan-style LCSs:
Cognitive system
Individual = rule
Solution: all the population
Apportionment of credit / reinforcement learning
We focus on Michigan-style LCSs
9. Michigan-style LCSs
General schema: the Learning Classifier System holds a population of classifiers (Classifier 1, Classifier 2, …, Classifier n) and a Genetic Algorithm. It interacts with the Environment online: it perceives the sensorial state, executes an action, and receives a reward.
Any representation: production rules, genetic programs, perceptrons, SVMs…
Online rule evaluator: XCS uses Q-learning (Sutton & Barto, 1998) with the Widrow-Hoff delta rule
Rule evolution: typically a GA (Holland, 1975; Goldberg, 1989) applied on the population
10. Knowledge Representation
The knowledge representation consists of a population of classifiers, usually independent of each other
Each classifier has:
Condition part C
Action part A
Prediction part P
Interpreted as: if condition C is satisfied and action A is executed, then P is expected to be true
Solution for a new problem:
Get the classifiers that match the sensorial state
Decide which action should be used among the actions of the selected classifiers
11. Condition Structures
Condition structure depends on the types of attributes
Binary attributes:
Ternary encoding {0, 1, #}
If v1 is ‘0’ and v2 is ‘1’ and v3 is ‘#’ … and vn is ‘0’ then actioni
Continuous attributes:
Interval-based encoding
If v1 in [l1,u1] and v2 in [l2,u2] … and vn in [ln,un] then actioni
Hyperellipsoids
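The two matching rules above can be sketched as small predicates; the function names and example values are illustrative.

```python
# Matching for the two condition encodings on this slide.

def matches_ternary(condition, state):
    """Ternary condition over {'0','1','#'}: '#' (don't care) matches any bit."""
    return all(c == '#' or c == s for c, s in zip(condition, state))

def matches_interval(condition, state):
    """Interval condition: one [l, u] pair per continuous attribute."""
    return all(l <= v <= u for (l, u), v in zip(condition, state))

matches_ternary("01#", "011")                            # True: '#' matches '1'
matches_interval([(0.0, 0.5), (0.2, 0.9)], [0.3, 0.4])   # True: both inside
```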
12. Condition Structures
Condition structure depends on the types of attributes
Many other representations:
Partial matching (Booker, 1985)
Default hierarchies (Holland et al., 1986)
Fuzzy conditions (Bonarini, 2000; Valenzuela-Rendón, 1991; Casillas et al., 2008; Orriols et al., 2009)
Neural-network-based encodings (Bull & O’Hara, 2002)
GP tree encodings with S-expressions (Lanzi, 1999)
13. Prediction
Prediction can be:
Scalar number
Line
Polynomial
Neural network
…
We will consider the initial idea: prediction is a scalar number
14. Learning Interaction in XCS
Each classifier in the population [P] stores its condition C, action A, and the parameters P (prediction), ε (error), F (fitness), num (numerosity), as (action set size estimate), ts (time stamp), and exp (experience).
Learning cycle:
1. The environment presents a problem instance
2. Match set generation: [M] collects the classifiers in [P] whose conditions match the instance
3. The prediction array computes a value for each action advocated in [M]
4. The selected action is executed in the environment, which returns a reward
5. The action set [A] collects the classifiers in [M] advocating the selected action
6. Classifier parameters are updated (Widrow-Hoff rule) in [A], or in the previous action set [A]-1 for delayed reward, with fitness sharing in the niche
7. The genetic algorithm applies selection, reproduction, and mutation; deletion keeps the population size bounded
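The first steps of the cycle (match set, prediction array, action selection) can be sketched as follows. The `Classifier` class keeps only the C, A, P, and F fields from the slide, and the greedy selection at the end is an illustrative choice.

```python
# Sketch of match-set generation and the fitness-weighted prediction array.
from dataclasses import dataclass

@dataclass
class Classifier:
    condition: str     # ternary string over {'0', '1', '#'}
    action: int        # A
    prediction: float  # P
    fitness: float     # F

def matches(condition, state):
    return all(c == '#' or c == s for c, s in zip(condition, state))

def match_set(population, state):
    """[M]: classifiers in [P] whose conditions match the instance."""
    return [cl for cl in population if matches(cl.condition, state)]

def prediction_array(M):
    """Fitness-weighted average prediction per action advocated in [M]."""
    pa = {}
    for a in {cl.action for cl in M}:
        advocates = [cl for cl in M if cl.action == a]
        total_f = sum(cl.fitness for cl in advocates)
        pa[a] = sum(cl.prediction * cl.fitness for cl in advocates) / total_f
    return pa

pop = [Classifier("0#1", 0, 40.0, 0.5), Classifier("0#1", 1, 90.0, 0.8),
       Classifier("1##", 0, 10.0, 0.9)]
M = match_set(pop, "011")     # only the two "0#1" classifiers match
pa = prediction_array(M)
best = max(pa, key=pa.get)    # greedy (exploit) action selection
```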
15. Estimate Classifier Prediction
Three key parameters:
Prediction: what I will get if I select the action
Error: the error on that prediction
Fitness: how good my classifier is
Does it sound familiar? Q-learning!
These parameters are estimated online
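A simplified sketch of how these estimates are maintained online with the Widrow-Hoff rule. The constants `beta`, `eps0`, `alpha`, and `nu` are illustrative XCS-style settings, and the accuracy computation is a simplification: full XCS updates fitness toward the *relative* accuracy in the action set.

```python
# Widrow-Hoff style online estimates for one classifier (simplified).

def update_estimates(prediction, error, reward, beta=0.2,
                     eps0=10.0, alpha=0.1, nu=5.0):
    prediction += beta * (reward - prediction)          # P <- P + beta(R - P)
    error += beta * (abs(reward - prediction) - error)  # error tracks |R - P|
    # Accuracy: maximal while the error stays below the threshold eps0,
    # decaying as a power law above it (basis for the fitness update).
    accuracy = 1.0 if error < eps0 else alpha * (error / eps0) ** -nu
    return prediction, error, accuracy
```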
16. Evolutionary Search
GA applied from time to time to [A]:
Select two parents
Cross them
Mutate them
Introduce the two new offspring into the population
If the population is full, remove poor classifiers
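The steps above can be sketched as one GA invocation on the action set. Roulette-wheel selection, one-point crossover, the mutation rate `mu`, and the population limit `N` are illustrative assumptions; XCS additionally constrains mutation and uses numerosity-aware deletion.

```python
# Sketch of one GA step on the action set [A].
import random
from dataclasses import dataclass

@dataclass
class Classifier:
    condition: str     # ternary string over {'0', '1', '#'}
    action: int
    prediction: float
    fitness: float

def select(action_set):
    """Fitness-proportionate (roulette-wheel) parent selection."""
    return random.choices(action_set,
                          weights=[cl.fitness for cl in action_set], k=1)[0]

def crossover(c1, c2):
    """One-point crossover of two ternary condition strings."""
    point = random.randrange(1, len(c1))
    return c1[:point] + c2[point:], c2[:point] + c1[point:]

def mutate(cond, mu=0.04):
    """Each symbol is replaced by a random one in {0,1,#} with prob. mu."""
    return ''.join(random.choice('01#') if random.random() < mu else c
                   for c in cond)

def ga_step(population, action_set, N=400):
    p1, p2 = select(action_set), select(action_set)
    child1, child2 = crossover(p1.condition, p2.condition)
    for cond, parent in ((child1, p1), (child2, p2)):
        population.append(Classifier(mutate(cond), parent.action,
                                     parent.prediction, parent.fitness))
    while len(population) > N:  # population full: remove poor classifiers
        population.remove(min(population, key=lambda cl: cl.fitness))

random.seed(42)  # reproducibility of this sketch
pop = [Classifier("0#1", 1, 90.0, 0.8), Classifier("01#", 1, 70.0, 0.6)]
ga_step(pop, list(pop), N=400)  # adds two mutated offspring
```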
17. LCS Learning Pressures
Parameter updates identify the most accurate classifiers
Different pressures caused by the GA:
Set pressure toward generality
Fitness pressure toward highly fit classifiers
Mutation pressure toward diversification
Subsumption pressure toward the deletion of accurate, over-specialized classifiers
18. Next Class
Applications of LCSs