1. Introduction to Machine Learning
Lecture 23
Learning Classifier Systems
Albert Orriols i Puig
http://www.albertorriols.net
aorriols@salle.url.edu
Artificial Intelligence – Machine Learning
Enginyeria i Arquitectura La Salle
Universitat Ramon Llull
2. Recap of Lectures 21-22
Value functions
Vπ(s): long-term reward estimation from state s following policy π
Qπ(s,a): long-term reward estimation from state s executing action a and then following policy π
The long-term reward is a recency-weighted average of the received rewards
Trajectory: st →(at) rt+1, st+1 →(at+1) rt+2, st+2 →(at+2) rt+3, st+3 …
3. Recap of Lectures 21-22
Q-learning
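The tabular Q-learning update recapped here can be sketched as follows; the learning rate `alpha` and discount factor `gamma` are illustrative choices, not values from the lecture.

```python
# Minimal tabular Q-learning update:
# Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
from collections import defaultdict

def q_update(Q, s, a, reward, s_next, actions, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update for the transition (s, a, r, s')."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)  # greedy bootstrap
    Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]

Q = defaultdict(float)                      # unseen entries default to 0
q_update(Q, s=0, a=1, reward=1.0, s_next=1, actions=[0, 1])
```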
4. Today’s Agenda
The Origins of LCSs
Michigan-style LCSs
Pittsburgh-style LCSs
Michigan-style LCSs
5. Original Idea of LCS
Holland’s vision: Cognitive Systems
Create true artificial intelligence itself
True intelligence requires adaptive behavior in the face of changing circumstances (Holland & Reitman, 1978)
Holland’s vision, going back to the late 50s and early 60s, of roving bands of computer programs
Holland’s notion of genetic search as program searching (1962):
“The free generation procedure… requires the generators (and combinations of generators) to ‘shift’ and ‘connect’ at random in the computer… two or more generators occupying adjacent modules (‘in contact’) may become connected. Such connected sets of generators are to shift as a unit.”
From stimulus-response to internal states and modifiable detectors and effectors
6. First LCS Implementation
CS-1 (Holland & Reitman, 1978)
Post production system
General memory containing classifiers
Process:
Code the situation and find in memory the actions that are appropriate to both CS-1’s goal and the situation
Store in memory the consequences of these actions (learning)
Generate new good productions (classifiers) to endure
Population of classifiers → current system knowledge
Performance component → short-term behavior of the system
Rule discovery component → get new promising rules
7. Meanwhile, at the University of Pittsburgh
Smith’s interpretation of Holland’s GA vision
Smith’s notion of learning as adaptive search (1980, 1983)
LS-1: “Learns a set of heuristics, represented as production system programs, to govern the application of a set of operators in performing a particular task”
Great success! LS-1 took Waterman’s poker player to the cleaners (not bluffing)
8. Two Models
And here, two ways started: Michigan vs. Pittsburgh LCSs
Pittsburgh-style LCSs:
Straight GA
Individual = set of rules
Solution: best individual
Usually offline systems
Michigan-style LCSs:
Cognitive system
Individual = rule
Solution: all the population
Apportionment of credit / reinforcement learning
We focus on Michigan-style LCSs
9. Michigan-style LCSs
General schema: the Learning Classifier System holds a population of classifiers (Classifier 1, Classifier 2, …, Classifier n) and a Genetic Algorithm. It interacts with the Environment online: it perceives the sensorial state, executes an action, and receives a reward.
Any representation: production rules, genetic programs, perceptrons, SVMs…
Online rule evaluator: XCS uses Q-learning (Sutton & Barto, 1998) with the Widrow-Hoff delta rule
Rule evolution: typically a GA (Holland, 1975; Goldberg, 1989) applied on the population
10. Knowledge Representation
The knowledge representation consists of a population of classifiers, usually independent of each other
Each classifier has:
Condition part C
Action part A
Prediction part P
Interpreted as: if condition C is satisfied and action A is executed, then P is expected to be true
Solution for a new problem:
Get the classifiers that match the sensorial state
Decide which action should be used among the actions of the selected classifiers
11. Condition Structures
Condition structure depends on the types of attributes
Binary attributes:
Ternary encoding {0, 1, #}
If v1 is ‘0’ and v2 is ‘1’ and v3 is ‘#’ … and vn is ‘0’ then actioni
Continuous attributes:
Interval-based encoding
If v1 in [l1,u1] and v2 in [l2,u2] … and vn in [ln,un] then actioni
Hyperellipsoids
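The two matching rules above can be sketched as small predicates; the function names and example values are illustrative.

```python
# Matching for the two condition encodings on this slide.

def matches_ternary(condition, state):
    """Ternary condition over {'0','1','#'}: '#' (don't care) matches any bit."""
    return all(c == '#' or c == s for c, s in zip(condition, state))

def matches_interval(condition, state):
    """Interval condition: one [l, u] pair per continuous attribute."""
    return all(l <= v <= u for (l, u), v in zip(condition, state))

matches_ternary("01#", "011")                            # True: '#' matches '1'
matches_interval([(0.0, 0.5), (0.2, 0.9)], [0.3, 0.4])   # True: both inside
```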
12. Condition Structures
Condition structure depends on the types of attributes
Many other representations:
Partial matching (Booker, 1985)
Default hierarchies (Holland et al., 1986)
Fuzzy conditions (Bonarini, 2000; Valenzuela-Rendón, 1991; Casillas et al., 2008; Orriols et al., 2009)
Neural-network-based encodings (Bull & O’Hara, 2002)
GP tree encodings with S-expressions (Lanzi, 1999)
13. Prediction
Prediction can be:
Scalar number
Line
Polynomial
Neural network
…
We will consider the initial idea: prediction is a scalar number
14. Learning Interaction in XCS
Each classifier in the population [P] stores its condition C, action A, and the parameters P (prediction), ε (error), F (fitness), num (numerosity), as (action set size estimate), ts (time stamp), and exp (experience).
Learning cycle:
1. The environment presents a problem instance
2. Match set generation: [M] collects the classifiers in [P] whose conditions match the instance
3. The prediction array computes a value for each action advocated in [M]
4. The selected action is executed in the environment, which returns a reward
5. The action set [A] collects the classifiers in [M] advocating the selected action
6. Classifier parameters are updated (Widrow-Hoff rule) in [A], or in the previous action set [A]-1 for delayed reward, with fitness sharing in the niche
7. The genetic algorithm applies selection, reproduction, and mutation; deletion keeps the population size bounded
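The first steps of the cycle (match set, prediction array, action selection) can be sketched as follows. The `Classifier` class keeps only the C, A, P, and F fields from the slide, and the greedy selection at the end is an illustrative choice.

```python
# Sketch of match-set generation and the fitness-weighted prediction array.
from dataclasses import dataclass

@dataclass
class Classifier:
    condition: str     # ternary string over {'0', '1', '#'}
    action: int        # A
    prediction: float  # P
    fitness: float     # F

def matches(condition, state):
    return all(c == '#' or c == s for c, s in zip(condition, state))

def match_set(population, state):
    """[M]: classifiers in [P] whose conditions match the instance."""
    return [cl for cl in population if matches(cl.condition, state)]

def prediction_array(M):
    """Fitness-weighted average prediction per action advocated in [M]."""
    pa = {}
    for a in {cl.action for cl in M}:
        advocates = [cl for cl in M if cl.action == a]
        total_f = sum(cl.fitness for cl in advocates)
        pa[a] = sum(cl.prediction * cl.fitness for cl in advocates) / total_f
    return pa

pop = [Classifier("0#1", 0, 40.0, 0.5), Classifier("0#1", 1, 90.0, 0.8),
       Classifier("1##", 0, 10.0, 0.9)]
M = match_set(pop, "011")     # only the two "0#1" classifiers match
pa = prediction_array(M)
best = max(pa, key=pa.get)    # greedy (exploit) action selection
```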
15. Estimate Classifier Prediction
Three key parameters:
Prediction: what I will get if I select the action
Error: the error on that prediction
Fitness: how good my classifier is
Does it sound familiar? Q-learning!
These parameters are estimated online
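A simplified sketch of how these estimates are maintained online with the Widrow-Hoff rule. The constants `beta`, `eps0`, `alpha`, and `nu` are illustrative XCS-style settings, and the accuracy computation is a simplification: full XCS updates fitness toward the *relative* accuracy in the action set.

```python
# Widrow-Hoff style online estimates for one classifier (simplified).

def update_estimates(prediction, error, reward, beta=0.2,
                     eps0=10.0, alpha=0.1, nu=5.0):
    prediction += beta * (reward - prediction)          # P <- P + beta(R - P)
    error += beta * (abs(reward - prediction) - error)  # error tracks |R - P|
    # Accuracy: maximal while the error stays below the threshold eps0,
    # decaying as a power law above it (basis for the fitness update).
    accuracy = 1.0 if error < eps0 else alpha * (error / eps0) ** -nu
    return prediction, error, accuracy
```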
16. Evolutionary Search
GA applied from time to time to [A]:
Select two parents
Cross them
Mutate them
Introduce the two new offspring into the population
If the population is full, remove poor classifiers
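The steps above can be sketched as one GA invocation on the action set. Roulette-wheel selection, one-point crossover, the mutation rate `mu`, and the population limit `N` are illustrative assumptions; XCS additionally constrains mutation and uses numerosity-aware deletion.

```python
# Sketch of one GA step on the action set [A].
import random
from dataclasses import dataclass

@dataclass
class Classifier:
    condition: str     # ternary string over {'0', '1', '#'}
    action: int
    prediction: float
    fitness: float

def select(action_set):
    """Fitness-proportionate (roulette-wheel) parent selection."""
    return random.choices(action_set,
                          weights=[cl.fitness for cl in action_set], k=1)[0]

def crossover(c1, c2):
    """One-point crossover of two ternary condition strings."""
    point = random.randrange(1, len(c1))
    return c1[:point] + c2[point:], c2[:point] + c1[point:]

def mutate(cond, mu=0.04):
    """Each symbol is replaced by a random one in {0,1,#} with prob. mu."""
    return ''.join(random.choice('01#') if random.random() < mu else c
                   for c in cond)

def ga_step(population, action_set, N=400):
    p1, p2 = select(action_set), select(action_set)
    child1, child2 = crossover(p1.condition, p2.condition)
    for cond, parent in ((child1, p1), (child2, p2)):
        population.append(Classifier(mutate(cond), parent.action,
                                     parent.prediction, parent.fitness))
    while len(population) > N:  # population full: remove poor classifiers
        population.remove(min(population, key=lambda cl: cl.fitness))

random.seed(42)  # reproducibility of this sketch
pop = [Classifier("0#1", 1, 90.0, 0.8), Classifier("01#", 1, 70.0, 0.6)]
ga_step(pop, list(pop), N=400)  # adds two mutated offspring
```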
17. LCS Learning Pressures
Parameter updates identify the most accurate classifiers
Different pressures caused by the GA:
Set pressure toward generality
Fitness pressure toward highly fit classifiers
Mutation pressure toward diversification
Subsumption pressure toward the deletion of accurate, over-specialized classifiers
18. Next Class
Applications of LCSs