Computed Prediction: So far, so good. What now?

Pier Luca Lanzi

Politecnico di Milano, Italy
Illinois Genetic Algorithms Laboratory,
University of Illinois at Urbana Champaign, USA

What is the problem?

[Diagram: Agent and Environment loop with state st, action at, reward rt+1 and next state st+1]

How much future reward
when action at is performed in state st?
What is the expected payoff for st and at?

GOAL: maximize the amount of
reward received in the long run

Compute a value function Q(st,at) mapping
state-action pairs into expected future payoffs
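
As a hedged illustration of what "expected future payoff" usually means, the value function can be written as an expected discounted return. The discount factor \gamma is an assumption here; the slide does not state one.

```latex
Q(s_t, a_t) = \mathbb{E}\left[ \sum_{k=0}^{\infty} \gamma^{k} \, r_{t+k+1} \;\middle|\; s_t, a_t \right],
\qquad 0 \le \gamma \le 1
```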

Example: The Mountain Car
Task: drive an underpowered
car up a steep mountain road

st = position, velocity
at = acc. right, acc. left, no acc.
rt = 0 when goal is reached, -1 otherwise.

[Figure: the mountain car with its GOAL position, and a plot of the value function Q(st,at)]
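
A minimal sketch of one simulation step for this task. The slide only gives the state, actions and reward; the dynamics constants and bounds below follow the common textbook formulation of Mountain Car and are assumptions, as is the function name `step`.

```python
import math

# Assumed constants from the common textbook formulation of Mountain Car;
# the slides do not list them.
MIN_POS, MAX_POS = -1.2, 0.5
MAX_SPEED = 0.07
ACTIONS = (-1, 0, 1)  # accelerate left, no acceleration, accelerate right


def step(position, velocity, action):
    """Advance one time step and return (position, velocity, reward, done)."""
    velocity += 0.001 * action - 0.0025 * math.cos(3 * position)
    velocity = max(-MAX_SPEED, min(MAX_SPEED, velocity))
    position += velocity
    if position < MIN_POS:
        # The car hits the left wall and stops.
        position, velocity = MIN_POS, 0.0
    done = position >= MAX_POS
    # Reward as on the slide: 0 when the goal is reached, -1 otherwise.
    reward = 0.0 if done else -1.0
    return position, velocity, reward, done
```

Because every step before the goal costs -1, maximizing long-run reward is the same as reaching the goal in as few steps as possible.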

What are the issues?

Learning the unknown payoff function
while also trying to approximate it

Approximator works on intermediate estimates
but it also tries to provide information for the
learning
• Exact representation infeasible
• Approximation mandatory
• Convergence is not guaranteed
• The function is unknown,
  it is learnt online from experience

Learning Classifier Systems

Solve reinforcement learning problems

Represent the payoff function Q(st, at) as
a population of rules, the classifiers.

Classifiers are evolved while
Q(st, at) is learnt online

What is a classifier?

IF condition C is true for input s
THEN the payoff of action A is p

Generalization depends on how well
conditions can partition the problem space

What is the best representation for the problem?

Several representations have been
developed to improve generalization

[Figure: payoff surface for A over input s, with a condition C(s)=l≤s≤u spanning the interval from l to u at prediction p. Callouts: accurate approximations; general conditions covering large portions of the problem space]
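
The rule format above can be sketched as a small data structure. This is an illustration only: the class name `IntervalClassifier`, its field names and the helper `match_set` are hypothetical, and real classifier systems store further parameters (fitness, error, and so on) that are omitted here.

```python
from dataclasses import dataclass


@dataclass
class IntervalClassifier:
    lower: float       # l in the condition C(s) = l <= s <= u
    upper: float       # u
    action: str        # the action A the rule refers to
    prediction: float  # p, the payoff predicted for A

    def matches(self, s: float) -> bool:
        """True when the condition holds for input s."""
        return self.lower <= s <= self.upper


def match_set(population, s, action):
    """Classifiers that advocate `action` and whose condition covers s."""
    return [cl for cl in population if cl.action == action and cl.matches(s)]
```

A wide interval is a general condition covering a large portion of the problem space; a narrow one can approximate the payoff more accurately with a single constant p.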

What is computed prediction?

Replace the prediction p by
a parametrized function p(x,w)

Which type of approximation?
Which representation?

[Figure: payoff landscape of A over input x, approximated by p(x,w)=w0+xw1 inside the condition C(s)=l≤s≤u]

IF condition C is true for input s
THEN the payoff of action A is p(x,w)

Computed Prediction:
Linear approximation
• Each classifier has a vector of parameters w
• Classifier prediction is computed as a linear function
  of the input, e.g. p(x,w)=w0+xw1
• Classifier weights are updated using
  the Widrow-Hoff update
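
A minimal sketch of linear computed prediction with a Widrow-Hoff (delta rule) update for a scalar input. The exact formulas on the original slide are not recoverable, so this uses the plain, unnormalized delta rule; the function names and the learning rate `eta` are assumptions.

```python
def predict(w, x):
    """Linear prediction p(x, w) = w0 + x * w1 for a scalar input x."""
    return w[0] + x * w[1]


def widrow_hoff_update(w, x, target, eta=0.2):
    """Return new weights after one delta-rule step toward `target`."""
    error = target - predict(w, x)
    # w0 is the offset, so its input is the constant 1.
    return [w[0] + eta * error, w[1] + eta * error * x]


# Usage: learn the payoff line 1 + 2x from repeated samples.
w = [0.0, 0.0]
for i in range(2000):
    x = (i % 5) / 4.0
    w = widrow_hoff_update(w, x, target=1.0 + 2.0 * x)
```

Each classifier keeps its own w and applies the update only on inputs its condition matches, so the weights fit the payoff locally.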

What are the differences?

GOAL: Learn the payoff function

Typical RL approach: What is the best approximator?
LCS approach asks: What is the best representation for the problem?

[Figure: two axes, REPRESENTATION and APPROXIMATOR. Labels that survive from the figure: Boolean representation (0/1/#), real intervals, convex hulls, ellipsoids, messy, symbols, NNs; gradient descent, linear prediction, radial basis, sigmoid prediction, neural prediction, tile coding, computed prediction. The figure cites (O’hara & Bull 2004).]

To represent or to approximate?

• Powerful representations allow the solution of
  difficult problems with basic approximators
• Powerful approximators may make the
  choice of the representation less critical

Experiment
Consider a very powerful approximator
that we know can solve a certain RL problem
Use it to compute classifier prediction in an LCS
and apply the LCS to solve the same problem

Does genetic search still
provide an advantage?

Computed prediction with Tile Coding

• Powerful approximator developed in
  the reinforcement learning community
• Tile coding can solve the mountain car problem
  given an adequate parameter setting

What should we expect?
• Classifier prediction is computed using tile coding
• Each tile coding has a different parameter setting
• When using tile coding to compute
  classifier prediction, one classifier can
  solve the whole problem
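
To make the approximator concrete, here is a minimal one-dimensional tile coding sketch. It is not the configuration used in the talk: the class name `TileCoder`, the parameters `n_tilings` and `tiles_per_tiling`, and the step size `alpha` are all assumptions. The two tiling parameters are the kind of setting that could differ from one classifier to another.

```python
import math


class TileCoder:
    """1-D tile coding: several shifted tilings, one weight per tile."""

    def __init__(self, low, high, n_tilings=4, tiles_per_tiling=8):
        self.low = low
        self.n_tilings = n_tilings
        self.tile_width = (high - low) / tiles_per_tiling
        # One extra tile per tiling absorbs the shift of the offset tilings.
        self.tiles = tiles_per_tiling + 1
        self.weights = [[0.0] * self.tiles for _ in range(n_tilings)]

    def active_tiles(self, x):
        """Return the index of the active tile in each tiling."""
        indices = []
        for t in range(self.n_tilings):
            offset = t * self.tile_width / self.n_tilings
            i = int(math.floor((x - self.low + offset) / self.tile_width))
            indices.append(min(max(i, 0), self.tiles - 1))
        return indices

    def predict(self, x):
        """Prediction is the sum of the weights of the active tiles."""
        return sum(self.weights[t][i] for t, i in enumerate(self.active_tiles(x)))

    def update(self, x, target, alpha=0.1):
        """Move the active weights toward the target payoff."""
        # The step is split across tilings so alpha stays comparable.
        step = alpha / self.n_tilings * (target - self.predict(x))
        for t, i in enumerate(self.active_tiles(x)):
            self.weights[t][i] += step
```

More tilings and narrower tiles give finer resolution at the cost of more weights, which is why an adequate parameter setting matters.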

The performance?

Computed prediction can perform as well as the
approximator with the most adequate configuration

The evolution of a population of classifiers
provides advantages over one approximator

Even if the same approximator alone
might solve the whole problem

What now?
Which approximator?
Let evolution decide!

Population of classifiers using different
approximators to compute prediction

The genetic algorithm selects the best
approximators for each problem subspace

[Diagram: Problem, linked to REPRESENTATION (Which representation?) and APPROXIMATOR (Which approximator?)]

Evolving the best approximator

What next?
Which approximator?
Let evolution decide!

Population of classifiers using different
approximators to compute prediction

Even if the same approximator alone
might solve the whole problem

[Diagram: Problem, linked to REPRESENTATION (Which representation?) and APPROXIMATOR (Which approximator?)]

Evolving Heterogeneous Approximators

[Figure labels: Heterogeneous Approximators; Most Powerful Approximator]

What next?
• Allow different representations
  in the same population
  (probably done for Boolean conditions)
• Let evolution evolve the most adequate
  representation for each problem subspace

• Then, allow different representations and
  different approximators to evolve all together

Acknowledgements

• Daniele Loiacono
• Matteo Zanini
• All the current and former
  members of IlliGAL
