Computed Prediction: So far, so good. What now? - Presentation Transcript
Computed Prediction
So far, so good. What now?
Pier Luca Lanzi
Politecnico di Milano, Italy
Illinois Genetic Algorithms Laboratory,
University of Illinois at Urbana-Champaign, USA
RL
What is the problem?
[Diagram: agent-environment loop. The Agent performs action at; the Environment returns the next state st+1 and the reward rt+1.]
GOAL: maximize the amount of reward received in the long run
How much future reward when action at is performed in state st?
What is the expected payoff for st and at?
Compute a value function Q(st, at) mapping state-action pairs into expected future payoffs
Example: The Mountain Car
Task: drive an underpowered car up a steep mountain road
st = position, velocity
at = acc. right, acc. left, no acc.
rt = 0 when goal is reached, -1 otherwise.
[Figure: the car on the mountain road with the GOAL marked, next to a plot of the value function Q(st, at)]
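The mountain car task can be made concrete with a small simulator. The sketch below assumes the dynamics and bounds commonly used for this benchmark in the reinforcement learning literature (position in [-1.2, 0.5], velocity in [-0.07, 0.07]); those constants and the name `mountain_car_step` are assumptions, not taken from the slides. Only the reward scheme (0 when the goal is reached, -1 otherwise) comes from the slide itself.

```python
import math

# Assumed constants (common textbook formulation, not given in the slides).
MIN_POS, MAX_POS = -1.2, 0.5
MAX_SPEED = 0.07
ACTIONS = (-1, 0, 1)  # acc. left, no acc., acc. right


def mountain_car_step(position, velocity, action):
    """Advance the car by one step; return (position, velocity, reward, done)."""
    if action not in ACTIONS:
        raise ValueError("action must be -1, 0 or 1")
    # The engine term (0.001) is weaker than the gravity term (0.0025),
    # which is what makes the car underpowered.
    velocity += 0.001 * action - 0.0025 * math.cos(3 * position)
    velocity = max(-MAX_SPEED, min(MAX_SPEED, velocity))
    position += velocity
    if position < MIN_POS:
        # Hitting the left wall stops the car.
        position, velocity = MIN_POS, 0.0
    done = position >= MAX_POS
    if done:
        position = MAX_POS
    reward = 0.0 if done else -1.0
    return position, velocity, reward, done
```

A learner would call `mountain_car_step` repeatedly, using the returned reward to update its estimate of Q(st, at).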
What are the issues?
Exact representation infeasible
Approximation mandatory
The function is unknown, it is learnt online from experience
Learning the unknown payoff function while also trying to approximate it
Approximator works on intermediate estimates but it also tries to provide information for the learning
Convergence is not guaranteed
Classifiers
Learning Classifier Systems
Solve reinforcement learning problems
Represent the payoff function Q(st, at) as
a population of rules, the classifiers.
Classifiers are evolved while
Q(st, at) is learnt online
What is a classifier?
IF condition C is true for input s
THEN the payoff of action a is p
[Figure: payoff surface for A plotted over s, with the prediction p and the condition C(s) = l ≤ s ≤ u marked between l and u]
General conditions covering large portions of the problem space
Accurate approximations
Generalization depends on how well conditions can partition the problem space
What is the best representation for the problem?
Several representations have been developed to improve generalization
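The IF/THEN rule with an interval condition can be sketched as a small data structure. This is a minimal illustration for a one-dimensional input; the names `Classifier`, `matches`, and `match_set` are hypothetical, and a real learning classifier system keeps further parameters (error, fitness, and so on) that are left out here.

```python
from dataclasses import dataclass


@dataclass
class Classifier:
    lower: float       # l in the condition C(s) = l <= s <= u
    upper: float       # u in the condition
    action: int        # action a advocated by the rule
    prediction: float  # constant payoff estimate p

    def matches(self, s: float) -> bool:
        """IF part: the condition is true when s lies inside [l, u]."""
        return self.lower <= s <= self.upper


def match_set(population, s):
    """Return the classifiers whose condition covers input s."""
    return [cl for cl in population if cl.matches(s)]


# Two overlapping rules partition part of the input space.
population = [
    Classifier(lower=0.0, upper=0.5, action=0, prediction=10.0),
    Classifier(lower=0.4, upper=1.0, action=1, prediction=-5.0),
]
covering = match_set(population, 0.45)  # both rules match this input
```

Wider intervals generalize over more of the problem space, but a single constant p can then only approximate the payoff surface coarsely.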
What is computed prediction?
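Replace the prediction p by a parametrized function p(x,w)
[Figure: payoff landscape of A plotted over x, with the linear prediction p(x,w) = w0 + x w1 and the condition C(s) = l ≤ s ≤ u marked between l and u]
Which type of approximation?
Which Representation?
IF condition C is true for input s
THEN the payoff of action a is p(x,w)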
Computed Prediction:
Linear approximation
Each classifier has a vector of parameters w
Classifier prediction is computed as,
Classifier weights are updated using
Widrow-Hoff update,
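A linear computed prediction with a Widrow-Hoff style update can be sketched as follows. The sketch assumes the normalized form of the delta rule and a constant bias input `x0`; the learning rate `eta`, the value of `x0`, and the class name `LinearPrediction` are assumptions for illustration, not values taken from the slides.

```python
class LinearPrediction:
    """Weight vector w with prediction p(x, w) = w0 * x0 + sum_i w_i * x_i."""

    def __init__(self, n_inputs, eta=0.2, x0=1.0):
        self.eta = eta  # learning rate (assumed value)
        self.x0 = x0    # constant bias input (assumed value, must be nonzero)
        self.w = [0.0] * (n_inputs + 1)

    def _augment(self, x):
        # Prepend the constant input so that w[0] acts as the offset w0.
        return [self.x0] + list(x)

    def predict(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, self._augment(x)))

    def update(self, x, target):
        """Move the weights toward the target payoff; return the error."""
        xa = self._augment(x)
        error = target - self.predict(x)
        norm = sum(xi * xi for xi in xa)  # normalization by |x|^2
        step = self.eta * error / norm
        self.w = [wi + step * xi for wi, xi in zip(self.w, xa)]
        return error


# Usage: learn the payoff 2 + 3x from noise-free samples on [0, 1].
model = LinearPrediction(n_inputs=1)
for _ in range(500):
    for k in range(11):
        x = k / 10
        model.update([x], 2.0 + 3.0 * x)
```

In a classifier system each classifier would own one such weight vector and update it only on the inputs its condition matches.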
Summary
What are the differences?
GOAL: Learn the payoff function
Typical RL approach: What is the best approximator?
LCS approach asks: What is the best representation for the problem?
[Diagram: the APPROXIMATOR side lists Gradient Descent, Linear Prediction, Sigmoid Prediction, Neural Prediction (O'Hara & Bull 2004), Radial Basis, Tile Coding, and Computed Prediction; the REPRESENTATION side lists Typical Boolean Representation (0/1/#), Real Intervals, Convex Hulls, ellipsoids, messy, Symbols, and NNs.]
To represent or to approximate?
Powerful representations allow the solution of difficult problems with basic approximators
Powerful approximators may make the choice of the representation less critical
Experiment
Consider a very powerful approximator that we know can solve a certain RL problem
Use it to compute classifier prediction in an LCS and apply the LCS to solve the same problem
Does genetic search still provide an advantage?
Computed prediction with Tile Coding
Powerful approximator developed in
the reinforcement learning community
Tile coding can solve the mountain car problem
given an adequate parameter setting
Classifier prediction is computed using tile coding
Each tile coding has different parameter settings
What should we expect?
When using tile coding to compute classifier prediction, one classifier can solve the whole problem
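To make the idea of a tile coding with its own parameter setting concrete, here is a minimal sketch of a tile-coding approximator. The class name `TileCoder`, the parameters `n_tilings`, `n_tiles`, and `alpha`, and the uniform diagonal offset between tilings are all simplifying assumptions; the slides do not specify the exact scheme used.

```python
class TileCoder:
    """Sum of one weight per tiling; tilings are shifted copies of a grid."""

    def __init__(self, lows, highs, n_tilings=8, n_tiles=8, alpha=0.1):
        self.lows = list(lows)
        self.highs = list(highs)
        self.n_tilings = n_tilings  # number of overlapping grids
        self.n_tiles = n_tiles      # resolution of each grid per dimension
        self.alpha = alpha          # learning rate
        dims = len(self.lows)
        # One extra cell per dimension so shifted tilings still cover the range.
        self.cells = (n_tiles + 1) ** dims
        self.w = [0.0] * (n_tilings * self.cells)

    def active_tiles(self, x):
        """Return one weight index per tiling for input x."""
        indices = []
        for t in range(self.n_tilings):
            offset = t / self.n_tilings  # shift, as a fraction of a tile width
            index = 0
            for low, high, xi in zip(self.lows, self.highs, x):
                scaled = (xi - low) / (high - low) * self.n_tiles + offset
                cell = min(self.n_tiles, max(0, int(scaled)))
                index = index * (self.n_tiles + 1) + cell
            indices.append(t * self.cells + index)
        return indices

    def predict(self, x):
        return sum(self.w[i] for i in self.active_tiles(x))

    def update(self, x, target):
        """Spread the correction evenly over the active tiles."""
        tiles = self.active_tiles(x)
        error = target - sum(self.w[i] for i in tiles)
        for i in tiles:
            self.w[i] += self.alpha / self.n_tilings * error
        return error
```

Under this sketch, a classifier would carry its own `TileCoder` instance, so different classifiers can hold different values of `n_tilings` and `n_tiles`.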
The performance?
Computed prediction can perform as well as the
approximator with the most adequate configuration
The evolution of a population of classifiers
provides advantages over one approximator
Even if the same approximator alone
might solve the whole problem
How do parameters evolve?
What now?
[Diagram: the Problem, with REPRESENTATION (Which representation?) on one side and APPROXIMATOR (Which approximator?) on the other]
Which approximator?
Let evolution decide!
Population of classifiers using different approximators to compute prediction
The genetic algorithm selects the best approximators for each problem subspace
Evolving the best approximator
What next?
[Diagram: the Problem, with REPRESENTATION (Which representation?) on one side and APPROXIMATOR (Which approximator?) on the other]
Which approximator?
Let evolution decide!
Population of classifiers using different approximators to compute prediction
Even if the same approximator alone might solve the whole problem
Evolving Heterogeneous Approximators
[Figure legend: Heterogeneous Approximators; Most Powerful Approximator]
What next?
Allow different representations in the same population
(Probably done for Boolean Conditions)
Let evolution evolve the most adequate representation for each problem subspace
Then, allow different representations and different approximators to evolve all together
Acknowledgements
Daniele Loiacono
Matteo Zanini
All the current and former
members of IlliGAL