One Step Fits All

One Step Fits All: Fitted Q Iteration with XCS. Daniele Loiacono

XCS(F) in multistep problems. XCS(F) has been successfully applied to complex and large single-step problems. In contrast, even rather simple multistep problems can be very challenging for XCS(F). Its connections with generalized reinforcement learning methods have been widely studied, and so have the common issues: over-generalization, an unstable learning process, and divergence (with computed prediction). Advanced prediction mechanisms (e.g., tile coding) generally help but do not provide any guarantee.

XCS(F) searches for the best generalizations in the problem space. Generalizations might prevent XCS(F) from learning the optimal payoff landscape. In turn, the payoff landscape learned affects the search for generalizations in the problem space.

What is this talk about? We introduce an alternative approach to multistep problems, based on Fitted Q Iteration, that involves a sequence of single-step problems. We show only some preliminary results to test the presented approach. Agenda: Fitted Q Iteration; Fitted Q Iteration + XCS; Preliminary Results; Discussion; Future Works.

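Fitted Q Iteration (Ernst et al., 2005). [Diagram: agent-problem interaction loop with state s_t, action a_t, reward r_{t+1}, next state s_{t+1} and a delay block; the collected samples {<s_t, a_t, r_{t+1}, s_{t+1}>} feed a learner that outputs Q_i(s,a).]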

Fitted Q Iteration (Ernst et al., 2005). [Diagram: from the sample set {<s_t, a_t, r_{t+1}, s_{t+1}>}, a sequence of approximations Q_1(s,a), Q_2(s,a), …, Q_L(s,a) is computed.]
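For reference, the recursion behind the sequence Q_1, …, Q_L can be written as below. This follows the usual formulation of Fitted Q Iteration in Ernst et al. (2005); the discount factor γ and the initialization Q_0 = 0 do not appear on the slides and are stated here as assumptions.

```latex
Q_0(s,a) = 0, \qquad
Q_i(s_t, a_t) \approx r_{t+1} + \gamma \max_{a'} Q_{i-1}(s_{t+1}, a'),
\quad i = 1, \dots, L
```

Here "≈" means that Q_i is obtained by fitting a regression model to these targets over all collected samples, so each iteration is one supervised learning problem.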

Fitted Q Iteration + XCS. XCS is applied to the target multistep problem, and the interaction between XCS and the problem is sampled. A sequence of single-step regression problems is then generated, in which: the state is the concatenation of the state and the action of the original multistep problem; there are no actions; the training set is built from all the <s_t, a_t> pairs collected; the test set is built from all the <s_{t+1}, -> collected. XCS is applied iteratively to each single-step problem generated, and Q_i(s,a) is computed as the system prediction on the test set.
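The procedure above can be sketched as follows. This is a minimal illustration and not the implementation used in the talk: the hypothetical `MeanRegressor` is a trivial lookup-table learner standing in for XCS, the corridor environment, the `done` flag, the value of `GAMMA`, and reading the "<s_{t+1}, ->" test set as "query s_{t+1} with every action" are all assumptions made for the example.

```python
from collections import defaultdict

GAMMA = 0.9          # discount factor (assumed value, not from the slides)
ACTIONS = (0, 1)     # hypothetical action set: 0 = left, 1 = right


class MeanRegressor:
    """Stand-in for XCS: predicts the mean target seen for each input."""

    def fit(self, inputs, targets):
        sums, counts = defaultdict(float), defaultdict(int)
        for x, y in zip(inputs, targets):
            sums[x] += y
            counts[x] += 1
        self.table = {x: sums[x] / counts[x] for x in sums}
        return self

    def predict(self, x):
        return self.table.get(x, 0.0)  # unseen inputs default to 0


def fitted_q_iteration(samples, n_iterations):
    """samples: list of (s, a, r, s_next, done) tuples collected beforehand."""
    # Training inputs: the concatenation <s_t, a_t>; the regression has no actions.
    inputs = [(s, a) for s, a, _, _, _ in samples]
    model = None  # Q_0 is taken to be 0 everywhere
    for _ in range(n_iterations):
        targets = []
        for _, _, r, s_next, done in samples:
            if model is None or done:
                best_next = 0.0
            else:
                # "Test set": previous model queried on <s_{t+1}, a'> for every a'.
                best_next = max(model.predict((s_next, a2)) for a2 in ACTIONS)
            targets.append(r + GAMMA * best_next)
        # One single-step regression problem per iteration.
        model = MeanRegressor().fit(inputs, targets)
    return model


def chain_samples(n_states=4):
    """All transitions of a small deterministic corridor; reward 1 at the right end."""
    samples = []
    for s in range(n_states - 1):  # the last state is terminal
        for a in ACTIONS:
            s_next = max(0, s - 1) if a == 0 else s + 1
            done = s_next == n_states - 1
            samples.append((s, a, 1.0 if done else 0.0, s_next, done))
    return samples


q = fitted_q_iteration(chain_samples(), n_iterations=10)
```

On this toy corridor the value of moving right decays by a factor of `GAMMA` per step of distance from the goal. Replacing `MeanRegressor` with a generalizing learner is where XCS would enter.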

Experimental Design. Test environments: Woods 14, Woods 1, Maze 5, Maze 6.

Experimental Results: Woods 1. XCS + sampling for 50 problems.

Experimental Results: Maze 5. XCS + sampling for 25 problems.

Experimental Results: Maze 6. XCS + sampling for 15 problems.

Experimental Results: Woods 14. XCS + sampling for 15 problems.

Experimental Results: Woods 14

Discussion. Fitted Q Iteration + XCS offers several advantages: efficient learning and generalization over the action space. However, it does not learn in real time and it assumes a static environment. Open questions: how should the problem space be sampled, and how does the sampling affect performance? How does XCS compare to other supervised learning techniques in this task?

Future Works. Integrate Fitted Q Iteration and XCS in an incremental/iterated fashion. Test on more challenging problems that require generalization (e.g., Butz and Lanzi, 2010). Investigate sampling strategies. Extend XCS based on some principles of Fitted Q Iteration?

Some hints about problem sampling

Results of a bad sampling on Woods 1
