2. XCS(F) in multistep problems
- XCS(F) has been successfully applied to large and complex single step problems.
- In contrast, even rather simple multistep problems can be very challenging for XCS(F).
- Its connections with generalized reinforcement learning methods have been widely studied, and so have the issues they share:
  - over-generalization
  - unstable learning process
  - divergence (with computed prediction)
- Advanced prediction mechanisms (e.g., Tile Coding) generally help but do not provide any guarantee.
3. XCS(F) searches for the best generalizations in the problem space
- Generalizations might prevent XCS(F) from learning the optimal payoff landscape.
- The payoff landscape learned affects the search for generalizations in the problem space.
4. What is this talk about?
- We introduce an alternative approach to multistep problems, based on Fitted Q Iteration, that involves a sequence of single step problems.
- We will show only some preliminary results to test the presented approach.
Agenda
- Fitted Q Iteration
- Fitted Q Iteration + XCS
- Preliminary Results
- Discussion
- Future Works
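5. Fitted Q Iteration (Ernst et al., 2005)
[Diagram: the Agent observes state st, sends action at to the Problem, and receives reward rt+1 and next state st+1, which is fed back through a delay; the collected samples {<st,at,rt+1,st+1>} go to a Learner, which produces Qi(s,a).]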
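As a reminder of the update behind this diagram, the regression target used by Fitted Q Iteration at iteration i can be written as below. The discount factor \gamma, the target symbol y, and the "Regress" operator are notation assumed here for illustration; they do not appear on the slide.

```latex
\begin{align}
y_t^{(i)} &= r_{t+1} + \gamma \max_{a'} Q_{i-1}(s_{t+1}, a'), \\
Q_i &= \operatorname{Regress}\Big(\big\{\big(\langle s_t, a_t\rangle,\; y_t^{(i)}\big)\big\}\Big),
\qquad Q_0 \equiv 0 .
\end{align}
```

Each iteration is therefore an ordinary supervised regression problem over the fixed set of collected samples.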
7. Fitted Q Iteration + XCS
- XCS is applied to the target multistep problem.
- The interaction between XCS and the problem is sampled.
- A sequence of single step regression problems is generated:
  - the state is the concatenation of the state and the action of the original multistep problem
  - there are no actions
  - the training set is built from all the <st,at> pairs collected
  - the test set is built from all the <st+1,-> collected
- XCS is applied iteratively to each single step problem generated.
- Qi(s,a) is computed as the system prediction on the test set.
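The loop described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation used in the talk: the corridor problem, the discount factor, and every name (`step`, `collect_samples`, `AveragingRegressor`, `fitted_q_iteration`) are hypothetical, and a trivial averaging regressor stands in for XCS. The regression inputs are the (state, action) pairs, and the "test set" corresponds to querying the previous model at each sampled next state.

```python
from collections import defaultdict

GAMMA = 0.9          # assumed discount factor
ACTIONS = (0, 1)     # hypothetical actions: 0 = left, 1 = right
N_STATES = 5         # hypothetical corridor, states 0..4; state 4 is the goal


def step(state, action):
    """Toy deterministic corridor: reward 1.0 when the goal is reached."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return reward, nxt


def collect_samples():
    """Sample one <s, a, r, s'> tuple for every non-goal state and action."""
    return [(s, a, *step(s, a)) for s in range(N_STATES - 1) for a in ACTIONS]


class AveragingRegressor:
    """Stand-in for the supervised learner (XCS in the talk).

    It memorizes the mean target for each input and predicts 0.0 for
    inputs it has never seen, so it does not generalize at all.
    """

    def fit(self, inputs, targets):
        sums, counts = defaultdict(float), defaultdict(int)
        for x, y in zip(inputs, targets):
            sums[x] += y
            counts[x] += 1
        self.table = {x: sums[x] / counts[x] for x in sums}
        return self

    def predict(self, x):
        return self.table.get(x, 0.0)


def fitted_q_iteration(samples, n_iterations):
    """Solve one single step regression problem per iteration."""
    q = None  # Q_0 is taken to be zero everywhere
    for _ in range(n_iterations):
        inputs, targets = [], []
        for s, a, r, s_next in samples:
            terminal = s_next == N_STATES - 1
            if q is None or terminal:
                best_next = 0.0
            else:
                # "Test set": previous model queried at the next state.
                best_next = max(q.predict((s_next, b)) for b in ACTIONS)
            inputs.append((s, a))  # input = state concatenated with action
            targets.append(r + GAMMA * best_next)
        q = AveragingRegressor().fit(inputs, targets)
    return q


if __name__ == "__main__":
    q = fitted_q_iteration(collect_samples(), n_iterations=10)
    for s in range(N_STATES - 1):
        print(s, [round(q.predict((s, a)), 4) for a in ACTIONS])
```

Replacing `AveragingRegressor` with a learner that generalizes over the concatenated state-action input is where XCS would come in; the outer loop would stay the same.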
17. Discussion
- Fitted Q Iteration + XCS offers several advantages:
  - efficient learning
  - generalization over the action space
- However…
  - no real-time learning
  - assumes a static environment
  - how can the problem space be sampled well, and how does the sampling affect performance?
  - how does XCS compare to other supervised learning techniques in this task?
18. Future Works
- Integrate Fitted Q Iteration and XCS in an incremental/iterated fashion
- Test on more challenging problems that require generalization (e.g., Butz and Lanzi, 2010)
- Investigate sampling strategies
- Extend XCS based on some principles of Fitted Q Iteration?