One Step Fits All



Daniele Loiacono. "One Step Fits All: Fitted Q Iteration with XCS". IWLCS, 2011


  1. One Step Fits All: Fitted Q Iteration with XCS
     Daniele Loiacono
  2. XCS(F) in multistep problems
     • XCS(F) has been successfully applied to complex and large single-step problems.
     • In contrast, even rather simple multistep problems can be very challenging for XCS(F).
     • Its connections with generalized reinforcement learning methods have been widely studied, and so have the issues the two share:
       - over-generalization
       - an unstable learning process
       - divergence (with computed prediction)
     • Advanced prediction mechanisms (e.g., tile coding) generally help but offer no guarantees.
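Slide 2 lists divergence with computed prediction among the issues. For reference, a minimal Python sketch of a linear computed prediction in the style of XCSF: each classifier predicts p(x) = w · x' with x' = (x0, x), and moves its weights toward a target P with a normalized delta rule. The class name and the parameter values (x0 = 1, eta = 0.2) are illustrative assumptions. In multistep problems the target is P = r + γ max PA, which is built from the system's own current predictions; that bootstrapping is where the instability listed above can arise.

```python
from typing import List


class LinearPredictionClassifier:
    """Hypothetical XCSF-style classifier: only the computed-prediction part."""

    def __init__(self, n_inputs: int, x0: float = 1.0, eta: float = 0.2) -> None:
        self.x0 = x0  # constant input that acts as an offset term
        self.eta = eta  # learning rate of the normalized delta rule
        self.weights: List[float] = [0.0] * (n_inputs + 1)

    def _augment(self, x: List[float]) -> List[float]:
        # x' = (x0, x_1, ..., x_n)
        return [self.x0] + list(x)

    def predict(self, x: List[float]) -> float:
        # Computed prediction p(x) = w . x'
        return sum(w * xi for w, xi in zip(self.weights, self._augment(x)))

    def update(self, x: List[float], target: float) -> None:
        # Normalized delta rule: w_i += eta / |x'|^2 * (P - p(x)) * x'_i
        xa = self._augment(x)
        norm_sq = sum(xi * xi for xi in xa)
        error = target - self.predict(x)
        for i, xi in enumerate(xa):
            self.weights[i] += (self.eta / norm_sq) * error * xi
```

Trained repeatedly on the fixed, noise-free target y = 2x + 1 over x in {0, 1, 2, 3}, the weights approach (1, 2). When the target is bootstrapped and shifts as the weights change, that convergence is no longer guaranteed.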
  3. XCS(F) searches for the best generalizations in the problem space
     • The generalizations found might prevent XCS(F) from learning the optimal payoff landscape.
     • In turn, the learned payoff landscape affects the search for generalizations in the problem space.
  4. What is this talk about?
     • We introduce an alternative approach to multistep problems
       - based on Fitted Q Iteration
       - involving a sequence of single-step problems
     • We show only some preliminary results to test the proposed approach
     • Agenda
       - Fitted Q Iteration
       - Fitted Q Iteration + XCS
       - Preliminary Results
       - Discussion
       - Future Work
  5. Fitted Q Iteration (Ernst et al., 2005)
     [Diagram: Agent and Problem interact in a loop (state s_t, action a_t, reward r_{t+1}, next state s_{t+1}, delay); the collected transitions {<s_t, a_t, r_{t+1}, s_{t+1}>} feed a Learner that outputs Q_i(s,a).]
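A minimal Python sketch of the data-collection step in slide 5, assuming a hypothetical one-dimensional corridor in place of the Woods and Maze environments (states 0 to n-1, goal at n-1, an assumed reward of 1000 on reaching the goal, actions left and right) and a uniformly random behaviour policy. Here one "problem" is taken to mean one episode, from a random start until the goal or a step limit; `Corridor`, `Transition`, and `collect_transitions` are illustrative names, not from the slides.

```python
import random
from typing import List, NamedTuple, Tuple


class Transition(NamedTuple):
    state: int
    action: int
    reward: float
    next_state: int
    terminal: bool


class Corridor:
    """Hypothetical toy problem: states 0..n-1, goal at n-1, action 0 = left, 1 = right."""

    def __init__(self, n: int = 5, goal_reward: float = 1000.0) -> None:
        self.n = n
        self.goal_reward = goal_reward

    def step(self, s: int, a: int) -> Tuple[int, float, bool]:
        s_next = max(0, min(self.n - 1, s + (1 if a == 1 else -1)))
        done = s_next == self.n - 1
        return s_next, (self.goal_reward if done else 0.0), done


def collect_transitions(env: Corridor, n_problems: int, max_steps: int,
                        rng: random.Random) -> List[Transition]:
    """Sample <s_t, a_t, r_{t+1}, s_{t+1}> tuples with a random policy."""
    data: List[Transition] = []
    for _ in range(n_problems):
        s = rng.randrange(env.n - 1)  # random non-goal start state
        for _ in range(max_steps):
            a = rng.randrange(2)
            s_next, r, done = env.step(s, a)
            data.append(Transition(s, a, r, s_next, done))
            if done:
                break
            s = s_next
    return data
```

The learner never interacts with the problem directly; it only sees the list returned by `collect_transitions`.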
  6. Fitted Q Iteration (Ernst et al., 2005)
     [Diagram: from the transition set {<s_t, a_t, r_{t+1}, s_{t+1}>}, a sequence of approximations Q_1(s,a), Q_2(s,a), …, Q_L(s,a) is computed.]
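One common way to write the iteration in slide 6, following the general scheme of Fitted Q Iteration, assuming a discount factor γ, an initial Q_0 ≡ 0, and a generic supervised learner denoted Regress (XCS plays this role in slide 7). These symbols are not on the slides. For transitions that end an episode, the γ max term is usually dropped, so the target is just r_{t+1}.

```latex
\begin{aligned}
\mathcal{F} &= \{\langle s_t, a_t, r_{t+1}, s_{t+1} \rangle\}, \qquad Q_0(s,a) \equiv 0,\\
TS_i &= \Big\{ \big( (s_t, a_t),\; r_{t+1} + \gamma \max_{a'} Q_{i-1}(s_{t+1}, a') \big)
        : \langle s_t, a_t, r_{t+1}, s_{t+1} \rangle \in \mathcal{F} \Big\}, \qquad i = 1, \dots, L,\\
Q_i &= \text{Regress}(TS_i).
\end{aligned}
```

With Q_0 ≡ 0, Q_1 fits the immediate rewards, and each further iteration propagates payoff one more step back from the rewarding transitions.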
  7. Fitted Q Iteration + XCS
     • XCS is applied to the target multistep problem
     • The interaction between XCS and the problem is sampled
     • A sequence of single-step regression problems is generated:
       - the input is the concatenation of the state and the action of the original multistep problem
       - there are no actions
       - a training set is built from all the <s_t, a_t> pairs collected
       - a test set is built from all the <s_{t+1}, -> pairs collected
     • XCS is applied iteratively to each generated single-step problem
     • Q_i(s,a) is computed as the system prediction on the test set
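Slide 7's construction can be sketched in Python as follows. The input of each single-step problem is the pair (s_t, a_t), the test set pairs every action with each collected s_{t+1}, and the maximum prediction over those test inputs supplies the bootstrapped part of the next targets. Since no XCS implementation is assumed here, a hypothetical `TabularMeanRegressor` (which averages the targets seen for each identical input and does not generalize) stands in for XCS; the discount factor `gamma` and the terminal-transition handling are also assumptions.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Input = Tuple[float, ...]
Transition = Tuple[int, int, float, int, bool]  # (s_t, a_t, r_{t+1}, s_{t+1}, terminal)


class TabularMeanRegressor:
    """Stand-in for XCS(F): averages targets per identical input; unseen inputs give 0.0."""

    def __init__(self) -> None:
        self.table: Dict[Input, float] = {}

    def fit(self, X: Sequence[Input], y: Sequence[float]) -> "TabularMeanRegressor":
        sums: Dict[Input, float] = defaultdict(float)
        counts: Dict[Input, int] = defaultdict(int)
        for x, target in zip(X, y):
            sums[x] += target
            counts[x] += 1
        self.table = {x: sums[x] / counts[x] for x in sums}
        return self

    def predict(self, X: Sequence[Input]) -> List[float]:
        return [self.table.get(x, 0.0) for x in X]


def encode(state: int, action: int) -> Input:
    # Single-step input: the state concatenated with the action.
    return (float(state), float(action))


def fitted_q_iteration(transitions: Sequence[Transition], actions: Sequence[int],
                       n_iterations: int, gamma: float,
                       make_regressor: Callable[[], TabularMeanRegressor]) -> TabularMeanRegressor:
    k = len(actions)
    # Training inputs: one <s_t, a_t> pair per collected transition.
    X_train = [encode(s, a) for (s, a, _, _, _) in transitions]
    # Test inputs: every action paired with each collected s_{t+1}.
    X_test = [encode(s_next, b) for (_, _, _, s_next, _) in transitions for b in actions]
    max_next_q = [0.0] * len(transitions)  # Q_0 = 0
    model = make_regressor()
    for _ in range(n_iterations):
        # Targets of the i-th single-step regression problem.
        y = [r + (0.0 if terminal else gamma * q)
             for (_, _, r, _, terminal), q in zip(transitions, max_next_q)]
        model = make_regressor().fit(X_train, y)
        # System prediction on the test set gives max_a' Q_i(s_{t+1}, a').
        preds = model.predict(X_test)
        max_next_q = [max(preds[j * k:(j + 1) * k]) for j in range(len(transitions))]
    return model
```

On the exhaustive set of transitions of a four-state corridor (goal at state 3, reward 1000, gamma = 0.9), ten iterations give Q(2, right) = 1000, Q(1, right) = 900, and Q(0, right) = 810. Replacing the tabular stand-in with a learner that generalizes, such as XCS, is what would allow generalization over both states and actions.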
  8. Experimental Design
     • Woods 14
     • Woods 1
     • Maze 5
     • Maze 6
  9. Experimental Results: Woods 1
     XCS + sampling for 50 problems
  10. Experimental Results: Woods 1
  11. Experimental Results: Maze 5
      XCS + sampling for 25 problems
  12. Experimental Results: Maze 5
  13. Experimental Results: Maze 6
      XCS + sampling for 15 problems
  14. Experimental Results: Maze 6
  15. Experimental Results: Woods 14
      XCS + sampling for 15 problems
  16. Experimental Results: Woods 14
  17. Discussion
      • Fitted Q Iteration + XCS offers several advantages:
        - efficient learning
        - generalization over the action space
      • However…
        - no real-time learning
        - it assumes a static environment
        - how can the problem space be sampled well, and how does the sampling affect performance?
        - how does XCS compare to other supervised learning techniques in this task?
  18. Future Work
      • Integrate Fitted Q Iteration and XCS in an incremental/iterated fashion
      • Test on more challenging problems that require generalization (e.g., Butz and Lanzi, 2010)
      • Investigate sampling strategies
      • Extend XCS based on some principles of Fitted Q Iteration?
  19. Some hints about problem sampling
  20. Some hints about problem sampling
  21. Some hints about problem sampling
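The slides do not say what made the Woods 1 sampling bad. One measurable property that any sampling strategy affects is coverage: (s, a) pairs that never appear in the collected transitions get no training target, so their Q estimates rest entirely on the learner's generalization. A minimal Python sketch comparing coverage under short random walks from a fixed start with uniform sampling of (s, a) pairs, assuming a hypothetical corridor as the toy problem and, for the uniform case, a simulator that can be reset to any state; all function names are illustrative.

```python
import random


def corridor_step(s: int, a: int, n: int) -> int:
    """Hypothetical corridor: states 0..n-1, goal n-1, action 1 moves right, 0 left."""
    return max(0, min(n - 1, s + (1 if a == 1 else -1)))


def random_walk_coverage(n: int, n_problems: int, max_steps: int,
                         rng: random.Random, start: int = 0) -> float:
    """Fraction of non-goal (s, a) pairs visited by random walks from a fixed start."""
    seen = set()
    for _ in range(n_problems):
        s = start
        for _ in range(max_steps):
            a = rng.randrange(2)
            seen.add((s, a))
            s = corridor_step(s, a, n)
            if s == n - 1:
                break
    return len(seen) / (2 * (n - 1))


def uniform_coverage(n: int, n_samples: int, rng: random.Random) -> float:
    """Fraction of non-goal (s, a) pairs hit when any state can be sampled directly."""
    seen = {(rng.randrange(n - 1), rng.randrange(2)) for _ in range(n_samples)}
    return len(seen) / (2 * (n - 1))
```

With 20 states and walks of at most three steps, random walks from state 0 can visit at most 6 of the 38 non-goal pairs, while 5000 uniform draws almost surely cover all of them.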
  22. Results of a bad sampling on Woods 1
  23. Results of a bad sampling on Woods 1