Relational Transfer in Reinforcement Learning


  1. Lisa Torrey
      University of Wisconsin – Madison
      CS 540
      Transfer Learning
  2. Transfer Learning in Humans
      Education (hierarchical curriculum): learning tasks share common stimulus-response elements.
      Abstract problem-solving: learning tasks share general underlying principles.
      Multilingualism: knowing one language affects learning in another.
      Transfer can be both positive and negative.
  3. Transfer Learning in AI
      Given a source task S, learn a target task T.
  4. Goals of Transfer Learning
      [Plot: performance vs. training, comparing transfer to no transfer]
      Three possible benefits: a higher start, a higher slope, and a higher asymptote.
  5. Inductive Learning
      [Diagram: search within the space of allowed hypotheses, a subset of all hypotheses]
  6. Transfer in Inductive Learning
      [Diagram: transfer redirects the search through the allowed hypotheses]
      Thrun and Mitchell 1995: transfer slopes for gradient descent.
  7. Transfer in Inductive Learning
      Bayesian methods: in Bayesian learning, prior distribution + data = posterior distribution;
      in Bayesian transfer, the source task supplies the prior.
      Raina et al. 2006: transfer a Gaussian prior.
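      To make the prior + data = posterior pattern concrete, here is a minimal sketch of
      Bayesian transfer for a single Gaussian mean, in the spirit of (but not reproducing)
      Raina et al.: the prior is fit on source-task data, then updated with sparse
      target-task data. All names and numbers below are hypothetical.

        import numpy as np

        def gaussian_posterior(prior_mean, prior_var, data, noise_var):
            # Conjugate update: Gaussian prior + Gaussian-noise observations -> Gaussian posterior.
            n = len(data)
            post_var = 1.0 / (1.0 / prior_var + n / noise_var)
            post_mean = post_var * (prior_mean / prior_var + np.sum(data) / noise_var)
            return post_mean, post_var

        # Transfer: fit the prior on source-task data, then update with target-task data.
        source = np.array([4.8, 5.1, 5.3, 4.9])    # hypothetical source observations
        target = np.array([5.6, 5.4])              # sparse target observations
        mean, var = gaussian_posterior(source.mean(), source.var(), target, noise_var=1.0)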
  8. Transfer in Inductive Learning
      Hierarchical methods: concepts build on previously learned ones
      (e.g., Line and Curve combine into Circle and Surface, which combine into Pipe).
      Stracuzzi 2006: learn Boolean concepts that can depend on each other.
  9. Transfer in Inductive Learning
      Dealing with missing data or labels: a source task S can help supply them for the target task T.
      Shi et al. 2008: transfer via active learning.
  10. Reinforcement Learning
      [Diagram: the agent-environment loop]
      The agent starts with Q(s1, a) = 0 and chooses π(s1) = a1; after each step it updates
      Q(s1, a1) ← Q(s1, a1) + Δ, then chooses π(s2) = a2, and so on.
      The environment returns transitions and rewards:
      δ(s1, a1) = s2, r(s1, a1) = r2; δ(s2, a2) = s3, r(s2, a2) = r3.
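      As a concrete sketch of this loop, here is a minimal tabular Q-learning update in
      Python; the state and action names mirror the slide, and alpha and gamma are
      assumed hyperparameters.

        from collections import defaultdict

        def q_learning_step(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
            # Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
            best_next = max(Q[(s_next, a2)] for a2 in actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

        Q = defaultdict(float)    # Q(s1, a) = 0 initially, as on the slide
        # After the environment returns delta(s1, a1) = s2 and r(s1, a1) = r2:
        q_learning_step(Q, 's1', 'a1', r=1.0, s_next='s2', actions=['a1', 'a2'])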
  11. Transfer in Reinforcement Learning
      Five approaches: starting-point methods, hierarchical methods, alteration methods,
      imitation methods, and new RL algorithms.
  12. Transfer in Reinforcement Learning
      Starting-point methods: transfer an initial Q-table from the source task, so that
      target-task training begins ahead of where it would with no transfer.
      Taylor et al. 2005: value-function transfer.
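      A minimal sketch of what a starting-point method might look like in code, assuming
      hand-written inter-task mapping functions (state_map and action_map below are
      hypothetical stand-ins, not Taylor et al.'s actual mappings):

        def transfer_q_table(source_Q, state_map, action_map, target_states, target_actions):
            # Initialize the target Q-table from mapped source values instead of zeros.
            target_Q = {}
            for s in target_states:
                for a in target_actions:
                    src_key = (state_map(s), action_map(a))
                    target_Q[(s, a)] = source_Q.get(src_key, 0.0)  # unmapped pairs start at zero
            return target_Q

        # Toy usage; target-task training then proceeds as usual, but from a warmer start.
        target_Q = transfer_q_table({('s1', 'a1'): 0.7},
                                    state_map=lambda s: 's1', action_map=lambda a: 'a1',
                                    target_states=['t1'], target_actions=['b1'])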
  13. Transfer in Reinforcement Learning
      Hierarchical methods: treat learned source behaviors as sub-tasks of the target
      (e.g., Run and Kick compose into Pass and Shoot, which compose into Soccer).
      Mehta et al. 2008: transfer a learned hierarchy.
  14. Transfer in Reinforcement Learning
      Alteration methods: alter task S itself, mapping its original states, actions,
      and rewards to new ones.
      Walsh et al. 2006: transfer aggregate states.
  15. Transfer in Reinforcement Learning
      New RL algorithms: change the learning algorithm itself so that it can accept
      transferred knowledge (the agent-environment loop is the same as on slide 10).
      Torrey et al. 2006: transfer advice about skills.
  16. Transfer in Reinforcement Learning
      Imitation methods: during target-task training, the learner spends some periods
      following the source policy instead of its own.
      Torrey et al. 2007: demonstrate a strategy.
  17. My Research
      Among the five approaches (starting-point, hierarchical, alteration, and imitation
      methods, plus new RL algorithms), my work contributes skill transfer, which uses a
      new RL algorithm, and macro transfer, which uses imitation.
  18. RoboCup Domain
      Tasks: 3-on-2 KeepAway, 3-on-2 BreakAway, 2-on-1 BreakAway, 3-on-2 MoveDownfield.
  19. Inductive Logic Programming
      Candidate rules generated during the general-to-specific search:
      IF [ ] THEN pass(Teammate)
      IF distance(Teammate) ≤ 5 THEN pass(Teammate)
      IF distance(Teammate) ≤ 10 THEN pass(Teammate)
      …
      IF distance(Teammate) ≤ 5 AND angle(Teammate, Opponent) ≥ 15 THEN pass(Teammate)
      IF distance(Teammate) ≤ 5 AND angle(Teammate, Opponent) ≥ 30 THEN pass(Teammate)
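      The rule sequence above comes from refining clauses one literal at a time. A minimal
      sketch of that refinement step, with hypothetical literal strings standing in for
      real ILP literals and a user-supplied covers(rule, example) test:

        def refine(rule, literals):
            # Specialize a clause by adding one new literal at a time.
            return [rule + [lit] for lit in literals if lit not in rule]

        def score(rule, positives, negatives, covers):
            # Simple coverage score: positive examples covered minus negatives covered.
            return (sum(covers(rule, ex) for ex in positives)
                    - sum(covers(rule, ex) for ex in negatives))

        literals = ['distance(Teammate) <= 5', 'distance(Teammate) <= 10',
                    'angle(Teammate, Opponent) >= 15', 'angle(Teammate, Opponent) >= 30']
        candidates = refine(['distance(Teammate) <= 5'], literals)
        # e.g. [['distance(Teammate) <= 5', 'distance(Teammate) <= 10'], ...]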
  20. Advice Taking
      Batch Reinforcement Learning via Support Vector Regression (RL-SVR): the agent
      interacts with the environment in batches, computing Q-functions after each batch.
      Find Q-functions that minimize: ModelSize + C × DataMisfit.
  21. Advice Taking
      Batch Reinforcement Learning with Advice (KBKR): the same batch setup, with advice
      added as an extra input.
      Find Q-functions that minimize: ModelSize + C × DataMisfit + µ × AdviceMisfit.
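      Written out, the objective on these two slides has roughly the following shape; this
      is a sketch of the general form only, since the exact norms and slack variables vary
      across KBKR formulations:

        \min_{w,b}\;\; \|w\|_1 \;+\; C \sum_i \bigl| y_i - (w^\top x_i + b) \bigr| \;+\; \mu \sum_j \xi_j

      where ||w||_1 is the ModelSize term, y_i are the Q-value targets for the batch data
      (DataMisfit), and each slack xi_j >= 0 measures how much the learned Q-function
      violates advice rule j (AdviceMisfit).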
  22. Skill Transfer Algorithm
      Source task → ILP → a skill rule such as:
      IF distance(Teammate) ≤ 5 AND angle(Teammate, Opponent) ≥ 30 THEN pass(Teammate)
      The rule goes through a mapping to target-task terms, then into advice taking for
      the target task, alongside any human advice.
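      A minimal end-to-end sketch of this pipeline in Python; the function arguments are
      hypothetical placeholders for the ILP learner, the predicate mapping, and the
      advice-taking learner:

        def skill_transfer(source_games, learn_rules, map_rule, human_advice, add_advice):
            # 1. Learn skill rules in the source task with ILP.
            rules = learn_rules(source_games)
            # 2. Map their predicates to the target task.
            mapped = [map_rule(r) for r in rules]
            # 3. Give the mapped rules, plus any human advice, to the advice-taking learner.
            add_advice(mapped + human_advice)

        skill_transfer(
            source_games=[],
            learn_rules=lambda games: ['IF distance(Teammate) <= 5 THEN pass(Teammate)'],
            map_rule=lambda rule: rule,    # identity mapping, for illustration only
            human_advice=[],
            add_advice=print,
        )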
  23. Selected Results
      [Chart: skill transfer to 3-on-2 BreakAway from several tasks]
  24. Macro-Operators
      [Diagram: a macro as a finite-state machine. Nodes are actions
      (pass(Teammate), move(Direction), shoot(goalRight), shoot(goalLeft));
      each arc carries a learned IF [ ... ] THEN rule that decides the transition.]
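      One way to render such a macro in code is as a small finite-state machine whose arcs
      carry learned conditions; this is an illustrative sketch, and the condition function
      below is a hypothetical stand-in for an ILP-learned rule:

        class Macro:
            def __init__(self, nodes, arcs):
                self.nodes = nodes    # action nodes, e.g. 'pass(Teammate)'
                self.arcs = arcs      # {(from_node, to_node): condition on the state}

            def next_action(self, current, state):
                # Follow the first outgoing arc whose learned condition holds.
                for (src, dst), condition in self.arcs.items():
                    if src == current and condition(state):
                        return dst
                return None           # macro ends; control returns to the base policy

        macro = Macro(
            nodes=['pass(Teammate)', 'shoot(goalRight)'],
            arcs={('pass(Teammate)', 'shoot(goalRight)'): lambda s: s.get('open_shot', False)},
        )
        print(macro.next_action('pass(Teammate)', {'open_shot': True}))  # -> shoot(goalRight)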
  25. Demonstration
      An imitation method: the target agent follows the source policy for an initial
      period of training, then continues learning on its own.
  26. Macro Transfer Algorithm
      Source task → ILP → a macro, delivered to the target task by demonstration.
  27. Macro Transfer Algorithm
      Learning structures.
      Positive examples: BreakAway games that score.
      Negative examples: BreakAway games that do not score.
      ILP learns rules such as:
      IF actionTaken(Game, StateA, pass(Teammate), StateB)
         AND actionTaken(Game, StateB, move(Direction), StateC)
         AND actionTaken(Game, StateC, shoot(goalRight), StateD)
         AND actionTaken(Game, StateD, shoot(goalLeft), StateE)
      THEN isaGoodGame(Game)
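      To train such rules, each game trace has to be encoded as actionTaken facts. Below is
      a minimal sketch of that encoding; the (state, action) trace format is an assumption
      made for illustration:

        def game_to_facts(game_id, trace):
            # Turn a trace [(state, action), ..., (final_state, None)] into actionTaken/4 facts.
            facts = []
            for i in range(len(trace) - 1):
                (s, a), (s_next, _) = trace[i], trace[i + 1]
                facts.append(f'actionTaken({game_id}, {s}, {a}, {s_next})')
            return facts

        # Positives come from games that scored; negatives from games that did not.
        print(game_to_facts('game1', [('stateA', 'pass(t1)'),
                                      ('stateB', 'shoot(goalRight)'),
                                      ('stateC', None)]))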
  28. Macro Transfer Algorithm
      Learning rules for arcs.
      Positive examples: states in good games that took the arc.
      Negative examples: states in good games that could have taken the arc but did not.
      ILP learns entry and loop conditions for each arc (e.g., between pass(Teammate)
      and shoot(goalRight)):
      IF [ … ] THEN enter(State)
      IF [ … ] THEN loop(State, Teammate)
  29. Selected Results
      [Chart: macro transfer to 3-on-2 BreakAway from 2-on-1 BreakAway]
  30. Summary
      Machine learning is often designed for standalone tasks.
      Transfer is a natural learning ability that we would like to incorporate into
      machine learners.
      There are some successes, but challenges remain, such as avoiding negative transfer
      and automating the mapping between tasks.
