Relational Transfer in Reinforcement Learning


  1. Lisa Torrey
      University of Wisconsin – Madison
      CS 540
      Transfer Learning
  2. Transfer Learning in Humans
      Education (hierarchical curriculum): learning tasks share common stimulus-response elements.
      Abstract problem-solving: learning tasks share general underlying principles.
      Multilingualism: knowing one language affects learning in another.
      Transfer can be both positive and negative.
  3. Transfer Learning in AI
      Given a source task S, learn a target task T.
  4. Goals of Transfer Learning
      [Plot: performance vs. training, comparing transfer to no transfer]
      Three possible benefits: a higher start, a higher slope, and a higher asymptote.
  5. Inductive Learning
      [Diagram: search within the space of allowed hypotheses, a subset of all hypotheses]
  6. Transfer in Inductive Learning
      [Diagram: transfer redirects the search through the allowed hypotheses]
      Thrun and Mitchell 1995: transfer slopes for gradient descent.
  7. Transfer in Inductive Learning
      Bayesian methods: in Bayesian learning, prior distribution + data = posterior distribution;
      in Bayesian transfer, the source task supplies the prior.
      Raina et al. 2006: transfer a Gaussian prior.
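      To make the prior + data = posterior pattern concrete, here is a minimal sketch of
      Bayesian transfer for a single Gaussian mean, in the spirit of (but not reproducing)
      Raina et al.: the prior is fit on source-task data, then updated with sparse
      target-task data. All names and numbers below are hypothetical.

        import numpy as np

        def gaussian_posterior(prior_mean, prior_var, data, noise_var):
            # Conjugate update: Gaussian prior + Gaussian-noise observations -> Gaussian posterior.
            n = len(data)
            post_var = 1.0 / (1.0 / prior_var + n / noise_var)
            post_mean = post_var * (prior_mean / prior_var + np.sum(data) / noise_var)
            return post_mean, post_var

        # Transfer: fit the prior on source-task data, then update with target-task data.
        source = np.array([4.8, 5.1, 5.3, 4.9])    # hypothetical source observations
        target = np.array([5.6, 5.4])              # sparse target observations
        mean, var = gaussian_posterior(source.mean(), source.var(), target, noise_var=1.0)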
  8. Transfer in Inductive Learning
      Hierarchical methods: concepts build on previously learned ones
      (e.g., Line and Curve combine into Circle and Surface, which combine into Pipe).
      Stracuzzi 2006: learn Boolean concepts that can depend on each other.
  9. Transfer in Inductive Learning
      Dealing with missing data or labels: a source task S can help supply them for the target task T.
      Shi et al. 2008: transfer via active learning.
  10. Reinforcement Learning
      [Diagram: the agent-environment loop]
      The agent starts with Q(s1, a) = 0 and chooses π(s1) = a1; after each step it updates
      Q(s1, a1) ← Q(s1, a1) + Δ, then chooses π(s2) = a2, and so on.
      The environment returns transitions and rewards:
      δ(s1, a1) = s2, r(s1, a1) = r2; δ(s2, a2) = s3, r(s2, a2) = r3.
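      As a concrete sketch of this loop, here is a minimal tabular Q-learning update in
      Python; the state and action names mirror the slide, and alpha and gamma are
      assumed hyperparameters.

        from collections import defaultdict

        def q_learning_step(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
            # Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
            best_next = max(Q[(s_next, a2)] for a2 in actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

        Q = defaultdict(float)    # Q(s1, a) = 0 initially, as on the slide
        # After the environment returns delta(s1, a1) = s2 and r(s1, a1) = r2:
        q_learning_step(Q, 's1', 'a1', r=1.0, s_next='s2', actions=['a1', 'a2'])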
  11. Transfer in Reinforcement Learning
      Five approaches: starting-point methods, hierarchical methods, alteration methods,
      imitation methods, and new RL algorithms.
  12. Transfer in Reinforcement Learning
      Starting-point methods: transfer an initial Q-table from the source task, so that
      target-task training begins ahead of where it would with no transfer.
      Taylor et al. 2005: value-function transfer.
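      A minimal sketch of what a starting-point method might look like in code, assuming
      hand-written inter-task mapping functions (state_map and action_map below are
      hypothetical stand-ins, not Taylor et al.'s actual mappings):

        def transfer_q_table(source_Q, state_map, action_map, target_states, target_actions):
            # Initialize the target Q-table from mapped source values instead of zeros.
            target_Q = {}
            for s in target_states:
                for a in target_actions:
                    src_key = (state_map(s), action_map(a))
                    target_Q[(s, a)] = source_Q.get(src_key, 0.0)  # unmapped pairs start at zero
            return target_Q

        # Toy usage; target-task training then proceeds as usual, but from a warmer start.
        target_Q = transfer_q_table({('s1', 'a1'): 0.7},
                                    state_map=lambda s: 's1', action_map=lambda a: 'a1',
                                    target_states=['t1'], target_actions=['b1'])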
  13. Transfer in Reinforcement Learning
      Hierarchical methods: treat learned source behaviors as sub-tasks of the target
      (e.g., Run and Kick compose into Pass and Shoot, which compose into Soccer).
      Mehta et al. 2008: transfer a learned hierarchy.
  14. Transfer in Reinforcement Learning
      Alteration methods: alter task S itself, mapping its original states, actions,
      and rewards to new ones.
      Walsh et al. 2006: transfer aggregate states.
  15. Transfer in Reinforcement Learning
      New RL algorithms: change the learning algorithm itself so that it can accept
      transferred knowledge (the agent-environment loop is the same as on slide 10).
      Torrey et al. 2006: transfer advice about skills.
  16. Transfer in Reinforcement Learning
      Imitation methods: during target-task training, the learner spends some periods
      following the source policy instead of its own.
      Torrey et al. 2007: demonstrate a strategy.
  17. My Research
      Among the five approaches (starting-point, hierarchical, alteration, and imitation
      methods, plus new RL algorithms), my work contributes skill transfer, which uses a
      new RL algorithm, and macro transfer, which uses imitation.
  18. RoboCup Domain
      Tasks: 3-on-2 KeepAway, 3-on-2 BreakAway, 2-on-1 BreakAway, 3-on-2 MoveDownfield.
  19. Inductive Logic Programming
      Candidate rules generated during the general-to-specific search:
      IF [ ] THEN pass(Teammate)
      IF distance(Teammate) ≤ 5 THEN pass(Teammate)
      IF distance(Teammate) ≤ 10 THEN pass(Teammate)
      …
      IF distance(Teammate) ≤ 5 AND angle(Teammate, Opponent) ≥ 15 THEN pass(Teammate)
      IF distance(Teammate) ≤ 5 AND angle(Teammate, Opponent) ≥ 30 THEN pass(Teammate)
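      The rule sequence above comes from refining clauses one literal at a time. A minimal
      sketch of that refinement step, with hypothetical literal strings standing in for
      real ILP literals and a user-supplied covers(rule, example) test:

        def refine(rule, literals):
            # Specialize a clause by adding one new literal at a time.
            return [rule + [lit] for lit in literals if lit not in rule]

        def score(rule, positives, negatives, covers):
            # Simple coverage score: positive examples covered minus negatives covered.
            return (sum(covers(rule, ex) for ex in positives)
                    - sum(covers(rule, ex) for ex in negatives))

        literals = ['distance(Teammate) <= 5', 'distance(Teammate) <= 10',
                    'angle(Teammate, Opponent) >= 15', 'angle(Teammate, Opponent) >= 30']
        candidates = refine(['distance(Teammate) <= 5'], literals)
        # e.g. [['distance(Teammate) <= 5', 'distance(Teammate) <= 10'], ...]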
  20. Advice Taking
      Batch Reinforcement Learning via Support Vector Regression (RL-SVR): the agent
      interacts with the environment in batches, computing Q-functions after each batch.
      Find Q-functions that minimize: ModelSize + C × DataMisfit.
  21. Advice Taking
      Batch Reinforcement Learning with Advice (KBKR): the same batch setup, with advice
      added as an extra input.
      Find Q-functions that minimize: ModelSize + C × DataMisfit + µ × AdviceMisfit.
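      Written out, the objective on these two slides has roughly the following shape; this
      is a sketch of the general form only, since the exact norms and slack variables vary
      across KBKR formulations:

        \min_{w,b}\;\; \|w\|_1 \;+\; C \sum_i \bigl| y_i - (w^\top x_i + b) \bigr| \;+\; \mu \sum_j \xi_j

      where ||w||_1 is the ModelSize term, y_i are the Q-value targets for the batch data
      (DataMisfit), and each slack xi_j >= 0 measures how much the learned Q-function
      violates advice rule j (AdviceMisfit).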
  22. Skill Transfer Algorithm
      Source task → ILP → a skill rule such as:
      IF distance(Teammate) ≤ 5 AND angle(Teammate, Opponent) ≥ 30 THEN pass(Teammate)
      The rule goes through a mapping to target-task terms, then into advice taking for
      the target task, alongside any human advice.
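      A minimal end-to-end sketch of this pipeline in Python; the function arguments are
      hypothetical placeholders for the ILP learner, the predicate mapping, and the
      advice-taking learner:

        def skill_transfer(source_games, learn_rules, map_rule, human_advice, add_advice):
            # 1. Learn skill rules in the source task with ILP.
            rules = learn_rules(source_games)
            # 2. Map their predicates to the target task.
            mapped = [map_rule(r) for r in rules]
            # 3. Give the mapped rules, plus any human advice, to the advice-taking learner.
            add_advice(mapped + human_advice)

        skill_transfer(
            source_games=[],
            learn_rules=lambda games: ['IF distance(Teammate) <= 5 THEN pass(Teammate)'],
            map_rule=lambda rule: rule,    # identity mapping, for illustration only
            human_advice=[],
            add_advice=print,
        )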
  23. Selected Results
      [Chart: skill transfer to 3-on-2 BreakAway from several tasks]
  24. Macro-Operators
      [Diagram: a macro as a finite-state machine. Nodes are actions
      (pass(Teammate), move(Direction), shoot(goalRight), shoot(goalLeft));
      each arc carries a learned IF [ ... ] THEN rule that decides the transition.]
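      One way to render such a macro in code is as a small finite-state machine whose arcs
      carry learned conditions; this is an illustrative sketch, and the condition function
      below is a hypothetical stand-in for an ILP-learned rule:

        class Macro:
            def __init__(self, nodes, arcs):
                self.nodes = nodes    # action nodes, e.g. 'pass(Teammate)'
                self.arcs = arcs      # {(from_node, to_node): condition on the state}

            def next_action(self, current, state):
                # Follow the first outgoing arc whose learned condition holds.
                for (src, dst), condition in self.arcs.items():
                    if src == current and condition(state):
                        return dst
                return None           # macro ends; control returns to the base policy

        macro = Macro(
            nodes=['pass(Teammate)', 'shoot(goalRight)'],
            arcs={('pass(Teammate)', 'shoot(goalRight)'): lambda s: s.get('open_shot', False)},
        )
        print(macro.next_action('pass(Teammate)', {'open_shot': True}))  # -> shoot(goalRight)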
  25. Demonstration
      An imitation method: the target agent follows the source policy for an initial
      period of training, then continues learning on its own.
  26. Macro Transfer Algorithm
      Source task → ILP → a macro, delivered to the target task by demonstration.
  27. Macro Transfer Algorithm
      Learning structures.
      Positive examples: BreakAway games that score.
      Negative examples: BreakAway games that do not score.
      ILP learns rules such as:
      IF actionTaken(Game, StateA, pass(Teammate), StateB)
         AND actionTaken(Game, StateB, move(Direction), StateC)
         AND actionTaken(Game, StateC, shoot(goalRight), StateD)
         AND actionTaken(Game, StateD, shoot(goalLeft), StateE)
      THEN isaGoodGame(Game)
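      To train such rules, each game trace has to be encoded as actionTaken facts. Below is
      a minimal sketch of that encoding; the (state, action) trace format is an assumption
      made for illustration:

        def game_to_facts(game_id, trace):
            # Turn a trace [(state, action), ..., (final_state, None)] into actionTaken/4 facts.
            facts = []
            for i in range(len(trace) - 1):
                (s, a), (s_next, _) = trace[i], trace[i + 1]
                facts.append(f'actionTaken({game_id}, {s}, {a}, {s_next})')
            return facts

        # Positives come from games that scored; negatives from games that did not.
        print(game_to_facts('game1', [('stateA', 'pass(t1)'),
                                      ('stateB', 'shoot(goalRight)'),
                                      ('stateC', None)]))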
  28. Macro Transfer Algorithm
      Learning rules for arcs.
      Positive examples: states in good games that took the arc.
      Negative examples: states in good games that could have taken the arc but did not.
      ILP learns entry and loop conditions for each arc (e.g., between pass(Teammate)
      and shoot(goalRight)):
      IF [ … ] THEN enter(State)
      IF [ … ] THEN loop(State, Teammate)
  29. Selected Results
      [Chart: macro transfer to 3-on-2 BreakAway from 2-on-1 BreakAway]
  30. Summary
      Machine learning is often designed for standalone tasks.
      Transfer is a natural learning ability that we would like to incorporate into
      machine learners.
      There are some successes, but challenges remain, such as avoiding negative transfer
      and automating the mapping between tasks.
