Policy Transfer via Markov Logic Networks
Lisa Torrey and Jude Shavlik
University of Wisconsin, Madison WI, USA
Outline
- Background
- Approaches for transfer in reinforcement learning
- Relational transfer with Markov Logic Networks
- Two new algorithms for MLN transfer
Transfer Learning
- Given: a source task S
- Learn: a target task T
Reinforcement Learning
- An agent interacts with an environment, balancing exploration and exploitation to maximize reward
- Environment dynamics: δ(s1, a1) = s2 with r(s1, a1) = r2; δ(s2, a2) = s3 with r(s2, a2) = r3
- Policy: π(s1) = a1, π(s2) = a2
- Q-values start at zero (Q(s1, a) = 0) and are updated: Q(s1, a1) ← Q(s1, a1) + Δ
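As a concrete reference for the notation above, here is a minimal sketch of tabular Q-learning with epsilon-greedy action selection. The parameter values and function names are illustrative assumptions, not necessarily the learner used in BreakAway.

    import random
    from collections import defaultdict

    ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.05   # assumed learning parameters
    Q = defaultdict(float)                    # Q(s, a) starts at 0

    def choose_action(state, actions):
        # Exploration vs. exploitation: act randomly with probability EPSILON,
        # otherwise take the action with the highest current Q-value.
        if random.random() < EPSILON:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(state, a)])

    def q_update(state, action, reward, next_state, actions):
        # Q(s1, a1) <- Q(s1, a1) + delta, where delta moves the estimate toward
        # the observed reward plus the discounted value of the next state.
        best_next = max(Q[(next_state, a)] for a in actions)
        delta = ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        Q[(state, action)] += delta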
Learning Curves
- [Plot: performance vs. training, showing the possible benefits of transfer: higher start, higher slope, higher asymptote]
RoboCup Domain
- 2-on-1 BreakAway
- 3-on-2 BreakAway
- Hand-coded defenders
- Single learning agent
Outline
- Background
- Approaches for transfer in reinforcement learning
- Relational transfer with Markov Logic Networks
- Two new algorithms for MLN transfer
Related Work
- Madden & Howley 2004: learn a set of rules; use during exploration steps
- Croonenborghs et al. 2007: learn a relational decision tree; use as an additional action
- Our prior work, 2007: learn a relational macro; use as a demonstration
Outline
- Background
- Approaches for transfer in reinforcement learning
- Relational transfer with Markov Logic Networks
- Two new algorithms for MLN transfer
Relational Transfer
- Grounded literals in the source task: pass(t1), pass(t2), goalLeft, goalRight
- Example relational rule with variables:
  IF   distance(GoalPart) > 10
  AND  angle(ball, Teammate, Opponent) > 30
  THEN pass(Teammate)
Markov Logic Networks
- Richardson and Domingos, Machine Learning 2006
- Formulas (F), e.g.: evidence1(X) AND query(X); evidence2(X) AND query(X)
- Weights (W), e.g.: w0 = 1.1, w1 = 0.9
- [Diagram: ground network over query(x1), query(x2) and evidence nodes e1, e2, ...]
- n_i(world) = number of true groundings of the ith formula in the world
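To make the weighted-formula semantics concrete: an MLN assigns P(world) proportional to exp(Σ_i w_i · n_i(world)). Below is a minimal sketch that scores a small enumerated set of possible worlds this way; world_probabilities and count_true_groundings are illustrative names, not Alchemy's API.

    import math

    def world_probabilities(worlds, weights, count_true_groundings):
        # Unnormalized score of each world: exp of the weighted number of
        # true groundings of every formula, then normalize by Z.
        scores = [
            math.exp(sum(w * count_true_groundings(i, world)
                         for i, w in enumerate(weights)))
            for world in worlds
        ]
        z = sum(scores)   # partition function over the enumerated worlds
        return [s / z for s in scores]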
Transfer with MLNs
- Algorithm 1: transfer the source-task Q-function as an MLN (Task S → MLN Q-function → Task T)
- Algorithm 2: transfer the source-task policy as an MLN (Task S → MLN Policy → Task T)
Demonstration Method
- Use the MLN to choose actions during an initial demonstration period
- Then use regular target-task training
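A rough sketch of this schedule, assuming a fixed-length demonstration period and hypothetical helpers (mln_action, standard_rl_episode); the actual episode count and learner are not specified on this slide.

    DEMO_EPISODES = 100   # assumed length of the demonstration period

    def train_target_task(num_episodes, mln_action, standard_rl_episode):
        for episode in range(num_episodes):
            if episode < DEMO_EPISODES:
                # Early on, the transferred MLN picks the actions while the
                # target-task learner trains on the resulting experience.
                standard_rl_episode(action_selector=mln_action)
            else:
                # Afterwards, switch to regular target-task training.
                standard_rl_episode(action_selector=None)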
Outline
- Background
- Approaches for transfer in reinforcement learning
- Relational transfer with Markov Logic Networks
- Two new algorithms for MLN transfer
MLN Q-function Transfer Algorithm
- Source task → Aleph, Alchemy → MLN Q-function
- One MLN per action (MLN for action 1, MLN for action 2, ...), each mapping State → Q-value
- MLN Q-function → Demonstration → Target task
MLN Q-function
- The MLN for an action gives the probability that its Q-value Qa falls into each bin: 0 ≤ Qa < 0.2, 0.2 ≤ Qa < 0.4, 0.4 ≤ Qa < 0.6, ...
- [Plots: probability vs. bin number]
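One plausible way to read a Q-value off these bin probabilities is a probability-weighted average of a representative value per bin. The sketch below assumes that scheme (bin_probs from MLN inference, bin_values e.g. the mean training Q in each bin) rather than quoting the exact estimator from the paper.

    def estimate_q(bin_probs, bin_values):
        # bin_probs: MLN-inferred probability that Q falls in each bin
        # bin_values: a representative Q-value for each bin (assumed)
        total = sum(bin_probs)
        return sum(p * v for p, v in zip(bin_probs, bin_values)) / total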
Learning an MLN Q-function
- Bins: hierarchical clustering
- Formulas for each bin: Aleph (Srinivasan), e.g. IF … THEN 0 < Q < 0.2
- Weights: Alchemy (U. Washington), e.g. w0 = 1.1, w1 = 0.9, ...
Selecting Rules to be MLN Formulas
- Aleph rules are considered in order of precision (e.g. Rule 1: precision 1.0, Rule 2: 0.99, Rule 3: 0.96, ...)
- A rule is added to the ruleset only if it increases the ruleset's F-score
- F = (2 × Precision × Recall) / (Precision + Recall)
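A minimal sketch of this selection loop, assuming each rule carries its Aleph precision and that a helper (ruleset_precision_recall) evaluates a candidate ruleset on source-task examples; both names are assumptions.

    def f_score(precision, recall):
        # F = (2 * Precision * Recall) / (Precision + Recall)
        if precision + recall == 0:
            return 0.0
        return 2 * precision * recall / (precision + recall)

    def select_rules(aleph_rules, ruleset_precision_recall):
        # Consider rules in order of decreasing precision; keep a rule only
        # if it raises the F-score of the growing ruleset.
        ruleset, best_f = [], 0.0
        for rule in sorted(aleph_rules, key=lambda r: r.precision, reverse=True):
            candidate = ruleset + [rule]
            f = f_score(*ruleset_precision_recall(candidate))
            if f > best_f:
                ruleset, best_f = candidate, f
        return ruleset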
MLN Q-function Rules
Examples for transfer from 2-on-1 BreakAway to 3-on-2 BreakAway:

IF   distance(me, GoalPart) ≥ 42
     distance(me, Teammate) ≥ 39
THEN pass(Teammate) falls into [0, 0.11]

IF   angle(topRight, goalCenter, me) ≤ 42
     angle(topRight, goalCenter, me) ≥ 55
     angle(goalLeft, me, goalie) ≥ 20
     angle(goalCenter, me, goalie) ≤ 30
THEN pass(Teammate) falls into [0.11, 0.27]

IF   distance(Teammate, goalCenter) ≤ 9
     angle(topRight, goalCenter, me) ≤ 85
THEN pass(Teammate) falls into [0.27, 0.43]
MLN Q-function Results
- [Results plot: transfer from 2-on-1 BreakAway to 3-on-2 BreakAway]
Outline
- Background
- Approaches for transfer in reinforcement learning
- Relational transfer with Markov Logic Networks
- Two new algorithms for MLN transfer
MLN Policy-Transfer Algorithm
- Source task → Aleph, Alchemy → MLN Policy
- The MLN (F, W) maps State and Action → Probability
- MLN Policy → Demonstration → Target task
MLN Policy
- The MLN assigns a probability to each action: move(ahead), pass(Teammate), shoot(goalLeft), ...
- Policy: choose the highest-probability action
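Executing the MLN policy then reduces to an argmax over inferred action probabilities; mln_probability here stands in for an inference call (e.g. through Alchemy) and is an assumed name.

    def mln_policy(state, candidate_actions, mln_probability):
        # Query the MLN for each grounded action's probability in this state
        # and choose the highest-probability action.
        return max(candidate_actions, key=lambda a: mln_probability(state, a))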
Learning an MLN Policy
- Formulas for each action: Aleph (Srinivasan), e.g. IF … THEN pass(Teammate)
- Weights: Alchemy (U. Washington), e.g. w0 = 1.1, w1 = 0.9, ...
MLN Policy Rules
Examples for transfer from 2-on-1 BreakAway to 3-on-2 BreakAway:

IF   angle(topRight, goalCenter, me) ≤ 70
     timeLeft ≥ 98
     distance(me, Teammate) ≥ 3
THEN pass(Teammate)

IF   distance(me, GoalPart) ≥ 36
     distance(me, Teammate) ≥ 12
     timeLeft ≥ 91
     angle(topRight, goalCenter, me) ≤ 80
THEN pass(Teammate)

IF   distance(me, GoalPart) ≥ 27
     angle(topRight, goalCenter, me) ≤ 75
     distance(me, Teammate) ≥ 9
     angle(Teammate, me, goalie) ≥ 25
THEN pass(Teammate)
MLN Policy Results
- [Results plot: MLN policy transfer from 2-on-1 BreakAway to 3-on-2 BreakAway]
Additional Experimental Findings
- ILP rulesets can represent a policy by themselves. Does the MLN provide extra benefit? Yes, MLN policies perform as well or better.
- MLN policies can include action-sequence knowledge. Does this improve transfer? No, the Markov assumption appears to hold in RoboCup.
Conclusions
- MLN transfer can improve reinforcement learning: higher initial performance
- Policies transfer better than Q-functions: simpler and more general
- Policies can transfer better than macros, but not always: more detailed knowledge, risk of overspecialization
- MLNs transfer better than rulesets: statistical-relational over pure relational
- Action-sequence information is redundant: the Markov assumption holds in our domain
Future Work
- Refinement of transferred knowledge
  - Revising weights
  - Relearning rules (Mihalkova et al. 2007)
- [Diagram: revising a too-specific clause or a too-general clause into a better clause]
Future Work
- Relational reinforcement learning
  - Q-learning with an MLN Q-function
  - Policy search with MLN policies or a macro
- MLN Q-functions lose too much information: [plot: probability vs. bin number]
Thank You
- Co-author: Jude Shavlik
- Grants: DARPA HR0011-04-1-0007, DARPA FA8650-06-C-7606
