Empirical Game-Theoretic Analysis for Practical Strategic Reasoning

Keynote by Michael Wellman at the 15th International Conference on Principles and Practice of Multi-Agent Systems (PRIMA-2012), during Knowledge Technology Week (KTW2012), September 3-7, 2012, Kuching, Sarawak, Malaysia.

Document Transcript

    • Empirical Game-Theoretic Analysis for Practical Strategic Reasoning
      Michael P. Wellman, University of Michigan

    • Planning in Strategic Environments
      – Planning problem: find agent behavior satisfying/optimizing objectives with respect to the environment; strives for rationality.
        [figure: Agent1 and Agent2 interacting through a shared Environment]
      – When the environment contains other agents, model them as rational planners as well: the problem is a game, and the search is now multi-dimensional, with a different (global) objective.
    • Real-World Games
      Complex dynamics and uncertainty:
      – rich strategy space (a strategy maps obs* × time to an action)
      – severely incomplete information: interdependent types (signals); information partially revealed over time
      → analytic game-theoretic solutions are few and far between.
      Two approaches:
      1. Analyze (stylized) approximations: one-shot, complete information, ...
      2. Simulation-based methods: search; empirical methods drawing on statistics, machine learning, ...

    • Empirical Game-Theoretic Analysis (EGTA)
      – Game described procedurally, with no directly usable analytical form.
      – Parametrize the strategy space based on the agent architecture.
      – Selectively explore the strategy/profile space.
      – Induce a game model (payoff function) from simulation data: an empirical game.
    • EGTA Process
      1. Parametrize the strategy space (strategy set).
      2. Estimate the empirical game: select profiles from the profile space, run the simulator, collect payoff data.
      3. Solve the empirical game: game analysis (NE).

    • Simulation-Based Game Modeling
      [figure: simulation runs populating cells of a payoff matrix, e.g. 5,1  0,2  6,8]
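To make the estimate-then-solve pipeline concrete, here is a minimal sketch for a symmetric two-player game. The strategy names and the simulator `simulate_payoff` are hypothetical stand-ins for a real procedurally defined game, and equilibrium analysis is reduced to checking pure-strategy profiles.

    import itertools
    import random
    import statistics

    # Hypothetical strategy set and simulator (stand-ins for a real simulator).
    STRATEGIES = ["aggressive", "balanced", "passive"]

    def simulate_payoff(s_row, s_col):
        """One noisy simulation run; returns (row payoff, column payoff)."""
        base = {"aggressive": 3.0, "balanced": 2.5, "passive": 2.0}
        return (base[s_row] - 0.5 * (s_col == "aggressive") + random.gauss(0, 0.1),
                base[s_col] - 0.5 * (s_row == "aggressive") + random.gauss(0, 0.1))

    def estimate_empirical_game(samples_per_profile=30):
        """Step 2: estimate mean payoffs for every strategy profile."""
        game = {}
        for profile in itertools.product(STRATEGIES, repeat=2):
            runs = [simulate_payoff(*profile) for _ in range(samples_per_profile)]
            game[profile] = (statistics.mean(r[0] for r in runs),
                             statistics.mean(r[1] for r in runs))
        return game

    def pure_nash(game):
        """Step 3: profiles where neither player gains by unilateral deviation."""
        eqs = []
        for (r, c), (ur, uc) in game.items():
            row_ok = all(game[(r2, c)][0] <= ur for r2 in STRATEGIES)
            col_ok = all(game[(r, c2)][1] <= uc for c2 in STRATEGIES)
            if row_ok and col_ok:
                eqs.append((r, c))
        return eqs

    print(pure_nash(estimate_empirical_game()))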
    • TAC Supply Chain Management Game
      [figure: suppliers Pintel and IMD (CPUs), Basus and Macrostar (motherboards), Mec and Queenmax (memory), Watergate and Mintor (hard disks) respond to component RFQs from Manufacturers 1-6, who in turn bid on customer PC RFQs and fill PC orders]
      – 10 component types, 16 PC types
      – 220 simulation days, 15 seconds per day

    • Two-Strategy Game (Unpreempted)
      [figure: payoff analysis of the two-strategy empirical game]
    • Two-Strategy Game (Unpreempted)
      [figure continued]

    • Three-Strategy Game: Deviations
      [figure: deviation analysis for the three-strategy empirical game]
    • Ranking Strategies
      – Often we want to know: which is "better", strategy A or strategy B?
      – Problem: the answer depends on what the other agents do; a strategy cannot be evaluated independent of its strategic context.
      – Which context? Self-play; fixed proportions of other agents; equilibrium (NE regret).

    • Ranking Strategies: TAC/SCM-07
      [figure: SCM-07 tournament ranking vs. SCM-07 EGTA ranking, from PR Jordan PhD thesis, 2009]
    • Strategy Ranking (TAC Travel)
      [figure: strategies ranked by deviation gain with respect to the final equilibrium context, from LJ Schvartzman PhD thesis, 2009]

    • Strategy Ranking (CDA)

      strategy   NE1 regret   NE2 regret   symm. profile payoff
      GDX        0            1.32         247.98
      GD         0.49         3.26         248.57
      RB         2.20         8.64         248.08
      ZIP        2.90         9.86         247.95
      Kaplan     4.56         24.55        2.02
      ZIbtq      14.67        17.44        247.45
      ZI         16.42        16.82        248.07
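One standard reading of NE regret, sketched below for a symmetric two-player game: rank each pure strategy by how much it loses, relative to a best response, when played against the candidate equilibrium mixture. The payoff matrix, strategy names, and mixture here are illustrative, not the CDA data above.

    import numpy as np

    def rank_by_ne_regret(payoffs, eq_mix, names):
        """Rank strategies in a symmetric 2-player game by NE regret:
        loss from playing strategy i against the equilibrium mixture,
        relative to the best response to that mixture."""
        vs_eq = payoffs @ eq_mix          # expected payoff of each pure strategy vs eq
        regrets = vs_eq.max() - vs_eq     # best-response payoff minus own payoff
        return sorted(zip(names, regrets), key=lambda x: x[1])

    # Illustrative 3-strategy symmetric game (row player's payoffs).
    names = ["GD-like", "ZIP-like", "ZI-like"]
    payoffs = np.array([[4.0, 3.0, 5.0],
                        [3.5, 3.0, 4.0],
                        [2.0, 2.5, 3.0]])
    eq_mix = np.array([0.7, 0.3, 0.0])    # candidate equilibrium mixture
    for name, r in rank_by_ne_regret(payoffs, eq_mix, names):
        print(f"{name}: regret {r:.2f}")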
    • Strategy Ranking (SimSPSB)
      SC Local: heuristic search for the optimal bid in response to self-confirming prices.
      [figure: NE regret of strategy variants from the SC Local, SC BidEval, Local, BidEval, and AvgMU families across environments U[6,4], U[5,5], U[5,8], H[5,3], H[5,5]]
      Bottom line: SC Local is most effective overall, and robust across environments.

    • Scaling #Players
      [figure: log(|G|), i.e. strategic complexity, grows steeply with the number of players]
    • Improving Scalability
      – Exploit locality of interaction: graphical games, MAIDs, action-graph games, ...
      – Aggregate agents: hierarchical reduction (Wellman et al., AAAI-05); clustering (Ficici et al., UAI-08); deviation-preserving reduction (Wiedenbeck & Wellman, 2012).

    • Game Size
      Number of profiles in a symmetric game with N players and strategy set S:
      \binom{N + |S| - 1}{N}
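A quick sketch of how this count explodes with N, and how player reduction tames it, computing the formula above directly:

    from math import comb

    def num_profiles(n_players, n_strategies):
        """Profiles in a symmetric game: C(N + |S| - 1, N)."""
        return comb(n_players + n_strategies - 1, n_players)

    for n in (2, 4, 6, 12, 100):
        print(f"N={n:3d}, |S|=6: {num_profiles(n, 6):,} profiles")
    # Reducing a 12-player game to 4 aggregated players shrinks the
    # profile space from 6,188 to 126 (with |S| = 6).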
    • Player Reduction
      Approximate the full game by a reduced game over fewer, aggregated players [Wellman et al., 2005].

    • Hierarchical Reduction
      [figure: each player in the reduced game stands in for an equal-size block of players in the full game]
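A minimal sketch of hierarchical reduction for a symmetric game, under the usual reading: each of k reduced players controls N/k full-game players, so a reduced profile is evaluated by querying the full game at the corresponding aggregated profile. The payoff function `full_payoff` is a hypothetical congestion-style stand-in.

    from itertools import combinations_with_replacement

    def hierarchical_reduction(full_payoff, strategies, n_full, n_reduced):
        """Build the reduced-game payoff table for a symmetric game.
        Each reduced player controls n_full // n_reduced full-game players."""
        block = n_full // n_reduced
        reduced = {}
        for profile in combinations_with_replacement(strategies, n_reduced):
            # Expand: every reduced player's strategy is played by `block` agents.
            full_profile = tuple(s for s in profile for _ in range(block))
            # Payoff to a reduced player = payoff of one of its full-game agents.
            reduced[profile] = {s: full_payoff(s, full_profile) for s in set(profile)}
        return reduced

    # Hypothetical full-game payoff: falls with the number of players
    # choosing the same strategy (congestion).
    def full_payoff(s, full_profile):
        return 10.0 - full_profile.count(s)

    table = hierarchical_reduction(full_payoff, ["a", "b"], n_full=12, n_reduced=4)
    print(table[("a", "a", "b", "b")])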
    • Twins Reduction [Ficici et al., 2008]
      [figure: the full game vs. its two-player "twins" approximation]
    • Deviation-Preserving Reduction
      [figure: each reduced player views itself as a single full-game agent, with the other reduced players standing in for aggregates of the remaining full-game players]
    • [figure: four panels comparing hierarchical reduction (HR), deviation-preserving reduction (DPR), and twins reduction (TR) on a 12p-6s network formation game, a 100p-2s congestion game, a 12p-6s local effect game, and a 12p-6s congestion game]
    • Role-Symmetric Games
      [figure: a 32-player role-symmetric game]

    • Hierarchical Reduction
      [figure: hierarchical reduction applied in the role-symmetric setting]
    • Deviation-Preserving Reduction
      [figure: DPR applied in the role-symmetric setting]
    • Iterative EGTA Process
      [figure: the EGTA loop: select a profile, simulate, add payoff data to the empirical game, analyze (NE); then either draw more samples (sampling control problem), add a strategy (strategy exploration problem), refine the game model (game model induction problem), or end]

    • Sampling Control Problem
      – Revealed payoff model: each sample provides an exact payoff.
        • Minimum-regret-first search (MRFS): attempts to refute the best current candidate.
      – Noisy payoff model: each sample is drawn from a payoff distribution.
        • Information-gain search (IGS): sample the profile maximizing the entropy difference with respect to the probability of being the min-regret profile.
    • Min-Regret-First Search: Worked Example
      Bimatrix game (row payoff, column payoff):

             c1     c2     c3     c4
      r1    9,5    3,3    2,5    4,8
      r2    6,4    8,8    3,0    5,3
      r3    2,2    2,1    3,2    4,6
      r4    4,4    2,0    2,2    9,3

      Start from an arbitrary profile, (r1,c1), with ε-bound 0. At each step, select a random deviation from the current best (lowest ε-bound) profile and evaluate it; a profile's ε-bound is the best gain over its evaluated deviations, a lower bound on its true regret that tightens as more neighbors are evaluated. The sequence shown on the slides visits (r1,c2), (r2,c1), (r3,c1), (r1,c4), (r2,c4), (r2,c2), (r2,c3), (r3,c2), and finally (r4,c2). Once every unilateral deviation from (r2,c2) has been evaluated and none is profitable, its ε-bound of 0 is exact, and the Nash equilibrium is confirmed (see the sketch below).
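A compact sketch of MRFS on the game above, assuming the revealed-payoff model (each evaluation returns exact payoffs) and randomized deviation selection as on the slides:

    import random

    # Bimatrix game from the walkthrough: P[(r, c)] = (row payoff, column payoff).
    ROWS = ["r1", "r2", "r3", "r4"]
    COLS = ["c1", "c2", "c3", "c4"]
    R = [[9, 3, 2, 4], [6, 8, 3, 5], [2, 2, 3, 4], [4, 2, 2, 9]]
    C = [[5, 3, 5, 8], [4, 8, 0, 3], [2, 1, 2, 6], [4, 0, 2, 3]]
    P = {(r, c): (R[i][j], C[i][j])
         for i, r in enumerate(ROWS) for j, c in enumerate(COLS)}

    def deviations(profile):
        """Unilateral deviations, tagged with the deviating player (0=row, 1=col)."""
        r, c = profile
        return [((r2, c), 0) for r2 in ROWS if r2 != r] + \
               [((r, c2), 1) for c2 in COLS if c2 != c]

    def eps_bound(profile, evaluated):
        """Lower bound on regret: best gain over deviations evaluated so far."""
        gains = [P[d][i] - P[profile][i]
                 for d, i in deviations(profile) if d in evaluated]
        return max(gains, default=0)

    def mrfs(start=("r1", "c1")):
        evaluated = {start}
        while True:
            best = min(evaluated, key=lambda p: eps_bound(p, evaluated))
            fully = all(d in evaluated for d, _ in deviations(best))
            if fully and eps_bound(best, evaluated) == 0:
                return best  # no profitable deviation exists: NE confirmed
            # Refutation attempt: evaluate an unseen deviation of the lowest-
            # bound candidate that still has unevaluated deviations.
            for cand in sorted(evaluated, key=lambda p: eps_bound(p, evaluated)):
                unseen = [d for d, _ in deviations(cand) if d not in evaluated]
                if unseen:
                    evaluated.add(random.choice(unseen))
                    break
            else:
                return best  # whole game evaluated; best is the min-regret profile

    print(mrfs())  # ('r2', 'c2'): the game's unique pure Nash equilibrium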
    • Finding Approximate PSNE
      [figure]

    • Iterative EGTA Process
      [figure: the EGTA loop, revisited; next, the game model induction problem]
    • Construct Empirical Game
      – Simplest approach: direct estimation; employ control variates and other variance-reduction techniques.
        [figure: payoff data from selected profiles, (s1, u(s1)), ..., (sL, u(sL)), used to estimate the payoff function u(·)]

    • Payoff Function Regression (FPSB2 example)
      – Strategy space Si = [0,1]; generate data (simulations); learn a regression of the payoff function; solve the learned game.
        [figure: sampled payoff matrix over bids {0, 0.5, 1} and the learned surface; equilibrium ≈ (0.32, 0.32)]
      Vorobeychik et al., Machine Learning, 2007.
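A minimal sketch of the regression step, assuming a two-player symmetric game over a one-dimensional strategy space. The noisy simulator is a hypothetical stand-in (not the FPSB2 model of the slide), and the learned game is solved by best-response iteration on a grid.

    import numpy as np

    rng = np.random.default_rng(0)

    def simulate(x, y):
        """Hypothetical noisy simulator: payoff for playing strategy x
        against an opponent playing y (true form unknown to the analyst)."""
        return x * (1 - x) + 0.3 * x * y + rng.normal(0, 0.02)

    # 1. Generate data: sample strategy profiles in [0,1]^2.
    X = rng.uniform(0, 1, size=(400, 2))
    u = np.array([simulate(x, y) for x, y in X])

    # 2. Learn regression: quadratic features, ordinary least squares.
    def features(x, y):
        return np.stack([np.ones_like(x), x, y, x*x, x*y, y*y], axis=-1)

    theta, *_ = np.linalg.lstsq(features(X[:, 0], X[:, 1]), u, rcond=None)
    u_hat = lambda x, y: features(np.asarray(x), np.asarray(y)) @ theta

    # 3. Solve the learned game: best-response iteration on a grid.
    grid = np.linspace(0, 1, 201)
    y = 0.5
    for _ in range(50):
        y = grid[np.argmax(u_hat(grid, np.full_like(grid, y)))]
    print(f"symmetric equilibrium estimate: ({y:.2f}, {y:.2f})")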
    • Generalization Risk Approach
      – Model variations: functional forms, relationship structures, parameters; strategy granularity.
      – Approach (see the sketch below): treat each candidate game model as a predictor for payoff data; adopt a loss function for the predictor; select the model candidate minimizing expected loss.
        [figure: cross-validation; observation data split into folds for training and validation]
      Jordan et al., AAMAS-09.

    • Iterative EGTA Process
      [figure: the EGTA loop, revisited; next, the strategy exploration problem]
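A minimal sketch of that model selection via k-fold cross-validated squared loss, reusing X and u from the regression sketch above; the two candidate feature maps (linear vs. quadratic) are illustrative.

    import numpy as np

    def cv_loss(feature_fn, X, u, k=3, seed=1):
        """Mean squared validation loss of a linear-in-features payoff model,
        estimated by k-fold cross-validation."""
        idx = np.random.default_rng(seed).permutation(len(u))
        losses = []
        for fold in np.array_split(idx, k):
            train = np.setdiff1d(idx, fold)
            F_tr, F_va = feature_fn(X[train]), feature_fn(X[fold])
            theta, *_ = np.linalg.lstsq(F_tr, u[train], rcond=None)
            losses.append(np.mean((F_va @ theta - u[fold]) ** 2))
        return np.mean(losses)

    # Illustrative candidate model classes: linear vs. quadratic features.
    linear    = lambda Z: np.column_stack([np.ones(len(Z)), Z])
    quadratic = lambda Z: np.column_stack([np.ones(len(Z)), Z, Z**2,
                                           (Z[:, 0] * Z[:, 1])[:, None]])

    candidates = {"linear": linear, "quadratic": quadratic}
    best = min(candidates, key=lambda name: cv_loss(candidates[name], X, u))
    print("selected model:", best)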
    • Learning New Strategies: EGTA+RL
      [figure: the EGTA loop augmented with online learning: RL derives a best response to the current NE; if the learned strategy deviates profitably, add it to the strategy set and re-solve; otherwise improve the RL model or end]

    • CDA Learning Problem Setup
      – State space:
        • History of recent trades: H1, moving average; H2, frequency-weighted ratio, threshold = V; H3, frequency-weighted ratio, threshold = A
        • Quotes: Q1, opposite role; Q2, same role
        • Time: T1, total; T2, since last trade
        • Pending trades: U, number of trades left; V, value of next unit to be traded
      – Actions: A, offset from V
      – Rewards: R, difference between unit valuation and trade price
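A minimal sketch of the outer EGTA+RL loop pictured above. All three callbacks, `solve_ne`, `train_best_response` (the RL step), and `deviation_gain`, are hypothetical stand-ins for a real equilibrium solver, learner, and simulator-based evaluator.

    def egta_rl(initial_strategies, solve_ne, train_best_response,
                deviation_gain, max_rounds=20, tol=1e-3):
        """Alternate equilibrium analysis and RL-based strategy generation."""
        strategies = list(initial_strategies)
        for _ in range(max_rounds):
            ne_mixture = solve_ne(strategies)             # analyze empirical game
            challenger = train_best_response(ne_mixture)  # RL vs. equilibrium context
            if deviation_gain(challenger, ne_mixture) <= tol:
                break                  # NE survives: stop (or improve the RL model)
            strategies.append(challenger)  # profitable deviator: add and re-solve
        return strategies, ne_mixture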
    • EGTA/RL Round 1
      Initial strategy set: {Kaplan, ZI, ZIbtq}.

      NE context   | Eq. payoff | Learned response | Dev. payoff
      1.000 ZI     | 248.1      | L1               | 268.7
      1.000 L1     | 242.5      |                  |

    • EGTA/RL Round 2
      Learners L2-L8 fail to beat L1; known strategies ZIP and GD join the pool.

      NE context           | Eq. payoff | Learned response | Dev. payoff
      1.000 ZI             | 248.1      | L1               | 268.7
      1.000 L1             | 242.5      | L2-L8            | ---
      1.000 ZIP            | 248.0      |                  |
      1.000 GD             | 248.6      | L9               | 251.8
      0.531 GD, 0.469 L9   | 246.1      | L10              | 252.1
    • EGTA/RL Rounds 3+

      NE context            | Eq. payoff | Learned response | Dev. payoff
      0.191 GD, 0.809 L10   | 248.0      | L11              | 251.0
      1.000 L11             | 246.2      |                  |
      0.192 GDX, 0.808 L11  | 245.8      | L12              | 248.3
      0.049 L11, 0.951 L12  | 245.8      | L13              | 245.9
      0.872 L12, 0.128 L13  | 245.6      | L14              | 245.6

      Final champion: L13. RB also earns 245.6 against the final equilibrium (0.872 L12, 0.128 L13).

    • Strategy Exploration Problem
      – Premise: limited ability to cover the profile space, yet an expectation to reasonably evaluate all considered strategies.
      – Need a deliberate policy to decide which strategies to introduce.
      – RL for strategy exploration: attempt a best response to the current equilibrium. Is this a good heuristic (even assuming an ideal best-response calculation)?
    • Example
      Introduce strategies in order A1, A2, A3, A4, in a symmetric game with payoffs (row, column):

             A1     A2     A3     A4
      A1    1,1    1,2    1,3    1,4
      A2    2,1    2,2    2,3    2,6
      A3    3,1    3,2    3,3    3,8
      A4    4,1    6,2    8,3    4,4

      Regret may increase over subsequent steps!

      Strategy set      | Candidate eq. | Regret wrt true game
      {A1}              | (A1,A1)       | 3
      {A1,A2}           | (A2,A2)       | 4
      {A1,A2,A3}        | (A3,A3)       | 5
      {A1,A2,A3,A4}     | (A4,A4)       | 0

    • FPSB2 Regret Surface
      [figure: regret surface ε over the strategy space (k_i, k_j), with BR, E(DEV), and DEV [MESH] marked]
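The example is easy to check mechanically; this sketch recomputes the regret of each restricted game's symmetric equilibrium with respect to the true game, reproducing the table above.

    import numpy as np

    # Symmetric game from the example: U[i, j] = row player's payoff for Ai vs Aj.
    U = np.array([[1, 1, 1, 1],
                  [2, 2, 2, 2],
                  [3, 3, 3, 3],
                  [4, 6, 8, 4]])
    names = ["A1", "A2", "A3", "A4"]

    def pure_symmetric_eq(sub):
        """Pure symmetric equilibria of the game restricted to strategy indices `sub`."""
        return [i for i in sub if all(U[j, i] <= U[i, i] for j in sub)]

    for k in range(1, 5):
        eq = pure_symmetric_eq(list(range(k)))[0]
        regret = U[:, eq].max() - U[eq, eq]   # best deviation in the TRUE game
        print(f"{{{', '.join(names[:k])}}}: eq ({names[eq]},{names[eq]}), "
              f"regret wrt true game = {regret}")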
    • Exploration Policies
      – RND: random (uniform) selection.
      – Deviation-based:
        • DEV: uniform among strategies that deviate from the current equilibrium
        • BR: best response to the current equilibrium
        • BR+DEV: alternate on successive iterations
        • ST(t): softmax selection among deviators, proportional to gain
      – MEMT: select the strategy that maximizes the gain (regret) from deviating to a strategy outside the set, from any mixture over the set.

    • CDA results
      [figure: expected regret vs. exploration step for MEMT, DEV, RND, BR, ST10, ST1, ST0.1]
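One plausible reading of the ST(t) policy, sketched below: among strategies that profitably deviate from the current equilibrium, choose with probability proportional to exp(gain/t), so low temperatures approach BR and high temperatures approach DEV. The strategy names and gains are illustrative.

    import math
    import random

    def softmax_select(deviation_gains, tau):
        """ST(tau): pick a deviating strategy with probability
        proportional to exp(gain / tau)."""
        names, gains = zip(*deviation_gains.items())
        m = max(gains)
        weights = [math.exp((g - m) / tau) for g in gains]  # shift for stability
        return random.choices(names, weights=weights)[0]

    # Illustrative deviation gains against the current equilibrium.
    gains = {"s_a": 5.2, "s_b": 1.1, "s_c": 0.3}
    print(softmax_select(gains, tau=1.0))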
    • EGTA Applications
      – Market games: TAC (Travel, Supply Chain, Ad Auction); canonical auctions (SimAAs, CDAs, SimSPSBs, ...); equity premium in financial trading.
      – Other domains: privacy (information-sharing attacks); networking (routing, wireless AP selection); credit network formation.
      – Mechanism design.

    • Conclusion: EGTA Methodology
      – Extends the scope of game theory to procedurally defined scenarios.
      – Embraces the statistical underpinnings of strategic reasoning.
      – Search process: game theory establishes the salient strategic context; strategy exploration (e.g., RL) searches for a best response to that context → a principled approach to evaluating complex strategy spaces.
      – Growing toolbox of EGTA techniques.