M. Wellman, 6 Sep 12
  




Empirical Game-Theoretic Analysis
for Practical Strategic Reasoning

Michael P. Wellman
University of Michigan
  




Planning in Strategic Environments
•  Planning problem
   –  find agent behavior satisfying/optimizing objectives wrt environment
   –  strives for rationality

[Diagram: Agent1, Environment, Agent2]

•  When environment contains other agents
   –  model them as rational planners as well
   –  problem is a game
   –  search now multi-dimensional, different (global) objective
  




PRIMA-12
  
  




Real-World Games
complex dynamics and uncertainty
•  rich strategy space
   –  strategy: obs* × time → action
•  severely incomplete information
   –  interdependent types (signals)
   –  info partially revealed over time
➙ analytic game-theoretic solutions few and far between

two approaches
1.  analyze (stylized) approximations
   –  one-shot, complete info…
2.  simulation-based methods
   –  search
   –  empirical: statistics, machine learning,…
  




Empirical Game-Theoretic Analysis (EGTA)
•  Game described procedurally, no directly usable analytical form
•  Parametrize strategy space based on agent architecture
•  Selectively explore strategy/profile space
•  Induce game model (payoff function) from simulation data: the empirical game
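
A minimal sketch of the last bullet, inducing a payoff function from simulation data. Everything here is an assumption made for illustration: the toy 2-player game in `TRUE_PAYOFFS`, the noisy `simulate` stand-in for a procedural simulator, and the choice of a plain sample mean as the estimator. The talk does not specify any of these.

```python
import random
from collections import defaultdict
from statistics import mean

# Hypothetical 2-player, 2-strategy game, used only to drive the toy simulator.
TRUE_PAYOFFS = {
    ("A", "A"): (3.0, 3.0),
    ("A", "B"): (0.0, 5.0),
    ("B", "A"): (5.0, 0.0),
    ("B", "B"): (1.0, 1.0),
}

def simulate(profile, rng, noise=0.5):
    """Stand-in for a procedural simulator: one noisy payoff sample per player."""
    return tuple(u + rng.gauss(0.0, noise) for u in TRUE_PAYOFFS[profile])

def estimate_empirical_game(profiles, samples_per_profile, rng):
    """Estimate each profile's payoff vector as the sample mean over simulator runs."""
    data = defaultdict(list)
    for profile in profiles:
        for _ in range(samples_per_profile):
            data[profile].append(simulate(profile, rng))
    return {
        profile: tuple(mean(player_samples) for player_samples in zip(*runs))
        for profile, runs in data.items()
    }

rng = random.Random(0)
empirical_game = estimate_empirical_game(list(TRUE_PAYOFFS), 2000, rng)
```

The resulting `empirical_game` dictionary plays the role of the induced game model: later analysis (for example, an equilibrium search) would read payoffs from it rather than from the simulator.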
  




  




EGTA Process
[Diagram nodes: Strategy Set, Profile Space, Profile, Simulator, Payoff Data, Empirical Game, Game Analysis (NE)]
1. Parametrize strategy space
2. Estimate empirical game
3. Solve empirical game
  




Simulation-Based Game Modeling
[Figure: payoff cells filled in from simulation runs; visible entries: …, (5,1), (0,2), (6,8)]
  




  




TAC Supply Chain Mgmt Game
[Diagram: suppliers, manufacturers, customer]
•  suppliers, by component
   –  CPU: Pintel, IMD
   –  Motherboard: Basus, Macrostar
   –  Memory: Mec, Queenmax
   –  Hard Disk: Watergate, Mintor
•  manufacturers: Manufacturer 1 through Manufacturer 6
•  messages between suppliers and manufacturers: component RFQs, supplier offers, component orders
•  messages between manufacturers and customer: PC RFQs, PC bids, PC orders

10 component types
16 PC types
220 simulation days
15 seconds per day




Two-Strategy Game (Unpreempted)
  




  




Three-Strategy Game: Deviations
  




  




Ranking Strategies
•  Often want to know: which is “better”, strategy A or strategy B?
•  Problem:
   –  Depends on what other agents do
   –  Cannot evaluate independent of strategic context
•  Which context?
   –  Self-play
   –  Fixed proportions of other agents
   –  Equilibrium (NE Regret)
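
A small sketch of ranking by NE regret, under one common reading of the term: a strategy's NE regret is the payoff lost by deviating to it when everyone else plays the equilibrium. The symmetric 2-player game `U`, the extra strategy `X`, and the equilibrium `sigma` are hypothetical values chosen so the equilibrium can be checked by hand. They are not taken from the talk.

```python
# U[s][t]: payoff to a player using s against an opponent using t.
# Hypothetical symmetric game: Hawk-Dove plus a weak third strategy "X".
U = {
    "Hawk": {"Hawk": -1.0, "Dove": 2.0, "X": 1.0},
    "Dove": {"Hawk": 0.0, "Dove": 1.0, "X": 1.0},
    "X":    {"Hawk": -2.0, "Dove": 0.5, "X": 0.0},
}

def payoff_vs_mixture(s, sigma):
    """Expected payoff of pure strategy s against an opponent mixing by sigma."""
    return sum(prob * U[s][t] for t, prob in sigma.items())

def ne_regret(sigma):
    """Regret of each pure strategy relative to the equilibrium payoff under sigma."""
    eq_payoff = sum(p * payoff_vs_mixture(s, sigma) for s, p in sigma.items())
    return {s: eq_payoff - payoff_vs_mixture(s, sigma) for s in U}

# Assumed equilibrium context: Hawk and Dove mixed equally, X unused.
sigma = {"Hawk": 0.5, "Dove": 0.5, "X": 0.0}
regret = ne_regret(sigma)
ranking = sorted(regret, key=regret.get)  # lowest regret first
```

Strategies in the equilibrium support have zero regret, so the ranking mainly separates the strategies outside it. A different equilibrium context can produce a different ordering, which is why the choice of context matters.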
  




Ranking Strategies: TAC/SCM-07
[Figure: two panels, “SCM-07 Tournament” and “SCM-07 EGTA”]
from PR Jordan PhD Thesis, 2009
  




  




Strategy Ranking (TAC Travel)
Strategies ranked with respect to the final equilibrium context
[Figure: strategies labeled 1 to 50 plotted by deviation gain; horizontal axis “Deviation Gain”, ticks from −1400 to 200]
from LJ Schvartzman PhD Thesis, 2009




Strategy Ranking (CDA)

strategy    NE1 regret    NE2 regret    symm. profile payoff
GDX            0             1.32          247.98
GD             0.49          3.26          248.57
RB             2.20          8.64          248.08
ZIP            2.90          9.86          247.95
Kaplan         4.56         24.55            2.02
ZIbtq         14.67         17.44          247.45
ZI            16.42         16.82          248.07
  




  




Strategy Ranking (SimSPSB)
SC Local: Heuristic search for optimal bid in response to self-confirming prices

[Figure: NE regret of each strategy in five environments: U[6,4], U[5,5], U[5,8], H[5,3], H[5,5]. Legend: SC Local, SC BidEval, Local, BidEval, AvgMU. Horizontal axis: “NE Regret”.]
Strategies shown:
•  SCLocalBidSearchS5K6_HB
•  SCLocalBidSearch_K16Z_HB
•  SCLocalBidSearch_K16_HB
•  SCBidXEvaluatorMixA_K16_HB
•  SCBidEvaluatorMixA_K16_HB
•  LocalBidSearch_K16_HB
•  BidXEvaluatorMixA_K16_HB
•  BidXEvaluatorMix3_K16_HB
•  BidEvaluatorMixA
•  BidEvaluatorMix_E8S32K8_HB
•  AverageMU64Z_HB
•  AverageMU64_HB

bottom line:
SC Local most effective overall, robust across environments
  




Scaling #Players
[Figure: log(|G|) versus #players; annotation: strategic complexity]
  




  




Improving Scalability
•  Exploit locality of interaction
   –  graphical games, MAIDs, action-graph games, …
•  Aggregate agents
   –  hierarchical reduction (Wellman et al. AAAI-05)
   –  clustering (Ficici et al. UAI-08)
   –  deviation-preserving reduction (Wiedenbeck & W, 2012)
  




Game Size

Number of profiles:

$\binom{N + |S| - 1}{N}$

[Figure label: “profile”]
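
A quick check of how fast the profile count grows. This sketch assumes the count is for a symmetric game, where a profile is a multiset of N strategy choices drawn from the strategy set S. The function name is illustrative, and the two sizes evaluated are the "12p-6s" and "100p-2s" game sizes that appear later in the deck.

```python
from math import comb

def num_profiles(n_players, n_strategies):
    """Number of distinct profiles in a symmetric game: C(N + |S| - 1, N)."""
    return comb(n_players + n_strategies - 1, n_players)

twelve_by_six = num_profiles(12, 6)    # 12 players, 6 strategies
hundred_by_two = num_profiles(100, 2)  # 100 players, 2 strategies
```

With these inputs the counts are 6188 and 101. The number of strategies drives growth much harder than the number of players does when the strategy set is small.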
  



  




Player Reduction
Hierarchical Reduction [Wellman, et al. 2005]
[Figure: full game ≈ reduced game]
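
A sketch of the profile mapping behind hierarchical reduction, assuming the usual definition from the cited paper: each of the n reduced-game players stands for N/n full-game agents, so a reduced profile is scored by simulating the full-game profile in which every count is scaled by N/n. The function name and the `Counter` representation of a symmetric profile are illustrative choices.

```python
from collections import Counter

def hr_full_profile(reduced_profile, n_full):
    """Full-game profile that supplies the payoffs for a reduced profile under HR.

    reduced_profile maps strategy -> number of reduced players using it.
    """
    n_reduced = sum(reduced_profile.values())
    if n_reduced == 0 or n_full % n_reduced != 0:
        raise ValueError("HR needs the reduced player count to divide N")
    scale = n_full // n_reduced  # full-game agents per reduced player
    return Counter({s: c * scale for s, c in reduced_profile.items()})

# A 2-player reduction of a 12-player game: one player on A, one on B.
full = hr_full_profile(Counter({"A": 1, "B": 1}), 12)
```

Here `full` is six agents on A and six on B. Only such evenly scaled full-game profiles ever need to be simulated, which is where the savings come from.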
  




  




Twins Reduction [Ficici, et al. 2008]
[Figure: comparison (“vs”), not preserved]
  




  




Deviation-Preserving Reduction
[Figure: comparison (“vs”), not preserved]
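
A sketch of the profile mapping for deviation-preserving reduction, assuming the definition in the Wiedenbeck & Wellman (2012) work cited earlier in the deck: when scoring one reduced player's strategy, that player controls a single full-game agent, and the remaining N − 1 agents are split evenly among the other n − 1 reduced players. The function name and the `Counter` representation are illustrative.

```python
from collections import Counter

def dpr_full_profile(reduced_profile, strategy, n_full):
    """Full-game profile used to score `strategy` in a reduced profile under DPR.

    The scored player keeps exactly one full-game agent; each of the other
    n - 1 reduced players stands for (N - 1) / (n - 1) full-game agents.
    """
    n_reduced = sum(reduced_profile.values())
    if reduced_profile.get(strategy, 0) < 1:
        raise ValueError("strategy must be played in the reduced profile")
    if n_reduced < 2 or (n_full - 1) % (n_reduced - 1) != 0:
        raise ValueError("DPR needs n >= 2 and (N - 1) divisible by (n - 1)")
    scale = (n_full - 1) // (n_reduced - 1)
    others = Counter(reduced_profile)
    others[strategy] -= 1  # set aside the player whose payoff is being estimated
    full = Counter({s: c * scale for s, c in others.items() if c > 0})
    full[strategy] += 1    # that player controls a single full-game agent
    return full

# 4-player reduction of a 13-player game, scoring a player who uses A.
full_for_a = dpr_full_profile(Counter({"A": 2, "B": 2}), "A", 13)
```

In this example `full_for_a` has 5 agents on A and 8 on B. Unlike the hierarchical mapping, the full-game profile depends on whose payoff is being read, so a single agent's unilateral deviation in the full game is what the reduced game measures.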
  




  




[Figure: four panels comparing HR, DPR, and TR curves: “12p-6s Network Formation Game”, “100p-2s Congestion Game”, “12p-6s Local Effect Game”, “12p-6s Congestion Game”. Axis labels not preserved.]
  




  




Role-Symmetric Games
Hierarchical Reduction
[Figure not preserved; residual label: 32]
  




  




Deviation-Preserving Reduction
[Figure: comparison (“vs”), not preserved]
  




  




Iterative EGTA Process
[Diagram nodes: Strategy Set, Profile Space, Select, Profile, Simulator, Payoff Data, Empirical Game, Game Analysis (NE), Refine?]
[Diagram branches from Refine?: More Strategies (Add Strategy, Strategy Space), More Samples, N (End)]
Subproblems marked on the diagram:
•  Sampling Control Problem
•  Game Model Induction Problem
•  Strategy Exploration Problem
  




Sampling Control Problem
•  Revealed payoff model
   –  sample provides exact payoff
   –  minimum-regret-first search (MRFS)
      •  attempts to refute best current candidate
•  Noisy payoff model
   –  sample drawn from payoff distribution
   –  information gain search (IGS)
      •  sample profile maximizing entropy difference wrt probability of being min-regret profile
  




  




Min-Regret-First Search
  

                                                                                               Profile   ε-bound
                                                        c1       c2       c3        c4         (r1,c1)      0

                       start                  r1       9,5      3,3       2,5      4,8
                    (arbitrary)
                                              r2       6,4      8,8       3,0      5,3

                                              r3       2,2      2,1       3,2      4,6

                                              r4       4,4      2,0       2,2      9,3




Min-Regret Search
  

                                                                                               Profile   ε-bound
                                                        c1       c2       c3        c4         (r1,c1)      0
                                                                                               (r1,c2)      2
                     evaluated
                                              r1       9,5      3,3       2,5      4,8
                       best
                                              r2       6,4      8,8       3,0      5,3

                                              r3       2,2      2,1       3,2      4,6

                                              r4       4,4      2,0       2,2      9,3

Select random deviation from current best profile




  




Min-Regret Search
  

                                                             Profile   ε-bound
                                     c1    c2    c3    c4    (r1,c1)      0
                                                             (r1,c2)      2
                    evaluated                                (r2,c1)      3
                                r1   9,5   3,3   2,5   4,8
                      best
                                r2   6,4   8,8   3,0   5,3

                                r3   2,2   2,1   3,2   4,6

                                r4   4,4   2,0   2,2   9,3




                                Min-Regret Search

                                      c1    c2    c3    c4
                                r1   9,5   3,3   2,5   4,8
                                r2   6,4   8,8   3,0   5,3
                                r3   2,2   2,1   3,2   4,6
                                r4   4,4   2,0   2,2   9,3

                                Profile   ε-bound
                                (r1,c1)      0
                                (r1,c2)      2
                                (r2,c1)      3
                                (r3,c1)      7

                                Legend: evaluated, best




  




                                Min-Regret Search

                                      c1    c2    c3    c4
                                r1   9,5   3,3   2,5   4,8
                                r2   6,4   8,8   3,0   5,3
                                r3   2,2   2,1   3,2   4,6
                                r4   4,4   2,0   2,2   9,3

                                Profile   ε-bound
                                (r1,c1)      3
                                (r1,c2)      5
                                (r2,c1)      3
                                (r3,c1)      7
                                (r1,c4)      0

                                Legend: evaluated, best




                                Min-Regret Search

                                      c1    c2    c3    c4
                                r1   9,5   3,3   2,5   4,8
                                r2   6,4   8,8   3,0   5,3
                                r3   2,2   2,1   3,2   4,6
                                r4   4,4   2,0   2,2   9,3

                                Profile   ε-bound
                                (r1,c1)      3
                                (r1,c2)      5
                                (r2,c1)      3
                                (r3,c1)      7
                                (r1,c4)      1
                                (r2,c4)      1

                                Legend: evaluated, best




  




                                Min-Regret Search

                                      c1    c2    c3    c4
                                r1   9,5   3,3   2,5   4,8
                                r2   6,4   8,8   3,0   5,3
                                r3   2,2   2,1   3,2   4,6
                                r4   4,4   2,0   2,2   9,3

                                Profile   ε-bound
                                (r1,c1)      3
                                (r1,c2)      5
                                (r2,c1)      4
                                (r3,c1)      7
                                (r1,c4)      1
                                (r2,c4)      5
                                (r2,c2)      0

                                Legend: evaluated, best




                                Min-Regret Search

                                      c1    c2    c3    c4
                                r1   9,5   3,3   2,5   4,8
                                r2   6,4   8,8   3,0   5,3
                                r3   2,2   2,1   3,2   4,6
                                r4   4,4   2,0   2,2   9,3

                                Profile   ε-bound
                                (r1,c1)      3
                                (r1,c2)      5
                                (r2,c1)      4
                                (r3,c1)      7
                                (r1,c4)      1
                                (r2,c4)      5
                                (r2,c2)      0
                                (r2,c3)      8

                                Legend: evaluated, best




  




                                Min-Regret Search

                                      c1    c2    c3    c4
                                r1   9,5   3,3   2,5   4,8
                                r2   6,4   8,8   3,0   5,3
                                r3   2,2   2,1   3,2   4,6
                                r4   4,4   2,0   2,2   9,3

                                Profile   ε-bound
                                (r1,c1)      3
                                (r1,c2)      5
                                (r2,c1)      4
                                (r3,c1)      7
                                (r1,c4)      1
                                (r2,c4)      5
                                (r2,c2)      0
                                (r2,c3)      8
                                (r3,c2)      6

                                Legend: evaluated, best




                                Min-Regret Search

                                      c1    c2    c3    c4
                                r1   9,5   3,3   2,5   4,8
                                r2   6,4   8,8   3,0   5,3
                                r3   2,2   2,1   3,2   4,6
                                r4   4,4   2,0   2,2   9,3

                                Profile   ε-bound
                                (r1,c1)      3
                                (r1,c2)      5
                                (r2,c1)      4
                                (r3,c1)      7
                                (r1,c4)      1
                                (r2,c4)      5
                                (r2,c2)      0*
                                (r2,c3)      8
                                (r3,c2)      6
                                (r4,c2)      6

                                Legend: evaluated, best

                                NE Confirmed!
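
The ε-bound column in the Min-Regret Search slides can be reproduced with a short script. This is a minimal sketch, not the author's implementation: it assumes the ε-bound of a profile is the largest gain any single player obtains by deviating to a profile that has already been evaluated, and that a profile is confirmed as a Nash equilibrium once its bound is 0 and all of its unilateral deviations have been evaluated. The evaluation order is read off the slides; the function names are illustrative.

```python
# Payoff matrix copied from the slides: (row payoff, column payoff).
ROWS = ["r1", "r2", "r3", "r4"]
COLS = ["c1", "c2", "c3", "c4"]
TABLE = [
    [(9, 5), (3, 3), (2, 5), (4, 8)],
    [(6, 4), (8, 8), (3, 0), (5, 3)],
    [(2, 2), (2, 1), (3, 2), (4, 6)],
    [(4, 4), (2, 0), (2, 2), (9, 3)],
]
PAYOFF = {(r, c): TABLE[i][j]
          for i, r in enumerate(ROWS) for j, c in enumerate(COLS)}


def deviations(profile):
    """Profiles reachable when exactly one player changes strategy."""
    r, c = profile
    row_devs = [(r2, c) for r2 in ROWS if r2 != r]
    col_devs = [(r, c2) for c2 in COLS if c2 != c]
    return row_devs, col_devs


def epsilon_bound(profile, evaluated):
    """Largest gain from deviating to an already evaluated profile.

    Assumption: this is a lower bound on the profile's true regret,
    because unevaluated deviations are simply ignored.
    """
    row_devs, col_devs = deviations(profile)
    u_row, u_col = PAYOFF[profile]
    gains = [0]
    gains += [PAYOFF[d][0] - u_row for d in row_devs if d in evaluated]
    gains += [PAYOFF[d][1] - u_col for d in col_devs if d in evaluated]
    return max(gains)


def is_confirmed(profile, evaluated):
    """True when the bound is 0 and every unilateral deviation was evaluated."""
    row_devs, col_devs = deviations(profile)
    all_seen = all(d in evaluated for d in row_devs + col_devs)
    return all_seen and epsilon_bound(profile, evaluated) == 0


# Evaluation order as it appears on the slides.
ORDER = [("r1", "c1"), ("r1", "c2"), ("r2", "c1"), ("r3", "c1"), ("r1", "c4"),
         ("r2", "c4"), ("r2", "c2"), ("r2", "c3"), ("r3", "c2"), ("r4", "c2")]

evaluated = set(ORDER)
bounds = {p: epsilon_bound(p, evaluated) for p in ORDER}
confirmed = [p for p in ORDER if is_confirmed(p, evaluated)]

for p in ORDER:
    print(p, bounds[p], "NE confirmed" if p in confirmed else "")
```

Only 10 of the 16 profiles are evaluated before (r2,c2) is confirmed, which is the point of searching by regret bound rather than filling in the whole matrix.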




  




                                Finding Approximate PSNE
  




                                Iterative EGTA Process

                    [Flowchart. Nodes: Strategy Space, Add Strategy, Strategy Set, Profile Space, Select, Simulator, Payoff Data, Empirical Game, Game Analysis (NE); decision Refine? with outcomes More Strategies, More Samples, and N (End). Labeled subproblems: Sampling Control Problem, Game Model Induction Problem, Strategy Exploration Problem.]
  




  




                                Construct Empirical Game
                    •  Simplest approach: direct estimation
                       –  employ control variates and other variance
                          reduction techniques

                    [Diagram: payoff data from selected profiles, (s1,u(s1)), ..., (sL,u(sL)), mapped via "?" to the empirical game u(•)]
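
The slide names control variates without giving details, so the following is only an illustrative sketch of the general technique. It assumes a hypothetical simulator, simulate_payoff, that returns a noisy payoff sample together with a correlated control variable whose mean is known (0.5 here); neither the simulator nor its parameters come from the talk.

```python
import random
import statistics


def simulate_payoff(rng):
    """Hypothetical simulator: returns (payoff sample, control variable sample)."""
    x = rng.random()                    # control variable with known mean 0.5
    y = 2.0 * x + rng.gauss(0.0, 0.1)   # payoff correlated with x; true mean 1.0
    return y, x


def direct_estimate(samples):
    """Plain sample mean of the payoffs."""
    return statistics.fmean([y for y, _ in samples])


def control_variate_estimate(samples, control_mean):
    """Sample mean corrected by the control variable's deviation from its known mean."""
    ys = [y for y, _ in samples]
    xs = [x for _, x in samples]
    y_bar = statistics.fmean(ys)
    x_bar = statistics.fmean(xs)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (len(xs) - 1)
    beta = cov / statistics.variance(xs)   # estimated optimal coefficient
    return y_bar - beta * (x_bar - control_mean)


# Compare the spread of the two estimators over repeated experiments.
rng = random.Random(0)
direct, cv = [], []
for _ in range(200):
    samples = [simulate_payoff(rng) for _ in range(50)]
    direct.append(direct_estimate(samples))
    cv.append(control_variate_estimate(samples, control_mean=0.5))

print("variance of direct estimates:         ", statistics.variance(direct))
print("variance of control variate estimates:", statistics.variance(cv))
```

With the same number of simulator calls, the corrected estimator should show a much smaller spread, which is why variance reduction matters when each profile evaluation is expensive.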
  




                                Payoff Function Regression
                    Si = [0,1]                                   FPSB2 Example

                    generate data (simulations)

                                   0      0.5      1
                             0    3,3     1,4     1,1
                           0.5    4,1     2,2     4,1
                             1    1,1     1,0     3,3

                    learn regression

                    solve learned game

                             eq = (0.32,0.32)

                                                                 Vorobeychik et al., ML 2007
  




  




                                Generalization Risk Approach
                    •  Model variations
                       –  functional forms, relationship structures, parameters
                       –  strategy granularity
                    •  Approach:
                       –  Treat candidate game model as a predictor for payoff data
                       –  Adopt loss function for predictor
                       –  Select model candidate minimizing expected loss

                    [Diagram: Cross Validation. Observation Data split into Fold 1, Fold 2, Fold 3, used for Training and Validation.]

                                                                 Jordan et al., AAMAS-09
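
The selection procedure in the bullets above can be sketched with three-fold cross validation. This is a minimal illustration under stated assumptions, not the method of Jordan et al.: the observations are synthetic (strategy parameter, payoff) pairs, the two candidate models (constant and linear) are stand-ins for candidate game models, and squared error is an assumed loss function.

```python
import random


def fit_constant(train):
    """Candidate model 1: predict the mean training payoff everywhere."""
    mean = sum(y for _, y in train) / len(train)
    return lambda x: mean


def fit_linear(train):
    """Candidate model 2: ordinary least squares line through the training data."""
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    sxx = sum((x - mx) ** 2 for x, _ in train)
    sxy = sum((x - mx) * (y - my) for x, y in train)
    slope = sxy / sxx
    return lambda x: my + slope * (x - mx)


def squared_loss(model, data):
    """Assumed loss function: mean squared prediction error on payoff data."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


def cross_validated_loss(fit, data, k=3):
    """Average validation loss over k folds (k=3 as in the slide's diagram)."""
    folds = [data[i::k] for i in range(k)]
    losses = []
    for i in range(k):
        train = [obs for j, fold in enumerate(folds) if j != i for obs in fold]
        losses.append(squared_loss(fit(train), folds[i]))
    return sum(losses) / k


# Synthetic observation data: payoff is linear in the strategy parameter plus noise.
rng = random.Random(0)
data = []
for _ in range(60):
    x = rng.random()
    data.append((x, 1.0 + 3.0 * x + rng.gauss(0.0, 0.2)))

candidates = {"constant": fit_constant, "linear": fit_linear}
cv_loss = {name: cross_validated_loss(fit, data) for name, fit in candidates.items()}
best = min(cv_loss, key=cv_loss.get)
print(cv_loss, "selected:", best)
```

The same loop applies to any family of candidates, including different strategy granularities, as long as each candidate can be fit on training payoffs and scored on held-out payoffs.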
  




                                Iterative EGTA Process

                    [Flowchart. Nodes: Strategy Space, Add Strategy, Strategy Set, Profile Space, Select, Simulator, Payoff Data, Empirical Game, Game Analysis (NE); decision Refine? with outcomes More Strategies, More Samples, and N (End). Labeled subproblems: Sampling Control Problem, Game Model Induction Problem, Strategy Exploration Problem.]
  




  




                                Learning New Strategies: EGTA+RL

                    [Flowchart. Nodes: Strategy Set, Profile Space, Select, Simulator, Payoff Data, Empirical Game, Game Analysis (NE), Online Learning, RL: Best response to NE, New Strategy, Add new Strategy. Decisions (Y/N): Refine? (outcomes More Strategies, More Samples), Deviates?, Improve RL Model?. Terminal node: End.]
  




                                CDA Learning Problem Setup

                    State Space
                       History of recent trades
                          H1: Moving average
                          H2: Frequency weighted ratio, threshold = V
                          H3: Frequency weighted ratio, threshold = A
                       Quotes
                          Q1: Opposite role
                          Q2: Same role
                       Time
                          T1: Total
                          T2: Since last trade
                       Pending Trades
                          U: Number of trades left
                          V: Value of next unit to be traded

                    Actions
                          A: Offset from V

                    Rewards
                          R: Difference between unit valuation and trade price
  




  




                                EGTA/RL Round 1

                    Strategies   Payoff   NE           Learning Strategy   Dev. Payoff
                    Kaplan
                    ZI           248.1    1.000 ZI     L1                  268.7
                    ZIbtq
                    L1           242.5    1.000 L1
  




                                EGTA/RL Round 2

                    Strategies   Payoff   NE           Learning Strategy   Dev. Payoff
                    Kaplan
                    ZI           248.1    1.000 ZI     L1                  268.7
                    ZIbtq
                    L1           242.5    1.000 L1
                    ZIP          248.0    1.000 ZIP
                                                       L2-L8               ---
                    GD           248.6    1.000 GD
                                                       L9                  251.8
                                          0.531 GD     L10                 252.1
                    L9           246.1
                                          0.469 L9
  




  




                                EGTA/RL Rounds 3+

                    Strategies   Payoff   NE           Learning Strategy   Dev. Payoff
                    …            …        …            …                   …
                    L10          248.0    0.191 GD     L11                 251.0
                                          0.809 L10
                    L11          246.2    1.000 L11
                    GDX          245.8    0.192 GDX    L12                 248.3
                                          0.808 L11
                    L12          245.8    0.049 L11    L13                 245.9
                                          0.951 L12
                                                                                         Final champion
                    L13          245.6    0.872 L12    L14                 245.6
                                          0.128 L13
                    RB           245.6    0.872 L12
                                          0.128 L13
  




                                Strategy Exploration Problem
                    •  Premise:
                       –  Limited ability to cover profile space
                       –  Expectation to reasonably evaluate all considered
                          strategies
                    •  Need deliberate policy to decide which strategies
                       to introduce
                    •  RL for strategy exploration
                       –  attempt at best response to current equilibrium
                       –  is this a good heuristic (even assuming ideal BR calc?)
  




  




                                Example

                    Introduce strategies in order: A1, A2, A3, A4

                                A1     A2     A3     A4
                          A1   1,1    1,2    1,3    1,4
                          A2   2,1    2,2    2,3    2,6
                          A3   3,1    3,2    3,3    3,8
                          A4   4,1    6,2    8,3    4,4

                    Regret may increase over subsequent steps!

                    Strategy Set       Candidate Eq.    Regret wrt True Game
                    {A1}               (A1,A1)          3
                    {A1,A2}            (A2,A2)          4
                    {A1,A2,A3}         (A3,A3)          5
                    {A1,A2,A3,A4}      (A4,A4)          0
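
The regret column of this example can be checked mechanically. The sketch below is illustrative: it uses the fact that the slide's matrix is symmetric, stores only the payoff to a player using one strategy against another, finds the symmetric pure-strategy equilibrium of each restricted game, and measures that candidate's regret when deviations to any strategy of the true (full) game are allowed. Variable and function names are assumptions for illustration.

```python
STRATS = ["A1", "A2", "A3", "A4"]

# U[i][j]: payoff to a player using STRATS[i] against an opponent using STRATS[j].
# These are the first entries of each cell in the slide's matrix; the second
# entries follow by symmetry.
U = [
    [1, 1, 1, 1],
    [2, 2, 2, 2],
    [3, 3, 3, 3],
    [4, 6, 8, 4],
]


def regret(i, allowed):
    """Regret of the symmetric profile (i, i) when deviations are limited to `allowed`."""
    return max(U[d][i] for d in allowed) - U[i][i]


def symmetric_psne(allowed):
    """Symmetric pure-strategy equilibria of the game restricted to `allowed`."""
    return [i for i in allowed if regret(i, allowed) == 0]


full_game = list(range(len(STRATS)))
rows = []
for k in range(1, len(STRATS) + 1):
    restricted = full_game[:k]              # strategies introduced so far
    candidate = symmetric_psne(restricted)[0]
    rows.append((STRATS[candidate], regret(candidate, full_game)))

for name, true_regret in rows:
    print(f"({name},{name})  regret wrt true game = {true_regret}")
```

The true-game regret rises from 3 to 4 to 5 before dropping to 0, so adding strategies one at a time does not improve the candidate equilibrium monotonically.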
  




                                FPSB2 Regret Surface

                    [Figure: FPSB2 regret surface plotted over ki and kj, each ranging from 0 to 1; legend: BR, E(DEV), DEV [MESH]]




  




                                Exploration Policies
                    •  RND: Random (uniform) selection
                    •  Deviation-Based
                       –  DEV: Uniform among strategies that deviate from current
                          equilibrium
                       –  BR: Best response to current equilibrium
                       –  BR+DEV: Alternate on successive iterations
                       –  ST(t): Softmax selection among deviators, proportional to gain
                    •  MEMT:
                       –  Select strategy that maximizes the gain (regret) from
                          deviating to a strategy outside the set from any
                          mixture over the set.
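
Several of these policies can be sketched as functions over a table of deviation gains. This is a hedged illustration rather than the talk's implementation: `gains` is a hypothetical mapping from each candidate strategy outside the current set to its payoff gain against the current equilibrium, a deviator is assumed to be a candidate with strictly positive gain, and ST(t) is assumed to weight deviators by exp(gain / t) with t acting as a temperature. BR+DEV (alternation) and MEMT (which needs an optimization over mixtures) are omitted.

```python
import math
import random


def deviators(gains):
    """Candidates with strictly positive gain against the current equilibrium."""
    return sorted(s for s, g in gains.items() if g > 0)


def rnd(gains, rng):
    """RND: uniform choice among all candidates."""
    return rng.choice(sorted(gains))


def dev(gains, rng):
    """DEV: uniform choice among deviators; None when no candidate deviates."""
    devs = deviators(gains)
    return rng.choice(devs) if devs else None


def br(gains):
    """BR: the candidate with the largest gain (best response)."""
    return max(gains, key=gains.get)


def softmax_probs(gains, t):
    """ST(t): selection probabilities over deviators, weights exp(gain / t)."""
    devs = deviators(gains)
    top = max(gains[s] for s in devs)        # subtract the max for numerical stability
    weights = [math.exp((gains[s] - top) / t) for s in devs]
    total = sum(weights)
    return {s: w / total for s, w in zip(devs, weights)}


def st(gains, t, rng):
    """Sample one deviator according to the softmax probabilities."""
    probs = softmax_probs(gains, t)
    names = list(probs)
    return rng.choices(names, weights=[probs[s] for s in names], k=1)[0]


# Hypothetical gains for three candidate strategies.
gains = {"s1": -1.0, "s2": 0.5, "s3": 2.0}
rng = random.Random(0)
print("RND:", rnd(gains, rng))
print("DEV:", dev(gains, rng))
print("BR: ", br(gains))
for t in (10, 1, 0.1):
    print(f"ST({t}) probabilities:", softmax_probs(gains, t))
```

Under this parameterization, a large t makes ST(t) behave like DEV and a small t makes it behave like BR, which gives one way to interpolate between the two policies.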
  




                    [Figure: CDA↓4. Expected Regret (log scale, 10^-3 to 10^3) versus Step (1 to 13) for exploration policies MEMT, DEV, RND, BR, ST10, ST1, ST0.1]




  




                                EGTA Applications
                    •  Market games
                       –  TAC: Travel, Supply Chain, Ad Auction
                       –  Canonical auctions: SimAAs, CDAs, SimSPSBs,…
                       –  Equity premium in financial trading
                    •  Other domains
                       –  Privacy: information sharing attacks
                       –  Networking: routing, wireless AP selection
                       –  Credit network formation
                    •  Mechanism design
  




                    Conclusion: EGTA Methodology
                    •  Extends scope of GT to procedurally defined scenarios
                    •  Embraces statistical underpinnings of strategic reasoning
                    •  Search process:
                       –  GT for establishing salient strategic context
                       –  Strategy exploration:
                          •  e.g., RL to search for best response to that context
                       → Principled approach to evaluate complex strategy spaces
                    •  Growing toolbox of EGTA techniques
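The search process summarized above alternates a game-theoretic step (find a candidate equilibrium of the current restricted game) with a strategy exploration step (look for a best response to it). A minimal Python sketch of that loop follows, using the four-strategy example game from earlier in the deck. It is an illustration under stated assumptions: an exhaustive lookup over a known payoff table stands in for RL, only pure symmetric profiles are considered, and the names `best_response`, `restricted_candidate`, and `explore` are hypothetical.

```python
# Row player's payoff U[i][j] for strategy i against strategy j; the game is
# symmetric, so the opponent receives U[j][i]. Indices 0..3 stand for A1..A4.
U = [
    [1, 1, 1, 1],  # A1
    [2, 2, 2, 2],  # A2
    [3, 3, 3, 3],  # A3
    [4, 6, 8, 4],  # A4
]


def best_response(u, s):
    """Best reply to opponent strategy s over the full strategy space.

    Assumption: exhaustive lookup replaces the RL search named on the slide.
    """
    return max(range(len(u)), key=lambda d: u[d][s])


def restricted_candidate(u, strategy_set):
    """GT step: lowest-regret pure symmetric profile within the current set."""
    def regret_within(c):
        return max(u[d][c] for d in strategy_set) - u[c][c]
    return min(strategy_set, key=regret_within)


def explore(u, initial, max_steps=10):
    """Alternate equilibrium analysis and best-response exploration."""
    strategy_set = [initial]
    history = []  # (candidate strategy, deviation gain in the full game)
    for _ in range(max_steps):
        s = restricted_candidate(u, strategy_set)
        br = best_response(u, s)
        gain = u[br][s] - u[s][s]
        history.append((s, gain))
        # Stop when no beneficial deviation exists, or when the best response
        # is already in the set (no pure symmetric equilibrium to refine).
        if gain <= 0 or br in strategy_set:
            break
        strategy_set.append(br)
    return strategy_set, history


strategy_set, history = explore(U, initial=0)
print(strategy_set, history)  # [0, 3] [(0, 3), (3, 0)]
```

Starting from {A1}, the best response is A4, and (A4, A4) is then confirmed as an equilibrium after a single exploration step, in contrast to introducing strategies in the fixed order A1, A2, A3, A4.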
  




  

Empirical Game-Theoretic Analysis for Practical Strategic Reasoning

  • 1. M.  Wellman   6  Sep  12   Empirical  Game-­‐Theore<c  Analysis   for  Prac<cal  Strategic  Reasoning   Michael  P.  Wellman   University  of  Michigan   Planning  in  Strategic  Environments   •  Planning  problem   –  find  agent  behavior  sa<sfying/op<mizing  objec<ves   wrt  environment   –  strives  for  ra<onality   Agent1   Environment   Agent2     •  When  environment  contains  other  agents   –  model  them  as  ra#onal  planners  as  well   –  problem  is  a  game   –  search  now  mul<-­‐dimensional,  different  (global)   objec<ve   PRIMA-­‐12   1  
  • 2. M.  Wellman   6  Sep  12   Real-­‐World  Games   complex  dynamics  and  uncertainty       •  rich  strategy  space   –  strategy:  obs*  ×  <me       ac<on     •  severely  incomplete   two  approaches   informa<on   1.  analyze  (stylized)   –  interdependent  types  (signals)   approxima<ons   –  info  par<ally  revealed  over   –  one-­‐shot,  complete  info…   <me   2.  simula<on-­‐based   ➙ analy<c  game-­‐theore<c   methods   solu<ons  few  and  far   –  search   between   –  empirical:  sta<s<cs,   machine  learning,…   Empirical  Game-­‐Theore<c  Analysis   (EGTA)   •  Game  described  procedurally,  no  directly   usable  analy<cal  form   •  Parametrize  strategy  space  based  on  agent   architecture   •  Selec<vely  explore  strategy/profile  space   •  Induce  game  model  (payoff  func<on)  from   simula<on  data   Empirical  game   PRIMA-­‐12   2  
  • 3. M.  Wellman   6  Sep  12   EGTA  Process   2.  Es<mate  empirical  game   Payoff   Empirical   Profile   Simulator   Data   Game   Game   Profile  Space   Analysis  (NE)   3.  Solve  empirical  game   Strategy  Set   1.  Parametrize  strategy  space   Simula<on-­‐Based  Game  Modeling   …   5,1   0,2   6,8   PRIMA-­‐12   3  
  • 4. M.  Wellman   6  Sep  12   TAC  Supply  Chain  Mgmt  Game   suppliers manufacturers Pintel CPU Manufacturer 1 IMD component RFQs Manufacturer 2 Basus Motherboard PC RFQs Macrostar supplier offers Manufacturer 3 PC bids customer PC orders Memory Mec Manufacturer 4 component Queenmax orders Manufacturer 5 Watergate 10 component types Hard Disk 16 PC types Manufacturer 6 Mintor 220 simulation days 15 seconds per day Two-­‐Strategy  Game  (Unpreempted)   PRIMA-­‐12   4  
  • 5. M.  Wellman   6  Sep  12   Two-­‐Strategy  Game  (Unpreempted)   Three-­‐Strategy  Game:  Devia<ons   PRIMA-­‐12   5  
  • 6. M.  Wellman   6  Sep  12   Ranking  Strategies   •  O`en  want  to  know:  which  is  “beber”,   strategy  A  or  strategy  B?   •  Problem:     –  Depends  on  what  other  agents  do   –  Cannot  evaluate  independent  of  strategic  context   •  Which  context?   –  Self-­‐play   –  Fixed  propor<ons  of  other  agents   –  Equilibrium  (NE  Regret)   Ranking  Strategies:  TAC/SCM-­‐07   SCM-­‐07  Tournament   SCM-­‐07  EGTA   from  PR  Jordan  PhD  Thesis,  2009   PRIMA-­‐12   6  
  • 7. M.  Wellman   6  Sep  12   Strategy  Ranking  (TAC  Travel)   50 Strategies  ranked  with   24 49 42 respect  to  the  final   43 5 47 20 equilibrium  context   31 40 44 17 9 3 25 from  LJ  Schvartzman  PhD  Thesis,  2009   7 18 39 30 16 26 4 32 28 22 8 6 27 29 19 45 46 23 10 35 36 34 37 15 41 21 14 38 33 11 12 13 1 2 −1400 −1200 −1000 −800 −600 −400 −200 0 200 Deviation Gain Strategy  Ranking  (CDA)   strategy   NE1  regret   NE2  regret   symm.   profile  payoff   GDX   0   1.32   247.98   GD   0.49   3.26   248.57   RB   2.20   8.64   248.08   ZIP   2.90   9.86   247.95   Kaplan   4.56   24.55   2.02   ZIbtq   14.67   17.44   247.45   ZI   16.42   16.82   248.07   PRIMA-­‐12   7  
  • 8. M.  Wellman   6  Sep  12   Strategy  Ranking  (SimSPSB)   SC  Local:  Heuris<c  search  for  op<mal  bid  in  response  to   self-­‐confirming  prices   SC Local SC BidEval Local BidEval AvgMU U[6,4] U[5,5] U[5,8] H[5,3] H[5,5] SCLocalBidSearchS5K6_HB SCLocalBidSearch_K16Z_HB SCLocalBidSearch_K16_HB SCBidXEvaluatorMixA_K16_HB SCBidEvaluatorMixA_K16_HB LocalBidSearch_K16_HB BidXEvaluatorMixA_K16_HB BidXEvaluatorMix3_K16_HB BidEvaluatorMixA BidEvaluatorMix_E8S32K8_HB AverageMU64Z_HB AverageMU64_HB 0.0 0.1 0.2 0.3 0.4 0.5 0.0 0.1 0.2 0.3 0.4 0.5 0.0 0.1 0.2 0.3 0 2 4 6 8 10 12 0 2 4 6 8 10 NE Regret bobom  line:      SC  Local  most  effec<ve  overall,  robust  across  environments   Scaling  #Players   log(|G|)   strategic   complexity   #players   PRIMA-­‐12   8  
  • 9. M.  Wellman   6  Sep  12   Improving  Scalability   •  Exploit  locality  of  interac<on   –  graphical  games,  MAIDs,   ac<on-­‐graph  games,  …   •  Aggregate  agents   –  hierarchical  reduc<on   (Wellman  et  al.  AAAI-­‐05)   –  clustering  (Ficici  et  al.  UAI-­‐08)   –  devia<on-­‐preserving   reduc<on  (Wiedenbeck  &  W,   2012)   Game  Size   Number  of  profiles:   N +|S| 1 N “profile”   PRIMA-­‐12   9  
  • 10. M.  Wellman   6  Sep  12   Player  Reduc<on   ⇡ [Wellman,  et  al.  2005]   Hierarchical  Reduc<on   PRIMA-­‐12   10  
  • 11. M.  Wellman   6  Sep  12   [Ficici,  et  al.  2008]   Twins  Reduc<on   vs   [Ficici,  et  al.  2008]   Twins  Reduc<on   vs   PRIMA-­‐12   11  
  • 12. M.  Wellman   6  Sep  12   Devia<on-­‐Preserving  Reduc<on   vs   Devia<on-­‐Preserving  Reduc<on   vs   PRIMA-­‐12   12  
  • 13. M.  Wellman   6  Sep  12   6   HR   24   HR   12p-­‐6s  Network  Forma<on  Game   100p-­‐2s  Conges<on  Game   5   20   4   DPR   16   DPR   3   12   2   8   TR   TR   1   4   0   0   0   500   1000   1500   0   5   10   15   20   4   12p-­‐6s  Local  Effect  Game   4   HR   12p-­‐6s  Conges<on  Game   HR   3   3   DPR   DPR   2   2   TR   TR   1   1   0   0   0   500   1000   1500   0   500   1000   1500   PRIMA-­‐12   13  
  • 14. M.  Wellman   6  Sep  12   Role-­‐Symmetric  Games   32   Hierarchical  Reduc<on   PRIMA-­‐12   14  
  • 15. M.  Wellman   6  Sep  12   Devia<on-­‐Preserving  Reduc<on   vs   Devia<on-­‐Preserving  Reduc<on   vs   PRIMA-­‐12   15  
  • 16. M.  Wellman   6  Sep  12   Itera<ve  EGTA  Process   Game  Model   Induc<on   Sampling  Control   Problem   Problem   Payoff   Empirical   Profile   Simulator   Data   Game   Select   Game   Profile  Space   Analysis  (NE)   More   More   Strategy  Set   Add  Strategy   Strategy  Space   Strategies     Refine?   Samples   N   Strategy  Explora<on   Problem   End   Sampling  Control  Problem   •  Revealed  payoff  model   –  sample  provides  exact  payoff   –  minimum-­‐regret-­‐first  search  (MRFS)   •  abempts  to  refute  best  current  candidate   •  Noisy  payoff  model   –  sample  drawn  from  payoff  distribu<on   –  informa<on  gain  search  (IGS)   •  sample  profile  maximizing  entropy  difference  wrt   probability  of  being  min-­‐regret  profile   PRIMA-­‐12   16  
  • 17. M.  Wellman   6  Sep  12   Min-­‐Regret-­‐First  Search   Profile ε-bound c1 c2 c3 c4 (r1,c1) 0 start r1 9,5 3,3 2,5 4,8 (arbitrary) r2 6,4 8,8 3,0 5,3 r3 2,2 2,1 3,2 4,6 r4 4,4 2,0 2,2 9,3 Min-­‐Regret  Search   Profile ε-bound c1 c2 c3 c4 (r1,c1) 0 (r1,c2) 2 evaluated r1 9,5 3,3 2,5 4,8 best r2 6,4 8,8 3,0 5,3 r3 2,2 2,1 3,2 4,6 r4 4,4 2,0 2,2 9,3 Select  random  devia<on  from  current  best  profile   PRIMA-­‐12   17  
  • 18. M.  Wellman   6  Sep  12   Min-­‐Regret  Search   Profile ε-bound c1 c2 c3 c4 (r1,c1) 0 (r1,c2) 2 evaluated (r2,c1) 3 r1 9,5 3,3 2,5 4,8 best r2 6,4 8,8 3,0 5,3 r3 2,2 2,1 3,2 4,6 r4 4,4 2,0 2,2 9,3 Min-­‐Regret  Search   Profile ε-bound c1 c2 c3 c4 (r1,c1) 0 (r1,c2) 2 evaluated (r2,c1) 3 r1 9,5 3,3 2,5 4,8 best (r3,c1) 7 r2 6,4 8,8 3,0 5,3 r3 2,2 2,1 3,2 4,6 r4 4,4 2,0 2,2 9,3 PRIMA-­‐12   18  
  • 19. M.  Wellman   6  Sep  12   Min-­‐Regret  Search   Profile ε-bound c1 c2 c3 c4 (r1,c1) 3 (r1,c2) 5 evaluated (r2,c1) 3 r1 9,5 3,3 2,5 4,8 best (r3,c1) 7 (r1,c4) 0 r2 6,4 8,8 3,0 5,3 r3 2,2 2,1 3,2 4,6 r4 4,4 2,0 2,2 9,3 Min-­‐Regret  Search   Profile ε-bound c1 c2 c3 c4 (r1,c1) 3 (r1,c2) 5 evaluated (r2,c1) 3 r1 9,5 3,3 2,5 4,8 best (r3,c1) 7 (r1,c4) 1 r2 6,4 8,8 3,0 5,3 (r2,c4) 1 r3 2,2 2,1 3,2 4,6 r4 4,4 2,0 2,2 9,3 PRIMA-­‐12   19  
  • 20. M.  Wellman   6  Sep  12   Min-­‐Regret  Search   Profile ε-bound c1 c2 c3 c4 (r1,c1) 3 (r1,c2) 5 evaluated (r2,c1) 4 r1 9,5 3,3 2,5 4,8 best (r3,c1) 7 (r1,c4) 1 r2 6,4 8,8 3,0 5,3 (r2,c4) 5 (r2,c2) 0 r3 2,2 2,1 3,2 4,6 r4 4,4 2,0 2,2 9,3 Min-­‐Regret  Search   Profile ε-bound c1 c2 c3 c4 (r1,c1) 3 (r1,c2) 5 evaluated (r2,c1) 4 r1 9,5 3,3 2,5 4,8 best (r3,c1) 7 (r1,c4) 1 r2 6,4 8,8 3,0 5,3 (r2,c4) 5 (r2,c2) 0 (r2,c3) 8 r3 2,2 2,1 3,2 4,6 r4 4,4 2,0 2,2 9,3 PRIMA-­‐12   20  
  • 21. M.  Wellman   6  Sep  12   Min-­‐Regret  Search   Profile ε-bound c1 c2 c3 c4 (r1,c1) 3 (r1,c2) 5 evaluated (r2,c1) 4 r1 9,5 3,3 2,5 4,8 best (r3,c1) 7 (r1,c4) 1 r2 6,4 8,8 3,0 5,3 (r2,c4) 5 (r2,c2) 0 (r2,c3) 8 r3 2,2 2,1 3,2 4,6 (r3,c2) 6 r4 4,4 2,0 2,2 9,3 Min-­‐Regret  Search   Profile ε-bound c1 c2 c3 c4 (r1,c1) 3 (r1,c2) 5 evaluated (r2,c1) 4 r1 9,5 3,3 2,5 4,8 best (r3,c1) 7 (r1,c4) 1 r2 6,4 8,8 3,0 5,3 (r2,c4) 5 (r2,c2) 0* (r2,c3) 8 NE r3 2,2 2,1 3,2 4,6 (r3,c2) 6 Confirmed! (r4,c2) 6 r4 4,4 2,0 2,2 9,3 PRIMA-­‐12   21  
  • 22. M.  Wellman   6  Sep  12   Finding  Approximate  PSNE   Itera<ve  EGTA  Process   Game  Model   Induc<on   Sampling  Control   Problem   Problem   Payoff   Empirical   Profile   Simulator   Data   Game   Select   Game   Profile  Space   Analysis  (NE)   More   More   Strategy  Set   Add  Strategy   Strategy  Space   Strategies     Refine?   Samples   N   Strategy  Explora<on   Problem   End   PRIMA-­‐12   22  
  • 23. M.  Wellman   6  Sep  12   Construct  Empirical  Game   •  Simplest  approach:  direct  es<ma<on   –  employ  control  variates  and  other  variance   reduc<on  techniques   Empirical  Game   (s1,u(s1))   ?   ...   u(•)   (sL,u(sL))   Payoff  data  from  selected  profiles   Payoff  Func<on  Regression   Si  =  [0,1]   generate  data  (simula<ons)   FPSB2  Example   0   0.5   1   0   3,3   1,4   1,1   0.5   4,1   2,2   4,1   1,1   1,0   3,3   1   learn  regression   solve  learned  game   eq  =  (0.32,0.32)   Vorobeychik  et  al.,  ML  2007   PRIMA-­‐12   23  
  • 24. M.  Wellman   6  Sep  12   Generaliza<on  Risk  Approach   •  Model  varia<ons   –  func<onal  forms,  rela<onship   Cross  ValidaHon   structures,  parameters   –  strategy  granularity   Observa#on  Data   •  Approach:   –  Treat  candidate  game  model Fold  1   Fold  2   Fold  3   as  a  predictor  for  payoff  data   –  Adopt  loss  func<on  for   predictor   Training   Valida#on   –  Select  model  candidate   minimizing  expected  loss   Jordan  et  al.,  AAMAS-­‐09   Itera<ve  EGTA  Process   Game  Model   Induc<on   Sampling  Control   Problem   Problem   Payoff   Empirical   Profile   Simulator   Data   Game   Select   Game   Profile  Space   Analysis  (NE)   More   More   Strategy  Set   Add  Strategy   Strategy  Space   Strategies     Refine?   Samples   N   Strategy  Explora<on   Problem   End   PRIMA-­‐12   24  
  • 25. M.  Wellman   6  Sep  12   Learning  New  Strategies:  EGTA+RL   Payoff   Empirical   Profile   Simulator   Data   Game   Select   Game   Profile  Space   Online   Analysis  (NE)   Learning   New   RL:  Best  response   More   More   Strategy  Set   Strategy   to  NE   Strategies     Refine?   Samples   Add  new   N   Strategy   Y   Y   N   Improve   N   Deviates?   RL  Model?   End   CDA  Learning  Problem  Setup   H1:  Moving  average     H2:  Frequency  weighted  ra<o,   Ac<ons   History  of   recent   threshold=  V   A:  Offset  from  V   trades   H3:  Frequency  weighted  ra<o,   threshold=  A     Quotes   Q1:  Opposite  role   State   Q2:  Same  role   Space     Rewards   T1:  Total   Time   T2:  Since  last  trade     R:  Difference  between   U:  Number  of  trades  le`   unit  valua<on  and  trade   Pending   V:  Value  of  next  unit  to  be  traded   price   Trades   PRIMA-­‐12   25  
  • 26. M.  Wellman   6  Sep  12   EGTA/RL  Round  1   Strategies   Payoff   NE   Learning   Strategy   Dev.   Payoff   Kaplan   ZI   248.1   1.000      ZI   L1   268.7   ZIbtq   L1   242.5   1.000      L1   EGTA/RL  Round  2   Strategies   Payoff   NE   Learning   Strategy   Dev.   Payoff   Kaplan   ZI   248.1   1.000      ZI   L1   268.7   ZIbtq   L1   242.5   1.000      L1   ZIP   248.0   1.000      ZIP   L2-­‐L8   -­‐-­‐-­‐   GD   248.6   1.000      GD   L9   251.8   0.531      GD   L10   252.1   L9   246.1   0.469      L9   PRIMA-­‐12   26  
  • 27. EGTA/RL Rounds 3+

    Strategies   Payoff   NE                      Learning Strategy   Dev. Payoff
    …            …        …                       …                   …
    L10          248.0    0.191 GD, 0.809 L10     L11                 251.0
    L11          246.2    1.000 L11
    GDX          245.8    0.192 GDX, 0.808 L11    L12                 248.3
    L12          245.8    0.049 L11, 0.951 L12    L13                 245.9
    L13          245.6    0.872 L12, 0.128 L13    L14                 245.6
    RB           245.6    0.872 L12, 0.128 L13

    Slide annotation: Final champion

    Strategy Exploration Problem
    •  Premise:
       –  Limited ability to cover profile space
       –  Expectation to reasonably evaluate all considered strategies
    •  Need deliberate policy to decide which strategies to introduce
    •  RL for strategy exploration
       –  attempt at best response to current equilibrium
       –  is this a good heuristic (even assuming ideal BR calc)?
  • 28. Example
    Introduce strategies in order: A1, A2, A3, A4

          A1     A2     A3     A4
    A1    1, 1   1, 2   1, 3   1, 4
    A2    2, 1   2, 2   2, 3   2, 6
    A3    3, 1   3, 2   3, 3   3, 8
    A4    4, 1   6, 2   8, 3   4, 4

    Strategy Set      Candidate Eq.   Regret wrt True Game
    {A1}              (A1,A1)         3
    {A1,A2}           (A2,A2)         4
    {A1,A2,A3}        (A3,A3)         5
    {A1,A2,A3,A4}     (A4,A4)         0

    Regret may increase over subsequent steps!

    FPSB2 Regret Surface
    [Figure: FPSB2 regret surface plotted over ki and kj, with series BR, E(DEV), and DEV [MESH].]
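    The regret column of the example on slide 28 can be checked mechanically. The sketch below is a minimal illustration, not the author's code: it encodes the row player's payoffs from the table, finds the symmetric pure equilibrium of each restricted game, and measures that profile's regret against the full game. The function names are hypothetical, and restricting attention to symmetric pure profiles is an assumption that happens to suffice for this example.

```python
# Row player's payoff U[r][c] from the example table (index 0 = A1, ..., 3 = A4).
# The game is symmetric, so the column player's payoff is U[c][r].
U = [
    [1, 1, 1, 1],
    [2, 2, 2, 2],
    [3, 3, 3, 3],
    [4, 6, 8, 4],
]

def symmetric_pure_equilibria(payoff, strategy_set):
    """Profiles (s, s) with no profitable deviation inside strategy_set."""
    return [s for s in strategy_set
            if all(payoff[d][s] <= payoff[s][s] for d in strategy_set)]

def regret_wrt_true_game(payoff, s):
    """Gain from the best deviation over ALL strategies against (s, s)."""
    return max(row[s] for row in payoff) - payoff[s][s]

def regret_trajectory(payoff, order):
    """(set size, equilibrium strategy, true-game regret) as strategies are added."""
    rows = []
    for k in range(1, len(order) + 1):
        subset = order[:k]
        for s in symmetric_pure_equilibria(payoff, subset):
            rows.append((k, s, regret_wrt_true_game(payoff, s)))
    return rows

trajectory = regret_trajectory(U, [0, 1, 2, 3])
for size, eq, regret in trajectory:
    print(f"set size {size}: candidate eq. (A{eq + 1},A{eq + 1}), regret {regret}")
```

    Running it reproduces the regrets 3, 4, 5, 0 from the table: each added strategy is a better reply to the previous equilibrium, yet A4's payoff against the new equilibrium grows faster, so regret rises until A4 itself enters the set.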
  • 29. Exploration Policies
    •  RND: Random (uniform) selection
    •  Deviation-Based
       –  DEV: Uniform among strategies that deviate from current equilibrium
       –  BR: Best response to current equilibrium
       –  BR+DEV: Alternate on successive iterations
       –  ST(t): Softmax selection among deviators, proportional to gain
    •  MEMT:
       –  Select strategy that maximizes the gain (regret) from deviating to a strategy outside the set from any mixture over the set.

    CDA↓4
    [Figure: Expected Regret (log scale, 10^-3 to 10^3) versus Step (1 to 13) for policies MEMT, DEV, RND, BR, ST10, ST1, ST0.1.]
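    The RND, DEV, BR, and ST(t) policies on slide 29 can be sketched as selection rules over deviation gains. This is an illustration under assumptions, not the implementation behind the plotted results: `gains` is a hypothetical map from each candidate strategy outside the current set to its payoff gain against the current equilibrium, and ST(t) is assumed to weight deviators by exp(gain / t), which is one common softmax form the slide does not spell out. BR+DEV and MEMT are omitted.

```python
import math
import random

def deviators(gains):
    """Candidates whose gain against the current equilibrium is positive."""
    return {s: g for s, g in gains.items() if g > 0}

def softmax_weights(gains, temperature):
    """Assumed ST(t) form: P(s) proportional to exp(gain_s / t) over deviators."""
    dev = deviators(gains)
    if not dev:
        return {}
    peak = max(dev.values())  # subtract the max for numerical stability
    raw = {s: math.exp((g - peak) / temperature) for s, g in dev.items()}
    total = sum(raw.values())
    return {s: w / total for s, w in raw.items()}

def select(gains, policy, rng, temperature=1.0):
    """Pick the next strategy to introduce, or None if the policy has no candidate."""
    if policy == "RND":
        return rng.choice(sorted(gains)) if gains else None
    dev = deviators(gains)
    if not dev:
        return None
    if policy == "DEV":
        return rng.choice(sorted(dev))
    if policy == "BR":
        return max(dev, key=dev.get)
    if policy == "ST":
        weights = softmax_weights(gains, temperature)
        names = sorted(weights)
        return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]
    raise ValueError(f"unknown policy: {policy}")

# Hypothetical gains for four candidate strategies.
GAINS = {"s1": 1.0, "s2": 2.0, "s3": 3.0, "s4": -0.5}
rng = random.Random(0)
picked_br = select(GAINS, "BR", rng)
picked_dev = select(GAINS, "DEV", rng)
weights_t1 = softmax_weights(GAINS, 1.0)
```

    Under this form, a small temperature concentrates ST(t) on the best response and a large one approaches DEV, which is consistent with the slide listing ST10, ST1, and ST0.1 as separate policies.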
  • 30. EGTA Applications
    •  Market games
       –  TAC: Travel, Supply Chain, Ad Auction
       –  Canonical auctions: SimAAs, CDAs, SimSPSBs,…
       –  Equity premium in financial trading
    •  Other domains
       –  Privacy: information sharing attacks
       –  Networking: routing, wireless AP selection
       –  Credit network formation
    •  Mechanism design

    Conclusion: EGTA Methodology
    •  Extends scope of GT to procedurally defined scenarios
    •  Embraces statistical underpinnings of strategic reasoning
    •  Search process:
       –  GT for establishing salient strategic context
       –  Strategy exploration:
          •  e.g., RL to search for best response to that context
       → Principled approach to evaluate complex strategy spaces
    •  Growing toolbox of EGTA techniques