ADAPTIVE LEARNING IN GAMES
Suvarup Saha
EECS 463 Course Project, 3/11/2010
Outline

    Motivation
    Games
    Learning in Games
    Adaptive Learning
      Example
    Gradient Techniques
    Conclusion
Motivation

    Adaptive filtering techniques generalize to many applications beyond
    filtering itself
      Gradient-based iterative search
      Stochastic gradient
      Least squares
    Applying game theory in less-than-rational multi-agent scenarios
    demands self-learning mechanisms
    Adaptive techniques can be applied in such instances to help the
    agents learn the game and play intelligently
Games

    A game is an interaction between two or more self-interested agents
    Each agent chooses a strategy si from a set of strategies, Si
    A (joint) strategy profile, s, is the set of chosen strategies, also
    called an outcome of the game in a single play
    Each agent has a utility function, ui(s), specifying its preference
    for each outcome in terms of a payoff
    An agent's best response is the strategy with the highest payoff,
    given its opponents' choice of strategies
    A Nash equilibrium is a strategy profile such that every agent's
    strategy is a best response to the others' choice of strategies
A Normal Form Game

                     B
                b1      b2
        A  a1   4,4     5,2
           a2   0,1     4,3

    This is a 2-player game with SA = {a1, a2}, SB = {b1, b2}
    The ui(s) are given explicitly in matrix form; for example,
    uA(a1, b2) = 5, uB(a1, b2) = 2
    The best response of A to B playing b2 is a1
    In this game, (a1, b1) is the unique Nash equilibrium
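As a quick illustration, best responses and pure Nash equilibria of this game can be found by brute force; a minimal Python sketch (the payoff dictionaries and helper names are ours, not from the slides):

```python
# A minimal sketch (representation assumed): payoff dictionaries for the
# 2x2 game above and a brute-force check for pure Nash equilibria.
S_A, S_B = ["a1", "a2"], ["b1", "b2"]
u_A = {("a1", "b1"): 4, ("a1", "b2"): 5, ("a2", "b1"): 0, ("a2", "b2"): 4}
u_B = {("a1", "b1"): 4, ("a1", "b2"): 2, ("a2", "b1"): 1, ("a2", "b2"): 3}

def best_response_A(b):
    # A's strategy with the highest payoff, given B plays b
    return max(S_A, key=lambda a: u_A[(a, b)])

def best_response_B(a):
    # B's strategy with the highest payoff, given A plays a
    return max(S_B, key=lambda b: u_B[(a, b)])

# A profile is a Nash equilibrium when each side best-responds to the other
nash = [(a, b) for a in S_A for b in S_B
        if a == best_response_A(b) and b == best_response_B(a)]
print(best_response_A("b2"))  # a1
print(nash)                   # [('a1', 'b1')]
```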
Learning in Games

    Classical approach: compute an optimal/equilibrium strategy
    Some criticisms of this approach:
      Other agents' utilities might be unknown to an agent, making an
      equilibrium strategy impossible to compute
      Other agents might not be playing an equilibrium strategy
      Computing an equilibrium strategy might be hard
    Another approach: learn how to 'optimally' play a game by
      playing it many times
      updating the strategy based on experience
Learning Dynamics

    Three families of learning dynamics, ordered by increasing
    rationality/sophistication of the agents:

        Evolutionary Dynamics  →  Adaptive Learning  →  Bayesian Learning

    Adaptive learning is the focus of our discussion
Evolutionary Dynamics

    Inspired by evolutionary biology, with no appeal to the rationality
    of the agents
    An entire population of agents is programmed to use some strategy
      Players are randomly matched to play against each other
    Strategies with high payoff spread within the population by
      learning
      copying or inheriting strategies (replicator dynamics)
      infection
    Stability analysis: Evolutionarily Stable Strategies (ESS)
      Players playing an ESS must earn strictly higher payoffs than a
      small group of invaders playing a different strategy
Bayesian Learning

    Assumes 'informed agents' playing repeated games with a finite
    action space
    Payoffs depend on characteristics of the agents represented by
    types; each agent's type is private information
    The agents' initial beliefs are given by a common prior
    distribution over agent types
    This belief is updated according to Bayes' rule to a posterior
    distribution at each stage of the game
    Every finite Bayesian game has at least one Bayesian Nash
    equilibrium, possibly in mixed strategies
Adaptive Learning

    Agents are not fully rational, but can learn through experience
    and adapt their strategies
    Agents do not know the reward structure of the game
    Agents are only able to take actions and observe their own
    rewards (or opponents' rewards as well)
    Popular examples:
      Best Response Update
      Fictitious Play
      Regret Matching
      Infinitesimal Gradient Ascent (IGA)
      Dynamic Gradient Play
      Adaptive Play Q-learning
Fictitious Play

    The learning process develops a 'historical distribution' of the
    other agents' play
    In fictitious play, agent i has an exogenous initial weight
    function ki0: S-i → R+
    The weight is updated by adding 1 to the weight of each opponent
    strategy each time it is played
    The probability that player i assigns to player -i playing s-i at
    date t is given by
            qit(s-i) = kit(s-i) / Σs-i kit(s-i)
    The 'best response' of agent i in this fictitious play is given by
            sit+1 = arg maxsi Σs-i qit(s-i) ui(si, s-i)
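A minimal sketch of one fictitious-play step, assuming the weights are kept in a dictionary keyed by opponent strategies (function and variable names are illustrative, not from the slides):

```python
# A minimal sketch of one fictitious-play step for agent i (names assumed).
# weights: k_i^t over opponent strategies; utility(s_i, s_minus_i) returns
# u_i; own_strategies: agent i's strategy set S_i.
def fictitious_play_step(weights, own_strategies, utility, observed_move):
    weights[observed_move] += 1                      # add 1 to the played strategy
    total = sum(weights.values())
    q = {s: k / total for s, k in weights.items()}   # empirical distribution q_i^t
    # best response to the historical distribution
    return max(own_strategies,
               key=lambda s_i: sum(q[s] * utility(s_i, s) for s in q))
```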
An Example

    Consider the same 2x2 game as before:

                     B
                b1      b2
        A  a1   4,4     5,2
           a2   0,1     4,3

    Suppose we assign kA0(b1) = kA0(b2) = kB0(a1) = kB0(a2) = 1
    Then qA0(b1) = qA0(b2) = qB0(a1) = qB0(a2) = 0.5
    For A, if A chooses a1:
        qA0(b1)uA(a1, b1) + qA0(b2)uA(a1, b2) = 0.5*4 + 0.5*5 = 4.5
    while if A chooses a2:
        qA0(b1)uA(a2, b1) + qA0(b2)uA(a2, b2) = 0.5*0 + 0.5*4 = 2
    For B, if B chooses b1:
        qB0(a1)uB(a1, b1) + qB0(a2)uB(a2, b1) = 0.5*4 + 0.5*1 = 2.5
    while if B chooses b2:
        qB0(a1)uB(a1, b2) + qB0(a2)uB(a2, b2) = 0.5*2 + 0.5*3 = 2.5
    Clearly A plays a1; B is indifferent between b1 and b2, so assume
    B plays b2
Game proceeds

    stage               0         1         2         3
    A's selection       a1        a1        a1        a1
    B's selection       b2        b1        b1        b1
    A's payoff          5         4         4         4
    B's payoff          2         4         4         4

    Weights and empirical probabilities, starting from the initial
    values and updated after each stage:

                        initial   after 0   after 1   after 2   after 3
    kAt(b1), qAt(b1)    1, 0.5    1, 0.33   2, 0.5    3, 0.6    4, 0.67
    kAt(b2), qAt(b2)    1, 0.5    2, 0.67   2, 0.5    2, 0.4    2, 0.33
    kBt(a1), qBt(a1)    1, 0.5    2, 0.67   3, 0.75   4, 0.8    5, 0.83
    kBt(a2), qBt(a2)    1, 0.5    1, 0.33   1, 0.25   1, 0.2    1, 0.17

    From stage 1 onward, play settles at (a1, b1), the game's Nash
    equilibrium
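The whole run above can be reproduced with a short simulation; this sketch (our names, with B's stage-0 tie broken toward b2 as in the text) prints the same selections and payoffs:

```python
# A minimal sketch reproducing the run above (names assumed; B's stage-0
# tie between b1 and b2 is broken toward b2 via the tie_break ordering).
u_A = {("a1", "b1"): 4, ("a1", "b2"): 5, ("a2", "b1"): 0, ("a2", "b2"): 4}
u_B = {("a1", "b1"): 4, ("a1", "b2"): 2, ("a2", "b1"): 1, ("a2", "b2"): 3}
kA = {"b1": 1.0, "b2": 1.0}   # A's weights over B's strategies
kB = {"a1": 1.0, "a2": 1.0}   # B's weights over A's strategies

def best_response(weights, own, util, tie_break):
    total = sum(weights.values())
    payoff = {s: sum(w / total * util(s, o) for o, w in weights.items())
              for s in own}
    best = max(payoff.values())
    return next(s for s in tie_break if payoff[s] == best)  # first maximizer

for stage in range(4):
    a = best_response(kA, ["a1", "a2"], lambda s, o: u_A[(s, o)], ["a1", "a2"])
    b = best_response(kB, ["b1", "b2"], lambda s, o: u_B[(o, s)], ["b2", "b1"])
    kA[b] += 1                 # each side observes the opponent's move
    kB[a] += 1
    print(stage, a, b, u_A[(a, b)], u_B[(a, b)])
# prints: 0 a1 b2 5 2 / 1 a1 b1 4 4 / 2 a1 b1 4 4 / 3 a1 b1 4 4
```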
Gradient Based Learning

    Fictitious play assumes unbounded computation is allowed at every
    step (the arg max calculation)
    An alternative is to proceed by gradient ascent on some objective
    function: the expected payoff
    Two players, row and column, have payoff matrices

        R = | r11  r12 |        C = | c11  c12 |
            | r21  r22 |            | c21  c22 |

    The row player chooses action 1 with probability α, while the
    column player chooses action 1 with probability β
    Expected payoffs are

        Vr(α, β) = r11αβ + r12α(1−β) + r21(1−α)β + r22(1−α)(1−β)
        Vc(α, β) = c11αβ + c12α(1−β) + c21(1−α)β + c22(1−α)(1−β)
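For concreteness, the expected payoff is just the bilinear form p^T M q over the mixed strategies; a sketch (numpy assumed, with matrix values taken from the 2x2 game used throughout):

```python
# A minimal sketch of the bilinear expected payoff (numpy assumed).
import numpy as np

R = np.array([[4.0, 5.0], [0.0, 4.0]])   # row player's payoffs
C = np.array([[4.0, 2.0], [1.0, 3.0]])   # column player's payoffs

def V(M, alpha, beta):
    # V(α, β) = p^T M q with p = (α, 1−α), q = (β, 1−β)
    p = np.array([alpha, 1.0 - alpha])
    q = np.array([beta, 1.0 - beta])
    return p @ M @ q

print(V(R, 0.5, 0.5), V(C, 0.5, 0.5))    # 3.25 2.5
```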
Gradient Ascent

    Each player repeatedly adjusts her half of the current strategy
    pair in the direction of the current gradient, with some step
    size η:

        αk+1 = αk + η ∂Vr(αk, βk)/∂α
        βk+1 = βk + η ∂Vc(αk, βk)/∂β

    If an update takes a strategy outside the probability simplex, it
    is projected back onto the boundary
    The gradient ascent algorithm assumes a full-information game:
    both players know the game matrices and can see the mixed strategy
    of their opponent in the previous step
    With u = (r11 + r22) − (r21 + r12) and u' = (c11 + c22) − (c21 + c12),
    the partial derivatives are

        ∂Vr(α, β)/∂α = βu − (r22 − r12)
        ∂Vc(α, β)/∂β = αu' − (c22 − c21)
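One projected gradient-ascent step then uses the closed-form derivatives above; a minimal sketch (names are ours, and clipping to [0, 1] stands in for the simplex projection):

```python
# A minimal sketch of one projected gradient-ascent step (names assumed).
def gradient_step(alpha, beta, R, C, eta=0.01):
    u  = (R[0][0] + R[1][1]) - (R[1][0] + R[0][1])
    up = (C[0][0] + C[1][1]) - (C[1][0] + C[0][1])
    dVr = beta * u - (R[1][1] - R[0][1])    # ∂Vr/∂α = βu − (r22 − r12)
    dVc = alpha * up - (C[1][1] - C[1][0])  # ∂Vc/∂β = αu' − (c22 − c21)
    clip = lambda x: min(1.0, max(0.0, x))  # project back onto [0, 1]
    return clip(alpha + eta * dVr), clip(beta + eta * dVc)
```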
Infinitesimal Gradient Ascent

    It is interesting to see what happens to the strategy pair and to
    the expected payoffs over time
    The strategy pair sequence produced by following a gradient ascent
    algorithm may never converge
    The average payoff of both players, however, always converges to
    that of some Nash pair
    Under a small-step-size assumption (the limit η → 0), the update
    equations become the differential equation

        | ∂α/∂t |   | 0   u  | | α |   | −(r22 − r12) |
        | ∂β/∂t | = | u'  0  | | β | + | −(c22 − c21) |

    The point where the gradient is zero is a Nash equilibrium:

        (α*, β*) = ( (c22 − c21)/u' , (r22 − r12)/u )

    This point might even lie outside the probability simplex
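The limiting dynamics can be examined numerically; a sketch that Euler-integrates the ODE above for an interior trajectory (names and step size are ours, and the boundary projection is omitted):

```python
# A minimal sketch that Euler-integrates the limiting ODE (names assumed;
# valid for interior trajectories, no simplex projection).
import numpy as np

def iga_trajectory(alpha0, beta0, u, up, r22_minus_r12, c22_minus_c21,
                   dt=1e-3, steps=20000):
    U = np.array([[0.0, u], [up, 0.0]])
    b = np.array([-r22_minus_r12, -c22_minus_c21])
    x = np.array([alpha0, beta0])
    traj = [x.copy()]
    for _ in range(steps):
        x = x + dt * (U @ x + b)   # d(α, β)/dt = U (α, β)^T + b
        traj.append(x.copy())
    return np.array(traj)
```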
IGA Dynamics

    Denote the off-diagonal matrix containing u and u' by U
    Depending on the nature of U (noninvertible, real or imaginary
    eigenvalues), the convergence dynamics vary
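The eigenvalues of U satisfy λ² = u·u', so they are real when u·u' > 0, purely imaginary when u·u' < 0, and U is noninvertible when u·u' = 0; a quick numerical check (example values assumed):

```python
# A quick numerical check (example values assumed): eigenvalues of
# U = [[0, u], [u', 0]] satisfy λ² = u·u'.
import numpy as np

u, up = 3.0, -2.0
print(np.linalg.eigvals(np.array([[0.0, u], [up, 0.0]])))
# ±sqrt(u·u') → a purely imaginary pair here, since u·u' = −6 < 0
```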
WoLF: Win or Learn Fast

    Introduces a variable learning rate in place of the fixed η:

        αk+1 = αk + η lrk ∂Vr(αk, βk)/∂α
        βk+1 = βk + η lck ∂Vc(αk, βk)/∂β

    Let αe be the equilibrium strategy selected by the row player and
    βe the equilibrium strategy selected by the column player

        lrk = lmin   if Vr(αk, βk) > Vr(αe, βk)   (winning)
              lmax   otherwise                    (losing)

        lck = lmin   if Vc(αk, βk) > Vc(αk, βe)   (winning)
              lmax   otherwise                    (losing)

    If, in a two-person, two-action, iterated general-sum game, both
    players follow the WoLF-IGA algorithm (with lmax > lmin), then
    their strategies will converge to a Nash equilibrium
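The rate selection for the row player reduces to a one-line comparison; a minimal sketch (Vr, the equilibrium strategy αe, and the rate bounds are assumed given, and the column player is symmetric):

```python
# A minimal sketch of the WoLF rate choice for the row player (names and
# default bounds assumed).
def wolf_rate_row(Vr, alpha_k, beta_k, alpha_e, l_min=0.01, l_max=0.04):
    winning = Vr(alpha_k, beta_k) > Vr(alpha_e, beta_k)
    return l_min if winning else l_max   # learn fast when losing
```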
WoLF-IGA Convergence

    [Figure: convergence trajectories of WoLF-IGA]
To Conclude

    Learning in games is popular in anticipation of a future in which
    less-than-rational agents play a game repeatedly to arrive at a
    stable and efficient equilibrium
    The algorithmic structure and adaptive techniques involved in such
    learning are largely motivated by machine learning and adaptive
    filtering
    Fictitious play carries a heavy per-step computational burden (the
    arg max calculation); a gradient-based approach relieves this
    burden but might suffer from convergence issues
    A stochastic gradient method (not discussed in this presentation)
    makes use of the minimal information available and still performs
    near-optimally
