EECS 463 Course Project

ADAPTIVE LEARNING IN GAMES
3/11/2010   Suvarup Saha
Outline

- Motivation
- Games
- Learning in Games
- Adaptive Learning
  - Example
- Gradient Techniques
- Conclusion
Motivation

- Adaptive filtering techniques generalize to many applications beyond filtering:
  - Gradient-based iterative search
  - Stochastic Gradient
  - Least Squares
- Applying game theory to less-than-rational multi-agent scenarios demands self-learning mechanisms
- Adaptive techniques can be applied in such instances to help the agents learn the game and play intelligently
Games

- A game is an interaction between two or more self-interested agents
- Each agent chooses a strategy si from a set of strategies Si
- A (joint) strategy profile s is the set of chosen strategies, also called an outcome of the game in a single play
- Each agent has a utility function ui(s), specifying its preference for each outcome in terms of a payoff
- An agent's best response is the strategy with the highest payoff, given its opponents' choice of strategies
- A Nash equilibrium is a strategy profile in which every agent's strategy is a best response to the others' choice of strategies
A Normal Form Game

                  B
             b1       b2
   A   a1    4,4      5,2
       a2    0,1      4,3

- This is a 2-player game with SA = {a1, a2}, SB = {b1, b2}
- The ui(s) are given explicitly in matrix form; for example, uA(a1, b2) = 5 and uB(a1, b2) = 2
- The best response of A to B playing b2 is a1
- In this game, (a1, b1) is the unique Nash equilibrium
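As a sanity check, here is a minimal Python sketch (not part of the original slides; the names R, C, and pure_nash are illustrative) that encodes this bimatrix game and enumerates its pure-strategy Nash equilibria by testing the best-response condition at every cell:

```python
import numpy as np

# Payoff matrices for the 2x2 game above: rows are A's actions (a1, a2),
# columns are B's actions (b1, b2). R[i, j] is A's payoff, C[i, j] is B's.
R = np.array([[4, 5],
              [0, 4]])
C = np.array([[4, 2],
              [1, 3]])

def pure_nash(R, C):
    """Enumerate pure-strategy Nash equilibria of a bimatrix game."""
    eqs = []
    for i in range(R.shape[0]):
        for j in range(R.shape[1]):
            row_best = R[i, j] >= R[:, j].max()   # A cannot gain by deviating
            col_best = C[i, j] >= C[i, :].max()   # B cannot gain by deviating
            if row_best and col_best:
                eqs.append((i, j))
    return eqs

print(pure_nash(R, C))   # [(0, 0)], i.e. (a1, b1)
```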
Learning in Games

- Classical approach: compute an optimal/equilibrium strategy
- Some criticisms of this approach:
  - other agents' utilities might be unknown to an agent, so it cannot compute an equilibrium strategy
  - other agents might not be playing an equilibrium strategy
  - computing an equilibrium strategy might be hard
- Another approach: learn how to play a game 'optimally' by
  - playing it many times
  - updating the strategy based on experience
Learning Dynamics

The main families of learning dynamics, arranged by increasing rationality/sophistication of the agents:

    Evolutionary Dynamics  →  Adaptive Learning  →  Bayesian Learning

Adaptive learning is the focus of our discussion.
Evolutionary Dynamics

- Inspired by evolutionary biology, with no appeal to the rationality of the agents
- An entire population of agents is programmed to use some strategy
  - players are randomly matched to play each other
- Strategies with a high payoff spread within the population by
  - learning
  - copying or inheriting strategies (Replicator Dynamics; see the sketch after this list)
  - infection
- Stability analysis: Evolutionarily Stable Strategies (ESS)
  - players playing an ESS must earn strictly higher payoffs than a small group of invaders playing a different strategy
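A minimal sketch of replicator dynamics, under assumptions not taken from the slides: a symmetric game with payoff matrix M, where each strategy's population share grows in proportion to how much its fitness exceeds the population average. The Hawk-Dove payoffs below are a standard textbook choice used purely for illustration:

```python
import numpy as np

# Replicator dynamics for a symmetric game: M[i, j] is the payoff to
# strategy i against strategy j, x holds the population share of each
# strategy. Shares grow when their fitness beats the population average.
def replicator_step(x, M, dt=0.01):
    fitness = M @ x            # expected payoff of each pure strategy
    avg = x @ fitness          # population-average payoff
    return x + dt * x * (fitness - avg)

# Hawk-Dove with V=4, C=6; the mixed ESS is 2/3 Hawk, 1/3 Dove.
M = np.array([[-1.0, 4.0],
              [ 0.0, 2.0]])
x = np.array([0.1, 0.9])
for _ in range(5000):
    x = replicator_step(x, M)
print(x)   # converges toward the ESS mixture [2/3, 1/3]
```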
Bayesian Learning

- Assumes 'informed agents' playing repeated games with a finite action space
- Payoffs depend on some characteristics of the agents represented by types; each agent's type is private information
- The agents' initial beliefs are given by a common prior distribution over agent types
- This belief is updated according to Bayes' rule to a posterior distribution at each stage of the game
- Every finite Bayesian game has at least one Bayesian Nash equilibrium, possibly in mixed strategies
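A minimal sketch of the stage-wise belief update, assuming a hypothetical two-type example (the numbers are illustrative, not from the slides): after observing an action, each type's prior weight is multiplied by the probability that the type would have played that action, then renormalized.

```python
import numpy as np

# Bayes' rule over opponent types: posterior ∝ prior × likelihood.
prior = np.array([0.5, 0.5])        # common prior over two opponent types
likelihood = np.array([0.9, 0.2])   # P(observed action | type)
posterior = prior * likelihood
posterior /= posterior.sum()
print(posterior)                    # ~[0.82, 0.18]
```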
Adaptive Learning

- Agents are not fully rational, but can learn through experience and adapt their strategies
- Agents do not know the reward structure of the game
- Agents are only able to take actions and observe their own rewards (or opponents' rewards as well)
- Popular examples:
  - Best Response Update
  - Fictitious Play
  - Regret Matching
  - Infinitesimal Gradient Ascent (IGA)
  - Dynamic Gradient Play
  - Adaptive Play Q-learning
Fictitious Play

- The learning process is used to develop a 'historical distribution' of the other agents' play
- In fictitious play, agent i has an exogenous initial weight function k_i^0 : S-i → R+
- The weight is updated by adding 1 to the weight of each opponent strategy, each time it is played
- The probability that player i assigns to player -i playing s-i at date t is given by

    q_i^t(s-i) = k_i^t(s-i) / Σ_{s'-i ∈ S-i} k_i^t(s'-i)

- The 'best response' of agent i in this fictitious play is given by

    s_i^{t+1} = arg max_{si ∈ Si} Σ_{s-i} q_i^t(s-i) u_i(si, s-i)
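A minimal Python sketch of fictitious play on the 2x2 game above. The tie-breaking rule toward the later action is an assumption chosen to match the example on the next slide, where B is taken to play b2 when indifferent at stage 0:

```python
import numpy as np

# Fictitious play on the 2x2 game: R is A's payoff matrix, C is B's.
R = np.array([[4.0, 5.0], [0.0, 4.0]])
C = np.array([[4.0, 2.0], [1.0, 3.0]])

def best_response(payoffs):
    # a tiny index-proportional bonus breaks exact ties toward later actions
    return int(np.argmax(payoffs + 1e-9 * np.arange(len(payoffs))))

kA = np.ones(2)   # A's weights over B's actions (b1, b2)
kB = np.ones(2)   # B's weights over A's actions (a1, a2)

for t in range(4):
    qA = kA / kA.sum()            # A's belief about B's play
    qB = kB / kB.sum()            # B's belief about A's play
    a = best_response(R @ qA)     # A's best response to its belief
    b = best_response(qB @ C)     # B's best response to its belief
    print(f"stage {t}: A plays a{a + 1}, B plays b{b + 1}")
    kA[b] += 1                    # A observed B playing b
    kB[a] += 1                    # B observed A playing a
```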
An Example

- Consider the same 2x2 game example as before:

             b1       b2
       a1    4,4      5,2
       a2    0,1      4,3

- Suppose we assign k_A^0(b1) = k_A^0(b2) = k_B^0(a1) = k_B^0(a2) = 1
- Then q_A^0(b1) = q_A^0(b2) = q_B^0(a1) = q_B^0(a2) = 0.5
- For A, choosing a1 gives
    q_A^0(b1) uA(a1, b1) + q_A^0(b2) uA(a1, b2) = 0.5*4 + 0.5*5 = 4.5
  while choosing a2 gives
    q_A^0(b1) uA(a2, b1) + q_A^0(b2) uA(a2, b2) = 0.5*0 + 0.5*4 = 2
- For B, choosing b1 gives
    q_B^0(a1) uB(a1, b1) + q_B^0(a2) uB(a2, b1) = 0.5*4 + 0.5*1 = 2.5
  while choosing b2 gives
    q_B^0(a1) uB(a1, b2) + q_B^0(a2) uB(a2, b2) = 0.5*2 + 0.5*3 = 2.5
- Clearly A plays a1; B can choose either b1 or b2, so assume B plays b2
Game proceeds

The game is repeated for four stages. Each k, q column shows the weight and belief after that stage's update; the 'init' column shows the initial values.

    stage                  init     0        1        2        3
    A's selection                   a1       a1       a1       a1
    B's selection                   b2       b1       b1       b1
    A's payoff                      5        4        4        4
    B's payoff                      2        4        4        4
    k_A^t(b1), q_A^t(b1)   1, 0.5   1, 0.33  2, 0.5   3, 0.6   4, 0.67
    k_A^t(b2), q_A^t(b2)   1, 0.5   2, 0.67  2, 0.5   2, 0.4   2, 0.33
    k_B^t(a1), q_B^t(a1)   1, 0.5   2, 0.67  3, 0.75  4, 0.8   5, 0.83
    k_B^t(a2), q_B^t(a2)   1, 0.5   1, 0.33  1, 0.25  1, 0.2   1, 0.17

Play settles at the Nash equilibrium (a1, b1): A plays a1 from the start, and B plays b1 at every stage after the initial tie.
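Running the fictitious-play sketch from the earlier slide reproduces exactly this trace: once the stage-0 tie is broken toward b2, B's growing weight on a1 makes b1 a strict best response from stage 1 onward, and the beliefs drift toward the equilibrium play.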
Gradient Based Learning

- Fictitious play assumes unbounded computation is allowed in every step (the arg max calculation)
- An alternative is to proceed by gradient ascent on some objective function, namely the expected payoff
- Two players, row and column, have payoff matrices

    R = [ r11  r12 ]        C = [ c11  c12 ]
        [ r21  r22 ]            [ c21  c22 ]

- The row player chooses action 1 with probability α, while the column player chooses action 1 with probability β
- The expected payoffs are

    Vr(α, β) = r11 αβ + r12 α(1−β) + r21 (1−α)β + r22 (1−α)(1−β)
    Vc(α, β) = c11 αβ + c12 α(1−β) + c21 (1−α)β + c22 (1−α)(1−β)
Gradient Ascent

- Each player repeatedly adjusts her half of the current strategy pair in the direction of the current gradient, with some step size η:

    α_{k+1} = α_k + η ∂Vr(α_k, β_k)/∂α
    β_{k+1} = β_k + η ∂Vc(α_k, β_k)/∂β

- If an update takes a strategy outside the probability simplex, it is projected back to the boundary
- The gradient ascent algorithm assumes a full-information game: both players know the game matrices and can see the mixed strategy of their opponent in the previous step
- With u = (r11 + r22) − (r21 + r12) and u' = (c11 + c22) − (c21 + c12), the gradients are

    ∂Vr(α, β)/∂α = βu − (r22 − r12)
    ∂Vc(α, β)/∂β = αu' − (c22 − c21)
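A minimal sketch of these projected gradient ascent updates on the same 2x2 game (the step size and iteration count are illustrative choices, not from the slides):

```python
import numpy as np

# Full-information gradient ascent on the 2x2 game: each player follows
# the gradient of her expected payoff, clipping back into [0, 1] when an
# update would leave the probability simplex.
R = np.array([[4.0, 5.0], [0.0, 4.0]])
C = np.array([[4.0, 2.0], [1.0, 3.0]])

u  = (R[0, 0] + R[1, 1]) - (R[1, 0] + R[0, 1])
up = (C[0, 0] + C[1, 1]) - (C[1, 0] + C[0, 1])   # u'

alpha, beta, eta = 0.5, 0.5, 0.01
for k in range(2000):
    grad_a = beta * u - (R[1, 1] - R[0, 1])      # ∂Vr/∂α
    grad_b = alpha * up - (C[1, 1] - C[1, 0])    # ∂Vc/∂β
    alpha = np.clip(alpha + eta * grad_a, 0.0, 1.0)
    beta  = np.clip(beta + eta * grad_b, 0.0, 1.0)

print(alpha, beta)   # (1.0, 1.0): both play their first action, the NE (a1, b1)
```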
Infinitesimal Gradient Ascent

- Interesting to see what happens to the strategy pair and to the expected payoffs over time
- The strategy pair sequence produced by following a gradient ascent algorithm may never converge
- The average payoff of both players, however, always converges to that of some Nash pair
- Under a small step size assumption (lim η→0), the update equations become the differential system

    [ ∂α/∂t ]   [ 0   u  ] [ α ]   [ −(r22 − r12) ]
    [ ∂β/∂t ] = [ u'  0  ] [ β ] + [ −(c22 − c21) ]

- The point where the gradient is zero is a Nash equilibrium:

    (α*, β*) = ( (c22 − c21)/u' , (r22 − r12)/u )

- This point might even lie outside the probability simplex
IGA dynamics

- Denote by U the off-diagonal matrix containing u and u':

    U = [ 0   u  ]
        [ u'  0  ]

- Depending on the nature of U (not invertible, or with real or imaginary eigenvalues), the convergence dynamics vary; see the sketch below
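A minimal sketch of the IGA differential system integrated with Euler steps. Matching pennies is an assumed example, not from the slides; it gives u and u' opposite signs, so U has imaginary eigenvalues and the unprojected trajectory orbits the center instead of converging:

```python
import numpy as np

# Euler integration of the IGA differential system for a 2x2 game.
def iga_trajectory(R, C, alpha, beta, dt=0.001, steps=20000):
    u  = (R[0, 0] + R[1, 1]) - (R[1, 0] + R[0, 1])
    up = (C[0, 0] + C[1, 1]) - (C[1, 0] + C[0, 1])
    traj = []
    for _ in range(steps):
        da = beta * u - (R[1, 1] - R[0, 1])      # ∂Vr/∂α
        db = alpha * up - (C[1, 1] - C[1, 0])    # ∂Vc/∂β
        alpha, beta = alpha + dt * da, beta + dt * db
        traj.append((alpha, beta))
    return np.array(traj)

# Matching pennies: u = 4, u' = -4, eigenvalues of U are ±4i, so the
# unconstrained strategy pair cycles around (0.5, 0.5).
R = np.array([[ 1.0, -1.0], [-1.0,  1.0]])
C = -R
traj = iga_trajectory(R, C, 0.6, 0.6)
print(traj[-1])   # still orbiting around the center (0.5, 0.5)
```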
WoLF - Win or Learn Fast

- Introduces a variable learning rate in place of the fixed η:

    α_{k+1} = α_k + η l_k^r ∂Vr(α_k, β_k)/∂α
    β_{k+1} = β_k + η l_k^c ∂Vc(α_k, β_k)/∂β

- Let αe be the equilibrium strategy selected by the row player and βe be the equilibrium strategy selected by the column player:

    l_k^r = l_min   if Vr(α_k, β_k) > Vr(αe, β_k)   (winning)
            l_max   otherwise                        (losing)

    l_k^c = l_min   if Vc(α_k, β_k) > Vc(α_k, βe)   (winning)
            l_max   otherwise                        (losing)

- If, in a two-person, two-action, iterated general-sum game, both players follow the WoLF-IGA algorithm (with l_max > l_min), then their strategies will converge to a Nash equilibrium
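A minimal sketch of WoLF-IGA on matching pennies (an assumed example; the equilibrium benchmark αe = βe = 0.5 is known for this game, and the rates and step counts are illustrative). Learning is slow when winning against the benchmark and fast when losing, which contracts the IGA orbits:

```python
import numpy as np

R = np.array([[ 1.0, -1.0], [-1.0,  1.0]])   # matching pennies
C = -R

def V(M, a, b):
    """Expected payoff of M when row plays action 1 w.p. a, column w.p. b."""
    return np.array([a, 1 - a]) @ M @ np.array([b, 1 - b])

alpha, beta = 0.9, 0.2
eta, l_min, l_max = 0.01, 1.0, 2.0
alpha_e = beta_e = 0.5                        # equilibrium strategies
u  = (R[0, 0] + R[1, 1]) - (R[1, 0] + R[0, 1])
up = (C[0, 0] + C[1, 1]) - (C[1, 0] + C[0, 1])

for k in range(20000):
    lr = l_min if V(R, alpha, beta) > V(R, alpha_e, beta) else l_max
    lc = l_min if V(C, alpha, beta) > V(C, alpha, beta_e) else l_max
    grad_a = beta * u - (R[1, 1] - R[0, 1])
    grad_b = alpha * up - (C[1, 1] - C[1, 0])
    alpha = np.clip(alpha + eta * lr * grad_a, 0.0, 1.0)
    beta  = np.clip(beta + eta * lc * grad_b, 0.0, 1.0)

print(alpha, beta)   # spirals in toward the Nash equilibrium (0.5, 0.5)
```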
WoLF-IGA convergence

[Figure: WoLF-IGA convergence trajectories; image not recoverable from the extraction]
To Conclude

- Learning in games is popular in anticipation of a future in which less-than-rational agents play a game repeatedly to arrive at a stable and efficient equilibrium
- The algorithmic structure and adaptive techniques involved in such learning are largely motivated by Machine Learning and Adaptive Filtering
- A gradient-based approach relieves the computational burden of fictitious play's arg max step, but might suffer from convergence issues
- A stochastic gradient method (not discussed in this presentation) makes use of the minimal information available and still performs near-optimally
