
# Theory of Repeated Games

Lecture slides on Repeated Games I used in the following lecture:

Published in: Economy & Finance

### Theory of Repeated Games

1. Theory of Repeated Games: Lecture Notes on Central Results

   Yosuke YASUDA, Osaka University, Department of Economics (yasuda@econ.osaka-u.ac.jp)

   Last update: May 21, 2015
2. Announcement

   Course website: the course website is available at https://sites.google.com/site/yosukeyasuda2/home/lecture/repeated15

   Textbook and surveys: MS is a comprehensive textbook on repeated games; K and P are highly readable survey articles that complement MS.

   - MS: Mailath and Samuelson, Repeated Games and Reputations: Long-run Relationships, 2006.
   - K: Kandori, 2008.
   - P: Pearce, 1992.

   Symbols used in the lectures: Ex: Example, Fg: Figure, Q: Question, Rm: Remark.
3. Finitely Repeated Games (1)

   A repeated game, a specific class of dynamic game, is a suitable framework for studying the interaction between immediate gains and long-term incentives, and for understanding how a reputation mechanism can support cooperation.

   Let $G = \{A_1, \dots, A_n; u_1, \dots, u_n\}$ denote a static game in which players 1 through n simultaneously choose actions $a_1$ through $a_n$ from the action spaces $A_1$ through $A_n$, and the corresponding payoffs are $u_1(a_1, \dots, a_n)$ through $u_n(a_1, \dots, a_n)$.

   Definition 1. The game G is called the stage game of the repeated game. Given a stage game G, let G(T) denote the finitely repeated game in which G is played T times, with the outcomes of all preceding plays observed before the next play begins. The payoff in G(T) is assumed to be simply the sum of the payoffs from the T stage games (future payoffs are not discounted).
4. Finitely Repeated Games (2)

   Theorem 2. If the stage game G has a unique Nash equilibrium, then for any finite T the repeated game G(T) has a unique subgame perfect Nash equilibrium: the Nash equilibrium of G is played in every stage, irrespective of the past history of play.

   Proof. We solve the game by backward induction, starting from the smallest subgames and working backward through the game. In stage T, whatever happened before, players must play the unique Nash equilibrium of G. Given that, in stage T − 1 players again play the same Nash equilibrium, since nothing they do in T − 1 can change the outcome of the last stage. This argument carries backward through stage 1, so the unique Nash equilibrium outcome is played in every stage (irrespective of the past history).
5. Finitely Repeated Games (3)

   When the stage game has more than one Nash equilibrium, multiple subgame perfect Nash equilibria may exist. Furthermore, an action profile that is not a stage-game Nash equilibrium may be sustained (in any period t < T) in a subgame perfect Nash equilibrium.

   Q: The following stage game is played twice. Can the players support the non-equilibrium outcome (M1, M2) in the first period?

   | 1 \ 2 | L2 | M2 | R2 |
   |---|---|---|---|
   | L1 | 1, 1 | 5, 0 | 0, 0 |
   | M1 | 0, 5 | 4, 4 | 0, 0 |
   | R1 | 0, 0 | 0, 0 | 3, 3 |

   Rm: The stage game has two Nash equilibria, (L1, L2) and (R1, R2), so what players choose in the first period can determine which equilibrium is played in the second period.
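
One way to answer Q is to check it mechanically. The Python sketch below assumes the continuation rule "play (R1, R2) in period 2 if (M1, M2) was observed in period 1, and (L1, L2) otherwise"; this rule is an illustrative choice, not stated on the slide, and the function names are hypothetical. Because period-2 play is a stage-game Nash equilibrium after every history, only first-period deviations need checking: conforming yields 4 + 3 = 7, while the best deviation yields 5 + 1 = 6.

```python
from itertools import product

# Stage game from the slide: PAYOFFS[(row, col)] = (u1, u2).
ROWS = ["L1", "M1", "R1"]
COLS = ["L2", "M2", "R2"]
PAYOFFS = {
    ("L1", "L2"): (1, 1), ("L1", "M2"): (5, 0), ("L1", "R2"): (0, 0),
    ("M1", "L2"): (0, 5), ("M1", "M2"): (4, 4), ("M1", "R2"): (0, 0),
    ("R1", "L2"): (0, 0), ("R1", "M2"): (0, 0), ("R1", "R2"): (3, 3),
}


def pure_nash(rows, cols, payoffs):
    """All pure-strategy Nash equilibria of a two-player bimatrix game."""
    equilibria = []
    for r, c in product(rows, cols):
        u1, u2 = payoffs[(r, c)]
        row_best = all(payoffs[(r2, c)][0] <= u1 for r2 in rows)
        col_best = all(payoffs[(r, c2)][1] <= u2 for c2 in cols)
        if row_best and col_best:
            equilibria.append((r, c))
    return equilibria


def second_period(first):
    """Hypothetical continuation: (R1, R2) after (M1, M2), otherwise (L1, L2)."""
    return ("R1", "R2") if first == ("M1", "M2") else ("L1", "L2")


def total_payoff(first, player):
    """Undiscounted two-period payoff of `player` (0 or 1), as in G(2)."""
    return PAYOFFS[first][player] + PAYOFFS[second_period(first)][player]


def supports_m1m2():
    """True if no first-period deviation from (M1, M2) is profitable."""
    target = ("M1", "M2")
    ok1 = all(total_payoff((r, "M2"), 0) <= total_payoff(target, 0) for r in ROWS)
    ok2 = all(total_payoff(("M1", c), 1) <= total_payoff(target, 1) for c in COLS)
    return ok1 and ok2


print(pure_nash(ROWS, COLS, PAYOFFS))  # [('L1', 'L2'), ('R1', 'R2')]
print(supports_m1m2())                 # True
```

The answer is therefore yes under this continuation rule: the threat of switching to the worse equilibrium (L1, L2) outweighs the one-period gain of 1 from deviating.
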
6. Infinitely Repeated Games (1)

   Even if the stage game has a unique Nash equilibrium, there may be subgame perfect outcomes of the infinitely repeated game in which no stage-game outcome is a Nash equilibrium of G.

   Let G(∞, δ) denote the infinitely repeated game in which G is repeated forever and the players share the discount factor δ. For each t, the outcomes of the t − 1 preceding plays of the stage game are observed before the t-th stage begins. Each player's payoff in G(∞, δ) is the average payoff defined as follows.

   Definition 3. Given the discount factor δ, the average payoff of the infinite sequence of payoffs $u^1, u^2, \dots$ is
   $$(1 - \delta)\left(u^1 + \delta u^2 + \delta^2 u^3 + \cdots\right) = (1 - \delta)\sum_{t=1}^{\infty} \delta^{t-1} u^t.$$
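
The factor (1 − δ) normalizes the discounted sum so that a constant stream u, u, u, ... has average payoff exactly u, which makes repeated-game payoffs directly comparable with stage-game payoffs. The sketch below computes Definition 3 for a finite prefix followed by an optional constant tail; the function name and the `tail` convention are illustrative assumptions.

```python
def average_payoff(prefix, delta, tail=None):
    """Average discounted payoff (1 - delta) * sum_{t>=1} delta**(t-1) * u^t.

    `prefix` lists u^1, ..., u^k explicitly. If `tail` is given, the stream is
    assumed to continue with the constant payoff `tail` forever after the prefix;
    that geometric tail contributes delta**k * tail in average terms.
    """
    total = sum((1 - delta) * delta**t * u for t, u in enumerate(prefix))
    if tail is not None:
        total += delta ** len(prefix) * tail
    return total


# A constant stream has average payoff equal to the stage payoff.
print(average_payoff([], 0.9, tail=2))      # 2.0
# Grabbing 3 once and then receiving 0 forever, with delta = 0.5.
print(average_payoff([3], 0.5, tail=0))     # 1.5
```
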
7. Infinitely Repeated Games (2)

   A few important remarks:

   - The history of play through stage t is the record of the players' choices in stages 1 through t. The players might have chosen $(a_1^s, \dots, a_n^s)$ in stage s, where for each player i the action $a_i^s$ belongs to $A_i$.
   - In the finitely repeated game G(T) or the infinitely repeated game G(∞, δ), a player's strategy specifies the action she will take in each stage, for every possible history of play.
   - In the infinitely repeated game G(∞, δ), each subgame beginning at any stage is identical to the original game. In G(T), a subgame beginning at stage t + 1 is the repeated game in which G is played T − t times, denoted G(T − t).
   - In a repeated game, a Nash equilibrium is subgame perfect if the players' strategies constitute a Nash equilibrium in every subgame, i.e., after every possible history of play.
8. Unimprovability (1)

   Definition 4. A strategy $\sigma_i$ is called a perfect best response to the other players' strategies if player i has no incentive to deviate following any history.

   Consider the following requirement, which at first glance looks much weaker than the perfect best response condition.

   Definition 5. A strategy for i is unimprovable against a vector of strategies of her opponents if there is no (t − 1)-period history (for any t) after which i could profit by deviating from her strategy in period t only and conforming thereafter (i.e., switching back to the original strategy).

   To verify the unimprovability of a strategy, one needs to check only "one-shot" deviations from the strategy, rather than arbitrarily complex deviations.
9. Unimprovability (2)

   The following result simplifies the analysis of SPNE immensely. It is the exact counterpart of a well-known result from dynamic programming due to Howard (1960), and was first emphasized in the context of self-enforcing cooperation by Abreu (1988).

   Theorem 6. Let the payoffs of G be bounded. In the repeated game G(T) or G(∞, δ), strategy $\sigma_i$ is a perfect best response to a profile of strategies σ if and only if $\sigma_i$ is unimprovable against that profile.

   The proof is simple and generalizes easily to a wide variety of dynamic and stochastic games with discounting and bounded payoffs.
10. Unimprovability (3)

    Proof of ⇐ (⇒ is trivial, since a perfect best response admits no profitable deviation at all, one-shot or otherwise). We prove the contrapositive: not a perfect best response ⇒ not unimprovable.

    1. If $\sigma_i$ is not a perfect best response, there must be a history after which it is profitable to deviate to some other strategy $\hat\sigma_i$.
    2. Then, because of discounting and boundedness of payoffs, there must exist a profitable deviation that involves defection in only finitely many periods (and conforms to $\sigma_i$ thereafter). If $\hat\sigma_i$ involves defection at infinitely many nodes, then for sufficiently large T the strategy $\tilde\sigma_i$ that agrees with $\hat\sigma_i$ until time T and conforms to $\sigma_i$ thereafter is also a profitable deviation, since the payoff difference from period T + 1 on is at most a constant times $\delta^T$, which vanishes as T grows.
    3. Consider a profitable deviation involving defection in the smallest possible number of periods, denoted by T.
    4. In such a deviation, the final defection must itself be profitable at the history where it occurs; otherwise, conforming to $\sigma_i$ in that period would yield a profitable deviation with only T − 1 defections, contradicting minimality. Since the deviation conforms to $\sigma_i$ afterwards, this is a profitable one-shot deviation, so $\sigma_i$ is not unimprovable.
11. Repeated Prisoner's Dilemma (1)

    Q: The following prisoner's dilemma is played infinitely many times. Under what conditions on δ can an SPNE support cooperation (C1, C2)?

    | 1 \ 2 | C2 | D2 |
    |---|---|---|
    | C1 | 2, 2 | -1, 3 |
    | D1 | 3, -1 | 0, 0 |

    Suppose that player i plays Ci in the first stage. In the t-th stage, if the outcomes of all t − 1 preceding stages have been (C1, C2), then she plays Ci; otherwise, she plays Di (thereafter). This strategy is called a trigger strategy, because player i cooperates until someone fails to cooperate, which triggers a switch to noncooperation forever after. If both players adopt this trigger strategy, the outcome of the infinitely repeated game is (C1, C2) in every stage.
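
The trigger strategy is a function of the history alone, which makes it easy to simulate. The sketch below labels actions "C" and "D" for both players (instead of C1/C2, D1/D2) and uses a hypothetical `deviation` argument to force a single departure, showing that one defection switches play to (D, D) forever.

```python
def trigger(history, player):
    """Trigger strategy: play C iff every past outcome was (C, C).
    `player` is unused because the strategy is the same for both players."""
    return "C" if all(outcome == ("C", "C") for outcome in history) else "D"


def play(strategies, periods, deviation=None):
    """Simulate `periods` stages of the repeated game.

    `deviation` optionally maps (period, player) -> action and overrides the
    strategy at those points (a hypothetical helper used for illustration).
    """
    history = []
    for t in range(periods):
        actions = []
        for i, strategy in enumerate(strategies):
            action = strategy(history, i)
            if deviation and (t, i) in deviation:
                action = deviation[(t, i)]
            actions.append(action)
        history.append(tuple(actions))
    return history


print(play([trigger, trigger], 4))
# [('C', 'C'), ('C', 'C'), ('C', 'C'), ('C', 'C')]
print(play([trigger, trigger], 4, deviation={(1, 0): "D"}))
# [('C', 'C'), ('D', 'C'), ('D', 'D'), ('D', 'D')]
```
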
12. Repeated Prisoner's Dilemma (2)

    To show that the trigger strategies form an SPNE, we must verify that they constitute a Nash equilibrium in every possible subgame of the infinitely repeated game.

    Rm: Since every subgame of an infinitely repeated game is identical to the game as a whole (thanks to its recursive structure), we need to consider only two types of subgames: (i) subgames in which all the outcomes of earlier stages have been (C1, C2), and (ii) subgames in which the outcome of at least one earlier stage differs from (C1, C2).

    By unimprovability, it is sufficient to show that there is no profitable one-shot deviation after any possible history (each of which is of type (i) or (ii)). Players have no incentive to deviate in (ii), since the trigger strategy then prescribes repeated play of the one-shot NE (D1, D2).
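
The one-shot deviation check can be carried out mechanically by writing the trigger strategy as a two-state automaton: state "coop" covers subgames of type (i) and state "punish" covers type (ii). The sketch below (state names and function names are illustrative) compares, in each state, conforming with a single deviation followed by a return to the automaton. Exact `Fraction` arithmetic avoids rounding at the boundary.

```python
from fractions import Fraction

# Player 1's stage payoffs u1[(a1, a2)] from the slide's table.
U1 = {("C", "C"): 2, ("C", "D"): -1, ("D", "C"): 3, ("D", "D"): 0}

# Trigger strategy as a two-state automaton; state names are illustrative.
ACTION = {"coop": "C", "punish": "D"}


def transition(state, outcome):
    """Stay in 'coop' only after (C, C); 'punish' is absorbing."""
    return "coop" if state == "coop" and outcome == ("C", "C") else "punish"


def one_shot_ok(delta):
    """True if player 1 has no profitable one-shot deviation in either state.
    Player 2's check is identical by symmetry of the game."""
    delta = Fraction(delta)
    # On the equilibrium path each state repeats itself, so its average
    # continuation value equals its stage payoff.
    value = {s: Fraction(U1[(a, a)]) for s, a in ACTION.items()}
    for state, a2 in ACTION.items():          # player 2 follows the automaton
        for a1 in ("C", "D"):
            outcome = (a1, a2)
            payoff = (1 - delta) * U1[outcome] + delta * value[transition(state, outcome)]
            if payoff > value[state]:
                return False
    return True


print(one_shot_ok(Fraction(1, 3)), one_shot_ok(Fraction(1, 4)))  # True False
```

In the "punish" state no deviation ever pays, so the only binding comparison is 2 ≥ (1 − δ)·3 + δ·0 in the "coop" state.
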
13. Repeated Prisoner's Dilemma (3)

    The following condition guarantees that there is no profitable (one-shot) deviation in (i):
    $$2 + \delta \cdot 2 + \delta^2 \cdot 2 + \cdots \ge 3 + \delta \cdot 0 + \delta^2 \cdot 0 + \cdots \iff 2(\delta + \delta^2 + \cdots) \ge 1 \iff \frac{2\delta}{1 - \delta} \ge 1 \iff \delta \ge \frac{1}{3}.$$

    Mutual cooperation (C1, C2) can therefore be sustained as an SPNE outcome with the trigger strategy when players are sufficiently long-sighted (δ ≥ 1/3). The trigger strategy (in the repeated prisoner's dilemma) is the severest punishment, since each player receives her minmax payoff in every period after a deviation.
14. Folk Theorem: Preparation (1)

    Rm: The following exposition follows Fudenberg and Maskin (1986).

    For each j, choose $M^j = (M^j_1, \dots, M^j_n)$ so that
    $$(M^j_1, \dots, M^j_{j-1}, M^j_{j+1}, \dots, M^j_n) \in \arg\min_{a_{-j}} \max_{a_j} u_j(a_j, a_{-j}),$$
    with $M^j_j$ a best response of player j, and define player j's reservation value by
    $$v_j^* := \max_{a_j} u_j(a_j, M^j_{-j}) = u_j(M^j).$$

    The strategies $M^j_{-j} = (M^j_1, \dots, M^j_{j-1}, M^j_{j+1}, \dots, M^j_n)$ are minimax strategies (which may not be unique) against player j, and $v_j^*$ is the smallest payoff that the other players can keep player j below. We refer to $(v_1^*, \dots, v_n^*)$ as the minimax point.
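
For a two-player game with finitely many actions, the min-max formula can be evaluated directly. The sketch below restricts the punishing player to pure actions (an assumption; when mixed minimax strategies are allowed, as in Fudenberg and Maskin, the value can only be lower). For the prisoner's dilemma of slide 11 the minimax point is (0, 0), consistent with the remark on slide 13 that the trigger strategy gives each player her minmax payoff after a deviation.

```python
def pure_minimax(payoffs, own_actions, other_actions, player):
    """Pure-action minimax value of `player` (0 = row, 1 = column):
    the opponent picks the action minimizing the player's best-response payoff."""
    def u(own, other):
        profile = (own, other) if player == 0 else (other, own)
        return payoffs[profile][player]
    return min(max(u(own, other) for own in own_actions) for other in other_actions)


PD = {("C", "C"): (2, 2), ("C", "D"): (-1, 3), ("D", "C"): (3, -1), ("D", "D"): (0, 0)}
print(pure_minimax(PD, ["C", "D"], ["C", "D"], 0))  # 0
print(pure_minimax(PD, ["C", "D"], ["C", "D"], 1))  # 0
```
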
15. Folk Theorem: Preparation (2)

    Definition 7. Let V be the set of feasible payoffs, i.e., the convex hull of the payoff vectors u yielded by (pure) action profiles, and let $V^* (\subset V)$ be the set of feasible payoffs that Pareto dominate the minimax point:
    $$V^* = \{(v_1, \dots, v_n) \in V \mid v_i > v_i^* \text{ for all } i\}.$$
    (Fudenberg and Maskin normalize the minimax point to the origin, in which case the condition reads $v_i > 0$.) $V^*$ is called the set of individually rational payoffs.

    There are several versions of the folk theorem. The name comes from the fact that the statement (relying on NE rather than SPNE) was widely known among game theorists in the 1950s, even though no one had published it.
16. Folk Theorem (1)

    Theorem 8 (Theorem A). For any $(v_1, \dots, v_n) \in V^*$, if players discount the future sufficiently little, there exists a Nash equilibrium of the infinitely repeated game in which, for all i, player i's average payoff is $v_i$.

    If a player deviates, it may not be in the others' interest to go through with the punishment of minimaxing him forever. However, Aumann and Shapley (1976) and Rubinstein (1979) showed that, when there is no discounting, the counterpart of Theorem A holds for SPNE.

    Theorem 9 (Theorem B). For any $(v_1, \dots, v_n) \in V^*$ there exists a subgame perfect equilibrium in the infinitely repeated game with no discounting in which, for all i, player i's expected payoff in each period is $v_i$.
17. Folk Theorem (2)

    One well-known case that admits both discounting and simple strategies is when the point to be sustained Pareto dominates the payoffs of a Nash equilibrium of the constituent game G.

    Theorem 10 (Theorem C). Suppose $(v_1, \dots, v_n) \in V^*$ Pareto dominates the payoffs $(y_1, \dots, y_n)$ of a (one-shot) Nash equilibrium $(e_1, \dots, e_n)$ of G. If players discount the future sufficiently little, there exists a subgame perfect equilibrium of the infinitely repeated game in which, for all i, player i's average payoff is $v_i$.

    Because the punishments used in Theorem C are less severe than those in Theorems A and B, its conclusion is weaker. For example, Theorem C does not allow us to conclude that a Stackelberg outcome can be supported as an equilibrium in an infinitely repeated quantity-setting duopoly.
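
How little discounting is "sufficiently little" can be computed under Nash reversion: after any deviation, the Nash equilibrium e is played forever. A player who conforms gets $v_i$; a deviator gets at most $(1-\delta)d_i + \delta y_i$, where $d_i$ is her best one-shot deviation payoff, so the condition is $\delta \ge (d_i - v_i)/(d_i - y_i)$. The sketch below assumes the target $v$ is produced by a single pure action profile every period (targets needing alternation or randomization are not covered); the function name is hypothetical.

```python
from fractions import Fraction


def nash_reversion_threshold(v, d, y):
    """Smallest delta such that v[i] >= (1 - delta) * d[i] + delta * y[i] for all i.

    v[i]: player i's stage payoff from the target pure action profile
    d[i]: player i's best one-shot deviation payoff against that profile
    y[i]: player i's payoff in the stage Nash equilibrium used as punishment
    Assumes v[i] > y[i] for all i (Pareto dominance, as in Theorem C).
    """
    thresholds = [Fraction(0)]
    for vi, di, yi in zip(v, d, y):
        if di > vi:  # a profitable one-shot deviation exists
            thresholds.append(Fraction(di - vi) / (di - yi))
    return max(thresholds)


print(nash_reversion_threshold((2, 2), (3, 3), (0, 0)))  # 1/3 (slide 13)
print(nash_reversion_threshold((4, 4), (5, 5), (1, 1)))  # 1/4 (slide 5 game)
```

The first call reproduces the prisoner's dilemma bound δ ≥ 1/3. The second applies the rule to the stage game of slide 5 repeated infinitely often, with target (M1, M2) and reversion to (L1, L2).
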
18. General Folk Theorem: Two Players

    Abreu (1988) shows that there is no loss in restricting attention to simple punishments when players discount the future. Indeed, simple punishments are employed in the proof of the following result.

    Theorem 11 (Theorem 1). For any $(v_1, v_2) \in V^*$ there exists $\underline{\delta} \in (0, 1)$ such that, for all $\delta \in (\underline{\delta}, 1)$, there exists a subgame perfect equilibrium of the infinitely repeated game in which player i's average payoff is $v_i$ when players have discount factor δ.

    After a deviation by either player, the players (mutually) minimax each other for a certain number of periods, after which they return to the original path. If a further deviation occurs during the punishment phase, the phase is restarted.
19. General Folk Theorem: Three or More Players

    The method used to establish Theorem 1, "mutual minimaxing", does not extend to three or more players.

    Theorem 12 (Theorem 2). Assume that the dimensionality of $V^*$ equals n, the number of players, i.e., that the interior of V (relative to n-dimensional space) is nonempty. Then, for any $(v_1, \dots, v_n) \in V^*$, there exists $\underline{\delta} \in (0, 1)$ such that for all $\delta \in (\underline{\delta}, 1)$ there exists a subgame perfect equilibrium of the infinitely repeated game with discount factor δ in which player i's average payoff is $v_i$.

    If a player deviates, he is minimaxed by the other players long enough to wipe out any gain from his deviation. To induce the other players to go through with minimaxing him, they are ultimately given a "reward" in the form of an additional ε in their average payoff. The possibility of providing such a reward relies on the full dimensionality of the payoff set.
20. Imperfect Monitoring (1)

    - Perfect monitoring: players can fully observe the history of their past play. There is no monitoring difficulty or imperfection.
    - Bounded/imperfect recall: players forget (part of) the history of their past play, especially that of the distant past, as time goes by.
    - Imperfect monitoring: players cannot directly observe the (full) history of their past play, but instead observe signals that depend on the actions taken in the previous period.
      - Public monitoring: players publicly observe a common signal.
      - Private monitoring: players privately receive different signals.
21. Imperfect Monitoring (2)

    Punishment necessarily becomes only indirectly linked with deviation. Players can punish the deviator only in reaction to the common signals, since they cannot observe the deviation itself. Even if no one has deviated, punishment is triggered (with positive probability) whenever a bad signal is realized. ⇒ Constructing (efficient) punishments becomes dramatically more difficult.
22. Example | Prisoner's Dilemma (1)

    Consider the following prisoner's dilemma as a stage game in which neither player can observe the rival's past actions.

    Table: Ex ante payoffs $u_i(a_i, a_{-i})$

    | 1 \ 2 | C | D |
    |---|---|---|
    | C | 2, 2 | -1, 3 |
    | D | 3, -1 | 0, 0 |

    Q: Can each player deduce the rival's action from the realized payoff (and her own action)? If so, observation cannot really be imperfect...
23. Example | Prisoner's Dilemma (2)

    Player i's payoff in each period depends only on her own action $a_i \in \{C, D\}$ and the public signal $y \in \{g, b\}$, i.e., it is $u_i^*(y, a_i)$.

    Table: Ex post payoffs $u_i^*(y, a_i)$

    | $a_i$ \ $y$ | g | b |
    |---|---|---|
    | C | $\frac{3 - p - 2q}{p - q}$ | $-\frac{p + 2q}{p - q}$ |
    | D | $\frac{3(1 - r)}{q - r}$ | $-\frac{3r}{q - r}$ |

    Here p, q, r (0 < q, r < p < 1) are the conditional probabilities that g is realized: $p = \Pr\{g \mid CC\}$, $q = \Pr\{g \mid DC\} = \Pr\{g \mid CD\}$, $r = \Pr\{g \mid DD\}$.
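
The ex post payoffs are constructed so that averaging over the signal recovers the ex ante table, via $u_i(a) = \sum_{y} u_i^*(y, a_i)\rho(y \mid a)$ (the formula stated as 7.1.1 in MS). The sketch below checks this identity exactly for player 1 at hypothetical parameter values; since the realized payoff depends only on her own action and y, it reveals nothing beyond the public signal.

```python
from fractions import Fraction as F


def ex_ante_from_ex_post(p, q, r):
    """Recover player 1's ex ante payoffs u_1(a) = sum_y u*_1(y, a_1) rho(y | a)."""
    pr_g = {("C", "C"): p, ("D", "C"): q, ("C", "D"): q, ("D", "D"): r}  # Pr{g | a}
    u_star = {                                   # ex post payoffs from the table
        ("g", "C"): (3 - p - 2 * q) / (p - q),
        ("b", "C"): -(p + 2 * q) / (p - q),
        ("g", "D"): 3 * (1 - r) / (q - r),
        ("b", "D"): -3 * r / (q - r),
    }
    return {
        a: u_star[("g", a[0])] * pr_g[a] + u_star[("b", a[0])] * (1 - pr_g[a])
        for a in pr_g
    }


# Hypothetical parameter values satisfying 0 < q, r < p < 1 (and q != r).
u = ex_ante_from_ex_post(F(9, 10), F(1, 2), F(1, 5))
print([str(u[a]) for a in [("C", "C"), ("D", "C"), ("C", "D"), ("D", "D")]])
# ['2', '3', '-1', '0']
```
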
24. Example | Prisoner's Dilemma (3)

    To achieve cooperation, consider the (modified) trigger strategies:

    - Play (C, C) in the first period.
    - Continue to play (C, C) as long as g keeps being realized.
    - Play (D, D) forever once b is realized.

    These trigger strategies constitute an equilibrium (a PPE) if and only if the following condition is satisfied:
    $$\delta(3p - 2q) \ge 1 \iff \delta \ge \frac{1}{3p - 2q} \qquad \text{(7.2.4 in MS)}$$
    The symmetric equilibrium (average) payoff is then $\frac{2(1 - \delta)}{1 - \delta p}$, which converges to 0 as δ goes to 1.
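
Both formulas are simple to evaluate. The payoff follows from the recursion $v = (1-\delta)\cdot 2 + \delta(p v + (1-p)\cdot 0)$: after g play continues with value v, after b it drops to the (D, D) value 0. The sketch below uses hypothetical values p = 0.9, q = 0.5 and illustrates that the equilibrium payoff shrinks as δ → 1, because punishment is eventually triggered by a bad signal even though no one deviates.

```python
def trigger_threshold(p, q):
    """Smallest delta with delta * (3p - 2q) >= 1, or None if no delta < 1 works."""
    k = 3 * p - 2 * q
    return 1 / k if k > 1 else None


def trigger_value(delta, p):
    """Average payoff on the cooperative path, solving v = (1 - delta) * 2 + delta * p * v."""
    return 2 * (1 - delta) / (1 - delta * p)


p, q = 0.9, 0.5          # hypothetical signal probabilities
print(trigger_threshold(p, q))
for delta in (0.9, 0.99, 0.999):
    print(delta, trigger_value(delta, p))   # shrinks toward 0 as delta -> 1
```
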
25. General Model (1)

    n (long-lived) players engage in an infinitely repeated game in discrete time (t = 0, 1, 2, ...) whose stage game is defined as follows:

    - $a_i \in A_i$: player i's action ($A_i$ is assumed finite)
    - $y \in Y$: public signal realized at the end of each period (Y is finite)
    - $\rho(y \mid a)$: conditional probability function (assumed to have full support)
    - $\rho(y \mid \alpha)$: extension to mixed action profiles $\alpha \in \prod_{i=1}^n \Delta(A_i)$
    - $\Pi_i(\alpha_{-i}) := \rho(\cdot \mid \cdot, \alpha_{-i})$: an $|A_i| \times |Y|$ matrix
    - $u_i^*(y, a_i)$: player i's ex post payoff
    - $u_i(a)$: player i's ex ante payoff, given by $u_i(a) = \sum_{y \in Y} u_i^*(y, a_i)\rho(y \mid a)$ (7.1.1 in MS)
    - V(δ): set of equilibrium (PPE, defined later) payoffs under δ
26. General Model (2)

    In the repeated game (with imperfect public monitoring), the only public information available in period t is the t-period history of public signals:
    $$h^t := (y^0, y^1, \dots, y^{t-1}).$$
    The set of public histories is $H := \bigcup_{t=0}^{\infty} Y^t$, where $Y^0 := \{\emptyset\}$, so that $h^0$ is the null history.

    A history for player i includes both the public history and the history of actions that i has taken:
    $$h_i^t := ((a_i^0, y^0), (a_i^1, y^1), \dots, (a_i^{t-1}, y^{t-1})).$$
    The set of histories for player i is $H_i := \bigcup_{t=0}^{\infty} (A_i \times Y)^t$, where $(A_i \times Y)^0 := \{\emptyset\}$.
27. Perfect Public Equilibrium (1)

    A pure strategy for player i is a mapping from all possible histories into the set of pure actions, $\sigma_i : H_i \to A_i$. A mixed strategy is a mixture over pure strategies. A behavior strategy is a mapping $\sigma_i : H_i \to \Delta(A_i)$.

    Definition 13 (Def 7.1.1). A behavior strategy $\sigma_i$ is public if, in every period t, it depends only on the public history $h^t \in Y^t$ and not on i's private history. That is, for all $h_i^t, \hat h_i^t \in H_i$ satisfying $y^\tau = \hat y^\tau$ for all $\tau \le t - 1$, $\sigma_i(h_i^t) = \sigma_i(\hat h_i^t)$. A behavior strategy $\sigma_i$ is private if it is not public.
28. Perfect Public Equilibrium (2)

    Definition 14 (Def 7.1.2). Suppose $A_i = A_j$ for all i and j. A public profile σ is strongly symmetric if, for all public histories $h^t$, $\sigma_i(h^t) = \sigma_j(h^t)$ for all i and j.

    Definition 15 (Def 7.1.3). A perfect public equilibrium (PPE) is a profile of public strategies σ that, for any public history $h^t$, specifies a Nash equilibrium of the repeated game continuing from $h^t$. A PPE is strict if each player strictly prefers his equilibrium strategy to every other public strategy.

    Lemma 16 (Lemma 7.1.1). If all players other than i are playing a public strategy, then player i has a public strategy as a best reply. Therefore, every PPE is a sequential equilibrium.
29. Dynamic Programming Approach

    1. Decomposition: transforming a dynamic game into a static game. In so doing, the recursive structure and unimprovability play key roles.
    2. Self-generation: a useful property for characterizing the set of equilibrium (PPE) payoffs. Without (explicitly) solving the game, the set of equilibrium payoffs can be fully characterized, and even computed.
30. Decomposition: Perfect Monitoring

    A continuation payoff can be decomposed into the current-period payoff and the future payoff of the repeated game starting from the next period:
    $$v_i = (1 - \delta)u_i(a) + \delta\gamma_i(a) \qquad (1)$$
    where $\gamma : A \to V(\delta)\ (\subset \mathbb{R}^n)$ assigns an equilibrium payoff vector to each action profile and $\gamma_i$ is its i-th element (i's assigned payoff).

    Theorem 17. v is supported (as an average payoff) by an SPNE if and only if there exist a mixed action profile $\alpha \in \prod_i \Delta(A_i)$ and $\gamma : A \to V(\delta)$ such that, for all i and all $a_i \in A_i$,
    $$v_i = (1 - \delta)u_i(\alpha) + \delta\gamma_i(\alpha) \ge (1 - \delta)u_i(a_i, \alpha_{-i}) + \delta\gamma_i(a_i, \alpha_{-i}),$$
    where $\gamma_i(\alpha)$ denotes the expectation of $\gamma_i(a)$ under α.
31. Decomposition: Imperfect Monitoring

    A continuation payoff can be decomposed into the current-period payoff and the future payoff of the repeated game starting from the next period:
    $$v_i = (1 - \delta)u_i(a) + \delta\sum_{y \in Y}\gamma_i(y)\rho(y \mid a) \qquad (2)$$
    where $\gamma : Y \to V(\delta)\ (\subset \mathbb{R}^n)$ assigns an equilibrium (PPE) payoff vector to each public signal and $\gamma_i$ is its i-th element (i's assigned payoff).

    Theorem 18. v is supported (as an average payoff) by a PPE if and only if there exist a mixed action profile $\alpha \in \prod_i \Delta(A_i)$ and $\gamma : Y \to V(\delta)$ such that, for all i and all $a_i \in A_i$,
    $$v_i = (1 - \delta)u_i(\alpha) + \delta\sum_{y \in Y}\gamma_i(y)\rho(y \mid \alpha) \ge (1 - \delta)u_i(a_i, \alpha_{-i}) + \delta\sum_{y \in Y}\gamma_i(y)\rho(y \mid a_i, \alpha_{-i}).$$
32. Self-Generation (1)

    What happens if the range of the mapping γ, V(δ), is replaced with an arbitrary set $W (\subset \mathbb{R}^n)$?

    Definition 19. Let B(W) be the set of vectors $w = (w_1, \dots, w_n)$ for which there exist a mixed action profile $\alpha \in \prod_i \Delta(A_i)$ and $\gamma : Y \to W$ such that, for all i and all $a_i \in A_i$,
    $$w_i = (1 - \delta)u_i(\alpha) + \delta\sum_{y \in Y}\gamma_i(y)\rho(y \mid \alpha) \ge (1 - \delta)u_i(a_i, \alpha_{-i}) + \delta\sum_{y \in Y}\gamma_i(y)\rho(y \mid a_i, \alpha_{-i}).$$
    W is called self-generating (or self-enforceable) if $W \subseteq B(W)$.
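
Self-generation can be checked by brute force when W is finite. The sketch below assumes the imperfect-monitoring prisoner's dilemma of slides 22 to 24, restricts attention to strongly symmetric pure profiles (C, C) or (D, D) and symmetric continuations $\gamma : Y \to W$, and tests whether $W = \{0, v\}$ with $v = 2(1-\delta)/(1-\delta p)$ satisfies $W \subseteq B(W)$. Parameter values and function names are hypothetical. Passing this test is a sufficient condition for $W \subseteq V(\delta)$, not a computation of V(δ) itself.

```python
from fractions import Fraction as F

# Hypothetical parameters (any 0 < q, r < p < 1 with q != r would do).
P, Q, R = F(9, 10), F(1, 2), F(1, 5)
U1 = {("C", "C"): 2, ("C", "D"): -1, ("D", "C"): 3, ("D", "D"): 0}   # ex ante u_1
PR_G = {("C", "C"): P, ("D", "C"): Q, ("C", "D"): Q, ("D", "D"): R}  # Pr{g | a}
OTHER = {"C": "D", "D": "C"}


def expected(gamma, a):
    """Sum over y of gamma(y) * rho(y | a), with Y = {g, b}."""
    return gamma["g"] * PR_G[a] + gamma["b"] * (1 - PR_G[a])


def decomposable(w, W, delta):
    """Is the symmetric payoff w in B(W), using a profile (x, x) and gamma: Y -> W?"""
    for x in ("C", "D"):
        a = (x, x)
        for gg in W:
            for gb in W:
                gamma = {"g": gg, "b": gb}
                value = (1 - delta) * U1[a] + delta * expected(gamma, a)
                dev = (OTHER[x], x)  # player 1 deviates; player 2 is symmetric
                dev_value = (1 - delta) * U1[dev] + delta * expected(gamma, dev)
                if value == w and value >= dev_value:
                    return True
    return False


def self_generating(W, delta):
    return all(decomposable(w, W, delta) for w in W)


def cooperative_value(delta):
    return 2 * (1 - delta) / (1 - delta * P)


delta = F(9, 10)
W = {F(0), cooperative_value(delta)}
print(self_generating(W, delta))   # True
```

With these parameters the condition δ(3p − 2q) ≥ 1 requires δ ≥ 1/1.7, so at δ = 1/2 the same construction fails, matching slide 24.
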
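33. Self-Generation (2)

    Theorem 20. The set of average PPE payoffs V(δ) is a fixed point of the mapping B(·), i.e., B(V(δ)) = V(δ); indeed, it is the largest bounded such set.

    Theorem 21. If $W \subseteq W'$, then $B(W) \subseteq B(W')$ must be satisfied.

    Theorem 22. If W is bounded and self-generating, then the following holds:
    $$W \subseteq \bigcup_{t=1}^{\infty} B^t(W) \subseteq V(\delta) \qquad (3)$$
    If W is bounded and $V(\delta) \subset W$, then
    $$\bigcap_{t=1}^{\infty} B^t(W) = V(\delta) \qquad (4)$$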
34. Folk Theorem by FLM (1994) (1)

    Definition 23. The profile α has individual full rank for player i if $\Pi_i(\alpha_{-i})$ has rank equal to $|A_i|$, that is, the $|A_i|$ vectors $\{\rho(\cdot \mid a_i, \alpha_{-i})\}_{a_i \in A_i}$ are linearly independent. If this is so for every player i, α has individual full rank. Note that if α has individual full rank, the number of observable outcomes |Y| must be at least $\max_i |A_i|$.

    Definition 24. Let $\Pi_{ij}(\alpha)$ be the matrix obtained by stacking $\Pi_i(\alpha_{-i})$ on top of $\Pi_j(\alpha_{-j})$. Profile α is pairwise-identifiable for players i and j if $\operatorname{rank}\Pi_{ij}(\alpha) = \operatorname{rank}\Pi_i(\alpha_{-i}) + \operatorname{rank}\Pi_j(\alpha_{-j}) - 1$.

    Definition 25. Profile α has pairwise full rank for players i and j if the matrix $\Pi_{ij}(\alpha)$ has rank $|A_i| + |A_j| - 1$.
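
These rank conditions can be checked numerically. The sketch below applies them to the imperfect-monitoring prisoner's dilemma of slides 22 to 24 at α = (C, C), with hypothetical values p = 9/10, q = 1/2, using a small exact Gaussian-elimination `rank` helper (written here to avoid external dependencies). Individual full rank holds whenever p ≠ q. With only two signals, however, $\Pi_{12}$ has rank at most 2, short of the 3 that pairwise full rank requires. It also falls short of pairwise identifiability, since the two players' deviations generate the same signal distribution.

```python
from fractions import Fraction as F


def rank(rows):
    """Rank of a matrix given as a list of equal-length rows (exact arithmetic)."""
    m = [[F(x) for x in row] for row in rows]
    rank_found = 0
    n_cols = len(m[0]) if m else 0
    for c in range(n_cols):
        pivot = next((i for i in range(rank_found, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[rank_found], m[pivot] = m[pivot], m[rank_found]
        for i in range(len(m)):
            if i != rank_found and m[i][c] != 0:
                factor = m[i][c] / m[rank_found][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[rank_found])]
        rank_found += 1
    return rank_found


def pd_matrices(p, q):
    """Pi_1(alpha_2) and Pi_2(alpha_1) at alpha = (C, C); columns are y = g, b."""
    pi_1 = [[p, 1 - p],   # rho(. | C, C): player 1 conforms
            [q, 1 - q]]   # rho(. | D, C): player 1 deviates to D
    pi_2 = [[p, 1 - p],   # rho(. | C, C): player 2 conforms
            [q, 1 - q]]   # rho(. | C, D): player 2 deviates to D
    return pi_1, pi_2


pi_1, pi_2 = pd_matrices(F(9, 10), F(1, 2))   # hypothetical values
print(rank(pi_1))          # 2 = |A_1|: individual full rank for player 1
print(rank(pi_1 + pi_2))   # 2 < |A_1| + |A_2| - 1 = 3: no pairwise full rank
```
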
35. Folk Theorem by FLM (1994) (2)

    Pairwise full rank at α (for players i and j) is in fact the conjunction of two weaker conditions: individual full rank and pairwise identifiability (at α for i and j).

    1. Pairwise full rank implies individual full rank: incentives can be designed to induce a player to choose a given action.
    2. It also ensures pairwise identifiability: deviations by players i and j are distinct in the sense that they induce different probability distributions over public outcomes.
    3. Thus, player i's incentives can be designed without interfering with those of player j.
36. Folk Theorem by FLM (1994) (3)

    Theorem 26. Suppose that every pure action profile a has individual full rank and that either (i) for all pairs i and j, there exists a mixed action profile α that has pairwise full rank for that pair, or (ii) every pure-action, Pareto-efficient profile is pairwise-identifiable for all pairs of players. Let W be a smooth subset of the interior of $V^*$. Then there exists $\underline{\delta} < 1$ such that, for all $\delta > \underline{\delta}$, $W \subseteq V(\delta)$, i.e., each point in W is a perfect public equilibrium payoff with discount factor δ.

    The theorem applies only to interior points and so does not pertain to payoffs on the efficient frontier. This contrasts with the standard folk theorem for observable actions, in which efficient payoffs can be attained exactly.