Theory of Repeated Games
Lecture slides on Repeated Games I used in the following lecture:
https://sites.google.com/site/yosukeyasuda2/home/lecture/repeated15

1. Theory of Repeated Games
Lecture Notes on Central Results
Yosuke YASUDA
Osaka University, Department of Economics
yasuda@econ.osaka-u.ac.jp
Last update: May 21, 2015

2. Announcement

Course Website: You can find my course website from the link below:
https://sites.google.com/site/yosukeyasuda2/home/lecture/repeated15

Textbook & Survey: MS is a comprehensive textbook on repeated games; K and P are highly readable survey articles, which complement MS.
MS: Mailath and Samuelson (2006), Repeated Games and Reputations: Long-Run Relationships.
K: Kandori (2008).
P: Pearce (1992).

Symbols that we use in lectures: Ex: Example, Fg: Figure, Q: Question, Rm: Remark.

3. Finitely Repeated Games (1)

A repeated game, a specific class of dynamic game, is a suitable framework for studying the interaction between immediate gains and long-term incentives, and for understanding how a reputation mechanism can support cooperation.

Let G = {A_1, ..., A_n; u_1, ..., u_n} denote a static game in which players 1 through n simultaneously choose actions a_1 through a_n from the action spaces A_1 through A_n, and the corresponding payoffs are u_1(a_1, ..., a_n) through u_n(a_1, ..., a_n).

Definition 1. The game G is called the stage game of the repeated game.

Given a stage game G, let G(T) denote the finitely repeated game in which G is played T times, with the outcomes of all preceding plays observed before the next play begins. Assume that the payoff for G(T) is simply the sum of the payoffs from the T stage games (future payoffs are not discounted).

4. Finitely Repeated Games (2)

Theorem 2. If the stage game G has a unique Nash equilibrium, then, for any finite T, the repeated game G(T) has a unique subgame perfect Nash equilibrium: the Nash equilibrium of G is played in every stage irrespective of the past history of the play.

Proof. We can solve the game by backward induction, that is, starting from the smallest subgames and going backward through the game. In stage T, players choose the unique Nash equilibrium of G. Given that, in stage T − 1, players again end up choosing the same Nash equilibrium outcome, since no matter what they play in T − 1, the last stage game outcome will be unchanged. This argument carries over backwards through stage 1, which establishes that the unique Nash equilibrium outcome is played in every stage (irrespective of the past history).

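As a quick illustration (my addition, not part of the slides), the following sketch enumerates the pure-strategy Nash equilibria of the prisoner's dilemma used later in these notes; since the stage NE is unique, Theorem 2 pins down the entire SPNE path of G(T).

```python
# A minimal sketch: enumerate pure-strategy Nash equilibria of a 2-player
# stage game. If the stage NE is unique, Theorem 2 says the unique SPNE of
# G(T) plays it in every stage.
from itertools import product

# payoffs[(a1, a2)] = (u1, u2); C = cooperate, D = defect
payoffs = {
    ("C", "C"): (2, 2), ("C", "D"): (-1, 3),
    ("D", "C"): (3, -1), ("D", "D"): (0, 0),
}
actions = ["C", "D"]

def is_nash(a1, a2):
    u1, u2 = payoffs[(a1, a2)]
    no_dev1 = all(payoffs[(b1, a2)][0] <= u1 for b1 in actions)
    no_dev2 = all(payoffs[(a1, b2)][1] <= u2 for b2 in actions)
    return no_dev1 and no_dev2

equilibria = [a for a in product(actions, actions) if is_nash(*a)]
print(equilibria)  # [('D', 'D')] -- unique, so G(T) has a unique SPNE path
```
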
5. Finitely Repeated Games (3)

When there is more than one Nash equilibrium in a stage game, multiple subgame perfect Nash equilibria may exist. Furthermore, an action profile which does not constitute a stage game Nash equilibrium may be sustained (in any period t < T) in a subgame perfect Nash equilibrium.

Q: The following stage game will be played twice. Can players support the non-equilibrium outcome (M1, M2) in the first period?

        L2      M2      R2
L1     1, 1    5, 0    0, 0
M1     0, 5    4, 4    0, 0
R1     0, 0    0, 0    3, 3

Rm: Note that there are two Nash equilibria in the stage game, (L1, L2) and (R1, R2): what players choose in the first period may result in different outcomes (equilibria) in the second period.

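A numeric check (my addition) of the standard construction behind this question: play (M1, M2) in period 1; in period 2 play the good stage NE (R1, R2) if (M1, M2) occurred, and the bad one (L1, L2) otherwise. The second-period rule is an assumption consistent with the remark above.

```python
# Player 1's deviation check (player 2 is symmetric). Payoffs are summed
# across the two periods (no discounting, as in G(T)).
u1 = {("L1","L2"): 1, ("L1","M2"): 5, ("L1","R2"): 0,
      ("M1","L2"): 0, ("M1","M2"): 4, ("M1","R2"): 0,
      ("R1","L2"): 0, ("R1","M2"): 0, ("R1","R2"): 3}

on_path = u1[("M1","M2")] + u1[("R1","R2")]         # 4 + 3 = 7
best_dev = max(u1[(a1,"M2")] + u1[("L1","L2")]       # deviate, then bad NE
               for a1 in ["L1","R1"])                # 5 + 1 = 6
print(on_path, best_dev, on_path >= best_dev)        # 7 6 True
```

So (M1, M2) is supportable in the first period: the best one-period deviation gain (1) is outweighed by the second-period loss (2).
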
6. Infinitely Repeated Games (1)

Even if the stage game has a unique Nash equilibrium, there may be subgame perfect outcomes of the infinitely repeated game in which no stage game's outcome is a Nash equilibrium of G.

Let G(∞, δ) denote the infinitely repeated game in which G is repeated forever and the players share the discount factor δ. For each t, the outcomes of the t − 1 preceding plays of the stage game are observed before the t-th stage begins. Each player's payoff in G(∞, δ) is the average payoff defined as follows.

Definition 3. Given the discount factor δ, the average payoff of the infinite sequence of payoffs u^1, u^2, ... is
$$(1 - \delta)(u^1 + \delta u^2 + \delta^2 u^3 + \cdots) = (1 - \delta)\sum_{t=1}^{\infty} \delta^{t-1} u^t.$$

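To make the normalization concrete, here is a small sketch (my addition): the factor (1 − δ) makes an infinite discounted stream comparable to a per-period stage payoff, so a constant stream u, u, u, ... has average payoff exactly u.

```python
def average_payoff(stream, delta, horizon=10_000):
    """(1 - delta) * sum_t delta^(t-1) * u_t, truncated at `horizon`."""
    return (1 - delta) * sum(u * delta**t for t, u in
                             zip(range(horizon), stream[:horizon]))

delta = 0.9
constant = [2.0] * 10_000
print(round(average_payoff(constant, delta), 6))  # ~2.0, as expected
```
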
7. Infinitely Repeated Games (2)

There are a few important remarks:
- The history of play through stage t is the record of the players' choices in stages 1 through t. The players might have chosen (a^s_1, ..., a^s_n) in stage s, where for each player i the action a^s_i belongs to A_i.
- In the finitely repeated game G(T) or the infinitely repeated game G(∞, δ), a player's strategy specifies the action that she will take in each stage, for every possible history of play.
- In the infinitely repeated game G(∞, δ), each subgame beginning at any stage is identical to the original game. In G(T), a subgame beginning at stage t + 1 is the repeated game in which G is played T − t times, denoted by G(T − t).
- In a repeated game, a Nash equilibrium is subgame perfect if the players' strategies constitute a Nash equilibrium in every subgame, i.e., after every possible history of the play.

8. Unimprovability (1)

Definition 4. A strategy σ_i is called a perfect best response to the other players' strategies when player i has no incentive to deviate following any history.

Consider the following requirement that, at first glance, looks much weaker than the perfect best response condition.

Definition 5. A strategy for i is unimprovable against a vector of strategies of her opponents if there is no t − 1 period history (for any t) such that i could profit by deviating from her strategy in period t only and conforming thereafter (i.e., switching back to the original strategy).

To verify the unimprovability of a strategy, one needs to check only "one-shot" deviations from the strategy, rather than arbitrarily complex deviations.

9. Unimprovability (2)

The following result simplifies the analysis of SPNE immensely. It is the exact counterpart of a well-known result from dynamic programming due to Howard (1960), and was first emphasized in the context of self-enforcing cooperation by Abreu (1988).

Theorem 6. Let the payoffs of G be bounded. In the repeated game G(T) or G(∞, δ), strategy σ_i is a perfect best response to a profile of strategies σ if and only if σ_i is unimprovable against that profile.

The proof is simple, and generalizes easily to a wide variety of dynamic and stochastic games with discounting and bounded payoffs.

10. Unimprovability (3)

Proof of ⇒ (note that ⇐ is trivial). Consider the contrapositive: not a perfect best response ⇒ not unimprovable.
1. If σ_i is not a perfect best response, there must be a history after which it is profitable to deviate to some other strategy.
2. Then, because of discounting and boundedness of payoffs, there must exist a profitable deviation that involves defection in only finitely many periods (and conforms to σ_i thereafter). If a deviation involves defection at infinitely many nodes, then, for sufficiently large T, the strategy that agrees with that deviation until time T and conforms to σ_i thereafter is also a profitable deviation (again by discounting and boundedness of payoffs).
3. Consider a profitable deviation involving defection in the smallest possible number of periods, denoted by T.
4. Such a deviation must still be profitable at the last period in which it defects; otherwise, dropping that final defection would yield a profitable deviation with fewer defection periods, contradicting the minimality of T. Hence there is a history after which a one-shot deviation is profitable, i.e., σ_i is not unimprovable.

11. Repeated Prisoner's Dilemma (1)

Q: The following prisoner's dilemma will be played infinitely many times. Under what conditions on δ can an SPNE support cooperation (C1, C2)?

        C2      D2
C1     2, 2   -1, 3
D1     3, -1   0, 0

Suppose that player i plays C_i in the first stage, and in the t-th stage plays C_i if the outcome of all t − 1 preceding stages has been (C1, C2); otherwise, she plays D_i (thereafter). This strategy is called a trigger strategy, because player i cooperates until someone fails to cooperate, which triggers a switch to noncooperation forever after. If both players adopt this trigger strategy, then the outcome of the infinitely repeated game will be (C1, C2) in every stage.

12. Repeated Prisoner's Dilemma (2)

To show that the trigger strategy is an SPNE, we must verify that the trigger strategies constitute a Nash equilibrium on every possible subgame that could be generated in the infinitely repeated game.

Rm: Since every subgame of an infinitely repeated game is identical to the game as a whole (thanks to its recursive structure), we have to consider only two types of subgames: (i) subgames in which all the outcomes of earlier stages have been (C1, C2), and (ii) subgames in which the outcome of at least one earlier stage differs from (C1, C2).

By unimprovability, it is sufficient to show that there is no one-shot profitable deviation after any history that can arise when players follow the trigger strategies. Players have no incentive to deviate in (ii), since there the trigger strategy involves repeated play of the one-shot NE, (D1, D2).

13. Repeated Prisoner's Dilemma (3)

The following condition guarantees that there will be no (one-shot) profitable deviation in (i):
$$2 + 2\delta + 2\delta^2 + \cdots \ \ge\ 3 + 0\cdot\delta + 0\cdot\delta^2 + \cdots
\iff 2(\delta + \delta^2 + \cdots) \ge 1
\iff \frac{2\delta}{1-\delta} \ge 1
\iff \delta \ge \frac{1}{3}.$$

Mutual cooperation (C1, C2) can be sustained as an SPNE outcome by using the trigger strategy when players are long-sighted. The trigger strategy (in the repeated prisoner's dilemma) is the severest punishment, since each player receives her minimax payoff (in every period) after a deviation happens.

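A numeric sanity check (my addition, using the payoffs above): compare the discounted value of conforming to the trigger strategy forever with the value of a one-shot deviation followed by permanent mutual defection.

```python
def conform_value(delta):
    return 2 / (1 - delta)            # 2 + 2*delta + 2*delta^2 + ...

def deviate_value(delta):
    return 3.0                        # 3 + 0 + 0 + ...

for delta in [0.2, 1/3, 0.5, 0.9]:
    ok = conform_value(delta) >= deviate_value(delta)
    print(f"delta={delta:.3f}: cooperation sustainable? {ok}")
# Cooperation is sustainable exactly when delta >= 1/3.
```
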
14. Folk Theorem: Preparation (1)

Rm: The following expositions follow Fudenberg and Maskin (1986).

For each j, choose $M^j = (M^j_1, \dots, M^j_n)$ so that
$$(M^j_1, \dots, M^j_{j-1}, M^j_{j+1}, \dots, M^j_n) \in \arg\min_{a_{-j}} \max_{a_j} u_j(a_j, a_{-j}),$$
and player j's reservation value is defined by
$$v^*_j := \max_{a_j} u_j(a_j, M^j_{-j}) = u_j(M^j).$$

The strategies $M^j_{-j} = (M^j_1, \dots, M^j_{j-1}, M^j_{j+1}, \dots, M^j_n)$ are minimax strategies (which may not be unique) against player j, and $v^*_j$ is the smallest payoff that the other players can keep player j below. We refer to $(v^*_1, \dots, v^*_n)$ as the minimax point.

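A small sketch (my addition): compute the pure-action minimax point for the two-player prisoner's dilemma above. Fudenberg and Maskin allow mixed minimax strategies; pure actions happen to suffice in this example.

```python
actions = ["C", "D"]
u = {("C","C"): (2, 2), ("C","D"): (-1, 3),
     ("D","C"): (3, -1), ("D","D"): (0, 0)}

def minimax(j):
    """v*_j = min over the opponent's action of j's best-response payoff."""
    def uj(aj, ai):
        profile = (aj, ai) if j == 0 else (ai, aj)
        return u[profile][j]
    return min(max(uj(aj, ai) for aj in actions) for ai in actions)

print(minimax(0), minimax(1))  # 0 0 -> the minimax point is (0, 0)
```
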
15. Folk Theorem: Preparation (2)

Definition 7. Let V be the set of feasible payoffs, i.e., the convex hull of payoff vectors u yielded by (pure) action profiles, and let V* (⊂ V) be the set of feasible payoffs that Pareto dominate the minimax point:
$$V^* = \{(v_1, \dots, v_n) \in V \mid v_i > v^*_i \text{ for all } i\}.$$
V* is called the set of individually rational payoffs.

There are a couple of versions of the folk theorem. The name comes from the fact that the statement (relying on NE rather than SPNE) was widely known among game theorists in the 1950s, even though no one had published it.

16. Folk Theorem (1)

Theorem 8 (Theorem A). For any (v_1, ..., v_n) ∈ V*, if players discount the future sufficiently little, there exists a Nash equilibrium of the infinitely repeated game where, for all i, player i's average payoff is v_i.

If a player deviates, it may not be in the others' interest to go through with the punishment of minimaxing him forever. However, Aumann and Shapley (1976) and Rubinstein (1979) showed that, when there is no discounting, the counterpart of Theorem A holds for SPNE.

Theorem 9 (Theorem B). For any (v_1, ..., v_n) ∈ V*, there exists a subgame perfect equilibrium in the infinitely repeated game with no discounting, where, for all i, player i's expected payoff each period is v_i.

17. Folk Theorem (2)

One well-known case that admits both discounting and simple strategies is where the point to be sustained Pareto dominates the payoffs of a Nash equilibrium of the constituent game G.

Theorem 10 (Theorem C). Suppose (v_1, ..., v_n) ∈ V* Pareto dominates the payoffs (y_1, ..., y_n) of a (one-shot) Nash equilibrium (e_1, ..., e_n) of G. If players discount the future sufficiently little, there exists a subgame perfect equilibrium of the infinitely repeated game where, for all i, player i's average payoff is v_i.

Because the punishments used in Theorem C are less severe than those in Theorems A and B, its conclusion is weaker. For example, Theorem C does not allow us to conclude that a Stackelberg outcome can be supported as an equilibrium in an infinitely repeated quantity-setting duopoly.

18. General Folk Theorem — Two Players

Abreu (1988) shows that there is no loss in restricting attention to simple punishments when players discount the future. Indeed, simple punishments are employed in the proof of the following result.

Theorem 11 (Theorem 1). For any (v_1, v_2) ∈ V*, there exists $\underline{\delta} \in (0, 1)$ such that, for all $\delta \in (\underline{\delta}, 1)$, there exists a subgame perfect equilibrium of the infinitely repeated game in which player i's average payoff is v_i when players have discount factor δ.

After a deviation by either player, the players (mutually) minimax each other for a certain number of periods, after which they return to the original path. If a further deviation occurs during the punishment phase, the phase is begun again.

19. General Folk Theorem — Three or More Players

The method we used to establish Theorem 1, "mutual minimaxing", does not extend to three or more players.

Theorem 12 (Theorem 2). Assume that the dimensionality of V* equals n, the number of players, i.e., that the interior of V (relative to n-dimensional space) is nonempty. Then, for any (v_1, ..., v_n) ∈ V*, there exists $\underline{\delta} \in (0, 1)$ such that, for all $\delta \in (\underline{\delta}, 1)$, there exists a subgame perfect equilibrium of the infinitely repeated game with discount factor δ in which player i's average payoff is v_i.

If a player deviates, he is minimaxed by the other players long enough to wipe out any gain from his deviation. To induce the other players to go through with minimaxing him, they are ultimately given a "reward" in the form of an additional ε in their average payoff. The possibility of providing such a reward relies on the full dimensionality of the payoff set.

20. Imperfect Monitoring (1)

Perfect Monitoring: Players can fully observe the history of their past play. There is no monitoring difficulty or imperfection.

Bounded/Imperfect Recall: Players forget (part of) the history of their past play, especially that of the distant past, as time goes by.

Imperfect Monitoring: Players cannot directly observe the (full) history of their past play, but instead observe signals that depend on the actions taken in the previous period.
- Public Monitoring: Players publicly observe a common signal.
- Private Monitoring: Players privately receive different signals.

21. Imperfect Monitoring (2)

Punishment necessarily becomes only indirectly linked to deviation:
- Players can punish the deviator only in reaction to the common signals, since they cannot observe the deviation itself.
- Even if no one has deviated, punishment is triggered when a bad signal realizes (which happens with positive probability).
⇒ Constructing (efficient) punishments becomes dramatically more difficult.

22. Example | Prisoner's Dilemma (1)

Consider the following prisoner's dilemma as a stage game, while each player cannot observe the rival's past actions.

Table: Ex ante payoffs u_i(a_i, a_{-i})

        C       D
C      2, 2   -1, 3
D      3, -1   0, 0

Q: Can each player deduce the rival's action from the realized payoff (and her own action)? If this were indeed the case, then observation could not be imperfect...

23. Example | Prisoner's Dilemma (2)

Player i's payoff in each period depends only on her own action, a_i ∈ {C, D}, and the public signal, y ∈ {g, b}, i.e., u*_i(y, a_i).

Table: Ex post payoffs u*_i(y, a_i)

            y = g                   y = b
a_i = C     (3 − p − 2q)/(p − q)    −(p + 2q)/(p − q)
a_i = D     3(1 − r)/(q − r)        −3r/(q − r)

Here p, q, r (with 0 < q, r < p < 1) are the conditional probabilities that g realizes:
p = Pr{g|CC}, q = Pr{g|DC} = Pr{g|CD}, r = Pr{g|DD}.

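A verification sketch (my addition): the ex post payoffs are constructed so that their expectation under the signal distribution reproduces the ex ante prisoner's dilemma payoffs. The parameter values below are illustrative choices satisfying 0 < q, r < p < 1.

```python
p, q, r = 0.9, 0.5, 0.2  # illustrative values

def u_star(y, ai):
    if ai == "C":
        return (3 - p - 2*q)/(p - q) if y == "g" else -(p + 2*q)/(p - q)
    return 3*(1 - r)/(q - r) if y == "g" else -3*r/(q - r)

def prob_g(a):
    return {("C","C"): p, ("C","D"): q, ("D","C"): q, ("D","D"): r}[a]

for a in [("C","C"), ("C","D"), ("D","C"), ("D","D")]:
    ex_ante = prob_g(a)*u_star("g", a[0]) + (1 - prob_g(a))*u_star("b", a[0])
    print(a, round(ex_ante, 10))  # reproduces 2, -1, 3, 0
```
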
24. Example | Prisoner's Dilemma (3)

To achieve cooperation, consider the (modified) trigger strategies:
- Play (C, C) in the first period.
- Continue to play (C, C) as long as g keeps being realized.
- Play (D, D) forever once b is realized.

The above trigger strategies constitute an SPNE if and only if the following condition is satisfied:
$$\delta(3p - 2q) \ge 1 \iff \delta \ge \frac{1}{3p - 2q} \qquad \text{(7.2.4 in MS)}$$
Then, the symmetric equilibrium (average) payoff becomes
$$\frac{2(1 - \delta)}{1 - \delta p},$$
which converges to 0 as δ goes to 1.

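A sketch (my addition, with illustrative p and q): compute the cutoff discount factor from condition (7.2.4) and the symmetric equilibrium average payoff.

```python
p, q = 0.9, 0.5
cutoff = 1 / (3*p - 2*q)                 # delta must be at least this
print(f"cutoff delta = {cutoff:.4f}")    # 1/1.7 ~ 0.5882

for delta in [0.6, 0.9, 0.99]:
    v = 2*(1 - delta)/(1 - delta*p)
    print(f"delta={delta}: average payoff {v:.4f}")
# The payoff falls toward 0 as delta -> 1: even on the equilibrium path,
# the bad signal eventually realizes and triggers permanent punishment.
```
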
25. General Model (1)

n (long-lived) players engage in an infinitely repeated game with discrete time horizon (t = 0, 1, ..., ∞) whose stage game is defined as follows:
- a_i ∈ A_i: player i's action (A_i is assumed finite).
- y ∈ Y: public signal realized at the end of each period (Y is finite).
- ρ(y|a): conditional probability function (assumed to have full support).
- ρ(y|α): extension to mixed action profiles $\alpha \in \prod_{i=1}^{n} \Delta(A_i)$.
- $\Pi_i(\alpha_{-i}) := \rho(\cdot \mid \cdot, \alpha_{-i})$: an $|A_i| \times |Y|$ matrix.
- u*_i(y, a_i): player i's ex post payoff.
- u_i(a): player i's ex ante payoff, expressed by
$$u_i(a) = \sum_{y \in Y} u^*_i(y, a_i)\,\rho(y|a) \qquad \text{(7.1.1 in MS)}$$
- V(δ): set of equilibrium (PPE, defined later) payoffs under δ.

26. General Model (2)

In the repeated game (of imperfect public monitoring), the only public information available in period t is the t-period history of public signals:
$$h^t := (y^0, y^1, \dots, y^{t-1}).$$
The set of public histories (with $Y^0 := \{\emptyset\}$, so that $h^0$ is the null history) is
$$H := \bigcup_{t=0}^{\infty} Y^t.$$
A history for player i includes both the public history and the history of actions that i has taken:
$$h^t_i := (y^0, a^0_i;\, y^1, a^1_i;\, \dots;\, y^{t-1}, a^{t-1}_i).$$
The set of histories for player i (with $(A_i \times Y)^0 := \{\emptyset\}$) is
$$H_i := \bigcup_{t=0}^{\infty} (A_i \times Y)^t.$$

27. Perfect Public Equilibrium (1)

A pure strategy for player i is a mapping from all possible histories into the set of pure actions, σ_i : H_i → A_i. A mixed strategy is a mixture over pure strategies. A behavior strategy is a mapping σ_i : H_i → ∆(A_i).

Definition 13 (Def 7.1.1). A behavior strategy σ_i is public if, in every period t, it depends only on the public history h^t ∈ Y^t and not on i's private history. That is, for all $h^t_i, \hat{h}^t_i \in H_i$ satisfying $y^\tau = \hat{y}^\tau$ for all $\tau \le t - 1$, $\sigma_i(h^t_i) = \sigma_i(\hat{h}^t_i)$. A behavior strategy σ_i is private if it is not public.

28. Perfect Public Equilibrium (2)

Definition 14 (Def 7.1.2). Suppose A_i = A_j for all i and j. A public profile σ is strongly symmetric if, for all public histories h^t, σ_i(h^t) = σ_j(h^t) for all i and j.

Definition 15 (Def 7.1.3). A perfect public equilibrium (PPE) is a profile of public strategies σ that, for any public history h^t, specifies a Nash equilibrium for the repeated game. A PPE is strict if each player strictly prefers his equilibrium strategy to every other public strategy.

Lemma 16 (Lemma 7.1.1). If all players other than i are playing a public strategy, then player i has a public strategy as a best reply. Therefore, every PPE is a sequential equilibrium.

29. Dynamic Programming Approach

1. Decomposition: transforming a dynamic game into a static game. In so doing, the recursive structure and unimprovability play key roles.
2. Self-Generation: a useful property for characterizing the set of equilibrium (PPE) payoffs. Without (explicitly) solving the game, the set of equilibrium payoffs can be fully and computationally identified.

30. Decomposition — Perfect Monitoring

A continuation payoff can be decomposed into a current period payoff and the future payoffs of the repeated game starting from the next period:
$$v_i = (1 - \delta)u_i(a) + \delta\gamma_i(a) \qquad (1)$$
where $\gamma : A \to V(\delta)\ (\subset \mathbb{R}^n)$ assigns an equilibrium payoff vector to each action profile and $\gamma_i$ is its i-th element (i's assigned payoff).

Theorem 17. v is supported (as an average payoff) by an SPNE if and only if there exist a mixed action profile α and a mapping γ from action profiles into V(δ) such that, for all i and all $a_i \in A_i$,
$$v_i = (1 - \delta)u_i(\alpha) + \delta\gamma_i(\alpha) \ \ge\ (1 - \delta)u_i(a_i, \alpha_{-i}) + \delta\gamma_i(a_i, \alpha_{-i}).$$

31. Decomposition — Imperfect Monitoring

A continuation payoff can be decomposed into a current period payoff and the future payoffs of the repeated game starting from the next period:
$$v_i = (1 - \delta)u_i(a) + \delta\sum_{y \in Y}\gamma_i(y)\rho(y|a) \qquad (2)$$
where $\gamma : Y \to V(\delta)\ (\subset \mathbb{R}^n)$ assigns an equilibrium (PPE) payoff vector to each public signal and $\gamma_i$ is its i-th element (i's assigned payoff).

Theorem 18. v is supported (as an average payoff) by a PPE if and only if there exist a mixed action profile α and a mapping $\gamma : Y \to V(\delta)$ such that, for all i and all $a_i \in A_i$,
$$v_i = (1 - \delta)u_i(\alpha) + \delta\sum_{y \in Y}\gamma_i(y)\rho(y|\alpha) \ \ge\ (1 - \delta)u_i(a_i, \alpha_{-i}) + \delta\sum_{y \in Y}\gamma_i(y)\rho(y|a_i, \alpha_{-i}).$$

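A minimal sketch (my addition): check the decomposition (2) for the grim-trigger profile in the monitored prisoner's dilemma above, with continuation promises γ(g) = v (stay cooperative) and γ(b) = 0 (defect forever). Parameter values are illustrative and chosen so that condition (7.2.4) holds.

```python
p, q, delta = 0.9, 0.5, 0.8   # assumes delta >= 1/(3p - 2q) ~ 0.588

v = 2*(1 - delta)/(1 - delta*p)          # candidate PPE payoff
gamma = {"g": v, "b": 0.0}

# Promise-keeping: v decomposes into today's payoff plus expected promises.
lhs = (1 - delta)*2 + delta*(p*gamma["g"] + (1 - p)*gamma["b"])
assert abs(lhs - v) < 1e-9

# Enforceability: deviating to D (payoff 3, signal g with prob. q) is worse.
dev = (1 - delta)*3 + delta*(q*gamma["g"] + (1 - q)*gamma["b"])
print(round(v, 4), round(dev, 4), v >= dev)   # conforming beats deviating
```
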
32. Self-Generation (1)

What happens if the range of the mapping γ, namely V(δ), is replaced with an arbitrary set W (⊂ R^n)?

Definition 19. Let B(W) be the set of vectors w = (w_1, ..., w_n) for which there exist a mixed action profile α and a mapping γ : Y → W such that, for all i and all $a_i \in A_i$,
$$w_i = (1 - \delta)u_i(\alpha) + \delta\sum_{y \in Y}\gamma_i(y)\rho(y|\alpha) \ \ge\ (1 - \delta)u_i(a_i, \alpha_{-i}) + \delta\sum_{y \in Y}\gamma_i(y)\rho(y|a_i, \alpha_{-i}).$$
W is called self-generating (or self-enforceable) if W ⊆ B(W).

33. Self-Generation (2)

Theorem 20. The set V(δ) of average payoffs in PPE is the largest fixed point of the mapping B(·).

Theorem 21. If W ⊆ W', then B(W) ⊆ B(W').

Theorem 22. If W is self-generating, then
$$W \subseteq \bigcap_{t=1}^{\infty} B^t(W) \subseteq V(\delta). \qquad (3)$$
If W is bounded and V(δ) ⊂ W, then
$$\bigcap_{t=1}^{\infty} B^t(W) = V(\delta). \qquad (4)$$

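A brute-force sketch (my addition) of the self-generation test for the monitored prisoner's dilemma: it checks that W = {(v, v), (0, 0)} satisfies W ⊆ B(W), which by Theorem 22 certifies both points as PPE payoffs. Pure actions only; all parameter values are illustrative.

```python
from itertools import product

p, q, r, delta = 0.9, 0.5, 0.2, 0.8
u = {("C","C"): (2, 2), ("C","D"): (-1, 3),
     ("D","C"): (3, -1), ("D","D"): (0, 0)}
prob_g = {("C","C"): p, ("C","D"): q, ("D","C"): q, ("D","D"): r}

v = 2*(1 - delta)/(1 - delta*p)
W = [(v, v), (0.0, 0.0)]

def generated(a, gamma):
    """Payoff vector decomposed from profile a and promises gamma: Y -> W,
    or None if some player has a profitable one-shot deviation."""
    def value(i, ai):
        prof = (ai, a[1]) if i == 0 else (a[0], ai)
        pg = prob_g[prof]
        return ((1 - delta)*u[prof][i]
                + delta*(pg*gamma["g"][i] + (1 - pg)*gamma["b"][i]))
    w = tuple(value(i, a[i]) for i in range(2))
    for i, ai in product(range(2), "CD"):
        if value(i, ai) > w[i] + 1e-9:
            return None
    return w

# B(W): enumerate pure profiles and all promise mappings gamma: {g,b} -> W.
B_of_W = [w for a in product("CD", repeat=2)
          for g1, g2 in product(W, repeat=2)
          if (w := generated(a, {"g": g1, "b": g2})) is not None]
self_gen = all(any(max(abs(w[0]-x[0]), abs(w[1]-x[1])) < 1e-9 for x in B_of_W)
               for w in W)
print(self_gen)  # True: W is self-generating, so W ⊆ V(delta)
```
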
34. Folk Theorem by FLM (1994) (1)

Definition 23. The profile α has individual full rank for player i if $\Pi_i(\alpha_{-i})$ has rank equal to $|A_i|$, that is, the $|A_i|$ vectors $\{\rho(\cdot \mid a_i, \alpha_{-i})\}_{a_i \in A_i}$ are linearly independent. If this is so for every player i, α has individual full rank.

Note that if α has individual full rank, the number of observable outcomes |Y| must be at least max_i |A_i|.

Definition 24. Profile α is pairwise-identifiable for players i and j if the rank of the matrix $\Pi_{ij}(\alpha)$ equals rank $\Pi_i(\alpha_{-i})$ + rank $\Pi_j(\alpha_{-j})$ − 1.

Definition 25. Profile α has pairwise full rank for players i and j if the matrix $\Pi_{ij}(\alpha)$ has rank $|A_i| + |A_j| - 1$.

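A rank check (my addition) for the monitored prisoner's dilemma above, where Y = {g, b}. The rows of $\Pi_i$ are the signal distributions $\rho(\cdot \mid a_i, \alpha_{-i})$; parameter values are illustrative.

```python
import numpy as np

p, q, r = 0.9, 0.5, 0.2
# Opponent plays C for sure: rows indexed by own action C, D.
Pi_i = np.array([[p, 1 - p],    # rho(.|C, C)
                 [q, 1 - q]])   # rho(.|D, C)
print(np.linalg.matrix_rank(Pi_i))  # 2 = |A_i|: individual full rank holds

# Pairwise full rank would need rank |A_i| + |A_j| - 1 = 3, but the stacked
# matrix Pi_ij has only |Y| = 2 columns, so its rank is at most 2: with a
# two-signal structure no profile can have pairwise full rank.
Pi_ij = np.vstack([Pi_i, Pi_i])      # by symmetry Pi_j equals Pi_i here
print(np.linalg.matrix_rank(Pi_ij))  # 2 < 3
```
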
35. Folk Theorem by FLM (1994) (2)

Pairwise full rank of α (for players i and j) is actually the conjunction of two weaker conditions, individual full rank and pairwise identifiability (of α for i and j).
1. Pairwise full rank obviously implies individual full rank: incentives can be designed to induce a player to choose a given action.
2. It also ensures pairwise identifiability: deviations by players i and j are distinct in the sense that they induce different probability distributions over public outcomes.
3. Thus, player i's incentives can be designed without interfering with those of player j.

36. Folk Theorem by FLM (1994) (3)

Theorem 26. Suppose that every pure action profile a has individual full rank and that either (i) for all pairs i and j, there exists a mixed action profile α that has pairwise full rank for that pair, or (ii) every pure-action, Pareto-efficient profile is pairwise-identifiable for all pairs of players. Let W be a smooth subset of the interior of V*. Then there exists $\underline{\delta} < 1$ such that, for all $\delta > \underline{\delta}$, W ⊆ E(δ), i.e., each point in W corresponds to a perfect public equilibrium payoff with discount factor δ.

The theorem applies only to interior points and so does not pertain to payoffs on the efficient frontier. This contrasts with the standard folk theorem for observable actions, in which efficient payoffs can be exactly attained.
