Lecture Overview
Solving the prisoner’s dilemma
Instrumental rationality
Morality & norms
Repeated games
Three ways to solve the prisoner’s dilemma
Sequential games
Backward induction
Subgame perfect Nash equilibrium
Common knowledge of rationality
Mixed strategies
Game theory: underlying assumptions
Remember:
Homo economicus: instrumental rationality and preferences
Common knowledge of rationality and consistent alignment of beliefs: given the same information, individuals arrive at the same decisions
Individuals know the rules of the game which are exogenously given and independent of individuals’ choices
We will look at these one by one, analysing alternative assumptions.
We will use the prisoner’s dilemma as an example.
Why?
Coordination game with conflict
Arguably it describes many social situations, e.g. the free rider problem:
Voting
Trade union affiliation
Wage cuts to increase profit
Domestic work
Prisoner’s dilemma
The homo economicus maximises his/her utility.
In a prisoner’s dilemma the dominant strategy is to confess (defect).
Fallacy of composition: what is individually rational is neither Pareto optimal nor socially rational.
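The dominance claim can be checked mechanically. A minimal sketch, using illustrative payoffs (T=5, R=3, P=1, S=0, the numbers Axelrod later used; the lecture does not fix them):

```python
# Prisoner's dilemma with illustrative payoffs (T=5, R=3, P=1, S=0):
# payoff[(my_move, their_move)] is my pay-off; "C" = cooperate, "D" = defect.
payoff = {("C", "C"): 3, ("C", "D"): 0,
          ("D", "C"): 5, ("D", "D"): 1}

def dominates(a, b):
    """True if move a pays strictly more than move b against every opponent move."""
    return all(payoff[(a, other)] > payoff[(b, other)] for other in ("C", "D"))

print(dominates("D", "C"))                       # True: defection is dominant
print(payoff[("D", "D")] < payoff[("C", "C")])   # True: mutual defection is Pareto-inferior
```

The two prints together are the fallacy of composition in miniature: each player's dominant move produces an outcome both would reject.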
But do people really defect?
Kant’s categorical imperative: not the outcome but the act is crucial (morality)
Altruism: blood donation
Social norms: forest people hunting in the Congo (Turnbull 1963)
Instrumental rationality
Gauthier: it is instrumentally rational to cooperate rather than to defect
Assume there are two sorts of maximisers in the economy: straight maximisers (SM) and constrained maximisers (CM); SMs defect, CMs cooperate with other CMs:
E(return from CM) = p*(-1) + (1-p)*(-3), where p is the probability of meeting another CM
E(return from SM) = -3
For any p>0 the CM strategy is better than the SM one.
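The comparison can be sketched directly; the payoffs (-1 for mutual cooperation, -3 otherwise) are those from the expressions above:

```python
# Gauthier's argument: p is the probability of meeting a constrained
# maximiser (CM). Pay-offs follow the slide: -1 for mutual cooperation,
# -3 for mutual defection.
def expected_return_cm(p):
    return p * (-1) + (1 - p) * (-3)

def expected_return_sm(p):
    return -3  # an SM always ends up in mutual defection

for p in (0.0, 0.25, 0.5, 1.0):
    print(p, expected_return_cm(p), expected_return_sm(p))
# For any p > 0, E(CM) > E(SM): adopting the CM disposition is
# instrumentally rational.
```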
Instrumental rationality
Tit-for-tat
Unsurprisingly (maybe), in a repeated Prisoner’s dilemma the best strategy is not to defect but to adopt a tit-for-tat strategy.
In the 1980s, Robert Axelrod invited professional game theorists to enter strategies into a tournament of a repeated game (200 times).
The winning strategy was tit-for-tat, entered by Anatol Rapoport:
Start off with cooperation
If opponent defects punish him/her by defecting
If opponent comes back to cooperation ‘forgive’ them and go back to cooperation
Overall, forgiving and cooperative strategies did better.
Repeated games & reputation
A tit-for-tat strategy can only be played in repeated games.
The folk theorem states that in an infinitely repeated game (or one where players are uncertain about when the game ends) any feasible, individually rational payoff can be sustained as an equilibrium.
This is important for social interaction: the prisoner’s dilemma can be overcome without (!) external authority.
Players enforce compliance (cooperate rather than defect) through punishment.
The loss of future returns deters players from defecting.
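The deterrence argument can be made precise with a discount factor. A sketch using a grim-trigger strategy (cooperate until the opponent defects, then defect forever) and illustrative pay-offs T=5, R=3, P=1, not taken from the lecture:

```python
# Under grim trigger, a deviation gains T once instead of R, but earns P
# forever after instead of R. With discount factor d, cooperation is an
# equilibrium iff  R/(1-d) >= T + d*P/(1-d),  i.e.  d >= (T-R)/(T-P).
def critical_discount(T, R, P):
    """Smallest discount factor at which grim trigger sustains cooperation
    (assumes T > R > P)."""
    return (T - R) / (T - P)

print(critical_discount(5, 3, 1))  # 0.5: players must value future returns enough
```

The more tempting defection is (larger T - R), the more patient players must be for the loss of future returns to deter them.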
The surprising thing about Axelrod’s tournament was that the tit-for-tat strategy won in a finite (and defined) repeated game ...
Solutions to the prisoner’s dilemma
Authority (state, mafia etc.) imposes cooperation by changing the payoff matrix (e.g. binding contracts).
Individuals are motivated by morality/norms (but: if some
individuals in society are motivated by morality it becomes
rational – according to Gauthier – to cooperate)
In repeated games with uncertainty about the end of the game, cooperation can be policed by the players themselves through punishment (e.g. defecting in future rounds). This insight is backed by the folk theorem.
Remember this game?
This is the matrix form of a simultaneous game:

              Left     Right
    Top      10, 4     1, 5
    Bottom    9, 9     0, 3

Are there any equilibria?
Yes, player A has a dominant strategy: ‘Top’.
The equilibrium in dominant strategies is (Top, Right).
This is simultaneously a Nash equilibrium.
Dynamic games & backward induction
Now imagine the game is played sequentially: A moves first.
To solve this game, we can use backward induction.
A: What will B do once I’ve played Top/Bottom?
[Game tree: A chooses Top or Bottom; B then chooses Left or Right. Pay-offs (A, B): Top/Left (10, 4), Top/Right (1, 5), Bottom/Left (9, 9), Bottom/Right (0, 3).]
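The backward-induction reasoning for this sequential game can be sketched as follows, using the pay-offs from the matrix above:

```python
# Backward induction: A moves first (Top/Bottom), B observes and responds
# (Left/Right). payoffs[(a_move, b_move)] = (pay-off to A, pay-off to B).
payoffs = {("Top", "Left"): (10, 4), ("Top", "Right"): (1, 5),
           ("Bottom", "Left"): (9, 9), ("Bottom", "Right"): (0, 3)}

def solve():
    # Step 1: at each of B's decision nodes, B picks the move that
    # maximises B's own pay-off.
    best_reply = {a: max(("Left", "Right"), key=lambda b: payoffs[(a, b)][1])
                  for a in ("Top", "Bottom")}
    # Step 2: A anticipates B's replies and maximises A's resulting pay-off.
    a_move = max(("Top", "Bottom"), key=lambda a: payoffs[(a, best_reply[a])][0])
    return a_move, best_reply[a_move], payoffs[(a_move, best_reply[a_move])]

print(solve())  # ('Bottom', 'Left', (9, 9))
```

A plays Bottom because B would answer Top with Right (leaving A only 1), while B answers Bottom with Left, giving (9, 9): the subgame perfect outcome differs from the simultaneous-game equilibrium (Top, Right).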
Let’s play a game to demonstrate backward induction:
The race to 20
Your aim is to get to 20 first!
You can either pick up one pen or two pens.
Whoever gets the last pen wins!
Who will win?
Whoever gets to 2 first (the first player, if she/he is aware of backward induction).
From 2 you can get to 5, 8, 11, 14, 17, 20.
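The winning totals of the race to 20 can be computed by backward induction from the end of the game:

```python
from functools import lru_cache

GOAL = 20

@lru_cache(maxsize=None)
def mover_wins(total):
    """True if the player about to move from this running total can force a win."""
    for k in (1, 2):                       # take one pen or two
        if total + k == GOAL:
            return True                    # taking the last pen wins outright
        if total + k < GOAL and not mover_wins(total + k):
            return True                    # hand the opponent a losing total
    return False

# The totals you should move TO: from these the opponent (to move) loses.
targets = [n for n in range(1, GOAL + 1) if n == GOAL or not mover_wins(n)]
print(targets)  # [2, 5, 8, 11, 14, 17, 20]
```

Reasoning backwards: from 18 or 19 the mover wins directly, so 17 is losing; then 14, 11, 8, 5 and 2 are losing in turn, which is why the player who first reaches a total of 2 controls the game.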
Dynamic games & backward induction
Backward induction helps us to identify ‘subgame perfect Nash
equilibria’.
A segment of a dynamic game is a subgame if:
It starts from a single node
It contains all of that node’s successor nodes
It ends at the pay-offs of the terminal nodes
The initial node must be a singleton (i.e. it is clear what the previous moves were).
While (Top, Right) was an equilibrium in dominant strategies, it
is not a subgame perfect Nash equilibrium (SPNE).
For backward induction we ask ‘what will player B do if A plays strategy x, y, z?’
But if it is not rational for player A to play bottom, why should
B’s reaction to a non-rational strategy be rational?
This is a game of imperfect information.
Common knowledge of rationality is not assumed. Therefore,
you can build a reputation for being rational/irrational, play
cooperative/defect etc.
Common knowledge of rationality
[Figure: an extensive-form game tree with moves labelled R and C at alternating nodes; pay-off pairs grow along the tree, up to (110, 101) at the end.]
A reputation as an ‘irrational’ individual might increase your pay-off.
But your opponent might know that: he/she might suspect that you are merely playing at being irrational.
If a player does not have perfect information about her opponent’s strategy choices, she will play mixed strategies (rather than a pure strategy).
Assume C plays left with probability c and right with
probability (1-c). R plays top with probability r and
bottom with probability (1-r).
For R, the expected pay-off is a function of r and c; likewise for C.
R will be happy with his assigned probability if a small change in r does not change his expected pay-off, i.e. given c he is indifferent between top and bottom.
The same is true for C: given r she is indifferent between left and right.
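Since the pay-off matrix for this game is not reproduced above, the indifference logic can be sketched on an illustrative 2x2 game (matching pennies):

```python
# Mixed-strategy equilibrium of a 2x2 game via the indifference conditions.
# A[i][j] is Row's pay-off, B[i][j] is Column's (i: top/bottom, j: left/right).
def mixed_equilibrium(A, B):
    """Returns (r, c): Prob(Row plays top), Prob(Column plays left)."""
    # Column's c makes Row indifferent between top and bottom:
    #   c*A[0][0] + (1-c)*A[0][1] = c*A[1][0] + (1-c)*A[1][1]
    c = (A[1][1] - A[0][1]) / (A[0][0] - A[0][1] - A[1][0] + A[1][1])
    # Row's r makes Column indifferent between left and right:
    #   r*B[0][0] + (1-r)*B[1][0] = r*B[0][1] + (1-r)*B[1][1]
    r = (B[1][1] - B[1][0]) / (B[0][0] - B[1][0] - B[0][1] + B[1][1])
    return r, c

A = [[1, -1], [-1, 1]]    # Row's pay-offs in matching pennies
B = [[-1, 1], [1, -1]]    # Column's pay-offs
print(mixed_equilibrium(A, B))  # (0.5, 0.5): both randomise 50/50
```

Each player's equilibrium probability is pinned down by the opponent's pay-offs, not her own: exactly the "small change in r does not change the pay-off" condition above.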
Mixed strategies
The best response curves plot each player’s optimal probability against the other’s.
Their intersection is a (mixed-strategy) Nash equilibrium.
There are three ways how the prisoner’s dilemma can be
resolved:
enforcement by authority,
non-instrumental rationality and
the folk theorem.
In dynamic games we use backward induction to find subgame
perfect Nash equilibria.
If we drop common knowledge of rationality, pay-offs change and might increase through reputation effects.
Given incomplete information we can play mixed strategies.
Summary
Required readings
Varian (2006): chapters 24, 28 and 29.
Hargreaves Heap & Varoufakis (1995): Game Theory: A Critical Introduction, chapters 3, 5 and 6.