Extortion Strategies in the Prisoner’s Dilemma
Companion poster to “Extortion on the Front Lines”
Chris Hughes
Introduction
“Extortion on the Front Lines” aims to provide a clear and concise
introduction to the Prisoner’s Dilemma game, in a way that can be
understood by an undergraduate mathematician.
• I discuss the key developments in the field, providing the reader
with the necessary knowledge to explore recent publications.
• I describe how the Prisoner’s Dilemma can be applied to WW1
trench warfare [1] and, using historical sources, create a model
inspired by this.
• I place particular emphasis on the derivation and applications of
zero-determinant extortion strategies [3], as well as exploring the
effects of modifying a strategy’s parameters.
• I demonstrate, through examples, the robustness of zero-determinant
extortion strategies against an unwitting adaptive opponent.
The Iterated Prisoner’s Dilemma
The Iterated Prisoner’s Dilemma (IPD) is a two-player game in which
selfish individuals attempt to maximise their respective payoffs by
balancing cooperation with competition [1].
Each round, players X and Y can choose between two available
moves: Cooperate (C) or Defect (D).
Rules of the game
• If both players choose to Cooperate (C), both receive R points:
this is the Reward for mutual cooperation.
• If both players choose to Defect (D), both receive P points:
this is the Punishment for mutual defection.
• If one player Defects while the other Cooperates, the Defecting
player receives T points and the Cooperating player receives
S points: one player has yielded to the Temptation to defect
while the other receives the Sucker’s payoff.
This can be summarised in a Payoff Matrix:
                        Player 2
                        Cooperate   Defect
Player 1   Cooperate    (R, R)      (S, T)
           Defect       (T, S)      (P, P)

The poster also tabulates the following payoff values, in two blocks
(the two header rows are unlabelled in the source):

              1000   1100    900   1000   1100    900
              1000    900   1100    950   1100    900
Temptation    1824   2042   1606   1832   2006   1642
Reward        1607   2000   1213   1666   1767   1446
Punishment    1002   1300    900   1050   1102    902
Sucker's       588    647    529    588    647    529

Temptation    1824   1606   2042   1724   2006   1642
Reward        1607   1213   2000   1467   1767   1446
Punishment    1002    900   1300    950   1102    902
Sucker's       588    529    647    559    647    529
We can also define the payoff vectors for both players as
S_X = (R, S, T, P) and S_Y = (R, T, S, P). (1)
The payoffs in an IPD game must be strictly ordered such that
T > R > P > S and 2R > T + S.
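As a quick check, the two ordering conditions can be tested directly. The sketch below is illustrative only: the function name `is_valid_ipd` is an assumption, and the payoff sets are the values this poster reports for its model, the conventional (5, 3, 1, 0), and a made-up set that violates the second condition.

```python
def is_valid_ipd(T, R, P, S):
    """Return True if the payoffs satisfy T > R > P > S and 2R > T + S."""
    return T > R > P > S and 2 * R > T + S


# Payoffs used on this poster (obtained from Lanchester modelling).
print(is_valid_ipd(1824, 1607, 1002, 588))
# Conventional payoffs from the literature.
print(is_valid_ipd(5, 3, 1, 0))
# Made-up payoffs that fail 2R > T + S: taking turns to exploit each
# other would then pay more on average than mutual cooperation.
print(is_valid_ipd(10, 3, 1, 0))
```

The second condition is what makes sustained mutual cooperation the best joint outcome.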
For interesting analysis, and to eliminate any endgame tactics, we
introduce a probability w that the game will end after the current
round, leaving the players unsure whether they will meet again.
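To make the role of w concrete: if each round is the last with probability w, independently of everything that came before, then the number of rounds N follows a geometric distribution. This is a standard fact, stated here for orientation rather than taken from the project:

```latex
\Pr(N = n) = (1 - w)^{n-1}\, w, \qquad n = 1, 2, \dots,
\qquad
\mathbb{E}[N] = \sum_{n=1}^{\infty} n\,(1 - w)^{n-1}\, w = \frac{1}{w}.
```

For example, w = 0.01 gives a game lasting 100 rounds on average, and neither player can tell in advance which round will be the last.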
Strategies in an IPD
The strategy of a player determines how the player chooses his next
move.
A memory-one strategy depends only on the previous round of the
game.
For players X and Y, with respective moves x and y, the four possible
outcomes of each round of an IPD can be represented, from the
perspective of player X, as
xy ∈ {CC, CD, DC, DD}.
We can represent player X’s memory-one strategy as p =
(p_CC, p_CD, p_DC, p_DD) such that:
p_CC = P(X_{n+1} = C | X_n = C, Y_n = C)
p_CD = P(X_{n+1} = C | X_n = C, Y_n = D)
p_DC = P(X_{n+1} = C | X_n = D, Y_n = C)
p_DD = P(X_{n+1} = C | X_n = D, Y_n = D)
We can similarly express player Y’s strategy, corresponding to the
outcomes yx ∈ {CC, CD, DC, DD}.
As we only consider memory-one strategies, we can describe the IPD game
as a four-state Markov chain with state space {CC, CD, DC, DD}.
The stationary distribution v of this Markov chain is the long-run
probability distribution of outcomes in any given round. This governs
the long-run expected payoffs of each player through the formulas
P_X = S_X · v, P_Y = S_Y · v.
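The Markov chain description can be turned into a small computation. The sketch below is one possible way to do it, not the project's own code: it builds the 4 × 4 transition matrix from two memory-one strategies, solves for the stationary distribution v with a hand-written linear solver, and forms the two dot products. The helper names (`solve`, `stationary`, `payoffs`), the symbol q for player Y's strategy, and the two example strategies are assumptions made for illustration.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gauss-Jordan elimination."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        # Partial pivoting: bring the largest entry of column c to row c.
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(n):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def stationary(p, q):
    """Stationary distribution over (CC, CD, DC, DD), from X's viewpoint.

    p and q are the memory-one strategies of X and Y, each listed from
    its owner's viewpoint. Assumes the chain has a unique stationary
    distribution; otherwise the linear system below is singular.
    """
    q_seen = (q[0], q[2], q[1], q[3])  # X's CD is Y's DC, and vice versa
    M = [[px * qy, px * (1 - qy), (1 - px) * qy, (1 - px) * (1 - qy)]
         for px, qy in zip(p, q_seen)]
    # Equations v M = v, with the last one replaced by sum(v) = 1.
    A = [[M[i][j] - (i == j) for i in range(4)] for j in range(4)]
    A[3] = [1.0] * 4
    return solve(A, [0.0, 0.0, 0.0, 1.0])


def payoffs(p, q, pay):
    """Long-run expected payoffs (P_X, P_Y) for pay = (T, R, P, S)."""
    T, R, P, S = pay
    v = stationary(p, q)
    S_X, S_Y = (R, S, T, P), (R, T, S, P)
    return (sum(s * x for s, x in zip(S_X, v)),
            sum(s * x for s, x in zip(S_Y, v)))


pay = (1824, 1607, 1002, 588)   # payoff values reported on this poster
p = (0.9, 0.2, 0.7, 0.1)        # arbitrary example strategy for X
q = (0.8, 0.3, 0.6, 0.2)        # arbitrary example strategy for Y
print(stationary(p, q))
print(payoffs(p, q, pay))
```

Solving the linear system directly, rather than iterating the chain, avoids convergence problems when the chain mixes slowly.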
Press & Dyson’s Contribution
Using undergraduate linear algebra, Press & Dyson showed [3] that the
dot product of the stationary distribution v with any vector f can be
expressed as a 4 × 4 determinant, in which:
• one column is f,
• one column is entirely under the control of player X, involving only
the four probabilities that describe X’s strategy,
• one column is entirely under the control of player Y.
Key Observation
One player, by choosing a strategy to ensure the column they control is
proportional to f, can independently force the dot product of v with
f to be zero.
If a player is able to select a strategy p which satisfies, for some
α, β, γ ∈ R,
p̃ = (p_CC − 1, p_CD − 1, p_DC, p_DD) = α S_X + β S_Y + γ 1,
where 1 = (1, 1, 1, 1), then regardless of his opponent’s strategy,
the following linear relation will be enforced between the expected
payoffs of the players:
α P_X + β P_Y + γ = 0.
Such a strategy p is a zero-determinant (ZD) strategy.
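The enforced relation can be checked numerically. The sketch below is an illustration under stated assumptions, not the project's code: `zd_strategy` is a hypothetical helper that builds p from chosen coefficients, and the coefficients use the extortion parametrisation from [3], α = φ, β = −φχ, γ = φ(χ − 1)P, with χ = 2 and φ = 1/4116 picked only so that all four probabilities fall in [0, 1]. The opponent strategies are arbitrary.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(n):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def payoffs(p, q, pay):
    """Long-run payoffs (P_X, P_Y); assumes a unique stationary distribution."""
    T, R, P, S = pay
    q_seen = (q[0], q[2], q[1], q[3])  # X's CD is Y's DC, and vice versa
    M = [[px * qy, px * (1 - qy), (1 - px) * qy, (1 - px) * (1 - qy)]
         for px, qy in zip(p, q_seen)]
    A = [[M[i][j] - (i == j) for i in range(4)] for j in range(4)]
    A[3] = [1.0] * 4                   # normalisation: sum(v) = 1
    v = solve(A, [0.0, 0.0, 0.0, 1.0])
    return (sum(s * x for s, x in zip((R, S, T, P), v)),
            sum(s * x for s, x in zip((R, T, S, P), v)))


def zd_strategy(alpha, beta, gamma, pay):
    """Strategy p whose p~ equals alpha*S_X + beta*S_Y + gamma*1."""
    T, R, P, S = pay
    S_X, S_Y = (R, S, T, P), (R, T, S, P)
    t = [alpha * sx + beta * sy + gamma for sx, sy in zip(S_X, S_Y)]
    p = [1 + t[0], 1 + t[1], t[2], t[3]]
    if min(p) < -1e-12 or max(p) > 1 + 1e-12:
        raise ValueError("coefficients do not give valid probabilities")
    return tuple(min(1.0, max(0.0, x)) for x in p)  # remove rounding noise


pay = (1824, 1607, 1002, 588)          # payoff values reported on this poster
T, R, P, S = pay
chi, phi = 2.0, 1.0 / 4116             # illustrative choices, not from the project
alpha, beta, gamma = phi, -phi * chi, phi * (chi - 1) * P
p = zd_strategy(alpha, beta, gamma, pay)
print(p)

# The residual alpha*P_X + beta*P_Y + gamma should vanish for every opponent.
for q in [(0.9, 0.1, 0.6, 0.3), (0.2, 0.8, 0.5, 0.5), (0.05, 0.95, 0.4, 0.7)]:
    PX, PY = payoffs(p, q, pay)
    print(q, alpha * PX + beta * PY + gamma)
```

With these coefficients the relation reads P_X − P = χ(P_Y − P), so whatever the opponent does, X's surplus over the punishment payoff is χ times Y's.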
Applications of Zero-determinant Strategies
The applications detailed in the project are:
• Unilaterally setting an opponent’s score: fixing an opponent’s long-term payoff to some value between the
P and R payoffs, regardless of their strategy.
• Extorting an opponent: one player enforces a relation under which he gains a greater payoff than his
opponent.
• Extorting an adapting player: enforcing an unfair relation against a player who adapts his strategy.
It is also demonstrated why a player is unable to set his own payoff using a ZD strategy.
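The first application can be illustrated by simulation. Taking α = 0 in the ZD condition gives p̃ = β S_Y + γ 1, which enforces P_Y = −γ/β. The sketch below is an assumption-laden illustration rather than the project's method: the target score 1300, the scale β = −1/1000, the function name `play`, and the three opponents are all arbitrary choices, and the game is played for a fixed large number of rounds instead of ending with probability w.

```python
import random


def play(p, q, pay, rounds, rng):
    """Simulate an IPD between memory-one strategies; return mean payoffs."""
    T, R, P, S = pay
    score = {("C", "C"): (R, R), ("C", "D"): (S, T),
             ("D", "C"): (T, S), ("D", "D"): (P, P)}
    index = {"CC": 0, "CD": 1, "DC": 2, "DD": 3}
    x, y = "C", "C"                    # arbitrary opening moves
    total_x = total_y = 0
    for _ in range(rounds):
        sx, sy = score[(x, y)]
        total_x += sx
        total_y += sy
        # Each player conditions on the last round from their own viewpoint.
        coop_x = rng.random() < p[index[x + y]]
        coop_y = rng.random() < q[index[y + x]]
        x = "C" if coop_x else "D"
        y = "C" if coop_y else "D"
    return total_x / rounds, total_y / rounds


pay = (1824, 1607, 1002, 588)          # payoff values reported on this poster
T, R, P, S = pay
target = 1300.0                        # illustrative target between P and R
beta = -1.0 / 1000                     # illustrative scale keeping p in [0, 1]
# alpha = 0 and gamma = -beta * target give p~ = beta * (S_Y - target * 1).
p = (1 + beta * (R - target), 1 + beta * (T - target),
     beta * (S - target), beta * (P - target))
print(p)

rng = random.Random(1)
for q in [(1, 1, 1, 1), (0, 0, 0, 0), (0.9, 0.1, 0.6, 0.3)]:
    _, mean_y = play(p, q, pay, 200_000, rng)
    print(q, round(mean_y, 1))         # each should land close to the target
```

An always-cooperating, an always-defecting and a mixed opponent should all end up with roughly the same average payoff, up to sampling noise.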
When considering only two players, with one extorting the other, ZD strategies are extremely robust. It can
be proved [2] that, when facing an unwitting adapting player, a player using a ZD extortion strategy will always
receive his maximum long-term score, regardless of how his opponent adapts. This is demonstrated through
several examples. I also demonstrate that it is impossible for both players to extort each other simultaneously.
Figure 1: Paths taken by an adapting player (P2) and his extorter (P1) in four different instances, with both
players arriving at their maximum scores in each case. Axes: expected payoff per round against steps.
Series: P1 Max, P1 Score-1 to P1 Score-4, P2 Max, P2 Score-1 to P2 Score-4.
Acknowledgements
I would like to thank my supervisor, Dr. Gustav W. Delius, for his enthusiasm, guidance and dedication, and for
providing excellent supervision over the course of this project. I would also like to thank Rebecca Nelson for
inspiring and motivating me to work to the best of my ability.
References
[1] Axelrod R. The Evolution of Cooperation. New York: Basic Books; 1984.
[2] Chen J.; Zinger A. The Robustness of Zero-Determinant Strategies in Iterated Prisoner’s Dilemma Games.
Journal of Theoretical Biology 357; 2014. pp. 46-54.
[3] Press W.; Dyson F. Iterated Prisoner’s Dilemma contains strategies that dominate any evolutionary opponent.
Proceedings of the National Academy of Sciences 109; 2012.
[4] Taylor J.G. Lanchester Models of Warfare, vol. 1; 1983.
This work uses the non-conventional values (T, R, P, S) = (1824, 1607, 1002, 588) for
both players. These were obtained through Lanchester combat modelling [4].
Using results from [3], we are able to consider only memory-one strategies in our analysis of
zero-determinant extortion strategies, without loss of generality. This is fully explained in the project.
Here, the adapting player’s strategy was simulated using the Gradient Descent method. The application of
this method to the example is fully detailed in the project.
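The details of the project's gradient method are not given on this poster, so the following is only a stand-in sketch of the idea. The adapting player Y estimates the gradient of its own long-run payoff by finite differences and takes an uphill step, halving the step until the payoff improves (a backtracking hill climb, swapped in here for simplicity). The extorter's strategy p assumes the parametrisation from [3] with χ = 2 and φ = 1/4116; the starting point, step sizes, clipping bounds and all function names are assumptions.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(n):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def payoffs(p, q, pay):
    """Long-run payoffs (P_X, P_Y); assumes a unique stationary distribution."""
    T, R, P, S = pay
    q_seen = (q[0], q[2], q[1], q[3])  # X's CD is Y's DC, and vice versa
    M = [[px * qy, px * (1 - qy), (1 - px) * qy, (1 - px) * (1 - qy)]
         for px, qy in zip(p, q_seen)]
    A = [[M[i][j] - (i == j) for i in range(4)] for j in range(4)]
    A[3] = [1.0] * 4                   # normalisation: sum(v) = 1
    v = solve(A, [0.0, 0.0, 0.0, 1.0])
    return (sum(s * x for s, x in zip((R, S, T, P), v)),
            sum(s * x for s, x in zip((R, T, S, P), v)))


def adapt(p, q, pay, steps=200, h=1e-6, lo=0.01, hi=0.99):
    """Let Y climb its own long-run payoff against the fixed strategy p."""
    def clip(x):
        return min(hi, max(lo, x))

    history = [payoffs(p, q, pay)]
    for _ in range(steps):
        base = history[-1][1]
        # Forward-difference estimate of the gradient of P_Y in q.
        grad = []
        for i in range(4):
            q_h = list(q)
            q_h[i] += h
            grad.append((payoffs(p, q_h, pay)[1] - base) / h)
        # Largest coordinate moves by at most 0.1; halve until P_Y improves.
        step = 0.1 / max(1e-12, max(abs(g) for g in grad))
        for _ in range(30):
            trial = [clip(qi + step * g) for qi, g in zip(q, grad)]
            if payoffs(p, trial, pay)[1] > base:
                q = trial
                history.append(payoffs(p, q, pay))
                break
            step /= 2
        else:
            break                      # no improving step: treat as converged
    return q, history


pay = (1824, 1607, 1002, 588)          # payoff values reported on this poster
p = (1 - 605 / 4116, 0.5, 1650 / 4116, 0.0)  # assumed extortion strategy, chi = 2
q, history = adapt(p, [0.5, 0.5, 0.5, 0.5], pay)
print("start (P_X, P_Y):", history[0])
print("end   (P_X, P_Y):", history[-1])
print("final q:", q)
```

Because the extorter enforces P_X − P = χ(P_Y − P), every step that raises the adapting player's payoff raises the extorter's payoff by χ times as much.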