Assignment 2 Report: Study of the Behavior of Different Algorithms in 2×2 Matrix Games through Round-Robin and Evolutionary Tournaments
Submitted By: Yomna Mahmoud Ibrahim Hassan
Introduction
In this report, I describe the analysis done to design an algorithm for playing different 2×2 matrix games, and I analyze the behavior of the designed algorithm against seven other algorithms. The analysis is carried out through two types of tournaments: round-robin and evolutionary.
The main objective while designing the algorithm was to reach an algorithm capable of "beating" the other algorithms in different games and different tournaments, where "beating" means that on average it obtains a higher payoff than the others. The algorithm also needs to be robust to changes in the setting; in this report, for instance, we discuss the effect of the "prior" on its performance.
In addition, I discuss the basic requirements of a successful algorithm, based on the results. I also discuss which algorithms affected the results merely by their presence, for example which algorithms acted as "kingmakers", giving very high payoffs to some algorithms while decreasing the payoffs of others.
Design
Before designing the algorithm, I ran the round-robin tournament with the acclaimed best algorithm in the prisoner's dilemma, "tit for tat" (TFT), entered twice (two of the eight algorithms are TFT). This was an effort to see whether any pattern exists in the players' payoffs across different games. Although these results do not confirm that an algorithm will also perform well in an evolutionary tournament, they give a rough idea of what to consider while designing the algorithm.
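As a concrete model of what "running the round-robin tournament" means here, the following is a minimal sketch (an assumed structure in Python, not the original .NET implementation): every pair of algorithms plays 1000 rounds of a game, and each side's payoff is averaged over the rounds. The payoff matrix below is the one implied by the tables that follow.

ROUNDS = 1000

# Prisoner's dilemma payoffs implied by the tables: PD[my_action][their_action],
# with actions "C" (cooperate) and "D" (defect).
PD = {"C": {"C": 3, "D": 0}, "D": {"C": 5, "D": 1}}

def play_match(strat_a, strat_b, payoff, rounds=ROUNDS):
    hist_a, hist_b = [], []                  # each player's own past actions
    total_a = total_b = 0.0
    for _ in range(rounds):
        act_a = strat_a(hist_a, hist_b)      # a strategy sees (own history, opponent history)
        act_b = strat_b(hist_b, hist_a)
        total_a += payoff[act_a][act_b]
        total_b += payoff[act_b][act_a]
        hist_a.append(act_a)
        hist_b.append(act_b)
    return total_a / rounds, total_b / rounds

def tit_for_tat(me, opp):
    return opp[-1] if opp else "C"           # cooperate first, then mirror the opponent

def always_defect(me, opp):
    return "D"

print(play_match(tit_for_tat, always_defect, PD))   # -> (0.999, 1.004), as in the tables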
The following tables show the results of running a round-robin tournament (payoffs shown are averages taken over 1000 rounds) on three different games: the prisoner's dilemma, a modified version of chicken, and the stag hunt. Each row lists an algorithm's average payoff against each opponent (columns), with its overall average in the last column.
Prisoner's dilemma

Player \ Opponent | TFT | TF2T | Random | Al. D | Al. C | Maximin | Winstay | TFT (2) | Average
TFT | 3 | 3 | 2.078 | 0.999 | 3 | 0.999 | 1.998 | 3 | 2.25925
TF2T | 3 | 3 | 1.503 | 0 | 3 | 0 | 0 | 3 | 1.687875
Random | 1.922 | 4.076 | 1.804 | 0.528 | 4.012 | 0.464 | 2.266 | 2.057 | 2.141125
Al. D | 1.004 | 5 | 2.932 | 1 | 5 | 1 | 3 | 1.004 | 2.4925
Al. C | 3 | 3 | 1.68 | 0 | 3 | 0 | 0 | 3 | 1.71
Maximin | 1.004 | 5 | 2.956 | 1 | 5 | 1 | 3 | 1.004 | 2.4955
Winstay | 2.003 | 5 | 2.122 | 0.5 | 5 | 0.5 | 2.998 | 2.003 | 2.51575
TFT (2) | 3 | 3 | 1.988 | 0.999 | 3 | 0.999 | 1.998 | 3 | 2.248

Modified Chicken

Player \ Opponent | TFT | TF2T | Random | Al. D | Al. C | Maximin | Winstay | TFT (2) | Average
TFT | 3 | 3 | 2.09 | 1.003 | 3 | 3 | 3.5 | 3 | 2.699125
TF2T | 3 | 3 | 3.545 | 4 | 3 | 3 | 4 | 3 | 3.318125
Random | 2.104 | 4.593 | 1.972 | 2.503 | 4.524 | 4.476 | 4.894 | 1.943 | 3.376125
Al. D | 1.005 | 6 | 3.42 | 1 | 6 | 6 | 5.995 | 1.005 | 3.803125
Al. C | 3 | 3 | 3.568 | 4 | 3 | 3 | 4 | 3 | 3.321
Maximin | 3 | 3 | 3.454 | 4 | 3 | 3 | 4 | 3 | 3.30675
Winstay | 3.5 | 6 | 4.922 | 3.997 | 6 | 6 | 2 | 3.5 | 4.489875
TFT (2) | 3 | 3 | 2.054 | 1.003 | 3 | 3 | 3.5 | 3 | 2.694625

Stag hunt

Player \ Opponent | TFT | TF2T | Random | Al. D | Al. C | Maximin | Winstay | TFT (2) | Average
TFT | 4 | 4 | 2.757 | 1.993 | 4 | 1.993 | 1.993 | 4 | 3.092
TF2T | 4 | 4 | -0.194 | -5 | 4 | -5 | -5 | 4 | 0.10075
Random | 3.071 | 3.456 | 2.896 | -1.927 | 3.471 | -1.479 | -1.01 | 2.998 | 1.4345
Al. D | 2.001 | 3 | 2.487 | 2 | 3 | 2 | 2 | 2.001 | 2.311125
Al. C | 4 | 4 | -0.95 | -5 | 4 | -5 | -5 | 4 | 0.00625
Maximin | 2.001 | 3 | 2.554 | 2 | 3 | 2 | 2 | 2.001 | 2.3195
Winstay | 2.001 | 3 | 2.502 | 2 | 3 | 2 | 2 | 2.001 | 2.313
TFT (2) | 4 | 4 | 3.032 | 1.993 | 4 | 1.993 | 1.993 | 4 | 3.126375
From the results we deduce the following:
1- The "Random" algorithm confuses all the other algorithms and plays a huge role in deciding who wins and who does not. Although it is not stable, its payoff is among the highest. (Note that the randomization here follows a probability distribution, as it is based on the Random class implemented in the .NET Framework.)
2- "Win stay" plays very well in the first two games (the highest average), and its performance in the stag hunt is good as well.
3- The algorithms that played worst were the ones that were too "nice" (tit for two tats and always cooperate). They were exploited easily, especially by really "mean" algorithms such as always defect.
4- Algorithms that depend on the game itself, such as Maximin and Win stay, performed really well (a sketch of the maximin choice follows this list).
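For concreteness, this is a minimal sketch of what "game dependent" means for Maximin in a 2×2 game (an illustration, not the original implementation): the action whose worst-case payoff is largest is read directly off the payoff matrix.

# Maximin in a 2x2 game: pick the action with the best worst-case payoff.
# The payoff matrix layout matches the earlier sketch: payoff[mine][theirs].

def maximin_action(payoff):
    return max(payoff, key=lambda action: min(payoff[action].values()))

PD = {"C": {"C": 3, "D": 0}, "D": {"C": 5, "D": 1}}
print(maximin_action(PD))   # -> "D": defection guarantees at least 1 in this game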
Another point I wanted to take into consideration is the prior. For that reason I ran another tournament in which every algorithm's prior is that its opponent defects; a sketch of one way to inject such a prior is below, followed by the results in the different games.
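This is a minimal sketch of how such a misleading prior might be implemented, building on the earlier round-robin driver (an assumption about the implementation, not taken from the report): each player's view of the opponent's history is seeded with one fictitious defection, which is never scored.

def play_match_with_prior(strat_a, strat_b, payoff, rounds=1000):
    # Each side starts believing the opponent has already defected once.
    seen_a, seen_b = ["D"], ["D"]    # A's view of B's actions / B's view of A's
    own_a, own_b = [], []
    total_a = total_b = 0.0
    for _ in range(rounds):
        act_a = strat_a(own_a, seen_a)
        act_b = strat_b(own_b, seen_b)
        total_a += payoff[act_a][act_b]
        total_b += payoff[act_b][act_a]
        own_a.append(act_a)
        seen_b.append(act_a)         # B observes A's real action
        own_b.append(act_b)
        seen_a.append(act_b)         # A observes B's real action
    return total_a / rounds, total_b / rounds

# With this prior, tit for tat against itself defects forever and averages 1,
# matching the first cell of the prisoner's dilemma table below.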
Prisoner's dilemma

Player \ Opponent | TFT | TF2T | Random | Al. D | Al. C | Maximin | Winstay | TFT (2) | Average
TFT | 1 | 3.002 | 2.072 | 1 | 3.002 | 1 | 2.003 | 1 | 1.759875
TF2T | 2.997 | 3 | 1.479 | 0 | 3 | 0 | 3 | 2.997 | 2.059125
Random | 2.023 | 4.064 | 1.984 | 0.426 | 4.03 | 0.512 | 2.895 | 1.854 | 2.2235
Al. D | 1 | 5 | 2.916 | 1 | 5 | 1 | 3 | 1 | 2.4895
Al. C | 2.997 | 3 | 1.482 | 0 | 3 | 0 | 3 | 2.997 | 2.0595
Maximin | 1 | 5 | 2.904 | 1 | 5 | 1 | 3 | 1 | 2.488
Winstay | 1.998 | 3 | 2.736 | 0.5 | 3 | 0.5 | 3 | 1.998 | 2.0915
TFT (2) | 1 | 3.003 | 2.012 | 1 | 2.002 | 1 | 2.003 | 1 | 1.6275

Modified Chicken

Player \ Opponent | TFT | TF2T | Random | Al. D | Al. C | Maximin | Winstay | TFT (2) | Average
TFT | 1 | 3.003 | 2.052 | 1 | 3.003 | 3.003 | 3.5 | 1 | 2.195125
TF2T | 3.001 | 3 | 3.461 | 4 | 3 | 3 | 3.999 | 3.001 | 3.30775
Random | 1.862 | 4.62 | 1.998 | 2.356 | 4.329 | 2.425 | 4.815 | 1.937 | 3.04275
Al. D | 1 | 6 | 3.545 | 1 | 6 | 6 | 6 | 1 | 3.818125
Al. C | 3.001 | 3 | 3.569 | 4 | 3 | 3 | 3.999 | 3.001 | 3.32125
Maximin | 3.001 | 3 | 3.514 | 4 | 3 | 3 | 3.999 | 3.001 | 3.314375
Winstay | 3.5 | 5.997 | 4.811 | 4 | 5.997 | 5.997 | 2 | 3.5 | 4.47525
TFT (2) | 1 | 3.003 | 2.02 | 1 | 3.003 | 3.003 | 3.5 | 1 | 2.191125

Stag hunt

Player \ Opponent | TFT | TF2T | Random | Al. D | Al. C | Maximin | Winstay | TFT (2) | Average
TFT | 2 | 3.999 | 2.782 | 2 | 3.999 | 2 | 1.994 | 2 | 2.59675
TF2T | 3.991 | 4 | -0.293 | -5 | 4 | -5 | 4 | 3.991 | 1.211125
Random | 2.809 | 33.48 | 2.986 | -1.633 | 3.515 | -1.178 | -1.303 | 2.957 | 5.204125
Al. D | 2 | 3 | 2.514 | 2 | 3 | 2 | 2.001 | 2 | 2.314375
Al. C | 3.991 | 4 | -0.698 | -5 | 4 | -5 | 4 | 3.991 | 1.1605
Maximin | 2 | 3 | 2.504 | 2 | 3 | 2 | 2.001 | 2 | 2.313125
Winstay | 1.994 | 4 | 2.519 | 1.993 | 4 | 1.993 | 4 | 1.994 | 2.811625
TFT (2) | 2 | 3.999 | 3.015 | 2 | 3.999 | 2 | 1.994 | 2 | 2.625875
We can see that the algorithms most affected by this change were the immediate retaliators. On the other hand, the game-dependent algorithms still performed really well in comparison. From this I reached the main idea of my algorithm, which then evolved as I ran further tournaments.
Algorithm
Win Stay Modified
The algorithm is a modified version of "WinStay". "WinStay" takes only its own previous step into account as a judgment. In this algorithm, I take a larger history of my own steps (the last 5) into account. The following is simple pseudocode for the algorithm:
1. For each of my previous 5 steps
2. If its payoff was higher than average
3. Increase the vote for that action
4. End for loop
5. Take the action with the highest number of votes.
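A minimal runnable sketch of this voting rule (the names and the tie-breaking choice are my own, not from the report):

# my_history: list of (my_action, my_payoff) pairs for past rounds;
# avg_payoff: the average payoff observed so far; actions are "C"/"D".

def win_stay_modified(my_history, avg_payoff, default_action="C"):
    votes = {"C": 0, "D": 0}
    for action, payoff in my_history[-5:]:   # only the last 5 steps vote
        if payoff > avg_payoff:              # the step "won", so vote to repeat it
            votes[action] += 1
    if votes["C"] == votes["D"]:             # tie or empty history: fall back
        return default_action
    return max(votes, key=votes.get)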
The motivation behind this is that by taking a larger history into account, the algorithm may avoid quick retaliation. It also takes the game design into account while playing, which, as mentioned before, is important.
I ran both round-robin and evolutionary tournaments on the different algorithms. In these tournaments, I gave all the algorithms the misleading prior. A sketch of the assumed evolutionary update rule is below, followed by the results in the three different games:
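The report does not spell out the evolutionary update rule, so as a working model, here is a sketch of discrete replicator-style dynamics (an assumption; it also assumes positive average payoffs): each generation, a strategy's population share grows in proportion to its average payoff against the current population.

def evolve(shares, avg_payoff, generations=1000):
    # shares: dict strategy_name -> population share (sums to 1)
    # avg_payoff(s, t): s's average payoff per round against t (e.g., from the tables above)
    history = [dict(shares)]
    for _ in range(generations):
        fitness = {s: sum(shares[t] * avg_payoff(s, t) for t in shares)
                   for s in shares}
        mean_fitness = sum(shares[s] * fitness[s] for s in shares)
        shares = {s: shares[s] * fitness[s] / mean_fitness for s in shares}
        history.append(dict(shares))
    return history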
Prisoner's dilemma

Round robin

[Table: round-robin payoffs for TFT, TF2T, Random, Al. D, Al. C, Maximin, Winstay, and WinStay Modified, with per-algorithm averages.]

Evolutionary

[Figure: population shares over 1000 generations of the evolutionary tournament for TFT, TF2T, Random, Always Defect, Always Cooperate, Maximin, WinStay, and WinStay Modified.]
From the results we can see that our algorithm did not perform well, even in self-play, in both the prisoner's dilemma and chicken. On the other hand, it performed well in the stag hunt.
Win Stay Modified 2
As we can notice in the previous simulation, the algorithms that performed well were the "nice" ones (those that never defect first). The following tables show the results for a new modification of the algorithm: I added the condition of never being the first to defect, sketched below.
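A minimal sketch of that guard, reusing the voting rule from the earlier sketch (illustrative names):

def win_stay_modified_2(my_history, opp_actions, avg_payoff):
    # Never defect first: stay cooperative until the opponent actually defects.
    if "D" not in opp_actions:
        return "C"
    return win_stay_modified(my_history, avg_payoff)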
Prisoner's dilemma

Round robin

Player \ Opponent | TFT | TF2T | Random | Al. D | Al. C | Maximin | Winstay | WinStay Mod. 2 | Average
TFT | 3 | 3 | 2.091 | 0.999 | 3 | 0.999 | 3 | 3 | 2.386125
TF2T | 3 | 3 | 2.096 | 0.998 | 3 | 0.998 | 3 | 3 | 2.3865
Random | 1.965 | 2.06 | 1.924 | 0.502 | 4.132 | 0.526 | 1.95 | 0.455 | 1.68925
Al. D | 1.004 | 1.008 | 2.968 | 1 | 5 | 1 | 3 | 1.012 | 1.999
Al. C | 3 | 3 | 1.332 | 0 | 3 | 0 | 3 | 3 | 2.0415
Maximin | 1.004 | 1.008 | 3.028 | 1 | 5 | 1 | 3 | 1.012 | 2.0065
Winstay | 3 | 3 | 2.382 | 0.5 | 3 | 0.5 | 3 | 3 | 2.29775
WinStay Mod. 2 | 3 | 3 | 3.073 | 0.997 | 3 | 0.997 | 3 | 3 | 2.508375
Evolutionary

[Figure: population shares over 1000 generations of the evolutionary tournament for TFT, TF2T, Random, Always Defect, Always Cooperate, Maximin, WinStay, and Winstay Modified 2.]
From the results, we can notice the following: on average, the algorithm performed well. The only exception was the game "chicken", especially in self-play, although its average against the other players is not as bad as its self-play result. After running the simulation several times and following the actions taken, I deduced that one reason it does not perform well in self-play is that both agents try to play "cooperate", which in our "modified chicken" game does not give the best possible payoff for either player.
Win Stay Modified 3
As a result of these simulations, I came to the idea of enhancing the algorithm with a simple version of "fictitious play", in which the algorithm tries to model the other player based on the history. This is a pseudocode representation of the algorithm:
1. For each previous step in history (the 5 previous steps)
2. If the step's payoff is higher than average
3. Add a vote to my action, then add a vote to the action the other player took at that time
4. End for loop
5. Take the action with the highest votes, counting both my side and the other player's side.
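A minimal runnable sketch of this version (illustrative names; the tie-breaking rule is my own assumption):

# my_actions, opp_actions: past actions of both players; my_payoffs: my payoffs,
# all aligned per round; avg_payoff: the average payoff observed so far.

def win_stay_modified_3(my_actions, opp_actions, my_payoffs, avg_payoff):
    votes = {"C": 0, "D": 0}
    rounds = zip(my_actions[-5:], opp_actions[-5:], my_payoffs[-5:])
    for mine, theirs, payoff in rounds:
        if payoff > avg_payoff:
            votes[mine] += 1    # my action in that round "won": vote to repeat it
            votes[theirs] += 1  # simple fictitious play: expect the opponent to repeat theirs
    return max(votes, key=votes.get)  # ties resolve to "C" (cooperate) here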
The results of the simulations that included the latest modified version of Win Stay are shown in
the following table and figure:
[Table: round-robin payoffs; only the WinStay Mod. 3 row is recoverable]
WinStay Mod. 3 | 4 | 4 | 2.531 | 1.979 | 4 | 1.979 | 4 | 4 | Average: 3.311125
Evolutionary
[Figure: population shares over 1000 generations of the evolutionary tournament for TFT, TF2T, Random, Always Defect, Always Cooperate, Maximin, WinStay, and Winstay Modified 3.]
We can see from the results that the algorithm outperforms the other algorithms in all games in the evolutionary tournaments, and it is one of the top algorithms (first place in the prisoner's dilemma) in the round-robin tournament.
Note that if we compare the average of our algorithm in the latest round-robin tournament with the performance of the previous version, although it is among the best in all games, its average is lower than that of the previous version. From this we can conclude that being the "best" and "beating" the other algorithms does not mean having the best possible average performance against them.
Conclusions
From the results in the different situations, we can see that several factors affect how a given algorithm performs, including:
1- The history given to the different algorithms
2- The other algorithms in the competition pool (are they retaliatory or not?)
3- The type of tournament (performing well in one type of tournament does not mean excelling in the other)
4- The goal of the algorithm itself (maximizing its own payoff or destroying the others' payoffs)
We can also conclude that certain properties, if present in an algorithm, can make it outperform the others. These properties are:
1- Having an optimistic prior: try not to be the one who defects first, and be optimistic that others will cooperate as well.
2- Estimating my own payoff.
3- Modeling the other player: having some information about the other players (even by modeling other algorithms through their actions) sometimes gives the algorithm better reliability.