Assignment 2 Report: Study of the Behavior of Different Algorithms in 2×2 Matrix Games through Round-Robin and Evolutionary Tournaments
Submitted By: Yomna Mahmoud Ibrahim Hassan
Introduction
In this report, I describe the analysis done to design an algorithm for playing different 2×2 matrix games, and I analyze the behavior of the designed algorithm against seven other algorithms. The analysis is carried out through two types of tournaments: round-robin and evolutionary.
The main objective while designing the algorithm was to reach an algorithm capable of "beating" the other algorithms in different games and different tournaments, where "beating" means that on average it obtains a higher payoff than the others. The algorithm also needs to be robust to changes in the setting; in this report, for instance, we discuss the effect of the "prior" on its performance.
In addition, I discuss the basic requirements of a successful algorithm, based on the results. I also discuss which algorithms affected the results merely by their presence, for example which algorithms acted as "kingmakers", giving very high payoffs to some algorithms while decreasing the payoffs of others.
Design
Before designing the algorithm, I ran the round-robin tournament with the acclaimed best algorithm in the prisoner's dilemma, "tit for tat" (TFT), entered twice (two of the eight algorithms are TFT). This was an effort to see whether any pattern exists in the players' payoffs across different games. Although these results do not confirm that an algorithm will also perform well in an evolutionary tournament, they give a rough idea of what to consider while designing the algorithm.
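As a concrete model of what "running the round-robin tournament" means here, the following is a minimal sketch (an assumed structure in Python, not the original .NET implementation): every pair of algorithms plays 1000 rounds of a game, and each side's payoff is averaged over the rounds. The payoff matrix below is the one implied by the tables that follow.

ROUNDS = 1000

# Prisoner's dilemma payoffs implied by the tables: PD[my_action][their_action],
# with actions "C" (cooperate) and "D" (defect).
PD = {"C": {"C": 3, "D": 0}, "D": {"C": 5, "D": 1}}

def play_match(strat_a, strat_b, payoff, rounds=ROUNDS):
    hist_a, hist_b = [], []                  # each player's own past actions
    total_a = total_b = 0.0
    for _ in range(rounds):
        act_a = strat_a(hist_a, hist_b)      # a strategy sees (own history, opponent history)
        act_b = strat_b(hist_b, hist_a)
        total_a += payoff[act_a][act_b]
        total_b += payoff[act_b][act_a]
        hist_a.append(act_a)
        hist_b.append(act_b)
    return total_a / rounds, total_b / rounds

def tit_for_tat(me, opp):
    return opp[-1] if opp else "C"           # cooperate first, then mirror the opponent

def always_defect(me, opp):
    return "D"

print(play_match(tit_for_tat, always_defect, PD))   # -> (0.999, 1.004), as in the tables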
The following tables show the results of running a round-robin tournament (payoffs shown are averages taken over 1000 rounds) on three different games: the prisoner's dilemma, a modified version of chicken, and the stag hunt. Each row lists an algorithm's average payoff against each opponent (columns), with its overall average in the last column.
Prisoner's dilemma

Player \ Opponent | TFT | TF2T | Random | Al. D | Al. C | Maximin | Winstay | TFT (2) | Average
TFT | 3 | 3 | 2.078 | 0.999 | 3 | 0.999 | 1.998 | 3 | 2.25925
TF2T | 3 | 3 | 1.503 | 0 | 3 | 0 | 0 | 3 | 1.687875
Random | 1.922 | 4.076 | 1.804 | 0.528 | 4.012 | 0.464 | 2.266 | 2.057 | 2.141125
Al. D | 1.004 | 5 | 2.932 | 1 | 5 | 1 | 3 | 1.004 | 2.4925
Al. C | 3 | 3 | 1.68 | 0 | 3 | 0 | 0 | 3 | 1.71
Maximin | 1.004 | 5 | 2.956 | 1 | 5 | 1 | 3 | 1.004 | 2.4955
Winstay | 2.003 | 5 | 2.122 | 0.5 | 5 | 0.5 | 2.998 | 2.003 | 2.51575
TFT (2) | 3 | 3 | 1.988 | 0.999 | 3 | 0.999 | 1.998 | 3 | 2.248

Modified Chicken

Player \ Opponent | TFT | TF2T | Random | Al. D | Al. C | Maximin | Winstay | TFT (2) | Average
TFT | 3 | 3 | 2.09 | 1.003 | 3 | 3 | 3.5 | 3 | 2.699125
TF2T | 3 | 3 | 3.545 | 4 | 3 | 3 | 4 | 3 | 3.318125
Random | 2.104 | 4.593 | 1.972 | 2.503 | 4.524 | 4.476 | 4.894 | 1.943 | 3.376125
Al. D | 1.005 | 6 | 3.42 | 1 | 6 | 6 | 5.995 | 1.005 | 3.803125
Al. C | 3 | 3 | 3.568 | 4 | 3 | 3 | 4 | 3 | 3.321
Maximin | 3 | 3 | 3.454 | 4 | 3 | 3 | 4 | 3 | 3.30675
Winstay | 3.5 | 6 | 4.922 | 3.997 | 6 | 6 | 2 | 3.5 | 4.489875
TFT (2) | 3 | 3 | 2.054 | 1.003 | 3 | 3 | 3.5 | 3 | 2.694625

Stag hunt

Player \ Opponent | TFT | TF2T | Random | Al. D | Al. C | Maximin | Winstay | TFT (2) | Average
TFT | 4 | 4 | 2.757 | 1.993 | 4 | 1.993 | 1.993 | 4 | 3.092
TF2T | 4 | 4 | -0.194 | -5 | 4 | -5 | -5 | 4 | 0.10075
Random | 3.071 | 3.456 | 2.896 | -1.927 | 3.471 | -1.479 | -1.01 | 2.998 | 1.4345
Al. D | 2.001 | 3 | 2.487 | 2 | 3 | 2 | 2 | 2.001 | 2.311125
Al. C | 4 | 4 | -0.95 | -5 | 4 | -5 | -5 | 4 | 0.00625
Maximin | 2.001 | 3 | 2.554 | 2 | 3 | 2 | 2 | 2.001 | 2.3195
Winstay | 2.001 | 3 | 2.502 | 2 | 3 | 2 | 2 | 2.001 | 2.313
TFT (2) | 4 | 4 | 3.032 | 1.993 | 4 | 1.993 | 1.993 | 4 | 3.126375
From the results we deduce the following:
1- The "Random" algorithm confuses all the other algorithms and plays a huge role in deciding who wins and who does not. Although it is not stable, its payoff is among the highest. (Note that the randomization here follows a probability distribution, as it is based on the Random class implemented in the .NET Framework.)
2- "Win stay" plays very well in the first two games (the highest average), and its performance in the stag hunt is good as well.
3- The algorithms that played worst were the ones that were too "nice" (tit for two tats and always cooperate). They were exploited easily, especially by really "mean" algorithms such as always defect.
4- Algorithms that depend on the game itself, such as Maximin and Win stay, performed really well (a sketch of the maximin choice follows this list).
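For concreteness, this is a minimal sketch of what "game dependent" means for Maximin in a 2×2 game (an illustration, not the original implementation): the action whose worst-case payoff is largest is read directly off the payoff matrix.

# Maximin in a 2x2 game: pick the action with the best worst-case payoff.
# The payoff matrix layout matches the earlier sketch: payoff[mine][theirs].

def maximin_action(payoff):
    return max(payoff, key=lambda action: min(payoff[action].values()))

PD = {"C": {"C": 3, "D": 0}, "D": {"C": 5, "D": 1}}
print(maximin_action(PD))   # -> "D": defection guarantees at least 1 in this game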
Another point I wanted to take into consideration is the prior. For that reason I ran another tournament in which every algorithm's prior is that its opponent defects; a sketch of one way to inject such a prior is below, followed by the results in the different games.
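This is a minimal sketch of how such a misleading prior might be implemented, building on the earlier round-robin driver (an assumption about the implementation, not taken from the report): each player's view of the opponent's history is seeded with one fictitious defection, which is never scored.

def play_match_with_prior(strat_a, strat_b, payoff, rounds=1000):
    # Each side starts believing the opponent has already defected once.
    seen_a, seen_b = ["D"], ["D"]    # A's view of B's actions / B's view of A's
    own_a, own_b = [], []
    total_a = total_b = 0.0
    for _ in range(rounds):
        act_a = strat_a(own_a, seen_a)
        act_b = strat_b(own_b, seen_b)
        total_a += payoff[act_a][act_b]
        total_b += payoff[act_b][act_a]
        own_a.append(act_a)
        seen_b.append(act_a)         # B observes A's real action
        own_b.append(act_b)
        seen_a.append(act_b)         # A observes B's real action
    return total_a / rounds, total_b / rounds

# With this prior, tit for tat against itself defects forever and averages 1,
# matching the first cell of the prisoner's dilemma table below.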
Prisoner's dilemma

Player \ Opponent | TFT | TF2T | Random | Al. D | Al. C | Maximin | Winstay | TFT (2) | Average
TFT | 1 | 3.002 | 2.072 | 1 | 3.002 | 1 | 2.003 | 1 | 1.759875
TF2T | 2.997 | 3 | 1.479 | 0 | 3 | 0 | 3 | 2.997 | 2.059125
Random | 2.023 | 4.064 | 1.984 | 0.426 | 4.03 | 0.512 | 2.895 | 1.854 | 2.2235
Al. D | 1 | 5 | 2.916 | 1 | 5 | 1 | 3 | 1 | 2.4895
Al. C | 2.997 | 3 | 1.482 | 0 | 3 | 0 | 3 | 2.997 | 2.0595
Maximin | 1 | 5 | 2.904 | 1 | 5 | 1 | 3 | 1 | 2.488
Winstay | 1.998 | 3 | 2.736 | 0.5 | 3 | 0.5 | 3 | 1.998 | 2.0915
TFT (2) | 1 | 3.003 | 2.012 | 1 | 2.002 | 1 | 2.003 | 1 | 1.6275

Modified Chicken

Player \ Opponent | TFT | TF2T | Random | Al. D | Al. C | Maximin | Winstay | TFT (2) | Average
TFT | 1 | 3.003 | 2.052 | 1 | 3.003 | 3.003 | 3.5 | 1 | 2.195125
TF2T | 3.001 | 3 | 3.461 | 4 | 3 | 3 | 3.999 | 3.001 | 3.30775
Random | 1.862 | 4.62 | 1.998 | 2.356 | 4.329 | 2.425 | 4.815 | 1.937 | 3.04275
Al. D | 1 | 6 | 3.545 | 1 | 6 | 6 | 6 | 1 | 3.818125
Al. C | 3.001 | 3 | 3.569 | 4 | 3 | 3 | 3.999 | 3.001 | 3.32125
Maximin | 3.001 | 3 | 3.514 | 4 | 3 | 3 | 3.999 | 3.001 | 3.314375
Winstay | 3.5 | 5.997 | 4.811 | 4 | 5.997 | 5.997 | 2 | 3.5 | 4.47525
TFT (2) | 1 | 3.003 | 2.02 | 1 | 3.003 | 3.003 | 3.5 | 1 | 2.191125

Stag hunt

Player \ Opponent | TFT | TF2T | Random | Al. D | Al. C | Maximin | Winstay | TFT (2) | Average
TFT | 2 | 3.999 | 2.782 | 2 | 3.999 | 2 | 1.994 | 2 | 2.59675
TF2T | 3.991 | 4 | -0.293 | -5 | 4 | -5 | 4 | 3.991 | 1.211125
Random | 2.809 | 33.48 | 2.986 | -1.633 | 3.515 | -1.178 | -1.303 | 2.957 | 5.204125
Al. D | 2 | 3 | 2.514 | 2 | 3 | 2 | 2.001 | 2 | 2.314375
Al. C | 3.991 | 4 | -0.698 | -5 | 4 | -5 | 4 | 3.991 | 1.1605
Maximin | 2 | 3 | 2.504 | 2 | 3 | 2 | 2.001 | 2 | 2.313125
Winstay | 1.994 | 4 | 2.519 | 1.993 | 4 | 1.993 | 4 | 1.994 | 2.811625
TFT (2) | 2 | 3.999 | 3.015 | 2 | 3.999 | 2 | 1.994 | 2 | 2.625875
We can see that the algorithms most affected by this change were the immediate retaliators. On the other hand, the game-dependent algorithms still performed really well in comparison. From this I reached the main idea of my algorithm, which then evolved as I ran further tournaments.
Algorithm
Win Stay Modified
The algorithm is a modified version of "WinStay". "WinStay" takes only its own previous step into account as a judgment. In this algorithm, I take a larger history of my own steps (the last 5) into account. The following is simple pseudocode for the algorithm:
1. For each of my previous 5 steps
2. If its payoff was higher than average
3. Increase the vote for that action
4. End for loop
5. Take the action with the highest number of votes.
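A minimal runnable sketch of this voting rule (the names and the tie-breaking choice are my own, not from the report):

# my_history: list of (my_action, my_payoff) pairs for past rounds;
# avg_payoff: the average payoff observed so far; actions are "C"/"D".

def win_stay_modified(my_history, avg_payoff, default_action="C"):
    votes = {"C": 0, "D": 0}
    for action, payoff in my_history[-5:]:   # only the last 5 steps vote
        if payoff > avg_payoff:              # the step "won", so vote to repeat it
            votes[action] += 1
    if votes["C"] == votes["D"]:             # tie or empty history: fall back
        return default_action
    return max(votes, key=votes.get)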
The motivation behind this is that by taking a larger history into account, the algorithm may avoid quick retaliation. It also takes the game design into account while playing, which, as mentioned before, is important.
I ran both round-robin and evolutionary tournaments on the different algorithms. In these tournaments, I gave all the algorithms the misleading prior. A sketch of the assumed evolutionary update rule is below, followed by the results in the three different games:
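The report does not spell out the evolutionary update rule, so as a working model, here is a sketch of discrete replicator-style dynamics (an assumption; it also assumes positive average payoffs): each generation, a strategy's population share grows in proportion to its average payoff against the current population.

def evolve(shares, avg_payoff, generations=1000):
    # shares: dict strategy_name -> population share (sums to 1)
    # avg_payoff(s, t): s's average payoff per round against t (e.g., from the tables above)
    history = [dict(shares)]
    for _ in range(generations):
        fitness = {s: sum(shares[t] * avg_payoff(s, t) for t in shares)
                   for s in shares}
        mean_fitness = sum(shares[s] * fitness[s] for s in shares)
        shares = {s: shares[s] * fitness[s] / mean_fitness for s in shares}
        history.append(dict(shares))
    return history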
Prisoner's dilemma

Round robin

[Table: round-robin payoffs for TFT, TF2T, Random, Al. D, Al. C, Maximin, Winstay, and WinStay Modified, with per-algorithm averages.]

Evolutionary

[Figure: population shares over 1000 generations of the evolutionary tournament for TFT, TF2T, Random, Always Defect, Always Cooperate, Maximin, WinStay, and WinStay Modified.]
From the results we can see that our algorithm did not perform well, even in self-play, in both the prisoner's dilemma and chicken. On the other hand, it performed well in the stag hunt.
Win Stay Modified 2
As we can notice in the previous simulation, the algorithms that performed well were the "nice" ones (those that never defect first). The following tables show the results for a new modification of the algorithm: I added the condition of never being the first to defect, sketched below.
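A minimal sketch of that guard, reusing the voting rule from the earlier sketch (illustrative names):

def win_stay_modified_2(my_history, opp_actions, avg_payoff):
    # Never defect first: stay cooperative until the opponent actually defects.
    if "D" not in opp_actions:
        return "C"
    return win_stay_modified(my_history, avg_payoff)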
Prisoner's dilemma

Round robin

Player \ Opponent | TFT | TF2T | Random | Al. D | Al. C | Maximin | Winstay | WinStay Mod. 2 | Average
TFT | 3 | 3 | 2.091 | 0.999 | 3 | 0.999 | 3 | 3 | 2.386125
TF2T | 3 | 3 | 2.096 | 0.998 | 3 | 0.998 | 3 | 3 | 2.3865
Random | 1.965 | 2.06 | 1.924 | 0.502 | 4.132 | 0.526 | 1.95 | 0.455 | 1.68925
Al. D | 1.004 | 1.008 | 2.968 | 1 | 5 | 1 | 3 | 1.012 | 1.999
Al. C | 3 | 3 | 1.332 | 0 | 3 | 0 | 3 | 3 | 2.0415
Maximin | 1.004 | 1.008 | 3.028 | 1 | 5 | 1 | 3 | 1.012 | 2.0065
Winstay | 3 | 3 | 2.382 | 0.5 | 3 | 0.5 | 3 | 3 | 2.29775
WinStay Mod. 2 | 3 | 3 | 3.073 | 0.997 | 3 | 0.997 | 3 | 3 | 2.508375
Evolutionary

[Figure: population shares over 1000 generations of the evolutionary tournament for TFT, TF2T, Random, Always Defect, Always Cooperate, Maximin, WinStay, and Winstay Modified 2.]
From the results, we can notice the following: on average, the algorithm performed well. The only exception was the game "chicken", especially in self-play, although its average against the other players is not as bad as its self-play result. After running the simulation several times and following the actions taken, I deduced that one reason it does not perform well in self-play is that both agents try to play "cooperate", which in our "modified chicken" game does not give the best possible payoff for either player.
Win Stay Modified 3
As a result of these simulations, I came to the idea of enhancing the algorithm with a simple version of "fictitious play", in which the algorithm tries to model the other player based on the history. This is a pseudocode representation of the algorithm:
1. For each previous step in history (the 5 previous steps)
2. If the step's payoff is higher than average
3. Add a vote to my action, then add a vote to the action the other player took at that time
4. End for loop
5. Take the action with the highest votes, counting both my side and the other player's side.
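A minimal runnable sketch of this version (illustrative names; the tie-breaking rule is my own assumption):

# my_actions, opp_actions: past actions of both players; my_payoffs: my payoffs,
# all aligned per round; avg_payoff: the average payoff observed so far.

def win_stay_modified_3(my_actions, opp_actions, my_payoffs, avg_payoff):
    votes = {"C": 0, "D": 0}
    rounds = zip(my_actions[-5:], opp_actions[-5:], my_payoffs[-5:])
    for mine, theirs, payoff in rounds:
        if payoff > avg_payoff:
            votes[mine] += 1    # my action in that round "won": vote to repeat it
            votes[theirs] += 1  # simple fictitious play: expect the opponent to repeat theirs
    return max(votes, key=votes.get)  # ties resolve to "C" (cooperate) here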
The results of the simulations that included the latest modified version of Win Stay are shown in
the following table and figure:
[Table: round-robin payoffs; only the WinStay Mod. 3 row is recoverable]
WinStay Mod. 3 | 4 | 4 | 2.531 | 1.979 | 4 | 1.979 | 4 | 4 | Average: 3.311125
Evolutionary
[Figure: population shares over 1000 generations of the evolutionary tournament for TFT, TF2T, Random, Always Defect, Always Cooperate, Maximin, WinStay, and Winstay Modified 3.]
We can see from the results that the algorithm outperforms the other algorithms in all games in the evolutionary tournaments, and it is one of the top algorithms (first place in the prisoner's dilemma) in the round-robin tournament.
Note that if we compare the average of our algorithm in the latest round-robin tournament with the performance of the previous version, although it is among the best in all games, its average is lower than that of the previous version. From this we can conclude that being the "best" and "beating" the other algorithms does not mean having the best possible average performance against them.
Conclusions
From the results in the different situations, we can see that several factors affect how a given algorithm performs, including:
1- The history given to the different algorithms
2- The other algorithms in the competition pool (are they retaliatory or not?)
3- The type of tournament (performing well in one type of tournament does not mean excelling in the other)
4- The goal of the algorithm itself (maximizing its own payoff or destroying the others' payoffs)
We can also conclude that certain properties, if present in an algorithm, can make it outperform the others. These properties are:
1- Having an optimistic prior: try not to be the one who defects first, and be optimistic that others will cooperate as well.
2- Estimating my own payoff.
3- Modeling the other player: having some information about the other players (even by modeling other algorithms through their actions) sometimes gives the algorithm better reliability.