Learning optimal line-ups for Soccer Games
Ir. Eryk Kulikowski
Thesis submitted for the degree of
Master of Science in Artificial
Intelligence, option Engineering and
Computer Science
Thesis supervisor:
Prof. dr. Jesse Davis
Assessors:
Prof. dr. ir. Dirk Roose
Ir. Jan Van Haaren
Mentors:
Ir. Aäron Verachtert
Ir. Jan Van Haaren
Academic year 2014 – 2015
© Copyright KU Leuven
Without written permission of the thesis supervisor and the author it is forbidden
to reproduce or adapt in any form or by any means any part of this publication.
Requests for obtaining the right to reproduce or utilize parts of this publication
should be addressed to the Departement Computerwetenschappen, Celestijnenlaan
200A bus 2402, B-3001 Heverlee, +32-16-327700 or by email info@cs.kuleuven.be.
A written permission of the thesis supervisor is also required to use the methods,
products, schematics and programs described in this work for industrial or commercial
use, and for submitting this publication in scientific contests.
Preface
I would like to thank my supervisor, professor Jesse Davis, for giving me the opportunity
to work on this thesis. I would also like to thank my mentors, Jan Van Haaren and
Aäron Verachtert, for their advice, and especially Aäron Verachtert for making his
thesis and his code available as a starting point for this thesis. I would also like
to thank my wife, Debby Van Dam, and other family members and friends for their
encouragement and support.
Ir. Eryk Kulikowski
Contents
Preface i
Abstract iii
List of Figures and Tables iv
1 Introduction 1
1.1 Background and motivations . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Objectives of the thesis . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
2 Analysis of the Problem 3
2.1 Drone delivering packages routing problem . . . . . . . . . . . . . . 4
2.2 Solution methodologies for routing optimization problems . . . . . . 11
2.3 Chosen approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3 Monte Carlo Tree Search (MCTS) Approach 13
3.1 Existing work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.2 Proposed improvements . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.3 Comparison of the MCTS algorithms for Hattrick . . . . . . . . . . . 30
3.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4 Genetic Algorithm Approach 37
4.1 A brief introduction to Genetic Algorithm . . . . . . . . . . . . . . . 37
4.2 Proposed Mutation operators . . . . . . . . . . . . . . . . . . . . . . 40
4.3 Proposed Crossover operators . . . . . . . . . . . . . . . . . . . . . . 43
4.4 Comparison with the MCTS algorithms . . . . . . . . . . . . . . . . 57
4.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5 Conclusion 65
A The Evaluation Experiment Results 69
Bibliography 73
Abstract
This thesis describes the lineup optimization problem from a new perspective, one in
which the link with combinatorial problems, and with the Traveling Salesman Problem
(TSP) in particular, becomes apparent. From this new perspective a graph representation
of the problem is derived and then used consistently throughout the thesis. That graph
representation is used to identify the weaknesses in the existing algorithm [31]
for lineup optimization. These weaknesses are then addressed, and a new variant
of Monte Carlo Tree Search, the Recursive Monte Carlo Tree Search (RMCTS)
algorithm, is derived that runs on the problem in its full complexity. The proposed
algorithm is described in general terms and could be used for other combinatorial
problems, e.g., the Traveling Salesman Problem.
This thesis also describes a greedy search variant, the Greedy Search Tuned
algorithm, which performs very well in the context of Hattrick lineup optimization.
This algorithm outperforms all variants of the Monte Carlo Tree Search and
is only outperformed by the genetic algorithms described in this thesis.
Furthermore, this thesis investigates the possibility of using machine learning
algorithms to improve the performance of the genetic algorithms. This
investigation uses Rechenberg's success rule for labeling the offspring,
and two methods are derived for using these labels in the repair process. One of the
methods uses the Naive Bayes approach, while the other derives a predictive
formula for valuing the building blocks in the repair process. These methods
are then shown to be applicable to other combinatorial problems such as the Traveling
Salesman Problem and job shop scheduling, possibly also beyond the repair step. As
an alternative to the use of Rechenberg's success rule, another algorithm, namely
the Recursive Monte Carlo Tree Search proposed in this thesis, is investigated in the
context of the repair step.
List of Figures and Tables
List of Figures
2.1 Vertices from the graph representation of the problem. . . . . . . . . . . 6
2.2 Equivalent solutions of the problem. . . . . . . . . . . . . . . . . . . . . 8
2.3 Example solutions showing only relevant edges. . . . . . . . . . . . . . . 8
3.1 Canadian traveler problem graph example, Bnaya et al. [6]. . . . . . . . 14
3.2 Tree representation of the problem. . . . . . . . . . . . . . . . . . . . . . 15
3.3 Equivalent tours in the tree representation. . . . . . . . . . . . . . . . . 16
3.4 Equivalent tours in the tree representation. . . . . . . . . . . . . . . . . 17
3.5 Different solutions of the problem. . . . . . . . . . . . . . . . . . . . . . 18
3.6 Outline of a Monte-Carlo Tree Search, Chaslot et al. [10]. . . . . . . . . 20
3.7 Experiments with the numeric fitness evaluation. Two types of simulation
are explored: fully random (left) and semi-random (right), Verachtert [31]. 22
3.8 Experiments with the nominal fitness evaluation. Two types of simulation
are explored: fully random (left) and semi-random (right), Verachtert [31]. 22
3.9 Initialization/expansion step. . . . . . . . . . . . . . . . . . . . . . . . . 26
3.10 Selection/simulation step. . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.11 Backpropagation step. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.12 Box-plots of the results against similar opponent, simplified problem. . . 32
3.13 Box-plots of the results against strong opponent, simplified problem. . . 33
3.14 Box-plots of the results against weak opponent, simplified problem. . . . 34
4.1 Individual. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.2 The mutation operator for the simplified problem. . . . . . . . . . . . . 40
4.3 The swap two players mutation operator. . . . . . . . . . . . . . . . . . 41
4.4 The change behavior mutation operator. . . . . . . . . . . . . . . . . . . 41
4.5 The change position mutation operator. . . . . . . . . . . . . . . . . . . 42
4.6 Crossover. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.7 Box-plots of the results against similar opponent. . . . . . . . . . . . . . 54
4.8 Box-plots of the results against strong opponent. . . . . . . . . . . . . . 54
4.9 Box-plots of the results against weak opponent. . . . . . . . . . . . . . . 55
4.10 Box-plots of the results from the TSP experiment. . . . . . . . . . . . . 57
4.11 Box-plots of the results against similar opponent. . . . . . . . . . . . . . 59
4.12 Box-plots of the results against strong opponent. . . . . . . . . . . . . . 59
4.13 Box-plots of the results against weak opponent. . . . . . . . . . . . . . . 60
4.14 Box-plot of the paired difference between GA and GST (GA-GST) from
the evaluation experiment. . . . . . . . . . . . . . . . . . . . . . . . . . . 61
List of Tables
3.1 Experiment results against similar opponent, simplified problem. . . . . 32
3.2 Experiment results against strong opponent, simplified problem. . . . . 33
3.3 Experiment results against weak opponent, simplified problem. . . . . . 34
3.4 Average quality of the solutions generated by the tested algorithms,
simplified problem. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.1 Exemplary edge map of the parent tours for an ERX operator,
Affenzeller et al. [1]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.2 Experiment results against similar opponent, simplified problem. . . . . 52
4.3 Experiment results against strong opponent, simplified problem. . . . . 52
4.4 Experiment results against weak opponent, simplified problem. . . . . . 53
4.5 Experiment results against similar opponent. . . . . . . . . . . . . . . . 53
4.6 Experiment results against strong opponent. . . . . . . . . . . . . . . . . 55
4.7 Experiment results against weak opponent. . . . . . . . . . . . . . . . . 55
4.8 Experiment results from the TSP experiment. . . . . . . . . . . . . . . . 57
4.9 Experiment results against similar opponent. . . . . . . . . . . . . . . . 58
4.10 Experiment results against strong opponent. . . . . . . . . . . . . . . . . 60
4.11 Experiment results against weak opponent. . . . . . . . . . . . . . . . . 60
A.1 The evaluation experiment results. . . . . . . . . . . . . . . . . . . . . . 72
Chapter 1
Introduction
1.1 Background and motivations
This thesis investigates algorithms for lineup optimization for soccer games in the
context of the Hattrick online football management game [18]. One of the tasks in that
game is defining the lineup for the upcoming match, where choosing the right set
of players and placing them strategically on the field can influence the outcome of
the game, i.e., it can make the difference between losing and winning. Hattrick
also allows giving individual instructions to the players on the field, which further
increases the complexity of choosing a well-performing lineup.
The choice of an online game is convenient since all possibilities for the
lineup specification are well defined within the game rules. Furthermore, initial
work has been done in that area [31], where a large amount of data was collected and
a tool was built that can predict the outcome of a game based on information
about the opponent and the lineup defined for one's own team. This makes it possible
to evaluate proposed lineups and thus to design algorithms that learn the optimal
lineups for specific Hattrick games. The proposed algorithms could, however, also be
used in the context of other football management games, and possibly even in the
context of real-life soccer games. Nevertheless, the evaluation of proposed lineups
remains a hard problem in itself for real soccer games and is out of scope for
this thesis.
Furthermore, lineup optimization is a combinatorial problem. As a consequence,
not only can algorithms for lineup optimization borrow ideas from algorithms
for combinatorial problems, but the algorithms designed for lineup optimization
could also inspire new ideas for solving other combinatorial problems.
Therefore, this thesis takes a broader perspective on lineup optimization and
views it in the light of combinatorial problems, where the Traveling Salesman
Problem (TSP) receives special attention.
In the Traveling Salesman Problem [36] the goal is to find the shortest possible
route visiting all cities from a given list. The distances between each pair of
cities are given, and each city must be visited exactly once. The tour originates
and ends in the same city. It is one of the classical problems in
combinatorial optimization, and it is important in theoretical computer science and
operations research.
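The problem statement above can be made concrete with a tiny brute-force sketch. The 4-city distance matrix below is invented purely for illustration; real instances of interesting size require the heuristic techniques discussed later in this thesis.

```python
from itertools import permutations

# Hypothetical symmetric distance matrix for four cities.
DIST = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]

def tour_length(tour):
    """Total length of a closed tour that returns to its starting city."""
    return sum(DIST[a][b] for a, b in zip(tour, tour[1:] + tour[:1]))

def brute_force_tsp(n):
    """Fix city 0 as the start and try every ordering of the remaining cities."""
    best = min(permutations(range(1, n)),
               key=lambda p: tour_length((0,) + p))
    return (0,) + best, tour_length((0,) + best)

tour, length = brute_force_tsp(4)
```

Fixing the start city is valid because a closed tour has the same length regardless of where it begins; it reduces the number of candidate orderings from n! to (n-1)!.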
1.2 Objectives of the thesis
As mentioned earlier, initial work has already been done in the context of lineup
optimization for Hattrick games [31]. That work includes the definition of several
variants of the Monte Carlo Tree Search algorithm for lineup optimization. However,
these algorithms have the drawback of large memory requirements and, as a consequence,
can only be used on a simplified version of the lineup optimization problem.
The first objective of this thesis is therefore to investigate the possibility of
improving the existing algorithms such that they can also be used on the problem in
its full complexity.
The second objective of this thesis is to propose new algorithms for lineup
optimization. For this part, the possibility of using genetic algorithms is
investigated. The proposed algorithms should be usable on the problem in its full
complexity and run in acceptable computation time.
The third objective is to evaluate the performance of the algorithms in terms
of the quality of the solutions obtainable within an acceptable computation time.
All algorithms described in this thesis are evaluated on chosen specific cases,
and the best algorithms are further evaluated in a more realistic context
using actual teams from the Hattrick competition.
1.3 Overview
A good understanding of the given problem is important. Therefore chapter 2
analyzes the given problem in more details and makes the link to other combinatorial
problems. This chapter also proposes a graph representation of the given problem
that will be used throughout the whole thesis. The existing work is studied in more
detail in chapter 3, where also the improvements to the existing algorithms are
proposed. The resulting new algorithms are compared with the existing algorithms
in the context of the simplified problem in that chapter. The usability of genetic
algorithm in the context of the given problem is then studied in chapter 4. This
chapter also evaluates the best algorithms in the context of the actual teams and
matches from the Hattrick competition. Chapter 5 concludes this thesis.
Chapter 2
Analysis of the Problem
NP-hard (Non-deterministic Polynomial-time hard) problems [35] can be solved
using different techniques. When solving an NP-hard problem, it is often
modeled as one of the well-known problems, e.g., the Traveling Salesman Problem,
the Knapsack Problem, or the Nurse Scheduling Problem, and then solved using one
of the techniques known to produce good solutions for that problem.
The goal of this chapter is to model the given problem, i.e., optimizing
soccer lineups in the Hattrick setting, in terms that correspond more closely
to the classical combinatorial problems. Doing so yields new insights,
as the link between the given problem and other NP-hard problems becomes
clearer. These insights are then used in the following chapters to design
better algorithms, based on either Monte Carlo or Genetic Algorithm techniques.
The chosen approach in this thesis is to model the given problem as a routing
problem. The choice of how the problem is modeled influences the performance
of the algorithms, as the resulting representations can be handled by
algorithms specifically designed for those representations. Moreover, the NP-hard
problems are related to each other, and it is commonly assumed that finding an
efficient solution to one problem can lead to efficient solutions for the others [35].
It is therefore a good idea to keep in mind how this type of problem is solved in
different contexts.
For example, one study [4] shows how capacitated vehicle routing problems
and job shop scheduling problems are related, and how insights from their
respective methods can improve algorithmic performance on both types of problems
(although that article focuses mainly on vehicle routing problems). As should
become clear after reading this chapter, both of these types of problems are
related to the soccer lineup optimization problem. The bin packing problem [34],
the more general cutting stock problem, and the more specific knapsack problem
are also related to the given problem.
Nevertheless, as will be seen later in this text, very interesting insights can
be gained by looking at the problem of finding optimal lineups as a Traveling
Salesman Problem [36]. It is a well-studied problem, especially in the context of
genetic algorithms [1], where it is one of the classical problems. Monte Carlo
methods have also been studied in this context [25, 23, 5, 30, 7]. When solving the
lineup problem with genetic algorithms, very good operators can be used (as
shown later) based on the operators designed for the Traveling Salesman Problem
(TSP), where certain simplifications can be made to further improve the algorithm.
The Monte Carlo Tree Search technique can also be improved in that context, as
the existing work on optimal lineups [31] suffers from exponential memory use.
Insights from studying the TSP not only reduce the memory usage of that technique
exponentially, but also improve the computation time.
Therefore, it is crucial to understand how the optimal lineup problem relates
to the above-mentioned problems. In order to make that link clear, section 2.1
describes the drone package delivery routing problem, which is a routing problem
by definition. As can be seen from that section, this problem is closely related to
the TSP and the capacitated vehicle routing problems, and thus, as mentioned earlier,
to the job shop scheduling problems as well. That section also describes how this
newly defined problem relates to the optimal lineup search for soccer games.
Section 2.2 builds on these findings and discusses possible solution strategies.
Finally, section 2.3 describes and motivates the solution strategies chosen for this
thesis.
2.1 Drone delivering packages routing problem
In order to avoid certain pitfalls (for example, reasoning that Hattrick is a game,
so the optimal lineups should be found with an algorithm designed for game playing,
and that the problem therefore has nothing to do with the TSP), it is useful to
start from a new perspective and forget the lineup optimization and the Hattrick
setting for a moment.
Therefore, section 2.1.1 describes a new type of problem, a specific case of a drone
package delivery routing problem. Only then, when the new problem is defined
and the links to the other combinatorial problems are clear, does section 2.1.2
explain that by solving the newly defined problem, the optimal lineup problem is
also solved.
2.1.1 Problem description
Imagine the following problem. A certain company has several factories producing
ink for 3D printers. The factories differ in the types of ink they produce
(e.g., inks with different physical properties, such as color, granularity, or
drying time). These factories also have different geographical locations.
Furthermore, the same company has different printing factories where it can print
different objects. However, the results in a given printer vary when using ink from
different ink-producing factories, i.e., ink from some factories produces better
results than ink from others. Also, the cost of printing an object can differ when
using different ink, even when the resulting object has the same quality. The
printing factories also have different geographical locations.
In this situation, any ink can be used on any printer; however, the printing cost
will differ, and the printed object can have a different value.
Also, some factories could be grouped by sharing the same resources. For example,
there could be a group of five printing factories with similar properties producing
very good results, but sharing the same maintenance team. Therefore, having all five
of these printing factories working at full capacity could result in a penalty
(e.g., a small delay in delivering the products) caused by the constraints on the
maintenance team.
Nevertheless, there are more printing factories than strictly necessary to meet
the demand for the printed objects. The ink-producing factories also have a larger
capacity than strictly necessary. Furthermore, all factories (ink-producing factories
and printers) are automated, and it is more efficient to run some factories
at full capacity and keep the remaining factories idle than to run all of the
factories at limited capacity.
Once the objects are printed, they need to be shipped to the customers.
In particular, the printed objects can be shipped using different strategies. For
example, they could be delivered using drones, local postal services,
commercial package delivery services (e.g., DHL, UPS), etc. Here too we have
different geographical locations of the distribution centers (i.e., there can be
several postal distribution centers located near several printing factories), and
the printed objects need to be transported to the distribution centers.
Furthermore, some distribution centers could be too far from the printing factories
and are not considered valid options for certain printing factories. In other words,
only a subset of all possible options is available as shipment strategies from a
specific printing factory. Also, since the distances to the local distribution centers
vary between the different printing factories, the cost of using a specific
distribution strategy will be different at each factory.
Nevertheless, just as with the grouping of printing factories (e.g., the shared
maintenance team described before), the distribution strategies can also have grouping
effects. For example, when one particular strategy is used often, the company could
get a better deal on the shipments. There could also be more local effects: for
example, where several printing factories are located near a particular
distribution center, using that one center could reduce the cost.
We can summarize the problem described above as a logistics problem, where
we need to transport ink from the ink-producing factories to the printing factories,
and the printed objects from the printing factories to the distribution centers.
Note that we do not necessarily need to wait for the objects to be printed at the
factory after delivering the ink. For example, the printing factory could hold a
limited stock of ink from all possible factories and start printing once the whole
process is scheduled. The printed object would then be ready on arrival, and the
delivered ink would serve as replacement for the used ink.
Now, imagine that the company wants to use drones for transporting the ink and
the objects. Such a drone can only carry one package at a time, either ink or a
printed object. Assume that this drone uses additional fuel when carrying a package,
but that it uses only solar power when flying empty. Also, there is no time
constraint, i.e., all scheduled deliveries can be performed during the daylight of
a single day. In other words, only the edges (when we represent the problem as a
graph, see later) between the ink-producing factories and the printing factories,
and the edges between the printing factories and the distribution centers, are
relevant to the problem. The goal is then to minimize the total cost of the
production, printing, distribution and transportation.
Furthermore, we add the constraint that each ink-producing factory and each
printing factory can be visited at most once in a single tour (just like in the
Traveling Salesman Problem). This can be motivated by the requirement that we want
to spread the load over as many factories as possible. For example, the company
could rely heavily on solar power: using the power where it is produced is most
efficient, while storing electrical power is quite expensive, so we want to use
the solar power as much as possible while it is being produced.
It is then interesting to represent this problem with a graph. Figure 2.1 shows
the vertices (graph nodes) for a simple variant of the problem described above.
The red points are the printing factories; as can be seen in the figure, there are
four of them. Let us assume that we are interested in an optimal solution
that visits exactly three printing factories, i.e., one printing factory is not
visited in the solution. The blue points are the ink-producing factories (there are
eight of them). Since any ink can be used in any printer, all edges that connect
a blue and a red point are valid. As we visit only three printing factories, any
valid solution would also visit only three ink-producing factories. The black points
represent the distribution centers. As mentioned above, only the distribution
centers near a printing factory can be visited from that factory.
In practice, we could simply enumerate the valid edges from the red to the black
points. For the purpose of this illustration, we can easily see in the figure that
two printing factories have two nearby distribution centers each, one printing
factory has one nearby distribution center, and one factory has three nearby
distribution centers. Finally, the green point represents the drone depot.
Figure 2.1: Vertices from the graph representation of the problem.
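The vertex and edge sets of this graph can be sketched as data, using the counts given above. The concrete names and the nearby-center assignment below are hypothetical, since the text only fixes the counts (two, two, one and three nearby centers respectively).

```python
# Blue vertices: eight ink-producing factories.
ink_factories = [f"ink{i}" for i in range(8)]
# Red vertices: four printing factories.
printers = ["p0", "p1", "p2", "p3"]
# Green vertex: the drone depot.
depot = "depot"

# Any ink can be used in any printer, so every blue-red pair is a valid edge.
blue_edges = [(ink, p) for ink in ink_factories for p in printers]

# Only nearby distribution centers (black vertices) are reachable from each
# printer; this particular assignment is invented to match the stated counts.
nearby_centers = {
    "p0": ["d0", "d1"],
    "p1": ["d2", "d3"],
    "p2": ["d4"],
    "p3": ["d5", "d6", "d7"],
}
red_edges = [(p, d) for p, ds in nearby_centers.items() for d in ds]
```

With these counts there are 8 × 4 = 32 blue edges and 2 + 2 + 1 + 3 = 8 red edges.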
Note that the graph is a weighted graph. In the given example, for illustration
purposes, the weights of the edges can be assumed to be proportional
to the distances between the points. However, the weight of an edge also includes
the cost of production or distribution, not only the cost of the drone fuel. For the
edges between the red and blue points (the blue edges in figure 2.2), the weight
is then mainly defined by the quality of the match between the ink and the
printer, as these edges include the cost of the ink and the printing. We can assume
that the cost of the fuel is low in comparison to the other costs and does not
influence the total cost much. The same can be said about the edges between the
red and the black points (the red edges in figure 2.2), where the cost of the fuel is
also negligible and the weight of these edges is mainly defined by the distribution
cost. The remaining edges, between the black and the blue points, or between the
green point (the drone depot) and a blue or black point (the green edges in figure
2.2), have zero cost, as the drone flies them without a load.
Figure 2.2 illustrates several equivalent solutions that could have been found
using different methodologies. Figure 2.2(a) shows a solution with a single tour
that starts and ends at the drone depot. This kind of solution could
be found with an algorithm similar to those used for the Vehicle Routing
Problem with Pickup and Delivery (VRPPD) [37]. Figure 2.2(b) shows a solution
closer to the classical Vehicle Routing Problem (VRP) [37], with
three subtours from the depot; in that situation we could use three drones,
each performing one subtour. Finally, figures 2.2(c) and 2.2(d) show
a more classical TSP situation, where there is no drone depot and we make a
tour visiting only a subset of all possible cities. This solution is closely
related to the traveling purchaser problem, of which the TSP [36] is a special case.
Since the green edges have zero cost, all of these solutions are perfectly
equivalent.
As mentioned earlier, this thesis mainly takes its inspiration for the algorithms
from the TSP, so it is interesting to take a closer look at figures 2.2(c) and
2.2(d). In this representation there is no drone depot. We could model that, for
example, as the case where each ink factory has a drone at its disposal, so the
route can originate at any of the scheduled ink factories.
Also, many TSP algorithms assume complete and symmetric graphs. The
graph does not seem to be symmetric at first, as we need to visit a blue point
before a red one, and a red point before a black one, which adds direction to the
edges. However, any correct solution that uses only the allowed edges (i.e., red,
blue or green edges) can be trivially transformed into a valid solution by reversing
the order of the visited points if needed. The only constraint we need to add for a
solution to be correct is that any red point on the solution path must be directly
connected to a blue and a black point, and that no two blue edges or two red edges
may be directly connected.
Note that this constraint does not need to be checked in the algorithms proposed
in this thesis. It is only described here to explain the relation between the defined
problem and the pure TSP. In summary, the defined problem is a TSP in which we
visit only a subset of all possible cities (nine cities in the given
example), and in which no two blue edges or two red edges may be directly
connected. It is worth noting that the added constraints, if they must be evaluated
by an algorithm, can be evaluated in polynomial time.
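To sketch how cheap such a check is, assume a candidate solution is encoded as (ink factory, printer, distribution center) triples, each triple being one blue edge followed by one red edge. This encoding (illustrative, not taken from the thesis) guarantees the alternation constraint by construction, so only the at-most-once and nearby-center constraints remain, and both are checkable in linear time.

```python
def is_valid(triples, allowed_centers):
    """Check a candidate solution given as (ink, printer, center) triples."""
    inks = [t[0] for t in triples]
    printers = [t[1] for t in triples]
    # Each ink factory and each printing factory is visited at most once.
    if len(set(inks)) != len(inks) or len(set(printers)) != len(printers):
        return False
    # Every red edge must lead to a distribution center near that printer.
    return all(c in allowed_centers.get(p, ()) for _, p, c in triples)

# Hypothetical nearby-center sets for two printers.
allowed = {"p0": {"d0", "d1"}, "p1": {"d2"}}
ok = is_valid([("ink0", "p0", "d1"), ("ink3", "p1", "d2")], allowed)
bad = is_valid([("ink0", "p0", "d1"), ("ink0", "p1", "d2")], allowed)  # ink0 twice
```

Both checks are simple set and membership tests, which is why the constraints never dominate the running time of the algorithms discussed later.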
One remaining detail is the completeness of the graph. It is quite common to
assign high weights to the edges that are not allowed, so that they will not be
present in the optimal solutions [36]. However, just as with the above-mentioned
constraint, the proposed algorithms do not require the graph to be complete. In
fact, the proposed algorithms only use the relevant edges (the red and blue edges),
as shown in figure 2.3. Nevertheless, the green edges can be used to illustrate the
relations between the proposed algorithms and the algorithms for the TSP.
Furthermore, as discussed earlier, we can also find relations to other problems,
as the TSP is not the only way to model the given problem. For example, solutions
such as those shown in figures 2.3(a) and 2.3(b) could have been found with
algorithms inspired by job shop scheduling, nurse scheduling or knapsack problem
optimization.
Another aspect not yet discussed in sufficient detail is the definition of the
weights in the graph. We can see from the discussion above that the edge weights
are not constant but depend on the final path. For example, as discussed
before, we have the grouping effects of the printing factories (maintenance penalties)
and of the distribution centers (a discount when a certain strategy is used often).
Other, less well-defined factors can also influence the cost of an edge. In fact, it
is very difficult, if not impossible, to estimate the cost of a single edge
accurately (except for the green edges, which have zero cost and do not contribute
to the quality of the solution). Nevertheless, the edge weights vary only to a
certain extent between different paths.
As a consequence, we cannot use a heuristic based on the weight of a single
edge in the algorithms, since that weight is not known explicitly. Since such
heuristics are very important in many combinatorial algorithms, this is an important
aspect of this thesis, where certain heuristics will be proposed that estimate the
value of an edge without knowing its true weight.
Furthermore, the total cost of a path is also very difficult to calculate, as it might be
that not all aspects of the cost function are fully known. Nevertheless, it is assumed
that at least a very good estimate of the cost can be obtained from a
black-box model that takes the red and blue edges as input (the green edges are not
relevant to the quality of the solution and are not evaluated by the model). This
black-box model could then be built through experimentation and the use of machine
learning algorithms, i.e., we have a model that can predict the cost with a certain
accuracy in polynomial time.
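This assumption can be made concrete with a minimal interface sketch. All names and the toy cost formula below are illustrative, not the actual model used in this thesis; the point is only that the model consumes a set of red and blue edges and returns a cost estimate in polynomial time:

```python
class BlackBoxCostModel:
    """Hypothetical interface: estimates the total path cost from its red
    and blue edges. Green edges are never passed in; they have zero cost
    and do not affect solution quality."""

    def __init__(self, learned_weights):
        # In practice these weights would come from machine learning on
        # observed (path, cost) examples; here they are given directly.
        self.w = learned_weights

    def predict_cost(self, red_edges, blue_edges):
        # A toy stand-in: a weighted sum over the chosen edges plus an
        # interaction term, illustrating that edge costs are not constant
        # but depend on which other edges are in the path.
        base = sum(self.w.get(e, 1.0) for e in red_edges + blue_edges)
        interaction = 0.1 * len(red_edges) * len(blue_edges)
        return base + interaction  # O(n) in the number of edges: polynomial

model = BlackBoxCostModel({("p1", "pos3"): 2.5})
print(model.predict_cost(red_edges=[("pos3", "instrA")],
                         blue_edges=[("p1", "pos3")]))  # 3.6
```

The design point is the interface, not the formula: any learned regressor with this shape (edges in, scalar cost out) satisfies the assumption in the text.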
2.1.2 Relation with the Hattrick problem
The soccer lineup optimization in the context of Hattrick [31] can be represented
with the same graph as the drone delivering packages routing problem described
earlier, and all of the discussion from section 2.1 still holds for this problem. In the
context of Hattrick, we have fourteen possible positions on the field (red points
in the graph) and around 20 players (blue points; the number of available players
varies between clubs) that can be assigned (blue edges) to the positions on the field.
At each position we also have a set of possible individual instructions (black points)
that we can assign (red edges) to the player at that position. The individual
instructions are, for example, normal behavior, a defensive strategy, an offensive
strategy, etc. Note that not all instructions are available at all positions; for example,
the goalkeeper can only be assigned the normal behavior. There are at most four
different strategies possible at any position, i.e., a subset of the five different strategies
in total. The goal is then to assign eleven different players to eleven different positions
on the field and to assign one instruction at each position, in a way that
maximizes the winning probability (or minimizes the losing probability). In other
words, we look for an optimal set of blue and red edges, as shown in figure 2.3.
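The constraints above can be sketched as a small validity check. The constants and instruction names are illustrative assumptions (the real Hattrick instruction set differs per position, as described in the text); a lineup is modeled as a list of (player, position, instruction) triples, i.e., a blue and a red edge sharing a common red point:

```python
NUM_POSITIONS = 14   # red points on the graph
LINEUP_SIZE = 11     # positions actually filled

def is_valid_lineup(assignments, allowed_instructions):
    """assignments: list of (player, position, instruction) triples.
    allowed_instructions: position -> set of instructions available there
    (e.g. the goalkeeper only allows the normal behavior)."""
    players = [p for p, _, _ in assignments]
    positions = [pos for _, pos, _ in assignments]
    return (len(assignments) == LINEUP_SIZE
            and len(set(players)) == LINEUP_SIZE      # eleven different players
            and len(set(positions)) == LINEUP_SIZE    # eleven different positions
            and all(instr in allowed_instructions[pos]
                    for _, pos, instr in assignments))

# Position 0 is the goalkeeper; the other positions allow a few strategies.
allowed = {0: {"normal"}}
allowed.update({i: {"normal", "defensive", "offensive"}
                for i in range(1, NUM_POSITIONS)})
lineup = [("player%d" % i, i, "normal") for i in range(LINEUP_SIZE)]
print(is_valid_lineup(lineup, allowed))  # True
```

Any candidate solution produced by the algorithms in this thesis must satisfy exactly these uniqueness and availability constraints.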
Arguably, since the graph representation is the same for both problems, all of
the discussion from the previous section could have been introduced using the original
problem, and the introduction of a new problem with drones delivering packages may seem
unnecessary. However, as discussed earlier, introducing the new problem helps to
avoid certain pitfalls that might lead to very poor solutions of the original problem.
The drone problem also allows for a natural introduction of the green edges, which
greatly contribute to the understanding of the algorithms discussed in this thesis. Not
only do the new algorithms contributed by this thesis become more understandable; the
graph representation, and especially the green edges, also make it easy to see the weaknesses
of the already existing algorithm [31] for this problem, and help to illustrate
how to address them, as will be shown in the next chapter. For that reason,
references to the original soccer lineup problem and the Hattrick setting are kept
to the absolute minimum in this thesis, as they mainly lead to confusion about the
principles behind the workings of the algorithms. Nevertheless, a very good description of
the soccer lineup problem in the context of the Hattrick game can be found in [31],
which is recommended reading for any interested reader.
This section therefore focuses only on the relation of the soccer lineup optimization
in the context of Hattrick to the earlier described drone delivering packages problem.
One important aspect in that context is the calculation of the cost of the path
(or the fitness function, in the terms of genetic algorithms). For the drone
problem, it was explained that we only have a black-box model that estimates the
cost with a certain accuracy. Also, as mentioned earlier, only the blue and red
edges are used in the black-box evaluation, as the green edges have zero cost and
do not contribute to the final solution, i.e., the green edges are not relevant for the
total cost.
In the context of Hattrick, the black-box model is the match outcome
prediction tool developed with Weka [17] in the context of [31], and it is reused
as-is in this thesis. According to the description in [31], the
outcome of the match is predicted based on the formation of the final state and
the subcategory ratings of the opponent. It is not possible to directly request an
outcome from the Hattrick match simulator, and the outcome is nondeterministic.
Therefore, Verachtert [31] developed a prediction system using machine learning on a
collection of data instances from previously played matches.
Two different tools were developed, with two different outcome representations:
a nominal and a numeric representation. From the findings in [31], the numeric
representation results in more accurate algorithms, and it is therefore used in this thesis.
Also, just as in the original work [31], the goal difference is used as the
fitness function in this thesis, i.e., the predicted outcome is transformed into one numeric
value that corresponds to the difference between the goals scored by the own team
and the goals scored by the opponent (a positive value indicates winning the match,
a negative value losing it). Notice that we then want to maximize the fitness, whereas in the
context of the drone delivering packages problem the goal was to minimize the cost.
One can be trivially transformed into the other by flipping the sign, i.e., the cost is
the negative of the fitness.
Another important aspect is the estimation of the weight of a single edge. As
mentioned before, the weight of a single edge is important for many algorithms as a
heuristic. However, since we only have a black-box model for the cost of the total
solution, it is not easy, if not impossible, to define a heuristic for a single edge. One
interesting solution is described in [31], where such a heuristic is used to
estimate the value of a pair of edges, i.e., a red and a blue edge with a common
red point in the terms of the earlier described graph representation (or a player
with an assigned position and strategy in the terms of Hattrick). The heuristic
used for this purpose is VnukStats [19]. This heuristic is designed for evaluating a
full lineup and, as described in [31], it is a good indicator of the performance of a
lineup. VnukStats evaluates the whole lineup; however, we can leave certain positions
empty, and they simply do not contribute to the total value. This is exploited
by comparing a partial lineup with the same partial lineup extended with one extra player
assigned to a position, with a defined strategy. The difference between the values of
the two partial lineups then indicates the value of the extra assignment, i.e., the
extra red and blue edges. As described in [31], this type of heuristic works very
well for this problem. It can be extended to a single edge by, for example, fixing
the strategy to the standard strategy and varying only the assigned player, which
gives an estimate of the blue edges. Another possibility is fixing the assigned
player-position pair (blue edge) and varying the strategy (red edge), which
gives an estimate for a red edge.
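This difference heuristic can be sketched as follows. Here `partial_lineup_value` stands in for the VnukStats evaluation; it and the other names are illustrative assumptions, not the actual API:

```python
def assignment_value(partial_lineup_value, partial, player, position, instruction):
    """Value of one extra (player, position, instruction) assignment, i.e.
    one extra blue and red edge: the difference between the evaluation of
    the extended partial lineup and of the original one."""
    extended = partial + [(player, position, instruction)]
    return partial_lineup_value(extended) - partial_lineup_value(partial)

def blue_edge_value(partial_lineup_value, partial, player, position):
    """Single blue edge estimate: fix the instruction to the standard one
    and vary only the assigned player."""
    return assignment_value(partial_lineup_value, partial,
                            player, position, "normal")

# Toy stand-in evaluator: each assignment contributes a fixed value.
toy_value = lambda lineup: 2.0 * len(lineup)
print(assignment_value(toy_value, [], "player7", "midfield", "offensive"))  # 2.0
```

With the real evaluator, the same difference would also capture context effects such as the midfield penalty discussed below, which a fixed per-edge weight could not.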
However, VnukStats also illustrates that the contribution of a single edge to
the total solution is quite complex and not constant. For example, when we
fill all midfield positions, there will be some penalty, and the total weight of the
midfield edges will be lower. In the context of the drone delivering packages problem, this
was described as a grouping factor, e.g., the maintenance penalty when certain factories
are all working at full capacity. This illustrates that the heuristic described above
becomes more accurate when we evaluate an edge in a context where more edges
are fixed. If we were to use this heuristic to evaluate a single edge without
any other edges in the context, the estimate would be less accurate.
2.2 Solution methodologies for routing optimization
problems
The graph representation of the problem described above shows that we can
base our algorithms on algorithms for routing problems. Links to
other combinatorial problems could also be made; nevertheless, the routing problems, and
especially those related to the Traveling Salesman Problem, are an
interesting source of inspiration. Therefore, the graph representation and the TSP
are very important for this thesis.
Since we can base our algorithms on the TSP, it is interesting
to briefly discuss the existing approaches for solving that problem. However, the
TSP is a classical problem in computer science, and a large number of different
algorithms have been designed in that context [2]. Almost every domain of computer science
has its own approach. There is even a separate field of study, combinatorial optimization,
that specifically focuses on combinatorial problems, with the TSP as one of the
most important problems studied. Because of that, we only focus on a few
approaches that are close to the context of this thesis, i.e., within the domains of
Artificial Intelligence and Machine Learning.
In the context of Artificial Intelligence, Genetic Algorithms are one of the classical
approaches to the TSP. For example, Affenzeller et al. explain genetic algorithms
in a book [1] that uses the TSP and its generalized variant, the Vehicle Routing Problem,
in many of its examples. That book is an important inspiration for this thesis.
Genetic Algorithms and Evolutionary Computing are also part of the domain of
nature-inspired algorithms; in that context, ant colony optimization [13] is, for example,
a widely known approach to the TSP.
These are only a few of the many approaches in this context. If we look at
machine learning specifically, we can also find some interesting examples. One
such approach, applicable not only to the TSP but to combinatorial problems in general,
is the use of Hopfield networks [20] in the context of artificial neural networks. In
fact, the TSP is one of the best-known and most-studied problems, and almost every
domain has its own approach to it.
2.3 Chosen approach
Because of the large number of possible algorithms for combinatorial problems,
as discussed earlier, a selection must be made for this thesis. Since this thesis
can be seen as a continuation of the work presented in [31], it investigates the
Monte Carlo Tree Search (MCTS) algorithm presented there and proposes certain
improvements. As an additional approach, Genetic Algorithms are chosen, as they
are a generic approach to optimization problems and can be easily adapted and
implemented for the problem discussed in this thesis. The two
approaches can then be studied in more depth and compared with each other.
Both approaches, as described in this thesis, are greatly inspired by the TSP. The
graph representation described in this chapter is then an important illustration
for better understanding the proposed algorithms.
2.4 Conclusion
This chapter has described the lineup optimization problem from a new perspective,
in which the link with combinatorial problems, and the TSP in particular,
became apparent. The resulting graph representation is an important representation
of this problem and will be used extensively in the following chapters of
this thesis. It will serve as an important tool for better understanding both
the existing algorithm [31] and the newly proposed algorithms.
Chapter 3
Monte Carlo Tree Search
(MCTS) Approach
This thesis can be seen as a continuation of the work presented in [31]. Therefore, it
is interesting to study the algorithm proposed in that work more closely. Section
3.1 describes the already existing algorithm. Next, in section 3.2, certain
improvements to that algorithm are proposed. The resulting algorithms are
compared with the existing algorithms in section 3.3. Section 3.4 concludes this
chapter.
3.1 Existing work
Although certain Monte Carlo Tree Search algorithms have already been proposed in
the context of combinatorial problems, as for example in [6], these are typically
applied to different types of problems and are not directly applicable to the problem
presented in this thesis. For example, the work presented in [6] discusses an
algorithm for the Repeated-task Canadian traveler problem. In that problem, we also
have a graph that is not fully connected, as in our problem, but there is a specific start
city and a target city, as illustrated in figure 3.1. We can thus naturally derive a
tree representation for that problem, where the root of the tree is the start city. We
consider only paths that visit each city at most once (otherwise the path would
contain a loop, making the solution suboptimal).
Furthermore, not all cities need to be visited, another similarity to our problem. A
valid solution starts at the root of the tree (the start city) and ends at the target
city (one of the leaves of the tree). The cities correspond to the nodes of the
tree, while the tree branches correspond to the edges of the graph. It is interesting
to notice that each path from the root of the tree down to a leaf represents a unique
solution path in the graph. This is one of the typical problems, i.e., problems
with a natural tree representation, to which tree search methods, and Monte
Carlo Tree Search in particular, are applied. In this example, Bnaya et al. apply
one of the UCT (Upper Confidence bounds applied to Trees) variants of the Monte
Carlo Tree Search (MCTS) algorithm, which is also investigated in [31] in the context of our
problem.
Figure 3.1: Canadian traveler problem graph example, Bnaya et al. [6].
As discussed above, we can identify many similarities between the example above
and our problem. Nevertheless, in the optimal lineup problem we have no start
and target nodes (we can assign positions to players in any order), and we have no
natural tree representation of the problem. This makes the method described
in [6], like other MCTS algorithms applied to combinatorial problems, not directly
applicable to our problem. It is then interesting to investigate the tree representation
of our problem as presented in [31]. In that context, the analogy with the drone
delivering packages problem, as described in the previous chapter, contributes greatly
to a better understanding of that representation and is used to illustrate it
in figure 3.2.
In that work, the root is the initial state and does not contain any elements of
the solution. In terms of the graph representation, the root contains no
blue or red (or green) edges. Using the analogy with the drone delivering packages
problem, the root can be represented by the drone depot, i.e., the green point of the
graph. The initial situation is illustrated in figure 3.2(a) in the context of the
drone problem.
In the context of the drone problem, we can go from the depot (green point)
to any ink-producing factory (blue point) and then to any of the printing factories
(red point), followed by a visit to one of the corresponding distribution centers
(black point). In the context of lineup optimization, we can assign a player to a
position and then assign a behavior. Such a combination (assigning a player, a position
and a behavior) is one action in the model described in [31], i.e., it is represented by
a single node. In other words, a node contains a red and a blue edge, where such
a pair must be connected in a single red point.
(a) Initial state (b) Expansion
(c) Selection (d) Complete path
Figure 3.2: Tree representation of the problem.
Figure 3.2(b) shows the possible red
and blue edges for one of the red points. For the clarity of the figure, the other
possible red and blue edges (connected with the other red points) are not shown,
but they are also valid choices.
We then select one of the nodes and move to that node. This is illustrated in
figure 3.2(c), where the blue and the red edge represent a single node, and the green
edge represents the branch from the root (green point) to that node. We can then
move deeper into the tree and go to the next node, etc., until the
path is complete. In the context of lineup optimization, the path is complete
when we have selected eleven nodes. In other words, the depth of the tree is eleven,
with the leaf nodes at the lowest level. In the simple example shown in
figure 3.2(d), the complete path contains three nodes, i.e., we visit three out of four
printing factories, so the depth of the tree is three. It is important to notice
that the branches of the tree are represented by the green edges, while the nodes are
represented by the pairs of blue and red edges.
It is then interesting to investigate the properties of the proposed tree representation
of the problem. In the context of the drone delivering packages problem, we can
complete the drone's path by simply adding a green edge from the last visited
distribution center (the last black point in the path shown in figure 3.2(d)) back
to the drone depot (the green point of the graph, or the root of the tree). This is
shown in figure 3.3(a). Notice that this extra edge does not increase the complexity
of the solution, i.e., the solution is still represented by the same path in the tree as
shown in figure 3.2(d). The resulting path is very similar to what we had in the
Canadian traveler problem described earlier, except that the start and target are both
the same point in the graph, i.e., the drone depot. However, in the Canadian
traveler problem, each path in the tree representation from the root to a leaf was
unique. In the situation with the drone depot, where the start and target are in
the same place, we have a redundancy of factor 2. This can be illustrated with the
path shown in figure 3.3(a): we always start at the root, but we can do the
tour clockwise or counterclockwise. Both tours, clockwise and counterclockwise, are
exactly the same when shown on the graph (and thus are equivalent solutions), but
they are two different paths in the tree, branching from the root in different
directions and visiting totally different nodes of the tree, i.e., the paths have only
the root in common. This can be counterintuitive, since the nodes contain exactly
the same edges for both paths, but since they are on different branches of the tree,
they are different nodes of the tree.
(a) Explicit root (b) Implicit root
Figure 3.3: Equivalent tours in the tree representation.
However, a redundancy of factor 2 does not greatly influence the tree search algorithms,
and as described in the previous chapter, there are examples of Monte
Carlo Tree Search applied to Vehicle Routing Problems with a depot, similar to
what is shown in figure 3.3(a). More importantly, the drone depot is not relevant
to the problem and could be placed anywhere in the graph without changing the
solution. In fact, as discussed in chapter 2, we can transform the solution to a tour
representation as shown in figure 3.3(b), where the shown solution is equivalent to
the one shown in figure 3.3(a). In terms of the tree representation, figure 3.3(a)
shows the root explicitly, whereas figure 3.3(b) does not, i.e., the root is implicit
and could be placed on any of the green edges. In other
words, we could start the tour at any pair of a blue and a red edge (a node at level
one in the tree) and follow the tour clockwise or counterclockwise. The green edges
still represent the branches of the tree, where one of the green edges is implicit and
does not appear in the tree, i.e., it connects the leaf of the tree with the first node
visited from the root. The redundancy factor is then equal to the depth of
the tree d multiplied by two, i.e., d × 2.
This can be nicely illustrated in the context of the Traveling Salesman Problem.
One of the popular representations in the context of genetic algorithms for that
problem is the path representation. In that representation, we number the cities
and represent the tour as a list of the numbers representing the cities. The
resulting tour is obtained by going to the first city in the list, then to the next,
etc., until the last city in the list is reached, from where we go back to the first city
in order to complete the tour. In the terms of the drone delivering packages
problem, this is illustrated in figure 3.4(a). We number the pairs of edges (a red and
a blue edge with a common red point) that are part of the solution from 1 to 3.
Note that the same pair of edges does not necessarily correspond to the same node
in the tree. For example, the tour (1 2 3) in path representation is the same as the
tour (2 3 1) or (3 2 1), etc.; these are just different representations of the tour
shown in figure 3.4(a). They are, however, different paths in the tree, and thus they
visit different nodes of the tree. We can then notice that the path representation is a
convenient way of notating paths in the tree, and that both representations (tree and
path) introduce the same redundancy in representing the same tour (e.g., the tour
shown in figure 3.4(a)). We can thus find d × 2 (with d the tree depth, or the tour
length) different path representations for each unique tour, i.e., we can start in any
city and do the tour clockwise or counterclockwise.
(a) First tour (b) Second tour
Figure 3.4: Equivalent tours in the tree representation.
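The d × 2 equivalent path representations of a single tour can be enumerated directly; a small sketch, assuming a tour is simply a tuple of edge-pair numbers:

```python
def equivalent_paths(tour):
    """All path representations of one tour: every rotation of the list,
    in both travel directions."""
    d = len(tour)
    reps = set()
    for direction in (tour, tour[::-1]):   # clockwise and counterclockwise
        for start in range(d):             # any city can come first
            reps.add(direction[start:] + direction[:start])
    return reps

reps = equivalent_paths((1, 2, 3))
print(sorted(reps))
print(len(reps))  # d * 2 = 6
```

Note that for d = 3 these six representations happen to coincide with all 3! permutations; that is specific to tours of length three, where d × 2 = d!.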
At first, a redundancy of d × 2 may seem very large, as we replace the original
search space with a new search space that is d × 2 times larger than the original one. In
other words, the redundancy factor rises with the complexity of the problem, which
imposes limits on how complex the problems we can solve are. However, the path representation
is very widely used in the context of genetic algorithms, and some very good algorithms
that use that representation are known [1]. Nevertheless, as stated earlier, the green
edges are not relevant to the optimal lineup problem. In other words, not all unique
tours that we can find in the graph representation are unique solutions to the problem.
For example, figure 3.4(b) shows a tour that is different from the tour shown in
figure 3.4(a); yet, both tours represent exactly the same solution.
This is further illustrated in figure 3.5, where two different example solutions are
shown. In the solution shown in figure 3.5(a), the edge pair 1 is replaced with
a different edge pair, not seen before, that gets the number 4, as shown in figure
3.5(b). The remaining question is then: what is the redundancy factor introduced
by the tree representation described earlier for this type of problem? As described
before, each path from the root of the tree to one of its leaves can be described
using the path notation. Any path notation that consists of the same edge pairs
(contains the same numbers) is then redundant. In other words, all permutations
of one path represent exactly the same solution and are different paths in the tree.
The path representation has a length equal to the depth of the tree, which brings
the redundancy factor to d!. For the lineup optimization problem, the depth of the
tree is fixed and equal to eleven. Thus, the redundancy introduced by the tree
representation of lineup optimization is 11! = 39916800 (almost forty million).
(a) First solution (b) Second solution
Figure 3.5: Different solutions of the problem.
In fact, the problem is severe enough that the algorithm proposed in [31]
cannot handle the complexity of the problem as described so far. When running the
algorithm on the full problem, it crashes as it runs out of memory. The
only case it can handle is a simplified problem, in which only 11 players are
assignable to positions (11 players in the club) and only 11 positions are considered
(a fixed subset of all 14 positions on the field). Also, only one type of behavior, the
normal behavior, is assignable. All experiments presented in [31] are done on this
simplified version, as the algorithm crashes on more complex variants.
Remarkably, the complexity of the simplified problem is equal to the redundancy
introduced by the tree representation of the problem. We have only one type of
behavior we can assign, so the behavior disappears from the problem, as each position gets
the same default behavior (no red edges in the solution). We are then left with
11 players that we need to assign to 11 positions. When we fix the positions in a
representation and represent the solutions as permutations of the eleven players,
i.e., the first player from the permutation is assigned to the first fixed position, the second
player to the second position, etc., we have 11! different solutions to the problem.
The depth of the tree is still equal to 11, so the redundancy factor remains 11!.
In other words, the original problem with complexity 11! = 3.9916800 × 10^7 gets
transformed by the tree representation into a new search space with complexity
(11!)^2 = 1.5933509 × 10^15 (i.e., almost 1.6 quadrillion). The tree representation
thus squares the size of the search space.
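These figures are easy to verify with a quick check in plain Python (no project code assumed):

```python
from itertools import permutations
from math import factorial

d = 11                          # depth of the tree / lineup size
solutions = factorial(d)        # distinct player permutations: 11!
tree_paths = solutions ** 2     # paths in the redundant tree: (11!)^2

print(solutions)                # 39916800
print(tree_paths)               # 1593350922240000

# Every permutation of one solution's edge pairs is a distinct tree path,
# which is exactly where the d! redundancy factor comes from:
print(len(set(permutations((1, 2, 3)))))  # 6 = 3!
```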
All of the algorithms described in [31] use the same tree representation of the problem.
It might be that the tree search approach is not the best possible one for this type
of problem, and any serious attempt should tackle the redundancy problem first.
Since improving the algorithms proposed in [31] is one of the goals of this thesis, an
interesting solution to that problem is presented in the next section. Notice that
all obvious tree representations for this problem require the introduction of the green
edges and that the exponential explosion of complexity is an inherent consequence of
that. This makes the application of a tree search algorithm to this problem an
interesting challenge.
However, before we move to the section with the proposed solution, some aspects
of Monte Carlo Tree Search algorithms need to be described. So far, only the
tree representation of the problem has been discussed; it is important to also
discuss how the search itself is done. The first important thing to notice is that Monte
Carlo Tree Search is an iterative algorithm. We start from the root, which at that
moment is the only node present in the tree. Then we execute the steps of the
algorithm iteratively until a stop criterion is reached: for example, we have found a
solution that is good enough, we have reached a maximum number of iterations, or
the time allotted for execution has run out. During the iterations, we add nodes to
the tree as we perform the search. The goal of the algorithm is to identify the nodes
that lead to an optimal solution and to expand the tree in that direction. The
algorithm tries to find a balance between the exploitation of high-yielding paths and
exploration in uncertain directions.
The steps of the algorithm are Selection, Expansion, Simulation and Backpropagation.
Based on [31], these steps can be defined as follows in the context of the
earlier described tree representation of the given problem.
In the Selection step, the search tree is traversed from the root to a leaf. For each
internal node in the descent, a child node is selected based on a specified selection
method. This selection method is executed recursively until a leaf node k has been
reached. We say that a node has been visited if it is selected during this step. This
was illustrated in figures 3.2(c) and 3.2(d). If the leaf node is at level 11, a complete
path is formed after selection and the algorithm can move to the Backpropagation
step. If the leaf node is at an intermediate level, the path is not complete and we
move to the Expansion step.
In the Expansion step, all possible red and blue edge combinations with a common
red point (valid in the context of the already built path) are computed for the leaf
node k. A new node is added as a child node of k for each resulting combination.
An alternative would be to only consider a subset of the possible combinations, but
that alternative is not explored in the context of lineup optimization (neither in
this work nor in the original work). Based on the selection method, a child node l
is selected; if there are no child nodes, k is selected as l. The Expansion step was
illustrated in figure 3.2(b), where the red and blue edges are shown for only one of
the red points; the full expansion considers all possible valid combinations, but
showing them all would overload the figure.
In the Simulation step, the path from l is sequentially extended by adding random
red and blue edge combinations with a common red point (containing only red and
blue points not yet visited in the subpath) until a full path has been completed.
The degree of randomness is defined in the actual implementation of the algorithm;
it is possible to base the probability of choosing an action on domain knowledge. A
possible complete path after the Simulation step was illustrated in figure 3.2(d).
In the Backpropagation step, the fitness function is evaluated on the full solution
(the complete path obtained after the Selection and/or the Simulation step). The
obtained fitness value is then propagated to all ancestors of l. The most used and
most effective valuation method (i.e., the way of assigning values to the nodes as used in
the Selection step) is to take the average fitness of all simulations executed in
descendant nodes; it is therefore also used in all of the discussed algorithms.
The whole process is illustrated in figure 3.6 in more general terms. The figure
is borrowed from Chaslot et al. [10].
Figure 3.6: Outline of a Monte-Carlo Tree Search, Chaslot et al. [10].
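The four steps above can be summarized in a compact skeleton. This is a generic sketch, not the implementation from [31]: the `Node` fields, the UCT constant, and the toy fitness are illustrative assumptions, and an "action" here abstracts one (player, position, instruction) assignment:

```python
import math
import random

class Node:
    def __init__(self, action=None, parent=None):
        self.action, self.parent = action, parent
        self.children, self.visits, self.total = [], 0, 0.0

def uct_value(parent, child, c):
    # Unvisited children are explored first; otherwise average fitness
    # plus the UCT exploration bonus.
    if child.visits == 0:
        return float("inf")
    return (child.total / child.visits
            + c * math.sqrt(math.log(parent.visits) / child.visits))

def mcts(actions, depth, fitness, iterations=1000, c=1.4):
    """Generic MCTS over fixed-depth paths of distinct actions (a sketch)."""
    root = Node()
    for _ in range(iterations):
        # Selection: descend with UCT until a leaf of the built tree.
        node, path = root, []
        while node.children:
            node = max(node.children, key=lambda ch: uct_value(node, ch, c))
            path.append(node.action)
        # Expansion: one child per action still valid for this path.
        if len(path) < depth:
            node.children = [Node(a, node) for a in actions if a not in path]
            node = random.choice(node.children)
            path.append(node.action)
        # Simulation: complete the path with random valid actions.
        remaining = [a for a in actions if a not in path]
        random.shuffle(remaining)
        value = fitness(path + remaining[:depth - len(path)])
        # Backpropagation: update visit counts and fitness totals up to the root.
        while node is not None:
            node.visits += 1
            node.total += value
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits)

# Toy usage: the fitness rewards whichever action is chosen first,
# so the search should concentrate on starting with action 4.
best_first = mcts(list(range(5)), depth=3, fitness=lambda p: p[0])
print(best_first.action)  # 4
```

Note how the skeleton exposes the redundancy discussed earlier: nothing stops two permutations of the same action set from occupying different branches.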
The definitions given above leave the choice of the selection and simulation
strategies open. In the original work [31], many different strategies, in combination
with two kinds of fitness evaluation, are proposed and experimented with. Therefore,
we briefly look at the performed experiments in order to identify the best approach,
which will be described in more detail in the remainder of this section and used
as the basis for improvements in the next section. For an explanation of the other
approaches, and for more details on the workings of Monte Carlo Tree Search
algorithms in general, the reader is referred to the original work [31].
Before looking at the results, the evaluation method used is explained first. As
mentioned above, two variants of fitness evaluation are used: the nominal and the
numeric. The nominal fitness evaluation is based on the nominal representation of
the prediction (the black-box evaluation of the solution as described before) that
classifies a match as either a win or a loss. In terms of fitness evaluation, we can
say that the fitness is either good or bad, which can be represented in a form usable to
the algorithms as either 1 or 0. The numeric fitness evaluation is based on the numeric
representation, which uses the difference between the scores of the teams as the outcome.
Note that this representation can produce continuous values, despite the fact that
the score difference is always an integer, by means of the regression embedded in the
black-box evaluation. It is thus directly usable as a fitness function that produces
continuous values.
Further, when evaluating the performance of the algorithm, the quality of the obtained solution is measured by comparing it with a baseline solution. The baseline solution itself is obtained for each match using the greedy approach with VnukStats as the heuristic, where the assignments yielding the highest increase of that heuristic are chosen progressively. The quality of the obtained solution is then measured as the percentage increase of its numeric fitness over the fitness of the baseline solution (regardless of the type of fitness function used in the experiment, as the nominal fitness function is not well suited for that purpose), with the following formula [31]:
quality = (fitness_solution − fitness_baseline) / |fitness_baseline|
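As a concrete illustration, the quality measure can be computed in a few lines of code (a sketch in Java, since the fitness evaluation used in this thesis is implemented in Java; the class and method names are ours, not from the original code):

```java
public class QualityMeasure {
    // Relative improvement of a solution's numeric fitness over the baseline:
    // (fitness_solution - fitness_baseline) / |fitness_baseline|, as in [31].
    public static double quality(double fitnessSolution, double fitnessBaseline) {
        return (fitnessSolution - fitnessBaseline) / Math.abs(fitnessBaseline);
    }

    public static void main(String[] args) {
        // A baseline predicting a 0.26 goal difference, improved to 1.576269,
        // gives a quality of roughly 506% (the baseline 0.26 is a rounded
        // value, hence the small deviation from the tables later on).
        System.out.println(quality(1.576269, 0.26) * 100.0);
    }
}
```

Note that a baseline fitness close to zero inflates the percentage, which matters when interpreting the experimental results later in this chapter.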
The performance of the Monte Carlo Tree Search algorithms is then measured with the methodology described above every 1000 iterations of the algorithm. The solution path is obtained by descending the search tree from the root to a leaf node, each time selecting the child node with the highest node value. If the obtained path is not complete (depth eleven is not reached), it is temporarily completed using
the greedy completion method described for the baseline solution. The resulting
performance is then averaged over 90 matches: all combinations of one of six own
teams and one of fifteen opponents, where each match outcome is averaged over
15 runs of the algorithm, as the result of the Monte Carlo Tree Search is non-
deterministic. Each execution is limited to 50000 iterations of the algorithm.
Furthermore, the experiments explore two types of simulation step: fully random simulation and semi-random simulation. In the simulation step, a path selected after
the expansion step is sequentially expanded by random actions until a complete path
has been reached. In Hattrick lineups, a complete path is a complete choice set of
eleven assignments of football players to positions and individual instructions. The
choice of these assignments in order to complete the path can be either fully random
or semi-random. Fully random means that each possible assignment has an equal
probability of being chosen. Semi-random means that the probability is proportional
to a heuristic valuation of the assignment, i.e., the assignments are chosen with a
roulette wheel selection method. VnukStats (as described earlier) is used as the heuristic: the probability of choosing a set of assignments (a red and a blue edge with a common red point) is proportional to the difference in VnukStats value between the original path and the extended path. VnukStats thus estimates the strength of a (partial) path (lineup).
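The semi-random choice described above boils down to roulette wheel selection over the heuristic increments. A minimal sketch in Java (the class name and the plain list of weights are our simplification; in the actual algorithm the weights would be the VnukStats differences of the candidate assignments):

```java
import java.util.List;
import java.util.Random;

public class RouletteWheel {
    // Returns an index drawn with probability proportional to its weight,
    // e.g. the increase in VnukStats for each candidate assignment.
    public static int select(List<Double> weights, Random rng) {
        double total = 0.0;
        for (double w : weights) {
            total += w;
        }
        double r = rng.nextDouble() * total;
        for (int i = 0; i < weights.size(); i++) {
            r -= weights.get(i);
            if (r <= 0.0) {
                return i;
            }
        }
        return weights.size() - 1; // guard against floating-point rounding
    }
}
```

Fully random simulation is then simply the special case where all weights are equal.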
Finally, different selection strategies are explored: Objective Monte-Carlo (OMC) [9], OMC in a one-player context [8] (a variant of the OMC strategy), Probability to be Better than the Best Move (PBBM) [11] (another variant of OMC), Upper Confidence bound applied to Trees (UCT) [21] and UCB1-TUNED [15] (a variant of UCT). UCB1-TUNED was originally proposed for the multi-armed bandit problem by Auer et al. [3]. The results of the experiments are shown in figure 3.7 for the numeric fitness evaluation, and in figure 3.8 for the nominal fitness evaluation.
Figure 3.7: Experiments with the numeric fitness evaluation. Two types of
simulation are explored: fully random (left) and semi-random (right), Verachtert
[31].
Figure 3.8: Experiments with the nominal fitness evaluation. Two types of
simulation are explored: fully random (left) and semi-random (right), Verachtert
[31].
We can observe that the numeric fitness evaluation performs significantly better than the nominal fitness evaluation (i.e., good or bad fitness). The numeric fitness evaluation is therefore the only type of fitness evaluation considered further in this thesis. We can also notice that the semi-random simulation works better than fully random simulation for this problem. In particular, the UCB1-TUNED selection method with semi-random simulation (and numeric fitness evaluation) performs significantly better than any other tested algorithm. It is therefore the algorithm that will be considered for improvements in the next section. However, before we move to the next section, the selection method used (UCB1-TUNED) needs to be described in more detail. The description of the other selection strategies can be found in the original sources [9, 8, 11, 21, 15, 3], or in [31].
As mentioned earlier, UCB1-TUNED is a variant of Upper Confidence bound applied to Trees (UCT). In the context of the given problem, UCT [21] in its pure form selects nodes based on the average fitness of the evaluated paths passing through the current node and a bias term. The bias term balances exploration and exploitation by increasing the total value used in the selection step for nodes that have been less explored. The selection value function is then:
s(i) = μ_i + C × sqrt(ln(n_p) / n_i)
where i is the node under consideration, p is the parent node of i, μ_i is the mean fitness value of the evaluated paths passing through node i, n_i and n_p are the numbers of times node i and its parent p have been explored, and C is a parameter that has to be determined by the user based on the problem.
UCB1-TUNED [15] is then a variant of UCT that was originally proposed for the multi-armed bandit problem by Auer et al. [3]. The adjusted selection value function in the context of this problem is defined in this strategy as follows:
s(i) = μ_i + C × sqrt((ln(n_p) / n_i) × min(1/4, V_i(n_i)))

where

V_i(n_i) = σ_i² + sqrt(2 ln(n_p) / n_i)
In the formula above, σ_i² is the variance of the fitness values of the evaluated paths passing through node i. In the selection step, the node with the highest value of the selection value function s(i) is selected.
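For reference, the two selection value functions above can be written out directly in code (a sketch; the class and parameter names are ours, and the incremental bookkeeping of the mean and variance is left out):

```java
public class SelectionValues {
    // Plain UCT: mean fitness plus an exploration bias that grows for
    // rarely visited nodes (ni visits, parent visited np times).
    public static double uct(double mean, int ni, int np, double c) {
        return mean + c * Math.sqrt(Math.log(np) / ni);
    }

    // UCB1-TUNED: the bias term is additionally capped using the empirical
    // variance of the fitness values seen at the node (Auer et al. [3]).
    public static double ucb1Tuned(double mean, double variance,
                                   int ni, int np, double c) {
        double v = variance + Math.sqrt(2.0 * Math.log(np) / ni);
        return mean + c * Math.sqrt((Math.log(np) / ni) * Math.min(0.25, v));
    }
}
```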
3.2 Proposed improvements
As discussed in the previous section, the tree representation exponentially increases the complexity of the problem by a factor of d!, where d is the depth of the tree, i.e., by a factor of 11! in this case. Moreover, it is not obvious how to define a tree representation for this problem in which such an increase in complexity can be avoided. Since the algorithms described in the previous section need to search an exponentially expanded search space, it might be useful to try the Random Search algorithm for comparison.
The proposed Random Search algorithm is identical to the fully random simulation step described in the previous section. In other words, we choose the first pair of a blue and a red edge with a common red point (the assignment of a player to a position and the choice of an instruction) at random. This selection makes all other red and blue edges that have the already chosen red point in common no longer valid for further selection. From the remaining edges, we choose the next pair at random, and so on, until a full selection is formed.
We can notice that we can also illustrate this in a graph where we have green edges between two selections. However, the resulting selection is not bound to a tree or graph representation, and the same selection could have been made by following a different path of the tree. All selections happen at random, which makes any of the paths we could follow that would result in the same selection equally probable; i.e., if we look at this in reverse and we have a specific solution that the algorithm presents, any path that leads to that specific solution is equally probable. In other words, despite the fact that we can represent this process as traversing a path in a tree, we avoid the increase in complexity introduced by the tree representation, as the resulting solution is not bound to such a representation and could have been achieved by other means, e.g., by enumerating all possible solutions to the problem and choosing one at random.
When we look at the alternative of enumerating all possible solutions, we can also notice that both approaches, the fully random simulation step and the enumeration of solutions, have an equal probability of collisions, i.e., of generating the same solution twice or more during the same run of the algorithm. Since all paths are equally probable and we have the same number of paths for any solution, choosing the same path twice or more is equally probable as choosing the same solution twice or more from the pool of possible solutions. The algorithms are thus equivalent in terms of the randomness of the generated solutions; however, the implemented Random Search has a tree representation, as it uses the fully random simulation step from the tree search algorithms described earlier, and can be viewed as a variant of the tree search algorithms, i.e., a Random Tree Search.
This is a very interesting insight, as it motivates the possibility of using the Monte Carlo Tree Search algorithm in the context of lineups optimization. It is important to notice that the Monte Carlo Tree Search algorithm is an incremental algorithm and starts with only the root node, while other nodes are created as they are explored. Furthermore, unexplored children of a node have an equal probability of being selected, as they have the same value, which makes the probability of collision small when the problem is sufficiently large with a high branching factor, like the lineups optimization problem, and when a large part of the tree is still unexplored. The more we explore the tree, the more collisions will happen. However, if the algorithm converges quickly enough (i.e., we have a sufficiently large exploitation bias) to a good solution, it might still work very well even with the large redundancy factor. This could also partially explain why semi-random (proportional) simulation worked better in the experiments described in the previous section, as it increases the exploitation bias. We will discuss the type of bias that is created by the semi-random selection strategy in a tree context later in this section, when the new Monte Carlo Tree Search algorithm is proposed.
Another interesting insight is that when we run the Random Search for an equal number of iterations, it will always finish before the Monte Carlo Tree Search algorithm, as it does not have the overhead of creating nodes, executing selection steps, etc. In fact, the Random Search algorithm as described above needs around 4 seconds to execute 50000 iterations, compared to 40 seconds for the Monte Carlo Tree Search on the same Linux machine (both algorithms use the same fitness evaluation function in Java, although the tested Random Search is implemented in Objective-C). Since the algorithms are usually bound by the execution time, a second version of the Random Search is used in the tests for a fair comparison: next to the Random Search that uses the same number of iterations (we refer to this algorithm simply as Random Search), we use a Random Search that runs for approximately the same time on the same machine as the Monte Carlo Tree Search. We refer to this second Random Search algorithm as Random Search Tuned. Both Monte Carlo Tree Search algorithms, the UCB1-TUNED described in the previous section and the new version described later in this section, need around 40 seconds to complete 50000 iterations on the simplified problem as described in the previous section. The Random Search Tuned is then set to 500000 iterations, which results in approximately the same running time.
We can now move to the improvements to the Monte Carlo Tree Search algorithm in the context of the lineups optimization as proposed in this thesis. As discussed earlier, any tree representation in the context of this problem introduces the green edges. If we look at the possible paths in the context of the TSP, fixing one of the cities in the root reduces the redundancy from d × 2 to 2. Thus, we can reduce the complexity of the problem by a factor of d. For the lineups problem, we would likewise reduce the redundancy from d! to (d − 1)! (d is always equal to 11 in the context of lineups optimization, as only the green edges cause the redundancy, i.e., if we have a tree depth of 22, we still have 11 green edges), i.e., we would still have an exponential increase of the redundancy. However, one interesting thing can be observed in the context of lineups optimization. We could, for example, fix the goalkeeper position in the root, as all strong lineups have a goalkeeper assigned. In this case, we would only have paths that include the goalkeeper, i.e., we reduce the pool of possible solutions. If we do not fix the goalkeeper, we must choose 11 positions out of the 14 on the field, giving the following number of combinations:
C(14, 11) = 14! / (11! × (14 − 11)!) = (12 × 13 × 14) / (1 × 2 × 3) = 364
When we fix the goalkeeper position, we have the following number of combinations:
C(13, 10) = 13! / (10! × (13 − 10)!) = (11 × 12 × 13) / (1 × 2 × 3) = 286
In other words, we decrease the complexity by a factor of 11 × 364/286 = 14. This may not seem much, since the redundancy remains exponential, but we also no longer consider solutions without a goalkeeper, which potentially have much lower quality. In other words, we work with the average: nodes that are unlucky and have a solution without a goalkeeper in their path would have their average decreased by it. Even if a given node can lead to good solutions, it will be explored less if its average is lowered by a bad solution. This is not the case in the simplified problem, as all solutions have a goalkeeper since the positions are fixed.
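The two binomial coefficients above are easy to verify with a short computation (a sketch; the multiplicative formula is used so that every intermediate value stays an exact integer):

```java
public class PositionCombinations {
    // Binomial coefficient "n choose k" via the multiplicative formula;
    // each partial product is itself a binomial coefficient, so the
    // integer division is always exact.
    public static long choose(int n, int k) {
        long result = 1;
        for (int i = 1; i <= k; i++) {
            result = result * (n - k + i) / i;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(choose(14, 11)); // 364 position subsets, goalkeeper free
        System.out.println(choose(13, 10)); // 286 position subsets, goalkeeper fixed
    }
}
```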
Finally, we can notice that in the algorithms described in the previous section, the nodes in level one of the tree together contain all of the possible edges (red and blue) that can be used in a solution. In other words, the nodes directly connected to the root contain all possible combinations for the assignments that we can make. In that respect, any node lower in the tree can only contain one of the edges already present in the tree and can be viewed as redundant. Therefore, the proposed algorithm builds a tree of depth one that never grows larger. Since the tree is fixed after the expansion step from the root, the initialization is combined with the expansion step and there is only one expansion step in the proposed algorithm. The whole tree fits in memory and the algorithm can also be used for the problem with full complexity. The initialization/expansion step is shown in figure 3.9.
Figure 3.9: Initialization/expansion step.
As can be seen in figure 3.9, the root node is marked green. It can be visualized in the graph representation as, for example, the drone depot, i.e., a point in the graph that can be located anywhere. From the root we then have green branches (edges) to the child nodes of the root, which each contain one edge, either red or blue. Therefore, the level one nodes are marked red and blue, corresponding to the color of the edge they contain. The choice to separate the red edges from the blue edges reduces the complexity of the tree, as each edge is placed exactly once in the tree. If we do not separate the red and blue edges, we need to store each valid combination of a red and a blue edge separately in the nodes, as is the case in the algorithms described in the previous section. For example, in the lineups optimization problem we have 14 different positions on the field and 44 individual instructions in total (3.14 individual instructions per position on average). Thus we have 44 red nodes in the tree. If we take as an example a club with 22 players, we have 308 blue nodes in the tree. In total, there are 352 nodes in the tree plus the root node. If we did not separate the edges, we would have 968 nodes plus the root in the tree, i.e., 3.14 times more level one nodes. In other words, in the new tree representation as illustrated in figure 3.9, and in the context of lineup optimization, the full tree has 353 nodes in total and thus easily fits in memory.
The next step in a Monte Carlo Tree Search algorithm is the selection step. Since all the nodes that can be part of the solution are already present in the first level, selecting only one node would yield only a part of the solution. Therefore, the algorithm selects all nodes that are part of the solution at once. However, since the Monte Carlo Tree Search algorithm depends on randomness, hence the Monte Carlo name, simply selecting the nodes by maximal value would not produce sufficient randomness. The step that introduces the randomness is the simulation step, which is not yet present in the algorithm. Therefore, in the proposed algorithm the selection and simulation steps are combined into one selection/simulation step. The simulation used is then a semi-random simulation, similar to the semi-random simulation described in the previous section, except that the probability of selecting a node is proportional to its value (the same UCB1-TUNED formula is used as described in the previous section) and not to the VnukStats heuristic valuation as used in the algorithms described in the previous section. The selection/simulation step is illustrated in figure 3.10.
Figure 3.10: Selection/simulation step.
The selection/simulation step works as follows. First, a blue edge is selected from the pool of all blue edges that are a valid choice at the given moment. For example, if we add the constraint that a solution must contain a goalkeeper, we place the goalkeeper position at the root of the selection/simulation step and only blue edges that contain the goalkeeper position are valid for the selection starting from the root. The selection probability is then proportional to the value of the node containing the specific edge (i.e., in this case the selection method is roulette wheel selection). Once a blue edge is selected, a red edge is selected that completes the red and blue edge pair, i.e., we select the red edge from the pool of red edges (containing the specific red point) that are valid at the given moment, as only a red edge can be selected after a blue edge. Likewise, only a blue edge can be selected after a red edge, and only a blue edge that does not contain a point already present in the solution can be considered. The constraints described above can be formalized as a localized constraint that is put on the pool of all possible edges, and only edges that conform to the given constraint are considered for the selection. We then repeat the steps described above until a full solution has been formed. This process can be seen as a recursion with the selection of one edge (or node, i.e., we can replace edge with node in the pseudocode below for an equivalent result in a more general context) as the recursive step:
selectEdge(allEdges, constraint, solution) {
    selectableEdges = constraint.applyOn(allEdges);
    selectedEdge = selectionMethod(selectableEdges);
    solution.add(selectedEdge);
    constraint.updateAccordingTo(solution);
    if (!solution.isComplete()) {
        selectEdge(allEdges, constraint, solution);
    }
}
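A runnable version of this recursion could look as follows (a sketch under simplifying assumptions that are ours: edges are reduced to plain integers, the constraint merely excludes edges already chosen, and the selection is uniformly random; in the RMCTS the selection is proportional to the UCB1-TUNED node values and the constraint mechanism is problem specific):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

public class RecursiveSelection {
    // Recursive step: filter the pool by the constraint, pick one edge,
    // update the solution and the constraint, and recurse until complete.
    public static void selectEdge(List<Integer> allEdges, Set<Integer> excluded,
                                  List<Integer> solution, int targetSize, Random rng) {
        List<Integer> selectable = new ArrayList<>();
        for (int edge : allEdges) {
            if (!excluded.contains(edge)) {
                selectable.add(edge);
            }
        }
        int selected = selectable.get(rng.nextInt(selectable.size()));
        solution.add(selected);
        excluded.add(selected); // the "constraint update" in this toy setting
        if (solution.size() < targetSize) {
            selectEdge(allEdges, excluded, solution, targetSize, rng);
        }
    }
}
```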
The final step is the backpropagation step. This step is exactly the same as in the original algorithm and is illustrated in figure 3.11. We can then summarize the algorithm as beginning with the initialization/expansion step that places the whole tree in memory. Then we execute the selection/simulation and backpropagation steps iteratively until the algorithm stops according to a stop criterion.
Figure 3.11: Backpropagation step.
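Since the UCB1-TUNED formula needs both the mean and the variance of the fitness values seen at a node, the backpropagation step can maintain these incrementally. One possible way is Welford's online algorithm (a sketch; the original implementation may store the statistics differently):

```java
public class NodeStats {
    private int n = 0;        // number of times the node was updated (visits)
    private double mean = 0.0;
    private double m2 = 0.0;  // running sum of squared deviations from the mean

    // Backpropagation update: fold one simulated fitness value into the
    // node's statistics without storing all observed values.
    public void update(double fitness) {
        n++;
        double delta = fitness - mean;
        mean += delta / n;
        m2 += delta * (fitness - mean);
    }

    public int visits() { return n; }
    public double mean() { return mean; }
    public double variance() { return n > 1 ? m2 / n : 0.0; }
}
```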
Because of the recursive selection/simulation step, the proposed algorithm is called Recursive Monte Carlo Tree Search (RMCTS). We can notice that this algorithm is usable in a broader context, as it can be used, for example, for solving the Traveling Salesman Problem. In the TSP context, we would start a tour from a fixed city. That city is then the root constraint that is passed in the call to the recursive selection/simulation step. In that step we can then select any edge that contains the start city. Once the edge is selected, it contains another city that becomes the new start point for the following selection. The constraint is then updated with that city and with an additional constraint stating that edges containing any of the cities already present in the solution are not allowed. We repeat the selection recursively until a full tour is formed. Here too the nodes of the tree contain all possible edges and the tree depth is one, i.e., the whole algorithm remains the same except for the constraint mechanism, which is problem specific. Therefore, as stated in previous chapters, the algorithms proposed in this thesis are inspired by the Traveling Salesman Problem, including the Recursive Monte Carlo Tree Search.
In the context of the lineup optimization, we can then notice that we indeed form a complete route in the selection/simulation step. In other words, the green edges are also passed on that route, and the selection/simulation step has a graph representation (and thus a tree representation) identical to that of the algorithms described in the previous section. This creates the same type of tree bias in all algorithms. One type of bias is that not all solutions are equally probable, as we use proportional selection, and thus we have an exploitation effect. The other appears when we look at the obtained solution: not all paths in the tree are equally probable. Since the simulation is recursive and all possible edges are carried with it (the edges are preselected according to a localized constraint), edges with a high value are more likely to be selected closer to the root. Closer to the leaf, the constraint eliminates more edges, and the selection is done on a smaller pool that possibly no longer contains the strongest edges. This kind of tree bias (some paths are more probable than others) makes the algorithms described in the previous section perform better, as it helps to cope with the redundancy. However, it also has consequences for the probability with which each solution from the pool of solutions is selected, as the edges with the highest value tend to be selected first, i.e., we create a type of greedy bias.
The greedy (tree) bias described above is mitigated in the RMCTS by the tree representation described above (a tree of depth one). Thus, the solution obtained with the selection/simulation step is viewed as a solution in a pool of all possible solutions (the redundancy of d! is removed from the representation; in fact, all possible redundancy is removed) and all nodes used in a solution get updated with the same value, which does not leave any trace of how the particular solution was obtained. Nevertheless, we can identify another risk with this approach. We can find by chance a quite good solution early in the run of the algorithm and it can dominate the selection process, i.e., we could have a snowball effect. In particular, the roulette wheel selection method used here is known for having a high selection pressure and can cause a snowball effect. It is then interesting to investigate lowering the selection pressure. The proposed approach uses the K-Tournament selection method with K set to 2. In this method we choose two nodes at random and select the one with the higher value.
We can notice that this approach not only reduces the selection pressure, i.e., increases the exploration, but is also much more efficient. Whereas the roulette wheel selection requires computing the sum of all values and thus has linear time complexity (proportional to the number of nodes being considered for selection), the K-Tournament can be executed in constant time, regardless of the size of the tree. Also, since this increases the exploration and decreases the exploitation, this method potentially requires more iterations to converge to a good solution. For a fair comparison, this method is therefore set to the number of iterations that can be executed in the same time as the other methods, i.e., we stick to the 40 seconds limit and set the number of iterations to 150000 for this method. We call this algorithm the Recursive Monte Carlo Tree Search Tuned (RMCTS-Tuned).
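The 2-tournament selection is only a few lines of code (a sketch; the class name is ours, and the node values would be the UCB1-TUNED selection values):

```java
import java.util.List;
import java.util.Random;

public class TournamentSelection {
    // K-tournament with K = 2: draw two nodes uniformly at random and keep
    // the one with the higher value. Constant time per selection, whereas
    // roulette wheel selection must sum over all node values.
    public static int select(List<Double> values, Random rng) {
        int a = rng.nextInt(values.size());
        int b = rng.nextInt(values.size());
        return values.get(a) >= values.get(b) ? a : b;
    }
}
```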
Finally, we need to define how a solution from a single run of an algorithm is
selected. We could follow the definition from the previous section and select the nodes
having the highest value to form the final solution. However, this is an optimization
problem, and we are interested in the best solution found. Therefore, the algorithms
discussed in this section (Random Search (RS), Random Search Tuned (RS-Tuned),
RMCTS and RMCTS-Tuned) return the best solution found during the run of the
algorithm.
3.3 Comparison of the MCTS algorithms for Hattrick
Before we start with the experiments, it is worth briefly discussing the tuning of the C parameter as used in the formulas given in section 3.1. As stated in section 3.1, this parameter is problem specific and can be tuned empirically (using experimentation) for a given problem. In the context of lineups optimization, each match is a different problem, as the average fitness value and the variance would be different for each match. However, even simple tuning gives a huge advantage to the Random Search Tuned algorithm, as it can potentially scan 500000 solutions for each run of the Monte Carlo Tree Search algorithm. For example, if we tried only 8 different values for this parameter and ran 10 experiments per value, 80 experiments in total, Random Search Tuned could scan 40 million solutions in the corresponding time. This is also the complexity of the simplified problem as discussed before.
Therefore, at that point it is better to replace the Random Search with a Complete Search that would give the globally optimal solution for the given simplified problem. This is not the case for the full problem, as the complexity is much larger and the computation of the exact solution is not feasible in an acceptable computation time. Nevertheless, the parameter C is set to 1 in all experiments described in this section, the same value as used in the experiments described in section 3.1. This also guarantees a fair comparison between the algorithms. On the other hand, the value 1 for the C parameter might seem unfair for the UCT algorithm described in section 3.1. We can notice that when the variance of the fitness is large for a given node in the UCB1-TUNED algorithm, the term min(1/4, V_i(n_i)) would often evaluate to 1/4, since

V_i(n_i) = σ_i² + sqrt(2 ln(n_p) / n_i)
Therefore, setting the value of C to 1/2 might boost the performance of the UCT variant. However, this remains outside the scope of this thesis, as we focus only on the UCB1-TUNED algorithm, the best algorithm from the experiments described in section 3.1.
Furthermore, each algorithm is run 100 times in the experiments described in this section. This includes the Random Search Tuned algorithm. It then has a very good probability of finding the exact solution for the given simplified problem, even more so since the optimal solution might not be unique in the context of the Hattrick problem. This was observed in the experimentation, where the algorithms often return 5 or more unique solutions that have the same fitness value, which in turn makes the probability of having more than one globally optimal solution quite large.
We then start with the simplified problem as described before, such that we can compare the algorithms proposed in the previous section (RS, RS-Tuned, RMCTS and RMCTS-Tuned) with the UCB1-TUNED algorithm described in section 3.1. We can then also use the quality measure described in that section for comparing the results. However, instead of aggregating the results over many different matches, we focus on three different cases: one match against a weak opponent, one against a strong opponent and one against an opponent of similar strength. We can then analyze how the specific algorithms perform in these specific situations; we could, for example, detect when a specific algorithm would benefit from tuning.
The first case that we consider is the match against a similar opponent. For that purpose, we select one of the matches from the experiments described in section 3.1 whose baseline solution has a fitness value close to 0. The selected match has a baseline solution with fitness 0.26 and thus qualifies as a match against a similar opponent, i.e., the score difference (fitness value) is close to zero. This is a very important case, as increasing the score difference (i.e., the fitness value) would make the probability of winning higher [31]. The box-plots of the results are shown in figure 3.12. The results are summarized in table 3.1.
Figure 3.12: Box-plots of the results against a similar opponent, simplified problem.

Algorithm                        mean      variance  quality (%)
Random Search (RS)               1.428983  0.0148    448.10
Random Search Tuned (RS-Tuned)   1.585576  0.0012    508.16
UCB1-TUNED                       1.576269  0.0045    504.59
RMCTS                            1.618149  0.0006    520.66
RMCTS-Tuned                      1.656401  0.0001    535.33

Table 3.1: Experiment results against a similar opponent, simplified problem.

The RMCTS-Tuned is clearly the best algorithm in this test and produces a quality of 535%. We can also notice that the Random Search Tuned (RS-Tuned) has a larger mean fitness (and thus quality) than the UCB1-TUNED. However, the difference is small and not statistically significant: the p-value from a two-sided t-test with 95% confidence is 0.2243, hence there is no significant difference between the mean fitness of the RS-Tuned and the UCB1-TUNED. The RMCTS has a better mean fitness than the Random Search Tuned (p-value 1.639 × 10^−12) and a better mean fitness than the UCB1-TUNED (p-value 4.803 × 10^−8). The best algorithm in this test is the RMCTS-Tuned, with a mean fitness value significantly better than that of the RMCTS (p-value lower than 2.2 × 10^−16, which cannot be computed exactly).
We can notice that we have obtained a very large quality in this test. The average quality of the UCB1-TUNED was 70% in the tests described in section 3.1, while in this test (it is only one of the matches considered in the experiments explained in section 3.1) we get almost 505% for this method. This is easily explained by the fact that the baseline fitness is close to zero. As explained earlier, this is an important case, as the higher predicted score difference makes the probability of winning higher and thus pulls the average quality of the obtained solutions higher. In other words, we can expect the quality of the solutions to drop significantly when we run experiments against strong or weak opponents, even if the absolute value of the difference between the baseline fitness and the fitness of the obtained solution is as high as in this experiment (around one goal difference).
We can now move to the case against a strong opponent. For this we choose one of the matches with a low baseline fitness. The chosen match has a baseline fitness of −3.465714. This is also an important case, as when we obtain a sufficiently high quality of the solution, our team might win the match. In order to reach 0 fitness, i.e., a zero score difference, we need 100% quality. In other words, we need more than 100% quality to win the match. The box-plots of the results from that experiment are shown in figure 3.13. The results are summarized in table 3.2.
Figure 3.13: Box-plots of the results against a strong opponent, simplified problem.

Algorithm                        mean       variance  quality (%)
Random Search (RS)               −2.241449  0.0074    35.33
Random Search Tuned (RS-Tuned)   −2.128714  0.0009    38.58
UCB1-TUNED                       −2.108591  0.0034    39.16
RMCTS                            −2.099439  0.0004    39.42
RMCTS-Tuned                      −2.086433  0         39.80

Table 3.2: Experiment results against a strong opponent, simplified problem.
From the results we can notice that this time the UCB1-TUNED has a better
quality than the Random Search Tuned. In fact, the mean fitness values of these
algorithms are statistically significantly different (the p-value is 0.002817).
Furthermore, it performs as well as the RMCTS algorithm, as there is no significant
difference between the mean fitness values of these two algorithms (the p-value is
0.1419). Nevertheless, the RMCTS-Tuned is again the best algorithm, with a mean
value significantly different from that of the UCB1-TUNED (the p-value is 0.0002779)
and of the RMCTS (the p-value is 1.293 × 10^−9). Interestingly, the RMCTS-Tuned
algorithm returned a solution with the same fitness value (−2.086433) in each of its
100 runs and thus has zero variance.
We can also notice, as expected, that the quality of the obtained solutions is
much lower than in the previous test. Here too the improvement in the predicted
score difference amounts to around one goal, but the quality remains below 40%,
even for the RMCTS-Tuned algorithm, which seems to have returned a globally
optimal solution in each run.
We can now move to the last test with the simplified problem, the match against
a weak opponent. This time the baseline solution for the chosen match has a fitness
value of 4.684158. This is a rather large value, so we expect only a very small gain in the
quality of the solutions. The box-plots of the results from that experiment are shown
in figure 3.14. The results are summarized in table 3.3.
Figure 3.14: Box-plots of the results against weak opponent, simplified problem.
Algorithm                       mean      variance   quality (%)
Random Search (RS)              5.816697  0.0033     24.18
Random Search Tuned (RS-Tuned)  5.906818  0.0009     26.10
UCB1-TUNED                      5.790016  0.0002     23.61
RMCTS                           5.939261  0.0004     26.79
RMCTS-Tuned                     5.962179  0.00003    27.28
Table 3.3: Experiment results against weak opponent, simplified problem.
In this test too, the RMCTS-Tuned is the best algorithm. It is significantly
better than the runner-up RMCTS (the p-value is lower than 2.2 × 10^−16). We can
also notice the strange behavior of the UCB1-TUNED: it is worse than the Random
Search in this test (the p-value is 1.394 × 10^−5). It seems to converge to a specific
solution, and it is the only algorithm in this test that did not find the optimal value
that the other algorithms found. Nevertheless, the differences in quality between
all the algorithms are small.
We can also investigate the average quality of the solutions generated by the
algorithms in the three tests. The qualities are shown in table 3.4.
We can notice that the Random Search produces the lowest-quality solutions,
with around 20% lower quality than the UCB1-TUNED. Nevertheless, it remains
quite close to the other algorithms and might have ranked quite high in the tests
described in section 3.1 among the other algorithms tested in this section. We can
also notice that the Random Search Tuned has an almost 2% higher quality than
the UCB1-TUNED. However, as seen in the separate tests, this difference might not
be statistically significant (it was not significant in one case and significant in two
cases).
34
thesis_Eryk_Kulikowski
thesis_Eryk_Kulikowski
thesis_Eryk_Kulikowski
thesis_Eryk_Kulikowski
thesis_Eryk_Kulikowski
thesis_Eryk_Kulikowski
thesis_Eryk_Kulikowski
thesis_Eryk_Kulikowski
thesis_Eryk_Kulikowski
thesis_Eryk_Kulikowski
thesis_Eryk_Kulikowski
thesis_Eryk_Kulikowski
thesis_Eryk_Kulikowski
thesis_Eryk_Kulikowski
thesis_Eryk_Kulikowski
thesis_Eryk_Kulikowski
thesis_Eryk_Kulikowski
thesis_Eryk_Kulikowski
thesis_Eryk_Kulikowski
thesis_Eryk_Kulikowski
thesis_Eryk_Kulikowski
thesis_Eryk_Kulikowski
thesis_Eryk_Kulikowski
thesis_Eryk_Kulikowski
thesis_Eryk_Kulikowski
thesis_Eryk_Kulikowski
thesis_Eryk_Kulikowski
thesis_Eryk_Kulikowski
thesis_Eryk_Kulikowski
thesis_Eryk_Kulikowski
thesis_Eryk_Kulikowski
thesis_Eryk_Kulikowski
thesis_Eryk_Kulikowski
thesis_Eryk_Kulikowski
thesis_Eryk_Kulikowski
thesis_Eryk_Kulikowski
thesis_Eryk_Kulikowski
thesis_Eryk_Kulikowski
thesis_Eryk_Kulikowski
thesis_Eryk_Kulikowski
thesis_Eryk_Kulikowski
thesis_Eryk_Kulikowski

More Related Content

What's hot

M.Tech_Thesis _surendra_singh
M.Tech_Thesis _surendra_singhM.Tech_Thesis _surendra_singh
M.Tech_Thesis _surendra_singhsurendra singh
 
Best of numerical
Best of numericalBest of numerical
Best of numericalCAALAAA
 
FDTD-FEM Hybrid
FDTD-FEM HybridFDTD-FEM Hybrid
FDTD-FEM Hybridkickaplan
 
Numerical methods by Jeffrey R. Chasnov
Numerical methods by Jeffrey R. ChasnovNumerical methods by Jeffrey R. Chasnov
Numerical methods by Jeffrey R. Chasnovankushnathe
 
Machine learning solutions for transportation networks
Machine learning solutions for transportation networksMachine learning solutions for transportation networks
Machine learning solutions for transportation networksbutest
 
Pawar-Ajinkya-MASc-MECH-December-2016
Pawar-Ajinkya-MASc-MECH-December-2016Pawar-Ajinkya-MASc-MECH-December-2016
Pawar-Ajinkya-MASc-MECH-December-2016Ajinkya Pawar
 
All Minimal and Maximal Open Single Machine Scheduling Problems Are Polynomia...
All Minimal and Maximal Open Single Machine Scheduling Problems Are Polynomia...All Minimal and Maximal Open Single Machine Scheduling Problems Are Polynomia...
All Minimal and Maximal Open Single Machine Scheduling Problems Are Polynomia...SSA KPI
 
Study of different approaches to Out of Distribution Generalization
Study of different approaches to Out of Distribution GeneralizationStudy of different approaches to Out of Distribution Generalization
Study of different approaches to Out of Distribution GeneralizationMohamedAmineHACHICHA1
 
vector spaces algebras geometries
vector spaces algebras geometriesvector spaces algebras geometries
vector spaces algebras geometriesRichard Smith
 
The analysis of doubly censored survival data
The analysis of doubly censored survival dataThe analysis of doubly censored survival data
The analysis of doubly censored survival dataLonghow Lam
 
Notes for GNU Octave - Numerical Programming - for Students 01 of 02 by Arun ...
Notes for GNU Octave - Numerical Programming - for Students 01 of 02 by Arun ...Notes for GNU Octave - Numerical Programming - for Students 01 of 02 by Arun ...
Notes for GNU Octave - Numerical Programming - for Students 01 of 02 by Arun ...ssuserd6b1fd
 

What's hot (15)

M.Tech_Thesis _surendra_singh
M.Tech_Thesis _surendra_singhM.Tech_Thesis _surendra_singh
M.Tech_Thesis _surendra_singh
 
Best of numerical
Best of numericalBest of numerical
Best of numerical
 
FDTD-FEM Hybrid
FDTD-FEM HybridFDTD-FEM Hybrid
FDTD-FEM Hybrid
 
Machine learning-cheat-sheet
Machine learning-cheat-sheetMachine learning-cheat-sheet
Machine learning-cheat-sheet
 
Differential equations
Differential equationsDifferential equations
Differential equations
 
Numerical methods by Jeffrey R. Chasnov
Numerical methods by Jeffrey R. ChasnovNumerical methods by Jeffrey R. Chasnov
Numerical methods by Jeffrey R. Chasnov
 
Machine learning solutions for transportation networks
Machine learning solutions for transportation networksMachine learning solutions for transportation networks
Machine learning solutions for transportation networks
 
Pawar-Ajinkya-MASc-MECH-December-2016
Pawar-Ajinkya-MASc-MECH-December-2016Pawar-Ajinkya-MASc-MECH-December-2016
Pawar-Ajinkya-MASc-MECH-December-2016
 
All Minimal and Maximal Open Single Machine Scheduling Problems Are Polynomia...
All Minimal and Maximal Open Single Machine Scheduling Problems Are Polynomia...All Minimal and Maximal Open Single Machine Scheduling Problems Are Polynomia...
All Minimal and Maximal Open Single Machine Scheduling Problems Are Polynomia...
 
Study of different approaches to Out of Distribution Generalization
Study of different approaches to Out of Distribution GeneralizationStudy of different approaches to Out of Distribution Generalization
Study of different approaches to Out of Distribution Generalization
 
t
tt
t
 
vector spaces algebras geometries
vector spaces algebras geometriesvector spaces algebras geometries
vector spaces algebras geometries
 
The analysis of doubly censored survival data
The analysis of doubly censored survival dataThe analysis of doubly censored survival data
The analysis of doubly censored survival data
 
Notes for GNU Octave - Numerical Programming - for Students 01 of 02 by Arun ...
Notes for GNU Octave - Numerical Programming - for Students 01 of 02 by Arun ...Notes for GNU Octave - Numerical Programming - for Students 01 of 02 by Arun ...
Notes for GNU Octave - Numerical Programming - for Students 01 of 02 by Arun ...
 
Non omniscience
Non omniscienceNon omniscience
Non omniscience
 

Similar to thesis_Eryk_Kulikowski

Measuring Aspect-Oriented Software In Practice
Measuring Aspect-Oriented Software In PracticeMeasuring Aspect-Oriented Software In Practice
Measuring Aspect-Oriented Software In PracticeHakan Özler
 
Big Data and the Web: Algorithms for Data Intensive Scalable Computing
Big Data and the Web: Algorithms for Data Intensive Scalable ComputingBig Data and the Web: Algorithms for Data Intensive Scalable Computing
Big Data and the Web: Algorithms for Data Intensive Scalable ComputingGabriela Agustini
 
A Comparative Study Of Generalized Arc-Consistency Algorithms
A Comparative Study Of Generalized Arc-Consistency AlgorithmsA Comparative Study Of Generalized Arc-Consistency Algorithms
A Comparative Study Of Generalized Arc-Consistency AlgorithmsSandra Long
 
Aspect_Category_Detection_Using_SVM
Aspect_Category_Detection_Using_SVMAspect_Category_Detection_Using_SVM
Aspect_Category_Detection_Using_SVMAndrew Hagens
 
Fundamentals of computational fluid dynamics
Fundamentals of computational fluid dynamicsFundamentals of computational fluid dynamics
Fundamentals of computational fluid dynamicsAghilesh V
 
Classification System for Impedance Spectra
Classification System for Impedance SpectraClassification System for Impedance Spectra
Classification System for Impedance SpectraCarl Sapp
 
biometry MTH 201
biometry MTH 201 biometry MTH 201
biometry MTH 201 musadoto
 
Mth201 COMPLETE BOOK
Mth201 COMPLETE BOOKMth201 COMPLETE BOOK
Mth201 COMPLETE BOOKmusadoto
 
Viktor Kajml diploma thesis
Viktor Kajml diploma thesisViktor Kajml diploma thesis
Viktor Kajml diploma thesiskajmlv
 
Mathematical modeling models, analysis and applications ( pdf drive )
Mathematical modeling  models, analysis and applications ( pdf drive )Mathematical modeling  models, analysis and applications ( pdf drive )
Mathematical modeling models, analysis and applications ( pdf drive )UsairamSheraz
 
Ric walter (auth.) numerical methods and optimization a consumer guide-sprin...
Ric walter (auth.) numerical methods and optimization  a consumer guide-sprin...Ric walter (auth.) numerical methods and optimization  a consumer guide-sprin...
Ric walter (auth.) numerical methods and optimization a consumer guide-sprin...valentincivil
 
Reading Materials for Operational Research
Reading Materials for Operational Research Reading Materials for Operational Research
Reading Materials for Operational Research Derbew Tesfa
 
Distributed Decision Tree Learning for Mining Big Data Streams
Distributed Decision Tree Learning for Mining Big Data StreamsDistributed Decision Tree Learning for Mining Big Data Streams
Distributed Decision Tree Learning for Mining Big Data StreamsArinto Murdopo
 
Lecturenotesstatistics
LecturenotesstatisticsLecturenotesstatistics
LecturenotesstatisticsRekha Goel
 

Similar to thesis_Eryk_Kulikowski (20)

Measuring Aspect-Oriented Software In Practice
Measuring Aspect-Oriented Software In PracticeMeasuring Aspect-Oriented Software In Practice
Measuring Aspect-Oriented Software In Practice
 
Big Data and the Web: Algorithms for Data Intensive Scalable Computing
Big Data and the Web: Algorithms for Data Intensive Scalable ComputingBig Data and the Web: Algorithms for Data Intensive Scalable Computing
Big Data and the Web: Algorithms for Data Intensive Scalable Computing
 
Big data-and-the-web
Big data-and-the-webBig data-and-the-web
Big data-and-the-web
 
A Comparative Study Of Generalized Arc-Consistency Algorithms
A Comparative Study Of Generalized Arc-Consistency AlgorithmsA Comparative Study Of Generalized Arc-Consistency Algorithms
A Comparative Study Of Generalized Arc-Consistency Algorithms
 
Aspect_Category_Detection_Using_SVM
Aspect_Category_Detection_Using_SVMAspect_Category_Detection_Using_SVM
Aspect_Category_Detection_Using_SVM
 
Fundamentals of computational fluid dynamics
Fundamentals of computational fluid dynamicsFundamentals of computational fluid dynamics
Fundamentals of computational fluid dynamics
 
Classification System for Impedance Spectra
Classification System for Impedance SpectraClassification System for Impedance Spectra
Classification System for Impedance Spectra
 
biometry MTH 201
biometry MTH 201 biometry MTH 201
biometry MTH 201
 
HonsTokelo
HonsTokeloHonsTokelo
HonsTokelo
 
main
mainmain
main
 
Mak ms
Mak msMak ms
Mak ms
 
Mth201 COMPLETE BOOK
Mth201 COMPLETE BOOKMth201 COMPLETE BOOK
Mth201 COMPLETE BOOK
 
Viktor Kajml diploma thesis
Viktor Kajml diploma thesisViktor Kajml diploma thesis
Viktor Kajml diploma thesis
 
Mathematical modeling models, analysis and applications ( pdf drive )
Mathematical modeling  models, analysis and applications ( pdf drive )Mathematical modeling  models, analysis and applications ( pdf drive )
Mathematical modeling models, analysis and applications ( pdf drive )
 
Ric walter (auth.) numerical methods and optimization a consumer guide-sprin...
Ric walter (auth.) numerical methods and optimization  a consumer guide-sprin...Ric walter (auth.) numerical methods and optimization  a consumer guide-sprin...
Ric walter (auth.) numerical methods and optimization a consumer guide-sprin...
 
Jmetal4.5.user manual
Jmetal4.5.user manualJmetal4.5.user manual
Jmetal4.5.user manual
 
Reading Materials for Operational Research
Reading Materials for Operational Research Reading Materials for Operational Research
Reading Materials for Operational Research
 
thesis
thesisthesis
thesis
 
Distributed Decision Tree Learning for Mining Big Data Streams
Distributed Decision Tree Learning for Mining Big Data StreamsDistributed Decision Tree Learning for Mining Big Data Streams
Distributed Decision Tree Learning for Mining Big Data Streams
 
Lecturenotesstatistics
LecturenotesstatisticsLecturenotesstatistics
Lecturenotesstatistics
 

thesis_Eryk_Kulikowski

  • 1. Learning optimal line-ups for Soccer Games Ir. Eryk Kulikowski Thesis submitted for the degree of Master of Science in Artificial Intelligence, option Engineering and Computer Science Thesis supervisor: Prof. dr. Jesse Davis Assessors: Prof. dr. ir. Dirk Roose Ir. Jan Van Haaren Mentors: Ir. Aäron Verachtert Ir. Jan Van Haaren Academic year 2014 – 2015
  • 2. c Copyright KU Leuven Without written permission of the thesis supervisor and the author it is forbidden to reproduce or adapt in any form or by any means any part of this publication. Requests for obtaining the right to reproduce or utilize parts of this publication should be addressed to the Departement Computerwetenschappen, Celestijnenlaan 200A bus 2402, B-3001 Heverlee, +32-16-327700 or by email info@cs.kuleuven.be. A written permission of the thesis supervisor is also required to use the methods, products, schematics and programs described in this work for industrial or commercial use, and for submitting this publication in scientific contests.
  • 3. Preface I would like to thank my supervisor professor Jesse Davis for giving me the opportunity to work on this thesis. I would also like to thank my mentors, Jan Van Haaren and Aäron Verachtert for their advise, and especially Aäron Verachtert for making his thesis and his code available as a starting point for this thesis. I would also like to thank my wife Debby Van Dam and other family members and friends for their encouragement and support. Ir. Eryk Kulikowski i
  • 4. Contents Preface i Abstract iii List of Figures and Tables iv 1 Introduction 1 1.1 Background and motivations . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Objectives of the thesis . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.3 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 2 Analysis of the Problem 3 2.1 Drone delivering packages routing problem . . . . . . . . . . . . . . 4 2.2 Solution methodologies for routing optimization problems . . . . . . 11 2.3 Chosen approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 2.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 3 Monte Carlo Tree Search (MCTS) Approach 13 3.1 Existing work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 3.2 Proposed improvements . . . . . . . . . . . . . . . . . . . . . . . . . 23 3.3 Comparison of the MCTS algorithms for Hattrick . . . . . . . . . . . 30 3.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 4 Genetic Algorithm Approach 37 4.1 A brief introduction to Genetic Algorithm . . . . . . . . . . . . . . . 37 4.2 Proposed Mutation operators . . . . . . . . . . . . . . . . . . . . . . 40 4.3 Proposed Crossover operators . . . . . . . . . . . . . . . . . . . . . . 43 4.4 Comparison with the MCTS algorithms . . . . . . . . . . . . . . . . 57 4.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 5 Conclusion 65 A The Evaluation Experiment Results 69 Bibliography 73 ii
  • 5. Abstract This thesis describes the lineup optimization problem from a new perspective, where the link with the combinatorial problems and the TSP problem in particular becomes apparent. From this new perspective the graph representation of the problem is derived and then consequently used throughout the whole thesis. That graph representation is then used to identify the weaknesses in the existing algorithm [31] for lineups optimization. These weaknesses are then addressed and a new variant of Monte Carlo Tree Search, the Recursive Monte Carlo Tree Search (RMCTS) algorithm is derived that runs on the problem in full complexity. The proposed algorithm is described in general terms and could be used in other combinatorial problems, e.g., the Traveling Salesman Problem. This thesis also describes a greedy search variant, the Greedy Search Tuned algorithm, that performs very well in the context of Hattrick lineups optimization. This algorithm performs better than all variants of the Monte Carlo Tree Search and is only outperformed by the genetic algorithms described in this thesis. Furthermore, this thesis investigates the possibility of using machine learning algorithms for improvement of the performance of the genetic algorithms. This investigation is done using the Rechenberg’s success rule for labeling of the offspring where two methods are derived for using these labels in the repair process. One of the methods uses the Naive Bayes approach while the other method derives a predictive formula for valuation of the building blocks in the repair process. These methods are then shown to be applicable in other combinatorial problems like Traveling Salesman Problem and job shop scheduling, possibly also beyond the repair step. As an alternative to the use of the Rechenberg’s success rule an other algorithm, i.e., the Recursive Monte Carlo Tree Search proposed in this thesis, is investigated in the context of the repair step. iii
  • 6. List of Figures and Tables List of Figures 2.1 Vertices from the graph representation of the problem. . . . . . . . . . . 6 2.2 Equivalent solutions of the problem. . . . . . . . . . . . . . . . . . . . . 8 2.3 Example solutions showing only relevant edges. . . . . . . . . . . . . . . 8 3.1 Canadian traveler problem graph example, Bnaya et al. [6]. . . . . . . . 14 3.2 Tree representation of the problem. . . . . . . . . . . . . . . . . . . . . . 15 3.3 Equivalent tours in the tree representation. . . . . . . . . . . . . . . . . 16 3.4 Equivalent tours in the tree representation. . . . . . . . . . . . . . . . . 17 3.5 Different solutions of the problem. . . . . . . . . . . . . . . . . . . . . . 18 3.6 Outline of a Monte-Carlo Tree Search, Chaslot et al. [10]. . . . . . . . . 20 3.7 Experiments with the numeric fitness evaluation. Two types of simulation are explored: fully random (left) and semi-random (right), Verachtert [31]. 22 3.8 Experiments with the nominal fitness evaluation. Two types of simulation are explored: fully random (left) and semi-random (right), Verachtert [31]. 22 3.9 Initialization/expansion step. . . . . . . . . . . . . . . . . . . . . . . . . 26 3.10 Selection/simulation step. . . . . . . . . . . . . . . . . . . . . . . . . . . 27 3.11 Backpropagation step. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 3.12 Box-plots of the results against similar opponent, simplified problem. . . 32 3.13 Box-plots of the results against strong opponent, simplified problem. . . 33 3.14 Box-plots of the results against weak opponent, simplified problem. . . . 34 4.1 Individual. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38 4.2 The mutation operator for the simplified problem. . . . . . . . . . . . . 40 4.3 The swap two players mutation operator. . . . . . . . . . . . . . . . . . 41 4.4 The change behavior mutation operator. . . . . . . . . . . . . . . . . . . 41 4.5 The change position mutation operator. . . . . . . 
. . . . . . . . . . . . 42 4.6 Crossover. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 4.7 Box-plots of the results against similar opponent. . . . . . . . . . . . . . 54 4.8 Box-plots of the results against strong opponent. . . . . . . . . . . . . . 54 4.9 Box-plots of the results against weak opponent. . . . . . . . . . . . . . . 55 4.10 Box-plots of the results from the TSP experiment. . . . . . . . . . . . . 57 4.11 Box-plots of the results against similar opponent. . . . . . . . . . . . . . 59 iv
  • 7. 4.12 Box-plots of the results against strong opponent. . . . . . . . . . . . . . 59 4.13 Box-plots of the results against weak opponent. . . . . . . . . . . . . . . 60 4.14 Box-plot of the paired difference between GA and GST (GA-GST) from the evaluation experiment. . . . . . . . . . . . . . . . . . . . . . . . . . . 61 List of Tables 3.1 Experiment results against similar opponent, simplified problem. . . . . 32 3.2 Experiment results against strong opponent, simplified problem. . . . . 33 3.3 Experiment results against weak opponent, simplified problem. . . . . . 34 3.4 Average quality of the solutions generated by the tested algorithms, simplified problem. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 4.1 Exemplary edge map of the parent tours for an ERX operator, Affenzeller et al. [1]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 4.2 Experiment results against similar opponent, simplified problem. . . . . 52 4.3 Experiment results against strong opponent, simplified problem. . . . . 52 4.4 Experiment results against weak opponent, simplified problem. . . . . . 53 4.5 Experiment results against similar opponent. . . . . . . . . . . . . . . . 53 4.6 Experiment results against strong opponent. . . . . . . . . . . . . . . . . 55 4.7 Experiment results against weak opponent. . . . . . . . . . . . . . . . . 55 4.8 Experiment results from the TSP experiment. . . . . . . . . . . . . . . . 57 4.9 Experiment results against similar opponent. . . . . . . . . . . . . . . . 58 4.10 Experiment results against strong opponent. . . . . . . . . . . . . . . . . 60 4.11 Experiment results against weak opponent. . . . . . . . . . . . . . . . . 60 A.1 The evaluation experiment results. . . . . . . . . . . . . . . . . . . . . . 72 v
  • 8.
  • 9. Chapter 1 Introduction 1.1 Background and motivations This thesis investigates algorithms for lineup optimization for soccer games in the context of Hattrick online football management game [18]. One of the tasks in that game is defining the lineups for the upcoming game, where choosing the right set of players and placing them strategically on the field can influence the outcome of the game, i.e., it can make the difference between loosing and winning. Hattrick also allows giving individual instructions to the players on the field, which further increases the complexity of choosing a well performing lineup. The choice for using an online game is convenient since all possibilities for the lineup specification are well defined within the game rules. Furthermore, an initial work has been done in that area [31], where large amount of data was collected and a tool was made that can predict the outcome of the game based on the information on the opponent and the lineup defined for the own team. This allows to evaluate the proposed lineups and thus design algorithms that learn the optimal lineups for specific Hattrick games. However, the proposed algorithms in that context could be also used in the context of different football management games for lineups optimization, and possibly even in the context of real-life soccer games. Nevertheless, the evaluation of proposed lineups remains a hard problem on itself for real soccer games and is out of scope in this thesis. Furthermore, lineups optimization is a combinatorial problem. As a consequence, not only the algorithms for lineups optimization can borrow ideas from the algorithms in the context of combinatorial problems, but also the algorithms designed for lineups optimization could inspire new ideas for solving other combinatorial problems. 
Therefore, this thesis takes a broader perspective on the lineups optimizations and views them in the light of the combinatorial problems, where the Traveling Salesman Problem (TSP) gets special attention in this thesis. In the Traveling Salesman Problem [36] the goal is to find the shortest possible route visiting all cities from the list of given cities. In this problem the distances between each pair of cities are given and each city must be visited exactly once. The tour then originates and ends in the same city. It is one of the classical problems in 1
  • 10. 1. Introduction combinatorial optimization and it is important in theoretical computer science and operations research. 1.2 Objectives of the thesis As mentioned earlier, initial work has been already done in the context of lineups optimizations for Hattrick games [31]. That work includes definition of several variants of Monte Carlo Tree Search algorithm for lineups optimization. However, these algorithms have a drawback of large memory requirements and as a consequence can only be used in the simplified version of the problem of lineups optimization. The first objective of this thesis is then to investigate the possibility of improving the existing algorithms such that they can also be used on the problem with full complexity. The second objective of this thesis is to propose new algorithms for lineups optimization. For this part, the possibility of using genetic algorithms is investigated. The proposed algorithms should be then usable in the context of the problem in full complexity and run in acceptable computation time. The third objective is then to evaluate the performance of the algorithms in terms of the quality of the obtainable solutions within an acceptable computation time. All algorithms described in this thesis are then evaluated in the context of chosen specific cases, where the best algorithms will be evaluated in a more realistic context using the actual teams from the Hattrick competition. 1.3 Overview A good understanding of the given problem is important. Therefore chapter 2 analyzes the given problem in more details and makes the link to other combinatorial problems. This chapter also proposes a graph representation of the given problem that will be used throughout the whole thesis. The existing work is studied in more detail in chapter 3, where also the improvements to the existing algorithms are proposed. The resulting new algorithms are compared with the existing algorithms in the context of the simplified problem in that chapter. 
The usability of genetic algorithm in the context of the given problem is then studied in chapter 4. This chapter also evaluates the best algorithms in the context of the actual teams and matches from the Hattrick competition. Chapter 5 concludes this thesis. 2
  • 11. Chapter 2 Analysis of the Problem NP-hard (Non-deterministic Polynomial-time hard) problems [35] can be solved using different techniques. When solving an NP-hard problem, this problem is often modeled as one of the known problems, e.g., Traveling Salesman Problem, Knapsack Problem, or Nursing Scheduling Problem, and than it is solved using one of the techniques known to produce good solutions for that problem. The goal of this chapter is then to model the given problem, i.e., optimizing the soccer lineups in the context of the Hattrick setting, in terms that correspond more to the classical combinatorial problems. Doing that permits to gain new insights, where the link between the given problem and the other NP-hard problems becomes more clear. This new insights can be then used in the following chapters for designing better algorithms, either based on Monte Carlo or Genetic Algorithm techniques. The chosen approach for this thesis is to model the given problem as a routing problem. The choice of how the problem is modeled has influence on the performance of the algorithms, as resulting representations can be handled differently with algorithms specifically designed for these representations. Also, the NP-hard problems are related with each other and it is commonly assumed that finding an efficient solution to one problem can lead to efficient solutions for the other [35]. Therefore, it is a good idea to keep in mind how this type of problems are solved in different contexts. For example, one of the studies [4] shows how capacitated vehicle routing problems and job shop scheduling problems are related and how insights from these methods can improve the algorithmic performance on both types of problems (although, that article focuses mainly on the vehicle routing problems). As it should be clear after reading this chapter, both of these types of problems are related to the soccer lineup optimization problem. 
Also the bin packing problem [34], or more in general cutting stock problem, or more specifically knapsack problem, are related to the given problem, etc. Nevertheless, as it can be seen later in this text, very interesting insights can be gained with looking at the problem of finding optimal lineups as a Traveling Salesman Problem [36]. It is a well studied problem, especially in the context of the genetic algorithms [1], where it is one of the classical problems. Also Monte Carlo 3
  • 12. 2. Analysis of the Problem methods have been studied in this context [25, 23, 5, 30, 7]. When solving the lineups problem in the context of genetic algorithms, very good operators can be used (as shown later) based on the operators designed for the Traveling Salesman Problem (TSP), where certain simplifications can be made to further improve the algorithm. Also the Monte Carlo Tree Search technique can be improved in that context, where the work in the context of optimal lineups [31] has a drawback with exponential use of memory. Insights from studying the TSP problem not only reduce exponentially the memory usage in that technique, but also improve the computation time. Therefore, it is crucial to understand how the optimal lineups problem relates to the above mentioned problems. In order to make that link clear, section 2.1 describes the drone delivering packages routing problem, which is a routing problem by definition. As can be seen from that section, this problem is closely related to the TSP and the capacitated vehicle routing problems, and thus as mentioned earlier, to the job shop scheduling problems, etc. That section also describes how this new defined problem relates to the optimal lineup search for soccer games. Section 2.2 builds on these findings and discusses possible solution strategies that can be chosen. Finally, section 2.3 describes and motivates the solution strategies chosen for this thesis. 2.1 Drone delivering packages routing problem In order to avoid certain pitfalls, like for example Hattrick is a game and thus the optimal lineups can be found with an algorithm designed for game playing and this problem has nothing to do with the TSP, it seems to the author of this thesis that it is useful to start from a new perspective and forget the lineups optimization and Hattrick setting for a moment. Therefore, section 2.1.1 describes a new type of problem, a specific case of a drone delivering packages routing problem. 
Only then, once the new problem is defined and the links to the other combinatorial problems are clear, does section 2.1.2 explain that by solving the newly defined problem, the problem of the optimal lineups is also solved.

2.1.1 Problem description

Imagine the following problem. A certain company has different factories producing ink for 3D printers. The factories differ in the type of ink they produce (e.g., inks with different physical properties, such as color, granularity, drying time, etc.). The factories also have different geographical locations. Furthermore, the same company has several printing factories where it can print different objects. However, the results vary per printing factory depending on which ink producing factory supplied the ink, i.e., ink from some factories produces better results than ink from others in a given printer. Also, the cost of printing an object, even when the resulting object has the same quality, can differ when different ink is used. The printing factories, too, have different geographical locations.
In this situation, any ink can be used in any printer; however, the printing cost and the value of the printed object will differ. Some factories may also be grouped by shared resources. For example, there could be a group of five printing factories with similar properties producing very good results, but sharing a single maintenance team. Having all five of these printing factories work at full capacity could then result in a penalty (e.g., a small delay in delivering the products) caused by the constraints on the maintenance team. Nevertheless, there are more printing factories than strictly necessary to meet the demand for printed objects. The ink producing factories also have a larger capacity than strictly necessary. Furthermore, all factories (ink producing factories and printers) are automated, and it is more efficient to run some factories at full capacity and keep the remaining factories idle than to run all factories at limited capacity. Once the objects are printed, they need to be shipped to the customers. In particular, the printed objects can be shipped using different strategies: they could be delivered by drones, local postal services, commercial package delivery services (e.g., DHL, UPS), etc. Here too, the distribution centers have different geographical locations (i.e., several postal distribution centers can be located near several printing factories, etc.), and the printed objects need to be transported to these distribution centers. Furthermore, some distribution centers may be too far from a printing factory and are not considered valid options for it. In other words, only a subset of all possible shipment strategies is available from a specific printing factory.
Also, since the distances to the local distribution centers vary between printing factories, the cost of using a specific distribution strategy differs per factory. Just as with the grouping of printing factories (e.g., the shared maintenance team described above), the distribution strategies can also have grouping effects. For example, when one particular strategy is used often, the company could get a better deal on the shipments. There can also be more local effects; for example, when several printing factories are located near a particular distribution center, using that one center could reduce the cost, etc. We can summarize the problem described above as a logistics problem in which we need to transport ink from ink producing factories to printing factories, and printed objects from printing factories to distribution centers. Note that we do not necessarily need to wait for the objects to be printed at the factory after delivering the ink. For example, the printing factory could keep a limited stock of the ink from all possible factories and start printing once the whole process is scheduled. The printed object would then be ready on arrival, and the delivered ink would serve as replacement for the used ink. Now, imagine that the company wants to use drones to transport the ink and the objects. Such a drone can only carry one package at a time, either ink or a printed object. Assume that the drone uses additional fuel when carrying a package, but that it runs purely on solar power when flying empty. Also, there is no time constraint, i.e., all scheduled deliveries can be performed in a single
day during daylight. In other words, only the edges (when we represent the problem as a graph, see below) between the ink producing factories and the printing factories, and the edges between the printing factories and the distribution centers, are relevant to the problem. The goal is then to minimize the total cost of production, printing, distribution and transportation. Furthermore, we add the constraint that each ink producing factory and each printing factory can be visited at most once in a single tour (just as in the Traveling Salesman Problem). This can be motivated by the requirement to spread the load over as many factories as possible. For example, the company could rely heavily on solar power: using the power where it is produced is most efficient, while storing electrical power is quite expensive (we want to use as much of the solar power as possible while it is being produced). It is then interesting to represent this problem with a graph. Figure 2.1 shows the vertices (graph nodes) for a simple variant of the problem described above. The red points are the printing factories; as can be seen in the figure, there are four of them. Let us assume that we are interested in an optimal solution that visits exactly three printing factories, i.e., one printing factory is not visited in the solution. The blue points are the ink producing factories (there are eight of them). As any ink can be used in any printer, every edge that connects a blue and a red point is valid. Since we visit only three printing factories, any valid solution also visits only three ink producing factories. The black points represent the distribution centers. As mentioned above, only the distribution centers near a printing factory can be visited from that factory.
In practice, we could simply enumerate the valid edges from the red to the black points. For the purpose of this illustration, we can easily see in the figure that two printing factories have two nearby distribution centers, one printing factory has one nearby distribution center, and one factory has three nearby distribution centers. Finally, the green point represents the drone depot.

Figure 2.1: Vertices from the graph representation of the problem.

Note that the graph representation is a weighted graph. In the given example, for illustration purposes, the weights of the edges can be assumed to be proportional to the distance between the points. However, the weight of an edge also includes the cost of production or distribution, not only the cost of the drone fuel. For the
edges between the red and blue points (the blue edges in figure 2.2), the weight is mainly defined by the quality of the match between the ink and the printer, as these edges include the cost of the ink and the printing. We can assume that the cost of the fuel used is low in comparison to the other costs and does not influence the total cost much. The same holds for the edges between the red and black points (the red edges in figure 2.2), where the fuel cost is also negligible and the weight is mainly defined by the distribution cost. The remaining edges, between black and blue points, or between the green point (the drone depot) and a blue or black point (the green edges in figure 2.2), have zero cost, as the drone flies without load. Figure 2.2 illustrates several equivalent solutions that could have been found using different methodologies. Figure 2.2(a) shows a solution with a single tour that starts and ends at the drone depot. This kind of solution could be found with an algorithm similar to those used for the Vehicle Routing Problem with Pickup and Delivery (VRPPD) [37]. Figure 2.2(b) shows a solution closer to the classical Vehicle Routing Problem (VRP) [37], with three subtours from the depot; in that situation we could use three drones, each performing one subtour. Finally, figures 2.2(c) and 2.2(d) show a more classical TSP situation, where there is no drone depot and we make a tour that visits only a subset of all possible cities. This solution is closely related to the traveling purchaser problem, of which the TSP [36] is a special case. Since the green edges have zero cost, all of these solutions are perfectly equivalent.
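The vertex sets and validity constraints described above can be sketched as follows. This is a minimal illustration, not code from the thesis; the factory names and the nearby-center adjacency are hypothetical.

```python
# "Blue" vertices are ink factories, "red" vertices printing factories,
# "black" vertices distribution centers; only blue-red and red-black
# edges carry cost (green edges are free).

# Hypothetical adjacency: which distribution centers are near each printer.
NEARBY_CENTERS = {
    "P1": ["D1", "D2"],
    "P2": ["D3"],
    "P3": ["D4", "D5"],
    "P4": ["D6", "D7", "D8"],
}
INK_FACTORIES = [f"I{i}" for i in range(1, 9)]  # eight blue points

def is_valid_solution(assignments):
    """A solution is a list of (ink, printer, center) triples: every printer
    and every ink factory is visited at most once, and each chosen center
    must be near its printer."""
    printers = [p for _, p, _ in assignments]
    inks = [i for i, _, _ in assignments]
    if len(set(printers)) != len(printers) or len(set(inks)) != len(inks):
        return False  # some factory would be visited more than once
    return all(c in NEARBY_CENTERS.get(p, []) for _, p, c in assignments)

# A tour visiting three of the four printers, all constraints met.
tour = [("I2", "P1", "D2"), ("I5", "P3", "D4"), ("I7", "P4", "D6")]
print(is_valid_solution(tour))  # True
```

Note that, as stated in the text, the algorithms proposed later never need to check this constraint explicitly; the check is shown only to make the relation to the pure TSP concrete.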
Since, as mentioned earlier, this thesis mainly takes its inspiration from the TSP, it is interesting to take a closer look at figures 2.2(c) and 2.2(d). In this representation there is no drone depot. We could model this, for example, as the case where each ink factory has a drone at its disposal and the route can originate at any of the scheduled ink factories, etc. Also, many TSP algorithms assume complete and symmetric graphs. The graph does not seem symmetric at first, as we need to visit a blue point before a red one, and a red point before a black one, which adds direction to the edges. However, any correct solution that uses only the allowed edges (i.e., red, blue or green edges) can be trivially transformed to a valid solution by reversing the order of the visited points if needed. The only constraint we need to add for a solution to be correct is that any red point in the solution path must be directly connected to a blue and a black point, and that no two blue or two red edges may be directly connected. Note that this constraint does not need to be checked in the algorithms proposed in this thesis; it is only described to explain the relation between the defined problem and the pure TSP. To summarize, the defined problem is a TSP in which we only visit a subset of all possible cities (nine cities in the given example), and in which no two blue edges or two red edges may be directly connected. It is worth noting that the added constraints, if they must be evaluated by an algorithm, can be evaluated in polynomial time. One remaining detail is the completeness of the graph. It is quite common to
Figure 2.2: Equivalent solutions of the problem: (a) single tour with depot, (b) multiple subtours with depot, (c) single tour without depot, (d) alternative single tour.

Figure 2.3: Example solutions showing only relevant edges: (a) first solution, (b) second solution.

assign high weights to the edges that are not allowed, so that they will not be present in the optimal solutions [36]. However, just as with the constraint mentioned above, the proposed algorithms do not require the graph to be complete. In fact, the proposed algorithms only use the relevant edges (red and blue edges), as shown in figure 2.3. Nevertheless, the green edges can be used to illustrate the relations between the proposed algorithms and the algorithms for the TSP. Furthermore, as
discussed earlier, we can also find relations to other problems, as the TSP is not the only way to model the given problem. For example, solutions such as those shown in figures 2.3(a) and 2.3(b) could have been found with algorithms inspired by job shop scheduling, nurse scheduling or knapsack problem optimization, etc. Another aspect not yet discussed in sufficient detail is the definition of the weights of the graph. We can see from the discussion above that the edge weights are not constant and depend on the final path. For example, as discussed before, we have the grouping effects of the printing factories (maintenance penalties) and the distribution centers (discounts when a certain strategy is used often). Other, less well defined factors can also influence the cost of an edge. In fact, it is very difficult, if not impossible, to accurately estimate the cost of a single edge (except for the green edges, which have zero cost and do not contribute to the quality of the solution). Nevertheless, the edge weights vary only to a certain extent between different paths. As a consequence, we cannot use a heuristic based on the weight of a single edge in the algorithms, since that weight is not known explicitly. Since such a heuristic is very important in many combinatorial algorithms, this is an important aspect of this thesis: certain heuristics will be proposed that estimate the value of an edge without knowing its true weight. Furthermore, the total cost of a path is also very difficult to calculate, as not all aspects of the cost function may be fully known. Nevertheless, it is assumed that at least a very good estimation of the cost can be calculated using a black-box model that takes the red and blue edges as input (green edges are not relevant to the quality of the solution and are not evaluated by the model).
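As a rough sketch, such a black-box evaluation could look as follows. The interface and the stand-in model below are hypothetical illustrations; in practice the model would be learned from data.

```python
# The black box sees only the blue and red edges of a candidate path and
# returns an estimated cost. The toy model below is a stand-in for whatever
# learned regressor plays this role in practice.

def estimate_cost(blue_edges, red_edges, model):
    """blue_edges: (ink, printer) pairs; red_edges: (printer, center) pairs.
    Green edges are never passed in: they cost nothing."""
    features = sorted(blue_edges) + sorted(red_edges)
    return model(features)

# Hypothetical stand-in model: cost grows with the number of edges.
toy_model = lambda features: 10.0 * len(features)

cost = estimate_cost(
    blue_edges=[("I2", "P1"), ("I5", "P3")],
    red_edges=[("P1", "D2"), ("P3", "D4")],
    model=toy_model,
)
print(cost)  # 40.0
```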
This black-box model could then be defined by experimentation and the use of machine learning algorithms, i.e., we have a model that can predict the cost with a certain accuracy in polynomial time.

2.1.2 Relation with the Hattrick problem

The soccer lineup optimization in the context of Hattrick [31] can be represented with the same graph as the drone delivering packages routing problem described earlier, and all of the discussion from section 2.1 still holds. In the context of Hattrick, we have fourteen possible positions on the field (red points in the graph) and around 20 players (blue points; the number of available players varies between clubs) that can be assigned (blue edges) to the positions on the field. At each position we also have a set of possible individual instructions (black points) that we can assign (red edges) to the player at that position. The individual instructions could be, for example, normal behavior, a defensive strategy, an offensive strategy, etc. Note that not all instructions are available at all positions; for example, the goalkeeper can only be assigned the normal behavior. There are at most four different strategies possible at any position, i.e., a subset of five different strategies in total. The goal is then to assign eleven different players to eleven different positions on the field and to assign one instruction at each position in an optimal way that
maximizes the winning probability (or minimizes the losing probability). In other words, we look for an optimal set of blue and red edges, as shown in figure 2.3. Arguably, since the graph representation is the same for both problems, all of the discussion from the previous section could have been introduced using the original problem, and introducing a new problem with drones delivering packages may seem unnecessary. However, as discussed earlier, introducing the new problem helps to avoid certain pitfalls that might lead to very poor solutions of the original problem. Also, the drone problem allows for a natural introduction of the green edges, which greatly contribute to the understanding of the algorithms discussed in this thesis. Not only do the new algorithms contributed by this thesis become more understandable; the graph representation, and especially the green edges, also make it easy to see the weaknesses of the already existing algorithm [31] for this problem, and help to illustrate how to address them, as will be shown in the next chapter. For that reason, references to the original soccer lineup problem and the Hattrick setting are kept to an absolute minimum in this thesis, as they mainly lead to confusion about the principles behind the algorithms. Nevertheless, a very good description of the soccer lineup problem in the context of the Hattrick game can be found in [31], which is recommended reading for any interested reader. This section focuses only on the relation of the soccer lineup optimization in the context of Hattrick to the drone delivering packages problem described earlier. One important aspect in that context is the calculation of the cost of a path (or the fitness function, in genetic algorithm terms). For the drone problem, it was explained that we only have a black-box model that estimates the cost with a certain accuracy.
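The lineup structure described above can be sketched as a set of (player, position, instruction) triples, mirroring the (blue edge, red edge) pairs of the graph. The position and instruction names below are illustrative placeholders, not the game's exact vocabulary.

```python
# Fourteen hypothetical positions (1 goalkeeper, 5 defenders, 5 midfielders,
# 3 forwards) and, per position, the allowed individual instructions.
POSITIONS = (["GK"] + [f"DEF{i}" for i in range(1, 6)]
             + [f"MID{i}" for i in range(1, 6)] + [f"FWD{i}" for i in range(1, 4)])
ALLOWED = {pos: ["normal", "defensive", "offensive"] for pos in POSITIONS}
ALLOWED["GK"] = ["normal"]  # the goalkeeper only has the normal behavior

def is_valid_lineup(lineup):
    """Eleven distinct players on eleven distinct positions, each with an
    instruction that is allowed at that position."""
    players = [pl for pl, _, _ in lineup]
    positions = [pos for _, pos, _ in lineup]
    return (len(lineup) == 11
            and len(set(players)) == 11
            and len(set(positions)) == 11
            and all(instr in ALLOWED[pos] for _, pos, instr in lineup))

# Eleven of the fourteen positions filled with distinct players:
full = [(f"player{i}", pos, "normal") for i, pos in enumerate(POSITIONS[:11])]
print(is_valid_lineup(full))  # True
```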
Also, as mentioned earlier, we only pass the blue and red edges to the black-box model for evaluation, as the green edges have zero cost and do not contribute to the final solution, i.e., they are not relevant for the total cost. In the context of Hattrick, the black-box model is the match outcome prediction tool developed with Weka [17] in the context of [31], and it is reused unchanged in this thesis. According to the description in [31], the outcome of the match is predicted based on the formation of the final state and the subcategory ratings of the opponent. It is not possible to directly request an outcome from the Hattrick match simulator, and the outcome is nondeterministic. Therefore, Verachtert [31] developed a prediction system using machine learning on a collection of data instances from previously played matches. Two different tools were developed, with two different outcome representations: a nominal and a numeric representation. From the findings in [31], the numeric representation results in more accurate algorithms, and it is therefore used in this thesis. Also, just as in the original work [31], the goal difference is used as the fitness function, i.e., the predicted outcome is transformed to one numeric value that corresponds to the difference between the goals scored by the own team and the goals scored by the opponent (a positive value indicates winning the match, a negative value losing it). Notice that we then want to maximize the fitness, whereas in the drone delivering packages problem the goal was to minimize the cost. One can be trivially transformed into the other by flipping the sign, i.e., the cost is
then the negative of the fitness. Another important aspect is the estimation of the weight of a single edge. As mentioned before, the weight of a single edge is important as a heuristic for many algorithms. However, since we have a black-box model for the cost of the total solution, it is not easy, if not impossible, to define a heuristic for a single edge. One interesting solution is described in [31], where a heuristic is used to estimate the value of a pair of edges, i.e., a red and a blue edge with a common red point in terms of the graph representation described earlier (or a player with an assigned position and strategy in Hattrick terms). The heuristic used for that purpose is VnukStats [19]. This heuristic is designed to evaluate a full lineup and, as described in [31], it is a good indicator of the performance of a lineup. VnukStats evaluates the whole lineup, but we can leave certain positions empty so that they simply do not contribute to the total value. This is exploited by comparing a partial lineup with the same partial lineup extended by one extra player assigned to a position with a defined strategy. The difference between the values of the two partial lineups then indicates the value of the extra assignments, i.e., the extra red and blue edges. As described in [31], this type of heuristic works very well for this problem. It can be extended to a single edge by, for example, fixing the strategy to the standard strategy and varying only the assigned player, which gives an estimate of the blue edges. Another possibility is fixing the player-position pair (blue edge) and varying the strategy (red edge), which gives an estimate for a red edge. However, VnukStats also illustrates that the contribution of a single edge to the total solution is quite complex and not constant.
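The edge-pair heuristic described above can be sketched as a difference of two partial-lineup ratings. The rating function below is a toy stand-in for a full-lineup evaluator such as VnukStats; all names and scores are hypothetical.

```python
# Value of adding one extra (player, position, instruction) triple:
# rating of the extended partial lineup minus rating of the partial lineup.

def edge_pair_value(partial_lineup, extra, rate_partial_lineup):
    """Heuristic value of the pair of edges introduced by `extra`."""
    return (rate_partial_lineup(partial_lineup + [extra])
            - rate_partial_lineup(partial_lineup))

# Hypothetical stand-in rating: each assignment contributes a fixed score
# (a real evaluator would also capture interaction effects between positions).
SCORES = {("Ann", "MID1"): 7.0, ("Bob", "MID2"): 6.0, ("Cy", "FWD1"): 8.0}
toy_rating = lambda lineup: sum(SCORES[(pl, pos)] for pl, pos, _ in lineup)

partial = [("Ann", "MID1", "normal")]
print(edge_pair_value(partial, ("Cy", "FWD1", "offensive"), toy_rating))  # 8.0
```

With an additive stand-in rating the delta is exact; the point made in the text is precisely that a real evaluator is not additive, so the delta becomes more accurate as more edges are already fixed in the context.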
For example, when we fill all midfield positions, there will be some penalty and the total weight of the midfield edges will be lower. In the context of the drone delivering packages problem, this was described as a grouping factor, e.g., the maintenance penalty when certain factories all work at full capacity. This illustrates that the heuristic described above becomes more accurate when we evaluate an edge in a context where more edges are fixed; when we use it to evaluate a single edge without any other edges in the context, the estimation is less accurate.

2.2 Solution methodologies for routing optimization problems

The graph representation of the problem described above shows that we can base our algorithms on algorithms for routing problems. Links to other combinatorial problems could also be made; nevertheless, the routing problems, and especially the problems related to the Traveling Salesman Problem, are an interesting source of inspiration. The graph representation and the TSP are therefore very important for this thesis. Since we can base our algorithms on the TSP, it is interesting to briefly discuss existing approaches for solving that problem. However, the TSP is a classical problem in computer science, and there is a large number of different
algorithms designed in that context [2]. Almost every domain of computer science has its own approach. There is even a separate field of study, combinatorial optimization, that focuses specifically on combinatorial problems, with the TSP as one of its most important problems. Because of that, we focus only on a few approaches close to the context of this thesis, i.e., within the domain of Artificial Intelligence and Machine Learning. In Artificial Intelligence, Genetic Algorithms are one of the classical approaches to the TSP. For example, Affenzeller et al. explain genetic algorithms in a book [1] that uses the TSP and its generalized variant, the Vehicle Routing Problem, in many of its examples. That book is an important inspiration for this thesis. Genetic Algorithms and Evolutionary Computing are also part of the domain of nature-inspired algorithms; in that context, ant colony optimization [13] is a widely known approach to the TSP. These are only a few of the many approaches in this context. If we look at machine learning specifically, we can also find interesting examples, such as the use of the Hopfield network [20], an approach within artificial neural networks to the TSP and to combinatorial problems in general. In fact, the TSP is one of the best known and most studied problems, and almost every domain has its own approach to it.

2.3 Chosen approach

Because of the large number of possible algorithms for combinatorial problems, as discussed earlier, a certain selection had to be made for this thesis. Since this thesis can be seen as a continuation of the work presented in [31], it investigates the Monte Carlo Tree Search (MCTS) algorithm presented there and proposes certain improvements.
As an additional approach, Genetic Algorithms are chosen, as they are a generic approach to optimization problems and can easily be adapted and implemented for the problem discussed in this thesis. The two approaches can then be studied in more depth and compared with each other. Both approaches, as described in this thesis, are greatly inspired by the TSP, and the graph representation described in this chapter is an important illustration for a better understanding of the proposed algorithms.

2.4 Conclusion

This chapter has described the lineup optimization problem from a new perspective, in which the link with combinatorial problems, and the TSP in particular, became apparent. The resulting graph representation is an important representation of this problem and will be used extensively in the following chapters, where it will serve as an important tool for a better understanding of the existing algorithm [31] and the newly proposed algorithms.
Chapter 3 Monte Carlo Tree Search (MCTS) Approach

This thesis can be seen as a continuation of the work presented in [31]. Therefore, it is interesting to study the algorithm proposed by that work more closely. Section 3.1 describes the existing algorithm. Next, section 3.2 proposes certain improvements to that algorithm. The resulting algorithms are compared with the existing algorithms in section 3.3, and section 3.4 concludes this chapter.

3.1 Existing work

Although certain Monte Carlo Tree Search algorithms have already been proposed in the context of combinatorial problems, for example in [6], these are typically applied to different types of problems and are not directly applicable to the problem presented in this thesis. For example, the work presented in [6] discusses an algorithm for the Repeated-task Canadian traveler problem. In that problem we also have a graph that is not fully connected, as in our problem, but there is a specific start city and a target city, as illustrated in figure 3.1. We can thus naturally derive a tree representation for that problem, where the root of the tree is the start city. We consider only paths that visit each city at most once (otherwise the path would contain a loop and would not be optimal). Furthermore, not all cities need to be visited, another similarity to our problem. A valid solution starts at the root of the tree (the start city) and ends at the target city (one of the leaves of the tree). The cities correspond to the nodes of the tree, while the tree branches correspond to the edges of the graph. It is interesting to notice that each path from the root of the tree down to a leaf represents a unique solution path on the graph. This is thus one of the typical problems, i.e., problems with a natural tree representation, to which tree search methods, and Monte Carlo Tree Search in particular, are applied. In this example, Bnaya et al.
apply one of the UCT (Upper Confidence bound applied to Trees) variants of the Monte
Carlo Tree Search (MCTS) algorithm, also investigated in [31] in the context of our problem.

Figure 3.1: Canadian traveler problem graph example, Bnaya et al. [6].

As discussed above, we can identify many similarities between the example above and our problem. Nevertheless, in the optimal lineup problem we have no start and target nodes (we can assign positions to players in any order), and we have no natural tree representation of the problem. This makes the method described in [6], like other MCTS algorithms applied to combinatorial problems, not directly applicable to our problem. It is then interesting to investigate the tree representation of our problem presented in [31]. In that context, the analogy with the drone delivering packages problem, as described in the previous chapter, contributes greatly to a better understanding of that representation and is used to illustrate it in figure 3.2. In that work, the root is the initial state and does not contain any elements of the solution; in terms of the graph representation, the root contains no blue, red or green edges. Using the analogy with the drone delivering packages problem, the root can be represented by the drone depot, i.e., the green point of the graph. The initial situation is illustrated in figure 3.2(a) in the context of the drone problem, where we can go from the depot (green point) to any ink producing factory (blue point), then to any of the printing factories (red point), followed by a visit to one of the corresponding distribution centers (black point). In the context of lineup optimization, we can assign a player to a position and then assign a behavior. Such a combination (assigning a player, a position and a behavior) is one action in the model described in [31], i.e., it is represented by a single node.
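The UCT variant mentioned above scores such actions with the standard UCB1 rule (average reward plus an exploration bonus). A minimal sketch, with an illustrative exploration constant and hypothetical statistics:

```python
import math

def ucb1(total_reward, visits, parent_visits, c=1.4):
    """UCB1 score of a child node; unvisited children are tried first."""
    if visits == 0:
        return float("inf")
    return total_reward / visits + c * math.sqrt(math.log(parent_visits) / visits)

def select_child(children):
    """children: list of (action, total_reward, visits); picks the UCB1 maximum."""
    parent_visits = sum(v for _, _, v in children)
    return max(children, key=lambda ch: ucb1(ch[1], ch[2], parent_visits))[0]

# Three hypothetical actions (player-position-behavior assignments):
children = [("A", 6.0, 10), ("B", 2.0, 3), ("C", 0.0, 0)]
print(select_child(children))  # "C": the unvisited action is expanded first
```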
In other words, a node contains a red and a blue edge, where such a
Figure 3.2: Tree representation of the problem: (a) initial state, (b) expansion, (c) selection, (d) complete path.
  • 24. 3. Monte Carlo Tree Search (MCTS) Approach shown on figure 3.3(a). Notice that this extra edge does not increase the complexity of the solution, i.e., the solution is still represented by the same path in the tree as shown on figure 3.2(d). The resulting path is very similar to what we had in the Canadian traveler problem described earlier, except the start and target are both in the same point on the graph, i.e., the drone depot. However, in the Canadian traveler problem, each path in the tree representation from the root to the leaf was unique. In the situation with the drone depot, where the start and target are at the same place, we have redundancy of factor 2. This can be illustrated with the path shown on figure 3.3(a), where we always start in the root, but we can do the tour clockwise or counterclockwise. Both tours, clockwise and counterclockwise, are exactly the same when shown on the graph (and thus are equivalent solutions), but they are two different paths in the tree where we branch from the root in different directions and we visit totally different nodes of the tree, i.e., the paths have only the root in common. This can be counter intuitive, since the nodes contain exactly the same edges for both paths, but since they are on different branches of the tree, they are different nodes of the tree. (a) Explicit root (b) Implicit root Figure 3.3: Equivalent tours in the tree representation. However, redundancy of factor 2 does not influence the tree search algorithms greatly, and as described in the previous chapter, there are examples of Monte Carlo Tree Search applied on the Vehicle Routing Problems with a depot, similar to what is shown on figure 3.3(a). More importantly, the drone depot is not relevant to the problem and could be placed anywhere in the graph without changing the solution. 
In fact, as discussed in chapter 2, we can transform the solution to a tour representation as shown in figure 3.3(b), where the shown solution is equivalent to the one shown in figure 3.3(a). In terms of the tree representation, figure 3.2(d) shows the root explicitly, whereas figure 3.3(b) does not show the root of the tree, i.e., the root is implicit and could be placed on any of the green edges. In other words, we could start the tour at any pair of a blue and a red edge (a node at level one in the tree) and follow the tour clockwise or counterclockwise. The green edges still represent the branches of the tree, where one of the green edges is implicit and does not appear in the tree, i.e., it connects the leaf of the tree with the first node we visited starting from the root. The redundancy factor is then equal to the depth of
the tree d multiplied by two, i.e., d × 2. This can be nicely illustrated in the context of the Traveling Salesman Problem. One of the popular representations in the context of genetic algorithms for that problem is the path representation. In that representation, we number the cities and represent the tour as a list containing the numbers of the cities. The resulting tour is obtained by going to the first city in the list, then to the next, and so on, until the last city on the list is reached, where we go back to the first city on the list in order to complete the tour. In terms of the drone delivering packages problem, this is illustrated in figure 3.4(a). We number the pairs of edges (a red and a blue edge with a common red point) that are part of the solution from 1 to 3. Note that the same pair of edges does not necessarily correspond to the same node in the tree. For example, the tour (1 2 3) in path representation is the same as the tour (2 3 1) or (3 2 1), etc.; these are just different representations of the tour shown in figure 3.4(a). They are, however, different paths in the tree, and thus they visit different nodes of the tree. We can then notice that the path representation is a convenient way of notating paths in the tree and that both representations (tree and path) introduce the same redundancy in representing the same tour (e.g., the tour shown in figure 3.4(a)). We can thus find d × 2 (d is then the tree depth or the tour length) different path representations for each unique tour, i.e., we can start in any city and do the tour clockwise or counterclockwise.

Figure 3.4: Equivalent tours in the tree representation. (a) First tour (b) Second tour

At first, a redundancy of d × 2 may seem very large, as we replace the original search-space with a new search-space that is d × 2 times larger than the original one.
In other words, the redundancy factor rises with the complexity of the problem, which limits how complex a problem we can solve. However, the path representation is very widely used in the context of genetic algorithms, and some very good algorithms [1] are known that use that representation. Nevertheless, as stated earlier, the green edges are not relevant to the optimal lineup problem. In other words, not all unique tours that we can find in the graph representation are unique solutions to the problem. For example, figure 3.4(b) shows a tour that is different from the tour shown in figure 3.4(a), yet both tours represent exactly the same solution.
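The d × 2 redundancy of the path representation is easy to make concrete. The sketch below is illustrative only (it is not part of the thesis implementation): it enumerates all path representations that denote the same cyclic tour, i.e., every rotation of the list in both travel directions.

```python
def equivalent_representations(tour):
    """All path representations denoting the same cyclic tour:
    d rotations, each traversed in two directions -> d * 2 in total
    (for a generic tour with at least three distinct cities)."""
    d = len(tour)
    reps = set()
    for direction in (tour, tour[::-1]):
        for start in range(d):
            reps.add(tuple(direction[start:] + direction[:start]))
    return reps

# The three-city tour from figure 3.4(a): (1 2 3), (2 3 1), (3 1 2),
# plus the three reversed variants -> 3 * 2 = 6 representations.
print(sorted(equivalent_representations([1, 2, 3])))
```

For a four-city tour the same function yields 4 × 2 = 8 equivalent representations, matching the d × 2 factor derived above.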
This is further illustrated in figure 3.5, where two different example solutions are shown. In the solution shown in figure 3.5(b), the edge pair 1 of the solution shown in figure 3.5(a) has been replaced with a different edge pair, not seen before, which gets the number 4. The remaining question is then: what redundancy factor does the tree representation described earlier introduce for this type of problem? As described before, each path from the root of the tree to one of its leaves can be described using the path notation. Any path notation that consists of the same edge pairs (contains the same numbers) is then redundant. In other words, all permutations of one path represent exactly the same solution and are different paths in the tree. The path representation has length equal to the depth of the tree, which brings the redundancy factor to d!. For the lineup optimization problem the depth of the tree is fixed and equal to eleven. Thus, the redundancy introduced by the tree representation of lineup optimization is 11! = 39916800 (almost forty million).

Figure 3.5: Different solutions of the problem. (a) First solution (b) Second solution

In fact, the problem is severe enough that the algorithm proposed by [31] cannot handle the complexity of the problem as described so far. When running the algorithm on the full problem, it crashes as it runs out of memory. The only case that it can handle is the simplified problem, where only 11 players are assignable to positions (11 players in the club) and only 11 positions are considered (a fixed subset of all 14 positions on the field). Also, only one type of behavior is assignable: the normal behavior. All experiments presented in [31] are done on this simplified version, as the algorithm crashes on more complex variants. Remarkably, the complexity of the simplified problem is equal to the redundancy introduced by the tree representation of the problem.
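The sizes involved are quick to verify (a Python check, not thesis code):

```python
import math

redundancy = math.factorial(11)   # tree paths per unique lineup
print(redundancy)                 # 39916800, almost forty million

# The simplified 11-players / 11-fixed-positions problem also has
# 11! distinct solutions, so the tree search effectively explores
# 11! solutions times 11! redundant paths each:
print(redundancy * redundancy)    # 1593350922240000, about 1.59e15
```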
We have only one type of behavior we can assign, so behavior disappears from the problem, as each position gets the same default behavior (no red edges in the solution). We are then left with 11 players that we need to assign to 11 positions. When we fix the positions in a representation and represent the solutions as permutations of the eleven players, i.e., the first player from the permutation is assigned to the first fixed position, the second player to the second position, etc., we have 11! different solutions to the problem. The depth of the tree is still equal to 11, so the redundancy factor remains at 11!. In other words, the original problem with complexity 11! = 3.9916800 × 10^7 gets
transformed by the tree representation into a new search-space with complexity (11!)^2 = 1.5933509 × 10^15 (i.e., almost 1.6 quadrillion). The tree representation thus squares the size of the search-space. All of the algorithms described in [31] use the same tree representation of the problem. It might be that the tree search approach is not the best possible one for this type of problem, and any decent attempt should tackle the redundancy problem first. Since improving the algorithms proposed by [31] is one of the goals of this thesis, an interesting solution to that problem is presented in the next section. Notice that all obvious tree representations for this problem require the introduction of the green edges and that the exponential explosion of complexity is an inherent consequence of that. This makes the application of a tree search algorithm to this problem an interesting challenge. However, before we move to the section with the proposed solution, some aspects of Monte Carlo Tree Search algorithms need to be described.

So far, only the tree representation of the problem has been discussed. It is then important to also discuss how the search itself is done. The first important thing to notice is that Monte Carlo Tree Search is an iterative algorithm. We start from the root, which at that moment is the only node present in the tree. Then we execute the steps of the algorithm iteratively until a stop criterion is reached. It could be, for example, that we have found a solution that is good enough, that we have reached a maximum number of iterations, or that the time foreseen for execution has run out. During the iterations we add nodes to the tree as we perform the search. The goal of the algorithm is to identify the nodes that lead to an optimal solution and to expand the tree in that direction. The algorithm tries to find a balance between the exploitation of high-yielding paths and exploration in uncertain directions.
The steps of the algorithm are Selection, Expansion, Simulation and Backpropagation. Based on [31], the following definitions of these steps can be given in the context of the earlier described tree representation of the given problem.

In the Selection step the search tree is traversed from the root to a leaf. For each internal node in the descent, a child node is selected based on a specified selection method. This selection method is executed recursively until a leaf node k has been reached. We say that a node has been visited if it is selected during this step. This was illustrated in figures 3.2(c) and 3.2(d), where the leaf node is a node at level 11, i.e., after selection a complete path is formed and the algorithm can move to the Backpropagation step. If the leaf node is at an intermediate level, the path is not complete and we move to the Expansion step.

In the Expansion step all possible (valid in the context of the already built path) red and blue edge combinations with a common red point are computed for the leaf node k. A new node is added as a child node of k for each resulting combination. An alternative would be to only consider a subset of the possible combinations, but that alternative is not explored in the context of lineup optimization (neither in this work nor in the original work). Based on the selection method, a child node l is selected. If there are no child nodes, k is selected as l. The Expansion step was illustrated in figure 3.2(b), where only the red and blue edges are shown for one of the red points. The full expansion set considers all possible combinations (valid in
the context of the already built path), but this would overload the figure.

In the Simulation step the path from l is sequentially extended by adding random red and blue edge combinations with a common red point (containing only red and blue points not yet visited in the subpath) until a full path has been completed. The gradation of randomness is defined in the actual implementation of the algorithm. It is possible to base the probability of choosing an action on domain knowledge. A possible complete path after the Simulation step was illustrated in figure 3.2(d).

In the Backpropagation step the fitness function is executed on the full solution (the complete path obtained after the Selection and/or the Simulation step). The obtained fitness value is then propagated to all ancestors of l. The most used and most effective valuation method (i.e., assigning the values to the nodes as used in the selection step) is to take the average fitness of all simulation results executed in a descendant node; it is therefore also used in all of the discussed algorithms.

The whole process is illustrated in figure 3.6 in more general terms. The figure is borrowed from Chaslot et al. [10].

Figure 3.6: Outline of a Monte-Carlo Tree Search, Chaslot et al. [10].

The definitions as given above leave the choice of the selection and simulation strategies open. In the original work [31] many different strategies in combination with two kinds of fitness evaluation are proposed and experimented on. Therefore, we briefly look at the performed experiments in order to identify the best approach, which will be described in more detail in the remainder of this section. That approach will then be used as the basis for the improvements in the next section.
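The four steps can be sketched end to end on a toy problem. The snippet below is a minimal illustration only, not the thesis implementation: it uses a made-up task (pick three distinct numbers from 1 to 5 so that their sum is maximal) and plain UCT-style selection with an arbitrary exploration constant.

```python
import math
import random

class Node:
    def __init__(self, move=None, parent=None):
        self.move, self.parent = move, parent
        self.children, self.visits, self.total = [], 0, 0.0

ITEMS, DEPTH = [1, 2, 3, 4, 5], 3  # toy task: pick 3 distinct numbers

def chosen(node):
    """Moves on the path from the root to this node."""
    out = []
    while node is not None and node.move is not None:
        out.append(node.move)
        node = node.parent
    return out

def uct_child(parent, c=1.4):
    """Selection: UCT-style value; unvisited children are tried first."""
    return max(parent.children,
               key=lambda n: (n.total / n.visits if n.visits else float("inf"))
               + c * math.sqrt(math.log(parent.visits + 1) / (n.visits + 1)))

def mcts(iterations=500, seed=0):
    random.seed(seed)
    root = Node()
    for _ in range(iterations):
        node = root
        while node.children:                              # Selection
            node = uct_child(node)
        used = chosen(node)
        if len(used) < DEPTH:                             # Expansion
            node.children = [Node(m, node) for m in ITEMS if m not in used]
            node = random.choice(node.children)
        path = chosen(node)                               # Simulation
        while len(path) < DEPTH:
            path.append(random.choice([m for m in ITEMS if m not in path]))
        value = float(sum(path))                          # Backpropagation
        while node is not None:
            node.visits += 1
            node.total += value
            node = node.parent
    best, node = [], root          # final greedy descent by average value
    while node.children:
        node = max(node.children, key=lambda n: n.total / max(n.visits, 1))
        best.append(node.move)
    return sorted(best)

print(mcts())  # usually converges to the optimal pick [3, 4, 5]
```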
For an explanation of the workings of the other approaches and more details on Monte Carlo Tree Search algorithms in general, the reader is referred to the original work [31].

Before looking at the results, the used evaluation method is explained first. As mentioned above, two variants of fitness evaluation are used: the nominal and the numeric. The nominal fitness evaluation is based on the nominal representation of the prediction (the black-box evaluation of the solution as described before) that
classifies a match as either a win or a loss. In terms of fitness evaluation, we can say that the fitness is either good or bad, which can be represented in a form usable to algorithms as either 1 or 0. The numeric fitness evaluation is based on the numeric representation that uses the difference between the scores of the teams as outcome. Note that this representation can produce continuous values, despite the fact that the score difference is always an integer, by means of regression embedded in the black-box evaluation. It is then directly usable as a fitness function that produces continuous values.

Further, in the evaluation of the performance of the algorithm, the quality of the obtained solution is measured by comparing it with a baseline solution. The baseline solution itself is obtained for each match using the greedy approach with VnukStats as the heuristic, where the assignments yielding the highest increase of that heuristic are chosen progressively. The quality of the obtained solution is then measured as the relative increase of the numerical fitness of the obtained solution over the fitness value of the baseline solution (regardless of the type of fitness function used in the experiment, as the nominal fitness function is not well suited for that purpose), with the following formula [31]:

quality = (fitness_solution − fitness_baseline) / |fitness_baseline|

The performance of the Monte Carlo Tree Search algorithms is then measured with the above described methodology every 1000 iterations of the algorithm. The solution path is obtained by descending the search tree from the root to a leaf node, each time selecting the child node with the highest node value. If the obtained path is not complete (depth eleven is not reached), it is temporarily completed using the greedy completion method described for the baseline solution.
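As a concrete reading of this formula (the fitness values below are hypothetical, chosen only for illustration):

```python
def quality(fitness_solution, fitness_baseline):
    """Relative improvement over the baseline solution."""
    return (fitness_solution - fitness_baseline) / abs(fitness_baseline)

print(quality(3.0, 2.0))    # 0.5: the solution is 50% better
print(quality(-1.0, -2.0))  # 0.5 as well: dividing by |baseline| keeps the
                            # sign meaningful for negative score differences
```

The absolute value in the denominator matters because the numeric fitness (a score difference) can be negative: improving from −2.0 to −1.0 should count as positive quality.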
The resulting performance is then averaged over 90 matches: all combinations of one of six own teams and one of fifteen opponents, where each match outcome is averaged over 15 runs of the algorithm, as the result of the Monte Carlo Tree Search is non-deterministic. Each execution is limited to 50000 iterations of the algorithm.

Furthermore, the experiments explore two types of simulation step: fully random simulation and semi-random simulation. In the simulation step, a path selected after the expansion step is sequentially expanded by random actions until a complete path has been reached. In Hattrick lineups, a complete path is a complete choice set of eleven assignments of football players to positions and individual instructions. The choice of these assignments in order to complete the path can be either fully random or semi-random. Fully random means that each possible assignment has an equal probability of being chosen. Semi-random means that the probability is proportional to a heuristic valuation of the assignment, i.e., the assignments are chosen with the roulette wheel selection method. VnukStats (as described earlier) is used as the heuristic: the probability of choosing a set of assignments (a red and a blue edge with a common red point) is proportional to the difference in VnukStats value between the original path and the extended path. VnukStats thus estimates the strength of a (partial) path (lineup).
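A sketch of the roulette wheel (proportional) pick used by the semi-random simulation; the candidate weights below are made up and merely stand in for VnukStats increments:

```python
import random

def roulette_pick(candidates, weight_of, rng):
    """Pick a candidate with probability proportional to its weight."""
    weights = [max(weight_of(c), 0.0) for c in candidates]  # clamp negatives
    total = sum(weights)
    if total == 0.0:
        return rng.choice(candidates)  # degenerate case: fully random
    r = rng.uniform(0.0, total)
    acc = 0.0
    for cand, w in zip(candidates, weights):
        acc += w
        if r <= acc:
            return cand
    return candidates[-1]

rng = random.Random(7)
weights = {"a": 6.0, "b": 3.0, "c": 1.0}  # hypothetical heuristic increments
counts = {"a": 0, "b": 0, "c": 0}
for _ in range(10000):
    counts[roulette_pick(["a", "b", "c"], weights.get, rng)] += 1
print(counts)  # roughly proportional to 6 : 3 : 1
```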
Finally, different selection strategies are explored: Objective Monte-Carlo (OMC) [9], OMC in one-player context [8] (a variant of the OMC strategy), Probability to be Better than Best Move (PBBM) [11] (another variant of OMC), Upper Confidence bound applied to Trees (UCT) [21] and UCB1-TUNED [15] (a variant of UCT). UCB1-TUNED was originally proposed for the multi-armed bandit problem by Auer et al. [3]. The results of the experiments are shown in figure 3.7 for the numeric fitness evaluation, and in figure 3.8 for the nominal fitness evaluation.

Figure 3.7: Experiments with the numeric fitness evaluation. Two types of simulation are explored: fully random (left) and semi-random (right), Verachtert [31].

Figure 3.8: Experiments with the nominal fitness evaluation. Two types of simulation are explored: fully random (left) and semi-random (right), Verachtert [31].

We can observe that the numeric fitness evaluation performs significantly better than the nominal fitness evaluation (i.e., good or bad fitness). The numeric fitness evaluation is therefore the only type of fitness evaluation further considered in this thesis. We can also notice that the semi-random simulation works better than fully random
simulation for this problem. Especially the UCB1-TUNED selection method with semi-random simulation (and numeric fitness evaluation) performs significantly better than any other tested algorithm. It is therefore the algorithm that will be considered for improvements in the next section. However, before we move to the next section, the used selection method (UCB1-TUNED) needs to be described in more detail. Descriptions of the other selection strategies can be found in the original sources [9, 8, 11, 21, 15, 3], or in [31].

As mentioned earlier, UCB1-TUNED is a variant of Upper Confidence bound applied to Trees (UCT). In the context of the given problem, UCT [21] in its pure form selects nodes based on the average fitness of the evaluated paths passing through the current node and a bias term. The bias term balances exploration and exploitation by increasing the total value used in the selection step for nodes that have been explored less. The selection value function is:

s(i) = μ_i + C × sqrt(ln(n_p) / n_i)

where i is the node under consideration, p is the parent node of i, μ_i is the mean fitness value of the evaluated paths passing through the current node i, n_i and n_p are the numbers of times node i and its parent p were explored, and C is a parameter that has to be determined by the user based on the problem. UCB1-TUNED [15] is a variant of UCT that was originally proposed for the multi-armed bandit problem by Auer et al. [3]. The adjusted selection value function in the context of this problem is defined as follows:

s(i) = μ_i + C × sqrt((ln(n_p) / n_i) × min(1/4, V_i(n_i)))

where

V_i(n_i) = σ_i² + sqrt(2 ln(n_p) / n_i)

In the formula above, σ_i² is the variance of the fitness value of the evaluated paths passing through the current node i. In the selection step, the node with the highest value of the selection value function s(i) is selected.
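Both selection value functions can be transcribed directly from the formulas above. The numbers in the example are illustrative only, and C = 1 is an arbitrary choice:

```python
import math

def uct_value(mean, n_parent, n_node, c=1.0):
    """Plain UCT: s(i) = mu_i + C * sqrt(ln(n_p) / n_i)."""
    return mean + c * math.sqrt(math.log(n_parent) / n_node)

def ucb1_tuned_value(mean, var, n_parent, n_node, c=1.0):
    """UCB1-TUNED: the exploration term is capped by min(1/4, V_i(n_i)),
    where V_i(n_i) = var + sqrt(2 * ln(n_p) / n_i)."""
    v = var + math.sqrt(2.0 * math.log(n_parent) / n_node)
    return mean + c * math.sqrt((math.log(n_parent) / n_node) * min(0.25, v))

# For a well-explored, low-variance node the tuned exploration bonus shrinks:
print(uct_value(0.6, 1000, 900))
print(ucb1_tuned_value(0.6, 0.01, 1000, 900))
```

Since min(1/4, V_i(n_i)) never exceeds 1, the UCB1-TUNED bonus is never larger than the plain UCT bonus at C = 1; it adapts the amount of exploration to the observed variance of a node.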
3.2 Proposed improvements

As discussed in the previous section, the tree representation exponentially increases the complexity of the problem by a factor d!, where d is the depth of the tree, i.e., by the factor 11! in this case. Also, it is not obvious how to define a tree representation for this problem in which such an increase in complexity can be avoided. Since the algorithms described in the previous section need to search an exponentially expanded search-space, it might be useful to try the Random Search algorithm for comparison.
The proposed Random Search algorithm is identical to the fully random simulation step described in the previous section. In other words, we choose the first pair of a blue and a red edge with a common red point (an assignment of a player to a position and the choice of an instruction) at random. This selection makes all other red and blue edges that have the already chosen red point in common no longer valid for further selection. From the remaining edges, we choose the next pair at random, and so on, until a full selection is formed. We can notice that we can also illustrate this in a graph where we have green edges between consecutive selections. However, the resulting selection is not bound to a tree or graph representation, and the same selection could have been made by following a different path of the tree. All selections happen at random, which makes any of the paths we could follow that would result in the same selection equally probable, i.e., if we look at this in reverse and we have a specific solution that the algorithm presents, any path that leads to that specific solution is equally probable. In other words, despite the fact that we can represent this process as traversing a path in a tree, we avoid the increase in complexity introduced by the tree representation, as the resulting solution is not bound to such a representation and could have been achieved by other means, e.g., by enumerating all possible solutions to the problem and choosing one at random.

When we look at the alternative with enumerating all possible solutions, we can also notice that both approaches, the fully random simulation step and enumerating the solutions, have an equal probability of collisions, i.e., generating the same solution twice or more during the same run of the algorithm.
Since all the paths are equally probable and we have the same number of paths for any solution, the choice of the same path twice or more is equally probable as the choice of the same solution twice or more from the pool of possible solutions. Thus, the algorithms are equivalent in terms of the randomness of the generated solutions. However, the implemented Random Search has a tree representation, as it uses the fully random simulation step from the earlier described tree search algorithms, and can be viewed as a variant of the tree search algorithms, i.e., Random Tree Search. This is a very interesting insight, as it motivates the possibility of using the Monte Carlo Tree Search algorithm in the context of lineup optimization. It is important to notice that the Monte Carlo Tree Search algorithm is an incremental algorithm and starts with only the root node, while other nodes are created as they are explored. Furthermore, unexplored children of a node have an equal probability of being selected as they have the same value, which makes the probability of a collision small when the problem is sufficiently large with a high branching factor, like in the lineup optimization problem, and when a large part of the tree is not yet explored. The more we explore the tree, the more collisions will happen. However, if the algorithm converges quickly enough (i.e., we have a sufficiently large exploitation bias) to a good solution, it might still work very well even with the large redundancy factor. This could also partially explain why the semi-random (proportional) simulation worked better in the experiments described in the previous section, as it increases the exploitation bias. We will discuss the type of bias that is created by the semi-random selection strategy in a tree context later in this section, when the new Monte Carlo Tree
Search algorithm is proposed.

Another interesting insight is that when we run the Random Search for an equal number of iterations, it will always finish before the Monte Carlo Tree Search algorithm, as it does not have the overhead of creating nodes, executing selection steps, etc. In fact, the Random Search algorithm as described above needs around 4 seconds to execute 50000 iterations, compared to 40 seconds for the Monte Carlo Tree Search on the same Linux machine (both algorithms use the same fitness evaluation function in Java, although the tested Random Search is implemented in Objective C). Since the algorithms are usually bounded by execution time, for a fair comparison another version of the Random Search is used in the tests: next to the Random Search that uses the same number of iterations (we refer to this algorithm simply as Random Search), a Random Search that runs for approximately the same time on the same machine as the Monte Carlo Tree Search. We refer to this second algorithm as Random Search Tuned. Both Monte Carlo Tree Search algorithms, the UCB1-TUNED version described in the previous section and the new version described later in this section, need around 40 seconds to complete 50000 iterations on the simplified problem. The Random Search Tuned is therefore set to 500000 iterations, which results in approximately the same running time.

We can now move to the improvements to the Monte Carlo Tree Search algorithm in the context of lineup optimization as proposed in this thesis. As discussed earlier, any tree representation in the context of this problem introduces the green edges. If we look at the possible paths in the context of the TSP, fixing one of the cities in the root reduces the redundancy from d × 2 to 2, i.e., we reduce the complexity of the problem by a factor d. Therefore, for the lineups problem, we would reduce the redundancy from d!
to (d − 1)! (d is always equal to 11 in the context of lineup optimization, as only the green edges cause the redundancy, i.e., even with a tree depth of 22 we still have 11 green edges), i.e., we would still have an exponential increase of the redundancy. However, one interesting thing can be observed in the context of lineup optimization. We could, for example, fix the goalkeeper position in the root, as all strong lineups have a goalkeeper assigned. In this case, we would only have paths that include the goalkeeper, i.e., we reduce the pool of possible solutions. If we do not fix the goalkeeper, we must choose 11 positions out of the 14 on the field, giving the following number of combinations:

C(14, 11) = 14! / (11! × (14 − 11)!) = (12 × 13 × 14) / (1 × 2 × 3) = 364

When we fix the goalkeeper position, we have the following number of combinations:

C(13, 10) = 13! / (10! × (13 − 10)!) = (11 × 12 × 13) / (1 × 2 × 3) = 286

In other words, we decrease the complexity by a factor 11 × 364/286 = 14. This may not seem like much, since the redundancy remains exponential, but we also no longer consider solutions without the goalkeeper, which potentially have much lower quality. In other words, we work with the average, i.e., the nodes that are unlucky and have
a solution without the goalkeeper in their path would have their average decreased by it. Even if a given node can lead to good solutions, it will be explored less if its average is lowered by a bad solution. This is not the case in the simplified problem, as all solutions have a goalkeeper since the positions are fixed.

Finally, we can notice that in the algorithms described in the previous section the nodes at level one of the tree contain all of the possible edges (red and blue) that can be used in a solution. In other words, the nodes directly connected with the root contain all possible combinations for the assignments that we can make. In that respect, any node lower in the tree can only contain one of the edges already present in the tree and can be viewed as redundant. Therefore, the proposed algorithm builds a tree of depth 1 that never grows larger. Since the tree is fixed after the expansion step from the root, the initialization is combined with the expansion step and there is only one expansion step in the proposed algorithm. The whole tree fits in memory and the algorithm can also be used on the problem with full complexity. The initialization/expansion step is shown in figure 3.9.

Figure 3.9: Initialization/expansion step.

As can be seen in figure 3.9, the root node is marked green. It can be visualized in the graph representation as, for example, the drone depot, i.e., a point in the graph that can be located anywhere. From the root we then have the green branches (edges) to the child nodes of the root, each of which contains one edge, either red or blue. Therefore, the level one nodes are marked red and blue, corresponding to the color of the edge they contain. The choice to separate the red edges from the blue edges reduces the complexity of the tree, as each edge is placed exactly once in the tree.
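The goalkeeper combination counts used earlier in this section are easy to verify (Python 3.8+ for math.comb; a check only, not thesis code):

```python
from math import comb, factorial

print(comb(14, 11))  # 364: choose 11 of the 14 field positions
print(comb(13, 10))  # 286: goalkeeper fixed, choose 10 of the remaining 13

# Combined effect of fixing the goalkeeper in the root:
# redundancy d! -> (d - 1)! and 364 -> 286 position combinations.
reduction = (factorial(11) * comb(14, 11)) / (factorial(10) * comb(13, 10))
print(reduction)  # 14.0
```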
If we do not separate the red and blue edges, we need to store each valid combination of a red and a blue edge as a separate node, as is the case in the algorithms described in the previous section. For example, in the lineup optimization problem we have 14 different positions on the field and 44 individual instructions in total (3.14 individual instructions per position on average). Thus we have 44 red nodes in the tree. If we take as an example a club with 22 players, we have 308 blue nodes in
the tree. In total, there are 352 nodes in the tree plus the root node. If we do not separate the edges, we would have 968 nodes plus the root in the tree, i.e., roughly 2.75 times as many level-one nodes. In other words, in the new tree representation as illustrated in figure 3.9, and in the context of lineup optimization, the full tree has 353 nodes in total and thus easily fits in memory.

The next step in a Monte Carlo Tree Search algorithm is the selection step. Since all the nodes that can be part of a solution are already present at the first level, selecting only one node would yield only a part of the solution. Therefore, the algorithm selects all nodes that are part of the solution at once. However, since the Monte Carlo Tree Search algorithm depends on randomness, hence the Monte Carlo name, simply selecting the nodes by maximal value would not produce sufficient randomness. The step that introduces the randomness is the simulation step, which is not yet present in the algorithm. Therefore, in the proposed algorithm the selection and simulation steps are combined into one selection/simulation step. The used simulation is semi-random simulation, similar to the semi-random simulation described in the previous section, except that the probability of selecting a node is proportional to its value (the same UCB1-TUNED formula is used as described in the previous section) and not to the VnukStats heuristic valuation used in the algorithms described in the previous section. The selection/simulation step is illustrated in figure 3.10.

Figure 3.10: Selection/simulation step.

The selection/simulation step works as follows. First a blue edge is selected from the pool of all blue edges that are a valid choice at the given moment.
For example, if we impose the constraint that a solution must contain a goalkeeper, we place the goalkeeper position at the root of the selection/simulation step and only blue edges that contain the goalkeeper position are valid for the selection starting from the root. The selection probability is then proportional to the value of the node containing the specific edge (i.e., in this case the selection method is roulette wheel selection). Once a blue edge is selected, a red edge is selected that completes the red and blue
edge pair, i.e., we select the red edge from the pool of valid red edges (containing the specific red point) at the given moment, as only a red edge can be selected after a blue edge was selected. Likewise, only a blue edge can be selected after a red edge was selected, and only a blue edge that does not contain a point already present in the solution can be considered. The constraints described above can be formalized as a localized constraint that is put on the pool of all possible edges, and only edges that conform to the given constraint are considered for selection. We then repeat the steps described above until a full solution has been formed. This process can be seen as recursion, with the selection of one edge as the recursive step (we can replace edge with node in the pseudocode below for an equivalent result in a more general context):

SelectEdge(allEdges, constraint, solution) {
    selectableEdges = constraint.applyOn(allEdges);
    selectedEdge = SelectionMethod(selectableEdges);
    solution.add(selectedEdge);
    constraint.updateAccordingTo(solution);
    if (solution.isComplete()) {
        return;
    } else {
        SelectEdge(allEdges, constraint, solution);
    }
}

The final step is the backpropagation step. This step is exactly the same as in the original algorithm and is illustrated in figure 3.11. We can then summarize the algorithm as beginning with the initialization/expansion step that places the whole tree in memory. Then we execute the selection/simulation step followed by the backpropagation step iteratively until the algorithm stops according to a stop criterion.

Figure 3.11: Backpropagation step.
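The recursive step can be instantiated for a tour-construction case such as the TSP. The toy sketch below is an illustration only: it uses a fully random pick where the thesis uses value-proportional selection, and a four-city complete graph invented for the example; the constraint is inlined as the filter on valid edges.

```python
import random

def select_edge(edges, tour, visited, current, start, rng):
    """Recursively extend the tour with a valid edge leaving `current`;
    close the tour once every city has been visited."""
    if len(visited) == len({c for e in edges for c in e}):
        return tour + [(current, start)]         # all cities used: close it
    valid = [(a, b) for (a, b) in edges if a == current and b not in visited]
    edge = rng.choice(valid)                     # stand-in for roulette pick
    return select_edge(edges, tour + [edge], visited | {edge[1]},
                       edge[1], start, rng)

cities = range(4)
edges = [(a, b) for a in cities for b in cities if a != b]
tour = select_edge(edges, [], {0}, current=0, start=0, rng=random.Random(3))
print(tour)  # a closed tour visiting each of the four cities exactly once
```

Only the edge pool and the constraint are problem specific here; swapping them for the blue/red edge pools and the lineup constraint yields the selection/simulation step described above.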
Because of the recursive selection/simulation step, the proposed algorithm is called Recursive Monte Carlo Tree Search (RMCTS). We can notice that this algorithm is usable in a broader context, as it can be used, for example, for solving the Traveling Salesman Problem. In the TSP context, we would start a tour from a fixed city. It is then the root constraint that is passed in the call to the recursive selection/simulation step. In that step we can select any edge that contains the start city. Once the edge is selected, it contains another city that becomes the new start point for the following selection. The constraint is then updated with that city and with an additional constraint stating that edges containing any of the cities already present in the solution are not allowed. We repeat the selection recursively until a full tour is formed. Here too, the nodes of the tree contain all possible edges and the tree depth is one, i.e., the whole algorithm remains the same except for the constraint mechanism, which is problem specific. Therefore, as stated in previous chapters, the algorithms proposed in this thesis are inspired by the Traveling Salesman Problem, including the Recursive Monte Carlo Tree Search.

In the context of the lineup optimization, we can notice that we indeed form a complete route in the selection/simulation step. In other words, the green edges are also passed in that route, and the selection/simulation step has the same graph representation (and thus tree representation) as the algorithms described in the previous section. This creates the same type of tree bias in all algorithms. One type of bias is that not all solutions are equally probable, as we use proportional selection, and thus we have an exploitation effect. The other appears when we look at the obtained solution: we can notice that not all paths in the tree are equally probable.
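The TSP constraint mechanism described above can be sketched as follows. This is a toy illustration under assumptions of my own (edges as unordered city pairs, a deterministic stand-in for the selection method), not the thesis code.

```python
def tsp_constraint(edge, tour):
    """Localized TSP constraint: an edge is selectable when it contains
    the last city of the partial tour and its other endpoint has not
    been visited yet. `edge` is an unordered pair of city indices."""
    current = tour[-1]
    if current not in edge:
        return False
    a, b = edge
    nxt = b if a == current else a
    return nxt not in tour

def build_tour(cities, edges, start):
    """Grow a tour from `start` by repeatedly selecting a valid edge.
    The first valid edge is taken as a deterministic stand-in for the
    selection method; assumes the edge pool connects all cities."""
    tour = [start]
    while len(tour) < len(cities):
        edge = next(e for e in edges if tsp_constraint(e, tour))
        a, b = edge
        tour.append(b if a == tour[-1] else a)
    return tour
```

On a complete graph of four cities, starting from city 0, this visits every city exactly once, exactly as the recursive selection step would.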
Since the simulation is recursive and all possible edges are carried with it (the edges are preselected according to a localized constraint), edges with a high value are more likely to be selected closer to the root. Closer to the leaves, the constraint eliminates more edges, and the selection is done on a smaller pool that possibly no longer contains the strongest edges. This kind of tree bias (some paths are more probable than others) makes the algorithms described in the previous section perform better, as it helps to cope with the redundancy. However, it also has consequences for the probabilities with which solutions from the pool of solutions are selected, as the edges with the highest value tend to be selected first, i.e., we create a type of greedy bias.

The greedy (tree) bias described above is mitigated in the RMCTS by the tree representation as described above (a tree of depth one). Thus, the solution obtained with the selection/simulation step is viewed as a solution in a pool of all possible solutions (the redundancy of d! is removed from the representation; in fact, all possible redundancy is removed) and all nodes used in a solution get updated with the same value, which leaves no trace of how the particular solution was obtained. Nevertheless, we can identify another risk with this approach. We can find a quite good solution by chance early in the run of the algorithm, and it can dominate the selection process, i.e., we could have a snowball effect. The roulette wheel selection method used here is especially known for its high selection pressure and can cause a snowball effect. It is then interesting to investigate lowering the selection pressure. The proposed approach uses the K-Tournament selection method with
K set to 2. In this method we choose two nodes at random and select the one with the higher value.

We can notice that this approach not only reduces the selection pressure, i.e., increases the exploration, but is also much more efficient. Whereas roulette wheel selection requires computing the sum of all values and therefore has linear time complexity (proportional to the number of nodes being considered for selection), the K-Tournament can be executed in constant time, regardless of the size of the tree. Also, since this increases the exploration and decreases the exploitation, this method potentially requires more iterations to converge to a good solution. For a fair comparison, this method is then given the number of iterations that can be executed in the same time as the other methods, i.e., we stick to the 40 seconds limit and set the number of iterations to 150000 for this method. We call this algorithm the Recursive Monte Carlo Tree Search Tuned (RMCTS-Tuned).

Finally, we need to define how a solution from a single run of an algorithm is selected. We could follow the definition from the previous section and select the nodes having the highest value to form the final solution. However, this is an optimization problem, and we are interested in the best solution found. Therefore, the algorithms discussed in this section (Random Search (RS), Random Search Tuned (RS-Tuned), RMCTS and RMCTS-Tuned) return the best solution found during the run of the algorithm.

3.3 Comparison of the MCTS algorithms for Hattrick

Before we start with the experiments, it is worth briefly discussing the tuning of the C parameter as used in the formulas given in section 3.1. As stated in section 3.1, this parameter is problem specific and can be tuned empirically (using experimentation) for a given problem.
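The 2-tournament selection introduced above can be sketched as follows (illustrative names, not the thesis code); note that each call touches only K nodes, regardless of the pool size.

```python
import random

def tournament_select(nodes, value, k=2):
    """K-tournament selection: sample k nodes uniformly at random and
    keep the one with the highest value. For the K = 2 used here, each
    call compares only two nodes, so it runs in constant time, whereas
    roulette wheel selection must sum all values (linear time)."""
    return max(random.sample(nodes, k), key=value)
```

As a sanity check on the design choice: with K equal to the pool size the tournament degenerates to a plain argmax (maximal selection pressure), while K = 2 keeps the pressure low.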
In the context of lineup optimization, each match is a different problem, as the average fitness value and the variance differ from match to match. Moreover, even simple tuning gives a huge advantage to the Random Search Tuned algorithm, as it can potentially scan 500000 solutions during each run of the Monte Carlo Tree Search algorithm. For example, if we tried only 8 different values for this parameter and ran 10 experiments per value, 80 experiments in total, Random Search Tuned could scan 40 million solutions in the corresponding time. This is also the complexity of the simplified problem as discussed before. Therefore, at that point it would be better to replace the Random Search with a Complete Search that gives the globally optimal solution for the given simplified problem. This is not the case for the full problem, as its complexity is much larger and the computation of the exact solution is not feasible in an acceptable computation time.

Nevertheless, the value of the parameter C is set to 1 in all experiments described in this section, the same value as used in the experiments described in section 3.1. This also guarantees a fair comparison between the algorithms. On the other hand, the value 1 for the C parameter might seem unfair to the UCT algorithm described in section 3.1. We can notice that when the variance
of the fitness is large for a given node in the UCB1-TUNED algorithm, the term min(1/4, V_i(n_i)) would often evaluate to 1/4, since

    V_i(n_i) = σ_i² + √(2 ln(n_p) / n_i)

Therefore, setting the value of C to 1/2 might boost the performance of the UCT variant. However, this remains out of the scope of this thesis, as we focus only on the UCB1-TUNED algorithm, the best algorithm from the experiments described in section 3.1.

Furthermore, each algorithm is run 100 times in the experiments described in this section. This includes the Random Search Tuned algorithm, which then has a very good probability of finding the exact solution for the given simplified problem. Even more so knowing that the optimal solution might not be unique in the context of the Hattrick problem. This was observed in the experimentation, where the algorithms often return 5 or more unique solutions that have the same fitness value, which in turn makes the probability of having more than one globally optimal solution quite large.

We then start with the simplified problem as described before, such that we can compare the algorithms proposed in the previous section (RS, RS-Tuned, RMCTS and RMCTS-Tuned) with the UCB1-TUNED algorithm described in section 3.1. We can then also use the quality measure described in that section for comparing the results. However, instead of aggregating the results over many different matches, we focus only on three different cases: one match against a weak opponent, one against a strong opponent and one against an opponent of similar strength. We can then analyze how the specific algorithms perform in the specific situations; we could, for example, detect when a specific algorithm would benefit from tuning. The first case that we consider is the match against a similar opponent.
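To make the clamping argument above concrete, the UCB1-TUNED exploration term can be evaluated as follows. This is a sketch: the grouping of C with the square root follows the standard UCB1-TUNED formula and may differ in detail from the exact form used in section 3.1.

```python
import math

def ucb1_tuned_bonus(sigma_sq, n_p, n_i, c=1.0):
    """Exploration term C * sqrt((ln n_p / n_i) * min(1/4, V_i(n_i)))
    with V_i(n_i) = sigma_i^2 + sqrt(2 ln n_p / n_i). When the sample
    variance sigma_i^2 is large, V_i(n_i) exceeds 1/4 and the min
    clamps the factor to 1/4, which is why C = 1/2 might help UCT."""
    v = sigma_sq + math.sqrt(2.0 * math.log(n_p) / n_i)
    return c * math.sqrt((math.log(n_p) / n_i) * min(0.25, v))
```

With, say, a sample variance of 1 after 10 of 100 parent visits, the min term is already clamped to 1/4, so the bonus depends only on C and the visit counts.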
For that purpose, we select one of the matches from the experiments described in section 3.1 that has a baseline solution with a fitness value close to 0. The selected match has a baseline solution with fitness 0.26 and thus qualifies as a match against a similar opponent, i.e., the score difference (fitness value) is close to zero. This is then a very important case, as increasing the score difference (i.e., the fitness value) makes the probability of winning higher [31]. The box-plots of the results are shown on figure 3.12, and the results are summarized in table 3.1.

The RMCTS-Tuned is clearly the best algorithm in this test and produces a quality of 535%. We can also notice that the Random Search Tuned (RS-Tuned) has a larger mean fitness (and thus quality) than the UCB1-TUNED. However, the difference is small and not statistically significant: the p-value from a two-sided t-test with 95% confidence is 0.2243, hence there is no significant difference between the mean fitness of the RS-Tuned and the UCB1-TUNED. The RMCTS has a better mean fitness than the Random Search Tuned (p-value is 1.639 × 10^-12) and a better mean fitness
Figure 3.12: Box-plots of the results against a similar opponent, simplified problem.

    Algorithm                         mean       variance   quality (%)
    Random Search (RS)                1.428983   0.0148     448.10
    Random Search Tuned (RS-Tuned)    1.585576   0.0012     508.16
    UCB1-TUNED                        1.576269   0.0045     504.59
    RMCTS                             1.618149   0.0006     520.66
    RMCTS-Tuned                       1.656401   0.0001     535.33

Table 3.1: Experiment results against a similar opponent, simplified problem.

than the UCB1-TUNED (p-value is 4.803 × 10^-8). The best algorithm from this test is the RMCTS-Tuned, with a significantly better mean fitness value than the RMCTS (the p-value is lower than 2.2 × 10^-16 and cannot be computed exactly).

We can notice that we have obtained a very large quality in this test. The average quality of the UCB1-TUNED was 70% in the tests described in section 3.1, while in this test (only one of the matches considered in the experiments explained in section 3.1) we get almost 505% for this method. This is easily explained by the fact that the baseline fitness is close to zero. As explained earlier, this is an important case, as a higher predicted score difference makes the probability of winning higher. In other words, we can expect the quality of the solutions to drop significantly when we run experiments against strong or weak opponents, even if the absolute difference between the baseline fitness and the fitness of the obtained solution is as high as in this experiment (around a one-goal difference).

We can now move to the case against a strong opponent. We chose a match with a low baseline fitness; the chosen match has a baseline fitness of −3.465714. This is also an important case, as when we obtain a solution of sufficiently high quality, our team might win the match. In order to reach 0 fitness, i.e., zero score difference, we need 100% quality. In other words, we need at least 100% quality
to win the match. The box-plots of the results from that experiment are shown on figure 3.13, and the results are summarized in table 3.2.

Figure 3.13: Box-plots of the results against a strong opponent, simplified problem.

    Algorithm                         mean        variance   quality (%)
    Random Search (RS)                −2.241449   0.0074     35.33
    Random Search Tuned (RS-Tuned)    −2.128714   0.0009     38.58
    UCB1-TUNED                        −2.108591   0.0034     39.16
    RMCTS                             −2.099439   0.0004     39.42
    RMCTS-Tuned                       −2.086433   0          39.80

Table 3.2: Experiment results against a strong opponent, simplified problem.

From the results we can notice that this time the UCB1-TUNED has a better quality than the Random Search Tuned. In fact, the difference in mean fitness between these algorithms is statistically significant (p-value is 0.002817). Furthermore, it performs as well as the RMCTS algorithm, as there is no significant difference between the mean fitness values of these algorithms (p-value is 0.1419). Nevertheless, the RMCTS-Tuned is again the best algorithm, with a mean value significantly different from the UCB1-TUNED (p-value is 0.0002779) and from the RMCTS (p-value is 1.293 × 10^-9). Interestingly, the RMCTS-Tuned algorithm returned a solution with the same fitness value (−2.086433) in each of its 100 runs and thus has zero variance.

We can also notice, as expected, that the quality of the obtained solutions is much lower than in the previous test. Here too, the gain is around one goal in the predicted score difference, but the quality remains below 40%, even for the RMCTS-Tuned algorithm, which seems to have returned the globally optimal solution in each run.

We can now move to the last test with the simplified problem, the match against a weak opponent. This time the baseline solution for the chosen match has the fitness value 4.684158. This is a quite large value, so we expect only a very small gain in the
quality of the solutions. The box-plots of the results from that experiment are shown on figure 3.14, and the results are summarized in table 3.3.

Figure 3.14: Box-plots of the results against a weak opponent, simplified problem.

    Algorithm                         mean       variance   quality (%)
    Random Search (RS)                5.816697   0.0033     24.18
    Random Search Tuned (RS-Tuned)    5.906818   0.0009     26.10
    UCB1-TUNED                        5.790016   0.0002     23.61
    RMCTS                             5.939261   0.0004     26.79
    RMCTS-Tuned                       5.962179   0.00003    27.28

Table 3.3: Experiment results against a weak opponent, simplified problem.

Also in this test, the RMCTS-Tuned is the best algorithm. It is significantly better than the runner-up RMCTS (the p-value is lower than 2.2 × 10^-16). We can also notice the strange behavior of the UCB1-TUNED: it is worse than the Random Search in this test (p-value is 1.394 × 10^-5). It seems to converge to a specific solution, and it is the only algorithm in this test that did not find the optimal value that the other algorithms found. Nevertheless, the quality differences between all algorithms are low.

We can also investigate the average quality of the solutions generated by the algorithms in the three tests. The qualities are shown in table 3.4. We can notice that the Random Search produces the lowest quality solutions, with around 20% lower quality than the UCB1-TUNED. Nevertheless, it remains quite close to the other algorithms and might end up quite high in the tests described in section 3.1, among the other algorithms tested in this section. We can also notice that the Random Search Tuned has a quality almost 2% higher than the UCB1-TUNED. However, as seen in the separate tests, this difference might not be statistically significant (it was not significant in one case and significant in two cases, where